<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com</title>
	<atom:link href="http://www.datamartist.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Mon, 08 Apr 2013 20:38:04 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Exact isn&#8217;t everything- Surf your data!</title>
		<link>http://www.datamartist.com/surf-the-data-and-embrace-the-inexact</link>
		<comments>http://www.datamartist.com/surf-the-data-and-embrace-the-inexact#comments</comments>
		<pubDate>Mon, 08 Apr 2013 20:38:04 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Management reporting]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Analyst tools]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6349</guid>
		<description><![CDATA[Sometimes an analyst needs to take off the accountant's hat, forget the urge to chase down every last penny, and instead put on their surfing gear, grab the data surfboard (i.e. their set of preferred data tools), and just surf some data. There are some cases where "Exact" is the only acceptable level of [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p><a href="http://www.datamartist.com/wp-content/uploads/2013/04/surf-the-data-wave.png"><img src="http://www.datamartist.com/wp-content/uploads/2013/04/surf-the-data-wave-300x269.png" alt="surf-the-data-wave" width="300" height="269" class="alignright size-medium wp-image-6356" /></a>Sometimes an analyst needs to take off the accountant's hat, forget the urge to chase down every last penny, and instead put on their surfing gear, grab the data surfboard (i.e. their set of preferred data tools), and just surf some data.</p>
<p>There are some cases where "Exact" is the only acceptable level of data quality.  When we're sending invoices to our customers, not only does the amount need to be right to the penny, the invoice needs to make it to the right person, on time.</p>
<p>But sending accurate invoices is not exactly the cutting edge in terms of data.</p>
<p>The challenge for today's data-driven organizations is to make sense of the ocean of data available, and to do it faster than the competition.</p>
<p>And the ocean does not have the highest quality water all the time.  In fact, some of the data is downright dirty.</p>
<p>But there is a lot of it, and it can tell us things.</p>
<p>Just like ocean surfing, when you are data surfing you can sometimes ride the wrong wave and end up underwater, or you can spend hours on your board in flat water, hoping the surf will come up but instead getting nowhere.</p>
<p>But when the wave comes and you ride it, letting yourself go with the flow, it won't give you any exact answers, but it will give you a "feel" and a sense of where the wave is going.  If you are fast to grab it, and your board is good, you might just find yourself getting some awesome, gnarly insight that will score you some competitive advantage.  You just can't surf that wave if you only look at exact, perfect, "we've checked it three times" data.</p>
<p>Data Surf is up- and by the look of the ocean it's going to be a rocking rolling ride-  good luck out there!</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/surf-the-data-and-embrace-the-inexact/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data quality monitoring and reporting</title>
		<link>http://www.datamartist.com/data-quality-monitoring-and-reporting</link>
		<comments>http://www.datamartist.com/data-quality-monitoring-and-reporting#comments</comments>
		<pubDate>Thu, 17 Jan 2013 15:43:00 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Data profiling]]></category>
		<category><![CDATA[Fixing Data]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6160</guid>
		<description><![CDATA[In the vast majority of cases, useful data sets are not static, but are being updated, added to and purged constantly. Data quality monitoring aims to provide data quality information that is also being constantly updated, and can be used to detect issues quickly, before the bad data piles up. Don't let those bad records [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p>In the vast majority of cases, useful data sets are not static, but are being updated, added to and purged constantly.</p>
<p>Data quality monitoring aims to provide data quality information that is also being constantly updated, and can be used to detect issues quickly, before the bad data piles up.</p>
<h2>Don't let those bad records pile up.</h2>
<p>Imagine a company that does a mailing to its customers every 3 months.  Imagine that a new call center training program is put in place, and unknown to all, customer information starts being incorrectly entered and updated due to an error in the program.  When do you want to know about the growing number of invalid customer records?  When it's time to do the mailing, after 90 days of bad data generation, or after just a few days of problem entries?</p>
<h2>ALERT! ALERT!  Bad data Alert!</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2013/01/data-quality-monitoring-scorecard.png"><img src="http://www.datamartist.com/wp-content/uploads/2013/01/data-quality-monitoring-scorecard-300x226.png" alt="" title="data-quality-monitoring-scorecard" width="300" height="226" class="alignright size-medium wp-image-6343" /></a>The trick to data quality monitoring is having a set of data quality rules and profiling tests that will automatically give indications of issues.  Some rules are easier than others, but the idea is that each record or group of records in the table(s) in question is checked against a series of tests, and the number of data quality rule infringements is tracked.  As the size of the table grows, if the overall percentage of bad to good is increasing, you know you have an issue.  If it spikes up, you sound the alarm.</p>
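<p>As a rough illustration of the idea (this is a minimal Python sketch, not how Datamartist implements it), each record can be run through a set of rules, with infringements counted per rule and the overall bad percentage compared against the previous run.  The rule names, the sample records and the 5 point spike threshold are all made up for the example.</p>

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]  # a rule returns True when the record passes

# Hypothetical rules, for illustration only.
RULES: Dict[str, Rule] = {
    "email_has_at_sign": lambda r: "@" in r.get("email", ""),
    "postal_code_present": lambda r: bool(r.get("postal_code", "").strip()),
}


def score_records(records: List[Record], rules: Dict[str, Rule]) -> dict:
    """Count infringements per rule and the overall percentage of bad records."""
    infractions = {name: 0 for name in rules}
    bad = 0
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        for name in failed:
            infractions[name] += 1
        if failed:
            bad += 1  # a record is "Bad" if it breaks at least one rule
    pct_bad = 100.0 * bad / len(records) if records else 0.0
    return {"total": len(records), "bad": bad,
            "pct_bad": pct_bad, "infractions": infractions}


def should_alert(previous_pct_bad: float, current_pct_bad: float,
                 spike_threshold: float = 5.0) -> bool:
    """Flag a spike when the bad percentage rises by more than the threshold."""
    return current_pct_bad - previous_pct_bad > spike_threshold


# Invented sample data: one bad email, one missing postal code.
sample = [
    {"email": "ann@example.com", "postal_code": "H2X 1Y4"},
    {"email": "bob.example.com", "postal_code": "K1A 0B1"},
    {"email": "cat@example.com", "postal_code": ""},
    {"email": "dan@example.com", "postal_code": "V5K 0A1"},
]
result = score_records(sample, RULES)
```

<p>Storing the result of each run gives you the trend over time, and the alert check only needs the last two percentages.</p>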
<p>Actually having a large fog horn wired up in the CEO's office is optional, due to the challenges of false positives (detecting a "Bad" record when in fact the record is ok).   </p>
<p>I would also be very cautious about having automatic data modification processes going on, but setting alerts that will notify those responsible for potential data quality issues is a relatively straightforward exercise and will at the very least improve your visibility of data quality trends.  What you do to deal with them is up to you, but at least some of the battle is being aware of the problem.</p>
<p>The scorecard shown above monitors a number of things, but the key here is to make a decision for every record if it is "Good" or "Bad", and give some analysis of which data quality rules are broken.  You can see in the upper right, there are row counts of infractions by a number of data quality rules. In the middle is the overall Bad vs Good split on the records.  </p>
<p>The Datamartist Pro version can quite easily create automated data quality monitoring.  Using visual blocks, and arranging them on a Canvas, Datamartist lets you create data quality rules, define profiling on selected columns, and then automatically, as a scheduled process, place the results either in Excel reports/dashboards or into a database for use by your reporting tool of choice.</p>
<p>By understanding how your data quality evolves, and catching problems early, you can reduce the amount of cleansing you need to do, and communicate clearly and regularly to your organisation the progress your data quality programs are making.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-quality-monitoring-and-reporting/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automating data update for Tableau Server using Tabcmd and Datamartist</title>
		<link>http://www.datamartist.com/automating-data-update-for-tableau-server-using-tabcmd-and-datamartist</link>
		<comments>http://www.datamartist.com/automating-data-update-for-tableau-server-using-tabcmd-and-datamartist#comments</comments>
		<pubDate>Wed, 29 Aug 2012 17:41:37 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[Tableau]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6287</guid>
		<description><![CDATA[Tableau is a wonderful, powerful visualization tool. If you have the data, it will generate insightful, powerful dashboards and reports. But actually getting the data is often the tricky bit. And having it appear in your dashboards and reports without having to press lots of buttons is the goal. (No one likes to have to [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p>Tableau is a wonderful, powerful visualization tool.  If you have the data, it will generate insightful, powerful dashboards and reports.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2012/08/datamartist-and-tableau-server-automation-with-tabcmd.png"><img src="http://www.datamartist.com/wp-content/uploads/2012/08/datamartist-and-tableau-server-automation-with-tabcmd.png" alt="" title="datamartist-and-tableau-server-automation-with-tabcmd" width="363" height="198" class="alignright size-full wp-image-6306" /></a>But actually getting the data is often the tricky bit.  And having it appear in your dashboards and reports without having to press lots of buttons is the goal.  (No one likes to have to do a long series of cut and paste operations to get the data in place).</p>
<p>Datamartist is a tool that lets you combine data from multiple sources, format it, clean it, organize it, and then make it ready for tools like Tableau.</p>
<p>Because Datamartist can be automated from the command line, and commands on Tableau Server can be run from the command line using tabcmd, Datamartist and Tableau together give you a complete and automated reporting solution.</p>
<p>Have you ever wished that you could update your Tableau dashboards automatically every day?  Do you have non-database information in the form of spreadsheets that you need to integrate into your Tableau workbooks?</p>
<p>With Datamartist and Tableau together, you can automate all these tasks, integrating formal database data with spreadsheet data from shared drives seamlessly, and quickly.  And when your users log into Tableau, they see the dashboards they need to see.</p>
<p>Datamartist lets you import data from databases, Excel files and text files, define data transformations that prepare and clean the data for visualization, and then export it to a location (very often a database, but potentially an Excel workbook or file) where it can be easily, automatically read by a visualization tool like Tableau Server.</p>
<p>You can learn more about <a href="http://www.datamartist.com/product/video-and-screenshots/datamartist-functional-overview" target="_blank">what Datamartist does here</a>.   </p>
<p>To give you an idea of what a batch file looks like that first runs a Datamartist canvas (which loads data from multiple sources, transforms it and makes it ready for Tableau) and then refreshes workbooks on a remote Tableau Server automatically, here is an example of some simple batch file code:</p>
<p>The first line runs Datamartist- this runs a .DMC file that you have built with Datamartist that can extract the data from multiple sources- databases, files, internal hard coded sets-  then do joins, segmentations, filters, calculations etc. and export the data to the location that the Tableau Server workbooks are connected to.</p>
<p>"C:\Program Files (x86)\nModal Solutions Inc\Datamartist\Datamartist.exe" RUN /f "C:\MyDataMartistFile.DMC" /l "C:\WhereIWantTheLogFiles\ </p>
<p>Then, using a utility called tabcmd from Tableau, you can log into a Tableau Server and refresh the workbook, thus updating all your Tableau dashboards.  To do this, use the following lines in the batch file:</p>
<p>"C:\Program Files (x86)\Tableau\Tableau Server\7.0\extras\Command Line Utility\tabcmd.exe" login -s MyServerURL -u MyUserName -p MyPassword -t MySite<br />
"C:\Program Files (x86)\Tableau\Tableau Server\7.0\extras\Command Line Utility\tabcmd.exe" refreshextracts --project "MyProject" --workbook "MyWorkbook" --site MySite </p>
<p>You can find more information about <a href="http://onlinehelp.tableausoftware.com/current/server/en-us/tabcmd.htm" title="tableau software tabcmd" target="_blank"> tableau's tabcmd on their site.</a> </p>
<p>Net result?  You have full automation from data to visualization, Datamartist takes care of the data, Tableau creates fantastic dashboards.</p>
<p>If you build a batch file, and set it to run once a day, every morning when your users log into Tableau server, they'll see updated data, ready to go.</p>
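<p>If you would rather drive the same steps from a script than from a batch file, the sequence can be sketched in Python.  This is only a sketch: the executable paths and arguments are copied from the batch file lines above, the function names are made up for the example, and it assumes that both programs return a non-zero exit code when they fail, which you should verify for your versions.</p>

```python
import subprocess
from typing import Callable, List

# Paths copied from the batch file example; adjust them for your installation.
DATAMARTIST = r"C:\Program Files (x86)\nModal Solutions Inc\Datamartist\Datamartist.exe"
TABCMD = r"C:\Program Files (x86)\Tableau\Tableau Server\7.0\extras\Command Line Utility\tabcmd.exe"


def build_commands(dmc_file: str, log_dir: str, server: str, user: str,
                   password: str, site: str, project: str,
                   workbook: str) -> List[List[str]]:
    """Return the three command lines from the batch file as argument lists."""
    return [
        [DATAMARTIST, "RUN", "/f", dmc_file, "/l", log_dir],
        [TABCMD, "login", "-s", server, "-u", user, "-p", password, "-t", site],
        [TABCMD, "refreshextracts", "--project", project,
         "--workbook", workbook, "--site", site],
    ]


def run_in_order(commands: List[List[str]],
                 runner: Callable[[List[str]], int] = subprocess.call) -> int:
    """Run each command in turn and stop at the first non-zero exit code.

    Returns the number of commands that completed successfully.
    Assumption: a non-zero exit code means the step failed.
    """
    completed = 0
    for command in commands:
        if runner(command) != 0:
            break
        completed += 1
    return completed
```

<p>Calling <code>run_in_order(build_commands(...))</code> from a daily scheduled task would skip the Tableau refresh whenever the Datamartist step fails, so users do not get a refresh on top of a failed load.</p>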
<p>Try Datamartist- it's a <a href="http://www.datamartist.com/downloads" title="Datamartist trial" target="_blank">full function free trial</a>.  Find out how easy it is to automate Tableau Server with sophisticated data extraction.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/automating-data-update-for-tableau-server-using-tabcmd-and-datamartist/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>To Excel or not to Excel, that is the question</title>
		<link>http://www.datamartist.com/to-excel-or-not-to-excel-that-is-the-question</link>
		<comments>http://www.datamartist.com/to-excel-or-not-to-excel-that-is-the-question#comments</comments>
		<pubDate>Wed, 30 May 2012 23:17:19 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[data culture]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Microsoft Excel]]></category>
		<category><![CDATA[MS Excel]]></category>
		<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6256</guid>
		<description><![CDATA[There is an ongoing debate in Business intelligence and Analytics circles about what role spreadsheets have in data management. At the extreme pro-spreadsheet side, Excel can fix all, be all, and IT departments should just be disbanded, because the world can run on workbooks and macros. At the other end of the spectrum, all that [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p><a href="http://www.datamartist.com/wp-content/uploads/2012/05/is-excel-king.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2012/05/is-excel-king.jpg" alt="" title="is-excel-king" width="254" height="269" class="alignright size-full wp-image-6263" /></a>There is an ongoing debate in Business intelligence and Analytics circles about what role spreadsheets have in data management.</p>
<p>At the extreme pro-spreadsheet side, Excel can fix all, be all, and IT departments should just be disbanded, because the world can run on workbooks and macros.</p>
<p>At the other end of the spectrum, all that is evil and wrong, all that corrupts data quality, all that is unholy in the realm of data management process and master data management can be traced back to renegade spreadsheets and the "culture of Excel", and spreadsheet applications should be uninstalled from every desktop, never to be seen again.</p>
<p>And of course, in the middle, in the real world, we all use excel constantly, every day, all the time for lots of things.</p>
<p>Some organisations do use it too much; others probably spend way too much money on their business intelligence solutions because they don't let their users use it enough (although my guess is that there are fewer of these than of the first sort).</p>
<p>What is my take on it?  Excel is a fantastic scratchpad, and a powerful reporting tool.  It is not, however, a data repository.  The official version of the data needs to be in a database, managed by someone who knows data, data models, and master data management.  </p>
<p>But if you are looking for a quick tool to create dashboards, reports, and give people easy access to data- don't discount Excel.</p>
<p>This is why we've recently greatly <a href="http://www.datamartist.com/datamartist-v1-6-released-excel-dashboard-enhancements">enhanced Datamartist's ability</a> to read and (more importantly) generate Excel workbooks.  The fact is that Excel is an application that is installed on every machine, is familiar to many, and is an excellent way to deliver data.  Not store it, but deliver it.</p>
<p>The challenge that many data warehouse projects have faced is that they are competing with all the Excel spreadsheets that circulate through the typical organisation. The business intelligence team is hard pressed to deliver new reports as quickly as they are demanded- and as a result people bypass the system and create a web of spreadsheets that become the "real values".</p>
<p>How can you get data out of databases and into Excel?  How can you export from SQL Server to Excel, or Oracle to Excel, quickly and easily?  Datamartist provides a way to build reusable, automated queries and transformations from multiple databases and files and to generate Excel workbooks with graphs and tables in an automated way.</p>
<p>You can run Datamartist as a scheduled job, generating the Excel workbooks you need regularly, so users come to understand that while the spreadsheet is useful, it is ephemeral. The master copy of the data is in the data warehouse.  Excel workbooks are reports.  But because they are available quickly, a hybrid solution of the core data warehouse and Excel reports can deliver better data in a format users are comfortable with.</p>
<p>What are some of the things excel does particularly well?</p>
<ul>
<li>Good graphs, flexible, easy to set up and format</li>
<li>Excellent pixel by pixel formatting control- ability to just get what you want in terms of look and feel- add in images, tables, anything</li>
<li>Great integration with MS Office (obviously), so going into Word or PowerPoint is easy</li>
<li>Great control of printing/formatting- ability to generate PDF by installing a print to PDF capability</li>
<li>Incredible adoption and broad compatibility- chances are, whoever you want to share with has excel.</li>
</ul>
<p>So don't be an extremist on either end of the spectrum-  don't think for a moment that "just spreadsheets" will solve your data management needs- on the other hand, don't be an Excel bigot- Excel has a role to play, and in fact, can be a tool to drive adoption of your data warehouse and master data management process.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/to-excel-or-not-to-excel-that-is-the-question/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Transposing data- columns to rows</title>
		<link>http://www.datamartist.com/transposing-data-columns-to-rows</link>
		<comments>http://www.datamartist.com/transposing-data-columns-to-rows#comments</comments>
		<pubDate>Thu, 19 Apr 2012 16:58:30 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Quick Datamartist Tutorials]]></category>
		<category><![CDATA[Datamartist tutorials]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6239</guid>
		<description><![CDATA[Being able to pivot and transpose data is a key part of reporting and data analysis. With the recently released V1.6 of Datamartist, there is new functionality in the Pivot block that lets you turn a data set that looks like this: Into one that looks like this: The columns have been transposed to rows, [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p>Being able to pivot and transpose data is a key part of reporting and data analysis.  With the recently released V1.6 of Datamartist, there is new functionality in the Pivot block that lets you turn a data set that looks like this:</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2012/04/Countries_as_columns_before_transpose.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2012/04/Countries_as_columns_before_transpose.jpg" alt="" title="Countries_as_columns_before_transpose" width="600" height="238" class="aligncenter size-full wp-image-6242" /></a></p>
<p>Into one that looks like this:<br />
<a href="http://www.datamartist.com/wp-content/uploads/2012/04/countries_as_rows_after_transpose.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2012/04/countries_as_rows_after_transpose.jpg" alt="" title="countries_as_rows_after_transpose" width="344" height="339" class="aligncenter size-full wp-image-6244" /></a></p>
<p>The columns have been transposed to rows, and the former column name is now a value in each row.</p>
<p>Why would you want to do this?  One very common reason is to prepare data for OLAP tools such as Tableau or Qlikview.  Having each individual country as a column will not allow you to take advantage of their slice and dice capabilities.  With Country as a column, and the country names stored in each row, you can take advantage of pivot tables and OLAP tools.</p>
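<p>To make the reshaping concrete, here is a small Python sketch of the same columns-to-rows idea.  It is not what Datamartist runs internally; the function name, the "Year" column and the numbers are invented for the example, with only the country-as-column layout taken from the screenshots above.</p>

```python
from typing import Dict, List


def columns_to_rows(rows: List[Dict], columns_to_transpose: List[str],
                    name_column: str, value_column: str) -> List[Dict]:
    """Turn each selected column into its own row.

    The former column name goes into name_column and the cell value
    goes into value_column; all other columns are copied unchanged.
    """
    result = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in columns_to_transpose}
        for column in columns_to_transpose:
            new_row = dict(kept)
            new_row[name_column] = column
            new_row[value_column] = row[column]
            result.append(new_row)
    return result


# Invented sample: one column per country.
wide = [
    {"Year": 2010, "Canada": 10, "France": 20},
    {"Year": 2011, "Canada": 12, "France": 18},
]
tall = columns_to_rows(wide, ["Canada", "France"], "Country", "Value")
# tall[0] is {"Year": 2010, "Country": "Canada", "Value": 10}
```

<p>Two input rows with two country columns become four output rows, each carrying the country name as a value.</p>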
<h2>How it's done in Datamartist</h2>
<p>To do this in Datamartist, we use a PIVOT block, and connect it to the data set we want to transpose:<br />
<a href="http://www.datamartist.com/wp-content/uploads/2012/04/country_blocks_to_transpose.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2012/04/country_blocks_to_transpose.jpg" alt="" title="country_blocks_to_transpose" width="342" height="174" class="aligncenter size-full wp-image-6246" /></a></p>
<p>Then we simply specify which columns are to be transposed, and provide a name for the two new columns- one column will contain the former column names, the other column the values.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2012/04/country_transpose_block_configuration.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2012/04/country_transpose_block_configuration.jpg" alt="" title="country_transpose_block_configuration" width="581" height="274" class="aligncenter size-full wp-image-6247" /></a></p>
<p>You can try this yourself with <a href="http://www.datamartist.com/downloads">the free trial of Datamartist</a>- you will find you can import data from files, Excel and databases, and using easy to use blocks, create powerful data transformations, making your data ready for use in great tools like Excel, Tableau and Qlikview.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/transposing-data-columns-to-rows/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Datamartist V1.6 Released-  Excel Dashboard enhancements</title>
		<link>http://www.datamartist.com/datamartist-v1-6-released-excel-dashboard-enhancements</link>
		<comments>http://www.datamartist.com/datamartist-v1-6-released-excel-dashboard-enhancements#comments</comments>
		<pubDate>Thu, 29 Mar 2012 18:03:10 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6220</guid>
		<description><![CDATA[We are pleased to announce the release of Datamartist V1.6, and a powerful new capability for the generation of Excel based dashboards. With the new version, Datamartist can insert data sets easily and visually into Excel workbooks as well as having the capability of defining a template, allowing automation of dashboard generation. As always, data [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p>We are pleased to announce the release of Datamartist V1.6, and a powerful new capability for the generation of Excel based dashboards.</p>
<p>With the new version, Datamartist can insert data sets easily and visually into Excel workbooks as well as having the capability of defining a template, allowing automation of dashboard generation.</p>
<p>As always, data transformations are created visually, using blocks and connectors- in this case, we are analyzing some sales and profitability data.</p>
<p>We use blocks to summarize, transform and pivot our data, and then export it into specific areas of the desired excel template.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2012/03/excel-export-blocks-datamartist.png"><img src="http://www.datamartist.com/wp-content/uploads/2012/03/excel-export-blocks-datamartist.png" alt="" title="excel-export-blocks-datamartist" width="650" height="375" class="aligncenter size-full wp-image-6221" /></a></p>
<p>By inserting the data onto a "data" sheet in the template that has links to charts or tables on the dashboard sheet, we can create powerful, clear dashboards.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2012/03/excel-dashboards-with-datamartist-data1.png"><img src="http://www.datamartist.com/wp-content/uploads/2012/03/excel-dashboards-with-datamartist-data1.png" alt="" title="excel-dashboards-with-datamartist-data" width="600" height="482" class="aligncenter size-full wp-image-6225" /></a></p>
<p>When in Datamartist, you can see clearly where the data sets are going to be inserted- and Datamartist generates the Excel workbook(s) needed with all the data inserted.  Because Datamartist can be run from the command line, you can automate your Excel reporting, pulling data from databases and files, and generating professional looking dashboards that are easy to share and use.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2012/03/inserting-datamartist-data-into-excel.png"><img src="http://www.datamartist.com/wp-content/uploads/2012/03/inserting-datamartist-data-into-excel.png" alt="" title="inserting-datamartist-data-into-excel" width="600" height="360" class="aligncenter size-full wp-image-6232" /></a></p>
<p>The Excel export capability is far from the only enhancement that V1.6 brings.  I'll be blogging, and providing examples of new functionality in the coming weeks.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datamartist-v1-6-released-excel-dashboard-enhancements/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A new year's resolution to data profile</title>
		<link>http://www.datamartist.com/a-new-years-resolution-to-data-profile</link>
		<comments>http://www.datamartist.com/a-new-years-resolution-to-data-profile#comments</comments>
		<pubDate>Tue, 10 Jan 2012 15:54:05 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data profiling]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Reality Check]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6165</guid>
		<description><![CDATA[Well, it's the time of making and breaking resolutions, a time when setting realistic goals is sometimes hard to do with all the optimism of the new year. Sometimes, we make decisions NOT to set a goal, because we don't want to break it. You might be thinking you really should step up your data [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p><a href="http://www.datamartist.com/wp-content/uploads/2012/01/data-profiling-some-data.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2012/01/data-profiling-some-data-300x225.jpg" alt="" title="data-profiling-some-data" width="300" height="225" class="alignright size-medium wp-image-6171" /></a>Well, it's the time of making and breaking resolutions, a time when setting realistic goals is sometimes hard to do with all the optimism of the new year.  </p>
<p>Sometimes, we make decisions NOT to set a goal, because we don't want to break it.  </p>
<p>You might be thinking you really should step up your data quality monitoring- get some data profiling underway to help identify the data domains and areas you most want to tackle in 2012.  But you might be also thinking that with all the pressures and cutbacks that many companies are facing, you don't have the resources to implement a full scale profiling and monitoring effort, and so might decide to delay. </p>
<p>Don't wait. Just do it.  The perfect is the enemy of the good.</p>
<p>Rather than worrying about how much of your data you are going to be able to cover, or that you can't devote enough resources to tackle all of your reference areas at once, work at the problem from another direction.  </p>
<h1>First, start with master data.</h1>
<p>Master data is the data that all your other data is made from.  It's the data everyone uses to view the massive piles of transactional data, so one bad row in a master data table, and the impact is felt across perhaps hundreds of reports, and multiple time periods.  If you have a product in the wrong category, then every transaction, across perhaps hundreds of customers, and all time, will be miscategorized, and every total, sub-total and calculated metric using it will suffer.</p>
<p>While bad transactions are bad, bad reference data is deadly.  Bad reference data takes a good transaction and messes it up.</p>
<h1>Worst first!</h1>
<p>Make a list of your reference tables/areas.  Customer, Product, Chart of accounts, etc.  What are the most important for your business?  This isn't something I can tell you- you have to think about what is most critical.</p>
<p>If you are a company that purchases large amounts of materials from many vendors, and purchasing decisions are fast paced and critical, then maybe it's your vendor master, and your accounts payable.</p>
<p>On the other hand, if you have lots of interaction with your customers, and errors in the customer master cost you business, then start with that.</p>
<p>The key is to first make the list, and then think to yourself "if I have bad quality data, where am I most afraid it will be?"  Start profiling there.  You want to find the worst first, and fixing that will have the greatest positive impact.</p>
<h1>Get to know your data</h1>
<p>Don't worry about setting complex or work intensive goals right away.  Data profiling is about data discovery sometimes.  You need to wade into your reference data, play with it, tease out patterns and relationships.  As you get to know your data, you will be able to better identify where there are issues to tackle, and where root causes might lie for data quality issues.</p>
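<p>A first pass at getting to know a column does not need much tooling.  As a hedged sketch (plain Python, unrelated to Datamartist's own profiling, with an invented customer list and a made-up function name), counting blanks, distinct values and the most frequent values already surfaces issues such as inconsistent capitalization:</p>

```python
from collections import Counter
from typing import Dict, List


def profile_column(rows: List[Dict], column: str) -> dict:
    """Return simple profile figures for one column of a list of records."""
    values = [row.get(column) for row in rows]
    # Treat missing keys, None and whitespace-only strings as blank.
    present = [str(v).strip() for v in values
               if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "blank": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Invented reference data with a blank and a capitalization variant.
customers = [
    {"name": "Acme", "country": "Canada"},
    {"name": "Birch", "country": "canada"},
    {"name": "Cedar", "country": ""},
    {"name": "Dune", "country": "Canada"},
]
profile = profile_column(customers, "country")
```

<p>Here the profile reports one blank and two distinct spellings of the same country, which is exactly the kind of pattern worth chasing back to a root cause.</p>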
<p>One approach might be to simply resolve to spend an hour a week, every week, profiling some data.  If you aren't doing that now, you will find that even just a bit of time set aside will give huge insight- sometimes we get too busy to do the basics, and we miss opportunities to make significant improvements with relatively little effort in our data.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/a-new-years-resolution-to-data-profile/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Datamartist V1.5 Released</title>
		<link>http://www.datamartist.com/datamartist-v1-5-released</link>
		<comments>http://www.datamartist.com/datamartist-v1-5-released#comments</comments>
		<pubDate>Wed, 31 Aug 2011 18:47:53 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6114</guid>
		<description><![CDATA[We are pleased to announce that Datamartist V1.5 is now available. This version of Datamartist brings with it some useful new functionality, including new functions that can be used in expressions, new capabilities in terms of exporting to databases, and a new data block. In this post, we'll look at two new features, the Pivot [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p>We are pleased to announce that Datamartist V1.5 is now available.</p>
<p>This version of Datamartist brings with it some useful new functionality, including new functions that can be used in expressions, new capabilities in terms of exporting to databases, and a new data block.</p>
<p>In this post, we'll look at two new features, the Pivot block, and the enhanced database export capabilities.</p>
<h2>Pivot Block</h2>
<p>Our beta testers loved this new block.  The pivot block lets you do the equivalent of a cross-tab query, rolling up  a measure, and distributing the value in a new set of columns, where the column names are provided by the input data set.</p>
<p>Here is a simple example showing how it works:</p>
<p>Say we start with a set of data that has multiple rows for each date, and different values in the color field, and a quantity measure:</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-input-data.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-input-data.png" alt="" title="pivot-block-input-data" width="356" height="296" class="aligncenter size-full wp-image-6121" /></a></p>
<p>Then we can connect one of the new pivot blocks to this data set like so:</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-connected-to-internal-dataset.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-connected-to-internal-dataset.png" alt="" title="pivot-block-connected-to-internal-dataset" width="666" height="338" class="aligncenter size-full wp-image-6122" /></a></p>
<p>The pivot block lets us select which columns to include (this defines the level of detail to roll up to), which string column to use to generate the new column names, and which measure to use, as well as the rollup method (sum, average, min, max).</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-configuration.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-configuration.png" alt="" title="pivot-block-configuration" width="578" height="238" class="aligncenter size-full wp-image-6125" /></a></p>
<p>The result?  The output of the pivot block looks like this: now we have a summary by color for each date, with a column for each color value.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-resulting-dataset.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-resulting-dataset.png" alt="" title="pivot-block-resulting-dataset" width="439" height="210" class="aligncenter size-full wp-image-6126" /></a></p>
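<p>For readers who think in code, the same roll-up can be sketched outside Datamartist.  This is only an illustration in plain Python, not how the pivot block is implemented: the sample rows are made up (they are not the data in the screenshots), the function name <code>pivot_sum</code> is hypothetical, only the "sum" rollup is shown, and filling missing date/color combinations with 0 is an assumption.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical input rows: (date, color, quantity). The values are made up
# for illustration and are not the data shown in the screenshots.
rows = [
    ("2011-08-01", "Red", 5),
    ("2011-08-01", "Blue", 3),
    ("2011-08-01", "Red", 2),
    ("2011-08-02", "Blue", 4),
    ("2011-08-02", "Green", 1),
]


def pivot_sum(rows):
    """Roll quantity up by date, with one output column per color value."""
    totals = defaultdict(lambda: defaultdict(int))
    colors = set()
    for date, color, quantity in rows:
        totals[date][color] += quantity
        colors.add(color)
    columns = sorted(colors)
    # Assumption: colors that never occur on a given date are reported as 0.
    table = {
        date: {color: totals[date].get(color, 0) for color in columns}
        for date in sorted(totals)
    }
    return columns, table


columns, table = pivot_sum(rows)
print("Date", *columns)
for date, by_color in table.items():
    print(date, *(by_color[color] for color in columns))
```
</pre>
<p>The column names come from the data itself (the distinct color values), which is the key idea of a cross-tab: new colors in the input produce new columns in the output without any change to the logic.</p>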
<h2>Database export enhancements</h2>
<p>Now, when exporting to a database, there are a number of new options.</p>
<p>One of the most interesting is the capability to execute SQL commands in the database before and/or after the data is exported into the table.</p>
<p>This makes it possible to run stored procedures, or to launch follow-on database-side processing after Datamartist writes the data into the DB.</p>
<p>This is a powerful new capability, and makes it even easier to integrate Datamartist into various systems, and get your data quality and profiling data where you need it.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/08/sql-command-capability-example.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/sql-command-capability-example.png" alt="" title="sql-command-capability-example" width="724" height="293" class="aligncenter size-full wp-image-6128" /></a></p>
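<p>The general pattern (run some SQL, load the rows, run some more SQL) can be sketched as follows.  This is a generic illustration using Python's built-in <code>sqlite3</code> module, not Datamartist's export code: the function <code>export_with_hooks</code> and the table names <code>color_summary</code> and <code>daily_total</code> are hypothetical, and because SQLite has no stored procedures, a plain INSERT ... SELECT stands in for the follow-on processing.</p>
<pre>
```python
import sqlite3


def export_with_hooks(conn, rows, pre_sql=None, post_sql=None):
    """Run optional SQL before and after loading rows into a table."""
    if pre_sql:
        conn.executescript(pre_sql)   # e.g. create or clear the target table
    conn.executemany(
        "INSERT INTO color_summary (day, color, quantity) VALUES (?, ?, ?)",
        rows,
    )
    if post_sql:
        conn.executescript(post_sql)  # e.g. follow-on processing in the database
    conn.commit()


conn = sqlite3.connect(":memory:")
export_with_hooks(
    conn,
    rows=[("2011-08-01", "Red", 7), ("2011-08-01", "Blue", 3)],
    pre_sql="""
        CREATE TABLE IF NOT EXISTS color_summary (day TEXT, color TEXT, quantity INTEGER);
        DELETE FROM color_summary;
    """,
    post_sql="""
        CREATE TABLE IF NOT EXISTS daily_total (day TEXT, quantity INTEGER);
        INSERT INTO daily_total SELECT day, SUM(quantity) FROM color_summary GROUP BY day;
    """,
)
print(conn.execute("SELECT day, quantity FROM daily_total").fetchall())
```
</pre>
<p>A typical use of the "before" step is to clear or prepare the target table, and of the "after" step to refresh whatever downstream tables depend on the newly loaded data.</p>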
<p>If you haven't checked out Datamartist yet, we're not sure what you are waiting for-  <a href="/downloads">download the free trial</a>, and give it a go.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datamartist-v1-5-released/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Datamartist data quality cartoons</title>
		<link>http://www.datamartist.com/datamartist-data-quality-cartoons</link>
		<comments>http://www.datamartist.com/datamartist-data-quality-cartoons#comments</comments>
		<pubDate>Tue, 21 Jun 2011 13:22:06 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Cartoons]]></category>
		<category><![CDATA[Just for fun]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4463</guid>
		<description><![CDATA[I've had lots of fun over the years building the little cartoons that have become a regular feature. Here are a few, reposted together just for fun. Data quality super powers. Fighting the anti-data forces of evil. Data silo fun. The joys of a moving target Data migration tools.]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p>I've had lots of fun over the years building the little cartoons that have become a regular feature.  Here are a few, reposted together just for fun.</p>
<h2>Data quality super powers.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/data-quality-sense-tingling-april-birthdays.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/data-quality-sense-tingling-april-birthdays.jpg" alt="" title="data-quality-sense-tingling-april-birthdays" width="338" height="244" class="aligncenter size-full wp-image-6050" /></a></p>
<p></p>
<h2>Fighting the anti-data forces of evil.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/the-data-days-no-the-ceo-says-yes.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/the-data-days-no-the-ceo-says-yes.jpg" alt="" title="the-data-days-no-the-ceo-says-yes" width="446" height="331" class="aligncenter size-full wp-image-6049" /></a></p>
<h2>Data silo fun.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/data-silos-what-do-you-mean-data-silos.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/data-silos-what-do-you-mean-data-silos.jpg" alt="" title="data-silos-what-do-you-mean-data-silos" width="373" height="276" class="aligncenter size-full wp-image-6052" /></a></p>
<p></p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/datamigration-as-long-as-the-new-system-is-the-same.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/datamigration-as-long-as-the-new-system-is-the-same.jpg" alt="" title="datamigration-as-long-as-the-new-system-is-the-same" width="463" height="343" class="aligncenter size-full wp-image-6051" /></a></p>
<p></p>
<h2>The joys of a moving target</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/we-are-changing-all-the-product-codes-again-problem.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/we-are-changing-all-the-product-codes-again-problem.jpg" alt="" title="we-are-changing-all-the-product-codes-again-problem" width="373" height="212" class="aligncenter size-full wp-image-6057" /></a></p>
<h2>Data migration tools.</h2>
<p>
<a href="http://www.datamartist.com/wp-content/uploads/2011/06/data-migration-get-the-hammer.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/data-migration-get-the-hammer.jpg" alt="" title="data-migration-get-the-hammer" width="374" height="225" class="aligncenter size-full wp-image-6060" /></a></p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datamartist-data-quality-cartoons/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Quality Rules</title>
		<link>http://www.datamartist.com/data-quality-rules</link>
		<comments>http://www.datamartist.com/data-quality-rules#comments</comments>
		<pubDate>Thu, 16 Jun 2011 17:00:07 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[data culture]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Data Quality rules]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=5995</guid>
		<description><![CDATA[What's the difference between good data and bad data? It is much like the difference between good children and bad children- the bad data doesn't follow the rules. But what are the rules? Unlike the rules for kids, which have been fixed in stone for decades (or at least, parents wish it were so), the [...]]]></description>
				<content:encoded><![CDATA[<div class="page-restrict-output"><p>What's the difference between good data and bad data?  It is much like the difference between good children and bad children- the bad data doesn't follow the rules.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2011/04/data-quality-rules-data-freedom-or-death.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/04/data-quality-rules-data-freedom-or-death-300x269.jpg" alt="" title="data-quality-rules-data-freedom-or-death" width="300" height="269" class="alignright size-medium wp-image-6011" /></a><br />
But what are the rules?  Unlike the rules for kids, which have been fixed in stone for decades  (or at least, parents wish it were so), the rules for data are slippery things that depend very much on the context and the database.</p>
<p>While it's a complex subject, some basic rules of thumb can avoid the deeper rabbit holes.</p>
<p>The first thing to understand about Data Quality rules is they aren't as easy as they may look.  Data is in theory something in the ordered world of computers, but in reality is in the "flexible" world of humans.  A huge amount of data is entered by members of the group "Homo sapiens" (or mutilated by software written by members of that group) and as a result is not as ordered as we would all like.</p>
<p>The challenge for data quality practitioners is to remove the chaos injected by those highly involved primates (us) and make the data the sterile, ordered, never any question about anything type that we all imagine in our fantasies.</p>
<p>But how?</p>
<p>In the end, it is amazing how powerful and complex the various solutions to this problem are.</p>
<p>But I suggest that there are some basic principles that can help guide us.</p>
<h2>First- do no harm.</h2>
<p>One of the risks of any data quality initiative is that it actually screws up the data more.  Don't define rules that are so complex, and so sure of themselves, that they actually make the data worse.  Be humble. Don't change data unless you are pretty sure it's a good idea.  Err on the side of not screwing up the original.  And keep a copy of the original- so if things do go off the rails you can undo- or at least try to understand what went wrong.</p>
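<p>One way to picture "keep a copy of the original" is a rule that never edits records in place and logs every change it makes.  The sketch below is only an illustration in plain Python: the records, the field names, and the functions <code>tidy_country</code> and <code>apply_rule</code> are all hypothetical.</p>
<pre>
```python
import copy

# Hypothetical customer records; the field names are illustrative only.
original = [
    {"id": 1, "country": "usa "},
    {"id": 2, "country": "Canada"},
    {"id": 3, "country": ""},
]


def tidy_country(value):
    """A deliberately modest rule: trim whitespace, fix one known spelling."""
    trimmed = value.strip()
    known = {"usa": "USA"}
    return known.get(trimmed.lower(), trimmed)


def apply_rule(records, field, rule):
    """Return a cleaned copy plus a log of every change; never edit in place."""
    cleaned = copy.deepcopy(records)
    changes = []
    for record in cleaned:
        before = record[field]
        after = rule(before)
        if after != before:
            record[field] = after
            changes.append((record["id"], field, before, after))
    return cleaned, changes


cleaned, changes = apply_rule(original, "country", tidy_country)
print(changes)
```
</pre>
<p>Note that the rule leaves the empty country alone rather than guessing a value, and the change log gives you something to review (or undo) if the rule turns out to be wrong.</p>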
<h2>Go out and talk to the people</h2>
<p>Don't sit in your ivory tower and speculate as to what the data means.  Go out there and watch people enter it in.  See what real world type things are happening that never make it into bits and bytes.</p>
<h2>Attack the basics first</h2>
<p>Focus your first efforts on dealing with the basics- they will resolve the vast majority of the issues.  Don't chase after the outliers until you have the "easy" cases taken care of- the tough stuff is a case of diminishing returns.  Look first at how to fix processes and train your people to make the majority of typical data entry cases more accurate before you start looking into artificial intelligence based hyper-multi-semantic-algorithmic-learning-matching-holistic-flux-capacitor data quality systems.</p>
<h2>Less is more- the fewer rules the better.</h2>
<p>So what's the rule about making rules?  Try to make fewer rules, and test them in a pragmatic way.  It is possible to have so many rules that the rules themselves have data quality issues- don't go there.</p>
<p>Sometimes the simplest things will bring the greatest benefit.</p>
<p>In the coming weeks, I'll be posting about how to design, implement and monitor Data quality rules using the <a href="/">Datamartist tool</a>.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-quality-rules/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
