<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com</title>
	<atom:link href="http://www.datamartist.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Wed, 25 Jan 2012 15:47:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>A New Year's resolution to data profile</title>
		<link>http://www.datamartist.com/a-new-years-resolution-to-data-profile</link>
		<comments>http://www.datamartist.com/a-new-years-resolution-to-data-profile#comments</comments>
		<pubDate>Tue, 10 Jan 2012 15:54:05 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data profiling]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Reality Check]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6165</guid>
		<description><![CDATA[Well, it's the time of making and breaking resolutions, a time when setting realistic goals is sometimes hard to do with all the optimism of the new year. Sometimes, we make decisions NOT to set a goal, because we don't want to break it. You might be thinking you really should step up your data [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2012/01/data-profiling-some-data.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2012/01/data-profiling-some-data-300x225.jpg" alt="" title="data-profiling-some-data" width="300" height="225" class="alignright size-medium wp-image-6171" /></a>Well, it's the time of making and breaking resolutions, a time when setting realistic goals is sometimes hard to do with all the optimism of the new year.  </p>
<p>Sometimes, we make decisions NOT to set a goal, because we don't want to break it.  </p>
<p>You might be thinking you really should step up your data quality monitoring, and get some data profiling underway to help identify the data domains and areas you most want to tackle in 2012.  But you might also be thinking that, with all the pressures and cutbacks that many companies are facing, you don't have the resources to implement a full-scale profiling and monitoring effort, and so might decide to delay. </p>
<p>Don't wait. Just do it.  The perfect is the enemy of the good.</p>
<p>Rather than worrying about how much of your data you are going to be able to cover, or that you can't devote enough resources to tackle all of your reference areas at once, work at the problem from another direction.  </p>
<h1>First, start with master data.</h1>
<p>Master data is the data that all your other data is made from.  It's the data everyone uses to view the massive piles of transactional data, so the impact of one bad row in a master data table is felt across perhaps hundreds of reports, and multiple time periods.  If you have a product in the wrong category, then every transaction for that product, across perhaps hundreds of customers, and all time, will be miscategorized, and every total, sub-total and calculated metric using it will suffer.</p>
<p>While bad transactions are bad, bad reference data is deadly.  Bad reference data takes a good transaction and messes it up.</p>
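<p>To make that concrete, here is a minimal Python sketch using made-up data (the product codes, categories and amounts are all hypothetical).  Every transaction is valid, but one wrong row in the product master moves every sale of that product into the wrong category total.</p>

```python
from collections import defaultdict

# Hypothetical product master: P2 is really "Tools" but was entered as "Toys".
product_master_bad = {"P1": "Tools", "P2": "Toys", "P3": "Toys"}
product_master_fixed = {"P1": "Tools", "P2": "Tools", "P3": "Toys"}

# Hypothetical transactions: (product code, sales amount). All of them are valid.
transactions = [("P1", 100), ("P2", 250), ("P2", 300), ("P3", 50), ("P2", 150)]


def totals_by_category(master, rows):
    """Sum transaction amounts per category, looked up through the master table."""
    totals = defaultdict(int)
    for product, amount in rows:
        totals[master[product]] += amount
    return dict(totals)


bad_totals = totals_by_category(product_master_bad, transactions)
fixed_totals = totals_by_category(product_master_fixed, transactions)
print(bad_totals)
print(fixed_totals)
```

<p>Three of the five transactions land in the wrong total because of a single master data row, which is why the master tables are a sensible place to start.</p>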
<h1>Worst first!</h1>
<p>Make a list of your reference tables and areas: Customer, Product, Chart of Accounts, and so on.  Which are the most important for your business?  This isn't something I can tell you- you have to think about what is most critical.</p>
<p>If you are a company that purchases large amounts of materials from many vendors, and purchasing decisions are fast paced and critical, then maybe it's your vendor master, and your accounts payable.</p>
<p>On the other hand, if you have lots of interaction with your customers, and errors in the customer master cost you business, then start with that.</p>
<p>The key is to first make the list, and then think to yourself "if I have bad quality data, where am I most afraid it will be?"  Start profiling there.  You want to find the worst first, and fixing that will have the greatest positive impact.</p>
<h1>Get to know your data</h1>
<p>Don't worry about setting complex or work intensive goals right away.  Data profiling is about data discovery sometimes.  You need to wade into your reference data, play with it, tease out patterns and relationships.  As you get to know your data, you will be able to better identify where there are issues to tackle, and where root causes might lie for data quality issues.</p>
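<p>As a rough illustration of what "getting to know your data" can look like, the Python sketch below profiles one column at a time: how many values are missing, how many distinct values there are, and what shapes the values take.  The customer rows and column names are hypothetical, and this is only a sketch of the idea, not how any particular profiling tool works.</p>

```python
import re
from collections import Counter

# Hypothetical customer master rows; None marks a missing value.
customers = [
    {"id": "C001", "country": "CA", "postal_code": "M5V 2T6"},
    {"id": "C002", "country": "ca", "postal_code": "m5v2t6"},
    {"id": "C003", "country": None, "postal_code": "90210"},
    {"id": "C004", "country": "US", "postal_code": None},
]


def pattern_of(value):
    """Reduce a value to its shape: letters become A, digits become 9."""
    shape = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", shape)


def profile(rows, column):
    """Return null count, distinct count and pattern frequencies for one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "patterns": Counter(pattern_of(v) for v in present),
    }


country_profile = profile(customers, "country")
postal_profile = profile(customers, "postal_code")
print(country_profile)
print(postal_profile)
```

<p>Even on four rows this surfaces things worth a closer look: "CA" and "ca" count as different values, and the postal codes come in three different shapes.</p>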
<p>One approach might be to simply resolve to spend an hour a week, every week, profiling some data.  If you aren't doing that now, you will find that even just a bit of time set aside will give huge insight- sometimes we get too busy to do the basics, and we miss opportunities to make significant improvements with relatively little effort in our data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/a-new-years-resolution-to-data-profile/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Datamartist V1.5 Released</title>
		<link>http://www.datamartist.com/datamartist-v1-5-released</link>
		<comments>http://www.datamartist.com/datamartist-v1-5-released#comments</comments>
		<pubDate>Wed, 31 Aug 2011 18:47:53 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6114</guid>
		<description><![CDATA[We are pleased to announce that Datamartist V1.5 is now available. This version of Datamartist brings with it some useful new functionality, including new functions that can be used in expressions, new capabilities in terms of exporting to databases, and a new data block. In this post, we'll look at two new features, the Pivot [...]]]></description>
			<content:encoded><![CDATA[<p>We are pleased to announce that Datamartist V1.5 is now available.</p>
<p>This version of Datamartist brings with it some useful new functionality, including new functions that can be used in expressions, new capabilities in terms of exporting to databases, and a new data block.</p>
<p>In this post, we'll look at two new features, the Pivot block, and the enhanced database export capabilities.</p>
<h2>Pivot Block</h2>
<p>Our beta testers loved this new block.  The pivot block lets you do the equivalent of a cross-tab query, rolling up  a measure, and distributing the value in a new set of columns, where the column names are provided by the input data set.</p>
<p>Here is a simple example showing how it works:</p>
<p>Say we start with a set of data that has multiple rows for each date, and different values in the color field, and a quantity measure:</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-input-data.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-input-data.png" alt="" title="pivot-block-input-data" width="356" height="296" class="aligncenter size-full wp-image-6121" /></a></p>
<p>Then we can connect one of the new pivot blocks to this data set like so:</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-connected-to-internal-dataset.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-connected-to-internal-dataset.png" alt="" title="pivot-block-connected-to-internal-dataset" width="666" height="338" class="aligncenter size-full wp-image-6122" /></a></p>
<p>The pivot block lets us select which columns to include (this defines the level of detail to roll up to), which string column to use to generate the new column names, and which measure to use as well as the rollup method (sum, average, min, max).</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-configuration.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-configuration.png" alt="" title="pivot-block-configuration" width="578" height="238" class="aligncenter size-full wp-image-6125" /></a></p>
<p>The result?  The output of the pivot block looks like this: now we have a summary by color for each date, with a column for each color value.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-resulting-dataset.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/pivot-block-resulting-dataset.png" alt="" title="pivot-block-resulting-dataset" width="439" height="210" class="aligncenter size-full wp-image-6126" /></a></p>
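<p>For readers who think in code, the same idea can be sketched in a few lines of Python.  This is only an illustration of the cross-tab logic, not how the pivot block is implemented; the rows, the <code>pivot_sum</code> function and its parameter names are hypothetical, only the sum rollup is shown, and filling missing combinations with 0 is an assumption of the sketch.</p>

```python
from collections import defaultdict

# Hypothetical input rows: several rows per date, a color field and a quantity.
rows = [
    {"date": "2011-08-01", "color": "red", "quantity": 5},
    {"date": "2011-08-01", "color": "blue", "quantity": 3},
    {"date": "2011-08-01", "color": "red", "quantity": 2},
    {"date": "2011-08-02", "color": "blue", "quantity": 7},
    {"date": "2011-08-02", "color": "green", "quantity": 1},
]


def pivot_sum(data, keep, names_from, measure):
    """Roll up `measure` by `keep`, with one output column per `names_from` value."""
    columns = sorted({row[names_from] for row in data})
    # Each new key starts with every output column present and set to 0.
    sums = defaultdict(lambda: dict.fromkeys(columns, 0))
    for row in data:
        sums[row[keep]][row[names_from]] += row[measure]
    return [{keep: key, **sums[key]} for key in sorted(sums)]


pivoted = pivot_sum(rows, keep="date", names_from="color", measure="quantity")
for line in pivoted:
    print(line)
```

<p>The result has one row per date and one column per color value found in the input, which is the shape shown in the screenshot above.</p>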
<h2>Database export enhancements</h2>
<p>Now, when exporting to a database, there are a number of new options.</p>
<p>One of the most interesting is the capability to execute SQL commands in the database before and/or after the data is exported into the table.</p>
<p>This provides the capability of running stored procedures, or launching follow-on database-side processing after Datamartist writes the data into the DB.</p>
<p>This is a powerful new capability, and makes it even easier to integrate Datamartist into various systems, and get your data quality and profiling data where you need it.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/08/sql-command-capability-example.png"><img src="http://www.datamartist.com/wp-content/uploads/2011/08/sql-command-capability-example.png" alt="" title="sql-command-capability-example" width="724" height="293" class="aligncenter size-full wp-image-6128" /></a></p>
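<p>The general pattern of running SQL before and after a load can be sketched in Python with the standard <code>sqlite3</code> module.  This is an illustration of the idea only, not Datamartist's implementation; the table names, the SQL statements and the <code>export_with_hooks</code> function are all hypothetical.</p>

```python
import sqlite3

# Hypothetical rows to export and hypothetical table names.
rows = [("2011-08-01", "red", 7), ("2011-08-01", "blue", 3), ("2011-08-02", "blue", 7)]

PRE_SQL = "DELETE FROM color_sales"  # clear out the previous load
POST_SQL = (
    "INSERT INTO load_log (table_name, row_count) "
    "SELECT 'color_sales', COUNT(*) FROM color_sales"
)


def export_with_hooks(conn, data, pre_sql, post_sql):
    """Run pre_sql, write the rows, then run post_sql, all in one transaction."""
    with conn:  # commits on success, rolls back if any step raises
        conn.execute(pre_sql)
        conn.executemany("INSERT INTO color_sales VALUES (?, ?, ?)", data)
        conn.execute(post_sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE color_sales (sale_date TEXT, color TEXT, quantity INTEGER)")
conn.execute("CREATE TABLE load_log (table_name TEXT, row_count INTEGER)")
conn.execute("INSERT INTO color_sales VALUES ('2011-07-31', 'stale', 0)")
conn.commit()

export_with_hooks(conn, rows, PRE_SQL, POST_SQL)
exported_count = conn.execute("SELECT COUNT(*) FROM color_sales").fetchone()[0]
logged = conn.execute("SELECT table_name, row_count FROM load_log").fetchall()
print(exported_count, logged)
```

<p>Here the "before" command clears the stale row and the "after" command records how many rows were loaded; in a real database the "after" step could just as well call a stored procedure.</p>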
<p>If you haven't checked out Datamartist yet, we're not sure what you are waiting for.  <a href="/downloads">Download the free trial</a> and give it a go.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datamartist-v1-5-released/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Datamartist data quality cartoons</title>
		<link>http://www.datamartist.com/datamartist-data-quality-cartoons</link>
		<comments>http://www.datamartist.com/datamartist-data-quality-cartoons#comments</comments>
		<pubDate>Tue, 21 Jun 2011 13:22:06 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Cartoons]]></category>
		<category><![CDATA[Just for fun]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4463</guid>
		<description><![CDATA[I've had lots of fun over the years building the little cartoons that have become a regular feature. Here are a few, reposted together just for fun. Data quality super powers. Fighting the anti-data forces of evil. Data silo fun. The joys of a moving target Data migration tools.]]></description>
			<content:encoded><![CDATA[<p>I've had lots of fun over the years building the little cartoons that have become a regular feature.  Here are a few, reposted together just for fun.</p>
<h2>Data quality super powers.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/data-quality-sense-tingling-april-birthdays.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/data-quality-sense-tingling-april-birthdays.jpg" alt="" title="data-quality-sense-tingling-april-birthdays" width="338" height="244" class="aligncenter size-full wp-image-6050" /></a></p>
<p></p>
<h2>Fighting the anti-data forces of evil.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/the-data-days-no-the-ceo-says-yes.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/the-data-days-no-the-ceo-says-yes.jpg" alt="" title="the-data-days-no-the-ceo-says-yes" width="446" height="331" class="aligncenter size-full wp-image-6049" /></a></p>
<h2>Data silo fun.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/data-silos-what-do-you-mean-data-silos.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/data-silos-what-do-you-mean-data-silos.jpg" alt="" title="data-silos-what-do-you-mean-data-silos" width="373" height="276" class="aligncenter size-full wp-image-6052" /></a></p>
<p></p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/datamigration-as-long-as-the-new-system-is-the-same.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/datamigration-as-long-as-the-new-system-is-the-same.jpg" alt="" title="datamigration-as-long-as-the-new-system-is-the-same" width="463" height="343" class="aligncenter size-full wp-image-6051" /></a></p>
<p></p>
<h2>The joys of a moving target</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/06/we-are-changing-all-the-product-codes-again-problem.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/we-are-changing-all-the-product-codes-again-problem.jpg" alt="" title="we-are-changing-all-the-product-codes-again-problem" width="373" height="212" class="aligncenter size-full wp-image-6057" /></a></p>
<h2>Data migration tools.</h2>
<p>
<a href="http://www.datamartist.com/wp-content/uploads/2011/06/data-migration-get-the-hammer.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/06/data-migration-get-the-hammer.jpg" alt="" title="data-migration-get-the-hammer" width="374" height="225" class="aligncenter size-full wp-image-6060" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datamartist-data-quality-cartoons/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Quality Rules</title>
		<link>http://www.datamartist.com/data-quality-rules</link>
		<comments>http://www.datamartist.com/data-quality-rules#comments</comments>
		<pubDate>Thu, 16 Jun 2011 17:00:07 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[data culture]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Data Quality rules]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=5995</guid>
		<description><![CDATA[What's the difference between good data and bad data? It is much like the difference between good children and bad children- the bad data doesn't follow the rules. But what are the rules? Unlike the rules for kids, which have been fixed in stone for decades (or at least, parents wish it were so), the [...]]]></description>
			<content:encoded><![CDATA[<p>What's the difference between good data and bad data?  It is much like the difference between good children and bad children- the bad data doesn't follow the rules.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2011/04/data-quality-rules-data-freedom-or-death.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/04/data-quality-rules-data-freedom-or-death-300x269.jpg" alt="" title="data-quality-rules-data-freedom-or-death" width="300" height="269" class="alignright size-medium wp-image-6011" /></a><br />
But what are the rules?  Unlike the rules for kids, which have been fixed in stone for decades  (or at least, parents wish it were so), the rules for data are slippery things that depend very much on the context and the database.</p>
<p>While it's a complex subject, some basic rules of thumb can avoid the deeper rabbit holes.</p>
<p>The first thing to understand about Data Quality rules is they aren't as easy as they may look.  Data is in theory something in the ordered world of computers, but in reality is in the "flexible" world of humans.  A huge amount of data is entered by members of the group "Homo sapiens" (or mutilated by software written by members of that group) and as a result is not as ordered as we would all like.</p>
<p>The challenge for data quality practitioners is to remove the chaos injected by those highly involved primates (us) and make the data the sterile, ordered, never any question about anything type that we all imagine in our fantasies.</p>
<p>But how?</p>
<p>In the end, it is amazing how powerful and complex the various solutions to this problem are.</p>
<p>But I suggest that there are some basic principles that can help guide us.</p>
<h2>First- do no harm.</h2>
<p>One of the risks of any data quality initiative is that it actually screws up the data more.  Don't define rules that are so complex, and so sure of themselves that they actually make the data worse.  Be humble. Don't change data unless you are pretty sure it's a good idea.  Err on the side of not screwing up the original.  And keep a copy of the original- so if things do go off the rails you can undo- or at least try to understand what went wrong.</p>
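<p>In code terms, "do no harm" might look like the Python sketch below: the rule only rewrites values it is sure about, leaves everything else alone for a human to review, and never touches the original list.  The country values and the <code>SAFE_FIXES</code> table are hypothetical.</p>

```python
# Hypothetical raw values from a country column.
original = ["Canada", "canada ", "CAN", "U.S.A.", "Untied States"]

# Hypothetical rule: only rewrite values we are sure about.
SAFE_FIXES = {"canada": "Canada", "can": "Canada", "u.s.a.": "United States"}


def apply_rule(values):
    """Return (cleaned copy, review list) without modifying the input list."""
    cleaned, needs_review = [], []
    for value in values:
        key = value.strip().lower()
        if key in SAFE_FIXES:
            cleaned.append(SAFE_FIXES[key])
        else:
            cleaned.append(value)  # leave it alone rather than guess
            needs_review.append(value)
    return cleaned, needs_review


cleaned, needs_review = apply_rule(original)
print(cleaned)
print(needs_review)
```

<p>"Untied States" is almost certainly a typo, but the rule doesn't know that, so it flags the value instead of guessing.  The original list is still there if the rule itself turns out to be wrong.</p>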
<h2>Go out and talk to the people</h2>
<p>Don't sit in your ivory tower and speculate as to what the data means.  Go out there and watch people enter it in.  See what real world type things are happening that never make it into bits and bytes.</p>
<h2>Attack the basics first</h2>
<p>Focus your first efforts on dealing with the basics. They will resolve the vast majority of the issues. Don't chase after the outliers until you have the "easy" cases taken care of, because the tough stuff is a case of diminishing returns. Look first at how to fix processes and train your people to make the majority of typical data entry cases more accurate before you start looking into artificial intelligence based hyper-multi-semantic-algorithmic-learning-matching-holistic-flux-capacitor data quality systems.</p>
<h2>Less is more- the fewer rules the better.</h2>
<p>So what's the rule about making rules?  Try to make fewer rules, and test them in a pragmatic way.  It is possible to have so many rules that the rules themselves have data quality issues- don't go there.</p>
<p>Sometimes the simplest things will bring the greatest benefit.</p>
<p>In the coming weeks, I'll be posting about how to design, implement and monitor Data quality rules using the <a href="/">Datamartist tool</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-quality-rules/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Profiler tool Datamartist V1.4 Released</title>
		<link>http://www.datamartist.com/data-profiler-tool-datamartist-v1-4-released</link>
		<comments>http://www.datamartist.com/data-profiler-tool-datamartist-v1-4-released#comments</comments>
		<pubDate>Mon, 16 May 2011 16:05:31 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=6017</guid>
		<description><![CDATA[We are pleased to announce the release of Datamartist V1.4 We've had lots of great feedback from our customers and are thrilled with how people are using Datamartist, not just for powerful and flexible data profiling, but for data migration, data quality work and ad-hoc datamart creation. As always we're committed to continually improve our [...]]]></description>
			<content:encoded><![CDATA[<p>We are pleased to announce the release of Datamartist V1.4</p>
<p>We've had lots of great feedback from our customers and are thrilled with how people are using Datamartist, not just for powerful and flexible data profiling, but for data migration, data quality work and ad-hoc datamart creation.  </p>
<p>As always, we're committed to continually improving our products. Here are just a few of the features added in this latest version:</p>
<h2>Block definition import and export</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/05/block-export-datamartist-data-profiling-tool2.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/05/block-export-datamartist-data-profiling-tool2-300x195.jpg" alt="" title="block-export-datamartist-data-profiling-tool" width="300" height="195" class="alignright size-medium wp-image-6023" /></a>Datamartist adds a new level of reusability and collaboration with the addition of block export/import. </p>
<p>This lets you export a block configuration (or a number of blocks, with all their connectors) to a file that can then be imported into any other canvas, either by yourself or by your colleagues who are also using Datamartist.  Just select the blocks you want and right click to export.  To import an existing block file, right click anywhere on any canvas.</p>
<p>We've found this particularly useful in saving filters, segmentations, and even full data profiling blocks to give us a library of useful blocks and block groups that we use again and again.</p>
<h2>Improved database connectivity and connection management</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/05/Database-connectivity-with-value-distribution-profiling-datamartist.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/05/Database-connectivity-with-value-distribution-profiling-datamartist-300x222.jpg" alt="" title="Database-connectivity-with-value-distribution-profiling-datamartist" width="300" height="222" class="alignright size-medium wp-image-6018" /></a>We've also added some features and improved how we connect to databases in Datamartist.</p>
<p>Datamartist can connect to SQL Server, Oracle, MySQL, MS Access, text files and Excel files, and has an ODBC driver that lets you connect to many other databases.  Now it's even easier to manage a large number of database connections for multiple database types- keep all those servers and connections at your fingertips when you are combining all that data!</p>
<h2>Data profiling in an affordable, graphical, visual environment</h2>
<p>Find out why people are loving the combination of an ETL and a Data profiling tool in one- using Datamartist not just for data profiling, but for data migration, data quality audits, and ad hoc datamart creation.  Try the <a href="http://www.datamartist.com/downloads">free trial</a> today.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-profiler-tool-datamartist-v1-4-released/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data quality sizzle</title>
		<link>http://www.datamartist.com/data-quality-sizzle</link>
		<comments>http://www.datamartist.com/data-quality-sizzle#comments</comments>
		<pubDate>Tue, 22 Mar 2011 18:08:56 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Project Management]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=5985</guid>
		<description><![CDATA[I'm an engineer. Being an engineer, I'm pretty product focused, pretty technology focused, and pretty "does it work or not" focused. Having technical things like tools work is useful, and good. But just because you build it, does not mean they will come. The challenge often in Data Quality is that often what has to [...]]]></description>
			<content:encoded><![CDATA[<p>I'm an engineer. Being an engineer, I'm pretty product focused, pretty technology focused, and pretty "does it work or not" focused.  </p>
<p>Having technical things like tools work is useful, and good.  But just because you build it, does not mean they will come.</p>
<p>The challenge in Data Quality is that often what has to change even more than the technology or tools is the behaviours and perspectives of the people in the organisation with data quality issues.  At the very least, the users have to use the tools.  Very few data quality solutions are of the "full autopilot" bad-data-goes-in-here-good-comes-out-here type.</p>
<p>As much as we engineers would like to solve everything with software, people are involved in Data Quality.  </p>
<p>While a fantastic bit of data profiling analysis or an elegant and powerful data transform would seem to be enough, the truth is sometimes how and when you present these things is key to getting the non-engineer people to buy in.  </p>
<p>Sometimes preparing people over time, and introducing things in a step by step way helps them understand, and makes the technology and the change required less daunting.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/03/red-bbq.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/03/red-bbq-300x199.jpg" alt="" title="red-bbq" width="300" height="199" class="alignright size-medium wp-image-5987" /></a>Because I'm looking out my window at a tentative (very tentative it's only March after all) spring day here in Toronto, I'm going to use a summer barbecue analogy.</p>
<p>The tools and technology are the steak.  The steak is key to the party.   In the end (at least for me in this analogy) the steak delivers most of the value in your summer BBQ party value proposition, but you'll have more guests and be more successful overall if you package the whole thing. </p>
<p>Sometimes, part of selling the steak is the sizzle, the preparation, the things around the steak.</p>
<p>It's the smell of the BBQ getting ready, it's the sound of the steak hitting the grill- it's the cold drink, the conversation, the games on the lawn for the kids.</p>
<p>In the end, even if you know that 90% of the deal was that steak, if you just put a steak on a plate and give it to each guest the moment they arrive, it's just not going to get the same response.</p>
<p>In my usual roundabout way, the point I'm trying to get to is that you can't solve technical problems, then drop them on people's desks and say "do it".  You need to invite them to the party.  Prepare them for the menu, ask preferences, give them some time to hear the sizzle, smell the charcoal, enjoy the sunshine in expectation of that steak.</p>
<p>Steak is good.  Remember to plan some sizzle too.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-quality-sizzle/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Which myths are holding you back?</title>
		<link>http://www.datamartist.com/which-myths-are-holding-you-back</link>
		<comments>http://www.datamartist.com/which-myths-are-holding-you-back#comments</comments>
		<pubDate>Thu, 10 Feb 2011 15:06:40 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[data culture]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[assumptions]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=5952</guid>
		<description><![CDATA[In your business you have "facts". Things that are considered to be true. Lots of folks have heard of them, or believe them, and propagate them. But are they true? You are making decisions every day based on these "facts". Obviously, we have to believe something. But today I'm asking you to be skeptical. Question [...]]]></description>
			<content:encoded><![CDATA[<p>In your business you have "facts". Things that are considered to be true. Lots of folks have heard of them, or believe them, and propagate them. But are they true?  You are making decisions every day based on these "facts".</p>
<p>Obviously, we have to believe something.  But today I'm asking you to be skeptical.  Question your facts.</p>
<p>Let me give you an example. I'm a Canadian, and looking out my window right now, I can see a pretty healthy snowfall accumulating. Lots of the white stuff.  Brings to mind the fact that some cultures in the far north have over 100 words for snow.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2011/02/snowman-black-hat-and-scarf.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/02/snowman-black-hat-and-scarf.jpg" alt="" title="snowman-black-hat-and-scarf" width="328" height="366" class="alignright size-full wp-image-5970" /></a><br />
Hang on. Is that a fact?  Tell me- have you heard a variation of that?</p>
<p>I'm sure I read that somewhere.  I've heard others mention it.  It makes sense- I mean, people living in the far north would see lots of snow, and would know all about it, and so their language would evolve to encompass lots of different qualities of snow. </p>
<p>Sounds good.</p>
<p>Only, is it?  It's an idea that "just makes sense".  People seem to just accept it as soon as you say it.  People are likely to pass the idea along to others- because it makes a compelling story.</p>
<p>But in fact, it's wrong.  I'll let you google to your heart's content if you'd like more evidence than my say-so, but after reading a number of articles on the subject (here is an <a href="http://www.princeton.edu/~browning/snow.html">example</a>, and of course <a href="http://en.wikipedia.org/wiki/Eskimo_words_for_snow">the Wikipedia entry</a>), it seems clear that there are not 100 words for snow in any language.  In fact, English has about the same number of ways of talking about snow as languages from societies in the far north. </p>
<p>So the point of all this is-  what myths do you have in your organisation?  Things that "everyone" knows are true. Things that when they are explained to you make "perfect sense".  Things that you teach to every new hire so that they "know how things are".</p>
<p>The insidious thing about "facts" is that once they gain purchase, any contrary evidence tends to be called an "exception", or discounted.</p>
<p>Use data to find out what is true. Fight to improve the quality of your data to find more and more new truths. Question the status quo if the data contradicts it.  Don't assume that something is wrong with the data when "things don't make sense."  Maybe they don't make sense because your assumptions are just plain WRONG.</p>
<p>Be very aware that you might be making decisions based on myths that while sounding so plausible, so clear, so common sense, are pure fantasy.</p>
<p>The good news is that all your competitors might be doing the same thing.  If you look at your data, and see through it, you might show them all how wrong they are.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/which-myths-are-holding-you-back/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Preparing Data for QlikView</title>
		<link>http://www.datamartist.com/preparing-data-for-qlikview</link>
		<comments>http://www.datamartist.com/preparing-data-for-qlikview#comments</comments>
		<pubDate>Thu, 18 Nov 2010 14:44:34 +0000</pubDate>
		<dc:creator>Cam Quinn</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Qlik View]]></category>
		<category><![CDATA[Qlikview]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=5870</guid>
		<description><![CDATA[In this blog post, I am going to play with some economic data- specifically, Canadian Import and Export data using Datamartist and then use QlikView Business Intelligence Software to analyze the results. The trick with public data like this is that often (ok almost ALWAYS) either data is missing, or the codes don't match up. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2010/11/QlikView-Introduction-Screen-Shot.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/QlikView-Introduction-Screen-Shot-300x150.jpg" alt="" width="300" height="150" class="alignleft size-medium wp-image-5875" /></a>In this blog post, I am going to play with some economic data- specifically,  Canadian Import and Export data using Datamartist and then use QlikView Business Intelligence Software to analyze the results.</p>
<p>The trick with public data like this is that often (ok almost ALWAYS) either data is missing, or the codes don't match up.  In this case, the country descriptions from various data sets I want to use don't match- and different data sets have different holes (i.e. not all datasets include data for all countries).  Finally, some data sets have a different definition of a country- for example, they break out places like "British Indian Ocean Territories" that need to get rolled up in the UK numbers.</p>
<p>Country statistics such as GDP, GNI and population were also incorporated to provide dimensions for the analysis. The raw trade data was obtained from Industry Canada's "Trade Data Online" (<a href="http://www.ic.gc.ca/sc_mrkti/tdst/tdo/tdo.php?lang=30&amp;productType=HS6" target="_blank">http://www.ic.gc.ca/sc_mrkti/tdst/tdo/tdo.php?lang=30&amp;productType=HS6</a>). The World Bank was the source of the country statistics data (<a href="http://data.worldbank.org/indicator" target="_blank">http://data.worldbank.org/indicator</a>). A zip file containing the raw data, as well as the Datamartist data transformation .dmc file, is provided at the bottom of this post.</p>
<p>I started by transforming the country statistics data. The raw data included information on GDP, GNI, Total Population and Urban Population. A screenshot of the Datamartist canvas for the first portion of this data transformation is provided below.<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Country-Statistics-Canvas-Screenshot.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Country-Statistics-Canvas-Screenshot-300x133.jpg" alt="" width="300" height="133" class="aligncenter size-medium wp-image-5878" /></a>As seen above, the first step in the data transformation involved importing the four Excel data files. During this import, columns with data not relevant to the year 2009 were filtered out and zeros were inserted into any null data rows, signifying that data for that row was not available. <a href="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-GDP-GNI-Join-Screenshot.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-GDP-GNI-Join-Screenshot-300x128.jpg" alt="" width="300" height="128" class="alignright size-medium wp-image-5886" /></a>A series of data "Join" functions were then carried out to create one data file containing all of the country statistics information. Once these data files were joined, a "Calculation" block was used to replace any null data values resulting from the join with zeros. Finally, the country statistics information was joined with a country cross reference list. Basically, this join standardizes all of the country names.</p>
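<p>For readers who think in code, the join, null replacement and name standardization steps above can be sketched roughly in Python. This is only an illustration of the idea, not how Datamartist works internally: the function names, the sample figures and the cross-reference entry are all made up for the example.</p>

```python
# Illustrative only: the tables and numbers below are made-up placeholders.

def join_indicators(tables):
    """Outer-join several {country: value} tables, using 0 where a value is missing."""
    countries = sorted(set().union(*(t.keys() for t in tables.values())))
    return {
        country: {name: (table.get(country) or 0) for name, table in tables.items()}
        for country in countries
    }


def standardize_names(rows, cross_reference):
    """Rename each country via the cross-reference list; unknown names pass through."""
    return [
        {"country": cross_reference.get(country, country), **values}
        for country, values in rows.items()
    ]


tables = {
    "gdp": {"Korea, Rep.": 900.0, "Canada": 1300.0},
    "population": {"Korea, Rep.": 49.0, "Canada": None, "Tuvalu": 0.01},
}
cross_reference = {"Korea, Rep.": "South Korea"}  # hypothetical mapping entry

joined = join_indicators(tables)
standardized = standardize_names(joined, cross_reference)
for row in standardized:
    print(row)
```

<p>Note that a zero here means "not available" rather than a true zero, so any later average or ratio needs to treat those rows with care.</p>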
<p>The second part of the country statistics data transformation focused on segmenting the data, as shown in the Datamartist canvas screenshot below.<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Country-Statistics-Segmentation-Screenshot.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Country-Statistics-Segmentation-Screenshot-300x82.jpg" alt="" width="300" height="82" class="aligncenter size-medium wp-image-5891" /></a>Before the data could be segmented, it was summarized so that there was only one data row for each standardized country name. A "Calculation" block was then added to calculate the GDP per Capita, GNI per Capita and Urban Population Percentage using the Population data. With these calculations complete, a series of "Segment" blocks were added to the canvas. The "Segment" blocks are extremely useful because they add an additional column to the data set which is <a href="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Population-Segment-Block-Screenshot.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Population-Segment-Block-Screenshot-300x141.jpg" alt="" width="300" height="141" class="alignleft size-medium wp-image-5895" /></a>populated according to a set of segmentation rules defined by the user. In this example, the "Segment" block was used to segment the GDP per Capita, GNI per Capita, Urban Population Percentage and Population data. The segmentation rules for the Population "Segment" block are shown in the screenshot on the left.</p>
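<p>As a rough Python sketch of the same summarize, calculate and segment sequence (again, only an illustration): the segment boundaries and labels below are assumptions made up for the example, since the actual rules are the ones shown in the screenshot, and the sample rows are invented.</p>

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical segment boundaries and labels, not the ones used in the post.
POPULATION_BOUNDS = [1_000_000, 10_000_000, 100_000_000]
POPULATION_LABELS = ["Under 1 Million", "1 - 10 Million", "10 - 100 Million", "Over 100 Million"]


def summarize(rows):
    """Collapse rows that share a standardized country name by summing their values."""
    totals = defaultdict(lambda: {"gdp": 0.0, "population": 0.0})
    for row in rows:
        totals[row["country"]]["gdp"] += row["gdp"]
        totals[row["country"]]["population"] += row["population"]
    return dict(totals)


def segment(value, bounds, labels):
    """Return the label of the range that value falls into."""
    return labels[bisect_right(bounds, value)]


def add_calculations(totals):
    """Add GDP per capita and a population segment column to each country."""
    result = {}
    for country, t in totals.items():
        per_capita = t["gdp"] / t["population"] if t["population"] else 0.0
        result[country] = {
            **t,
            "gdp_per_capita": per_capita,
            "population_segment": segment(t["population"], POPULATION_BOUNDS, POPULATION_LABELS),
        }
    return result


# Made-up sample rows; two of them share one standardized country name.
rows = [
    {"country": "United Kingdom", "gdp": 2000.0, "population": 60_000_000},
    {"country": "United Kingdom", "gdp": 1.0, "population": 3_000},
    {"country": "Iceland", "gdp": 12.0, "population": 300_000},
]
profiled = add_calculations(summarize(rows))
print(profiled["United Kingdom"]["population_segment"])
```

<p>Summarizing before calculating matters: the per capita figures are only meaningful once all rows for a country have been rolled up into one.</p>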
<p>A similar set of data transformations was also carried out on the Canadian Import and Export Trade data. <a href="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Trade-Canvas-Screenshot.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Trade-Canvas-Screenshot-300x92.jpg" alt="" width="300" height="92" class="aligncenter size-medium wp-image-5900" /></a>As seen in the Datamartist canvas screenshot, the Canadian Import and Export Trade data was imported and joined, and null data values were replaced with zeros. The country names were then standardized and summarized so that the Canadian Import and Export Trade data could be joined with the Country Statistics data.</p>
<p>With all of the raw data transformed into a suitable format, a final set of data transformations was carried out to create a single text file. This text file was then exported so that QlikView could be used to analyze the data. A screenshot of the Datamartist canvas for this final set of data transformations is shown below.<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Star-Schema-Canvas-Screenshot.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Star-Schema-Canvas-Screenshot-300x135.jpg" alt="" width="300" height="135" class="aligncenter size-medium wp-image-5907" /></a> In this final set of data transformations, the "Star Schema" block was used first. The "Star Schema" block is a handy data transformation tool because it allows numerous data join operations to be carried out simultaneously.<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Star-Schema-Block-Screenshot1.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Datamartist-Star-Schema-Block-Screenshot1-300x121.jpg" alt="" width="300" height="121" class="alignright size-medium wp-image-5912" /></a> It was used to combine the Country Statistics data and Canadian Import and Export Trade data with data defining a country's geographical region. A screenshot of the "Star Schema" block configuration window is shown to the right. The joined data was then put through a "Calculation" block one last time to eliminate any null data values. Finally, the transformed data was exported as a text file so that it could be analyzed in QlikView.</p>
<p>The transformed data was then imported<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-10.40.08-AM.png"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-10.40.08-AM-300x187.png" alt="" width="300" height="187" class="alignright size-medium wp-image-5917" /></a> into QlikView and a dashboard was created to analyze it. QlikView is a great data analysis tool because it allows data to be filtered and visualized very efficiently. In this example, I made a dashboard that allows Canada Import and Export Trade data to be visualized using the Country Statistics data segments created in Datamartist as filters. A screenshot of the dashboard with no filters applied is shown to the right. I am now going to show a series of screenshots with different data filters applied. To start off, I want to see the countries that Canada exports the most goods to. To do this, I just dragged a box over the largest bars on the export graph as seen below.<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-11.05.05-AM.png"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-11.05.05-AM-300x187.png" alt="" width="300" height="187" class="aligncenter size-medium wp-image-5919" /></a> Once I finished making the data selection box on the graph, I released the mouse and QlikView automatically zoomed in on the area I selected. In addition, the upper right table in the dashboard updated as well. The image below shows the results. 
As seen in the image, Canada's biggest export trade partners in 2009 were the United States, the United Kingdom and China.<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-11.14.16-AM.png"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-11.14.16-AM-300x187.png" alt="" width="300" height="187" class="aligncenter size-medium wp-image-5922" /></a> As a second example, I will filter the data using the data segments created in Datamartist. If I click on "Asia" in the "Region" box then only data from the countries in Asia is shown in the table and graphs. Furthermore, the segments in the other filter boxes (GDP per Capita, GNI per Capita, etc.) update to reflect the region selection as well. QlikView does this by highlighting the data filter segments that are valid for the "Asia" region in white. For example, in the "GDP per Capita" box, all data segments are valid except for the "GDP &gt; $100 Thousand" segment. A screenshot of QlikView with the "Asia" region filter on is shown below.<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-12.15.11-PM.png"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-12.15.11-PM-300x187.png" alt="" width="300" height="187" class="aligncenter size-medium wp-image-5929" /></a> I can further filter the data by clicking on any other data filter segments that are white. As an example, if I select "$1 Thousand - $5 Thousand" in the "GDP per Capita" box and "40% - 60%" in the "Urban Population Percentage" box, the graphs and table update again. In this instance, the only countries that meet these filter requirements are China, Georgia and Mongolia. 
A QlikView screenshot with all three of the filters chosen is shown below.<a href="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-12.21.03-PM.png"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/Screen-shot-2010-11-11-at-12.21.03-PM-300x187.png" alt="" width="300" height="187" class="aligncenter size-medium wp-image-5930" /></a></p>
<h2>Try it out yourself with the free trial</h2>
<p>You can give Datamartist a try with this data: just <a href="/downloads">sign up and download</a> the free trial, and then download <a href="http://www.nmodal.com/downloads/CanadaWorldTradingExample.zip">a zip file with all the data, and the example Datamartist file</a>.</p>
<p>Just extract all the files in the above ZIP file into the "My Datamartist" folder that the Datamartist trial will create when you run it, and open the "World Trading Example.DMC" file with Datamartist.</p>
<p>You'll find that Datamartist gives you a powerful, visual way to transform data from lots of places, and get it ready for great visualization tools like QlikView in a step by step, clean, repeatable way.</p>
<p>On top of that, Datamartist can be automated, so if you have data transformations you need to run on a schedule, you can design them in a graphical environment, test them, and then have them run automatically.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/preparing-data-for-qlikview/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data profiling- a search or a code to crack?</title>
		<link>http://www.datamartist.com/data-profiling-a-search-or-a-code-to-crac</link>
		<comments>http://www.datamartist.com/data-profiling-a-search-or-a-code-to-crac#comments</comments>
		<pubDate>Wed, 03 Nov 2010 17:50:08 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data profiling]]></category>
		<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=5848</guid>
		<description><![CDATA[Often, tracking down data quality issues is presented as a search for bad data- but sometimes the data isn't so much bad, as not understood. In legacy systems, you might be more trying to first find the meaning of data- in effect, decoding it as if it had been encrypted (which in a way, time [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2010/11/300px-Enigma-rotor-stack.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/300px-Enigma-rotor-stack.jpg" alt="" title="Photo by Bob Lord" width="300" height="225" class="alignright size-full wp-image-5850" /></a>Often, tracking down data quality issues is presented as a search for bad data- but sometimes the data isn't so much bad, as not understood.  In legacy systems, you might be more trying to first find the meaning of data- in effect, decoding it as if it had been encrypted (which in a way, time and lack of documentation might very well have done).</p>
<p>You know that all that data means something- but what?</p>
<p>One of my favorite code-busting stories is the epic victory over the Enigma code during the Second World War.  One of the reasons it's of interest is that it was one of the early applications of computing, but I think the key lesson comes not from the brute force computation, but from the strategies used to crack the code.</p>
<p>When you are trying to crack a code, one of the key things you need is "Cribs": samples for which you have both the coded message and the clear text.  These cribs can radically reduce the number of possible ways a code can be decoded.</p>
<p>In the case of Enigma, the Allies would listen for German U-boat radio transmissions, while also using direction finding equipment to estimate their location.  Standard procedure was for a U-boat to first radio a weather report.</p>
<p>By painstakingly backtracking known weather conditions and locations of U-boats when they transmitted, it was possible to take advantage of that first weather report, since there were only so many ways to say "Sunny and calm".  Having this crib gave them a way to break into the code.</p>
<p>What is the point in terms of data profiling?  While it's critical to have the right tools to analyse the data (a data profiler like <a href="/">Datamartist</a>, for example), it's also important to get out there and talk to people, understand what's going on, and collect some cribs that will help it all make sense.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-profiling-a-search-or-a-code-to-crac/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transforming Data for Tableau (part 2)</title>
		<link>http://www.datamartist.com/transforming-data-for-tableau-part-2</link>
		<comments>http://www.datamartist.com/transforming-data-for-tableau-part-2#comments</comments>
		<pubDate>Tue, 02 Nov 2010 15:48:52 +0000</pubDate>
		<dc:creator>Cam Quinn</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Public Data]]></category>
		<category><![CDATA[Tableau]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=5750</guid>
		<description><![CDATA[In this second part of the blog post, I am going to discuss how I added data about electricity generation in the U.S.A. to the output data file discussed in part one of this blog post. As before, I will transform the data using the Datamartist software so that Tableau's powerful visualization software can be [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2010/10/tableau-electricity-supply-top-corner-screen-shot.jpeg"><img src="http://www.datamartist.com/wp-content/uploads/2010/10/tableau-electricity-supply-top-corner-screen-shot-300x171.jpg" alt="" width="300" height="171" class="alignright size-medium wp-image-5755" /></a>In this second part of the blog post, I am going to discuss how I added data about electricity generation in the U.S.A. to the output data file discussed in part one of this blog post. As before, I will transform the data using the Datamartist software so that Tableau's powerful visualization software can be used to share information about U.S. electricity generation.</p>
<p>Once again, the U.S. Energy Information Administration was the source of the data (<a href="http://www.eia.doe.gov/cneaf/electricity/esr/esr_sum.html" target="_blank">http://www.eia.doe.gov/cneaf/electricity/esr/esr_sum.html</a>, Table 10). This time, the data was provided as one Microsoft Excel spreadsheet giving detailed information about the thousands of electric power generation companies in the U.S.A. The raw Excel file from the website had some merged cells, and had column names in multiple rows, so we had to clean it up a bit before importing.  This is often the case when using files that are more reports than raw data.</p>
<p>Even after these formatting fixes, the data in its original form was not what we needed. Due to the large size and slightly different formatting of this spreadsheet, it was necessary to transform the data before combining it with the electricity consumer data. A visual map of these data transformations is provided in the screen shot of the Datamartist canvas below.<br />
<a href="/resources/images/electricity-supply-canvas-shot.jpg" target="_blank"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/electricity-supply-600w.jpg" alt="" title="electricity-supply-600w" width="600" height="208" class="aligncenter size-full wp-image-5856" /></a><br />
As seen in the screen shot above, the first data transformation carried out was a data summarizing operation. In this operation, the input data was summarized by state and class of ownership, converting the 3200 row input data file into a data file containing only 170 rows of data. </p>
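<p>In code terms, this summarizing step is a group-by with a sum. A minimal Python sketch follows; the column names (such as <em>generation_mwh</em>) and the sample rows are invented for illustration and are not the spreadsheet's actual headers or values.</p>

```python
from collections import defaultdict


def summarize_generation(rows):
    """Sum generation per (state, ownership class) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["state"], row["ownership"])] += row["generation_mwh"]
    return dict(totals)


# Made-up sample rows standing in for the much larger input file.
rows = [
    {"utility": "Utility A", "state": "TX", "ownership": "Cooperative", "generation_mwh": 120.0},
    {"utility": "Utility B", "state": "TX", "ownership": "Cooperative", "generation_mwh": 80.0},
    {"utility": "Utility C", "state": "TX", "ownership": "Municipal", "generation_mwh": 50.0},
    {"utility": "Utility D", "state": "OH", "ownership": "Municipal", "generation_mwh": 30.0},
]
summary = summarize_generation(rows)
for (state, ownership), total in sorted(summary.items()):
    print(state, ownership, total)
```

<p>Each distinct combination of state and ownership class becomes one output row, which is why thousands of company rows collapse to a much smaller file.</p>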
<p>The next data transformation step required the use of the "Join" function. <a href="http://www.datamartist.com/wp-content/uploads/2010/10/datamartist-join-window-screen-shot-left-side.jpeg" target="_blank"><img src="http://www.datamartist.com/wp-content/uploads/2010/10/datamartist-join-window-screen-shot-left-side-300x163.jpg" alt="" width="300" height="163" class="alignleft size-medium wp-image-5791" /></a> This is a very useful function in the Datamartist software because it allows two data files with different formatting to be joined together into one data file. In this instance, the summarized electricity generation data file was joined with a data file containing U.S.A. state abbreviations. This was done because the U.S.A. electricity generation input data file only contained state abbreviations. This join operation inserted a column containing the state's full name based on the state abbreviation used in the input data file. </p>
<p>With the join operation complete, the data was further transformed using the "Calculate" and "Sort" functions. <a href="http://www.datamartist.com/wp-content/uploads/2010/11/datamartist-calculate-window-screen-shot-2.jpeg" target="_blank"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/datamartist-calculate-window-screen-shot-2-300x150.jpg" alt="" width="300" height="150" class="alignright size-medium wp-image-5821" /></a>The "Calculate" function was used to remove the data column containing the state abbreviations, as the previous "Join" function had added a column with the state's full name. The "Sort" function was then used to sort the electricity data by state and class of ownership. </p>
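<p>The join, column removal and sort described above amount to a lookup followed by a sort. Here is a small Python sketch under stated assumptions: the lookup table is deliberately partial, and the column names and sample rows are made up for the example rather than taken from the real files.</p>

```python
# A partial, illustrative lookup table; a real one would list every state.
STATE_NAMES = {"OH": "Ohio", "TX": "Texas"}


def add_state_names(rows, state_names):
    """Join on the abbreviation, keep the full name, and drop the abbreviation column."""
    joined = []
    for row in rows:
        remaining = {key: value for key, value in row.items() if key != "state"}
        # Unmatched abbreviations are kept as-is so that no rows are silently lost.
        joined.append({"state_name": state_names.get(row["state"], row["state"]), **remaining})
    return sorted(joined, key=lambda r: (r["state_name"], r["ownership"]))


# Made-up summarized rows.
rows = [
    {"state": "TX", "ownership": "Municipal", "generation_mwh": 50.0},
    {"state": "OH", "ownership": "Municipal", "generation_mwh": 30.0},
    {"state": "TX", "ownership": "Cooperative", "generation_mwh": 200.0},
]
result = add_state_names(rows, STATE_NAMES)
for row in result:
    print(row)
```

<p>Passing unmatched abbreviations through unchanged is one possible design choice; it makes gaps in the lookup table easy to spot in the output.</p>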
<p>The data was then put through one last calculation function that renamed the columns, before being exported as a text file.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2010/11/datamartist-export-window-screen-shot-2.jpeg" target="_blank"><img src="http://www.datamartist.com/wp-content/uploads/2010/11/datamartist-export-window-screen-shot-2-300x185.jpg" alt="" width="300" height="185" class="aligncenter size-medium wp-image-5829" /></a>Now that the electricity generation data has been transformed, the Tableau visualizations can be used to present the data. It is worthwhile noting that since I joined the electricity consumption data with the electricity generation data before exporting it from Datamartist, I only needed to import one data file into Tableau Public 5.2. This single data file contained all of the information required to create both the electricity consumption and generation visualizations. As with part one of this blog post, I have included a Tableau dashboard summarizing state level electricity generation statistics below. This visualization is very similar to the visualization in the first part of this blog post. You can try it for yourself by clicking a state on the map and watching the table below the map update to present data about that particular state. </p>
<p><iframe src="/Energy_supply_tableau.html" width="600" height="650" frameborder="#" style="border:0; padding-bottom: 35px;"><br />
</iframe> </p>
<p><strong>See the Data Transform Yourself!!!</strong><br />
All of the data files used in the data transformations in part one and part two of this blog post, as well as the Datamartist .dmc file, can be <a href="http://www.nmodal.com/downloads/DatamartistTableauElectricityExample.zip" target="_blank">downloaded in a ZIP file here</a>. You can see the data transformations discussed in these blog posts for yourself by <a href="/downloads">downloading the free trial</a>, installing Datamartist, and then putting all the files from the above ZIP file into the "My Datamartist" folder that Datamartist creates in your "Documents" folder. Then just open the "Electricity Example.dmc" file in Datamartist and check it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/transforming-data-for-tableau-part-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

