<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com</title>
	<atom:link href="http://www.datamartist.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Sat, 28 Aug 2010 03:14:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Cloudy thinking in the cloud</title>
		<link>http://www.datamartist.com/cloudy-thinking-in-the-cloud</link>
		<comments>http://www.datamartist.com/cloudy-thinking-in-the-cloud#comments</comments>
		<pubDate>Mon, 23 Aug 2010 15:42:57 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Humour]]></category>
		<category><![CDATA[Just for fun]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4809</guid>
		<description><![CDATA[Maybe it's just me, but the hype about "the cloud" seems to just keep growing. I think that not since the concept of vaporware was created has the moisture content been so high in information technology circles; the relative humidity is making my brain all foggy. It seems like everything just somehow gets fixed by [...]]]></description>
			<content:encoded><![CDATA[<p>Maybe it's just me, but the hype about "the cloud" seems to just keep growing.   I think that not since the concept of vaporware was created has the moisture content been so high in information technology circles; the relative humidity is making my brain all foggy.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/07/cloud-computing-need-bigger-drains.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/07/cloud-computing-need-bigger-drains.jpg" alt="" title="cloud-computing-need-bigger-drains" width="378" height="251" class="alignright size-full wp-image-4811" /></a>It seems like everything just somehow gets fixed by putting it in the cloud.  Security?  No problem, our cloud does that.  Data volume?  The cloud is just HUGE, man, HUGE.  Dirty data?  Hey, clouds are made of WATER, right? Trust us.</p>
<p>I googled the word "Cloud" and discovered that Google ranks cloud computing above the boring water-vapour-hanging-in-the-air type of clouds.  Although if you google "Types of Clouds" it does actually come up with a Wikipedia article talking about the different types of real clouds. </p>
<p>Then it struck me that there are probably different kinds of virtual clouds too, and we need to start naming those- so here goes a first run at it.</p>
<p><strong>cirrus duplicatus</strong></p>
<p>This cloud structure is particularly well suited to holding duplicate records.  It consists of billions and billions of special data stores that implicitly avoid any sort of primary keys.</p>
<p><strong>cirrocumulus delayious unconnectous</strong></p>
<p>This cloud structure has a highly looped internal structure- data goes in, but will not attempt to come back out until it detects that your internet connection has gone down.</p>
<p><strong>cirrus noworkus socialus</strong></p>
<p>This cloud is used by all the big social networks-  it enables lots of photo sharing, exchanging of comments, instant messaging, video viewing and has powerful anti-productivity filters in place at all times.</p>
<p><strong>cirrus backupus disappointious</strong></p>
<p>The backupus cloud promises complete data security-  "In the cloud, everything is backed up ALL the time" - yet through a series of human errors (the cloud makes no mistakes- only its keepers) all data is uniquely stored on hard drives with failure modes that ensure no data will be recovered.</p>
<p><strong>stratus broadcast allus privateous</strong></p>
<p>This cloud has extremely powerful semantic filters that detect the most embarrassing and sensitive data, and then uses special high page rank domains to ensure your secrets are indexed immediately by all major search engines.</p>
<p><strong>cirrus tellus momis</strong></p>
<p>Just like "stratus broadcast allus privateous", only instead of using search engines it just emails the most compromising information directly to your mother.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/cloudy-thinking-in-the-cloud/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Datamartist V1.3.0 Value Distribution data profiling</title>
		<link>http://www.datamartist.com/datamartist-v1-3-0-value-distribution-data-profiling</link>
		<comments>http://www.datamartist.com/datamartist-v1-3-0-value-distribution-data-profiling#comments</comments>
		<pubDate>Mon, 26 Jul 2010 18:33:50 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data profiling]]></category>
		<category><![CDATA[Datamartist Tool]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4855</guid>
		<description><![CDATA[This video gives a quick (under two minutes) look at the Datamartist data profiler's ability to explore the distribution of numeric values in a data set by counting the number of values that fall into a series of equal size buckets. It highlights Datamartist's calculation, visualization, selection and drill down features using a simple [...]]]></description>
			<content:encoded><![CDATA[<p>This video gives a quick (under two minutes) look at the Datamartist data profiler's ability to explore the distribution of numeric values in a data set by counting the number of values that fall into a series of equal size buckets.  It highlights Datamartist's calculation, visualization, selection and drill down features using a simple example.</p>
<p><center>
<div id="media">
            <object id="csSWF" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="498" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,115,0"><param name="src" value="/resources/video/V1_3_0/Value-Dist-Quick-Look-1/Value-Distribution-Quick-Look_controller.swf"/><param name="bgcolor" value="#1a1a1a"/><param name="quality" value="best"/><param name="allowScriptAccess" value="always"/><param name="allowFullScreen" value="false"/><param name="scale" value="showall"/><param name="flashVars" value="autostart=false&#038;thumb=/resources/video/V1_3_0/Value-Dist-Quick-Look-1/FirstFrame.png&#038;thumbscale=65"/><embed name="csSWF" src="/resources/video/V1_3_0/Value-Dist-Quick-Look-1/Value-Distribution-Quick-Look_controller.swf" width="640" height="498" bgcolor="#1a1a1a" quality="best" allowScriptAccess="always" allowFullScreen="false" scale="showall" flashVars="autostart=false&#038;thumb=/resources/video/V1_3_0/Value-Dist-Quick-Look-1/FirstFrame.png&#038;thumbscale=65" pluginspage="http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash"></embed></object>
        </div>
<p></center></p>
<p>This value profiling tool is just one of the many capabilities of the Datamartist data profiling tool. <a href="/download/beta-download">Download the free trial of the BETA</a> to try all the functionality with your own data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datamartist-v1-3-0-value-distribution-data-profiling/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How the general ledger can become a data warehouse</title>
		<link>http://www.datamartist.com/how-the-general-ledger-can-become-a-data-warehouse</link>
		<comments>http://www.datamartist.com/how-the-general-ledger-can-become-a-data-warehouse#comments</comments>
		<pubDate>Tue, 20 Jul 2010 14:54:15 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Management reporting]]></category>
		<category><![CDATA[data culture]]></category>
		<category><![CDATA[General Ledger]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4787</guid>
		<description><![CDATA[Many companies today rely on the general ledger as a key part of their management reporting, well beyond the obvious financial information. This has often been shaped by how companies first adopted information technology. In some firms, their management reporting systems reflect the fact that as information technology began to be used extensively by business, often [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2010/07/general-ledger-is-a-data-warehouse.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/07/general-ledger-is-a-data-warehouse.jpg" alt="" title="general-ledger-is-a-data-warehouse" width="356" height="275" class="alignright size-full wp-image-4794" /></a>Many companies today rely on the general ledger as a key part of their management reporting, well beyond the obvious financial information.  This has often been shaped by how companies first adopted information technology.</p>
<p>In some firms, their management reporting systems reflect the fact that as information technology began to be used extensively by business, often the very first functional area to be automated was accounting, and the first database within an enterprise was often the general ledger.</p>
<p>In many companies, the general ledger became the clearing house for all information- not just financial, and in effect became a data warehouse before the concept of data warehousing had even evolved. </p>
<p>The problem is, in some organisations, the data warehouse didn't come. The general ledger kept its place as the central repository for not just financial, but also management reporting.  Finance argued successfully that the cost of all the business intelligence architecture was unnecessary- adding accounts and bolt on tables was cheaper.  ERP vendors supported this by creating ever more flexible ledger structures, allowing additional ledgers for parallel accounting and management reporting.</p>
<p>Huge amounts of non-financial information are still stored in many general ledgers. There are many reasons this is a bad idea.  Here are just three:</p>
<p><strong>1) It forces you to compromise on level of detail and drill down, and history</strong></p>
<p>No general ledger can hold the level of detail available in many source systems. As a result, any interface from the sales system, manufacturing system etc. feeding into the GL will have to create journal entries that summarize a great deal of information. </p>
<p>While the detail of course will still exist in the source system, if your management reporting is all from a general ledger based system, upper management will tend to use this single source- and as a result important granularity may be lost to the decision making process.</p>
<p>This summarization also makes it more difficult to have drill down into the details, giving up some of the greatest benefits of modern business intelligence systems.</p>
<p>Finally, general ledger based data storage does not usually allow for the tracking of reference data changes over time. As sales regions are modified, and territories shift, comparing one period to another becomes increasingly difficult. Data warehouses, designed from the beginning to store this type of slowly changing reference information, can provide much more insight and historical analysis.</p>
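<p>To make the idea of slowly changing reference data concrete, here is a minimal Python sketch, not taken from any particular data warehouse product. It assumes a made-up history of which sales region one territory belonged to, and looks up the region that was in effect on a given day, so that a 2009 sale can still be reported under its 2009 region. The territory, the region names and the dates are all hypothetical.</p>

```python
from bisect import bisect_right
from datetime import date

# Hypothetical region history for one territory, sorted by the date
# each assignment took effect. A warehouse would keep rows like these
# instead of overwriting the region when territories shift.
OHIO_REGION_HISTORY = [
    (date(2008, 1, 1), "East"),
    (date(2010, 1, 1), "Central"),
]


def region_on(day, history=OHIO_REGION_HISTORY):
    """Return the region in effect on the given day, or None before the first record."""
    starts = [start for start, _region in history]
    # bisect_right counts how many assignments started on or before `day`.
    position = bisect_right(starts, day)
    if position == 0:
        return None
    return history[position - 1][1]


print(region_on(date(2009, 6, 30)))  # East
print(region_on(date(2010, 7, 20)))  # Central
```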
<p>The bottom line is, the data model of the general ledger module is just not designed for analysis.</p>
<p><strong>2) It results in an overly complex chart of accounts and may even affect month end close</strong></p>
<p>As the source systems become more and more capable of collecting data, the tendency is to want to increase the amount of management reporting. If this is being done in the general ledger, it means that further charts of accounts must be added, and an increased number of journal entries need to be done. Depending on how the overall process is set up, it's even possible that the increased complexity might affect the speed at which month end closing can be completed, if for no other reason than that the same finance resources must tend to both the financial and the management reporting needs.</p>
<p><strong>3) It discourages cross functional definitions and collaboration on analysis</strong></p>
<p>By making one of the functional areas (finance) the center and owner of management reporting, a general ledger based reporting architecture can actually increase the severity of the information silos it is most likely trying to eliminate.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2010/07/dont-use-these-numbers-ourselves-for-the-finance-reports-only.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/07/dont-use-these-numbers-ourselves-for-the-finance-reports-only.jpg" alt="" title="dont-use-these-numbers-ourselves-for-the-finance-reports-only" width="364" height="208" class="alignright size-full wp-image-4799" /></a><br />
Because the general ledger reporting does not require all the detail available, each department only needs to provide the summarized information required by finance. While every department has to coordinate with finance, there is no requirement for departments to work with each other to coordinate data and definitions.  While at a high level data is integrated, any benefit from more tightly integrating information across silos that a data warehouse can bring is lost.</p>
<p>In a very real way, a successful general ledger based management reporting system is in fact an impediment to progress for an enterprise's business intelligence and data analysis evolution.</p>
<p>Because management reporting is available, the justification or need for a data warehouse is not felt as strongly. However, as needs continue to evolve, the effort expended on the constantly growing general ledger, and its impact on the financial processes and on the company's overall information management culture, will become increasingly damaging.</p>
<p>Ironically, companies that never established a general ledger based management reporting system could leapfrog their more financially focused competitors, as they embrace the modern data warehouse and the tools available for data analysis. </p>
<p>A true data warehouse is not an easy road, and is only one component of a broader data analysis strategy. </p>
<p>Readers of this blog know that we advocate an approach that balances "Big Business Intelligence" with nimble, user focused data exploration and transformation.</p>
<p>In the short term, using the general ledger for management reporting can seem easier, but in the long term, it probably makes the task of creating an enterprise wide architecture harder- while your general ledger has been growing in the center, individual departments have probably been pursuing uncoordinated, fragmented business intelligence architectures of their own. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/how-the-general-ledger-can-become-a-data-warehouse/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>V1.3.0 Public beta released</title>
		<link>http://www.datamartist.com/v1-3-0-public-beta-data-profiling-tools-enhance</link>
		<comments>http://www.datamartist.com/v1-3-0-public-beta-data-profiling-tools-enhance#comments</comments>
		<pubDate>Wed, 14 Jul 2010 02:45:17 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4676</guid>
		<description><![CDATA[Come and get it while it's still warm! The next release of Datamartist, a data profiling and data transformation tool (think ETL and data profiler rolled into one) is now available in BETA as a public trial download. The currently released version 1.2.6 is of course also still available, but for those who don't mind [...]]]></description>
			<content:encoded><![CDATA[<p>Come and get it while it's still warm! The next release of Datamartist, a data profiling and data transformation tool (think ETL and data profiler rolled into one) is now available in BETA as a public trial download.</p>
<p>The currently released version 1.2.6 is of course also still available, but for those who don't mind risking a bug or two (we need your help in killing the last few) the Beta gives you a sneak peek at the new version and a bunch of new features that have been introduced to the Professional edition.</p>
<p>You can get it <a href="/datamartist-v1-3-0-beta-trial">here</a>.  What's in it? Lots of data profiling goodies:</p>
<h2>Value distribution profiling</h2>
<p>One of the important additions to the data profiling capabilities in Datamartist Pro is the value distribution explorer.  This powerful functionality analyzes numeric fields and provides a value distribution graph (equal size buckets for row counts based on value) that lets you zoom in and out, and drill down into the rows to understand your data and spot any suspicious values at a glance.</p>
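<p>For readers who want to see the idea behind equal size buckets in code, here is a minimal Python sketch. It is not Datamartist's implementation, just an illustration under simple assumptions: the range between the smallest and largest value is split into a fixed number of equal width buckets, and the rows falling into each one are counted. The function name and the sample amounts are made up.</p>

```python
from collections import Counter


def bucket_counts(values, bucket_count):
    """Count how many values fall into each of bucket_count equal width buckets."""
    low, high = min(values), max(values)
    # Guard against a zero width range when every value is identical.
    width = (high - low) / bucket_count or 1.0
    counts = Counter()
    for value in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bucket.
        index = min(int((value - low) / width), bucket_count - 1)
        counts[index] += 1
    return [counts[i] for i in range(bucket_count)]


amounts = [10, 12, 15, 18, 20, 22, 95, 100]
print(bucket_counts(amounts, 3))  # [6, 0, 2]
```

<p>The empty middle bucket and the two rows sitting far to the right are exactly the kind of suspicious values a distribution graph makes obvious.</p>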
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/07/datamartist-value-distribution-graph.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/07/datamartist-value-distribution-graph.jpg" alt="" title="datamartist-value-distribution-graph" width="636" height="330" class="aligncenter size-full wp-image-4678" /></a></p>
<h2>Support for Regular expressions for pattern matching</h2>
<p>The new version adds a very powerful function to the function library in the professional edition:</p>
<blockquote><p>REGEX(text,regex expression)</p></blockquote>
<p>With this function, it is now possible to use regular expressions (regexes) to evaluate whether string values conform to desired data formats.  Regular expressions are widely used for data quality testing, and there are lots of them available out there.</p>
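<p>As an illustration of this kind of format test, here is a small sketch using Python's standard <code>re</code> module rather than the Datamartist REGEX function itself, whose exact matching rules are not described here. The phone number pattern and the sample values are assumptions chosen for the example, and the sketch requires the whole string to match.</p>

```python
import re

# Hypothetical pattern for a phone number written as 555-123-4567.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def conforms(text, pattern=PHONE_PATTERN):
    """Return True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


samples = ["555-123-4567", "5551234567", "555-123-4567 ext 9"]
for sample in samples:
    print(sample, conforms(sample))
```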
<h2>Custom data format rules</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2008/11/Data-format-rule-entry1.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2008/11/Data-format-rule-entry1.jpg" alt="" title="Data-format-rule-entry" width="631" height="271" class="alignright size-full wp-image-4660" /></a>In V1.3.0 professional edition, it is now possible to add a series of custom rules for character mapping, creating much more flexibility in the data profiling tool.  For example, it's possible to map numbers, letters, punctuation characters, or any combination of them.</p>
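<p>To show what character mapping means in practice, here is a rough Python sketch. It does not reproduce Datamartist's rule syntax; it assumes each rule is simply a test plus a replacement symbol, with digits mapped to "9" and letters to "A", and anything unmatched left alone. Values that share a format then collapse to the same pattern string.</p>

```python
def format_pattern(text, rules):
    """Map each character to the symbol of the first rule that accepts it."""
    symbols = []
    for char in text:
        for accepts, symbol in rules:
            if accepts(char):
                symbols.append(symbol)
                break
        else:
            # No rule matched, so keep the character itself
            # (spaces and punctuation in this example).
            symbols.append(char)
    return "".join(symbols)


# Hypothetical rule set: the symbols "9" and "A" are assumptions.
RULES = [
    (str.isdigit, "9"),
    (str.isalpha, "A"),
]

print(format_pattern("H3X 1A2", RULES))   # A9A 9A9
print(format_pattern("555-1234", RULES))  # 999-9999
```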
<h2>Data profiling block </h2>
<p>In the professional edition, the category of data quality blocks has been added, and the first block released is the data profiler block.</p>
<p>This lets data profiling results themselves be used within your data canvas, as well as written out to files or database tables, with time stamps- enabling data profiling automation and tracking.  It is now possible to define, calculate and track data profiling metrics automatically, sampling data at a frequency that lets you understand how your data quality is evolving.</p>
<p>We're excited about what can be done with this first data quality block- and we are sure our professional edition users will be tracking their data quality like never before.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2008/11/datamartist-data-profiler-block.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2008/11/datamartist-data-profiler-block.jpg" alt="" title="datamartist-data-profiler-block" width="610" height="300" class="aligncenter size-full wp-image-4662" /></a></p>
<p>I'll be blogging about all the new features in the coming days and weeks- I'm certain you'll find lots of things to like in the new version- give it a go.</p>
<p>As always with our betas, we love feedback, and we'll be giving some free PRO licenses away to our most active Beta testers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/v1-3-0-public-beta-data-profiling-tools-enhance/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When the right tool is not a standard tool.</title>
		<link>http://www.datamartist.com/when-the-right-tool-is-not-the-standard-tool</link>
		<comments>http://www.datamartist.com/when-the-right-tool-is-not-the-standard-tool#comments</comments>
		<pubDate>Mon, 12 Jul 2010 21:01:55 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Software in General]]></category>
		<category><![CDATA[Analyst tools]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4645</guid>
		<description><![CDATA[Phil Simon (@philsimon) tweeted a link to an article in the Harvard Business Review that talks about the dangers of being "overly tool standardized" within an organisation, which I thought was very interesting. Now, of course, standards are needed, and for a broad range of tools it's counterproductive (and horrifically expensive) to let everyone [...]]]></description>
			<content:encoded><![CDATA[<p>Phil Simon (@philsimon) tweeted a link <a href="http://blogs.hbr.org/sviokla/2010/04/do_your_knowledge_workers.html" target="_blank">to an article in the Harvard Business Review</a> that talks about the dangers of being "overly tool standardized" within an organisation, which I thought was very interesting.<a href="http://www.datamartist.com/wp-content/uploads/2010/07/taking-software-standards-very-seriously.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/07/taking-software-standards-very-seriously.jpg" alt="" title="taking-software-standards-very-seriously" width="344" height="230" class="alignright size-full wp-image-4651" /></a></p>
<p>Now, of course, standards are needed, and for a broad range of tools it's counterproductive (and horrifically expensive) to let everyone just use what they want. The cost of data centers, integration, etc. will be radically higher if a company does not bring order to the chaos, and the marginal advantages that a very specific or niche technology might have in one department are often obliterated by the increased support costs and integration issues globally.</p>
<p>But if a company looks at these savings that come from standardization, and extrapolates too far, they can fall off the other side of the benefits curve and find that they're hurting, not helping.</p>
<p>In the Harvard Business Review article, the researchers also throw around the term "bitsmith" to describe someone who has both subject knowledge, and the ability to wrangle software, and to even create GASP! custom software that does what the team needs to get done.  In many companies, current information technology dogma does not leave much room for people that have the time to be a "bitsmith".</p>
<p>In many companies "Custom software" is a four letter word.  Well, I personally have used it, commissioned it and written it and know that often it can provide fantastic value- I've also seen people spend thousands of hours building something that never quite worked when an off the shelf tool a hundred times as good could be bought for a few thousand dollars.  It's a matter of being realistic and learning how to see the difference between a problem that is specific to your industry/situation that could really benefit from some custom code, compared to a problem that is huge, mainstream, and solved hundreds of times over by existing software vendors.</p>
<p>What I think can happen is a pendulum swing, where a company goes from "no standards, a jungle of wasteful custom software" to "Thou shalt use/buy only software on the following list."</p>
<p>The problem is, making a list that contains everything is just not possible.  Things change.  Stuff happens.  It is possible that a single person might need a single piece of software that allows them to understand something, design something, communicate something that will make that software have a truly massive payback, justifying all sorts of pain and config on the part of technical resources and infrastructure.</p>
<p>The challenge, as always, is to have an open, working relationship between those entrusted with establishing and enforcing standards in terms of tools and those who are expected to use those tools to do business.  As with so many things in governance, it's about balance, clear goals, and processes that allow for brilliance, change and creativity while not letting that process become the loophole that undermines all those savings standardization brings.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/when-the-right-tool-is-not-the-standard-tool/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A simple ETL tool with data profiling tools built in</title>
		<link>http://www.datamartist.com/a-simple-etl-tool-with-data-profiling-tools-built-in</link>
		<comments>http://www.datamartist.com/a-simple-etl-tool-with-data-profiling-tools-built-in#comments</comments>
		<pubDate>Thu, 08 Jul 2010 04:36:36 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[ETL]]></category>
		<category><![CDATA[Datamartist Tool]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4554</guid>
		<description><![CDATA[Datamartist is a new idea in ETL and data profiling tools. It gives people who are serious about getting at their data a powerful, simple to use, right sized tool. Easy to install Easy to use ETL features and data profiling capability Avoid using the wrong tool for the job Enterprise ETL tools (Extract Transform [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2010/06/Sales-example-full-screen-shot-profiler-perspective-300w.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/06/Sales-example-full-screen-shot-profiler-perspective-300w.jpg" alt="" title="Sales-example-full-screen-shot-profiler-perspective-300w" width="300" height="228" class="alignright size-full wp-image-4557" /></a>Datamartist is a new idea in ETL and data profiling tools.  It gives people who are serious about getting at their data a powerful, simple to use, right sized tool.</p>
<ul>
<li>Easy to install</li>
<li>Easy to use</li>
<li>ETL features and data profiling capability</li>
</ul>
<h2>Avoid using the wrong tool for the job</h2>
<p>Enterprise ETL tools (Extract Transform and Load) are very powerful but often extremely difficult to use.  </p>
<ul>
<li>expensive, particularly if multiple environments are needed</li>
<li>require server infrastructure, configuration and setup</li>
<li>require expensive developers who have been trained in the specific programming language of each particular vendor's tool</li>
<li>designed for performance and data volume, not ease of use</li>
</ul>
<p>Obviously they have their time and place, but when you want fast, visual access to your data, you end up getting slowed down by expensive ETL server overkill.</p>
<h2>A better choice- the visual, clean ETL tool</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/07/Join-Block-Edit.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/07/Join-Block-Edit.jpg" alt="" title="Join-Block-Edit" width="300" height="232" class="alignright size-full wp-image-4594" /></a>Datamartist is designed to let you extract data from multiple sources, and then mix it, match it, transform it, and understand it.</p>
<p>It uses a visual block and connector model, with the concept of "Data canvases" that let you easily manage and simplify complex data transformations.  But unlike many overly complex ETL tools, Datamartist provides visual, configurable blocks, rather than requiring code.</p>
<h2>Easy to install</h2>
<p>Datamartist installs in minutes, and runs on your desktop, giving you control of your data, and what you need to do.  Don't configure servers, don't worry about installing the right version of Java, don't spend hours searching wikis and forums and tweaking config files.  Just <a href="/downloads">download it</a>, single step install it, and use it.</p>
<p>It makes it easy for you to take a snapshot of the data you need- locally with cut and paste or drag and drop from files, and locally or remotely with native connections to SQL Server, Oracle, MySQL and MS Access, and pretty much anything else via ODBC.</p>
<p>And since the Datamartist data transformation engine can be run from the command line or scripted, it can also be automated to implement ETL tasks running on a Windows server.</p>
<h2>Speed up data delivery, reduce cost.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/07/Tree-Structure-Management.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/07/Tree-Structure-Management.jpg" alt="" title="Tree-Structure-Management" width="400" height="300" class="alignleft size-full wp-image-4599" /></a>Datamartist provides a flexible, simple to use ETL environment that will let you shorten your time to delivery significantly for a wide range of data transformation tasks.</p>
<ul>
<li>Deliver small and medium sized data transformation tasks more quickly</li>
<li>Build rapid prototypes and proofs of concepts</li>
<li>Automate data profiling and data quality monitoring</li>
</ul>
<h2>Give the Datamartist ETL Tool a try</h2>
<p>You can <a href="/downloads">download the Datamartist trial</a> and be up and running in minutes.  You don't even have to register- and you will have full access to a fully functioning version of Datamartist to try out this simple, visual ETL tool on your own data.</p>
<p>We're also very excited about V1.3.0, currently in private beta.  If you'd like to participate in the public beta, drop me a line at "beta at datamartist.com", and we'll send you a link when that download is available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/a-simple-etl-tool-with-data-profiling-tools-built-in/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data quality from a four year old</title>
		<link>http://www.datamartist.com/data-quality-templates-from-a-four-year-old</link>
		<comments>http://www.datamartist.com/data-quality-templates-from-a-four-year-old#comments</comments>
		<pubDate>Tue, 08 Jun 2010 14:10:22 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Duplicate Data]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4473</guid>
		<description><![CDATA[I think my four year old would make a good data quality dude. He explained to me recently why it's better to use stickers than crayons, "for the things people use a lot". "Dad, if you use crayons, you might draw it different, but stickers- they are all the same." He then pointed to the [...]]]></description>
			<content:encoded><![CDATA[<p>I think my four year old would make a good data quality dude. He explained to me recently why it's better to use stickers than crayons, "for the things people use a lot".<br />
<a href="http://www.datamartist.com/wp-content/uploads/2010/06/data-entry-problems-just-enter-anything.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/06/data-entry-problems-just-enter-anything-300x186.jpg" alt="" title="data-entry-problems-just-enter-anything" width="300" height="186" class="alignright size-medium wp-image-4539" /></a><br />
"Dad, if you use crayons, you might draw it different, but stickers- they are all the same."  He then pointed to the sheet of identical, machine generated stickers-  "All the same- so everyone who gets one of these, knows what it is."</p>
<p>"Using the crayon takes too long and sometimes I make mistakes."  Then he paused for a second. "But if it's something different- then I have to draw it. No stickers for that."</p>
<p>And off he went, blending hand drawn custom crayon work with high speed sticker application.</p>
<p>It strikes me that my son's rule of thumb for using stickers in arts and crafts is a pretty good analogy for the design of data entry systems.</p>
<p>Whenever you can, use something that restricts the user's choices to a fixed, understood set of responses.  Use pre-made data stickers.</p>
<p>The enemy of data quality everywhere is the gaping, unvalidated free-form text entry field.  Only linguists and unstructured text analysts can get excited about the "endless possibilities" of what your users and customers can enter into those fields.</p>
<p>We've all seen the horrors of names and addresses run amok-  "John A Smith", "Jon A. Smith", "John Smith Jr.", "Smith, John A" or the even more amazing "John Smith (new customer)".</p>
<p>If you're in data, you don't want endless possibilities.  You want ordered sets of data that conform strictly to well defined rules.  Eliminating duplicates is a complex and time consuming effort.  Stopping as many of them as possible before they are created is the first, best thing you can do to get a handle on the problem.</p>
<p>So think stickers.   For every field ask yourself- can I make this a combo box? Radio buttons?  Can I do auto search in the existing records to suggest close matches?  Anything to stop users or customers from making things up- and to have the data points they enter conform to a defined domain.</p>
<p>The more constrained a field is, the better the chances are that the data stored in it will be useful... unless of course you make it so constrained that you force data quality to suffer.</p>
<h2>There is such a thing as too much...</h2>
<p>Every good rule has its exceptions, and the evil side of overly constraining your data entry folks is that because they are smarter than computers, they'll find ways to invent entirely new encoding methods.</p>
<p>If you tighten the entry on the postal code too much, so that international postal codes won't fit, you can be sure that data entry clerks will discover that by entering their own postal code, and putting the customer's postal code in the comment field, they can get the system to accept the record (and at least feel as if they had tried their best to get the data needed in there).</p>
<p>This is where which stickers you have in your collection starts to matter.  </p>
<p>Have you ever noticed that at well run events, they always have some blank name tags, as well as the pre-printed ones?  That and a magic marker makes sure the process can go on.</p>
<p>In the end, you'll need to balance between the two extremes.  Tighten up your data entry and interfaces as much as you can, but realize that there is a point of diminishing returns, and in fact probably even a point where your data totalitarianism will be hurting your data quality, not helping it.</p>
<p>Now of course, there are some pretty high end tools that let you create all sorts of rules, and others that let you comb through the data and cleanse it, checking those postal codes against states and cities, and doing all sorts of fancy matching and analysis.  There is definitely an important role in many organisations and systems for approaches and tools such as these.</p>
<p>Using data profiling tools like <a href="/">Datamartist</a> will help you understand what issues are making it through your defenses.</p>
<p>But if you are not doing it already, focusing on the point of entry with practical, balanced techniques will make a step change improvement to your data quality.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-quality-templates-from-a-four-year-old/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data integration is like a pizza</title>
		<link>http://www.datamartist.com/data-integration-is-like-a-pizza</link>
		<comments>http://www.datamartist.com/data-integration-is-like-a-pizza#comments</comments>
		<pubDate>Tue, 18 May 2010 12:52:12 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Integration]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Business Intelligence]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4520</guid>
		<description><![CDATA[I enjoy a slice of pizza as much as the next person (perhaps a bit more). The key to a good pizza is the raw materials- use the right stuff, and you'll be happy every time. What's great about pizza is that it has all sorts of great stuff on it, and presents them all [...]]]></description>
			<content:encoded><![CDATA[<p>I enjoy a slice of pizza as much as the next person (perhaps a bit more).  The key to a good pizza is the raw materials- use the right stuff, and you'll be happy every time.  What's great about pizza is that it has all sorts of great stuff on it, and presents them all in a single, easy to hold and eat meal. </p>
<p>Data integration can be like a really well put together pizza- lots of good cross-referencing cheese-data to keep everything in its place, great crust that supports it all, and a universal appeal that might even get people to try something they wouldn't normally consume (data wise).</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/05/data-integration-if-the-data-was-any-good.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/05/data-integration-if-the-data-was-any-good.jpg" alt="" title="data-integration-if-the-data-was-any-good" width="357" height="297" class="alignleft size-full wp-image-4525" /></a>But without data quality, data integration can make pizza that nobody really wants to eat, and rather than enhancing the value of your data, your data integration efforts can make your bad data even less consumable than it was on its own.</p>
<p>While combining data from multiple systems can generate huge insights, it is important to understand that moving it and combining it with data from other systems will not <em>always</em> increase its value.  </p>
<p>With good quality data you can have fantastic results, but bad quality data requires so much effort and transformation that often your payback on doing the integration will be non-existent.</p>
<h2>Data integration enthusiasm </h2>
<p>So what happens when an enterprise hears its stomach rumble, and starts thinking data pizza?</p>
<p>Enthusiastic analysts spring into action, building various mockups of all the fantastic dashboards that they will be able to produce, once the data integration is done.  Terms like "near-real time, balanced, cross-functional score cards" start to get bounced around, and pretty soon, budget proposals and appropriation requests are flying from color printers everywhere.</p>
<p>What's unfortunate in many cases is that cooler heads don't stop to ask the question-  "So... all this data we are going to put together, is it any good?"</p>
<p>When you are making your pizza, you have to know if the cheese has been left out a bit too long or the green pepper is soggy.</p>
<p>What can be worse is that if heroic measures are taken to try to get the data to fit together, the integration jobs themselves might actually degrade the data quality further- or eliminate levels of detail that are not compatible, actually hiding important trends and structures.  A risk of integrated dashboards is that they pander to the lowest common denominator.</p>
<p>So if you are planning to do some data integration, to build a data pizza, think twice about putting that moldy pepperoni from the CRM system on it- sometimes less is more.  </p>
<p>In fact, it might be that data integration is not your first concern- improving the quality of the data in all those data silos will actually improve day to day operations immediately- and make any future data integration project cheaper, and more successful. </p>
<p>Any great chef will tell you- no matter how complex the recipe, and how impressive your kitchen and equipment, the raw ingredients matter.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-integration-is-like-a-pizza/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Why you should data profile.</title>
		<link>http://www.datamartist.com/data-profiling-do-it-do-it-now</link>
		<comments>http://www.datamartist.com/data-profiling-do-it-do-it-now#comments</comments>
		<pubDate>Fri, 07 May 2010 02:11:58 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Data profiling]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4496</guid>
		<description><![CDATA[Imagine that you have bought a new home, and you've decided to do some landscaping. So you pick three landscapers, draw a rough sketch of what you want, and ask them to bid on the job. But you don't allow them to come see your property, and your sketch doesn't specify anything about the existing [...]]]></description>
			<content:encoded><![CDATA[<p>Imagine that you have bought a new home, and you've decided to do some landscaping.  So you pick three landscapers, draw a rough sketch of what you want, and ask them to bid on the job.</p>
<p>But you don't allow them to come see your property, and your sketch doesn't specify anything about the existing landscaping- just the final configuration.  Do you think the landscapers would be willing to offer a reasonable price? </p>
<p>Unlikely.   What if there are existing patio stones to remove- or an in-ground swimming pool that's got to go? </p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/05/did-the-consultants-data-profile-first.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/05/did-the-consultants-data-profile-first.jpg" alt="" title="did-the-consultants-data-profile-first" width="347" height="235" class="alignright size-full wp-image-4513" /></a>No landscaper would take on a job without understanding the lay of the land, and the existing conditions.  It would be impossible to estimate the job. Anyone who did would give you a huge price to cover themselves, or demand extras upon discovering the extra work.</p>
<p>Yet when companies hire consultants to build them business intelligence solutions, or do data migration,  it often happens with only the roughest outline of the existing data sets.  Certainly, often a data model is included- but knowing what the table SHOULD contain rather than what it does is just not the same thing.  It never ceases to amaze me that the simple, cost effective practice of data profiling is so often not part of the initial phases of business intelligence and data migration projects.</p>
<p>With the right data profiling tool, and just a few days' work, it's possible to gain a huge amount of insight into the data quality in your systems, and as a result, be able to make radically more accurate estimates of the cost to go from the "as is" to the "to be".</p>
<p>Phil Simon talked about this in a great post on the DataFlux blog called <a href="http://www.dataflux.com/dfblog/?p=2590" target="_blank">"What Consultants Don't tell you"</a>, and raises an important and somewhat ugly truth- many times, service providers don't WANT to do data profiling because it reveals the true extent of the work to be done, which increases the budget requirement and makes the project less likely to be approved.</p>
<p>Now certainly, we can't use a broad brush to paint all consultants, but this incentive does reduce the number of times valuable tools such as data profiling are recommended, even though in my opinion they are a low cost, no-brainer, do-it-unless-you-are-crazy first step to any major project.  </p>
<p>You are going to spend potentially millions of dollars on a business intelligence or data migration project- spend a few weeks to look at the data with the right tools first for goodness sake!</p>
<p>If you want to get a reasonable cost estimate, and you want to go into your business intelligence or data migration project with open eyes, don't imagine you can know what it will cost to get from here to there if you don't take a good look at where here really is.</p>
<p><a href="/resources/screenshots/Data-Profiler-on-States.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/05/Data-Profiler-on-States-Thumb.jpg" alt="" title="Data-Profiler-on-States-Thumb" width="220" height="165" class="alignright size-full wp-image-4506" /></a><strong>Full disclosure</strong>-  of course, you are reading the <a href="/">Datamartist</a> blog, and Datamartist has lots of data profiling functionality- so you have to understand that we are incredibly biased on this topic.  If you are able to overlook our inherent bias, <a href="/downloads">give the tool a try</a>- you'll discover things about your data you might not have wanted to know, but it's better to face the truth prepared, than to rely on wishful thinking, and then discover the bad news when you're well into the project, and your budget is almost gone.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-profiling-do-it-do-it-now/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Automated data profiling and reporting- Data quality behavioral modification?</title>
		<link>http://www.datamartist.com/automated-data-profiling-and-reporting-data-quality-behavioral-modification</link>
		<comments>http://www.datamartist.com/automated-data-profiling-and-reporting-data-quality-behavioral-modification#comments</comments>
		<pubDate>Wed, 14 Apr 2010 15:03:35 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data profiling]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4478</guid>
		<description><![CDATA[Recently, Jim Harris of Obsessive-Compulsive Data Quality speculated about whether the concept of the "Swear jar" could be used to improve data quality. It was an interesting post, and the discussion in the comments underlined the reality of data quality- much of the time, the problem is not about changing bits in a [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, Jim Harris of Obsessive-Compulsive Data Quality <a href="http://www.ocdqblog.com/home/the-poor-data-quality-jar.html" target="_blank">speculated about whether the concept of the "Swear jar"</a> could be used to improve data quality.  It was an interesting post, and the discussion in the comments underlined the reality of data quality-  much of the time, the problem is not about changing bits in a database,  but about flipping neurons in the brains of the people putting the bad data in there.  And that's hard.</p>
<p>It's hard because although methods like data profiling can identify data quality problems, exactly who is to "blame" and how to manage it is difficult.</p>
<p>In thinking about this a bit more, I realised that the discussion was all about sticks- and not much about carrots.  We discussed different ideas about how to apportion cost (which makes sense as the swear jar is about putting money IN, or punishing the offender).<br />
<a href="http://www.datamartist.com/wp-content/uploads/2010/04/ceo-data-quality-incentive-plan-data-profiling.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/04/ceo-data-quality-incentive-plan-data-profiling.jpg" alt="" title="ceo-data-quality-incentive-plan-data-profiling" width="437" height="311" class="alignright size-full wp-image-4480" /></a><br />
What about working it the other way around?  By using automated data profiling, and making the metrics time-based and readily available, we could track data quality in key data sets, both in absolute terms (numbers of rows with issues), and relative ones (percentage of customer records with problems, etc.).  This would allow you to have data quality dashboards.</p>
<p>It would then be possible to establish a bonus plan based on data quality.  It could pay out for improvements (as a percentage), or have certain recurring payments that would decrease and then stop if data quality fell below a target level.  </p>
<p>While it is still necessary to identify who is responsible for which data set's data quality, and as with all reward schemes the people responsible must also have the means to improve data quality and therefore reap the reward, I think in a number of situations this would be possible- for example, data entry clerks would undoubtedly double check each address more carefully if they knew there was a tangible reward to do so.</p>
<p>It seems that two key things are necessary to make this kind of bonus plan work- first, the ability to automate data profiling, and have meaningful metrics that can't be "gamed" (because if there is a way, people will find it), and secondly to be able to identify the savings due to improvements in data quality- because that's what's funding the bonus pool, after all.</p>
<p>Have any readers implemented or heard of such a plan?  Do you track data quality using an automated data profiling tool and data quality dashboards?  </p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/automated-data-profiling-and-reporting-data-quality-behavioral-modification/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
