<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Data Standards</title>
	<atom:link href="http://www.datamartist.com/category/data-standards/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Thu, 09 Feb 2012 20:00:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Data migration Part 5- Breaking down the information silos</title>
		<link>http://www.datamartist.com/data-migration-part-5-breaking-down-the-information-silos</link>
		<comments>http://www.datamartist.com/data-migration-part-5-breaking-down-the-information-silos#comments</comments>
		<pubDate>Tue, 22 Dec 2009 15:45:22 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Data Standards]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[ERP Projects]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3732</guid>
		<description><![CDATA[During a data migration project, the information technology department is in a unique position to either help or hinder how well all the different parts of a company work together. The transactional systems that a company uses can be the glue that binds, or can be a key part of the walls that block inter-departmental [...]]]></description>
			<content:encoded><![CDATA[<p>During a data migration project, the information technology department is in a unique position to either help or hinder how well all the different parts of a company work together.</p>
<p>The transactional systems that a company uses can be the glue that binds, or can be a key part of the walls that block inter-departmental collaboration and information sharing.</p>
<h2>The data migration project is your best chance to break down the data silos- but it won't be the easy path.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/data-silos-leave-your-silos-and-follow-me.jpg" alt="data-silos-leave-your-silos-and-follow-me" title="data-silos-leave-your-silos-and-follow-me" width="286" height="334" class="alignleft size-full wp-image-3748" />Not only can the ERP project create new opportunities for efficiency and process improvement, but it can also be the process through which people from across the company start to work together.</p>
<p>ERP projects often require teams of subject matter experts from various functional areas (finance, sales, manufacturing) to work together- depending on how serious your silos are, it might be the first time many of these people have met.<br />
Make the most of it- see if you can't encourage some new working relationships that last beyond the project. A few key personal contacts between departments can make a huge difference.</p>
<h2>Best case and worst case: big difference</h2>
<ol style="margin-top:20px;">
<li><strong>Best case</strong>- processes that stumbled along without any coordination are totally reworked for the better, and everyone- including your customers- sees a huge positive difference that ends up hitting your bottom line. Win!</li>
<li><strong>Worst case</strong>- no one will accept change, everyone's position is that either things stay the same or others adapt to their vision of the future, and you end up finding a way to shoehorn all the existing processes into the ERP, resulting in a messy, customized compromise that no one likes and that limits progress while costing a fortune. So sad.</li>
</ol>
<p>If you are doing a data migration project, how can you help make it a more positive and unifying step, rather than a transfer of the existing silos intact into the new ERP at great expense?</p>
<h2>The first step is to admit you have a problem.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/data-silos-what-do-you-mean-data-silos.jpg" alt="data-silos-what-do-you-mean-data-silos" title="data-silos-what-do-you-mean-data-silos" width="373" height="276" class="alignright size-full wp-image-3744" />The first thing to do is to identify how bad your silos are, and get all the players to look the problem in the face.<br />
If you have silos, but people don't identify it as an issue, there is no way you'll get the resources you need to fix them.</p>
<h2>Make it clear that customization is the enemy.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/datamigration-as-long-as-the-new-system-is-the-same.jpg" alt="datamigration-as-long-as-the-new-system-is-the-same" title="datamigration-as-long-as-the-new-system-is-the-same" width="463" height="343" class="alignright size-full wp-image-3746" />Sure, you could customize the ERP application that you are installing to do exactly what each department does now. Add some tables, write some bolt-on code.</p>
<p>I'm sure you can do it. That's not the point.</p>
<p>Technical resources love customization because it's more fun than configuration. Don't fall into the trap of thinking "we're satisfying the customer's requirements."</p>
<p>Customization in an ERP system is expensive, usually reduces functionality and in the long term drives significant maintenance and upgrade costs.</p>
<p>If you customize the new ERP system to accommodate the existing silos you are NOT satisfying requirements.  You are leading the business towards a failure that they don't really understand.  </p>
<p>You need to find language that the functional teams and management can understand and explain the risks clearly.</p>
<blockquote><p>If we don't get together and consolidate our processes and our data definitions we're going to end up with a mess in the ERP.</p></blockquote>
<blockquote><p>The system was not designed to do that- the point of an ERP is to be integrated- if our departments don't work together, then the ERP isn't going to make anything better- and it's a waste of money.</p></blockquote>
<h2>Get top down support for the painful change that is necessary.</h2>
<p>Sometimes things can be grassroots, originating at a level below the executive suite. If that works for your company, that's great.</p>
<p>But in the majority of cases, the kind of short-term pain that breaking down entrenched silos will cause is too intense to be initiated anywhere but at the very top.</p>
<p>The leadership needs to make it clear that everyone is going to share in the change, and no one has a "get out of change free" card.</p>
<p>It has to be clear that it is not a competition between the various approaches that might exist, but a move to a new approach.</p>
<h2>It's really, really hard. If it's not hard, you're not doing it right.</h2>
<p>There's nothing I can write here to make this kind of change easy; it's not.</p>
<p>But by building cross-functional teams, making goals clear, and managing expectations, it's possible to make progress.</p>
<p>So, in summary, here is what I've been trying to convey in this series of posts:</p>
<ul>
<li>Data migration projects are data quality projects.</li>
<li>Data migration projects are master data management projects.</li>
<li>Data migration projects are an opportunity to break down the walls between silos within the organisation.</li>
</ul>
<p>You can make a real difference if you strive to make the data migration project advance on these three fronts.</p>
<p>You always have to work within the constraints you have in terms of budget, company culture, existing systems and, of course, internal politics, but if you take a step-by-step, pragmatic approach with these goals in mind, you'll be contributing to the solution.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-migration-part-5-breaking-down-the-information-silos/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data.gov: Looking at the US government's data</title>
		<link>http://www.datamartist.com/datagov-looking-at-the-us-governments-data</link>
		<comments>http://www.datamartist.com/datagov-looking-at-the-us-governments-data#comments</comments>
		<pubDate>Tue, 26 May 2009 02:42:48 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Standards]]></category>
		<category><![CDATA[Public Data]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=2279</guid>
		<description><![CDATA[The Obama administration has taken another step in making government data available online in launching the Data.gov website. What's on the site right now, and what can we do with it? The website has two catalogs of information available- the "raw data catalog", and the "tools" catalog. At the time of posting this, there are [...]]]></description>
			<content:encoded><![CDATA[<p>The Obama administration has taken another step in making government data available online in launching the Data.gov website. What's on the site right now, and what can we do with it?</p>
<p>The website has two catalogs of information available-  the "raw data catalog", and the "tools" catalog.  At the time of posting this, there are 47 entries in the raw data section, and 27 in the tools section.  For this post, I focused on the "raw data catalog".</p>
<p>Things I liked:</p>
<ul>
<li> The site lets users rate data sets- giving an Overall rating, and then rating on Data utility, Usefulness and Ease of Access (giving you a vote of 0 to 5 stars). Democratic. That's in keeping with core values.</li>
<li> Each data set came with some extra info, including data dictionaries- for the example set, it can be <a href="http://www.data.gov/details/10#description"  target="_blank">found here</a>.</li>
<li> The fact that the data WAS raw. Better to get it out there than do something to it that makes it less useful.</li>
<li> Although the catalog doesn't yet have enough entries to need it, there is keyword search and filters by category, government agency and file type.</li>
<li> The US Federal government isn't afraid to use the word "Potty" when it needs to.
<blockquote><p>POTTYAGE,Age of the most used toilet ,1,27,Numeric<br />
POTTYLEAK,Does the most used toilet leak,1,28,Numeric<br />
POTTYPARTS,Any of the part of the most used toilet replaced,1,29,Numeric<br />
POTTYFIX,When was the most used toilet repaired,1,30,Numeric</p></blockquote>
</li>
</ul>
<p>Things I found disappointing:</p>
<ul>
<li> The data dictionary information was often in PDF format, when the contents were in fact a dimension table. Having this data in the form of a data file would save us some parsing. <a href="http://www.eia.doe.gov/emeu/recs/recspubuse05/pdf/recs05codebook.pdf" target="_blank" rel="nofollow">Here is the example PDF</a> for the featured data set.</li>
<li> Related data sets were not grouped together- the catalog only had one data set per entry, even if some were directly related or joinable. It would be much more powerful to have the data sets grouped together. (This would also let us group together those dimension-like tables if they were available.)</li>
<li> Just not enough data yet, so I found the mish-mash of various data sets a bit bizarre. I didn't have any pressing need to analyze Community Collaborative Rain, Hail and Snow observations, or the Migratory Bird Flyways, or the detail of the worldwide earthquakes in the past 7 days (although admittedly that last one is kind of cool).</li>
</ul>
<p>The bottom line is that any data is better than no data, and if there's something there you're interested in analyzing, then good on the US government for making it available to you. There is not a lot there now, but the hope is that over time more federal CIOs will feel the pressure to make their data available.</p>
<p>What's going to be key to making this all work, however, is to establish a standard for both data and metadata that is more robust than throwing some CSV files and PDF data dictionary documents on a web site. I look forward to seeing how this evolves.</p>
<p>Update: Sunlight Labs, with sponsors including Google, has a <a href="http://sunlightlabs.com/contests/appsforamerica2/" target="_blank" rel="nofollow">contest for the best analytical applications based on data.gov data</a>.</p>
<h2>Doing a deeper dive into the data with the Datamartist tool</h2>
<p>I couldn't resist taking a peek using <a href="http://www.datamartist.com/product" target="_blank">Datamartist</a>.</p>
<p>The featured data set that was displayed prominently on the Data.gov home page was the "Residential Energy Consumption Survey (RECS)" for 2005.  The description of the data file is as follows:</p>
<blockquote><p>The Residential Energy Consumption Survey (RECS), which is conducted every four years, provides national statistical survey data on the use of energy in residential housing units including physical housing unit types, appliances utilized, demographics, fuels, and other energy use information. This dataset (i.e., the full RECS dataset) is very large in size and may require specialized software to open on your computer. The file might not open completely in Excel 2003 or earlier versions.</p></blockquote>
<p>The file ends up being very WIDE rather than tall- over 1000 columns, and 4382 rows, totaling about 10 MB in a CSV file.</p>
<p>I did one pass to make a subset file focusing on a few columns, then I built some mini dimension tables to join up by cutting and pasting out of the PDF. Once I had the data in Datamartist, I connected up a join block and checked it out in the data profiler:</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/05/data_gov-analysis-in-datamartist-small.jpg" alt="data_gov-analysis-in-datamartist-small" title="data_gov-analysis-in-datamartist-small" width="600" height="459" class="alignnone size-full wp-image-2286" /></p>
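<p>For readers without the tool, the subset-and-join step described above can be roughly sketched in plain Python. This is only an illustration under assumptions: the sample rows, the DOEID and EXTRA column names, and the code labels in the mini dimension table are all invented here; only POTTYAGE is a column name taken from the data dictionary quoted earlier.</p>

```python
import csv
import io

# Hypothetical stand-in for the wide RECS file. Only POTTYAGE is a column
# name from the post; the other columns and all values are invented.
WIDE_CSV = """DOEID,REGIONC,POTTYAGE,EXTRA1,EXTRA2
1,1,2,x,y
2,3,0,x,y
3,1,4,x,y
"""

# Hypothetical mini dimension table, as if pasted out of the PDF codebook.
# The labels are made up for illustration.
POTTYAGE_DIM = {
    "1": "Less than 2 years",
    "2": "2 to 4 years",
    "4": "10 years or more",
}

# Columns to keep when cutting the wide file down to a subset.
KEEP = ["DOEID", "POTTYAGE"]


def subset_and_join(csv_text, keep, dim, key):
    """Keep only the chosen columns and left-join a label from the dimension."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {name: record[name] for name in keep}
        # Left join: codes missing from the dimension get None, not dropped.
        row[key + "_LABEL"] = dim.get(record[key])
        rows.append(row)
    return rows


joined = subset_and_join(WIDE_CSV, KEEP, POTTYAGE_DIM, "POTTYAGE")
for row in joined:
    print(row)
```

<p>Keeping unmatched rows (a left join) rather than dropping them is a deliberate choice in this sketch: codes that have no entry in the dimension table stay visible instead of silently disappearing.</p>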
<p>Oddly, when investigating POTTYAGE, I found that the majority of the rows had the value zero, which was not listed in the data dictionary. Hmmm. Perhaps even the feds have data quality problems now and again.</p>
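<p>A check like this one is easy to approximate outside a profiler. The sketch below counts the values in a column that the data dictionary does not list. The sample values and the set of documented codes are assumptions made up for illustration, not the real RECS codes.</p>

```python
from collections import Counter

# Hypothetical sample of POTTYAGE values; the real file has 4382 rows.
values = ["0", "0", "2", "0", "1", "4", "0"]

# Hypothetical set of codes documented in the data dictionary (0 is absent).
documented = {"1", "2", "3", "4"}


def undocumented_counts(column, known_codes):
    """Count each value in the column that the dictionary does not list."""
    counts = Counter(column)
    return {code: n for code, n in counts.items() if code not in known_codes}


print(undocumented_counts(values, documented))
```

<p>An empty result would mean every value in the column is covered by the dictionary; anything else is worth a question to whoever publishes the data.</p>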
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datagov-looking-at-the-us-governments-data/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

