<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Project Management</title>
	<atom:link href="http://www.datamartist.com/category/project-management/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Thu, 09 Feb 2012 20:00:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Data quality sizzle</title>
		<link>http://www.datamartist.com/data-quality-sizzle</link>
		<comments>http://www.datamartist.com/data-quality-sizzle#comments</comments>
		<pubDate>Tue, 22 Mar 2011 18:08:56 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Project Management]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=5985</guid>
		<description><![CDATA[I'm an engineer. Being an engineer, I'm pretty product focused, pretty technology focused, and pretty "does it work or not" focused. Having technical things like tools work is useful, and good. But just because you build it, does not mean they will come. The challenge in Data Quality is that often what has to [...]]]></description>
			<content:encoded><![CDATA[<p>I'm an engineer. Being an engineer, I'm pretty product focused, pretty technology focused, and pretty "does it work or not" focused.  </p>
<p>Having technical things like tools work is useful, and good.  But just because you build it, does not mean they will come.</p>
<p>The challenge in Data Quality is that often what has to change, even more than the technology or tools, is the behaviours and perspectives of the people in the organisation with data quality issues.  At the very least, the users have to use the tools.  Very few data quality solutions are of the "full autopilot" bad-data-goes-in-here-good-comes-out-here type.</p>
<p>As much as we engineers would like to solve everything with software, people are involved in Data Quality.  </p>
<p>While a fantastic bit of data profiling analysis or an elegant and powerful data transform would seem to be enough, the truth is sometimes how and when you present these things is key to getting the non-engineer people to buy in.  </p>
<p>Sometimes preparing people over time, and introducing things in a step by step way helps them understand, and makes the technology and the change required less daunting.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2011/03/red-bbq.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2011/03/red-bbq-300x199.jpg" alt="" title="red-bbq" width="300" height="199" class="alignright size-medium wp-image-5987" /></a>Because I'm looking out my window at a tentative (very tentative it's only March after all) spring day here in Toronto, I'm going to use a summer barbecue analogy.</p>
<p>The tools and technology are the steak.  The steak is key to the party.   In the end (at least for me in this analogy) the steak delivers most of the value in your summer BBQ party value proposition, but you'll have more guests and be more successful overall if you package the whole thing. </p>
<p>Sometimes, part of selling the steak is the sizzle, the preparation, the things around the steak.</p>
<p>It's the smell of the BBQ getting ready, it's the sound of the steak hitting the grill- it's the cold drink, the conversation, the games on the lawn for the kids.</p>
<p>In the end, even if you know that 90% of the deal was that steak, if you just put a steak on a plate and give it to each guest the moment they arrive, it's just not going to get the same response.</p>
<p>In my usual roundabout way the point I'm trying to get to is that you can't solve technical problems, then drop them on people's desks and say "do it".  You need to invite them to the party.  Prepare them for the menu, ask preferences, give them some time to hear the sizzle, smell the charcoal, enjoy the sunshine in expectation of that steak.</p>
<p>Steak is good.  Remember to plan some sizzle too.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-quality-sizzle/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reduce Business Intelligence cost through better data migration</title>
		<link>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean</link>
		<comments>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean#comments</comments>
		<pubDate>Tue, 09 Mar 2010 18:49:29 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Business Intelligence]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4390</guid>
		<description><![CDATA[Managing Business Intelligence cost is not an easy task. But poorly or inconsistently structured data can make the task even harder. Unfortunately, a lazy data migration project can generate all sorts of headaches that will cause your Business Intelligence cost to explode. Of course, bad data quality also has many other costs and risks associated [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/tell-the-ceo-forget-the-merger-data-is-read-only.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/tell-the-ceo-forget-the-merger-data-is-read-only.jpg" alt="" title="tell-the-ceo-forget-the-merger-data-is-read-only" width="363" height="209" class="alignright size-full wp-image-4400" /></a>Managing Business Intelligence cost is not an easy task.  But poorly or inconsistently structured data can make the task even harder.  Unfortunately, a lazy data migration project can generate all sorts of headaches that will cause your Business Intelligence cost to explode.  Of course, bad data quality also has many other costs and risks associated with it in its own right, but I'm going to focus in on business intelligence today.  </p>
<p>The majority of the development cost in the current business intelligence methodology is often in getting the data out of source systems (Extract), and transforming it to make it consistent across all the various dimensions needed (Transform) and then putting it in a model that is easy to query and analyse (Load).  The creation of these ETL jobs is made dramatically harder if the data in the source systems is not consistent. </p>
<h2>Change is the challenge</h2>
<p>Companies are not static- they grow, diversify, change strategies, reorganize, rename and restructure.  They acquire other companies or are acquired. The structure and content of the data in their systems often tell you this story, and if the proper work is not done to keep the data consistent with itself and the new situation then this story will be painful and complex.</p>
<blockquote><p>Remember ten years ago when we acquired company X, but decided not to change their customer codes to our standard, so all the codes had an "X" prefixed so that we wouldn't have duplicates?  Well, those X's are still there, and all our queries have to deal with multiple code structures.</p></blockquote>
<blockquote><p>Remember how we used to have three independent databases, one for each region, then when we went to the new data center and put everything into a single database, we ended up with multiple schemas and all those crazy views rather than consolidating into a single instance?</p></blockquote>
<p>When the data migration project made the decision to reduce the project cost by not addressing data consistency, they simply pushed this cost into the future, most likely turning a one-time expense into an ongoing and expanding annual business intelligence cost.</p>
<p>You end up with crazy ETL jobs that parse the same field in different ways depending on the date of the transaction, or on other fields-  "If the transaction is before 2002, then the first digit of the product code means X, otherwise it means Y, unless of course it's from the western division, who do it differently so then you need to look at field A and use the CASE statement..."</p>
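<p>As a rough sketch of what that looks like in code (Python here, and every rule, field name and code value below is invented for illustration, not taken from a real system):</p>
<pre><code class="language-python">from datetime import date

# All rules below are hypothetical, invented to mirror the example above.
LEGACY_YEARS = range(1990, 2002)   # years the old coding scheme was in use
OLD_MEANING = {"1": "hardware", "2": "software"}
NEW_MEANING = {"1": "software", "2": "services"}
WEST_MEANING = {"H": "hardware", "S": "software"}


def product_family(txn_date, product_code, division, field_a):
    """Decode a product family from inconsistently coded source rows."""
    if division == "WEST":
        # The western division ignores the product code and uses field A.
        return WEST_MEANING.get(field_a, "unknown")
    if txn_date.year in LEGACY_YEARS:
        # Before 2002 the first digit carried the old meaning.
        return OLD_MEANING.get(product_code[0], "unknown")
    return NEW_MEANING.get(product_code[0], "unknown")


rows = [
    (date(2001, 6, 30), "1-4471", "EAST", ""),
    (date(2003, 1, 15), "1-4471", "EAST", ""),
    (date(2003, 1, 15), "1-4471", "WEST", "H"),
]
for row in rows:
    print(row[0].isoformat(), row[2], product_family(*row))
</code></pre>
<p>The same product code decodes three different ways depending on the date and the division, and every report and change request has to carry that logic along with it.</p>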
<h2>Reduce Business Intelligence cost through data cleanup</h2>
<p>If your data is cleaner you'll reduce business intelligence cost across your entire BI architecture.</p>
<ul>
<li>Reduce ETL and report development cost- both initial, and the cost of ongoing maintenance.  Every change request will take more time if all the models are complex due to underlying data complexity.</li>
<li>Reduce hardware costs- complex queries require more processing, and bigger servers to meet that nightly load window</li>
<li>Reduce time spent reconciling numbers. Complex ETL means that chances are business intelligence reports don't match up easily with the operational reports from the source systems.  People will spend time constantly double checking these discrepancies, and it will undermine confidence in all data.</li>
</ul>
<h2>Fix the problem at the source.  Not in the Business Intelligence.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/lazy-data-migration-get-jackets-business-intelligence-pays-the-bill.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/lazy-data-migration-get-jackets-business-intelligence-pays-the-bill.jpg" alt="" title="lazy-data-migration-get-jackets-business-intelligence-pays-the-bill" width="420" height="285" class="alignright size-full wp-image-4396" /></a>Business intelligence is far too often left to fix all the issues in the source systems- and then becomes the focus of dissatisfaction when costs and delays become unacceptable.  </p>
<p>I've heard people argue "That's what ETL is for right?  Why are you complaining?"  </p>
<p>Assuming that the ETL will fix the sins of the source system is an inefficient and costly strategy.</p>
<p>Everything is a balance, perfection does not exist, but when deciding what to fix and what to leave, don't let a lazy data migration project saddle you with years of business intelligence costs- when it's time to bulk load data into the system, make it as right as you can.  </p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data migration Part 5- Breaking down the information silos</title>
		<link>http://www.datamartist.com/data-migration-part-5-breaking-down-the-information-silos</link>
		<comments>http://www.datamartist.com/data-migration-part-5-breaking-down-the-information-silos#comments</comments>
		<pubDate>Tue, 22 Dec 2009 15:45:22 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Data Standards]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[ERP Projects]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3732</guid>
		<description><![CDATA[During a data migration project, the information technology department is in a unique position to either help or hinder how well all the different parts of a company work together. The transactional systems that a company uses can be the glue that binds, or can be a key part of the walls that block inter-departmental [...]]]></description>
			<content:encoded><![CDATA[<p>During a data migration project, the information technology department is in a unique position to either help or hinder how well all the different parts of a company work together.</p>
<p>The transactional systems that a company uses can be the glue that binds, or can be a key part of the walls that block inter-departmental collaboration and information sharing.</p>
<h2>The data migration project is your best chance to break down the data silos- but it won't be the easy path.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/data-silos-leave-your-silos-and-follow-me.jpg" alt="data-silos-leave-your-silos-and-follow-me" title="data-silos-leave-your-silos-and-follow-me" width="286" height="334" class="alignleft size-full wp-image-3748" />Not only can the ERP project create new opportunities for efficiency and process improvement, but it can also be the process through which people from across the company start to work together.</p>
<p>ERP projects often require teams of subject matter experts from various functional areas (finance, sales, manufacturing) to work together- depending on how serious your silos are it might be the first time many of these people have met.<br />
Make the most of it- see if you can't encourage some new working relationships that last beyond the project. A few key personal contacts between departments can make a huge difference.</p>
<h2>Best case and worst case: big difference</h2>
<ol style="margin-top:20px;">
<li><strong>Best case</strong>-  processes that stumbled along without any coordination are totally reworked for the better and everyone- including your customers- sees a huge positive difference that ends up hitting your bottom line. Win!</li>
<li><strong>Worst case</strong>- no-one will accept change, everyone's position is that either things stay the same, or others adapt to their vision of the future- you end up finding a way to shoehorn all the existing processes into the ERP, resulting in a messy, customized compromise that no one likes and limits progress while costing a fortune. So sad.</li>
</ol>
<p>If you are doing a data migration project- how can you help make it be a more positive and unifying step, rather than a transfer of the existing silos intact into the new ERP at great expense?</p>
<h2>The first step is to admit you have a problem.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/data-silos-what-do-you-mean-data-silos.jpg" alt="data-silos-what-do-you-mean-data-silos" title="data-silos-what-do-you-mean-data-silos" width="373" height="276" class="alignright size-full wp-image-3744" />The first thing to do is to identify how bad your silos are, and get all the players to look the problem in the face.<br />
If you have silos, but people don't identify it as an issue, there is no way you'll get the resources you need to fix them.</p>
<h2>Make it clear that customization is the enemy.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/datamigration-as-long-as-the-new-system-is-the-same.jpg" alt="datamigration-as-long-as-the-new-system-is-the-same" title="datamigration-as-long-as-the-new-system-is-the-same" width="463" height="343" class="alignright size-full wp-image-3746" />Sure you could customize the ERP application that you are installing to do exactly what each department does now.  Add some tables, write some bolt on code.</p>
<p>I'm sure you can do it.  That's not the point. </p>
<p>Technical resources love customization because it's more fun than configuration.  Don't fall into the trap of thinking "we're satisfying the customer's requirements."</p>
<p>Customization in an ERP system is expensive, usually reduces functionality and in the long term drives significant maintenance and upgrade costs.</p>
<p>If you customize the new ERP system to accommodate the existing silos you are NOT satisfying requirements.  You are leading the business towards a failure that they don't really understand.  </p>
<p>You need to find language that the functional teams and management can understand and explain the risks clearly.</p>
<blockquote><p>If we don't get together and consolidate our processes and our data definitions we're going to end up with a mess in the ERP.</p></blockquote>
<blockquote><p>The system was not designed to do that- the point of an ERP is to be integrated- if our departments don't work together, then the ERP isn't going to make anything better- and it's a waste of money.</p></blockquote>
<h2>Get top down support for the painful change that is necessary.</h2>
<p>Sometimes things can be grassroots, originating at a level below the executive suite.  If that works for your company, that's great.  </p>
<p>But in the majority of cases the kind of short term pain that breaking down entrenched silos will cause is too intense to be initiated anywhere but at the very top.</p>
<p>The leadership needs to make it clear that everyone is going to share in the change, and no-one has a "get out of change free" card.</p>
<p>It has to be clear that it is not a competition between the various approaches that might exist, but a move to a new approach.</p>
<h2> It's really really hard.  If it's not hard, you're not doing it right.</h2>
<p>There's nothing I can write here to make this kind of change easy; it's not.</p>
<p>But by building cross functional teams, making goals clear, and managing expectations it's possible to make progress.</p>
<p>So in summary what I've been trying to convey in this series of posts:</p>
<ul>
<li>Data migration projects are data quality projects.</li>
<li>Data migration projects are master data management projects.</li>
<li>Data migration projects are an opportunity to break down the walls between silos within the organisation.</li>
</ul>
<p>You can make a real difference if you strive to make the data migration project advance on these three fronts.</p>
<p>You always have to work within the constraints you have in terms of budget, company culture, existing systems and of course, internal politics, but if you take a step by step and pragmatic approach with these goals in mind, you'll be contributing to the solution.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-migration-part-5-breaking-down-the-information-silos/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data migration Part 4- Creating a data dictionary: how to tackle master data management</title>
		<link>http://www.datamartist.com/data-migration-creating-a-data-dictionary-how-to-tackle-master-data-management</link>
		<comments>http://www.datamartist.com/data-migration-creating-a-data-dictionary-how-to-tackle-master-data-management#comments</comments>
		<pubDate>Thu, 17 Dec 2009 20:52:39 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Meta Data]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Meta]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3696</guid>
		<description><![CDATA[Migrating data is complicated. It's particularly hard because of course it's not just a physical move. Data definitions are different from the legacy to the new systems. To get this right you need to manage these data definitions. In this post, I'm going to discuss some things to keep in mind during this process. As [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/data-migration-as-long-as-nothing-changes-we-are-ok-to-go.jpg" alt="data-migration-as-long-as-nothing-changes-we-are-ok-to-go" title="data-migration-as-long-as-nothing-changes-we-are-ok-to-go" width="373" height="269" class="alignright size-full wp-image-3710" />Migrating data is complicated.  It's particularly hard because of course it's not just a physical move. Data definitions are different from the legacy to the new systems. To get this right you need to manage these data definitions.</p>
<p>In this post, I'm going to discuss some things to keep in mind during this process.  As with the other posts in this series, I'm not going to be talking about specific tools or getting into super technical discussions.<br />
I'm going to assume that you do not have millions of dollars to spend on the state of the art master data management software and its configuration.  Instead, I'm going to present the high level concepts, and focus on some of the change management aspects.  How can you focus on what's important, and avoid having master data derail your data migration project?</p>
<p>This post is number four in a series.  If you are a linear type, and want to read all the posts in order, <a href="/data-migration-part-1-introduction-to-the-data-migration-delema">part 1 is here</a>,  <a href="/data-migration-part-2-determining-data-quality-is-the-first-key-step">then two</a>, and <a href="/data-migration-part-2-determining-data-quality-is-the-first-key-step">three</a>.</p>
<p>There is a recurring theme here- just as I discussed in part two how a data migration project is also a data quality project, because existing data quality issues often must be resolved:</p>
<h2>A data migration project is also a master data management project.</h2>
<p>Since often the legacy systems were department based, the concept of formal data definitions, and the processes needed to manage them across functional boundaries simply don't exist.  But if your data migration project is moving master data from multiple legacy systems into a single new ERP application, you are going to need at least a basic set of processes to manage it going forward.</p>
<p>Welcome to change management, cross functional teams and data governance committees.</p>
<h2>Data migration means integrating data that used to live apart, and is owned by different groups within the company.</h2>
<p>Often, a data migration project is part of a new ERP project to combine a number of legacy systems (Finance, Sales, Manufacturing) into a single integrated application.  Because all of the legacy systems were independent, often the data definitions used are very different, and any mappings or reconciliations exist only at a high level if at all.</p>
<p>So what does this mean?</p>
<ul>
<li>There will be technical challenges:
<ul>
<li>Data will be in strange and wonderful (or at least many) databases from various vendors.</li>
<li>Data will have different codes and structures</li>
<li>Data will be stored at different granularity</li>
<li>Data will be stored in different units</li>
<li>You may have things like time zones, date formats etc. to deal with</li>
</ul>
</li>
<li>There will also be actual definition type challenges
<ul>
<li>Although something is referred to as "X" in more than one system, the definition may be very different.</li>
<li>Different functional groups look at data in fundamentally different ways- engineers vs accountants, sales people vs human resources professionals.</li>
<li>People will be attached professionally, emotionally and politically to their definition and view of things.</li>
</ul>
</li>
</ul>
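<p>The technical side is at least mechanical.  Here is a minimal Python sketch of bringing two sources onto one date format and one unit; the system names, their date formats and the choice of kilograms as the target unit are all assumptions made up for the example:</p>
<pre><code class="language-python">from datetime import datetime

# Hypothetical conventions for two legacy systems, assumed for illustration.
DATE_FORMATS = {"finance": "%d/%m/%Y", "sales": "%m-%d-%Y"}
KG_PER_UNIT = {"kg": 1.0, "lb": 0.45359237}


def normalise(system, raw_date, quantity, unit):
    """Return (ISO date, quantity in kg) for one legacy record."""
    parsed = datetime.strptime(raw_date, DATE_FORMATS[system]).date()
    return parsed.isoformat(), round(quantity * KG_PER_UNIT[unit], 3)


print(normalise("finance", "31/12/2009", 10, "kg"))
print(normalise("sales", "12-31-2009", 100, "lb"))
</code></pre>
<p>Both records come out with the same date and a weight in the same unit.  No lookup table like this will settle what the quantity is supposed to mean, though.</p>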
<p>The technical challenges are significant- but it's the definition challenges that are often the most difficult, because they involve people.  These often represent fundamental changes to the processes within a company, and require coordination across many departments, and the political empires that often exist.  It takes a firm hand, and good executive support. </p>
<p>Where to start? Well, know some of the potential pitfalls, and define realistic goals clearly.</p>
<h2>Just because we call something the same thing doesn't mean it's the same thing.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/master-data-management-challenge-are-we-thinking-the-same-thing.jpg" alt="master-data-management-challenge-are-we-thinking-the-same-thing" title="master-data-management-challenge-are-we-thinking-the-same-thing" width="321" height="214" class="alignright size-full wp-image-3702" />I've seen people go into a meeting, have a one hour discussion regarding definitions, come out with a high level list of the metrics and a real belief that they agree on everything.  They tell you "Your data migration is going to be straightforward, we're pretty much on the same page." </p>
<p>Don't be fooled by this.</p>
<p>After 30 minutes of digging into the details in the two different systems, you will realize that there are lots of things they don't agree on.</p>
<p>Why did they think that they agreed?  Because they spoke the same words, but meant completely different things.  I say, a car has four wheels- you nod.  We're done.  But was it a sports car, a sedan, an SUV?  I say car- I see my car, you say car you see your car.  We nod.  Meeting over. </p>
<p>Doesn't work- you have to dig into the details, and put them on the table.</p>
<h2>Details are important, and everyone hates them. Find the detail people and get them together.</h2>
<p>No one likes the details, but the heart of any master data management work is in those nuts and bolts.  Start to dive into the details and most eyes will glaze over.  Find the people who care because the details affect their daily job.</p>
<p>What you need to do is make a "data definitions" working group made up of people from multiple departments.</p>
<ol>
<li>Get at least one detail oriented team member from each department</li>
<li>Make sure they are respected and experienced- you need people who know their stuff.</li>
<li>Make sure it's clear to all departments that the output of this working group will be the data dictionary used by the new system.</li>
<li>Add in one or two data people- someone who can query all the systems involved, and can both answer the detailed questions from the group and validate concepts as they are created by making prototype extractors and transformations.</li>
<li>If you are missing key expertise, bring it in.  Even if it's your core business, if you think you're not following industry best practices for your definitions, nomenclature or processes, don't just "keep doing it the way we do it".</li>
<li>Take advantage of industry standards for naming and coding.  This can sometimes also be a way to avoid internal battles: we're going to follow the standard, not have a homemade coding system.</li>
</ol>
<p>So now you have to give this working group a goal.</p>
<h2>The solution is not always to have only one definition in the end.</h2>
<p>What?  Isn't that the whole point?  </p>
<p>Yes it is when there really is only one definition, but don't chase after "a single version of the truth" ignoring the reality of what people need, and what metrics are used in the business.  </p>
<p>Just because finance and manufacturing calculate "X" in two different ways does not necessarily mean that you have to stop using one of the definitions.  Both groups might have a perfectly valid reason to calculate it the way they do.  The key is- don't use just one name for it.  It's NOT "X".  </p>
<p>There are two metrics "X1" and "X2" and they have different definitions for different reasons.  Put them both in the dictionary, make it clear what the differences are, and when each measure is used.</p>
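<p>A sketch of how the two entries might sit side by side, in Python.  The field names, formulas and usage notes are placeholders I've made up; a real dictionary would carry whatever attributes your working group agrees on:</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    owner: str        # department responsible for the definition
    definition: str   # plain-language description of the calculation
    used_for: str     # when this version of the metric applies


# Illustrative entries only; both formulas are placeholders.
DICTIONARY = {
    entry.name: entry
    for entry in [
        MetricDefinition("X1", "Finance",
                         "X calculated from invoiced amounts",
                         "month-end financial reporting"),
        MetricDefinition("X2", "Manufacturing",
                         "X calculated from units leaving the plant",
                         "daily production planning"),
    ]
}


def describe(name):
    """Render one dictionary entry as a single readable line."""
    entry = DICTIONARY[name]
    return (f"{entry.name} ({entry.owner}): {entry.definition}; "
            f"used for {entry.used_for}")


for metric_name in sorted(DICTIONARY):
    print(describe(metric_name))
</code></pre>
<p>Nobody loses their metric, and nobody can publish a number called plain "X" any more.</p>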
<h2>It's not a battle. Everyone is on the same team here.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/data-dictionary-troubles-getting-finance-and-sales-to-agree.jpg" alt="data-dictionary-troubles-getting-finance-and-sales-to-agree" title="data-dictionary-troubles-getting-finance-and-sales-to-agree" width="371" height="236" class="alignleft size-full wp-image-3704" />People of all sorts often tend to see things in terms of winning and losing.  </p>
<p>You might find people thinking in terms such as "Finance's definition for "X" has to win.  We know how it is supposed to be calculated, and we are going to set it right."  Avoid framing the debate as a competition- present it from the start in an inclusive way:</p>
<p>"We need to list all the metrics and definitions involved, as used by all departments- and if there are duplicates we'll consolidate, but overall we want to capture them, and ensure they are available. We can all acknowledge that each department has different needs owing to their functional area."</p>
<h2>Create a central repository for the dictionary- make it public from the start.</h2>
<p>Now, if you have the million dollar master data management (MDM) software solution, chances are it has this functionality and more. If you don't, then you should plan to make an internal web site or equivalent means of providing access.</p>
<p>I won't go into the detail of what an entry in this data dictionary actually consists of, partly because this post is already past my usual length limit, but mostly because there is no hard and fast rule.  You can go anywhere from a basic dictionary (that describes what each metric is in English) all the way to a highly sophisticated meta data management tool that interacts directly with your extract, transform and load (ETL) logic, actually changing how numbers are transformed in your migration jobs. (I'm not sure I'd go that far, even with that mythical million dollar tool.)  </p>
<p>The bottom line is, be pragmatic- and be public.</p>
<p>In my experience it has been critical to create a published, work in progress dictionary from the start of the process.  The ideal is to have an online database that allows contributors to edit and comment directly. This is important for a number of reasons:</p>
<ol>
<li>It makes the process transparent- everyone can see what the definitions are going to be.</li>
<li>It makes the scope and complexity of the challenge visible.  By having all the definitions online, everyone can see just how many definitions there are, and how much detail there is.</li>
<li>It tracks versioning and ensures that all the work is captured.  Don't have people with lots of Excel files on their local hard drive holding up your progress.</li>
<li>It lets you publicly assign responsibility for each definition.  <strong>DO NOT have a librarian who is responsible for all definitions</strong>- you want to have a process owner from each department assigned to each definition.  The ideal number is hard to find and will depend on the company: too few and you don't have buy in, too many and it's not manageable.</li>
<li>It allows metrics to be viewed by everyone.  Which department has defined and approved the most definitions?  Who is holding up the effort?  A bit of inter-departmental competitiveness might be useful in this regard.</li>
</ol>
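<p>That last point needs very little tooling.  A minimal Python sketch, assuming each dictionary entry records an owning department and a status (the entries and the status values here are invented):</p>
<pre><code class="language-python">from collections import Counter

# Hypothetical dictionary entries: (metric, owning department, status).
ENTRIES = [
    ("X1", "Finance", "approved"),
    ("X2", "Manufacturing", "approved"),
    ("order_date", "Sales", "draft"),
    ("unit_cost", "Finance", "approved"),
    ("ship_date", "Sales", "draft"),
]


def progress_by_department(entries):
    """Count (approved, outstanding) definitions per owning department."""
    approved = Counter(
        dept for _, dept, status in entries if status == "approved")
    outstanding = Counter(
        dept for _, dept, status in entries if status != "approved")
    departments = sorted(set(approved) | set(outstanding))
    return {dept: (approved[dept], outstanding[dept]) for dept in departments}


for dept, counts in progress_by_department(ENTRIES).items():
    print(f"{dept}: {counts[0]} approved, {counts[1]} outstanding")
</code></pre>
<p>Publish a table like that next to the dictionary and it is obvious to everyone who still has homework.</p>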
<h2>Define the processes needed to maintain and expand  your data dictionary into the future</h2>
<p>Finally, as you capture the definitions you need to perform the data migration, the ideal is to define and kick off the processes and required organisational groups that will continue to manage the dictionary after the project go live.</p>
<p>In master data management circles this is often called data stewardship and it's critical that you have at least a basic formal process.   In the past, when a given chunk of data was only being used by one department, changing how it was defined would be understood by the department, and could be managed somewhat informally.  Now, when the same bit of data might be used across the company since everyone is in the same system, changing that query or how the value is calculated at the request of one department could create all sorts of issues for others.  There must be a process in place to validate the change, and to communicate it so that everyone in the system can be aware when the numbers shift beneath them.</p>
<h2>In summary, it's really really hard.</h2>
<p>The bottom line is, master data management requires a pragmatic, step by step approach, and is never finished.  Manage everyone's expectations, and be very careful as to what you promise your data migration project can achieve in this area. </p>
<p>Very large big bang type projects are probably not a good idea, even if you are armed with the latest and greatest tools.  As <a href="http://en.wikipedia.org/wiki/Master_Data_Management">Wikipedia states</a> in its criticism section:</p>
<blockquote><p>The value and current approaches to MDM have come under criticism due to some parties claiming large costs and low return on investment from major MDM solution providers.</p></blockquote>
<p>Much like massive data warehouse projects, risks are high, particularly if you don't have near fanatical support from the very top.</p>
<p>The best approach is most likely to build step by step, clearly defining what has to be defined for the data migration project at hand, and putting in place realistic processes that can be augmented and enhanced over time as your company begins to "grok" why having a data dictionary is important.</p>
<p>It's a long road, but ignoring master data management in your data migration project adds risk to your project's success, and makes it even harder for future master data efforts to succeed.</p>
<p>Just as a data migration project is an opportunity to affect data quality for the better, it can also be a positive influence on your master data.</p>
<p>Next post- I'm going to wrap up this series with some discussion about how important it is to forge a collaboration between the information technology (IT) department and the various departments within a business, and how IT can play an important role in building bridges through the silos- both data silos and others.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-migration-creating-a-data-dictionary-how-to-tackle-master-data-management/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Making rapid prototypes for data warehouse ETL jobs</title>
		<link>http://www.datamartist.com/making-rapid-prototypes-for-data-warehouse-etl-jobs</link>
		<comments>http://www.datamartist.com/making-rapid-prototypes-for-data-warehouse-etl-jobs#comments</comments>
		<pubDate>Mon, 14 Sep 2009 20:39:00 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data warehouse]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3022</guid>
		<description><![CDATA[Data warehouses and even data marts can be expensive, complex projects. They are not projects to start lightly, and they are not projects that you want to launch without doing some solid planning. But there is a way to get a handle on the tricky parts of your data warehouse scope, and to reduce your [...]]]></description>
			<content:encoded><![CDATA[<p>Data warehouses and even data marts can be expensive, complex projects. They are not projects to start lightly, and they are not projects that you want to launch without doing some solid planning. </p>
<p>But there is a way to get a handle on the tricky parts of your data warehouse scope, and to reduce your project's overall cost.</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/09/ETL-Cost-vs-Other-Data-Warehouse-Cost.jpg" alt="ETL-Cost-vs-Other-Data-Warehouse-Cost" title="ETL-Cost-vs-Other-Data-Warehouse-Cost" width="302" height="236" class="alignright size-full wp-image-3122" />The major cost component of any data warehouse project is the Extract Transform and Load (ETL) development. Obviously every project is slightly different, but in my experience ETL will often make up in the order of 70% of the development cost.  One of the drivers of this cost is the relatively high priced ETL development resources required.  In the markets where I've hired resources, an ETL developer will often demand a 30-40% higher hourly rate than a business intelligence report writer, for example.</p>
<p><strong>Making ETL prototypes will give you insights that can reduce cost </strong> by shortening the ETL development process and making the optimum use of those highly talented and expensive ETL resources.</p>
<h2>What affects the cost and complexity of ETL jobs?</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/09/DW-prototype-not-all-data-in-erp.jpg" alt="DW-prototype-not-all-data-in-erp" title="DW-prototype-not-all-data-in-erp" width="353" height="250" class="alignright size-full wp-image-3072" />For any given scope, the following will have a large impact on the number and complexity of ETL jobs and therefore their cost.</p>
<ol>
<li>The number of different data sources involved.</li>
<li>The consistency in terms of master data definitions between systems.</li>
<li>The level of data quality in the systems.</li>
</ol>
<p>Ideally, you want to get a good handle on these three things before you hire all the ETL developers, and be confident that you are going to satisfy the users' needs before millions of dollars are spent on Extract Transform and Load (ETL) jobs and business intelligence reports.  </p>
<p>One part of the preparation needed to do this can be the creation of a proof of concept or mockup of key parts of the data warehouse ETL deliverable.</p>
<p>Now, there are mockups, there are prototypes, and there are "first versions".  The most effective approach is to create a mockup or prototype that:</p>
<ul>
<li>Goes just deep enough into the data to:
<ul>
<li>Establish all data sources that will be required</li>
<li>Give a high level audit of their master data and data quality</li>
</ul>
</li>
<li>Provides enough output that:
<ul>
<li>End users can be supplied with example reports or cubes to get hands on</li>
<li>The functional scope can be locked down with confidence on all sides.</li>
</ul>
</li>
</ul>
<p>The goal of a data warehouse prototype is to learn about the underlying data, and to be able to try different data transformation techniques and approaches on the real data.  The goal is not to make the finished product, nor to deliver actually usable reports to end users, although it may be to generate an example result for users to validate.</p>
<p>An example might be to create a prototype to calculate total sales by segment for a period under a new customer segmentation.  This would identify if the segmentation rules that have been suggested actually result in the expected segmentation of sales data, and if the fields involved are complete and correctly populated in the source systems.</p>
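<p>As a rough sketch of what such a prototype could look like, here is a few lines of Python run against a small sample.  Everything in it is made up for illustration: the field names, the sample rows and the employee-count thresholds are assumptions, and any scratchpad tool would do the same job.  It totals sales by segment and sets aside the rows where the field the rule depends on is not populated.</p>

```python
from collections import defaultdict

# Hypothetical sample rows, standing in for a snapshot of the source systems.
SAMPLE_SALES = [
    {"customer": "A", "employees": 1200, "amount": 500.0},
    {"customer": "B", "employees": 40, "amount": 120.0},
    {"customer": "C", "employees": None, "amount": 75.0},
    {"customer": "D", "employees": 300, "amount": 210.0},
]


def segment_for(row):
    """Hypothetical segmentation rule based on employee count."""
    employees = row["employees"]
    if employees is None:
        return None  # the field the rule depends on is not populated
    if employees >= 1000:
        return "Enterprise"
    if employees >= 100:
        return "Mid-market"
    return "Small business"


def sales_by_segment(rows):
    """Return (total sales per segment, rows that could not be segmented)."""
    totals = defaultdict(float)
    unsegmented = []
    for row in rows:
        segment = segment_for(row)
        if segment is None:
            unsegmented.append(row)
        else:
            totals[segment] += row["amount"]
    return dict(totals), unsegmented


totals, unsegmented = sales_by_segment(SAMPLE_SALES)
print(totals)
print(len(unsegmented), "row(s) missing the segmentation field")
```

<p>The totals tell you whether the proposed rules split sales the way the business expects, and the list of unsegmented rows is the start of your data quality findings.</p>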
<p>A prototype should focus on the dimensions and data sources that are expected to be the most difficult, and involve multi-source integration.  Don't spend time prototyping the easy stuff.</p>
<p>When you are making a prototype remember it's a one-time development.  Manual steps and doing some "data cleaning by hand" are perfectly reasonable-  it's what you learn from the prototype, not how you learn it that is important.  Take a snapshot, or a sample of the various tables and put them in a sandbox environment where you can manipulate them quickly and easily.</p>
<p>The whole point is to move quickly, get lots of feedback from users, and be able to avoid unpleasant discoveries during the actual data warehouse development.  </p>
<p>If you find a data quality issue, and it's a tough one, then just remove those rows and continue on- remember you don't have to solve all the problems in the prototype- you need to identify them.  Be open with your users about what the exercise is about- and that it is a very rough pass, and a mockup.</p>
<h2>How much could this impact cost? </h2>
<p>If you can identify issues during the prototype then you can solve them before all the ETL development resources are brought onto the project. </p>
<p>If you do not do a prototype, and find a data quality issue that requires some back and forth with the business, every week of delay will probably represent thousands or tens of thousands of dollars, with the project team waiting on the resolution before being able to resume coding the ETL jobs in question.</p>
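<p>To put a rough number on that, here is a back-of-the-envelope calculation.  The team size, hourly rate and hours per week are purely hypothetical assumptions, so plug in your own figures.</p>

```python
def weekly_delay_cost(idle_developers, hourly_rate, hours_per_week=40):
    """Cost of one week in which the given developers are blocked.

    All inputs are assumptions supplied by the caller; nothing here
    comes from a real project.
    """
    return idle_developers * hourly_rate * hours_per_week


# Hypothetical example: 4 ETL developers at 100 dollars an hour.
print(weekly_delay_cost(idle_developers=4, hourly_rate=100))
```

<p>With these assumed inputs a single blocked week costs 16,000 dollars, before counting any knock-on delay to the rest of the project.</p>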
<p>So in summary, making prototypes will:</p>
<ul>
<li>Reduce the risk of scope creep because users have actually seen and "touched" a mockup of the final output.</li>
<li>Reduce the amount of rework in ETL code because different data transformation approaches can be tested early.</li>
<li>Reduce the risk of the expensive ETL development phase of the project slipping due to unknown data quality issues.</li>
</ul>
<h2>The right tool for ETL Prototypes.</h2>
<p>Often prototypes are built in a combination of Excel, MS Access or other databases. These tools can work, but Excel has serious issues handling larger data sets, and database development is often cumbersome-  the idea is to make a prototype, not actually build the SQL code.  Things like different data types, field formats, column naming rules etc. between different source databases often frustrate attempts to do something quickly.</p>
<p>Obviously another option is the enterprise ETL tools themselves- but the cost, complexity and overhead of these tools again makes them better suited to the production system- not a quick mockup or rapid prototype.</p>
<p>What you need to make an ETL prototype is an easy to use ETL tool that provides the basic type of functionality and graphical user interface of high end ETL tools, but also allows a more flexible treatment of data types, all with the ability to pull data from multiple sources, including more informal sources like Excel spreadsheets.</p>
<p>The <a href="/">Datamartist tool</a> was created to provide exactly such a data scratchpad, ideal for rapid prototyping data transformations. It lets you profile your data and build data transformations using a visual, block and connector interface.  But it represents a clear, focused and easy to use ETL tool, without all the feature bloat, cost and server configuration required by many expensive enterprise ETL solutions.  </p>
<p><a href="/downloads">Download the free trial</a>, and see for yourself.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/making-rapid-prototypes-for-data-warehouse-etl-jobs/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>6 Tips for making a business intelligence project budget</title>
		<link>http://www.datamartist.com/6-tips-for-making-a-business-intelligence-project-budget</link>
		<comments>http://www.datamartist.com/6-tips-for-making-a-business-intelligence-project-budget#comments</comments>
		<pubDate>Mon, 15 Jun 2009 14:17:04 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data warehouse]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=2389</guid>
		<description><![CDATA[Sometimes it seems like going over budget on a data warehouse project is an unwritten rule. But very often, there are some simple ways to help avoid making a budget that is doomed to be overrun. By budgeting correctly at the start, and managing cost, the project manager can deliver the benefits that the business [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2009/06/data-warehouse-budget-overruns-waiting-for-reports.jpg" alt="data-warehouse-budget-overruns-waiting-for-reports" title="data-warehouse-budget-overruns-waiting-for-reports" width="305" height="222" class="alignright size-full wp-image-2394" />Sometimes it seems like going over budget on a data warehouse project is an unwritten rule.  But very often, there are some simple ways to help avoid making a budget that is doomed to be overrun. By budgeting correctly at the start, and managing cost, the project manager can deliver the benefits that the business was expecting, in the time frame and budget envelope agreed to.  Here are a few tips that I've found through experience to help:</p>
<p>1) <strong> Establish scope clearly before you establish budget.</strong> This sounds obvious but it's amazing how often I've seen this go wrong. There is no way to know what something will cost if you don't know what it is. If you leave the scope too broad, user expectations will drive feature and scope creep that will wipe out your original budget. There is only one way to have a clear scope- write it down. And the best kind of scope document is one that is signed by all the various players. Make sure everyone understands what is being delivered, when, and at what cost. If there are gray zones, at least flag them as potential areas where additional costs might arise.</p>
<p>2) <strong>Take a look at the data and talk to people who work with it.</strong>  It seems obvious to say that what's in the data matters. (Or more accurately what's not in the data). But it's amazing how many budgets are made without doing any serious analysis of the data that is actually in the source systems.   Don't look at the data model, see a field called "customer birthday" and design functionality around it.  Problems can range from missing data, to mandatory fields that are filled with garbage because "otherwise the system won't let us put the order through" to differences in interpretation of definitions between groups within the company.  For example, if all the Asian sales offices classify customers into the same segments, but have slightly different rules, then you will need to "reallocate" this segmentation- even though it is the same field, and the same codes.  That reallocation is an ETL job you didn't budget for, unless you found it in a pre-budget data audit.  Often the key here is not to launch a massive data audit, but to find the people who have been trying to make the global reports in spreadsheets-  they've run up against these sorts of issues, and can probably even offer some solutions. (That well-respected analyst in head office who has already painstakingly established a cross mapping for the customer segments working with his colleagues in Asia, for example).  These same folks are also going to be key in terms of adoption of the final solution, since they are often the current source of data for the underground data system the new data warehouse is supposed to be improving on.  By involving the key people, you gain credibility, save time, and ensure that the final solution addresses business needs.</p>
<p>3) <strong> Do proofs of concept for the tricky bits.</strong> In many projects there are areas where something is being tried for the first time (certainly in the more interesting ones)- not surprisingly this is often where the issues arise. One way to help quantify how much effort will really be required by these areas is to do some quick and dirty proofs of concept to validate the basic technical and/or functional aspects of the component or system. Often, if it is early in the project and software and hardware selection has not yet been done, your vendors will be willing to assist with a proof of concept (or even do it themselves) as part of the evaluation process. You can learn important things in this stage. For example, if you are doing a reporting project that needs to deliver 300 reports, by doing a proof of concept of 3-4 reports (even if they are not much more than mock-ups) you can at least get a first estimate of how much development effort is involved, how easy the tool is, and how the software performs. I've done proofs of concept that doubled my estimates- because when we actually sat down and used the tool, we realized there was additional data cleanup and hardware required to make it work. Better to know that before you set your budget rather than after.  The ideal proof of concept is to actually build a "wire frame" version of the key data marts with real data and let the users try it out.  Often, it's possible to do this quickly, particularly with <a href="/product" target="_blank">a tool that lets you do rapid prototypes of ETL transformations.</a></p>
<p>4) <strong> Involve the project team in the budgeting process.</strong> As a project manager, the responsibility for the budget is ultimately yours. However, by getting input from the experts in the various fields you can greatly improve its accuracy. Don't guess how long it will take to configure the server- go to the infrastructure team and find out how long it took the last three times they did a similar install and configuration. Be aware of the differences, but for many items, by talking to the people who have been there and done that, you will get good estimates of the true cost/duration of your project line items.</p>
<p>5) <strong> Watch out for Infrastructure and User centric costs.</strong> The following costs are often overlooked, and end up being part of the cost overrun in the end. Don't get caught by these classic end of project costs:</p>
<ul>
<li>Infrastructure costs- We start to take our IT infrastructure for granted, and assume it will accommodate the new system without modifications- but is the network fast enough? Is there enough data storage? Just assuming that the existing server capacity, storage and bandwidth are sufficient may hide significant costs that the project will need to take on just to make the system operational. I've also seen cases where three projects that were launched at the same time all checked that the storage was available- but of course each project did not take into account the other two. The last project to go live ended up having to buy more hard drive capacity- and went over budget.</li>
<li>Training for end users. There are few systems that don't require some amount of training for end users. Not taking this into account will provide a rude surprise at the end of the project. Costs here include actual training by third parties, but also travel expenses for trainers or trainees if they are not all based in the same location. Web based training can be a cost effective alternative, but results vary- and it's difficult to ensure that "attendees" are really attending.</li>
<li>Transitional support costs. If the project has a wide scope and involves a large number of users, be aware that there may be an initial spike in help desk calls and PC support as the system goes live. Depending on how your help desk is structured, you might end up paying more to your outsourcing company, or need to hire some temporary employees to help handle the extra calls for a few months. </li>
</ul>
<p>6) <strong> Have a contingency amount included in your budget and do everything you can to keep it till the end.</strong> There are two big mistakes commonly made regarding budget contingencies- first, not having one at all, and second, using it early in the project. Contingencies are often unpopular, first because by increasing the amount that needs to be approved, they might make it harder to get the green light for the project, and secondly because some view them as a "fudge" or a lack of willingness to do a proper cost estimate. However, ideally a contingency is a realistic number that should be based on the risk inherent in the project. Very small, simple projects might only need a 5% contingency. Large, complex projects that involve multiple departments, hundreds or thousands of users and multiple software and hardware vendors need higher contingencies. There are simply more things that can go wrong, and it's not realistic to expect that cost analysis can be accurate enough to foresee everything. The key is not to consume your entire contingency with the first scope change- I've seen it happen again and again. Contingency is supposed to be for those unforeseen things- for example, a technical problem with the interaction between two vendors' packages requires custom development to provide the original functionality envisioned, or requires more hardware than expected.</p>
<p>By spending the time required up front, a realistic, practical budget can be created- and you can get your project started off on the right foot.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/6-tips-for-making-a-business-intelligence-project-budget/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

