<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Meta Data</title>
	<atom:link href="http://www.datamartist.com/category/meta-data/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Thu, 09 Feb 2012 20:00:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Let&#8217;s admit it- centralized business intelligence alone just doesn&#8217;t work</title>
		<link>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work</link>
		<comments>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work#comments</comments>
		<pubDate>Wed, 03 Mar 2010 21:10:27 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Meta Data]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>
		<category><![CDATA[Business Intelligence Workspace]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4342</guid>
		<description><![CDATA[One version of the truth. Data warehouses. Centralized business intelligence teams. This has been the best practice for business intelligence for the last two decades. Users taking the initiative with data has been seen as the enemy of a successful business intelligence program. This needs to change. In a world of ever increasing data volumes [...]]]></description>
			<content:encoded><![CDATA[<p>One version of the truth.  Data warehouses.  Centralized business intelligence teams.  This has been the best practice for business intelligence for the last two decades.  </p>
<p>Users taking the initiative with data has been seen as the enemy of a successful business intelligence program.  </p>
<p>This needs to change.  In a world of ever increasing data volumes and complexity, faster business processes and more data savvy knowledge workers, a purely centralized solution is doomed to fail.</p>
<p>A consensus is starting to form that the best architecture is one that blends centralized with more distributed and (gasp) free form, user guided methods.  In fact, when we look at what actually exists in most enterprises and take into account the unofficial shadow systems, we're already there, but in two separate camps that aren't talking. </p>
<p>The amount of freedom to allow ranges from letting the users have at it, to opening up the possibility of <a href="http://tdwi.org/blogs/wayneeckerson/2010/02/zen-bi-and-the-wisdom-of-letting--go.aspx" target="_blank">departmental data marts</a>, but the buzz out of TDWI clearly indicates a growing acknowledgement that a rigid top down architecture is not tenable.</p>
<p>What are Oracle, IBM, Microsoft, SAP and SAS (who together hold more than 70% of the business intelligence market) advising as being the right approach?</p>
<p>They advocate big architectures, centralized meta data management, big databases, lots of command and control. They talk about "self serve"- but they mean to existing reports or report interfaces. To be fair, they need to sell the tools they have.</p>
<p>For a refreshing change from this, I very much enjoyed reading <a href="http://events.tdwi.org/Events/Las-Vegas-World-Conference-2010/Sessions/Thursday/Keynote-Stop-Paving-the-Cowpath.aspx" target="_blank">Mark Madsen's keynote at TDWI</a>, "Stop paving the cow path".  </p>
<p>We enjoy reading things that we agree with, and I nodded my way through his slide deck.</p>
<p>In his presentation, Madsen points out that centralization won't work, because it:</p>
<ul>
<li>Creates bottlenecks</li>
<li>Causes scale problems</li>
<li>Enforces a single model</li>
</ul>
<h2>Bottlenecks and Scale</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-super-popular-or-big-backlog.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-super-popular-or-big-backlog.jpg" alt="" title="data-warehouse-super-popular-or-big-backlog" width="377" height="275" class="alignright size-full wp-image-4363" /></a>In a centralized system, all requests go into the queue, and the backlog starts piling up. </p>
<p>The size of the department/team that is responsible for making it all work becomes the number one bottleneck. </p>
<p>Are there enough people able to prioritize and analyse the payback on analysis requests? Because in a centralized organisation, the gatekeepers are necessary, and how do they KNOW which requests are the good ones?  How does anyone really know?</p>
<p>I'm not sure any company can afford to staff a centralized data warehouse team to be able to handle all the requests as they are generated. Prioritization therefore becomes a single point of failure.  Get it wrong, and it can be all wrong.  In a more distributed structure, decisions are made at multiple points, some good, some bad, but diversity will often bring more innovative and experimental behavior, resulting in new avenues of analysis that an overly static central team might avoid.</p>
<p>For an indication as to how well users think the central team is listening to them, take a look at how many Excel spreadsheets there are around, and how many shadow systems grow like mushrooms throughout the standard enterprise.  People think their analysis is important, and even if IT won't or can't help, they find a way to try to get it done.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-not-used-convert-storage-for-spreadsheets.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-not-used-convert-storage-for-spreadsheets.jpg" alt="" title="data-warehouse-not-used-convert-storage-for-spreadsheets" width="373" height="271" class="alignleft size-full wp-image-4364" /></a>In terms of scaling, I can hear the technical types starting to explain about how their servers, infrastructure and approach scales- diagrams and MPP theories pulled out with pride.  "Centralizing lets it be scalable- what are you talking about?"</p>
<p>Maybe. But there are traps here too- centralized organisations always want to put everything in one database.  Having everything in a single repository starts to become the goal- not the cost efficient analysis of the right data.  Not centralizing is very scalable- stand alone machines can just be added forever.</p>
<p>It may in fact be that data can remain distributed and diverse at certain levels of detail, and more federated approaches can be used, resulting in cheaper hardware and software, and more importantly avoiding a lot of really hard master data management work.  Consolidation can sometimes happen at summary levels that make sense from a business point of view- not just blindly following the "one version" mantra.</p>
<h2>Enforcing a single model</h2>
<p>Isn't having a single data model good?  We've been told that it is.  In a way, this is the holy grail.  </p>
<p>But is there a single, correct, slowly changing model that satisfies everyone in an organisation?  </p>
<p>Why do I say slowly changing?  Because if there is only one for the entire enterprise, it will change slowly, if at all.  </p>
<p>Even if you happen to understand what the right model is (and by model I mean data model, analysis model, process model, any model), and you manage to implement it while it's still the right model, in a year it's not going to be the right one.  And a centralized, high cost, committed architecture won't and can't adapt.  You'll still be paying the mortgage on the data warehouse.</p>
<p>Very large centralized models cannot be comprehensive and up to date, because to be comprehensive they have to be so complex as to be difficult to change, and as a result they quickly become out of date.  It's sort of a Heisenberg uncertainty principle for common meta data repositories.</p>
<h2>"Giving people their flying cars"</h2>
<p>Madsen of course doesn't solve the entire problem in his keynote, but he points out some directions that make sense.  And his graphic depicting a happy couple blasting off in their very locally controlled flying car sends the message- users can do their analysis without central oversight or interaction. (Although, one would imagine that some sort of air traffic control would be necessary, and the refueling stations for the cars would probably be run centrally- we're not advocating anarchy here.)</p>
<p>Having built data warehouses, established a data warehouse competency center, and provided business intelligence services for thousands of users, I can testify from first hand experience that centralizing alone is just not going to work.  People who worked with me a decade ago will remember the significant amount of time spent creating meta data repositories.  Are they still needed?  Yes.  But they simply can't do everything.  Use them with care, and be wary of your ambition for them.</p>
<p>First, accept the fact that users are not mindless consumers.  Learn from the fact that they use Excel constantly, and they don't just read reports- they build things, adding data, fixing data, re-organizing data.  They think.  Give them tools that include them as part of the data processing.</p>
<p>Business intelligence cannot be solely a process where formal requirements are gathered, followed by a publishing exercise of delivering the reports on time.</p>
<p>Are there some reports where this is the case? Sure.  Monthly management reports and dashboards shouldn't change every month.  The model can work for some amount of the delivered data analysis.  </p>
<p>The entire architecture isn't getting ripped out- but if the new architecture is successful in bringing the pent up demand that is currently being satisfied by shadow systems into the light, then distributed, user centric, user driven business intelligence will become a significant percentage of the total.</p>
<p>But the old way of thinking has to change.  Don't "Crack down on shadow systems".  </p>
<p>Find a way to provide better service, be it self, assisted or centralized service that makes the shadow systems simply a less effective way to do it.</p>
<p>The existence of shadow systems, and the extent of them, is the clearest argument that centralized business intelligence alone is simply not up to the task.</p>
<p>Once you have people doing whatever they want in the self directed part of your architecture, DO watch what they are doing- not to control it, but to learn from it.  Everyone constantly re-structuring the customer dimension?  Obviously it's time for an update.  By watching what users edit, what gaps they fill in, you can find the data quality issues, identify the fuel to put on the self directed fire.</p>
<p>Tools like Lyzasoft, <a href="/">our own Datamartist tool</a>, and Microsoft's Power Pivot in Excel 2010 and others are all going to drive power to the users, and introduce a new balanced approach between centralized and local parts of business intelligence architectures.  Visualization tools like Tableau will further give people the ability to create powerful, consumable analysis in a self serve mode.</p>
<p>Will there be challenges with data quality, risk management and wasted time doing pointless analysis? Most likely.  </p>
<p>Will the information we gather and the payoff from the successful bottom up analysis efforts make it hugely valuable overall? I for one think so.</p>
<p>We need to learn to trust our colleagues with the data, while at the same time managing the reality of data quality and risk of errors that more free form techniques can create.</p>
<p>Companies that include both top down and bottom up capabilities in their architecture will stop wasting time fighting internally, and start to take advantage of all that data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data migration Part 4- Creating a data dictionary how to tackle master data management</title>
		<link>http://www.datamartist.com/data-migration-creating-a-data-dictionary-how-to-tackle-master-data-management</link>
		<comments>http://www.datamartist.com/data-migration-creating-a-data-dictionary-how-to-tackle-master-data-management#comments</comments>
		<pubDate>Thu, 17 Dec 2009 20:52:39 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Meta Data]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Meta]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3696</guid>
		<description><![CDATA[Migrating data is complicated. It's particularly hard because of course it's not just a physical move. Data definitions are different from the legacy to the new systems. To get this right you need to manage these data definitions. In this post, I'm going to discuss some things to keep in mind during this process. As [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/data-migration-as-long-as-nothing-changes-we-are-ok-to-go.jpg" alt="data-migration-as-long-as-nothing-changes-we-are-ok-to-go" title="data-migration-as-long-as-nothing-changes-we-are-ok-to-go" width="373" height="269" class="alignright size-full wp-image-3710" />Migrating data is complicated.  It's particularly hard because of course it's not just a physical move. Data definitions are different from the legacy to the new systems. To get this right you need to manage these data definitions.</p>
<p>In this post, I'm going to discuss some things to keep in mind during this process.  As with the other posts in this series, I'm not going to be talking about specific tools or getting into super technical discussions.<br />
I'm going to assume that you do not have millions of dollars to spend on the state of the art master data management software and its configuration.  Instead, I'm going to present the high level concepts, and focus on some of the change management aspects.  How can you focus on what's important, and avoid having master data derail your data migration project?</p>
<p>This post is number four in a series.  If you are a linear type, and want to read all the posts in order, <a href="/data-migration-part-1-introduction-to-the-data-migration-delema">part 1 is here</a>, <a href="/data-migration-part-2-determining-data-quality-is-the-first-key-step">then two</a>, and <a href="/data-migration-part-3-mapping-the-legacy-systems-meta-data-and-application-mapping">three</a>.</p>
<p>There is a recurring theme here.  In part two I discussed how a data migration project is also a data quality project, because existing data quality issues often must be resolved.  In the same way:</p>
<h2>A data migration project is also a master data management project.</h2>
<p>Since often the legacy systems were department based, the concept of formal data definitions, and the processes needed to manage them across functional boundaries simply don't exist.  But if your data migration project is moving master data from multiple legacy systems into a single new ERP application, you are going to need at least a basic set of processes to manage it going forward.</p>
<p>Welcome to change management, cross functional teams and data governance committees.</p>
<h2>Data migration means integrating data that used to live apart, and is owned by different groups within the company.</h2>
<p>Often, a data migration project is part of a new ERP project to combine a number of legacy systems (Finance, Sales, Manufacturing) into a single integrated application.  Because all of the legacy systems were independent, often the data definitions used are very different, and any mappings or reconciliations exist only at a high level if at all.</p>
<p>So what does this mean?</p>
<ul>
<li>There will be technical challenges:
<ul>
<li>Data will be in strange and wonderful (or at least many) databases from various vendors.</li>
<li>Data will have different codes and structures</li>
<li>Data will be stored at different granularity</li>
<li>Data will be stored in different units</li>
<li>You may have things like time zones, date formats etc. to deal with</li>
</ul>
</li>
<li>There will also be actual definition type challenges
<ul>
<li>Although something is referred to as "X" in more than one system, the definition may be very different.</li>
<li>Different functional groups look at data in fundamentally different ways- engineers vs accountants, sales people vs human resources professionals.</li>
<li>People will be attached professionally, emotionally and politically to their definition and view of things.</li>
</ul>
</li>
</ul>
<p>The technical challenges are significant- but it's the definition challenges that are often the most difficult, because they involve people.  These often represent fundamental changes to the processes within a company, and require coordination across many departments, and the political empires that often exist.  It takes a firm hand, and good executive support. </p>
<p>Where to start? Well, know some of the potential pitfalls, and define realistic goals clearly.</p>
<h2>Just because we call something the same thing doesn't mean it's the same thing.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/master-data-management-challenge-are-we-thinking-the-same-thing.jpg" alt="master-data-management-challenge-are-we-thinking-the-same-thing" title="master-data-management-challenge-are-we-thinking-the-same-thing" width="321" height="214" class="alignright size-full wp-image-3702" />I've seen people go into a meeting, have a one hour discussion regarding definitions, come out with a high level list of the metrics and a real belief that they agree on everything.  They tell you "Your data migration is going to be straightforward, we're pretty much on the same page." </p>
<p>Don't be fooled by this.</p>
<p>After 30 minutes of digging into the details in the two different systems, you will realize that there are lots of things they don't agree on.</p>
<p>Why did they think that they agreed?  Because they spoke the same words, but meant completely different things.  I say, a car has four wheels- you nod.  We're done.  But was it a sports car, a sedan, an SUV?  I say car- I see my car, you say car you see your car.  We nod.  Meeting over. </p>
<p>Doesn't work- you have to dig into the details, and put them on the table.</p>
<h2>Details are important, and everyone hates them. Find the detail people and get them together.</h2>
<p>No one likes the details, but the heart of any master data management work is in those nuts and bolts.  Start to dive into the details and most eyes will glaze over.  Find the people who care because the details affect their daily job.</p>
<p>What you need to do is make a "data definitions" working group made up of people from multiple departments.</p>
<ol>
<li>Get at least one detail oriented team member from each department</li>
<li>Make sure they are respected and experienced- you need people who know their stuff.</li>
<li>Make sure it's clear to all departments that the output of this working group will be the data dictionary used by the new system.</li>
<li>Add in one or two data people- someone who can query all the systems involved, and can both answer the detailed questions from the group and validate concepts as they are created by making prototype extractors and transformations.</li>
<li>If you are missing key expertise, bring it in.  Even if it's your core business, if you think you're not following industry best practices for your definitions, nomenclature or processes, don't just "keep doing it the way we do it".</li>
<li>Take advantage of industry standards for naming and coding.  This can sometimes also be a method for avoiding internal battles- we're going to follow the standard, not have a homemade coding system.</li>
</ol>
<p>So now you have to give this working group a goal.</p>
<h2>The solution is not always to have only one definition in the end.</h2>
<p>What?  Isn't that the whole point?  </p>
<p>Yes it is when there really is only one definition, but don't chase after "a single version of the truth" ignoring the reality of what people need, and what metrics are used in the business.  </p>
<p>Just because finance and manufacturing calculate "X" in two different ways does not necessarily mean that you have to stop using one of the definitions.  Both groups might have a perfectly valid reason to calculate it the way they do.  The key is- don't use just one name for it.  It's NOT "X".  </p>
<p>There are two metrics "X1" and "X2" and they have different definitions for different reasons.  Put them both in the dictionary, make it clear what the differences are, and when each measure is used.</p>
<h2>It's not a battle. Everyone is on the same team here.</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/data-dictionary-troubles-getting-finance-and-sales-to-agree.jpg" alt="data-dictionary-troubles-getting-finance-and-sales-to-agree" title="data-dictionary-troubles-getting-finance-and-sales-to-agree" width="371" height="236" class="alignleft size-full wp-image-3704" />People of all sorts often tend to see things in terms of winning and losing.  </p>
<p>You might find people thinking in terms such as "Finance's definition for 'X' has to win.  We know how it is supposed to be calculated, and we are going to set it right."  Avoid framing the debate as a competition- present it from the start in an inclusive way:</p>
<p>"We need to list all the metrics and definitions involved, as used by all departments- and if there are duplicates we'll consolidate, but overall we want to capture them, and ensure they are available. We can all acknowledge that each department has different needs owing to their functional area."</p>
<h2>Create a central repository for the dictionary- make it public from the start.</h2>
<p>Now, if you have the million dollar master data management (MDM) software solution, chances are it has this functionality and more. If you don't, then you should plan to make an internal web site or equivalent means of providing access.</p>
<p>I won't go into the detail of what an entry in this data dictionary actually consists of, partly because this post is already past my usual length limit, but mostly because there is no hard and fast rule.  You can go anywhere from a basic dictionary (that describes what each metric is in English) all the way to a highly sophisticated meta data management tool that interacts directly with your extract, transform and load (ETL) logic, actually changing how numbers are transformed in your migration jobs. (I'm not sure I'd go that far, even with that mythical million dollar tool.)  </p>
<p>The bottom line is, be pragmatic- and be public.</p>
<p>In my experience it has been critical to create a published, work in progress dictionary from the start of the process.  The ideal is to have an online database that allows contributors to edit and comment directly. This is important for a number of reasons:</p>
<ol>
<li>It makes the process transparent- everyone can see what the definitions are going to be.</li>
<li>It makes the scope and complexity of the challenge visible.  By having all the definitions online, everyone can see just how many definitions there are, and how much detail there is.</li>
<li>It tracks versioning and ensures that all the work is captured.  Don't have people with lots of Excel files on their local hard drive holding up your progress.</li>
<li>It lets you publicly assign responsibility for each definition.  <strong>DO NOT have a librarian who is responsible for all definitions </strong>- you want to have a process owner from each department assigned to each definition.  The ideal number of owners is hard to find and will depend on the company: too few and you don't have buy in, too many and it's not manageable.</li>
<li>It allows metrics to be viewed by everyone.  Which department has defined and approved the most definitions?  Who is holding up the effort?  A bit of inter-departmental competitiveness might be useful in this regard.</li>
</ol>
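<p>To make the versioning, ownership and "who is holding up the effort" points a little more concrete, here is a minimal sketch of what one entry in such a dictionary might look like.  It is an illustration only, not a recommended schema: the class names, field names, owners and dates are all made up for the example, and a real dictionary would live in a shared online database rather than in a script.</p>

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Revision:
    """One dated change to a definition, kept so history is never lost."""
    changed_on: date
    changed_by: str
    text: str


@dataclass
class DictionaryEntry:
    """A single metric definition with a named departmental owner (hypothetical layout)."""
    name: str          # e.g. "X1", never a bare "X" shared by two meanings
    department: str    # the functional area that uses this definition
    owner: str         # process owner responsible for this entry
    approved: bool = False
    revisions: list = field(default_factory=list)

    def revise(self, changed_on, changed_by, text):
        # Append rather than overwrite, so earlier wording stays visible.
        self.revisions.append(Revision(changed_on, changed_by, text))

    @property
    def current_text(self):
        return self.revisions[-1].text if self.revisions else ""


def approvals_by_department(entries):
    """Count approved entries per department for a public progress view."""
    counts = {}
    for entry in entries:
        counts.setdefault(entry.department, 0)
        if entry.approved:
            counts[entry.department] += 1
    return counts


# Illustrative data only: two definitions of "X" kept side by side.
x1 = DictionaryEntry("X1", "Finance", "finance process owner")
x1.revise(date(2009, 12, 1), "finance process owner", "X as finance calculates it")
x1.approved = True

x2 = DictionaryEntry("X2", "Manufacturing", "manufacturing process owner")
x2.revise(date(2009, 12, 2), "manufacturing process owner", "X as manufacturing calculates it")

print(approvals_by_department([x1, x2]))
```

<p>The point of the sketch is the shape, not the code: every definition has exactly one named owner, every change is appended rather than overwritten, and progress per department can be read straight off the entries.</p>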
<h2>Define the processes needed to maintain and expand  your data dictionary into the future</h2>
<p>Finally, as you capture the definitions you need to perform the data migration, the ideal is to define and kick off the processes and required organisational groups that will continue to manage the dictionary after the project goes live.</p>
<p>In master data management circles this is often called data stewardship, and it's critical that you have at least a basic formal process.   In the past, when a given chunk of data was only being used by one department, changing how it was defined would be understood by the department, and could be managed somewhat informally.  Now, when the same bit of data might be used across the company since everyone is in the same system, changing that query or how the value is calculated at the request of one department could create all sorts of issues for others.  There must be a process in place to validate the change, and to communicate it so that everyone in the system can be aware when the numbers shift beneath them.</p>
<h2>In summary, it's really really hard.</h2>
<p>The bottom line is, master data management requires a pragmatic, step by step approach, and is never finished.  Manage everyone's expectations, and be very careful as to what you promise your data migration project can achieve in this area. </p>
<p>Very large big bang type projects are probably not a good idea, even if you are armed with the latest and greatest tools.  As <a href="http://en.wikipedia.org/wiki/Master_Data_Management">Wikipedia states</a> in its criticism section:</p>
<blockquote><p>The value and current approaches to MDM have come under criticism due to some parties claiming large costs and low return on investment from major MDM solution providers.</p></blockquote>
<p>Much like massive data warehouse projects, risks are high, particularly if you don't have near fanatical support from the very top.</p>
<p>The best approach is most likely to build step by step, clearly defining what has to be defined for the data migration project at hand, and putting in place realistic processes that can be augmented and enhanced over time as your company begins to "grok" why having a data dictionary is important.</p>
<p>It's a long road, but ignoring master data management in your data migration project adds risk to your project's success, and makes it even harder for future master data efforts to succeed.</p>
<p>Just as a data migration project is an opportunity to affect data quality for the better, it can also be a positive influence on your master data.</p>
<p>Next post- I'm going to wrap up this series with some discussion about how important it is to forge a collaboration between the information technology (IT) department and the various departments within a business, and how IT can play an important role in building bridges through the silos- both data silos and others.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-migration-creating-a-data-dictionary-how-to-tackle-master-data-management/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Data migration Part 3- Mapping the legacy systems</title>
		<link>http://www.datamartist.com/data-migration-part-3-mapping-the-legacy-systems-meta-data-and-application-mapping</link>
		<comments>http://www.datamartist.com/data-migration-part-3-mapping-the-legacy-systems-meta-data-and-application-mapping#comments</comments>
		<pubDate>Mon, 14 Dec 2009 18:17:07 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Meta Data]]></category>
		<category><![CDATA[Meta]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3640</guid>
		<description><![CDATA[This is part three of an ongoing series that's taking a look at data migration projects. In this part we're going to talk about how important it is to know where you are starting from, before you head off on a new application journey. Understanding and mapping your legacy systems is a key success factor [...]]]></description>
			<content:encoded><![CDATA[<p>This is part three of an ongoing series that's taking a look at data migration projects. In this part we're going to talk about how important it is to know where you are starting from, before you head off on a new application journey.  <strong>Understanding and mapping your legacy systems is a key success factor for a data migration project</strong>, but can be a very difficult and time consuming battle.  In this post, I'll talk a bit about some approaches I've found useful in my experience.</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/12/we-might-have-some-undocumented-interfaces-to-consider1.jpg" alt="we-might-have-some-undocumented-interfaces-to-consider" title="we-might-have-some-undocumented-interfaces-to-consider" width="374" height="275" class="alignright size-full wp-image-3656" />If you like, you can start with Part <a href="/data-migration-part-1-introduction-to-the-data-migration-delema">one</a> which was a light hearted introduction to data migration projects in general, and part <a href="/data-migration-part-2-determining-data-quality-is-the-first-key-step">two</a>, where we talked about the importance of data quality.</p>
<blockquote><p>Why are we spending so much time on this? That's the OLD system- we need to focus on the future!</p></blockquote>
<p>Here are just some of the important things the legacy mapping needs to clarify:</p>
<ol>
<li><strong>Data location.</strong> You can't migrate data if you don't know what it is and where it is.</li>
<li><strong>Data dependencies to other systems.</strong> All processes and interfaces that rely on the legacy systems need to be either replaced or shut off.  Often this means that even if the new system is not involved, other systems may stop working because they get data from the legacy systems.  The data migration project is not just about turning on the new system.  The consequences of turning off the old system have to be known and managed.</li>
<li><strong>Legal requirements to keep legacy data available.</strong> Even if data is not migrated to the new system there may be additional data migration requirements into data warehouses or documents that have nothing to do with the new application.</li>
<li><strong>Infrastructure dependencies.</strong> The actual infrastructure that the legacy systems are on might perform other tasks that, although not directly related to the legacy system, will cause issues when that infrastructure is removed. (For example, someone installed a service of some sort on one of the servers that is used by other applications that are completely unrelated from a data point of view.)</li>
</ol>
<h2>Often the first time the Legacy system is documented is just before it's shut down.</h2>
<p>Despite our best intentions, sometimes documentation doesn't get updated.  This is the reality for many systems, and particularly for legacy systems.  </p>
<p>One of the first steps in a data migration project is to gather all the existing documentation for the legacy systems, and all the systems they talk to, and make sure it's accessible to the data migration project team.</p>
<p>It is critical to have tight control over these documents, and to ensure that everyone works off a "live" version- because your mapping is going to update that documentation, and every developer, data modeler and application team member needs to know that they have the best and latest version.</p>
<h2>The application interface diagram.</h2>
<p>Now, the ideal situation is to have a dynamic, self correcting, scanning Configuration Management Database tool (CMDB tool) that already has every scrap of meta data about every application and all its interfaces ready to go. </p>
<p>If you have one of these, good for you, and you can stop reading.</p>
<p>For the rest of us, let's talk practical methods of mapping what we have.</p>
<h3>How to get the data.</h3>
<ol>
<li>Scan the environment- catch the interfaces in the act.
<ul>
<li>Monitor network traffic to detect exchanges between applications.</li>
<li>Scan file systems to find interface files and determine frequency.</li>
<li>Catalog services and activity of those services on servers.</li>
</ul>
</li>
<li>Get out there and talk to people.
<ul>
<li>Ask people-  where is data from this system used?</li>
<li>Look at management reports and trace backwards to find where information is pulled.</li>
<li>Don't assume the interface is direct.  The longest chain I have found is six hops from source to the Excel sheet used by the CEO, with the information passing through two of the same systems twice.</li>
<li>Hunt down people that were involved in the original installation. Often they'll have key information that can save you time.</li>
</ul>
</li>
<li>Any other way that works.</li>
</ol>
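<p>As a rough illustration of the "scan file systems" idea, here is a minimal Python sketch. It assumes interface files land in ordinary folders and that their modification times are a fair proxy for when they were delivered, which will not be true everywhere. The function name <code>estimate_drop_frequency</code> and the grouping by folder and file extension are my own assumptions for the example, not part of any tool.</p>

```python
import os
import statistics
from collections import defaultdict
from pathlib import Path


def estimate_drop_frequency(root):
    """Estimate how often files arrive in each folder under root.

    Files are grouped by (folder, extension). For each group the result
    holds the median number of seconds between consecutive modification
    times. Groups with only one file are skipped, because a single file
    gives no interval to measure.
    """
    mtimes = defaultdict(list)
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = Path(folder) / name
            mtimes[(folder, path.suffix.lower())].append(path.stat().st_mtime)

    frequency = {}
    for key, stamps in mtimes.items():
        if len(stamps) > 1:
            stamps.sort()
            gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
            frequency[key] = statistics.median(gaps)
    return frequency
```

<p>A median gap of roughly 86400 seconds for a group suggests a daily feed worth asking someone about. Treat the result as a list of leads to follow up, not as proof that an interface exists.</p>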
<h2>What to do with it.</h2>
<p>If you don't have a complex tool to do the mapping of all your systems, then one approach that is a step above the "lots of Excel sheets and PowerPoint slides" approach is to use a tool like Microsoft Visio.  I've used it successfully to map applications, by having the drawing and the interfaces BE the database.  This ensures that everything in the drawing is on the interface list, and everything on the interface list shows up on the drawing.  </p>
<ol>
<li>Create different objects in Visio, and give them attributes. At a minimum you need an application, interface and database object.</li>
<li>Draw the applications and the interfaces between them in a single large Visio drawing, and fill in the attributes in the Visio objects.</li>
<li>Make some simple VBA code in the drawing to dump all the data into flat files or Excel sheets (or directly to a DB if you get ambitious).</li>
</ol>
<p>It's simple, but it is far better than having spreadsheets and a separate drawing, and then constantly trying to determine whether the two agree with each other.</p>
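<p>To make the "dump" step concrete, here is a small sketch of what the exported data might look like and how it could be written to a flat file. It is in Python rather than VBA, and it does not touch Visio at all: the records, the field names (<code>name</code>, <code>server</code>, <code>from</code>, <code>to</code>, <code>method</code>) and both functions are hypothetical stand-ins for whatever your shapes actually hold.</p>

```python
import csv
import io

# Hypothetical records, standing in for what a dump of the drawing might contain.
APPLICATIONS = [
    {"name": "Legacy Billing", "server": "srv-01"},
    {"name": "Warehouse", "server": "srv-02"},
]
INTERFACES = [
    {"id": "1", "from": "Legacy Billing", "to": "Warehouse", "method": "nightly file"},
]


def dangling_interfaces(applications, interfaces):
    """Return the ids of interfaces whose endpoints are not known applications."""
    known = {app["name"] for app in applications}
    return [i["id"] for i in interfaces
            if i["from"] not in known or i["to"] not in known]


def to_csv(rows):
    """Serialise a list of same-shaped dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

<p>Because the list is generated from the drawing, <code>dangling_interfaces</code> should always come back empty. If it doesn't, an interface is pointing at something that is not on the drawing, and that is exactly the kind of mismatch this approach is meant to prevent.</p>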
<p>In the ERP project where I used this technique, we identified over 1500 interfaces between hundreds of application instances.  The ERP project was a very large effort with hundreds of project resources, and multiple phased projects implementing a new common system.  The actual original mapping took two people about 3 months to do.  They had to work with about 30 different application support people to systematically map all the applications, and the interfaces, one by one.</p>
<p>A key part of the job was to actually validate the documentation, i.e. if the documentation said there was a cron job that ran a script on server X, actually go to server X and watch it run.  This meant that we could be confident in the map, and make plans based on it.</p>
<p>Everyone on the team used the drawing and lists generated from the drawing to stay on the same page.  And it was a big page- the key is to also have access to a plotter- we were plotting out a pretty good size wall poster by the time we were done.</p>
<p>The ERP teams had the drawing taped up to the wall- and they were making notes right on it and emailing my team.  We would update the master, and publish a new version, along with the generated lists.  </p>
<p>In building this drawing, we found that most of the interfaces were "under" or "un"-documented, and that if documentation did exist, generally it was wrong. By establishing the "official" document for the legacy systems, we focused and coordinated the design effort in a way that would not have happened, if each team just had their own marked up copy of the original documentation or the part that was of interest to them.</p>
<h2>Having the map means you can make the plan</h2>
<p>This drawing and the interfaces mapped were critical in planning the migration.</p>
<ol>
<li>Create different layers in your drawing for each phase "Phase 1", "Phase 2", "Phase 3", or "Feb 2010", "Aug 2010", "Jan 2011" etc.</li>
<li>Hide or show systems and interfaces (including the new applications and interfaces) as they are phased in or out for each layer.</li>
<li>By viewing and printing layers separately, you can see a step by step plan for the migration- with your application architecture and integration map at each phase.</li>
</ol>
<p>This was a powerful tool to both do the planning, and to make sure everyone understood the timing and sequence.  With multiple phases over a three year period, the project needed it, and without such an overall view, such critical planning would have been haphazard.</p>
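<p>The layer idea can also be expressed as data, which is handy if you want to generate a per-phase list alongside the printed drawing. The sketch below is only an illustration: the <code>phase_in</code> and <code>phase_out</code> attributes, the example items and both function names are assumptions of mine, not something Visio provides.</p>

```python
PHASES = ["Phase 1", "Phase 2", "Phase 3"]

# Hypothetical items. Each records the first phase it is live in and, if it is
# retired, the phase in which it is switched off (None means it stays on).
ITEMS = [
    {"name": "Legacy Billing", "phase_in": "Phase 1", "phase_out": "Phase 2"},
    {"name": "Interface 1", "phase_in": "Phase 1", "phase_out": "Phase 2"},
    {"name": "New ERP", "phase_in": "Phase 2", "phase_out": None},
    {"name": "Warehouse", "phase_in": "Phase 1", "phase_out": None},
]


def visible_in(phase, items, phases=PHASES):
    """Names of the items that are live during the given phase."""
    position = phases.index(phase)
    live = []
    for item in items:
        start = phases.index(item["phase_in"])
        if item["phase_out"] is None:
            end = len(phases)
        else:
            end = phases.index(item["phase_out"])
        if position in range(start, end):
            live.append(item["name"])
    return live


def switched_off(before, after, items, phases=PHASES):
    """Names of the items that are live in one phase but gone in a later one."""
    still_live = set(visible_in(after, items, phases))
    return [name for name in visible_in(before, items, phases)
            if name not in still_live]
```

<p>Calling <code>switched_off("Phase 1", "Phase 2", ITEMS)</code> gives the shutdown checklist for that step, here the legacy billing system and the interface that feeds from it.</p>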
<p>The challenge with this mapping is to find the right level of detail required.  Not detailed enough and it is wasted effort.  Too detailed and it will consume excessive resources and time.</p>
<h2>A simple approach- What talks to what and what it runs on.</h2>
<p>There are two key aspects to mapping your application architecture.  </p>
<ol>
<li>Functional relationships- applications talking to other applications, with interfaces between them.</li>
<li>Infrastructure relationships- which servers, network connections, services and databases are involved in the functional relationship.</li>
</ol>
<p>You can't show both completely on a single drawing- don't try.  Some applications run on multiple servers, many servers run more than one application, databases are shared by many, interfaces often use common infrastructure such as EAI tools etc.</p>
<p>The approach we took, and it worked well, was to show the functional relationships on the diagram, and hold the physical relationships (which databases were on which servers/clusters and which application ran on which server etc.) in the attributes of the applications. </p>
<p>We did sometimes show some physical attributes on the diagram for easy reading, but only as an annotation- the relationship was done via the attributes in the Visio application objects.</p>
<p>This meant that you could ask "What runs on this server?" and "Which servers are involved with this application?" by doing a filter or query on the data.  Very useful things if you are planning to shut down a server.  You make a checklist, and one by one make sure everything is either shut down or moved.</p>
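<p>Here is a minimal sketch of those two questions as queries over exported attributes. The record layout (a <code>servers</code> list and a <code>databases</code> list on each application) and the example names are assumptions for illustration only. Your attributes will differ.</p>

```python
# Hypothetical attribute records, as they might be exported from the application shapes.
APPLICATIONS = [
    {"name": "Legacy Billing", "servers": ["srv-01"], "databases": ["billing_db"]},
    {"name": "Warehouse", "servers": ["srv-02", "srv-03"], "databases": ["dw"]},
    {"name": "Label Printing", "servers": ["srv-01"], "databases": []},
]


def runs_on(server, applications):
    """What runs on this server? Returns application names, sorted."""
    return sorted(app["name"] for app in applications if server in app["servers"])


def servers_for(application, applications):
    """Which servers are involved with this application? Empty list if unknown."""
    for app in applications:
        if app["name"] == application:
            return sorted(app["servers"])
    return []
```

<p>With this data, <code>runs_on("srv-01", APPLICATIONS)</code> returns both Legacy Billing and Label Printing. The second one is the sort of unrelated tenant that gets forgotten when a legacy server is switched off.</p>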
<p>Here's a simple example to illustrate what the diagram might look like:<br />
<img src="http://www.datamartist.com/wp-content/uploads/2009/12/application-and-interface-drawing-example.jpg" alt="application-and-interface-drawing-example" title="application-and-interface-drawing-example" width="543" height="335" class="aligncenter size-full wp-image-3662" /></p>
<p>The circles with the numbers were the interfaces, and each one had attributes like "To", "From" and "Method".  The level of detail you go to is a function of how ambitious you are, but at a minimum you need to record the fact that the interface exists.</p>
<p>So in summary:</p>
<ol>
<li>Create a single map of all your applications and interfaces and share it with everyone on the team.</li>
<li>Make sure you validate your map carefully, looking into the actual systems, and talking with as many people as needed to ensure you have captured everything.</li>
<li>Make a step by step plan for the migration, showing when each application, interface and infrastructure item is phased in or out.</li>
</ol>
<p>Next up- <a href="/data-migration-creating-a-data-dictionary-how-to-tackle-master-data-management">the data dictionary</a> and how do we get everyone to agree on those definitions?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-migration-part-3-mapping-the-legacy-systems-meta-data-and-application-mapping/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meta WHAT?</title>
		<link>http://www.datamartist.com/meta-what</link>
		<comments>http://www.datamartist.com/meta-what#comments</comments>
		<pubDate>Wed, 06 Aug 2008 15:05:17 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Meta Data]]></category>
		<category><![CDATA[Consultants]]></category>
		<category><![CDATA[Meta]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=18</guid>
		<description><![CDATA[In the hallowed halls of any serious data analysis shop, you can expect to hear the phrase "meta data" spoken with reverence. But what, to put it bluntly, IS meta data? The definition I like the best is the dictionary one: Meta Data is Data about Data. Sure, it's much more complicated than that- Some of [...]]]></description>
			<content:encoded><![CDATA[<p>In the hallowed halls of any serious data analysis shop, you can expect to hear the phrase "meta data" spoken with reverence. But what, to put it bluntly, IS meta data?<img class="alignright" style="float: right;" src="/resources/images/MetaDataGraphicXLS.JPG" alt="Meta Data Graphic" width="300" height="148" /></p>
<p>The <a href="http://www.merriam-webster.com/dictionary/meta%20data" target="_blank">definition</a> I like the best is the dictionary one: Meta Data is Data about Data.</p>
<p>Sure, it's much more complicated than that- Some of the data about data is all about things only programmers really love- which table, what data format, etc.  But the meta data (or "Reference Data" as it is sometimes called) that analysts are interested in describes the raw data in business terms.  What are the market segments that we follow, and for each customer which segment are they in? </p>
<p>You might have met consultants recently that talk a lot about meta data- or maybe they call it "Master Data" (yet another name).  They've told you that this is super important stuff, that it's affecting your business and your bottom line, and thankfully (for a "reasonable" fee) they can fix you and your enterprise's data right up.</p>
<p>Well, it is important.  You might know how many of product X you sold- but if you don't know what product X is, which categories it is in, what other products are similar, complementary or conflicting, it's hard to do actual analysis.  So maybe you do need their services- but there is also that report that your boss wants tomorrow, and the IT department says it can get you by March.  So of course reality sets in, and once again you launch your trusty spreadsheet software.</p>
<p>In a spreadsheet, data is data.  Some of the cells are actually meta data, and some are data (some are both), but generally we just sort of wing it and build what we need.  Sometimes spreadsheet meta data is easy to see- certainly formulas themselves are meta, and certainly anytime you use a VLOOKUP or related function you are referencing common definitions.  But overall it's pretty much the Wild West. Which means you can change it quick- but also means it sometimes gets a bit, well, messy.</p>
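<p>To see the split between data and meta data outside a spreadsheet, here is a small Python sketch of the same lookup idea. The customers, segments and the <code>units_by_segment</code> function are invented for the example. The point is only that the segment table is kept apart from the sales rows, so changing a customer's segment is a one-line edit.</p>

```python
from collections import defaultdict

# Reference data (the "meta data"): which market segment each customer is in.
SEGMENT_BY_CUSTOMER = {"Acme": "Retail", "Globex": "Industrial", "Initech": "Retail"}

# Raw data: one row per sale.
SALES = [
    {"customer": "Acme", "units": 10},
    {"customer": "Globex", "units": 4},
    {"customer": "Initech", "units": 6},
    {"customer": "Umbrella", "units": 1},  # no segment recorded yet
]


def units_by_segment(sales, segment_by_customer):
    """Roll raw sales up to segment level, keeping unmapped customers visible."""
    totals = defaultdict(int)
    for row in sales:
        # The dictionary lookup plays the role VLOOKUP plays in a spreadsheet.
        segment = segment_by_customer.get(row["customer"], "Unmapped")
        totals[segment] += row["units"]
    return dict(totals)
```

<p>Customers with no segment end up under "Unmapped" rather than silently vanishing, which is usually the first sign that the reference data needs attention.</p>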
<p>In a formal data mart made by the IT department, meta data (one hopes) is tightly controlled, clearly defined and in its own tables and systems. As a result, it can often be difficult to change quickly (or cheaply, see "reasonable fee" mentioned above). So things are all squared away- but because of the rate of change they reflect what you asked for 8 months ago.</p>
<p>In the Datamartist tool, what I'm building is a middle ground.  The goal is to allow users to manipulate the meta data quickly but within a tool that has more structure than just an empty spreadsheet (and to let them do it without an advanced degree in data modelling).</p>
<p>By doing this, operations that are time consuming and repetitive in a spreadsheet- deduplication, ability to have multiple rollup paths and views, ability to re-categorise large data sets- can be done very quickly and in a visual drag and drop interface.  And of course then you can export the cleaned, structured data to your spreadsheet to do all the things that spreadsheets do best- all in time for your boss tomorrow, because in the end, she wants her <a title="Office Space" href="http://en.wikipedia.org/wiki/Office_Space" target="_blank">TPS report</a>- not an explanation of why meta data is holding you up.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/meta-what/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

