<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Business Intelligence Architecture</title>
	<atom:link href="http://www.datamartist.com/category/bi-architecture/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Wed, 25 Apr 2012 17:28:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Reduce Business Intelligence cost through better data migration</title>
		<link>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean</link>
		<comments>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean#comments</comments>
		<pubDate>Tue, 09 Mar 2010 18:49:29 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Business Intelligence]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4390</guid>
		<description><![CDATA[Managing Business Intelligence cost is not an easy task. But poorly or inconsistently structured data can make the task even harder. Unfortunately, a lazy data migration project can generate all sorts of headaches that will cause your Business Intelligence cost to explode. Of course, bad data quality also has many other costs and risks associated [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/tell-the-ceo-forget-the-merger-data-is-read-only.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/tell-the-ceo-forget-the-merger-data-is-read-only.jpg" alt="" title="tell-the-ceo-forget-the-merger-data-is-read-only" width="363" height="209" class="alignright size-full wp-image-4400" /></a>Managing Business Intelligence cost is not an easy task.  But poorly or inconsistently structured data can make the task even harder.  Unfortunately, a lazy data migration project can generate all sorts of headaches that will cause your Business Intelligence cost to explode.  Of course, bad data quality also has many other costs and risks associated with it in its own right, but I'm going to focus in on business intelligence today.  </p>
<p>The majority of the development cost in the current business intelligence methodology is often in getting the data out of source systems (Extract), and transforming it to make it consistent across all the various dimensions needed (Transform) and then putting it in a model that is easy to query and analyse (Load).  The creation of these ETL jobs is made dramatically harder if the data in the source systems is not consistent. </p>
<h2>Change is the challenge</h2>
<p>Companies are not static-  they grow, diversify, change strategies, reorganize, rename and restructure.  They acquire other companies or are acquired. The structure and content of the data in their systems often tell this story, and if the proper work is not done to keep the data consistent with itself and with the new situation, the story will be painful and complex.</p>
<blockquote><p>Remember ten years ago when we acquired company X, but decided not to change their customer codes to our standard, so all the codes had an "X" prefixed so that we wouldn't have duplicates?  Well, those X's are still there, and all our queries have to deal with multiple code structures.</p></blockquote>
<blockquote><p>Remember how we used to have three independent databases, one for each region, then when we went to the new data center and put everything into a single database, we ended up with multiple schemas and all those crazy views rather than consolidating into a single instance?</p></blockquote>
<p>When the data migration project made the decision to reduce the project cost by not addressing data consistency, they simply pushed this cost into the future, most likely turning a one-time expense into an ongoing and expanding annual business intelligence cost.</p>
<p>You end up with crazy ETL jobs that parse the same field in different ways depending on the date of the transaction, or on other fields-  "If the transaction is before 2002, then the first digit of the product code means X, otherwise it means Y, unless of course it's from the western division, who do it differently so then you need to look at field A and use the CASE statement..."</p>
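<p>To make that concrete, here is a minimal Python sketch of what such a rule looks like once it is written down. Everything in it is hypothetical: the field names (<code>division</code>, <code>txn_date</code>, <code>product_code</code>, <code>field_a</code>), the lookup tables and the product families are invented to mirror the example above, not taken from any real system.</p>

```python
from datetime import date

# Hypothetical rule set, invented for illustration only.
CUTOVER = date(2002, 1, 1)
OLD_MEANINGS = {"1": "hardware", "2": "software"}      # first digit, before 2002
NEW_MEANINGS = {"1": "services", "2": "hardware"}      # first digit, 2002 onward
WESTERN_MEANINGS = {"A": "hardware", "B": "software"}  # western division: field_a


def product_family(txn):
    """Decode the product family for one transaction record (a dict)."""
    if txn["division"] == "western":
        # The western division ignores the product code and uses field_a.
        return WESTERN_MEANINGS.get(txn["field_a"], "unknown")
    first_digit = txn["product_code"][0]
    # The same digit means different things on either side of the cutover date.
    is_legacy = CUTOVER > txn["txn_date"]
    meanings = OLD_MEANINGS if is_legacy else NEW_MEANINGS
    return meanings.get(first_digit, "unknown")


if __name__ == "__main__":
    sample = {
        "division": "eastern",
        "txn_date": date(2001, 6, 30),
        "product_code": "1044",
        "field_a": "",
    }
    print(product_family(sample))
```

<p>Three branches for one field, and every query, report and reconciliation downstream has to carry the same logic. Had the codes been made consistent during the migration, the function would reduce to a single lookup.</p>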
<h2>Reduce Business Intelligence cost through data cleanup</h2>
<p>If your data is cleaner you'll reduce business intelligence cost across your entire BI architecture.</p>
<ul>
<li>Reduce ETL and report development cost- both initial, and the cost of ongoing maintenance.  Every change request will take more time if all the models are complex due to underlying data complexity.</li>
<li>Reduce hardware costs- complex queries require more processing, and bigger servers to meet that nightly load window</li>
<li>Reduce time spent reconciling numbers. Complex ETL means that chances are business intelligence reports don't match up easily with the operational reports from the source systems.  People will spend time constantly double checking these discrepancies, and it will undermine confidence in all data.</li>
</ul>
<h2>Fix the problem at the source.  Not in the Business Intelligence.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/lazy-data-migration-get-jackets-business-intelligence-pays-the-bill.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/lazy-data-migration-get-jackets-business-intelligence-pays-the-bill.jpg" alt="" title="lazy-data-migration-get-jackets-business-intelligence-pays-the-bill" width="420" height="285" class="alignright size-full wp-image-4396" /></a>Business intelligence is far too often left to fix all the issues in the source systems- and then becomes the focus of dissatisfaction when costs and delays become unacceptable.  </p>
<p>I've heard people argue "That's what ETL is for, right?  Why are you complaining?"  </p>
<p>Assuming that the ETL will fix the sins of the source system is an inefficient and costly strategy.</p>
<p>Everything is a balance, and perfection does not exist.  But when deciding what to fix and what to leave, don't let a lazy data migration project saddle you with years of business intelligence costs.  When it's time to bulk load data into the system, make it as right as you can.  </p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Let&#8217;s admit it- centralized business intelligence alone just doesn&#8217;t work</title>
		<link>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work</link>
		<comments>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work#comments</comments>
		<pubDate>Wed, 03 Mar 2010 21:10:27 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Meta Data]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>
		<category><![CDATA[Business Intelligence Workspace]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4342</guid>
		<description><![CDATA[One version of the truth. Data warehouses. Centralized business intelligence teams. This has been the best practice for business intelligence for the last two decades. Users taking the initiative with data has been seen as the enemy of a successful business intelligence program. This needs to change. In a world of ever increasing data volumes [...]]]></description>
			<content:encoded><![CDATA[<p>One version of the truth.  Data warehouses.  Centralized business intelligence teams.  This has been the best practice for business intelligence for the last two decades.  </p>
<p>Users taking the initiative with data has been seen as the enemy of a successful business intelligence program.  </p>
<p>This needs to change.  In a world of ever increasing data volumes and complexity, faster business processes and more data savvy knowledge workers, a purely centralized solution is doomed to fail.</p>
<p>A consensus is starting to form that the best architecture is one that blends centralized with more distributed and (gasp) free form, user guided methods.  In fact, when we look at what actually exists in most enterprises and take into account the unofficial shadow systems, we're already there, but in two separate camps that aren't talking. </p>
<p>The amount of freedom to allow ranges from letting the users have at it, to opening up the possibility of <a href="http://tdwi.org/blogs/wayneeckerson/2010/02/zen-bi-and-the-wisdom-of-letting--go.aspx" target="_blank">departmental data marts</a>, but the buzz out of TDWI clearly indicates a growing acknowledgement that a rigid top down architecture is not tenable.</p>
<p>What are Oracle, IBM, Microsoft, SAP and SAS (who together hold more than 70% of the business intelligence market) advising as being the right approach?</p>
<p>They advocate big architectures, centralized meta data management, big databases, lots of command and control. They talk about "self serve"- but they mean self serve access to existing reports or report interfaces. To be fair, they need to sell the tools they have.</p>
<p>For a refreshing change from this, I very much enjoyed reading <a href="http://events.tdwi.org/Events/Las-Vegas-World-Conference-2010/Sessions/Thursday/Keynote-Stop-Paving-the-Cowpath.aspx" target="_blank">Mark Madsen's keynote at TDWI</a> "Stop paving the cow path".  </p>
<p>We enjoy reading things that we agree with, and I nodded my way through his slide deck.</p>
<p>In his presentation, Madsen points out that centralization won't work, because it:</p>
<ul>
<li>Creates bottlenecks</li>
<li>Causes scale problems</li>
<li>Enforces a single model</li>
</ul>
<h2>Bottlenecks and Scale</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-super-popular-or-big-backlog.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-super-popular-or-big-backlog.jpg" alt="" title="data-warehouse-super-popular-or-big-backlog" width="377" height="275" class="alignright size-full wp-image-4363" /></a>In a centralized system, all requests go into the queue, and the backlog starts piling up. </p>
<p>The size of the department/team that is responsible for making it all work becomes the number one bottleneck. </p>
<p>Are there enough people able to prioritize and analyse the payback on analysis requests? Because in a centralized organisation, the gatekeepers are necessary, and how do they KNOW which requests are the good ones?  How does anyone really know?</p>
<p>I'm not sure any company can afford to staff a centralized data warehouse team to be able to handle all the requests as they are generated. Prioritization therefore becomes a single point of failure.  Get it wrong, and it can be all wrong.  In a more distributed structure, decisions are made at multiple points, some good, some bad, but diversity will often bring more innovative and experimental behavior, resulting in new avenues of analysis that an overly static central team might avoid.</p>
<p>For an indication as to how well users think the central team is listening to them, take a look at how many Excel spreadsheets there are around, and how many shadow systems grow like mushrooms throughout the standard enterprise.  People think their analysis is important, and even if IT won't or can't help, they find a way to try to get it done.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-not-used-convert-storage-for-spreadsheets.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-not-used-convert-storage-for-spreadsheets.jpg" alt="" title="data-warehouse-not-used-convert-storage-for-spreadsheets" width="373" height="271" class="alignleft size-full wp-image-4364" /></a>In terms of scaling, I can hear the technical types starting to explain about how their servers, infrastructure and approach scales- diagrams and MPP theories pulled out with pride.  "Centralizing lets it be scalable- what are you talking about?"</p>
<p>Maybe. But there are traps here too- centralized organisations always want to put everything in one database.  Having everything in a single repository starts to become the goal- not the cost efficient analysis of the right data.  Not centralizing is very scalable- stand alone machines can just be added forever.</p>
<p>It may in fact be that data can remain distributed and diverse at certain levels of detail, and more federated approaches can be used, resulting in cheaper hardware and software, and more importantly avoiding a lot of really hard master data management work.  Consolidation can sometimes happen at summary levels that make sense from a business point of view- not just blindly following the "one version" mantra.</p>
<h2>Enforcing a single model</h2>
<p>Isn't having a single data model good?  We've been told that it is.  In a way, this is the holy grail.  </p>
<p>But is there a single, correct, slowly changing model that satisfies everyone in an organisation?  </p>
<p>Why do I say slowly changing?  Because if there is only one for the entire enterprise, it will change slowly, if at all.  </p>
<p>Even if you happen to understand what the right model is (and by model I mean data model, analysis model, process model, any model), and you manage to implement it while it's still the right model, in a year it's not going to be the right one.  And a centralized, high cost, committed architecture won't and can't adapt.  You'll still be paying the mortgage on the data warehouse.</p>
<p>Very large centralized models cannot be comprehensive and up to date, because to be comprehensive they have to be so complex as to be difficult to change, and as a result they quickly become out of date.  It's sort of a Heisenberg uncertainty principle for common meta data repositories.</p>
<h2>"Giving people their flying cars"</h2>
<p>Madsen of course doesn't solve the entire problem in his keynote, but he points out some directions that make sense.  And his graphic depicting a happy couple blasting off in their very locally controlled flying car sends the message- users can do their analysis without central oversight or interaction. (Although, one would imagine that some sort of air traffic control would be necessary, and the refueling stations for the cars would probably be run centrally- we're not advocating anarchy here.)</p>
<p>Having built data warehouses, established a data warehouse competency center, and provided business intelligence services for thousands of users, I can testify from first hand experience that centralizing alone is just not going to work.  People who worked with me a decade ago will remember the significant amount of time spent creating meta data repositories.  Are they still needed?  Yes.  But they simply can't do everything.  Use them with care, and be wary of your ambition for them.</p>
<p>First, accept the fact that users are not mindless consumers.  Learn from the fact that they use Excel constantly, and they don't just read reports- they build things, adding data, fixing data, re-organizing data.  They think.  Give them tools that include them as part of the data processing.</p>
<p>Business intelligence cannot be solely a process where formal requirements are gathered, followed by a publishing exercise of delivering the reports on time.</p>
<p>Are there some reports where this is the case? Sure.  Monthly management reports and dashboards shouldn't change every month.  The model can work for some amount of the delivered data analysis.  </p>
<p>The entire architecture isn't getting ripped out- but if the new architecture is successful in bringing the pent up demand that is currently being satisfied by shadow systems into the light, then distributed, user centric, user driven business intelligence will become a significant percentage of the total.</p>
<p>But the old way of thinking has to change.  Don't "Crack down on shadow systems".  </p>
<p>Find a way to provide better service, be it self service, assisted or centralized, that makes the shadow systems simply a less effective way to do it.</p>
<p>The existence of shadow systems, and the extent of them, is the clearest argument that centralized business intelligence alone is simply not up to the task.</p>
<p>Once you have people doing whatever they want in the self directed part of your architecture, DO watch what they are doing- not to control it, but to learn from it.  Everyone constantly re-structuring the customer dimension?  Obviously it's time for an update.  By watching what users edit, what gaps they fill in, you can find the data quality issues, identify the fuel to put on the self directed fire.</p>
<p>Tools like Lyzasoft, <a href="/">our own Datamartist tool</a>, and Microsoft's Power Pivot in Excel 2010 and others are all going to drive power to the users, and introduce a new balanced approach between centralized and local parts of business intelligence architectures.  Visualization tools like Tableau will further give people the ability to create powerful, consumable analysis in a self serve mode.</p>
<p>Will there be challenges with data quality, risk management and wasted time doing pointless analysis? Most likely.  </p>
<p>Will the information we gather and the payoff from the successful bottom up analysis efforts make it hugely valuable overall? I for one think so.</p>
<p>We need to learn to trust our colleagues with the data, while at the same time managing the reality of data quality and risk of errors that more free form techniques can create.</p>
<p>Companies that include both top down and bottom up capabilities in their architecture will stop wasting time fighting internally, and start to take advantage of all that data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Estimating the cost of Business Intelligence</title>
		<link>http://www.datamartist.com/estimating-the-cost-of-business-intelligence</link>
		<comments>http://www.datamartist.com/estimating-the-cost-of-business-intelligence#comments</comments>
		<pubDate>Mon, 08 Feb 2010 22:27:49 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[Forrester Research]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3975</guid>
		<description><![CDATA[How much does a single Business Intelligence report cost a company? Well, obviously there is no single answer- but Boris Evelson of Forrester took a shot at it recently in a blog post. Even when it's not an easy question, it is worth pursuing, and Boris lays out a useful discussion. $150 000 is the [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2010/02/is-it-the-last-truck-load-of-money-for-the-data-warehouse.jpg" alt="is-it-the-last-truck-load-of-money-for-the-data-warehouse" title="is-it-the-last-truck-load-of-money-for-the-data-warehouse" width="423" height="287" class="alignright size-full wp-image-4044" />How much does a single Business Intelligence report cost a company?  Well, obviously there is no single answer- but Boris Evelson of Forrester took a shot at it recently <a href="http://blogs.forrester.com/business_process/2010/01/bottom-up-and-top-down-approaches-to-estimating-cost-for-a-single-bi-report.html" target="_blank">in a blog post</a>.  Even when it's not an easy question, it is worth pursuing, and Boris lays out a useful discussion.</p>
<ul>
<li>$150,000 is the AVERAGE cost of business intelligence software for a DEPARTMENT.</li>
<li>ETL software (extract, transform and load) is also $150,000 on average.</li>
</ul>
<p>And the rule of thumb for cost of effort and services is <strong>5 times the software cost</strong></p>
<p>I'm not making this up. Check the link.</p>
<p>In the end, Boris suggests that the cost of a single, fairly straightforward report might be <strong>$20,000.</strong>  Of course as he rightly points out there are lots of variables, and it's a classic case of "it depends",  but even so- clearly you want to be sure the reports add value when you are using a process that requires that kind of investment.</p>
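<p>For a feel of how these numbers stack up, here is a rough back-of-the-envelope sketch in Python. The input figures are the ones quoted above; the way they are combined (applying the 5 times multiplier to the combined software cost, then dividing by the per-report figure) is my own assumption for illustration, not a calculation taken from the Forrester post.</p>

```python
# Figures quoted in the post. How they are combined below is an assumption
# made for illustration, not a calculation from the Forrester article.
BI_SOFTWARE = 150_000       # average departmental BI software cost, USD
ETL_SOFTWARE = 150_000      # average ETL software cost, USD
SERVICES_MULTIPLIER = 5     # effort and services as a multiple of software cost
COST_PER_REPORT = 20_000    # suggested cost of one fairly simple report, USD

software = BI_SOFTWARE + ETL_SOFTWARE
services = SERVICES_MULTIPLIER * software
total = software + services

# Reports needed before the per-report figure adds up to the total spend.
reports_to_cover_total = total // COST_PER_REPORT

print(f"software: ${software:,}")
print(f"services: ${services:,}")
print(f"total:    ${total:,}")
print(f"reports at ${COST_PER_REPORT:,} each: {reports_to_cover_total}")
```

<p>Read this way, a department is looking at roughly $1.8 million all in, or about 90 reports at $20,000 apiece. Change any one assumption and the totals move, which is exactly the "it depends" point.</p>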
<p>Boris mentions in passing that the cost of a single day of an external developer he uses for estimating is $800 USD.  You can buy two licenses of Datamartist and take a friend out for dinner for that.</p>
<p>Don't get me wrong- for a number of applications you need the big enterprise stuff- but in my mind it makes sense to avoid it when you can.  Enterprise business intelligence has its place, but there are alternatives.  The rampant use of Excel spreadsheets is evidence of the fact there is huge demand for data out there.  <a href="/downloads">Try Datamartist</a> and find another even more powerful way to get at the data for those cases where you need to do more than a spreadsheet, but it's not time to kick off a data warehouse project.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/estimating-the-cost-of-business-intelligence/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spreadsheet errors- Fear, uncertainty and doubt</title>
		<link>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt</link>
		<comments>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt#comments</comments>
		<pubDate>Mon, 11 Jan 2010 18:54:46 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[data culture]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[MS Excel]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>
		<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3831</guid>
		<description><![CDATA[I love the acronym FUD which stands for "Fear, uncertainty and doubt". What I don't love is the underhanded use of FUD to manipulate people's behavior. Spreading FUD is not about creating something new, but destroying- destroying someone's confidence in something, clouding the real issue, stopping a new or creative direction from being taken. FUD [...]]]></description>
			<content:encoded><![CDATA[<p>I love the acronym FUD which stands for "Fear, uncertainty and doubt".  What I don't love is the underhanded use of FUD to manipulate people's behavior.  Spreading FUD is not about creating something new, but destroying- destroying someone's confidence in something, clouding the real issue, stopping a new or creative direction from being taken.  FUD is often used to block reform and change because FUD can cause people to do nothing- and doing nothing is good for the incumbent.</p>
<p>In the data analysis realm, spreadsheet errors are often used to try to dissuade companies from letting their people "work with the data directly".  Software vendors of all sizes, but particularly the really big ones (those incumbents) spread FUD because if they can stop people from getting at the data themselves, it increases the chance of companies buying some more business intelligence suites.</p>
<p>The argument goes something like this:</p>
<blockquote><p>Spreadsheets have been shown to be plagued with errors, many studies showing error rates above 90%.  You need to reduce the risk that spreadsheets are creating in your organization by establishing formal, documented processes that are created and managed by professionals using sophisticated tools.</p></blockquote>
<p>Then the usual nightmare scenarios are brought out, all involving rabid auditors, Sarbanes-Oxley, governance failures etc.</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2010/01/accidently-put-last-years-spreadsheet-number-into-annual-report1.jpg" alt="accidently-put-last-years-spreadsheet-number-into-annual-report" title="accidently-put-last-years-spreadsheet-number-into-annual-report" width="341" height="226" class="alignright size-full wp-image-3839" />Now, don't get me wrong, spreadsheet errors are a very real and serious problem, and there are all sorts of data applications that should never be done in Excel or other ad-hoc, user driven tools. Ever.  Formal documented processes are critically important, and there are lots of places where you better be using the right tools and professionals.  </p>
<p>I have seen the culture of the spreadsheet completely undermine initiatives that would have driven better data quality, data analysis and business processes.  The spreadsheet certainly has its dark side.</p>
<p>But the problem is that FUD paints with a broad brush.  People take it as "Spreadsheets with data in them? Bad news. Don't do it.  Individuals able to get at the data, and quickly transform it, analyze it?  Who knows what they'll do- shut them down!"</p>
<p>Sadly, from a data quality point of view, sometimes the spreadsheets have the BEST data quality- because people have fixed the issues they can't fix in the transactional system due to constraints or IT department delays.</p>
<h2>Encourage positive change with reasonable controls.</h2>
<p>Intelligent, responsible people should be encouraged to use "informal" methods and tools to do data analysis.  </p>
<p>These people will find things, learn things, and drive positive change (including change in those big formal professional systems).  </p>
<p>They should do it with a reasonable understanding that doing things in an informal way, with spreadsheets or other tools, does introduce errors, and they should consider this when they recommend taking action based on the results. </p>
<h2>Balance between two extremes </h2>
<p><strong>The totalitarian state:</strong> I don't think there is an  IT department in the world that is capable of stopping all unofficial data analysis.  In fact, I would suggest that the moment such an IT department comes into existence, it would kill the host company, a harsh sort of self-regulation.  People interested in data and thinking for themselves would just pack up and leave. So who would be left making the decisions and based on what?</p>
<p><strong>The twisted web of spreadsheets:</strong> Companies that allow an anything-goes environment of Visual Basic code, macros and manual cut and paste direct to the annual report are not going to be long for the world either.  They populate the horror story pages on <a href="http://www.eusprig.org/horror-stories.htm" target="_blank">the spreadsheet risk websites.</a></p>
<h2>The zone of win.</h2>
<p>You want to be somewhere between insane spreadsheet addiction and strict formal big tool paralysis.  </p>
<p>I submit that companies that balance risk while still encouraging their smart people to "play" with the data and do analysis in new and interesting ways with new tools are going to win.</p>
<p>Again, don't let this process generate your profit and loss statement- understand where and what the informal discovery process is for- but do let it discover things.  If it discovers something interesting you'll have the chance to check for the errors.  Make sure it's part of the process to do so.</p>
<p>By letting the FUD get you down, you'll never get that far and who knows what insights you might be giving up?</p>
<p>Of course,  we believe you should go even further and give those intelligent, responsible people new tools that are less error prone than spreadsheets but still provide as much or even greater flexibility.  That's why we're building Datamartist after all.</p>
<p>Openness, balance, and clear minded pragmatism will get you further than FUD every time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data migration- Part 1 Introduction to the data migration dilemma</title>
		<link>http://www.datamartist.com/data-migration-part-1-introduction-to-the-data-migration-delema</link>
		<comments>http://www.datamartist.com/data-migration-part-1-introduction-to-the-data-migration-delema#comments</comments>
		<pubDate>Tue, 01 Dec 2009 15:12:37 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3477</guid>
		<description><![CDATA[The world is a dynamic place. Businesses change. Companies merge, technologies shift and applications come and go. Our data, however, is often less dynamic. All those sales records in the old sales system don't morph into the format required in the shiny new ERP. The data has to be dragged kicking and screaming into the [...]]]></description>
			<content:encoded><![CDATA[<p>The world is a dynamic place.  Businesses change.  Companies merge, technologies shift and applications come and go.  </p>
<p>Our data, however, is often less dynamic.  All those sales records in the old sales system don't morph into the format required in the shiny new ERP.  The data has to be dragged kicking and screaming into the new format.<br />
<img src="http://www.datamartist.com/wp-content/uploads/2009/11/data-migration-get-the-hammer.jpg" alt="data-migration-get-the-hammer" title="data-migration-get-the-hammer" width="374" height="225" class="alignright size-full wp-image-3498" /><br />
Thus, the art and science of data migration is one of the most challenging, and most important, of the information technology black arts.  In the next few posts, I'm going to take a light hearted, but hopefully useful look at data migration.</p>
<p>So imagine you are in a company that, for whatever important, unavoidable, and "it was super clear at the time" reason is going to change its sales system.  The new system is just so much better- as the VP of marketing points out:</p>
<blockquote style="font-size: 14px;"><p> "The new sales system will drive synergies and encompass our core strategy for value creation and customer focus." </p></blockquote>
<p>Oh good.</p>
<h2>What are the possible strategies in regards to moving your data?</h2>
<h3>Abandon your data</h3>
<p>Sure, just shut down the old server, send the hard drive to the recycler, turn on the new system empty, and wait by the phone for the next order. <img src="http://www.datamartist.com/wp-content/uploads/2009/11/good-news-no-performance-issues1.jpg" alt="good-news-no-performance-issues" title="good-news-no-performance-issues" width="288" height="194" class="alignright size-full wp-image-3514" /></p>
<p> "Hello, Acme lots of products, can I help you?"<br />
"You'd like to place an order, ok, can I get your company name and address please."<br />
"Yes, I know you've got a customer number, but we don't use those any more, they're from the old system.  We have a new one now."<br />
"You want product 23432-  that's an old code, let me try to find the new one, just a sec..."</p>
<p>Probably not going to work out so well.</p>
<h3>Enter all your data by hand over the weekend.</h3>
<p>Depending on the amount of data, you might actually be tempted to do this.  Sit a bunch of people in front of a bunch of computers, give each of them a stack of reports from the old system, and show them how to enter the data into the new system.  Buy them pizza.  Look really stressed as 6am Monday morning approaches.</p>
<p>The "advantages" of this approach:</p>
<ul>
<li>Because the new system thinks that you acquired every one of your existing customers over a two day period, your company's growth rate looks fantastic.</li>
<li>You get written up in computer science journals as a case study on the error rate for manually keyed data.  The researchers are particularly excited about the clearly defined ramp effect built in by the progressive sleep deprivation of the data entry clerks.</li>
<li>You get lots of practice calculating customer abandonment metrics as, over the next six months, data errors cause undelivered invoices, mis-routed orders and incorrect product configurations.  You get excellent data on what makes customers most angry, and drives them to switch to your competitors most quickly.</li>
<li>Even though the new sales system doesn't work out so well, the reduction in transactions makes it possible to save lots of money by reducing the number of servers and the amount of storage needed in the data center.</li>
</ul>
<h3>Or, you could launch a data migration project</h3>
<p>Yep. Probably what you should do.  But how?</p>
<h2>Seriously, what are some things to consider in a data migration project?</h2>
<p>Over the next few posts, I'm going to continue on the theme of a sales system migration, and talk about strategies that can help get you through your project, and the kinds of "Gotchas" that can spring up.  Every data migration project is different, but hopefully I'll shed some light on the process, and share some of the experience I've had.</p>
<p>For me, the number one thing to think about in a data migration project is the impact on your customers.</p>
<p>Remember those customer people?  The ones that give us money and expect something good in return?  Turns out they're the cornerstone of your business. </p>
<h3>Minimize the impact on your customers</h3>
<p>Now, I'm not saying that you shouldn't consider other stakeholders too, and there are always political and financial constraints on any large project, but if it comes down to a choice between making life difficult for someone inside your company and making it difficult for a customer- well, someone has to take one for the team, and the customer isn't the right choice.</p>
<p>A badly executed migration project can cause enough issues for customers to push those that were wavering to the competition.  A really badly executed one can push loyal customers away.</p>
<p>Depending on how cutthroat your industry is, it could cost you significant market share, and maybe even end up being the beginning of the end.</p>
<p>On the other hand, a well executed migration to a system that not only cuts your costs but provides real value to the customer could be a key part of your competitive advantage. </p>
<p><a href="/data-migration-part-2-determining-data-quality-is-the-first-key-step">Next post</a>:  A data migration project is never just a data migration project-  it's a data quality project too.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-migration-part-1-introduction-to-the-data-migration-delema/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data to the people- why self serve ETL</title>
		<link>http://www.datamartist.com/data-to-the-people-why-self-serve-etl</link>
		<comments>http://www.datamartist.com/data-to-the-people-why-self-serve-etl#comments</comments>
		<pubDate>Tue, 21 Jul 2009 17:11:37 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Analyst tools]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=2866</guid>
		<description><![CDATA[As regular readers of this blog know, I believe in a balance between formal and informal data analysis tools. I believe in an approach that firmly places people in the center of a new way of looking at the data analysis process. In the past, “big business intelligence” created an infrastructure heavy, highly centralised and [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2009/07/you-have-used-unautorised-data-transformation.jpg" alt="you-have-used-unautorised-data-transformation" title="you-have-used-unautorised-data-transformation" width="403" height="354" class="alignright size-full wp-image-2887" />As regular readers of this blog know, I believe in a balance between formal and informal data analysis tools.</p>
<p>I believe in an approach that firmly places people in the center of a new way of looking at the data analysis process.</p>
<p>In the past, “big business intelligence” created an infrastructure-heavy, highly centralised and technology-focused approach to getting data from source systems into reports in the hands of the users.  Under this regime, users were not to be trusted with raw data, but were given tightly controlled, managed and aggregated reports in order to protect the “single version of the truth”.</p>
<blockquote><p>The theory and practice were tightly defined, and had been honed over decades of business intelligence and data warehouse orthodoxy.   Giving raw data to end users would lead to chaos. Letting end users define new ways to look at the data would corrupt the master data, and lead to everyone looking at something different.</p></blockquote>
<p>You can guess the <a href="http://datadoodle.com/2009/07/16/just-give-me-the-data/" target="_blank">sort of response</a> this "don't give them the raw data" approach gets from capable, curious people who want to get down to some real analysis.  </p>
<p>But to be fair, you can see why these concerns are thought to be well founded.  Almost every large enterprise is awash in a sea of Excel files and a tangle of links and formulas.  Excel is a wonderful tool, but it only offers the illusion of solving the data transformation problem.  It is a much better reporting/dashboard tool than an ETL tool. (Although in the right hands it can do remarkable things.)</p>
<p>And this is the true state of affairs now.  When the “official” system does not provide the answers that the business needs, the people who need to make decisions get the data anyway, and they do it themselves. They do it in Excel, they take night courses in Structured Query Language (SQL), and they hire consultants (or even summer students) to build rogue databases that they run on servers hidden under desks to get at the answers they need.</p>
<p>It is easy for the data warehouse theorists to highlight the clear issues with "spreadmarts" and "shadow systems".  </p>
<p>But we need to be pragmatic. A centralized structure that imposes strict formal rules and change management processes often does ensure that there is only one version of the truth.  But it is a version that no one can use: it has been so formalized, aggregated, compromised and delayed that by the time it is delivered, the pressing business questions have changed and the meaning has been expunged.  The data warehouse becomes a reporting system rather than an analysis tool.</p>
<p>It's clear that enterprises need this kind of reporting. I'm not advocating abandoning the existing approach, but augmenting it.  Up till now, the solution has often been "more of the same".</p>
<blockquote><p>The regime decided that the solution was to add more technology to the central systems, increase enforcement, and search out and repress all the dissident data manipulators.  The data resistance was forced to go underground, to hide their spreadsheets, to outwardly appear to be following the official line.</p></blockquote>
<p>It is very true that there are some risks in allowing people to analyze their own data, but there is also a reward.  There is a small group of people who love data, who understand the business questions, who work to tease insight out of a steaming pile of raw data, and who can find things that are game changing.  Massive, formal, designed-by-committee data warehouses can deliver a powerful and useful view of things, but they rarely offer flashes of insight.  When they do, it is often during the design and discovery process, and rarely by users using the system after it has gone live.</p>
<p>The <a href="/product">Datamartist tool</a> has been built based on the belief that both formal, centralized systems AND local, personal data transformation have a place in the architecture and that both should be official places.</p>
<p>People can be trusted with the data.  In fact I think for an organisation to truly be successful at mastering its information, they have to be.</p>
<p>We have to realize that we can't allow our obsession with the quest for a single version of the truth to turn us into totalitarian regimes, certain that OUR truth is THE truth, and that messing around with the data is by its very nature subversive and dangerous.</p>
<p>Data to the people.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-to-the-people-why-self-serve-etl/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wolfram Alpha- Dimensional Generator?</title>
		<link>http://www.datamartist.com/wolfram-alpha-dimensional-generator</link>
		<comments>http://www.datamartist.com/wolfram-alpha-dimensional-generator#comments</comments>
		<pubDate>Fri, 10 Apr 2009 15:56:56 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Dimension Tables]]></category>
		<category><![CDATA[web services]]></category>

		<guid isPermaLink="false">http://www.datamartgenerator.com/?p=1568</guid>
		<description><![CDATA[Wolfram Research is always doing some interesting things- and now they are aiming at providing an answer machine- they are calling it Wolfram Alpha. It's not a search engine that returns documents related to the search terms entered- but something that computes answers to a question looking for a factual answer. This is interesting, because [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/04/ask-wolfram-alpha-if-its-sure1.jpg" alt="ask-wolfram-alpha-if-its-sure1" title="ask-wolfram-alpha-if-its-sure1" width="300" height="258" class="alignright size-full wp-image-1541" />Wolfram Research is always doing some interesting things- and now they are aiming at providing an answer machine- they are calling it <a href="http://blog.wolfram.com/2009/03/05/wolframalpha-is-coming/" target= "_blank">Wolfram Alpha</a>.  It's not a search engine that returns documents related to the search terms entered- but something that computes answers to a question looking for a factual answer.</p>
<p>This is interesting, because often when we are searching for something via, say, Google, we are actually looking for an answer.  We do it in two steps-  "Country population list"-  which gets us the document, and then we look up the countries we are interested in.</p>
<p>Unfortunately, the way Wolfram Alpha was launched and the way the media and observers in general tended to react has created a fair amount of hype and misconception.  Although Wolfram Alpha (let's call it WA) will have a natural language interface, people always get carried away with their expectations for such things.  I'm certain that WA will be impressive, but I'm equally sure that you won't be able to say "Roughly how many people like to have peanut butter on their toast in Ohio" and get a reasonable answer.</p>
<p>In a <a href="http://www.hplusmagazine.com/articles/ai/wolframalpha-searching-truth" target="_blank">recent interview</a> with Rudy Rucker, Stephen Wolfram said that rather than sell WA to the search engines, “We’d rather look for things like partnerships or licensing deals or APIs.  I see a new field of knowledge-based computing.  Imagine a spread sheet that can pull in knowledge about the entries.”</p>
<p>Now this <em>is really</em> interesting. What if it has a way to ask questions like "what is the GDP of [country]" just like that?  What if it can tell you the population of any given Zip Code?  What if it knows the rate of income change by county? What if it can tell you for any geo-location/date if it was a statutory holiday or not on that day?  These are things that could be very very useful in doing data analysis- and are the kinds of things that can build interesting real world dimensional tables in a data warehouse or data mart.</p>
<p>There are, of course, various sources to get this information now- but if one massive, super flexible, broad "answer engine" existed, this might be a real boon to business intelligence practitioners.</p>
<p>Imagine generating a dimensional table by accessing the web service and enriching what your business users can analyze- and knowing that the values are as up to date and as accurate as the brain trust at Wolfram Research can make them.</p>
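<p>To make the idea concrete, here is a minimal sketch of what "generating a dimension table from an answer engine" could look like. Everything in it is an assumption made for illustration: the <code>lookup</code> function stands in for whatever web service might eventually exist (it is not the Wolfram Alpha API), the column names are hypothetical, and the figures are made-up placeholders rather than real statistics.</p>
<pre>
```python
# Sketch only: "lookup" is a hypothetical stand-in for an answer-engine call.
# It is NOT the Wolfram Alpha API, and the figures below are invented.
def build_country_dimension(country_codes, lookup):
    """Return one dimension row per distinct country code, enriched via lookup."""
    rows = []
    for key, code in enumerate(sorted(set(country_codes)), start=1):
        facts = lookup(code)  # expected to return a dict of attributes, or None
        row = {"country_key": key, "country_code": code}
        # Unknown codes still get a row, with empty attributes.
        row.update(facts or {"population": None, "gdp_usd": None})
        rows.append(row)
    return rows


# Placeholder values purely for illustration, not real statistics.
FAKE_FACTS = {
    "AA": {"population": 1000, "gdp_usd": 50000},
    "BB": {"population": 2000, "gdp_usd": 75000},
}

# Country codes as they might appear in a fact table (duplicates included).
dimension = build_country_dimension(["BB", "AA", "BB", "ZZ"], FAKE_FACTS.get)
for row in dimension:
    print(row)
```
</pre>
<p>The point of passing the lookup in as a parameter is that the dimension-building logic does not care where the answers come from, so a real service could be swapped in later without touching the rest.</p>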
<p>Although there is some question as to whether the API will really be a focus, it's clear that there is some <a href="http://thenoisychannel.com/2009/03/31/wolfram-alpha-first-hand-impressions/" target="_blank">interest in business applications</a>.  It's just not clear whether <a href="http://thenoisychannel.com/2009/04/06/wolfram-talks-about-wolfram-alpha/" target="_blank">Wolfram Research shares this vision</a>.</p>
<p>Although the masses that are expecting to have a conversation with HAL will be disappointed, there might be a new resource in the world for dimension building for data warehousing- I will be following this with interest.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/wolfram-alpha-dimensional-generator/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Spreadmarts and Data Shadow Systems- The Debate</title>
		<link>http://www.datamartist.com/spreadmarts-and-data-shadow-systems-the-debate</link>
		<comments>http://www.datamartist.com/spreadmarts-and-data-shadow-systems-the-debate#comments</comments>
		<pubDate>Wed, 18 Feb 2009 01:13:28 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[MS Access]]></category>
		<category><![CDATA[Spreadmarts]]></category>
		<category><![CDATA[Access]]></category>
		<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=1017</guid>
		<description><![CDATA[When business users are not getting what they want out of the enterprise business intelligence system, they very rarely just give up. Successful business people didn't get where they are by giving up when someone doesn't deliver something; they take things into their own hands and get it done. Knowing this, it's not surprising that [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/02/spreadmarts-another-100-spreadsheets1.jpg" alt="spreadmarts-another-100-spreadsheets1" title="spreadmarts-another-100-spreadsheets1" width="300" height="316" class="alignright size-full wp-image-1043" />When business users are not getting what they want out of the enterprise business intelligence system, they very rarely just give up.  Successful business people didn't get where they are by giving up when someone doesn't deliver something; they take things into their own hands and get it done.</p>
<p>Knowing this, it's not surprising that a huge amount of data collection, extraction, and transformation happens in Excel spreadsheets or Access databases that are made without the involvement (and often under the direct scorn) of the IT department in large companies.  In my previous life I was in the IT department, and I saw some amazing systems generated with hundreds of spreadsheets and databases.  This mix of spreadsheets and databases, created without the involvement of the IT department by power users or external consultants (financed out of departmental budgets), is often referred to as <a href="http://www.doubletongued.org/index.php/citations/spreadmart_1/" target="_blank">Spreadmarts</a> or <a href="http://en.wikipedia.org/wiki/Shadow_system" target="_blank">Shadow Systems</a>.</p>
<p>For an interesting survey on the subject, take a look at <a href="https://www.tdwi.org/research/display.aspx?ID=8874" target="_blank">TDWI's report "Strategies for Managing Spreadmarts: Migrating to a Managed BI Environment".</a>  This report is now a year old, but I'm certain it is as valid as ever.</p>
<p>The title suggests that the solution is managed BI-  I won't get into that right now, but you'll notice the study was sponsored by the likes of Microsoft, Cognos, Microstrategy and SAP- so of course the solution is Big Business Intelligence solutions.</p>
<p>But what's really interesting from the survey is how the different groups within the respondent companies feel about spreadmarts and shadow data systems.  The analysts love them, the executives are unsure, and IT hates them with a passion.  This makes for an interesting mix.<br />
<img src="/wp-content/uploads/2009/02/position-on-spreadsheets.jpg" alt="position-on-spreadsheets" title="position-on-spreadsheets" width="450" height="301" class="alignnone size-full wp-image-1029" /></p>
<p>This is very much what I've seen in my experience.  IT and the Business are at odds with each other, and senior management is either uninterested or forced to take sides.</p>
<p>Where do I stand?  I'm in the "avoid them if you can" camp when we're talking about a tangle of spreadsheets and undocumented MS Access databases that can be error prone and time consuming.  I understand why it's often unavoidable, but I've seen firsthand how painful these systems are to maintain.  </p>
<p>On the other hand, I don't subscribe to the school of thought that says "Excel needs to be eliminated- analysts should use the Business Intelligence systems only, otherwise there will be chaos."  Let's not go overboard.  Excel and spreadsheets are useful tools, and have their place.  Additionally, I really feel for business users who simply can't get what they want from the IT departments.  I used to be the IT department, and it was frustrating to not have the resources available to build what people needed.</p>
<p>As one of the authors of the above report, <a href="http://www.athena-solutions.com/index.shtml" target="_blank">Rick Sherman</a>, said in <a href="http://searchcio.techtarget.com/generic/0,295582,sid182_gci1344289,00.html?asrc=SS_CLA_308990&#038;psrc=CLT_182" target="_blank">a recent podcast</a>:</p>
<blockquote><p>"reality is no matter how many IT folks that you have in your company you're not likely to have enough resources or time to meet every business users reporting or analytical requirements..."</p></blockquote>
<p>He presents what is a refreshingly balanced approach to Excel.  In his <a href="http://datadoghouse.typepad.com/data_doghouse/2009/02/business-intelligencedata-warehousing-emerging-trends-but-not-breakouts-9-for-09.html" target="_blank">predictions for trends in 2009</a>, number 5 is "Excel becomes an accepted tool in a BI portfolio". He points out that this may not be mainstream in 2009, but I hope he's right about the trend.  A pragmatic, inclusive strategy with more power to the people while avoiding the chaotic side of spreadmarts is where the solution is.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/spreadmarts-and-data-shadow-systems-the-debate/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Joining the Dimension Table to the Fact Table- Purchasing Data mart (Part 5)</title>
		<link>http://www.datamartist.com/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5</link>
		<comments>http://www.datamartist.com/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5#comments</comments>
		<pubDate>Tue, 17 Feb 2009 16:31:48 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Purchasing Analysis]]></category>
		<category><![CDATA[Data Mart Example]]></category>
		<category><![CDATA[Dimension Tables]]></category>
		<category><![CDATA[Purchasing Data Warehouse]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=991</guid>
		<description><![CDATA[After we have created the dimension tables and the fact table and populated them with data, the final step to getting a star schema is of course to actually join the dimension tables to the fact table. In the Datamartist tool we do this with a Join block. Check out the first four parts of [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/02/join1.jpg" alt="join1" title="join1" width="200" height="200" class="alignright size-full wp-image-995" />After we have created the dimension tables and the fact table and populated them with data, the final step to getting a star schema is of course to actually join the dimension tables to the fact table.  In the Datamartist tool we do this with a Join block.</p>
<p>Check out the first four parts of this series (<a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">1</a>,<a href="http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">2</a> , <a href="/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3">3</a> and <a href="/hierarchies-and-tree-structures-in-dimensions-an-example-item-dimension-part-4">4</a>) where we created an example data mart, with some fictitious purchasing data.</p>
<p>The final step is to join the dimensions we have created to the fact table. To do this, we connect up the two dimensions (Vendor and Item) to the Join block and connect an export block to the output.  What has in effect been created is a complete Extract, Transform, Load (ETL) process and the final star schema join.<br />
<a href="/wp-content/uploads/2009/02/po-data-mart-screen-shot2.png"><img src="/wp-content/uploads/2009/02/po-datamart-blocks1.jpg" alt="po-datamart-blocks1" title="po-datamart-blocks1" width="400" height="208" class="alignnone size-full wp-image-1002" /></a></p>
<p>(If that's a bit hard to read- click on the image to see the full size screen shot.)</p>
<p>With the generated data set I used for this example, summarizing the data to yearly totals but keeping all the detail on Vendor and Item causes the roughly 4 million row raw data file to be reduced to around 800 thousand rows.  (This summarizing was done on another canvas- although it could have been done on this canvas just as easily).</p>
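<p>For readers who think in code rather than blocks, the same two steps (summarize to yearly totals, then join the dimensions to the facts) can be sketched in a few lines of plain Python. This is only an illustration of the technique, not how Datamartist works internally; the sample rows, vendor names, item names and column names are all invented for the example.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical sample purchase rows: (vendor_id, item_id, "YYYY-MM", amount)
purchases = [
    ("V1", "I1", "2008-01", 100.0),
    ("V1", "I1", "2008-02", 50.0),
    ("V2", "I2", "2008-01", 75.0),
    ("V1", "I1", "2009-01", 20.0),
]

# Hypothetical dimension tables, keyed by their ID columns.
vendor_dim = {
    "V1": {"vendor_name": "Alpha Supply"},
    "V2": {"vendor_name": "Beta Freight"},
}
item_dim = {
    "I1": {"item_name": "Bolts"},
    "I2": {"item_name": "Rail freight"},
}

# Summarize: collapse monthly rows to one row per vendor, item and year.
yearly = defaultdict(float)
for vendor_id, item_id, month, amount in purchases:
    year = month[:4]
    yearly[(vendor_id, item_id, year)] += amount

# Join: attach the descriptive attributes from each dimension to the facts.
star = []
for (vendor_id, item_id, year), total in sorted(yearly.items()):
    row = {"vendor_id": vendor_id, "item_id": item_id, "year": year, "total": total}
    row.update(vendor_dim.get(vendor_id, {}))
    row.update(item_dim.get(item_id, {}))
    star.append(row)

for row in star:
    print(row)
```
</pre>
<p>Summarizing before joining, as in the sketch, means the join only has to touch the reduced row set, which is the same reason the yearly roll-up shrinks the data so much in the example above.</p>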
<p><img src="/wp-content/uploads/2009/02/join-column-selection.jpg" alt="join-column-selection" title="join-column-selection" width="249" height="361" class="alignleft size-full wp-image-1007" />This data mart, with 800 thousand rows and two dimensions of about three thousand members each, took my laptop about a minute and 45 seconds to solve and save out to a 360 MB text file.</p>
<p>Of course, by summarizing or filtering (just add blocks), analysis subsets could easily be exported directly to Excel, keeping the data volumes manageable and letting you create the graphs, dashboards and reports that you need.</p>
<p>This is part of a five-part series- here are the links to the various parts: <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">1</a>,<a href="/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">2</a> , <a href="/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3">3</a> , <a href="/hierarchies-and-tree-structures-in-dimensions-an-example-item-dimension-part-4">4</a> and <a href="/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5">5</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Connecting the dimension table to the fact table- Vendor Example (Part 3)</title>
		<link>http://www.datamartist.com/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3</link>
		<comments>http://www.datamartist.com/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3#comments</comments>
		<pubDate>Mon, 09 Feb 2009 20:47:55 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Data Mart Example]]></category>
		<category><![CDATA[Dimension Tables]]></category>
		<category><![CDATA[Duplicate Data]]></category>
		<category><![CDATA[Purchasing Data Warehouse]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=858</guid>
		<description><![CDATA[In parts one and two of this series we introduced our challenge (to make a data mart to analyze the Acme Company's spending) and showed how the Datamartist tool could import millions of rows of data and then turn it into a fact table we can use in Excel. Now we need to create a [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/02/makingdimseasyway.jpg" alt="makingdimseasyway" title="makingdimseasyway" width="250" height="97" class="alignright size-full wp-image-883" />In parts <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">one</a> and <a href="/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">two</a> of this series we introduced our challenge (to make a data mart to analyze the Acme Company's spending) and showed how the <a href="/product">Datamartist tool</a> could import millions of rows of data and then turn it into a fact table we can use in Excel.</p>
<p>Now we need to create a Vendor dimension table and join it to this fact table to determine who our big vendors are.</p>
<p>In Datamartist it is a simple task to create this vendor dimension. As always, we use blocks and connect them together.  We define a dimension by using a reference definition block. All we have to do to configure the reference block is to specify which columns uniquely define the dimension (or almost uniquely; Datamartist will resolve duplicate keys using a majority/first rule set for you if you have some data glitches).</p>
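<p>As a rough illustration of what a "majority/first" rule for duplicate keys might mean, here is a small sketch. It is one plausible reading only, not Datamartist's actual implementation: for each key, the most frequently occurring version of the row wins, and a tie goes to the version seen first. The function name and the sample vendor rows are hypothetical.</p>
<pre>
```python
from collections import Counter


def resolve_duplicate_keys(rows, key):
    """Keep one row per key: the most common version wins, first seen breaks ties.

    This is an assumed interpretation of a "majority/first" rule, for illustration.
    """
    versions = {}  # key value, mapped to its row versions in first-seen order
    for row in rows:
        versions.setdefault(row[key], []).append(tuple(sorted(row.items())))
    resolved = []
    for candidates in versions.values():
        counts = Counter(candidates)
        # max() returns the first maximal element, so ties go to the earliest row.
        winner = max(candidates, key=lambda version: counts[version])
        resolved.append(dict(winner))
    return resolved


# Hypothetical vendor master rows containing some data glitches.
vendor_rows = [
    {"Vendor_ID": "V1", "name": "Alpha Rail"},
    {"Vendor_ID": "V1", "name": "Alpha Rail Co"},
    {"Vendor_ID": "V1", "name": "Alpha Rail Co"},
    {"Vendor_ID": "V2", "name": "Beta"},
    {"Vendor_ID": "V2", "name": "Beta Ltd"},
]

vendor_reference = resolve_duplicate_keys(vendor_rows, "Vendor_ID")
for row in vendor_reference:
    print(row)
```
</pre>
<p>In this sample, V1 resolves to the version that appears twice, while V2 has a tie and so keeps the version that appeared first.</p>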
<p>We start with an import block that brings in the Vendor master text file, then we define the reference by specifying "Vendor_ID" as the key.  These first two blocks look like this:<br />
<img src="/wp-content/uploads/2009/02/vendor-master-in-and-reference-block.jpg" alt="vendor-master-in-and-reference-block" title="vendor-master-in-and-reference-block" width="302" height="148" class="alignnone size-full wp-image-878" /></p>
<p>Then we join it to the fact table we created in part two of this series with a join block.  This means that now instead of just the vendor ID number that was in the fact table, we have the name, and address for the vendor in our mini star schema.</p>
<p><img src="/wp-content/uploads/2009/02/vendor-dimension-and-join.jpg" alt="vendor-dimension-and-join" title="vendor-dimension-and-join" width="436" height="283" class="alignnone size-full wp-image-879" /></p>
<p>And finally we put a summarize block after that to total up all the monthly values for each vendor, and we export to Excel. This is what the canvas looks like:<br />
<img src="/wp-content/uploads/2009/02/vendor-dimension-without-dedup1.jpg" alt="vendor-dimension-without-dedup1" title="vendor-dimension-without-dedup1" width="501" height="198" class="alignnone size-full wp-image-865" /><br />
After we do this, we grab the Excel file Datamartist just created for us, do a quick sort, and come up with a list of Acme's top ten suppliers.  Feeling pretty good about ourselves, we do a review with the head of purchasing.</p>
<p>"Where's Mega Brothers?" she says with a frown. "I think your data is screwy- no way that Mega Brothers didn't make the top ten- we spend a fortune on railways, and a lot of our freight goes with the Mega Brothers Rail company. Of course, it is probably entered under different vendors; each location works with the office local to them... But we've got to view them as a single vendor in the data mart- you <em><strong>can</strong></em> do that, right?"</p>
<p><img src="/wp-content/uploads/2009/02/vendor-dimension-with-dedupe1.jpg" alt="vendor-dimension-with-dedupe1" title="vendor-dimension-with-dedupe1" width="300" height="205" class="alignright size-full wp-image-870" /></p>
<h2>Fixing Duplicate Rows</h2>
<p>  Having to deal with duplicate data is a very common issue in any type of data analysis.  So, back to the canvas.  By simply adding a de-duplicate block to our Vendor dimension table (after the Reference block, and before the join) we can find and resolve the Mega Brothers duplicates.<br />
We just use the filter to find the records (easy to do, looking for "Mega", "rail", "brothers", etc.) and we map them to a single instance.  This is the filter control that lets us find and tag the duplicates:<br />
<img src="/wp-content/uploads/2009/02/mega-bros-duplicates-in-picker1.jpg" alt="mega-bros-duplicates-in-picker1" title="mega-bros-duplicates-in-picker1" width="400" height="280" class="alignnone size-full wp-image-871" /></p>
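<p>The filter-then-map idea can be sketched in plain Python as follows. This is an illustration of the general de-duplication technique under made-up data, not a description of how the de-duplicate block is implemented; the vendor IDs, name spellings and spend figures are all invented.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical vendor master rows and spend totals, invented for illustration.
vendors = {
    "V10": "Mega Brothers Rail - East",
    "V11": "Mega Bros. Rail (West)",
    "V12": "MEGA BROTHERS RAIL CO",
    "V20": "Other Supplier Inc",
}
spend_by_vendor = {"V10": 400.0, "V11": 350.0, "V12": 300.0, "V20": 900.0}

# Filter step: find candidate duplicates by keyword, ignoring case.
matches = sorted(vid for vid, name in vendors.items() if "mega" in name.lower())

# Mapping step: point every match at a single surviving vendor ID.
survivor = matches[0]
id_map = {vid: survivor for vid in matches}

# Re-total spend using the mapped IDs, then rank vendors by spend.
totals = defaultdict(float)
for vid, amount in spend_by_vendor.items():
    totals[id_map.get(vid, vid)] += amount
ranking = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
print(ranking)
```
</pre>
<p>In practice a person should review the keyword matches before mapping them, since a simple text filter can easily catch an unrelated vendor with a similar name.</p>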
<p><img src="/wp-content/uploads/2009/02/mega-bros-duplicates-in-mapper.jpg" alt="mega-bros-duplicates-in-mapper" title="mega-bros-duplicates-in-mapper" width="312" height="247" class="alignright size-full wp-image-872" />As we tag them, they show up in the mapper, which lets us see which duplicate records we have eliminated for the dimension. We run the canvas again, and this time, sure enough, Mega Brothers Rail is in our top ten.  But even though the head of purchasing knew it was a lot, this is actually the first time she's seen the number.  "Wow. I've got to give them a call- can you give me that in an Excel spreadsheet?"</p>
<p>Stay tuned, more to come as we go further into Datamartist's ability to segment, filter and organize large data sets.</p>
<p>If you want to see the interface in action watch our first <a href="/product/video-and-screenshots/introductory-tutorial-video">Tutorial Video</a>.  Or just get right to it with your own data- <a href="/downloads">download the free trial now</a>- there is no registration required, and it installs in minutes.</p>
<p>This is part of a five-part series- here are the links to the various parts: <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">1</a>,<a href="/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">2</a> , <a href="/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3">3</a> , <a href="/hierarchies-and-tree-structures-in-dimensions-an-example-item-dimension-part-4">4</a> and <a href="/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5">5</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

