<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Data Transformation</title>
	<atom:link href="http://www.datamartist.com/category/data-transformation/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Thu, 09 Feb 2012 20:00:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Data integration is like a pizza</title>
		<link>http://www.datamartist.com/data-integration-is-like-a-pizza</link>
		<comments>http://www.datamartist.com/data-integration-is-like-a-pizza#comments</comments>
		<pubDate>Tue, 18 May 2010 12:52:12 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Integration]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Business Intelligence]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4520</guid>
		<description><![CDATA[I enjoy a slice of pizza as much as the next person (perhaps a bit more). The key to a good pizza is the raw materials- use the right stuff, and you'll be happy every time. What's great about pizza is that it has all sorts of great stuff on it, and presents them all [...]]]></description>
			<content:encoded><![CDATA[<p>I enjoy a slice of pizza as much as the next person (perhaps a bit more).  The key to a good pizza is the raw materials- use the right stuff, and you'll be happy every time.  What's great about pizza is that it has all sorts of great stuff on it, and presents them all in a single, easy to hold and eat meal. </p>
<p>Data integration can be like a really well put together pizza- lots of good cross-referencing cheese-data to keep everything in its place, great crust that supports it all, and a universal appeal that might even get people to try something they wouldn't normally consume (data wise).</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/05/data-integration-if-the-data-was-any-good.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/05/data-integration-if-the-data-was-any-good.jpg" alt="" title="data-integration-if-the-data-was-any-good" width="357" height="297" class="alignleft size-full wp-image-4525" /></a>But without data quality, data integration can make pizza that nobody really wants to eat, and rather than enhancing the value of your data, your data integration efforts can make your bad data even less consumable than it was on its own.</p>
<p>While combining data from multiple systems can generate huge insights, it is important to understand that moving it and combining it with data from other systems will not <em>always</em> increase its value.  </p>
<p>With good quality data you can have fantastic results, but bad quality data requires so much effort and transformation that often your payback on doing the integration will be non-existent.</p>
<h2>Data integration enthusiasm </h2>
<p>So what happens when an enterprise hears its stomach rumble, and starts thinking data pizza?</p>
<p>Enthusiastic analysts spring into action, building various mockups of all the fantastic dashboards that they will be able to produce, once the data integration is done.  Terms like "near-real time, balanced, cross-functional score cards" start to get bounced around, and pretty soon, budget proposals and appropriation requests are flying from color printers everywhere.</p>
<p>What's unfortunate in many cases is that cooler heads don't stop to ask the question: "So... all this data we are going to put together, is it any good?"</p>
<p>When you are making your pizza, you have to know if the cheese has been left out a bit too long or the green pepper is soggy.</p>
<p>Worse, if heroic measures are taken to get the data to fit together, the integration jobs themselves might degrade the data quality further- or eliminate levels of detail that are not compatible, hiding important trends and structures.  A risk of integrated dashboards is that they pander to the lowest common denominator.</p>
<p>So if you are planning to do some data integration, to build a data pizza, think twice about putting that moldy pepperoni from the CRM system on it- sometimes less is more.  </p>
<p>In fact, it might be that data integration is not your first concern- improving the quality of the data in all those data silos will actually improve day to day operations immediately- and make any future data integration project cheaper, and more successful. </p>
<p>Any great chef will tell you- no matter how complex the recipe, and how impressive your kitchen and equipment, the raw ingredients matter.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-integration-is-like-a-pizza/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Datamartist V1.2 now available</title>
		<link>http://www.datamartist.com/datamartist-v1-2-now-available</link>
		<comments>http://www.datamartist.com/datamartist-v1-2-now-available#comments</comments>
		<pubDate>Tue, 02 Mar 2010 14:45:33 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4266</guid>
		<description><![CDATA[nModal solutions is pleased to announce that Datamartist V1.2 is now available. In this version, we've introduced a Standard and Pro edition, letting customers get the features they need at the right price. Datamartist Standard: $349 Datamartist Professional: $745 A comparison of the feature sets explains the details. What's new in V1.2 Data source import [...]]]></description>
			<content:encoded><![CDATA[<p>nModal solutions is pleased to announce that Datamartist V1.2 is now available.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/02/Sales-example-full-screen-shot-profiler-perspective-300w.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Sales-example-full-screen-shot-profiler-perspective-300w.jpg" alt="" title="Sales-example-full-screen-shot-profiler-perspective-300w" width="300" height="228" class="alignright size-full wp-image-4302" /></a>In this version, we've introduced a Standard and Pro edition, letting customers get the features they need at the right price. </p>
<ul>
<li>Datamartist Standard:       $349</li>
<li>Datamartist Professional:   $745</li>
</ul>
<p>A <a href="/product/datamartist-pricing-and-edition-comparison">comparison of the feature sets explains</a> the details.</p>
<h1>What's new in V1.2</h1>
<h2>Data source import enhancements</h2>
<ul style="margin-top:10px;">
<li>Ability to cut and paste between Excel, Text files, the Datamartist canvas and any Datamartist data viewer.</li>
<li>New integrated data source repository with drag and drop to canvas.</li>
<li>SQL Editor to allow the creation of SQL queries to get data from databases.</li>
</ul>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/02/Edit-SQL-Datamartist1.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Edit-SQL-Datamartist1.jpg" alt="" title="Edit-SQL-Datamartist" width="609" height="342" class="aligncenter size-full wp-image-4307" /></a></p>
<h2>Running Datamartist canvases automatically</h2>
<p>Now that Datamartist can be run from the command line, it is possible to schedule Datamartist transforms- even on a Windows server.  Details about the logging and options <a href="/resources/datamartist-doc-files/V1_0_Documentation/DM-running-from-cmd-line-Doc.html">are here</a>.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2010/02/Running-datamartist-from-the-command-line-610w.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Running-datamartist-from-the-command-line-610w.jpg" alt="" title="Running-datamartist-from-the-command-line-610w" width="610" height="308" class="aligncenter size-full wp-image-4310" /></a></p>
<h2>Edit internal data sets</h2>
<p>The addition of fully editable internal data sets that are stored within the DMC file itself gives a powerful new ability to create "What if" type scenarios.  Imagine you want to see the effect of changing the sales regions slightly: just copy and paste the existing data from a data viewer onto the canvas, which gives you an internal data set block with that data in it.  Now you can add a column "New Region" or rename the column, edit some values, join it back into the original data with a join block, and be trying different scenarios in no time.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2010/02/Internal-edit-regions-list.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Internal-edit-regions-list.jpg" alt="" title="Internal-edit-regions-list" width="547" height="389" class="aligncenter size-full wp-image-4313" /></a></p>
<p>We're excited about this new release, and thanks to all our customers and testers for their feedback- we're glad to be incorporating some of those great ideas into the product.</p>
<p>If you haven't tried Datamartist yet, <a href="/downloads">this is the perfect time</a>, and now with two editions to choose from you can get the features you need at the right price.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datamartist-v1-2-now-available/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding self serve data transformation to reduce shadow systems</title>
		<link>http://www.datamartist.com/adding-self-serve-data-transformation-to-reduce-shadow-systems</link>
		<comments>http://www.datamartist.com/adding-self-serve-data-transformation-to-reduce-shadow-systems#comments</comments>
		<pubDate>Sat, 07 Nov 2009 16:04:36 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Spreadmarts]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>
		<category><![CDATA[Personal data mart]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=2203</guid>
		<description><![CDATA[Do you have lots of unofficial spreadsheets in your organization being used for data analysis? Is the data warehouse use low to non-existent, yet somehow lots of data is appearing in PowerPoint presentations and Excel spreadsheets all over the company? I believe a key to understanding how information moves around your organization is to [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2009/11/spreadsheet-data-is-official-its-just-seasoned1.jpg" alt="spreadsheet-data-is-official-its-just-seasoned" title="spreadsheet-data-is-official-its-just-seasoned" width="376" height="291" class="alignright size-full wp-image-3424" />Do you have lots of unofficial spreadsheets in your organization being used for data analysis? Is the data warehouse use low to non-existent, yet somehow lots of data is appearing in PowerPoint presentations and Excel spreadsheets all over the company?</p>
<p>I believe a key to understanding how information moves around your organization is to think of it as a mini economy. (I know, the economy is not our favorite subject right now, but bear with me).</p>
<p>There are information suppliers, and information consumers.  The consumers are willing to pay more or less for different types of information, and different methods of supplying information have different costs.  In the end, the market decides what gets done and what does not get done.</p>
<p>And like many markets, there is also an underground economy- places consumers go if the official prices don't make sense, or the products they want are not available on the open market.</p>
<p>In many companies, the IT department in theory has a monopoly on information supply; however, the underground is active and constitutes a significant supply.  The underground in this case is all the Excel spreadsheets, the MS Access databases etc. used to make the shadow systems and spreadmarts.  Spreadmarts seem to exist in the majority of enterprises- I've mentioned an <a href="/spreadmarts-and-data-shadow-systems-the-debate/" target="_blank">interesting study regarding these shadow systems</a> previously, and the attitudes people have.</p>
<h2>To help illustrate this I am going to make up some data and put it in colorful graphs.</h2>
<p><img src="/wp-content/uploads/2009/05/relative-cost-data-warehouse-data-mart-spreadmart2.jpg" alt="relative-cost-data-warehouse-data-mart-spreadmart2" title="relative-cost-data-warehouse-data-mart-spreadmart2" width="376" height="221" class="alignleft size-full wp-image-2211"/></p>
<p>Looking at the first graph, in broad terms a data warehouse based approach will have higher costs than one based on data marts (because data warehouses provide more cross-enterprise integration, which requires more effort), and the spreadmarts will have the lowest perceived cost.  It's important to note that the actual cost of spreadmarts is higher, but <strong>perceived</strong> cost is what drives the consumer's choice.</p>
<p>The trick is that because the perceived cost of spreadmarts is so low, and because there is no sanctioned enterprise solution to compete, a significant amount of effort is put into these systems for any type of analysis that is perceived to be possible.  Of course for certain data volumes or complexities there is no alternative to a full-fledged data warehouse or data mart project, but for almost everything else, business users and analysts will often try to go it alone, creating a chaos of spreadsheets and databases.</p>
<p>The problem is, even "experts" can't accurately estimate how much effort data analysis will take.  So estimates by non-experts of how long it will take to "whip it up in Excel" are almost always low by orders of magnitude.</p>
<h2>Don't dictate.  Engage with sanctioned tools that work the way people want to work.</h2>
<p>The key to adjusting this market imbalance is to introduce a new sanctioned product line, in effect undercutting the "black market".<br />
<img src="/wp-content/uploads/2009/05/relative-cost-data-warehouse-data-mart-spreadmart-plus-self-serve.jpg" alt="relative-cost-data-warehouse-data-mart-spreadmart-plus-self-serve" title="relative-cost-data-warehouse-data-mart-spreadmart-plus-self-serve" width="470" height="274" class="alignnone size-full wp-image-2213" /></p>
<p>This is exactly what self serve data transformation is about.  Rather than leaving users to do it themselves in Excel- IT can provide specific tools, and thereby reduce the amount of completely opaque data transformation going on, while still providing users with the ability to get what they need. </p>
<h2>So why is that better?</h2>
<ul>
<li><strong>It opens up the dialog</strong> -  Talking is better than having an "Us" vs "Them" mentality.  It lets you meet the people involved, lets you discuss their challenges with them, and provides an opening for discussion of important topics like data quality, master data management and data security.</li>
<li><strong>You'll know who the power users are</strong> -  Right now, it is potentially anyone who has Excel- chances are that's everyone in your organisation.</li>
<li><strong>It gives you visibility on what matters to the business</strong> - If you know what the hot topics are, it can help you keep the official systems relevant and prioritize your efforts where they will do the most good.</li>
</ul>
<p>What has to be different in this new relationship, however, is that IT has to understand about the "self" in self-serve.  People will do things that no self-respecting ETL developer or data warehouse architect would ever sanction.  If you clamp down and stop them, they will abandon the tools and return to the wild west.  IT believes that it has the power in the relationship, but in fact the users are able to walk at any time.  So add value, communicate, educate, but don't dictate.  If your relationship with the business users, and the "Kings of the spreadmart" is poor to start, you have to give it time to evolve.</p>
<h2>"But we just can't let them do that."</h2>
<p>Resist the urge to clamp down.</p>
<p>Keep your systems secure, guard your infrastructure, but don't have any illusions that you can stop people from analyzing and transforming their data.</p>
<p>If they want to calculate net sales in a particular way then they'll do it in Excel, and it will be the number that the CEO sees.  The business is made up of grownups, after all.  IT has a responsibility to explain the issues and challenges that shadow systems and rampant spreadsheeting can cause, but I have yet to see or hear of a company where an authoritarian approach works.   As Princess Leia said- <a href="http://www.entertonement.com/clips/qswvtcydps--Star-Wars-Episode-IV-A-New-Hope-Carrie-Fisher-Princess-Leia-Organa-The-more-you-tighten-your-grip-Tarkin-the-more-star-systems-will-slip-through-your-fingers">"The more you tighten your grip, Tarkin, the more star systems will slip through your fingers."</a></p>
<h2>Arming the rebels</h2>
<p>The business intelligence vendors are all realizing what the crowd pleasers are-  really good integration into office applications, with Excel at the forefront.  People want at their data.</p>
<p>Microsoft has of course long provided the main weapons for the shadow systems, MS Excel and MS Access- and they are going nuclear with the addition of "Power Pivot" to Excel 2010-  although it is largely a presentation layer tool, and probably won't be used widely for data transformation itself.</p>
<p>Trying to fight all this with the standard tools of closing down the ability to export data, hiring an army of report writers, and constantly raving about the dangers and pitfalls of runaway spreadsheets is like pushing on a rope.  </p>
<h2>Provide a safe, legal alternative to the free for all.</h2>
<p>Talk to your business users.  Understand their needs.  Provide them with tools.  Work with them to both empower responsible analysts, and avoid the worst issues that existing shadow systems are creating.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/adding-self-serve-data-transformation-to-reduce-shadow-systems/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data to the people- why self serve ETL</title>
		<link>http://www.datamartist.com/data-to-the-people-why-self-serve-etl</link>
		<comments>http://www.datamartist.com/data-to-the-people-why-self-serve-etl#comments</comments>
		<pubDate>Tue, 21 Jul 2009 17:11:37 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Analyst tools]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=2866</guid>
		<description><![CDATA[As regular readers of this blog know, I believe in a balance between formal and informal data analysis tools. I believe in an approach that firmly places people in the center of a new way of looking at the data analysis process. In the past, “big business intelligence” created an infrastructure heavy, highly centralised and [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2009/07/you-have-used-unautorised-data-transformation.jpg" alt="you-have-used-unautorised-data-transformation" title="you-have-used-unautorised-data-transformation" width="403" height="354" class="alignright size-full wp-image-2887" />As regular readers of this blog know, I believe in a balance between formal and informal data analysis tools.</p>
<p>I believe in an approach that firmly places people in the center of a new way of looking at the data analysis process.</p>
<p>In the past, “big business intelligence” created an infrastructure heavy, highly centralised and technology focused approach to getting data from source systems into reports in the hands of the users.  Under this regime, users were not to be trusted with raw data, but were given tightly controlled, managed and aggregated reports in order to protect the “single version of the truth”.</p>
<blockquote><p>The theory and practice were tightly defined, and had been honed over decades of business intelligence and data warehouse orthodoxy.   Giving raw data to end users would lead to chaos. Letting end users define new ways to look at the data would corrupt the master data, and lead to everyone looking at something different.</p></blockquote>
<p>You can guess the  <a href="http://datadoodle.com/2009/07/16/just-give-me-the-data/" target="_blank">sort of response</a> this "don't give them the raw data" approach gets from capable, curious people that want to get down to some real analysis.  </p>
<p>But to be fair, you can see why these concerns are thought to be well founded.  Almost every large enterprise is awash in a sea of Excel files and a tangle of links and formulas.  Excel is a wonderful tool, but it only offers the illusion of solving the data transformation problem.  It is a much better reporting/dashboard tool than an ETL tool. (Although in the right hands it can do remarkable things.)</p>
<p>And this is the true state of affairs now.  When the “official” system does not provide the answers that the business needs, the people who need to make decisions get the data anyway, and they do it themselves. They do it in Excel, they take night courses in Structured Query Language (SQL), and they hire consultants (or even summer students) to build rogue databases that they run on servers hidden under desks to get at the answers they need.</p>
<p>It is easy for the data warehouse theorists to highlight the clear issues with "spreadmarts" and "shadow systems".  </p>
<p>But we need to be pragmatic. A centralized structure that imposes strict formal rules and change management processes often does ensure that there is only one version of the truth, but it is a version that no one can use: it has been so formalized, aggregated, compromised and delayed that by the time it is delivered, the pressing business questions have changed and the meaning has been expunged.  The data warehouse becomes reporting rather than analysis.</p>
<p>It's clear that enterprises need this kind of reporting- I'm not advocating abandoning the existing approach, but augmenting it.  Up till now, the solution has often been "more of the same".</p>
<blockquote><p>The regime decided that the solution was to add more technology to the central systems, increase enforcement, and search out and repress all the dissident data manipulators.  The data resistance was forced to go underground, to hide their spreadsheets, to outwardly appear to be following the official line.</p></blockquote>
<p>It is very true that there are some risks in allowing people to analyze their own data, but there is also a reward.  There is a small group of people who love data, who understand the business questions, who work to tease insight out of a steaming pile of raw data and can find things that are game changing.  Massive, formal, designed-by-committee data warehouses can deliver a powerful and useful view of things, but they rarely offer flashes of insight.  When they do, it is often during the design and discovery process- rarely by users using the system after it has gone live.</p>
<p>The <a href="/product">Datamartist tool</a> has been built based on the belief that both formal, centralized systems AND local, personal data transformation have a place in the architecture and that both should be official places.</p>
<p>People can be trusted with the data.  In fact I think for an organisation to truly be successful at mastering its information, they have to be.</p>
<p>We have to realize that we can't allow our obsession with the quest for a single version of the truth to turn us into totalitarian regimes, certain that OUR truth is THE truth, and that messing around with the data is by its very nature subversive and dangerous.</p>
<p>Data to the people.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-to-the-people-why-self-serve-etl/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MS Access query example and comparison to Datamartist</title>
		<link>http://www.datamartist.com/microsoft-access-query-example-and-comparision-to-datamartist</link>
		<comments>http://www.datamartist.com/microsoft-access-query-example-and-comparision-to-datamartist#comments</comments>
		<pubDate>Tue, 31 Mar 2009 22:59:55 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Microsoft Excel]]></category>
		<category><![CDATA[MS Access]]></category>
		<category><![CDATA[Access]]></category>
		<category><![CDATA[Data Mart Example]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Personal data mart]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=1321</guid>
		<description><![CDATA[Microsoft Access allows users to create complex queries and analyze large data sets. However, it can be complicated to use compared to Excel. In this post, I'll talk about MS Access queries and the equivalent way to perform the same data transformation in the Datamartist tool- visually and simply. Microsoft Access has a clear role [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft Access allows users to create complex queries and analyze large data sets.  However, it can be complicated to use compared to Excel.  In this post, I'll talk about <a href="/help-support/tutorials/microsoft-access-examples-and-tutorials">MS Access queries</a> and the equivalent way to perform the same data transformation in the <a href="/product">Datamartist tool</a>- visually and simply.</p>
<p>Microsoft Access has a clear role to play when a small, light database application is required.  However, it has a learning curve, and is not necessarily the best tool for data analysis.</p>
<h2>Product Segmentation Query Example</h2>
<p>Let's look at an example MS Access query or two and see how we can do the same thing in Datamartist, only without the queries and without any SQL. For this example, let's say that we have two sets of sales data from different time periods, and a product list, and we want to define some product segments based on color and price.  We want a summary of the sales quantity and average price sold by month, broken out by the new segments, which are as follows:</p>
<ul>
<li> "Red and High Priced" If the product is Red and its minimum price is more than $1000</li>
<li> "Red Low Price wide price range" If the product is Red, has a minimum price less than $1000 but has a min to max price of more than $200</li>
<li> "Red Low Price small price range" If it's Red and not in the first two segments</li>
<li> "Yellow" if the product is yellow. </li>
<li> "Other" for all the rest</li>
</ul>
<p>The three data tables we have are as follows:</p>
<ol>
<li> Sales 03-06 with about 120 000 rows, which contains sales data from 2003 - 2006</li>
<li> Sales 2007  with about 30 000 rows, which contains sales data for 2007</li>
<li> Products  which contains the colors for all the products and their minimum and maximum prices</li>
</ol>
<p>So the first step is to combine the two data tables. In Access, this is done with a UNION query with the following SQL code:</p>
<blockquote><p>select * from [Sales Data 03-06] UNION select * from [Sales Data 2007];</p></blockquote>
<p>In Datamartist, we simply connect the two tables up to a combine block.<br />
<img src="/wp-content/uploads/2009/03/segmentation-example-datamartist-combine1.jpg" alt="segmentation-example-datamartist-combine1" title="segmentation-example-datamartist-combine1" width="264" height="234" class="alignnone size-full wp-image-1394" /></p>
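<p>One detail worth knowing about the UNION step: in standard SQL, a plain UNION removes duplicate rows, while UNION ALL keeps every row, so two identical sales lines would be collapsed into one by the former. The sketch below illustrates the difference using Python's built-in sqlite3 module as a stand-in for Access. The table and column names (sales_03_06, sales_2007, product_id, qty) are made up for the illustration and are not taken from the example database.</p>

```python
import sqlite3


def combine_counts():
    """Return (row count with UNION, row count with UNION ALL)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical miniature versions of the two sales tables.
    conn.executescript(
        """
        CREATE TABLE sales_03_06 (product_id INTEGER, qty INTEGER);
        CREATE TABLE sales_2007 (product_id INTEGER, qty INTEGER);
        INSERT INTO sales_03_06 VALUES (1, 5), (1, 5), (2, 3);
        INSERT INTO sales_2007 VALUES (2, 3), (3, 7);
        """
    )
    # UNION keeps only distinct rows across both tables.
    union_rows = conn.execute(
        "SELECT * FROM sales_03_06 UNION SELECT * FROM sales_2007"
    ).fetchall()
    # UNION ALL keeps every row, including exact duplicates.
    union_all_rows = conn.execute(
        "SELECT * FROM sales_03_06 UNION ALL SELECT * FROM sales_2007"
    ).fetchall()
    conn.close()
    return len(union_rows), len(union_all_rows)


print(combine_counts())
```

<p>If the sales tables can legitimately contain identical rows, it may be worth checking the combined row count against the sum of the two source tables.</p>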
<p>Next, we need to define the segmentation-  again in Access this is done with a Query, this time by nesting IIF statements to add a new column called "Product_Segment" to the resulting query.</p>
<blockquote><p>SELECT Products.Product_ID, Products.Product_Name, Products.Product_Group, Products.Product_Category, Products.Product_SubCategory, Products.Shipping_Weight, Products.Color, Products.Price_Min, Products.Price_Max, IIf([Color]="Red" And [Price_Min]>1000,"Red and High Priced",IIf([Color]="Red" And ([Price_max]-[Price_min])>200,"Red Low Price wide price range",IIf([Color]="Red","Red Low Price small price range",IIf([Color]="Yellow","Yellow","Other")))) AS Product_Segment<br />
FROM Products;</p></blockquote>
<p>In Datamartist, we use a segmentation block to do the same thing.  The interface is graphical, and the syntax is the same as you would use in Excel.  There is no need to nest any IF statements, because the overall block is designed to do that.  Here's what the blocks look like-  the MS Access import block on the left, and the segmentation rule block on the right.<br />
<img src="/wp-content/uploads/2009/03/segmentation-example-datamartist-segment-block.jpg" alt="segmentation-example-datamartist-segment-block" title="segmentation-example-datamartist-segment-block" width="418" height="211" class="alignnone size-full wp-image-1428" /><br />
Each segment has a statement that defines whether a row is in the segment or not.   The block tests each segment rule in order, starting at the top- the first statement that evaluates to "TRUE" defines the value of the Product_Segment column for that row. Dragging the segments up and down changes the order in which the rules are checked.</p>
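<p>The "first rule that is true wins" behaviour can be sketched in a few lines of Python. This is only an illustration of the ordering logic, using the same conditions as the nested IIF query above. It is not how Datamartist implements the block, and the function name product_segment is made up.</p>

```python
def product_segment(color, price_min, price_max):
    """Return the name of the first segment whose rule matches the product."""
    # Rules are checked top to bottom, mirroring the order of the nested IIFs.
    rules = [
        ("Red and High Priced",
         lambda: color == "Red" and price_min > 1000),
        ("Red Low Price wide price range",
         lambda: color == "Red" and (price_max - price_min) > 200),
        ("Red Low Price small price range",
         lambda: color == "Red"),
        ("Yellow",
         lambda: color == "Yellow"),
    ]
    for name, matches in rules:
        if matches():
            return name
    # Nothing matched, so the product falls into the catch-all segment.
    return "Other"


# This product also satisfies the "wide price range" rule,
# but the higher-placed rule is checked first and wins.
print(product_segment("Red", 1200, 1500))
```

<p>Reordering the entries in the rules list has the same effect as dragging segments up and down: a different rule gets the first chance to claim each row.</p>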
<p><a href="/resources/images/Segmentation-Example-Product.jpg" target="_blank" onClick="javascript: pageTracker._trackPageview('/screenshots/Segmentation-Example-Product'); "><img src="/resources/images/Segmentation-Example-Product-Thumb.jpg" /><br />
<span style="padding:8px;">(Click to Enlarge)</span></a></p>
<p>Then we have to Join this new product dimension (with the segmentation column) to the sales data, and summarize.</p>
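<p>As a rough sketch of what "join and summarize" means here, the Python below looks up each sales row's segment and totals quantity and average price by month and segment. The row layout (month, product_id, qty, unit_price) is hypothetical, "average price sold" is assumed to be a quantity-weighted average, and sales rows with no matching product are dropped, as an inner join would do.</p>

```python
from collections import defaultdict


def summarize(sales, segment_by_product):
    """Total qty and quantity-weighted average price per (month, segment)."""
    # (month, segment) -> [total qty, total revenue]
    totals = defaultdict(lambda: [0, 0.0])
    for month, product_id, qty, unit_price in sales:
        segment = segment_by_product.get(product_id)
        if segment is None:
            # No matching product row: skipped, like an inner join.
            continue
        entry = totals[(month, segment)]
        entry[0] += qty
        entry[1] += qty * unit_price
    # Groups with zero total quantity are left out to avoid dividing by zero.
    return {
        key: (qty, revenue / qty)
        for key, (qty, revenue) in totals.items()
        if qty
    }


# Hypothetical sample rows: (month, product_id, qty, unit_price)
sample_sales = [
    ("2007-01", 1, 2, 100.0),
    ("2007-01", 1, 2, 200.0),
    ("2007-01", 2, 1, 50.0),
    ("2007-01", 99, 4, 10.0),  # unknown product, dropped by the join
]
sample_segments = {1: "Yellow", 2: "Other"}
print(summarize(sample_sales, sample_segments))
```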
<p>In MS Access, this is done with more queries-  Here's what Access looks like when we're done.<br />
<img src="/wp-content/uploads/2009/03/segmentation-example-access-gui1.jpg" alt="segmentation-example-access-gui1" title="segmentation-example-access-gui1" width="450" height="485" class="alignnone size-full wp-image-1405" /><br />
Compare that list of tables and queries to the visual, left-to-right layout of the Datamartist data canvas, which does the same thing without ever having to write any SQL code:</p>
<h2>The VISUAL way to do it</h2>
<p><img src="/wp-content/uploads/2009/03/segmentation-example-solved-canvas.jpg" alt="segmentation-example-solved-canvas" title="segmentation-example-solved-canvas" width="406" height="314" class="alignnone size-full wp-image-1403" /></p>
<p><a href="/resources/images/Segmentation-Example-Datamartist-full-app-shot.jpg" target="_blank" onClick="javascript: pageTracker._trackPageview('/screenshots/Segmentation-Example-Datamartist-full-app-shot'); "><img src="/resources/images/Segmentation-Example-Datamartist-full-app-shot-Thumb.jpg" class="alignright size-full wp-image-1430" ></a><br />
In Datamartist you can see the flow of the data, the row counts are clearly displayed, and clicking on the connectors will bring up the underlying data set in the data viewer.  It's clear which block feeds which, and by adding more blocks and connecting them at the desired point in the data flow, new analysis can be created.</p>
<p>Take Datamartist for a trial run-  <a href="/downloads">download it now</a> because maybe you don't have to learn Microsoft Access queries after all.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/microsoft-access-query-example-and-comparision-to-datamartist/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Creating a Fact Table with the Vendor dimension Purchasing DM (Part 2)</title>
		<link>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2</link>
		<comments>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2#comments</comments>
		<pubDate>Fri, 06 Feb 2009 00:23:50 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Data Mart Example]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Excel Performance]]></category>
		<category><![CDATA[Purchasing Data Warehouse]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=781</guid>
		<description><![CDATA[In creating a data warehouse or data mart data model there are two key types of tables- fact tables and dimension tables. Fact tables hold the data to be analyzed, dimensional tables provide categories and analysis values that organize the data. So we have our mission from Part 1: to analyze the "Acme does everything" [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/02/four_million_rows_no_worries1.jpg" alt="four_million_rows_no_worries1" title="four_million_rows_no_worries1" width="300" height="136" class="alignright size-full wp-image-812" />In creating a data warehouse or data mart data model there are two key types of tables- fact tables and dimension tables.  Fact tables hold the data to be analyzed; dimension tables provide the categories and analysis values that organize the data.<br />
So we have our <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">mission from Part 1</a>: to analyze the "Acme does everything" company's purchasing data and find ways to save money.  The first step, however, is getting a handle on the data.  The IT department has given us the files, and with a smug smile told us to "have fun".  We've been given three files that are a snapshot of the purchasing data:</p>
<ul>
<li><strong>Item_Master.txt</strong>  - this holds all the items that Acme buys</li>
<li><strong>Vendor_Master.txt</strong> - this holds a list of all the vendors, with information such as their address</li>
<li><strong>PO_Detail.txt</strong> - this is the huge data set, all the purchase order data for the last four years</li>
</ul>
<p>The Item and Vendor files aren't very big, but the PO_Detail file is over 340 MB, and it holds almost four million purchase order lines.  Don't try to import it into Excel. Of course you need Excel 2007 to even try to import 4 million rows. In Excel 2003 it would take over sixty sheets and probably some VBA code to try it.  I tried the import in Excel 2007- it takes 20 seconds just to tell me I'll have to go back to the text file import multiple times to do multiple imports onto separate sheets. It took almost two minutes to do the first million rows.  Even once we have the data spread across four sheets it's not clear how to summarize millions of rows in Excel easily.<img src="/wp-content/uploads/2009/02/po_detail_columns.jpg" alt="po_detail_columns" title="po_detail_columns" width="247" height="398" class="alignright size-full wp-image-785" /></p>
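<p>The sheet counts follow from the per-worksheet row limits: 65,536 rows in Excel 2003 and 1,048,576 rows in Excel 2007. A quick back-of-the-envelope check in Python, rounding the file up to an even four million rows (an assumption) and ignoring header rows:</p>

```python
import math

ROWS_TO_IMPORT = 4_000_000       # assumption: "almost four million" rounded up
EXCEL_2003_MAX_ROWS = 65_536     # rows per worksheet in Excel 2003
EXCEL_2007_MAX_ROWS = 1_048_576  # rows per worksheet in Excel 2007


def sheets_needed(total_rows, rows_per_sheet):
    """Smallest number of worksheets that can hold total_rows."""
    return math.ceil(total_rows / rows_per_sheet)


print(sheets_needed(ROWS_TO_IMPORT, EXCEL_2003_MAX_ROWS))  # 62
print(sheets_needed(ROWS_TO_IMPORT, EXCEL_2007_MAX_ROWS))  # 4
```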
<p>Instead, let's use the <a href="/product">Datamartist tool</a> to manage this data set and generate one that's more useful.</p>
<p>The first analysis we will do will be on the Vendor dimension, to determine who Acme's big vendors are, and if we can negotiate some price reductions where we have leverage.</p>
<p>In Datamartist, very large files are not an issue because the tool can load in only preview data- this means that it's possible to look at a sampling of a few hundred thousand rows, and design the transformation before running it on the whole data set.</p>
<p>The PO Detail file has the columns shown- let's answer the question "Who are our biggest suppliers?"<br />
So which columns do we need?  We probably want to have some sense of trends over time so we'll keep the <strong>order date</strong>, but summarize to <strong>Month</strong>,  we'll keep the <strong>Vendor ID</strong> of course, and then we need to use the <strong>Quantity and Price</strong> fields to calculate the total amount spent.  Then we want to write this summarized data into Excel to check it out.</p>
<p>To do this in Datamartist all it takes is four simple blocks:  a Text import block to load in the PO_Detail.txt file, a Calculate block to multiply QTY by PRICE, a Summarize block to do all the summarizing, and an Excel export block to generate the Excel file:</p>
<p><img src="/wp-content/uploads/2009/02/po_detail_summarize_blocks.jpg" alt="po_detail_summarize_blocks" title="po_detail_summarize_blocks" width="463" height="92" class="alignnone size-full wp-image-806" /></p>
<p>Each block passes its result to the next block via the connectors, and the last block saves it to an Excel file we've specified.</p>
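<p>To make the four steps concrete, here is a rough Python equivalent of the same flow: import, calculate, summarize, export. It is a sketch under several assumptions, not a description of what Datamartist does internally. The column names (<code>ORDER_DATE</code>, <code>VENDOR_ID</code>, <code>QTY</code>, <code>PRICE</code>), the date format and the comma delimiter are guesses, a small inline sample stands in for PO_Detail.txt, and the export writes CSV (which Excel can open) rather than a native Excel file. The hypothetical <code>preview_rows</code> argument mimics designing on a preview before running on everything.</p>

```python
import csv
import io
from collections import defaultdict
from itertools import islice

# Stand-in for PO_Detail.txt. The column names, date format and comma
# delimiter are assumptions; the real file layout is not shown in the post.
SAMPLE = """ORDER_DATE,VENDOR_ID,QTY,PRICE
2008-01-15,V001,10,2.50
2008-01-20,V001,4,5.00
2008-01-22,V002,1,100.00
2008-02-03,V001,2,2.50
"""


def summarize_spend(lines, preview_rows=None):
    """Total QTY * PRICE by (month, vendor), optionally on a preview only."""
    reader = csv.DictReader(lines)                      # import step
    rows = islice(reader, preview_rows)                 # None means all rows
    totals = defaultdict(float)
    for row in rows:
        amount = int(row["QTY"]) * float(row["PRICE"])  # calculate step
        month = row["ORDER_DATE"][:7]                   # keep YYYY-MM only
        totals[(month, row["VENDOR_ID"])] += amount     # summarize step
    return dict(totals)


def export_csv(totals, out):
    """Write the summary as CSV, which Excel can open (export step)."""
    writer = csv.writer(out)
    writer.writerow(["MONTH", "VENDOR_ID", "TOTAL_SPEND"])
    for (month, vendor), total in sorted(totals.items()):
        writer.writerow([month, vendor, f"{total:.2f}"])


summary = summarize_spend(io.StringIO(SAMPLE))
export_csv(summary, io.StringIO())
print(summary)
```

<p>Because the rows are read one at a time and only the running totals are kept, the same loop would work on a file far too large to open in a spreadsheet.</p>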
<p>Defining the calculation uses standard spreadsheet functions- here's what the config area looks like:<br />
<img src="/wp-content/uploads/2009/02/calculate_total_closeup.jpg" alt="calculate_total_closeup" title="calculate_total_closeup" width="400" height="91" class="alignnone size-full wp-image-801" /></p>
<p>And defining the summary is as simple as it looks- pick the columns you want, and select what kind of summary you want done.<br />
<img src="/wp-content/uploads/2009/02/summary_block_closeup1.jpg" alt="summary_block_closeup1" title="summary_block_closeup1" width="417" height="111" class="alignnone size-full wp-image-797" /></p>
<p>We run it on a preview set of 100 thousand rows (takes about twelve seconds to run), and check the output.</p>
<p>It looks good, so we run on the whole 4 million rows:</p>
<p><img src="/wp-content/uploads/2009/02/summarize_progress_po_detail.jpg" alt="summarize_progress_po_detail" title="summarize_progress_po_detail" width="466" height="128" class="alignnone size-full wp-image-804" /></p>
<p>About seven minutes later we have our result- an Excel sheet with a manageable 130 thousand rows: total spend, by vendor, by month for four years.<br />
<img src="/wp-content/uploads/2009/02/completed_po_detail_summary.jpg" alt="completed_po_detail_summary" title="completed_po_detail_summary" width="461" height="95" class="alignnone size-full wp-image-807" /></p>
<p>Next up we need to create our vendor dimension, and join it to this mini fact table we have created.  Stay tuned.</p>
<p>This is part of a 5 part series- here are the links to the various parts: <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">1</a>, <a href="/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">2</a>, <a href="/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3">3</a>, <a href="/hierarchies-and-tree-structures-in-dimensions-an-example-item-dimension-part-4">4</a> and <a href="/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5">5</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Easy to use ETL</title>
		<link>http://www.datamartist.com/easy-to-use-etl</link>
		<comments>http://www.datamartist.com/easy-to-use-etl#comments</comments>
		<pubDate>Fri, 05 Dec 2008 21:32:02 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[BI Consolidation]]></category>
		<category><![CDATA[easy to use etl]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=530</guid>
		<description><![CDATA[As we've been creating Datamartist, we've been trying to avoid using acronyms to describe what it is, but when I'm talking to people who have a background in data warehousing, I only have to say "it's an easy to use desktop ETL tool", and suddenly they know what I am talking about. An Extract, Transform, Load (ETL) tool is [...]]]></description>
			<content:encoded><![CDATA[<p>As we've been creating Datamartist, we've been trying to avoid using acronyms to describe what it is, but when I'm talking to people who have a background in data warehousing, I only have to say "it's an easy to use desktop ETL tool", and suddenly they know what I am talking about.<a href="/wp-content/uploads/2008/12/etl-eating-money.jpg"><img class="alignright size-full wp-image-540" title="etl-eating-money" src="/wp-content/uploads/2008/12/etl-eating-money.jpg" alt="" width="350" height="221" /></a></p>
<p>An Extract, Transform, Load (ETL) tool is an intermediate software application that extracts the data from the source system, transforms it (often another way of saying it FIXES it) and then loads it into the destination system. ETL tools are also very expensive.</p>
<p>The destination system is usually a data warehouse or data mart, and most of the ETL tools available are server based.  The ETL tool and related development are key to any data warehouse project (and represent a third or more of the cost on a typical project).</p>
<p>Although most ETL tools use a visual interface of one sort or another, at the core they require programming skills and specialized knowledge. <a href="http://www.google.ca/search?hl=en&amp;q=datastage+training&amp;start=10&amp;sa=N" target="_blank">Google "datastage training"</a> and you'll see that a whole industry has grown up around learning how to use these tools.</p>
<p>But there's nothing magical about it. If you have ever made a spreadsheet with data from multiple sources, transformed the data, and then either made reports or moved the data into another spreadsheet, then you have made (or most likely were an integral human part of) an ETL. The problem is that out of the box tools like Excel and Access are so flexible that too much is possible.  Where to start?</p>
<p>The amazing thing is that EVERYONE needs ETL functionality, yet overwhelmingly the tools available are expensive, hard to learn and designed for the really, really heavy lifting.</p>
<p>Surely not every data manipulation task that is too much for Excel needs an enterprise ready server based ETL tool?  Particularly in the current economic environment, oversized solutions are not an option.</p>
<p>A hard working analyst that has a bit of data analysis to do, and nothing but Excel or maybe Access on his/her desktop is short on options and long on messy spreadsheets or the need to "learn SQL in 21 easy steps".<a href="/wp-content/uploads/2008/12/datamartist-etl.jpg"><img class="alignright size-full wp-image-552" title="datamartist-etl" src="/wp-content/uploads/2008/12/datamartist-etl.jpg" alt="" width="350" height="315" /></a></p>
<p>The vision behind Datamartist is to provide an easy to use, powerful, yet low cost data transformation tool, that guides users to generate well structured data analysis sets.  And all at a price that represents less than a single day of those consultants you have to hire to use the other software you paid too much for.</p>
<p>This is the perfect time to find out what easy, flexible, visual data transformation can be like-  <a href="/downloads">download now</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/easy-to-use-etl/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

