<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Personal Data Marts</title>
	<atom:link href="http://www.datamartist.com/tag/personal-data-marts/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Thu, 09 Feb 2012 20:00:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>MS Access vs Excel vs Datamartist</title>
		<link>http://www.datamartist.com/ms-access-vs-excel-vs-datamartist-a-do-it-yourself-guide</link>
		<comments>http://www.datamartist.com/ms-access-vs-excel-vs-datamartist-a-do-it-yourself-guide#comments</comments>
		<pubDate>Fri, 06 Mar 2009 02:33:06 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[MS Access]]></category>
		<category><![CDATA[MS Excel]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Personal Data Marts]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=1251</guid>
		<description><![CDATA[When data analysis requirements really get tough, the tough get going- and start to seriously use databases. Let's face it, if you're considering Microsoft Access, chances are what you need to get done is beyond what Excel does well, so you're looking for options. It's also likely that your IT department is unable or unwilling [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/03/excel-database-datamartist1.jpg" alt="excel-database-datamartist1" title="excel-database-datamartist1" width="200" height="183" class="alignright size-full wp-image-1301" />When data analysis requirements really get tough, the tough get going- and start to seriously use databases.</p>
<p>Let's face it, if you're considering Microsoft Access, chances are what you need to get done is beyond what Excel does well, so you're looking for options.  It's also likely that your IT department is unable or unwilling to help you out- this being even more likely as the recession reduces reporting budgets left, right and center.</p>
<p>Two of the key things that lead someone to search for a database solution are:</p>
<ul>
<li><strong>Data Volume</strong>- Beyond about a million rows Excel becomes very difficult to use, and performance suffers even before that point.</li>
<li><strong>Flexibility to Join Tables</strong> - Vlookup and VBA code only go so far- Access gives an easy way to make joins between tables, one of the powerful features of relational databases.</li>
</ul>
<p>Now, the data volume is what it is- if you have millions and millions of rows, you need something to cut it down to size before you move it into your Excel spreadsheet. </p>
<p>On the other point, however, I can hear the Excel fans saying "now wait a minute, Excel can do that, I don't really need a database" and they are right.  But they are almost always right- Excel can do almost anything. It does not mean, however, that it's the best tool for the job. Using Vlookup and VBA scripts to join up multiple tables is not my idea of a fun time. And even in Excel 2007 I find the pivot tables annoying and prone to break if I'm adding categories, moving data sets or, heaven forbid, changing the number and order of columns.</p>
<p>Microsoft Access has a very nice interface for creating joins between tables, just a simple drag and drop between fields. The cross tab query capability is useful and good, and being a relational database it's more tolerant of changes to table structure because it's not messing with cell references.</p>
<p>"But", many who have used MS Access will say, "it's pretty complex to learn, and even if I do start to get the query stuff down, it doesn't handle bad data well."</p>
<p>Bad data?  Who has bad data? Isn't all data pristine, as intended, correctly formatted and accurate?</p>
<p><img src="/wp-content/uploads/2009/03/enough-to-make-access-decide-its-text1.jpg" alt="enough-to-make-access-decide-its-text1" title="enough-to-make-access-decide-its-text1" width="210" height="225" class="alignright size-full wp-image-1289" />One of the huge differences between Excel and MS Access is that Excel is extremely flexible.  (Probably more flexible than your auditor would like, but that's a different story).  One source of Excel's flexibility is its ability to accept different data types in the same column, and to allow editing of cells quickly. Microsoft Access is less forgiving: when it sees mixed data types in a column, it either discards the data that does not fit or defaults the column to the data type "Text"- meaning now you can't perform the calculations you need to do on your data.<img src="/wp-content/uploads/2009/03/sales-data-import-errors.jpg" alt="sales-data-import-errors" title="sales-data-import-errors" width="365" height="232" class="alignright size-full wp-image-1289" /></p>
<p>This illustrates one of the challenges people face in trying to use a database - databases are very strict on data types.  Once you declare a data type for a column, if you import data into the table, the database will discard the values that do not conform to that data type.  In Excel, you get cell errors if you try calculations but the original data is still there.</p>
<p>One of the powerful features of the <a href="/product">Datamartist tool</a> is the fact that it has an underlying database structure that provides flexibility on data types.  Unlike MS Access and other databases, Datamartist can store dates, numbers, strings and booleans natively in a single column. (It does not convert to strings- it stores the full object).  Take a look at this example:<br />
<img src="/wp-content/uploads/2009/03/datamartist-dynamicly-handles-data-type-at-row-level1.jpg" alt="datamartist-dynamicly-handles-data-type-at-row-level1" title="datamartist-dynamicly-handles-data-type-at-row-level1" width="425" height="226" class="aligncenter size-full wp-image-1294" /></p>
<p>In each individual row, Datamartist completes the calculation if possible.  Datamartist is a database that gives you the freedom of a spreadsheet. Of course, just like Excel, if you ask for a calculation on a value that is meaningless you will get an error- but only for that individual value, not a discard of the full row.  This means that with messy data you can still work with it, bring it in, and fix it.  In Access or another database, you can't even get it through the front door (or it defaults to text, making many calculations impossible).</p>
<p>This won't be the last time I compare these three tools- and the types of data structures and tasks each of them is most effective with.</p>
<p>In the meantime- Download <a href="/downloads">Datamartist</a>- see what I'm talking about with your own data.</p>
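<p>To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is not Datamartist's or Access's actual implementation; the sample rows, the <code>line_total</code> helper and the <code>#ERROR</code> marker are all made up for illustration.</p>

```python
# Hypothetical sales rows of (quantity, unit price); some values are messy.
rows = [
    (2, 10.0),
    ("three", 5.0),
    (4, "n/a"),
    (1, 2.5),
]


def is_number(value):
    """True for ints and floats; bool is excluded because it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def line_total(quantity, unit_price):
    """Multiply when both values are numeric, otherwise return an error marker."""
    if is_number(quantity) and is_number(unit_price):
        return quantity * unit_price
    return "#ERROR"


# Per-value handling: every row is kept and only the bad calculations are flagged.
per_value = [line_total(q, p) for q, p in rows]

# Strict handling: rows that do not conform to the numeric type are dropped.
strict = [q * p for q, p in rows if is_number(q) and is_number(p)]

print(per_value)
print(strict)
```

<p>With per-value handling all four rows survive, so the two bad values can be found and fixed later. With strict handling they are simply gone.</p>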
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/ms-access-vs-excel-vs-datamartist-a-do-it-yourself-guide/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Connecting the dimension table to the fact table- Vendor Example (Part 3)</title>
		<link>http://www.datamartist.com/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3</link>
		<comments>http://www.datamartist.com/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3#comments</comments>
		<pubDate>Mon, 09 Feb 2009 20:47:55 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Data Mart Example]]></category>
		<category><![CDATA[Dimension Tables]]></category>
		<category><![CDATA[Duplicate Data]]></category>
		<category><![CDATA[Purchasing Data Warehouse]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=858</guid>
		<description><![CDATA[In parts one and two of this series we introduced our challenge (to make a data mart to analyze the Acme Company's spending) and showed how the Datamartist tool could import millions of rows of data and then turn it into a fact table we can use in Excel. Now we need to create a [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/02/makingdimseasyway.jpg" alt="makingdimseasyway" title="makingdimseasyway" width="250" height="97" class="alignright size-full wp-image-883" />In parts <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">one</a> and <a href="/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">two</a> of this series we introduced our challenge (to make a data mart to analyze the Acme Company's spending) and showed how the <a href="/product">Datamartist tool</a> could import millions of rows of data and then turn it into a fact table we can use in Excel.</p>
<p>Now we need to create a Vendor dimension table and join it to this fact table to determine who our big vendors are.</p>
<p>In Datamartist it is a simple task to create this vendor dimension. As always we use blocks and connect them together.  We define a dimension by using a reference definition block. All we have to do to configure the reference block is to specify which columns uniquely define the dimension (or almost uniquely: if you have some data glitches, Datamartist will resolve duplicate keys for you using a majority/first rule set).</p>
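<p>The exact rule set Datamartist applies is not spelled out here, so the following Python sketch is only one plausible reading of a "majority/first" rule: for each key, keep the value that occurs most often, and fall back to the first value seen when there is a tie. The vendor rows and the <code>resolve_duplicate_keys</code> function are hypothetical.</p>

```python
from collections import Counter

# Hypothetical vendor master rows with conflicting names for the same key.
vendor_rows = [
    {"Vendor_ID": "V001", "Name": "Acme Supplies"},
    {"Vendor_ID": "V001", "Name": "Acme Supplies"},
    {"Vendor_ID": "V001", "Name": "ACME Suplies"},
    {"Vendor_ID": "V002", "Name": "Bolt Co"},
    {"Vendor_ID": "V002", "Name": "Bolt Company"},
]


def resolve_duplicate_keys(rows, key, attribute):
    """Return one attribute value per key: the majority value, else the first seen."""
    values_by_key = {}
    for row in rows:
        values_by_key.setdefault(row[key], []).append(row[attribute])

    resolved = {}
    for key_value, values in values_by_key.items():
        # Counter.most_common lists equal counts in first-seen order,
        # so a tie resolves to the value that appeared first.
        resolved[key_value] = Counter(values).most_common(1)[0][0]
    return resolved


vendor_names = resolve_duplicate_keys(vendor_rows, "Vendor_ID", "Name")
print(vendor_names)
```

<p>Here V001 is settled by majority and V002, where each name appears once, by first occurrence.</p>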
<p>We start with an import block that brings in the Vendor master text file, then we define the reference by specifying "Vendor_ID" as the key.  These first two blocks look like this:<br />
<img src="/wp-content/uploads/2009/02/vendor-master-in-and-reference-block.jpg" alt="vendor-master-in-and-reference-block" title="vendor-master-in-and-reference-block" width="302" height="148" class="alignnone size-full wp-image-878" /></p>
<p>Then we join it to the fact table we created in part two of this series with a join block.  This means that now instead of just the vendor ID number that was in the fact table, we have the name, and address for the vendor in our mini star schema.</p>
<p><img src="/wp-content/uploads/2009/02/vendor-dimension-and-join.jpg" alt="vendor-dimension-and-join" title="vendor-dimension-and-join" width="436" height="283" class="alignnone size-full wp-image-879" /></p>
<p>And finally we put a summarize block after that to total up all the monthly values for each vendor, and we export to Excel. This is what the canvas looks like:<br />
<img src="/wp-content/uploads/2009/02/vendor-dimension-without-dedup1.jpg" alt="vendor-dimension-without-dedup1" title="vendor-dimension-without-dedup1" width="501" height="198" class="alignnone size-full wp-image-865" /><br />
After we do this, we grab the Excel file Datamartist just created for us, do a quick sort, and come up with a list of Acme's top ten suppliers.  Feeling pretty good about ourselves, we do a review with the head of purchasing.</p>
<p>"Where's Mega Brothers?" she says with a frown. "I think your data is screwy- no way that Mega Brothers didn't make the top ten- we spend a fortune on railways, and a lot of our freight goes with the Mega Brothers Rail company. Of course it is probably entered under different vendors, each location works with the office local to them... But we've got to view them as a single vendor in the data mart- you <em><strong>can</strong></em> do that, right?"</p>
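<p>For readers who think in code, the join, summarize and sort steps above correspond roughly to the Python sketch below. It is an illustration of the logic only, not of how Datamartist works internally, and the vendor names, cities and amounts are invented sample data.</p>

```python
from collections import defaultdict

# Hypothetical vendor dimension keyed by Vendor_ID.
vendor_dimension = {
    "V001": {"Name": "Acme Supplies", "City": "Toronto"},
    "V002": {"Name": "Bolt Co", "City": "Montreal"},
}

# Hypothetical fact rows: one row per vendor per month.
fact_rows = [
    {"Vendor_ID": "V001", "Month": "2009-01", "Amount": 1200},
    {"Vendor_ID": "V002", "Month": "2009-01", "Amount": 3000},
    {"Vendor_ID": "V001", "Month": "2009-02", "Amount": 800},
]

# Join: add the vendor's name and address details to each fact row.
joined = [{**row, **vendor_dimension[row["Vendor_ID"]]} for row in fact_rows]

# Summarize: total the monthly amounts for each vendor.
totals = defaultdict(int)
for row in joined:
    totals[row["Name"]] += row["Amount"]

# Sort: biggest vendors first.
top_vendors = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(top_vendors)
```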
<p><img src="/wp-content/uploads/2009/02/vendor-dimension-with-dedupe1.jpg" alt="vendor-dimension-with-dedupe1" title="vendor-dimension-with-dedupe1" width="300" height="205" class="alignright size-full wp-image-870" /></p>
<h2>Fixing Duplicate Rows</h2>
<p>  Having to deal with duplicate data is a very common issue in any type of data analysis.  So, back to the canvas.  By simply adding a de-duplicate block to our Vendor dimension table (after the Reference block, and before the join) we can find and resolve the Mega Brothers duplicates.<br />
We just use the filter to find the records (easy to do, looking for "Mega", "rail", "brothers" etc.) and we map them to a single instance.  This is the filter control that lets us find and tag the duplicates:<br />
<img src="/wp-content/uploads/2009/02/mega-bros-duplicates-in-picker1.jpg" alt="mega-bros-duplicates-in-picker1" title="mega-bros-duplicates-in-picker1" width="400" height="280" class="alignnone size-full wp-image-871" /></p>
<p><img src="/wp-content/uploads/2009/02/mega-bros-duplicates-in-mapper.jpg" alt="mega-bros-duplicates-in-mapper" title="mega-bros-duplicates-in-mapper" width="312" height="247" class="alignright size-full wp-image-872" />As we tag them, they show up in the mapper, which lets us see which duplicate records we have eliminated for the dimension. We run the canvas again, and this time, sure enough, Mega Brothers Rail is in our top ten.  But even though the head of purchasing knew it was a lot, this is actually the first time she's seen the number.  "Wow. I've got to give them a call- can you give me that in an Excel spreadsheet?"</p>
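<p>The idea behind the de-duplicate step can be sketched in a few lines of Python. This is an assumption-laden illustration, not the tool's implementation: the vendor IDs, names and spend figures are invented, and the keyword filter stands in for the tagging you would do by hand in the picker.</p>

```python
# Hypothetical vendor dimension containing several entries for one real vendor.
vendors = {
    "V010": "Mega Brothers Rail - East",
    "V011": "Mega Bros. Rail Co",
    "V012": "MEGA BROTHERS RAIL (West)",
    "V020": "Bolt Co",
}

# Hypothetical total spend per vendor ID from the fact table.
spend = {"V010": 400, "V011": 350, "V012": 300, "V020": 500}


def find_candidates(vendor_names, keywords):
    """Return the IDs whose name contains every keyword, ignoring case."""
    return sorted(
        vendor_id
        for vendor_id, name in vendor_names.items()
        if all(keyword in name.lower() for keyword in keywords)
    )


# Find the duplicates and map them all to a single surviving ID.
duplicates = find_candidates(vendors, ["mega", "rail"])
survivor = duplicates[0]
id_map = {vendor_id: survivor for vendor_id in duplicates}

# Re-total the spend using the mapped IDs.
merged_spend = {}
for vendor_id, amount in spend.items():
    target = id_map.get(vendor_id, vendor_id)
    merged_spend[target] = merged_spend.get(target, 0) + amount

print(merged_spend)
```

<p>Before the mapping, no single Mega Brothers entry beats Bolt Co. After it, the combined vendor comes out on top, which is exactly the effect seen in the top ten list.</p>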
<p>Stay tuned, more to come as we go further into Datamartist's ability to segment, filter and organize large data sets.</p>
<p>If you want to see the interface in action watch our first <a href="/product/video-and-screenshots/introductory-tutorial-video">Tutorial Video</a>.  Or just get right to it with your own data- <a href="/downloads">download the free trial now</a>- there is no registration required, and it installs in minutes.</p>
<p>This is part of a five-part series- here are the links to the various parts: <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">1</a>, <a href="/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">2</a>, <a href="/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3">3</a>, <a href="/hierarchies-and-tree-structures-in-dimensions-an-example-item-dimension-part-4">4</a> and <a href="/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5">5</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Dimensional Tables and Fact Tables</title>
		<link>http://www.datamartist.com/dimensional-tables-and-fact-tables</link>
		<comments>http://www.datamartist.com/dimensional-tables-and-fact-tables#comments</comments>
		<pubDate>Fri, 31 Oct 2008 02:41:21 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Dimension Tables]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=276</guid>
		<description><![CDATA[One of the secrets to putting together a good set of data marts is the concept of dimensions.  There are two key steps to being able to analyse your data, and to build a working data mart model.   Build a set of clean, consistent dimension tables that store reference information about your key dimensions like Product, [...]]]></description>
			<content:encoded><![CDATA[<p>One of the secrets to putting together a good set of data marts is the concept of dimensions.  There are two key steps to being able to analyse your data, and to build a working data mart model.  </p>
<ul>
<li>Build a set of clean, consistent dimension tables that store reference information about your key dimensions like Product, Customer, Geographical Areas, Sales Areas etc.</li>
<li>Join them up to a fact table that does NOT have dimensional data in it.  Just the facts, ma’am.</li>
</ul>
<p>Usually, to make a proper star schema data mart, it is necessary to transform the source data set, removing dimensional data, and generating a fact set.  The dimensional data that is removed must be transformed to remove duplicate rows and to resolve any data quality issues that might exist.  Transactional systems don't know about dimensions- but you do.</p>
<p>A key part of the data modelling is to determine which fields in the source data should be put in the Dimensional tables and which fields should go to the Fact table.<a href="/wp-content/uploads/2008/10/dimensiontablevsfacttable2.jpg"><img class="alignright size-medium wp-image-280" title="dimensiontablevsfacttable2" src="/wp-content/uploads/2008/10/dimensiontablevsfacttable2-300x150.jpg" alt="" width="300" height="150" /></a></p>
<h2>Determining the Grain of the fact table</h2>
<p>The very first step is to determine what exactly one fact in our fact table is going to be.  The GRAIN or GRANULARITY of the fact table refers to the level of detail of each row in the fact table.  For example, an order fact table might have a grain of order, with one row per order, or order line, with a row for every line on each order (meaning more than one line for some orders).  It is key to make a decision on the grain of the fact table first.  This is often a balance between keeping detail, and managing complexity.<br />
This is a key question, and the answer is driven by what it is you want to analyse.  For example, if the decision is made to have a granularity of one row per order, then it might be necessary to remove all product information (since any given order might have multiple products) and only have total order value.  This won’t work if you want to analyse product segments, or compare different products.<br />
To have our cake and eat it too, we’ll use a simplified example of order data where the grain is one row equals one order and every order in our system has one and only one product.  This table has the following columns:</p>
<blockquote><p>Order Number, Order Date, Ship Date, Customer Name, Customer Segment, Product Name, Product Category, Product Sub Category, Quantity Sold, Unit Price</p></blockquote>
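<p>The trade-off between an order-line grain and an order grain can be seen in a small Python sketch. The order lines below are invented for illustration, and unlike the simplified table above they deliberately include an order with two products.</p>

```python
# Hypothetical fact rows at order-line grain: one row per line on each order.
order_lines = [
    {"Order Number": 1001, "Product Name": "Widget", "Quantity Sold": 2, "Unit Price": 5.0},
    {"Order Number": 1001, "Product Name": "Gadget", "Quantity Sold": 1, "Unit Price": 20.0},
    {"Order Number": 1002, "Product Name": "Widget", "Quantity Sold": 4, "Unit Price": 5.0},
]

# Roll up to order grain: one row per order, holding only the total order value.
order_totals = {}
for line in order_lines:
    line_value = line["Quantity Sold"] * line["Unit Price"]
    order_number = line["Order Number"]
    order_totals[order_number] = order_totals.get(order_number, 0.0) + line_value

print(order_totals)
```

<p>The rolled-up table is smaller and simpler, but the product name has nowhere to live at order grain, so product comparisons are no longer possible from it.</p>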
<h2>Some Simple Questions to guide us</h2>
<p>To determine which columns should be in the dimension table and which columns in the fact table, ask yourself these questions:</p>
<p><strong>Is the data in the column something that is unique for every order?</strong> – If yes, then it's definitely part of the fact table.  So Order Number is definitely in the fact table, as are Order Date, Ship Date, Quantity Sold and (most likely) Unit Price, since all these things are linked to the order and might change for each order.<br />
<strong>Is the data in the column referring to data in another column and will always be the same?</strong>  If yes, then this is probably a candidate for a dimensional table.  In this example, the Customer Segment is probably something that is the same for a given customer on ALL the orders, so should be in a Customer Dimension.  Likewise, the product category and sub-category are probably used to organise products, and therefore can be determined from the product name alone and don’t change from order to order.<br />
Another way to help determine which columns go into the fact table is to think about <strong>the directness of the relationship between what is stored in the column and the grain of the fact table</strong>.  For example in this case the Customer Name field is directly related to the order, but the Customer Segment field is related to the Customer Name field, which is then related to the order.  A column that is once removed or more from the order usually belongs in the dimensional table (again, provided the value is, or should be, consistent for all orders).</p>
<p>Taking the time to think about the fact table grain, and determine which dimension tables you are going to build and what you are going to put in them is an important first step to creating a good data model for your data mart, and needs to be done no matter which tools you use to build it.  If you want to try a visual, easy to use data transformation tool that lets you get at your data without having to resort to database programming, check out the <a href="/product">Datamartist tool</a>.</p>
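<p>Applied to the example columns, the questions above lead to a split along the lines of the following Python sketch. The sample orders are invented, and using Customer Name and Product Name as the dimension keys is a simplifying assumption for illustration.</p>

```python
# Hypothetical source rows using the example columns (one product per order).
source_rows = [
    {"Order Number": 1, "Order Date": "2008-10-01", "Ship Date": "2008-10-03",
     "Customer Name": "Smith Ltd", "Customer Segment": "Corporate",
     "Product Name": "Desk", "Product Category": "Furniture",
     "Product Sub Category": "Office", "Quantity Sold": 2, "Unit Price": 150.0},
    {"Order Number": 2, "Order Date": "2008-10-02", "Ship Date": "2008-10-05",
     "Customer Name": "Smith Ltd", "Customer Segment": "Corporate",
     "Product Name": "Lamp", "Product Category": "Furniture",
     "Product Sub Category": "Lighting", "Quantity Sold": 5, "Unit Price": 20.0},
]

# Columns tied directly to the order stay in the fact table,
# along with the keys that point at each dimension.
FACT_COLUMNS = ["Order Number", "Order Date", "Ship Date", "Customer Name",
                "Product Name", "Quantity Sold", "Unit Price"]

customer_dimension = {}
product_dimension = {}
fact_table = []

for row in source_rows:
    # setdefault keeps the first value seen, so each customer and product
    # is stored once no matter how many orders mention it.
    customer_dimension.setdefault(
        row["Customer Name"], {"Customer Segment": row["Customer Segment"]})
    product_dimension.setdefault(
        row["Product Name"],
        {"Product Category": row["Product Category"],
         "Product Sub Category": row["Product Sub Category"]})
    fact_table.append({column: row[column] for column in FACT_COLUMNS})

print(customer_dimension)
print(product_dimension)
print(fact_table)
```

<p>Two orders from the same customer produce two fact rows but only one customer dimension row, which is the de-duplication that the transformation to a star schema is meant to achieve.</p>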
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/dimensional-tables-and-fact-tables/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

