<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Excel Performance</title>
	<atom:link href="http://www.datamartist.com/tag/excel-performance/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Thu, 09 Feb 2012 20:00:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Creating a Fact Table with the Vendor dimension Purchasing DM (Part 2)</title>
		<link>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2</link>
		<comments>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2#comments</comments>
		<pubDate>Fri, 06 Feb 2009 00:23:50 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Data Mart Example]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Excel Performance]]></category>
		<category><![CDATA[Purchasing Data Warehouse]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=781</guid>
		<description><![CDATA[In creating a data warehouse or data mart data model there are two key types of tables- fact tables and dimension tables. Fact tables hold the data to be analyzed, dimensional tables provide categories and analysis values that organize the data. So we have our mission from Part 1: to analyze the "Acme does everything" [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/02/four_million_rows_no_worries1.jpg" alt="four_million_rows_no_worries1" title="four_million_rows_no_worries1" width="300" height="136" class="alignright size-full wp-image-812" />In creating a data warehouse or data mart data model there are two key types of tables- fact tables and dimension tables.  Fact tables hold the data to be analyzed, dimensional tables provide categories and analysis values that organize the data.<br />
So we have our <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">mission from Part 1</a>: to analyze the "Acme does everything" company's purchasing data and find ways to save money.  The first step, however, is getting a handle on the data.  The IT department has given us the files, and with a smug smile told us to "have fun".  We've been given three files that are a snapshot of the purchasing data:</p>
<ul>
<li><strong>Item_Master.txt</strong>  - this holds all the items that Acme buys</li>
<li><strong>Vendor_Master.txt</strong> - this holds a list of all the vendors, with information such as their address</li>
<li><strong>PO_Detail.txt</strong> - this is the huge data set, all the purchase order data for the last four years</li>
</ul>
<p>The Item and Vendor files aren't very big, but the PO_Detail file is over 340 MB, and it holds almost four million purchase order lines.  Don't try to import it into Excel. Of course you need Excel 2007 to even try to import 4 million rows. In Excel 2003 it would take over sixty sheets and probably some VBA code to try it.  I tried the import in Excel 2007- it takes 20 seconds just to tell me I'll have to go back to the text file import multiple times to do multiple imports onto separate sheets. It took almost two minutes to do the first million rows.  Even once we have the data spread across four sheets it's not clear how to summarize millions of rows in Excel easily.<img src="/wp-content/uploads/2009/02/po_detail_columns.jpg" alt="po_detail_columns" title="po_detail_columns" width="247" height="398" class="alignright size-full wp-image-785" /></p>
<p>Instead, let's use the <a href="/product">Datamartist tool</a> to manage this data set and generate one that's more useful.</p>
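<p>For the curious, here's a quick back-of-the-envelope check of those sheet counts. It's only a sketch: it assumes exactly 4,000,000 rows (the file holds "almost" that many), one purchase order line per worksheet row, and no header rows. The function name <code>sheets_needed</code> is just for illustration.</p>

```python
# Worksheet row limits per Excel version.
ROWS_PER_SHEET = {"Excel 2003": 65_536, "Excel 2007": 1_048_576}


def sheets_needed(total_rows: int, rows_per_sheet: int) -> int:
    """Worksheets required to hold total_rows, using integer ceiling division."""
    return -(-total_rows // rows_per_sheet)


total_rows = 4_000_000  # assumption: "almost four million" rounded up
for version, limit in ROWS_PER_SHEET.items():
    print(version, "needs", sheets_needed(total_rows, limit), "sheets")
```

<p>Under those assumptions Excel 2003 lands at just over sixty sheets and Excel 2007 at four, which lines up with the numbers above.</p>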
<p>The first analysis we will do will be on the Vendor dimension, to determine who Acme's big vendors are, and if we can negotiate some price reductions where we have leverage.</p>
<p>In Datamartist, very large files are not an issue because the tool can load in only preview data- this means that it's possible to look at a sampling of a few hundred thousand rows, and design the transformation before running it on the whole data set.</p>
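<p>If you want to try the preview idea outside of Datamartist, here is a minimal sketch in plain Python. To be clear, this is not how Datamartist does it internally (I'm not showing that here); it simply reads the header and the first N rows of a delimited text file and stops. The tab delimiter and the name <code>read_preview</code> are assumptions for illustration.</p>

```python
import csv
import io
from itertools import islice


def read_preview(lines, max_rows, delimiter="\t"):
    """Return the header and at most max_rows data rows from a delimited source.

    `lines` can be an open text file or any iterable of lines; only the rows
    that are actually returned get read, so a huge file is not loaded whole.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    rows = list(islice(reader, max_rows))
    return header, rows


# Tiny in-memory stand-in for a large file (made-up sample data).
sample = io.StringIO("A\tB\n1\t2\n3\t4\n5\t6\n")
header, rows = read_preview(sample, max_rows=2)
print(header, rows)
```

<p>With a real file you would pass something like <code>open("PO_Detail.txt", newline="")</code> and a larger <code>max_rows</code>.</p>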
<p>The PO Detail file has the columns shown- let's answer the question - "Who are our biggest suppliers?"<br />
 So which columns do we need?  We probably want to have some sense of trends over time, so we'll keep the <strong>order date</strong> but summarize to <strong>Month</strong>.  We'll keep the <strong>Vendor ID</strong> of course, and then we need to use the <strong>Quantity and Price</strong> fields to calculate the total amount spent.  Then we want to write this summarized data into Excel to check it out.</p>
<p>To do this in Datamartist all it takes is four simple blocks: a Text import block to load in the PO_Detail.txt file, a Calculate block to multiply QTY by PRICE, a Summarize block to do all the summarizing, and an Excel export block to generate the Excel file:</p>
<p><img src="/wp-content/uploads/2009/02/po_detail_summarize_blocks.jpg" alt="po_detail_summarize_blocks" title="po_detail_summarize_blocks" width="463" height="92" class="alignnone size-full wp-image-806" /></p>
<p>Each block passes its result to the next block via the connectors, and the last block saves it to an Excel file we've specified.</p>
<p>Defining the calculation uses standard spreadsheet functions- here's what the config area looks like:<br />
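<p>To make the data flow concrete, here is roughly what the four blocks do, sketched in plain Python. This is not Datamartist code. The column names (<code>VENDOR_ID</code>, <code>ORDER_DATE</code>, <code>QTY</code>, <code>PRICE</code>), the tab delimiter and the ISO date format are all assumptions, and I've swapped the Excel export for a CSV file, which Excel opens directly.</p>

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def summarize_spend(lines, delimiter="\t"):
    """Import, calculate and summarize: total QTY * PRICE per (vendor, month)."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for row in csv.DictReader(lines, delimiter=delimiter):
        month = row["ORDER_DATE"][:7]  # assumes dates like "2008-03-14"
        amount = Decimal(row["QTY"]) * Decimal(row["PRICE"])
        totals[(row["VENDOR_ID"], month)] += amount
    return dict(totals)


def write_summary(totals, out):
    """Export: write one row per (vendor, month) as CSV."""
    writer = csv.writer(out)
    writer.writerow(["VENDOR_ID", "MONTH", "TOTAL_SPEND"])
    for (vendor, month), total in sorted(totals.items()):
        writer.writerow([vendor, month, total])


# Made-up sample rows standing in for PO_Detail.txt.
sample = io.StringIO(
    "VENDOR_ID\tORDER_DATE\tQTY\tPRICE\n"
    "V1\t2008-03-14\t2\t1.50\n"
    "V1\t2008-03-20\t1\t4.00\n"
    "V2\t2008-04-01\t3\t2.00\n"
)
result = io.StringIO()
write_summary(summarize_spend(sample), result)
print(result.getvalue())
```

<p>Because the rows are read one at a time and only the running totals are kept in memory, the same approach works on a file far too large to open in a spreadsheet.</p>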
<img src="/wp-content/uploads/2009/02/calculate_total_closeup.jpg" alt="calculate_total_closeup" title="calculate_total_closeup" width="400" height="91" class="alignnone size-full wp-image-801" /></p>
<p>And defining the summary is as simple as it looks- pick the columns you want, and select what kind of summary you want done.<br />
<img src="/wp-content/uploads/2009/02/summary_block_closeup1.jpg" alt="summary_block_closeup1" title="summary_block_closeup1" width="417" height="111" class="alignnone size-full wp-image-797" /></p>
<p>We run it on a preview set of 100 thousand rows (takes about twelve seconds to run), and check the output.</p>
<p>It looks good, so we run it on the whole 4 million rows:</p>
<p><img src="/wp-content/uploads/2009/02/summarize_progress_po_detail.jpg" alt="summarize_progress_po_detail" title="summarize_progress_po_detail" width="466" height="128" class="alignnone size-full wp-image-804" /></p>
<p>About seven minutes later we have our result- an Excel sheet with a manageable 130 thousand rows, total spend, by vendor, by month for four years:<br />
<img src="/wp-content/uploads/2009/02/completed_po_detail_summary.jpg" alt="completed_po_detail_summary" title="completed_po_detail_summary" width="461" height="95" class="alignnone size-full wp-image-807" /></p>
<p>Next up we need to create our vendor dimension, and join it to this mini fact table we have created.  Stay tuned.</p>
<p>This is part of a five-part series- here are the links to the various parts: <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">1</a>, <a href="/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">2</a>, <a href="/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3">3</a>, <a href="/hierarchies-and-tree-structures-in-dimensions-an-example-item-dimension-part-4">4</a> and <a href="/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5">5</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Importing Data into Excel</title>
		<link>http://www.datamartist.com/importing-data-into-excel</link>
		<comments>http://www.datamartist.com/importing-data-into-excel#comments</comments>
		<pubDate>Mon, 01 Sep 2008 15:50:43 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Spreadsheet Tips]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Excel Performance]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=20</guid>
		<description><![CDATA[I've seen lots of Business Intelligence (BI) solutions, (data marts, data warehouses and the accompanying reports and dashboards) using all sorts of different tools. But I'll tell you- NO tool has yet been as successful as Microsoft Excel for providing a do it yourself data analysis platform to import data into. Now, I'm not suggesting [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-107" title="excelisking3" src="/wp-content/uploads/2008/09/excelisking3.jpg" alt="" width="254" height="269" />I've seen lots of <a href="http://en.wikipedia.org/wiki/Business_intelligence" target="_blank">Business Intelligence</a> (BI) solutions (<a href="http://en.wikipedia.org/wiki/Data_mart" target="_blank">data marts</a>, <a href="http://en.wikipedia.org/wiki/Data_warehouse" target="_blank">data warehouses</a> and the accompanying reports and dashboards) using all sorts of <a href="http://en.wikipedia.org/wiki/Business_intelligence_tools" target="_blank">different tools</a>. But I'll tell you- NO tool has yet been as successful as Microsoft Excel for providing a do-it-yourself data analysis platform to import data into. Now, I'm not suggesting that Excel (even when used with the <a href="/product">upcoming Datamartist tool</a> <img src='http://www.datamartist.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ) will make traditional data marts obsolete. Clearly the <a title="Market Growth of Enterprise BI" href="http://www.gartner.com/it/page.jsp?id=580708" target="_blank">billions of dollars being spent on "enterprise BI"</a> are not going to dry up. But there are plenty of times when you can't wait for a large BI project, or your needs are "too specific" for one. Often the existing data marts or data warehouses will be the source of raw data. But you will still need to prepare data for Excel import. In the next few posts I'm going to discuss various aspects of using Excel for data analysis. In this first part, I'll talk about data size and performance in Excel, which matter when you decide at what point to import the data: import the HUGE raw file, or treat it before import to reduce its size?</p>
<h2>Data Size Limits in Excel</h2>
<p>There are three different types of limits:</p>
<ol>
<li>The size in rows and columns the actual spreadsheet has.</li>
<li>Excel's (and your PC's) ability to crunch the numbers in a reasonable time. (RAM, CPU)</li>
<li>The size of the files involved and load and save times.</li>
</ol>
<p>In Excel 2003, a spreadsheet has rows 1 to 65 536 and columns A to IV. This makes it a grid 256 X 65 536. In Excel 2007 the spreadsheet is much, much larger, with rows from 1 to 1 048 576 and columns from A to XFD. (Making a grid 16 384 X 1 048 576).<a href="/wp-content/uploads/2008/09/importtoexcel1.jpg"><img class="alignright size-medium wp-image-63" title="importtoexcel1" src="/wp-content/uploads/2008/09/importtoexcel1-300x227.jpg" alt="" width="300" height="227" /></a> Now before you get too excited about how much space you have in 2007, the reality is that limits number 2 and 3 define how you can actually use that space. Still, it is more space, and more is good.</p>
<p>So let's kick the tires on large data sets in Excel 2007. For these very informal tests I'm using a quad-core workstation with 4 GB of RAM, so the results I get represent a best case compared to a typical laptop or desktop PC. First of all- putting a million rows of data in Excel 2007 (even a "narrow table" of only 3-4 columns) slows everything down. Delete a column, and you'll often see a 5-10 second freeze-up while Excel churns away in the background- roughly the same amount of time needed to save the file. Plus, when I push it I've had it lock up on me a few times- requiring some Ctrl-Alt-Del action to kill it. Even a narrow table such as this makes the Excel file at least 15-20 megabytes. For the particular text file I used, the .txt version was 9 MB, and the .xlsx file was double the size at 18 MB. I added a few columns and the file quickly became 80 MB.</p>
<p>Also, strangely, doing exactly the same thing multiple times results in very different times to complete- when I'm mentioning times it's the average of 2-3 trials (see graph).</p>
<p> <a href="/wp-content/uploads/2008/09/excel-operations-times.jpg"><img class="size-medium wp-image-65 alignleft" title="excel-operations-times" src="/wp-content/uploads/2008/09/excel-operations-times-300x169.jpg" alt="" width="300" height="169" /></a>All in all, although Excel 2007 can technically store a million rows, I'd advise against it. There are other reasons it's a pain- scroll bars and page-up page-down don't scale well to 1M rows- it's just hard to copy 250000 rows accurately- it takes forever to get to the end, and then you overshoot by a mile, and page up again forever to find it etc. etc. (And yes, you can use the Go To command under Home&gt;Editing&gt;Find and Select&gt;Go To, but a model of ease it's not.)</p>
<p>I can tell you, however, that using all the other features on more reasonable data sets (up to, say, 100k rows), I LOVE what it can do in terms of analysis and reporting. Once you have the data in reasonable result sets, there is no better place to have it than in Excel if you want full control, in my opinion. But how do you get it there? Next posts: how to link to data in Access and build a mini personal data mart. We'll learn how to make a personal data mart given the currently available tools. (And you just know there will be some posts later where I show you how to do the same thing, but using Datamartist. ) <strong>Update:  Datamartist now available.</strong>  <a href="/downloads">Download the tool now</a>, and find a whole new way to transform and manage your data, including <strong>managing huge data imports into Excel</strong>.</p>
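<p>If you'd like to check those grid sizes yourself, here's a small sketch. It converts the last column letters to column numbers (A=1, Z=26, AA=27 and so on) and multiplies out the cells. The helper names <code>column_number</code> and <code>grid_cells</code> are mine, purely for illustration.</p>

```python
def column_number(letters: str) -> int:
    """Convert a spreadsheet column label such as "IV" to its 1-based number."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def grid_cells(last_column: str, last_row: int) -> int:
    """Total cells in a sheet that runs from A1 to last_column/last_row."""
    return column_number(last_column) * last_row


excel_2003 = grid_cells("IV", 65_536)
excel_2007 = grid_cells("XFD", 1_048_576)
print(excel_2003, excel_2007, excel_2007 // excel_2003)
```

<p>The last number printed is how many Excel 2003 sheets' worth of cells fit in one Excel 2007 sheet.</p>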
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/importing-data-into-excel/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

