<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Excel Data Import</title>
	<atom:link href="http://www.datamartist.com/tag/excel-data-import/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Thu, 09 Feb 2012 20:00:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>MS Access vs Excel vs Datamartist</title>
		<link>http://www.datamartist.com/ms-access-vs-excel-vs-datamartist-a-do-it-yourself-guide</link>
		<comments>http://www.datamartist.com/ms-access-vs-excel-vs-datamartist-a-do-it-yourself-guide#comments</comments>
		<pubDate>Fri, 06 Mar 2009 02:33:06 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[MS Access]]></category>
		<category><![CDATA[MS Excel]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Personal Data Marts]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=1251</guid>
		<description><![CDATA[When data analysis requirements really get tough, the tough get going- and start to seriously use databases. Let's face it, if you're considering Microsoft Access, chances are what you need to get done is beyond what Excel does well, so you're looking for options. It's also likely that your IT department is unable or unwilling [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/03/excel-database-datamartist1.jpg" alt="excel-database-datamartist1" title="excel-database-datamartist1" width="200" height="183" class="alignright size-full wp-image-1301" />When data analysis requirements really get tough, the tough get going- and start to seriously use databases.</p>
<p>Let's face it, if you're considering Microsoft Access, chances are what you need to get done is beyond what Excel does well, so you're looking for options.  It's also likely that your IT department is unable or unwilling to help you out- this being even more likely as the recession reduces reporting budgets left, right and center.</p>
<p>Two of the key things that lead someone to search for a database solution are:</p>
<ul>
<li><strong>Data Volume</strong>- Beyond a million rows Excel becomes very difficult to work with, and performance suffers well before that.</li>
<li><strong>Flexibility to Join Tables</strong> - VLOOKUP and VBA code only go so far- Access gives an easy way to make joins between tables, one of the powerful features of relational databases.</li>
</ul>
<p>Now, the data volume is what it is- if you have millions and millions of rows, you need something to cut it down to size before you move it into your Excel spreadsheet. </p>
<p>On the other point, however, I can hear the Excel fans saying "now wait a minute, Excel can do that, I don't really need a database" and they are right.  But they are almost always right- Excel can do almost anything. It does not mean, however, that it's the best tool for the job. Using VLOOKUP and VBA scripts to join up multiple tables is not my idea of a fun time. And even in Excel 2007 I find the pivot tables annoying and prone to break if I'm adding categories, moving data sets or, heaven forbid, changing the number and order of columns.</p>
<p>Microsoft Access has a very nice interface for creating joins between tables, just a simple drag and drop between fields. The cross tab query capability is useful and good, and being a relational database it's more tolerant of changes to table structure because it's not messing with cell references.</p>
<p>"But", many who have used MS Access will say, "it's pretty complex to learn, and even if I do start to get the query stuff down, it doesn't handle bad data well."</p>
<p>Bad data?  Who has bad data? Isn't all data pristine, as intended, correctly formatted and accurate?</p>
<p><img src="/wp-content/uploads/2009/03/enough-to-make-access-decide-its-text1.jpg" alt="enough-to-make-access-decide-its-text1" title="enough-to-make-access-decide-its-text1" width="210" height="225" class="alignright size-full wp-image-1289" />One of the huge differences between Excel and MS Access is that Excel is extremely flexible.  (Probably more flexible than your auditor would like, but that's a different story).  One source of Excel's flexibility is its ability to accept different data types in the same column, and to allow editing of cells quickly. Microsoft Access, by contrast, reacts to variation within a column by either discarding the data or defaulting to the data type "Text"- meaning now you can't perform the calculations you need to do on your data.<img src="/wp-content/uploads/2009/03/sales-data-import-errors.jpg" alt="sales-data-import-errors" title="sales-data-import-errors" width="365" height="232" class="alignright size-full wp-image-1289" /></p>
<p>This illustrates one of the challenges people face in trying to use a database - databases are very strict on data types.  Once you declare a data type for a column, if you import data into the table, the database will discard the values that do not conform to that data type.  In Excel, you get cell errors if you try calculations but the original data is still there.</p>
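<p>To make the "discard" behaviour concrete, here is a minimal Python sketch of a strictly typed numeric column. It is a simplification for illustration only, not how Access actually implements its import; the sample values and the <code>strict_import</code> helper are hypothetical.</p>

```python
# Hypothetical raw values as they might arrive from a text file.
raw_prices = ["19.99", "5", "call us", "", "12.50"]

def strict_import(values):
    """Mimic a strictly typed numeric column: values that do not parse are lost."""
    kept = []
    for text in values:
        try:
            kept.append(float(text))
        except ValueError:
            kept.append(None)  # the original text is gone after import
    return kept

imported = strict_import(raw_prices)
print(imported)
```

<p>After the import there is no way to tell that the third value was once "call us", which is exactly what makes cleaning the data afterwards so hard.</p>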
<p>One of the powerful features of the <a href="/product">Datamartist tool</a> is the fact that it has an underlying database structure that provides flexibility on data types.  Unlike MS Access and other databases, Datamartist can store dates, numbers, strings and booleans natively in a single column. (It does not convert to strings- it stores the full object).  Take a look at this example:<br />
<img src="/wp-content/uploads/2009/03/datamartist-dynamicly-handles-data-type-at-row-level1.jpg" alt="datamartist-dynamicly-handles-data-type-at-row-level1" title="datamartist-dynamicly-handles-data-type-at-row-level1" width="425" height="226" class="aligncenter size-full wp-image-1294" /></p>
<p>In each individual row, Datamartist completes the calculation if possible.  Datamartist is a database that gives you the freedom of a spreadsheet. Of course, just like Excel, if you ask for a calculation on a value that is meaningless you will get an error- but only for that individual value, not a discard of the full row.  This means that even with messy data you can still bring it in, work with it, and fix it.  In Access or another database, you can't even get it through the front door (or it defaults to text, making many calculations impossible).</p>
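<p>The per-value idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Datamartist's implementation; the column contents, the <code>double_value</code> helper and the "#ERROR" marker are all hypothetical.</p>

```python
# Hypothetical mixed-type column: numbers, a string, and a missing value.
amounts = [120.5, 80, "n/a", 42, None]

def double_value(value):
    """Return value * 2, or an error marker for just this one value."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value * 2
    return "#ERROR"

# Every row survives; only the unusable values show an error.
results = [double_value(v) for v in amounts]
for original, result in zip(amounts, results):
    print(repr(original), "gives", repr(result))
```

<p>The original values stay in <code>amounts</code> untouched, so the bad ones can be found and fixed later.</p>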
<p>This won't be the last time I compare these three tools- and the types of data structures and tasks each of them is most effective with.</p>
<p>In the meantime, download <a href="/downloads">Datamartist</a> and see what I'm talking about with your own data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/ms-access-vs-excel-vs-datamartist-a-do-it-yourself-guide/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Excel auto formatting is getting into your genes</title>
		<link>http://www.datamartist.com/excel-auto-formating-is-getting-into-your-genes</link>
		<comments>http://www.datamartist.com/excel-auto-formating-is-getting-into-your-genes#comments</comments>
		<pubDate>Wed, 04 Mar 2009 16:03:54 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Microsoft Excel]]></category>
		<category><![CDATA[Software in General]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Fixing Data]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=1261</guid>
		<description><![CDATA[We often give Excel our data, and trust it to do the right thing. There was a link posted on MetaFilter today that sparked some lively discussion amongst the crowd. The Excel auto formatting "feature" loves to scramble common genetic nomenclature. It turns out that in the genetics field, common codes get converted to incorrect [...]]]></description>
			<content:encoded><![CDATA[<p>We often give Excel our data, and trust it to do the right thing.</p>
<p>There was a link posted on <a href="http://www.metafilter.com/">MetaFilter</a> today that sparked some lively discussion amongst the crowd.  The Excel auto formatting "feature" loves to <a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&#038;pubmedid=15214961" target="_blank">scramble common genetic nomenclature.</a><img src="/wp-content/uploads/2009/03/my-gene-therapist-is-an-excel-nut.jpg" alt="my-gene-therapist-is-an-excel-nut" title="my-gene-therapist-is-an-excel-nut" width="300" height="241" class="alignright size-full wp-image-1267" /></p>
<p>It turns out that in the genetics field, common codes regularly get converted to incorrect values.  One example given was the code for tumor suppressor "DEC1", which gets converted to the date December 1.  Another was the code "2310009E13" (apparently a "RIKEN clone identifier"), which would be converted to a number, 2.31E+19.  I'm not a geneticist but I can just see how this wouldn't be helpful.</p>
<p>I checked these examples on Excel 2007, and sure enough, the default will make changes right at import that scramble the mentioned codes- no error, no notification, no problem.   Of course Excel is perfectly capable of handling this data properly- the user needs to specify the field as text, and the conversions won't be done.<br />
The key point raised in the article (and one that is always true of Excel spreadsheets) is not just that the data gets corrupted in this case, but that, depending on how carefully a user checks, the error may go undetected.<br />
If undetected, what decisions, conclusions and actions will be taken based on the incorrect information?</p>
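<p>One way to catch this before import is to scan the identifiers for values that look like dates or scientific notation. The Python sketch below is a rough heuristic only: the two patterns are my own assumptions based on the examples above, not Excel's actual conversion rules, and the <code>conversion_risk</code> helper is hypothetical.</p>

```python
import re

# Month abbreviations that spreadsheet date parsing commonly recognises.
MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC"

# Assumed patterns: a month abbreviation followed by a day number,
# or digits followed by an E exponent.
DATE_LIKE = re.compile(r"^(?:" + MONTHS + r")-?\d{1,2}$", re.IGNORECASE)
SCIENTIFIC_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)

def conversion_risk(identifier):
    """Return 'date', 'number', or None for a single identifier."""
    text = identifier.strip()
    if DATE_LIKE.match(text):
        return "date"
    if SCIENTIFIC_LIKE.match(text):
        return "number"
    return None

for code in ["DEC1", "2310009E13", "BRCA2"]:
    print(code, conversion_risk(code))
```

<p>Any column with flagged values is a candidate for importing as text rather than letting the default formatting decide.</p>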
<p>Excel is super powerful, and super useful, but we have to always remind ourselves to balance the ease of use with how critical our data is, and what the impact of errors might be.  In the end, as with all computer use, we have to test, validate and test again at a level consistent with whatever use we are putting the data to.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/excel-auto-formating-is-getting-into-your-genes/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating a Fact Table with the Vendor dimension Purchasing DM (Part 2)</title>
		<link>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2</link>
		<comments>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2#comments</comments>
		<pubDate>Fri, 06 Feb 2009 00:23:50 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Data Mart Example]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Excel Performance]]></category>
		<category><![CDATA[Purchasing Data Warehouse]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=781</guid>
		<description><![CDATA[In creating a data warehouse or data mart data model there are two key types of tables- fact tables and dimension tables. Fact tables hold the data to be analyzed; dimension tables provide categories and analysis values that organize the data. So we have our mission from Part 1: to analyze the "Acme does everything" [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/02/four_million_rows_no_worries1.jpg" alt="four_million_rows_no_worries1" title="four_million_rows_no_worries1" width="300" height="136" class="alignright size-full wp-image-812" />In creating a data warehouse or data mart data model there are two key types of tables- fact tables and dimension tables.  Fact tables hold the data to be analyzed; dimension tables provide categories and analysis values that organize the data.<br />
So we have our <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">mission from Part 1</a>: to analyze the "Acme does everything" company's purchasing data and find ways to save money.  The first step, however, is getting a handle on the data.  The IT department has given us the files, and with a smug smile told us to "have fun".  We've been given three files that are a snapshot of the purchasing data:</p>
<ul>
<li><strong>Item_Master.txt</strong>  - this holds all the items that Acme buys</li>
<li><strong>Vendor_Master.txt</strong> - this holds a list of all the vendors, with information such as their address</li>
<li><strong>PO_Detail.txt</strong> - this is the huge data set, all the purchase order data for the last four years</li>
</ul>
<p>The Item and Vendor files aren't very big, but the PO_Detail file is over 340 MB, and it holds almost four million purchase order lines.  Don't try to import it into Excel. Of course you need Excel 2007 to even try to import 4 million rows. In Excel 2003 it would take over sixty sheets and probably some VBA code to try it.  I tried the import in Excel 2007- it takes 20 seconds just to tell me I'll have to go back to the text file import multiple times to do multiple imports onto separate sheets. It took almost two minutes to do the first million rows.  Even once we have the data spread across four sheets it's not clear how to summarize millions of rows in Excel easily.<img src="/wp-content/uploads/2009/02/po_detail_columns.jpg" alt="po_detail_columns" title="po_detail_columns" width="247" height="398" class="alignright size-full wp-image-785" /></p>
<p>Instead, let's use the <a href="/product">Datamartist tool</a> to manage this data set and generate one that's more useful.</p>
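<p>The sheet counts above follow directly from the per-sheet row limits (65 536 rows in Excel 2003, 1 048 576 in Excel 2007). A quick Python check, rounding "almost four million" up to exactly four million for illustration and ignoring header rows:</p>

```python
import math

ROWS_PER_SHEET_2003 = 65_536       # Excel 2003 row limit
ROWS_PER_SHEET_2007 = 1_048_576    # Excel 2007 row limit

def sheets_needed(total_rows, rows_per_sheet):
    """Smallest number of sheets that can hold total_rows."""
    return math.ceil(total_rows / rows_per_sheet)

po_lines = 4_000_000  # assumption: "almost four million" rounded up

sheets_2003 = sheets_needed(po_lines, ROWS_PER_SHEET_2003)
sheets_2007 = sheets_needed(po_lines, ROWS_PER_SHEET_2007)
print(sheets_2003, sheets_2007)
```

<p>That gives 62 sheets for Excel 2003 and 4 for Excel 2007, which matches the "over sixty" and "four sheets" figures.</p>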
<p>The first analysis we will do will be on the Vendor dimension, to determine who Acme's big vendors are, and if we can negotiate some price reductions where we have leverage.</p>
<p>In Datamartist, very large files are not an issue because the tool can load just a preview of the data- this means that it's possible to look at a sample of a few hundred thousand rows, and design the transformation before running it on the whole data set.</p>
<p>The PO Detail file has the columns shown- let's answer the question "Who are our biggest suppliers?"<br />
So which columns do we need?  We probably want some sense of trends over time, so we'll keep the <strong>order date</strong> but summarize it to <strong>Month</strong>.  We'll keep the <strong>Vendor ID</strong> of course, and then we need the <strong>Quantity and Price</strong> fields to calculate the total amount spent.  Then we want to write this summarized data into Excel to check it out.</p>
<p>To do this in Datamartist all it takes is four simple blocks: a Text import block to load in the PO_Detail.txt file, a Calculate block to multiply QTY by PRICE, a Summarize block to do all the summarizing, and an Excel export block to generate the Excel file:</p>
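<p>The same "preview first" idea works outside Datamartist too. The Python sketch below reads only the first few rows of a delimited file instead of loading all of it. The tab delimiter, the column names other than QTY and PRICE, and the <code>load_preview</code> helper are assumptions for illustration; an in-memory sample stands in for the real file.</p>

```python
import csv
import io
from itertools import islice

def load_preview(lines, max_rows, delimiter="\t"):
    """Read at most max_rows data rows; the rest of the input is never parsed."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    return list(islice(reader, max_rows))

# Tiny stand-in for a large purchase order file (hypothetical layout).
sample = io.StringIO(
    "VENDOR_ID\tQTY\tPRICE\n"
    "V001\t10\t2.50\n"
    "V002\t3\t19.99\n"
    "V001\t7\t2.50\n"
)

preview = load_preview(sample, max_rows=2)
print(preview)
```

<p>With a real file you would pass an open file handle instead of the sample, and the cost of the preview stays the same no matter how big the file is.</p>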
<p><img src="/wp-content/uploads/2009/02/po_detail_summarize_blocks.jpg" alt="po_detail_summarize_blocks" title="po_detail_summarize_blocks" width="463" height="92" class="alignnone size-full wp-image-806" /></p>
<p>Each block passes its result to the next block via the connectors, and the last block saves it to an Excel file we've specified.</p>
<p>Defining the calculation uses standard spreadsheet functions- here's what the config area looks like:<br />
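<p>For readers who think in code, the four blocks map roughly onto four small functions chained together. This Python sketch is only an analogy, not how Datamartist works internally. The ORDER_DATE and VENDOR_ID column names, the ISO date format, and the tab delimiter are assumptions, and the export step writes CSV because a real Excel export would need an extra library.</p>

```python
import csv
import io
from collections import defaultdict

def import_text(lines, delimiter="\t"):
    """Text import block: one dict per purchase order line."""
    return csv.DictReader(lines, delimiter=delimiter)

def calculate_total(rows):
    """Calculate block: add TOTAL = QTY * PRICE to each row."""
    for row in rows:
        row["TOTAL"] = float(row["QTY"]) * float(row["PRICE"])
        yield row

def summarize(rows):
    """Summarize block: total spend per (vendor, month)."""
    totals = defaultdict(float)
    for row in rows:
        month = row["ORDER_DATE"][:7]  # assumes ISO dates such as 2008-03-14
        totals[(row["VENDOR_ID"], month)] += row["TOTAL"]
    return totals

def export_csv(totals, out):
    """Export block: write the summary as CSV (stand-in for the Excel export)."""
    writer = csv.writer(out)
    writer.writerow(["VENDOR_ID", "MONTH", "TOTAL"])
    for (vendor, month), total in sorted(totals.items()):
        writer.writerow([vendor, month, round(total, 2)])

# Tiny hypothetical stand-in for PO_Detail.txt.
source = io.StringIO(
    "ORDER_DATE\tVENDOR_ID\tQTY\tPRICE\n"
    "2008-03-14\tV001\t10\t2.50\n"
    "2008-03-20\tV001\t4\t2.50\n"
    "2008-04-02\tV002\t1\t99.00\n"
)

summary = summarize(calculate_total(import_text(source)))
output = io.StringIO()
export_csv(summary, output)
print(output.getvalue())
```

<p>Because each step hands rows to the next one at a time, only the summary has to fit in memory, not the four million detail lines.</p>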
<img src="/wp-content/uploads/2009/02/calculate_total_closeup.jpg" alt="calculate_total_closeup" title="calculate_total_closeup" width="400" height="91" class="alignnone size-full wp-image-801" /></p>
<p>And defining the summary is as simple as it looks- pick the columns you want, and select what kind of summary you want done.<br />
<img src="/wp-content/uploads/2009/02/summary_block_closeup1.jpg" alt="summary_block_closeup1" title="summary_block_closeup1" width="417" height="111" class="alignnone size-full wp-image-797" /></p>
<p>We run it on a preview set of 100 thousand rows (takes about twelve seconds to run), and check the output.</p>
<p>It looks good, so we run on the whole 4 million rows:</p>
<p><img src="/wp-content/uploads/2009/02/summarize_progress_po_detail.jpg" alt="summarize_progress_po_detail" title="summarize_progress_po_detail" width="466" height="128" class="alignnone size-full wp-image-804" /></p>
<p>About seven minutes later we have our result- an Excel sheet with a manageable 130 thousand rows: total spend by vendor, by month, for four years.<br />
<img src="/wp-content/uploads/2009/02/completed_po_detail_summary.jpg" alt="completed_po_detail_summary" title="completed_po_detail_summary" width="461" height="95" class="alignnone size-full wp-image-807" /></p>
<p>Next up we need to create our vendor dimension, and join it to this mini fact table we have created.  Stay tuned.</p>
<p>This is part of a five-part series- here are the links to the various parts: <a href="/purchasing-data-mart-cutting-costs-with-analysis-part-1">1</a>, <a href="/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2">2</a>, <a href="/connecting-the-dimension-table-to-the-fact-table-vendor-example-part-3">3</a>, <a href="/hierarchies-and-tree-structures-in-dimensions-an-example-item-dimension-part-4">4</a> and <a href="/joining-the-dimension-table-to-the-fact-table-purchasing-data-mart-part-5">5</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/creating-a-fact-table-with-the-vendor-dimension-purchasing-dm-part-2/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Importing Data into Excel</title>
		<link>http://www.datamartist.com/importing-data-into-excel</link>
		<comments>http://www.datamartist.com/importing-data-into-excel#comments</comments>
		<pubDate>Mon, 01 Sep 2008 15:50:43 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Spreadsheet Tips]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Excel Performance]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=20</guid>
		<description><![CDATA[I've seen lots of Business Intelligence (BI) solutions, (data marts, data warehouses and the accompanying reports and dashboards) using all sorts of different tools. But I'll tell you- NO tool has yet been as successful as Microsoft Excel for providing a do it yourself data analysis platform to import data into. Now, I'm not suggesting [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-107" title="excelisking3" src="/wp-content/uploads/2008/09/excelisking3.jpg" alt="" width="254" height="269" />I've seen lots of <a href="http://en.wikipedia.org/wiki/Business_intelligence" target="_blank">Business Intelligence</a> (BI) solutions (<a href="http://en.wikipedia.org/wiki/Data_mart" target="_blank">data marts</a>, <a href="http://en.wikipedia.org/wiki/Data_warehouse" target="_blank">data warehouses</a> and the accompanying reports and dashboards) using all sorts of <a href="http://en.wikipedia.org/wiki/Business_intelligence_tools" target="_blank">different tools</a>. But I'll tell you- NO tool has yet been as successful as Microsoft Excel for providing a do-it-yourself data analysis platform to import data into. Now, I'm not suggesting that Excel (even when used with the <a href="/product">upcoming Datamartist tool </a> <img src='http://www.datamartist.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ) will make traditional data marts obsolete. Clearly the <a title="Market Growth of Enterprise BI" href="http://www.gartner.com/it/page.jsp?id=580708" target="_blank">billions of dollars being spent on "enterprise BI"</a> are not going to dry up. But there are enough times when you would have to wait, or your needs are "too specific", for a large BI project. Often the existing data marts or data warehouses will be the source of raw data. But you will still need to prepare data for Excel import. In the next few posts I'm going to discuss various aspects of using Excel for data analysis. In this first part, I'll talk about data size and performance in Excel, which matter when you decide when to import the data: import the HUGE raw file, or treat it before import to reduce its size?</p>
<h2>Data Size Limits in Excel</h2>
<p>There are different types of limits-</p>
<ol>
<li>The size in rows and columns the actual spreadsheet has.</li>
<li>Excel's (and your PC's) ability to crunch the numbers in a reasonable time. (RAM, CPU)</li>
<li>The size of the files involved and load and save times.</li>
</ol>
<p>In Excel 2003, a spreadsheet has rows 1 to 65 536 and columns A to IV. This makes it a grid 256 x 65 536. In Excel 2007 the spreadsheet is much, much larger, with rows from 1 to 1 048 576 and columns from A to XFD. (Making a grid 16 384 x 1 048 576).<a href="/wp-content/uploads/2008/09/importtoexcel1.jpg"><img class="alignright size-medium wp-image-63" title="importtoexcel1" src="/wp-content/uploads/2008/09/importtoexcel1-300x227.jpg" alt="" width="300" height="227" /></a> Now before you get too excited about how much space you have in 2007, the reality is that limits number 2 and 3 define how you can actually use that space. Still, it is more, and more is good.</p>
<p>So let's kick the tires on large data sets in Excel 2007. For these very informal tests I'm using a quad-core workstation with 4 GB of RAM, so the results I get represent a best case compared to a typical laptop or desktop PC. First of all- putting a million rows of data in Excel 2007 (even a "narrow table" of only 3-4 columns) slows everything down. Delete a column, and you'll often see a 5-10 second freeze-up while Excel churns away in the background- roughly the same amount of time needed to save the file. Plus, when I push it I've had it lock up on me a few times- requiring some Ctrl-Alt-Del action to kill it. Even a narrow table such as this makes the Excel file at least 15-20 megabytes. For the particular text file I used, the .txt version was 9 MB, and the .xlsx file was double the size at 18 MB. I added a few columns and the file quickly became 80 MB.</p>
<p>Also, strangely, doing exactly the same thing multiple times results in very different times to complete- the times I mention are the average of 2-3 trials (see graph).</p>
<p> <a href="/wp-content/uploads/2008/09/excel-operations-times.jpg"><img class="size-medium wp-image-65 alignleft" title="excel-operations-times" src="/wp-content/uploads/2008/09/excel-operations-times-300x169.jpg" alt="" width="300" height="169" /></a>All in all, although Excel 2007 can technically store a million rows, I'd advise against it. There are other reasons it's a pain- scroll bars and page-up page-down don't scale well to 1M rows- it's just hard to copy 250 000 rows accurately- it takes forever to get to the end, and then you overshoot by a mile, and page up again forever to find it etc. etc. (And yes, you can use the Go To command under Home&gt;Editing&gt;Find and Select&gt;Go To - but a model of ease it's not.)</p>
<p>I can tell you, however, that using all the other features on more reasonable data sets (up to say, 100 k rows), I LOVE what it can do in terms of analysis and reporting. Once you have the data in reasonable result sets, there is no better place to have it than in Excel if you want full control, in my opinion. But how do you get it there? Next posts: how to link to data in Access and build a mini personal data mart. We'll learn how to make a personal data mart given the currently available tools. (And you just know there will be some posts later where I show you how to do the same thing, but using Datamartist. ) <strong>Update:  Datamartist now available.</strong>  <a href="/downloads">Download the tool now</a>, and find a whole new way to transform and manage your data, including <strong>managing huge data imports into Excel</strong>.</p>
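<p>If you want to check those column counts yourself, column labels are just base-26 numbers written with letters. A small Python sketch (the <code>column_number</code> helper is my own, written for this example):</p>

```python
def column_number(letters):
    """Convert a spreadsheet column label (A, B, ..., Z, AA, ...) to its 1-based index."""
    number = 0
    for char in letters.upper():
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number

cols_2003 = column_number("IV")
cols_2007 = column_number("XFD")

# Total cells per sheet = columns * rows.
cells_2003 = cols_2003 * 65_536
cells_2007 = cols_2007 * 1_048_576
print(cols_2003, cells_2003)
print(cols_2007, cells_2007)
```

<p>Column IV works out to 256 and XFD to 16 384, so a single Excel 2007 sheet has 1 024 times as many cells as an Excel 2003 sheet.</p>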
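<p>My Excel timings were taken by hand, but the same "repeat and average" approach is easy to script for anything you can call from code. A minimal Python sketch, where <code>sample_operation</code> is a made-up stand-in workload and not an Excel operation:</p>

```python
import statistics
import time

def average_seconds(operation, trials=3):
    """Run operation several times; return the mean duration and each run's time."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), durations

def sample_operation():
    # Stand-in workload: sort 200 000 numbers that start in reverse order.
    sorted(range(200_000, 0, -1))

mean, runs = average_seconds(sample_operation)
print(f"mean {mean:.4f}s over {len(runs)} trials")
```

<p>Keeping the individual run times alongside the mean makes it obvious when one trial is wildly different from the others.</p>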
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/importing-data-into-excel/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

