<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com &#187; Excel</title>
	<atom:link href="http://www.datamartist.com/tag/excel/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Mon, 26 Jul 2010 18:33:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Spreadsheet errors- Fear, uncertainty and doubt</title>
		<link>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt</link>
		<comments>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt#comments</comments>
		<pubDate>Mon, 11 Jan 2010 18:54:46 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[MS Excel]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[data culture]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>
		<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3831</guid>
		<description><![CDATA[I love the acronym FUD, which stands for "Fear, uncertainty and doubt". What I don't love is the underhanded use of FUD to manipulate people's behavior. Spreading FUD is not about creating something new, but destroying- destroying someone's confidence in something, clouding the real issue, stopping a new or creative direction from being taken. FUD [...]]]></description>
			<content:encoded><![CDATA[<p>I love the acronym FUD, which stands for "Fear, uncertainty and doubt".  What I don't love is the underhanded use of FUD to manipulate people's behavior.  Spreading FUD is not about creating something new, but destroying- destroying someone's confidence in something, clouding the real issue, stopping a new or creative direction from being taken.  FUD is often used to block reform and change because FUD can cause people to do nothing- and doing nothing is good for the incumbent.</p>
<p>In the data analysis realm, spreadsheet errors are often used to try to dissuade companies from letting their people "work with the data directly".  Software vendors of all sizes, but particularly the really big ones (those incumbents), spread FUD because if they can stop people from getting at the data themselves, it increases the chance of companies buying more business intelligence suites.</p>
<p>The argument goes something like this:</p>
<blockquote><p>Spreadsheets have been shown to be plagued with errors, many studies showing error rates above 90%.  You need to reduce the risk that spreadsheets are creating in your organization by establishing formal, documented processes that are created and managed by professionals using sophisticated tools.</p></blockquote>
<p>Then the usual nightmare scenarios are brought out, all involving rabid auditors, Sarbanes-Oxley, governance failures etc.</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2010/01/accidently-put-last-years-spreadsheet-number-into-annual-report1.jpg" alt="accidently-put-last-years-spreadsheet-number-into-annual-report" title="accidently-put-last-years-spreadsheet-number-into-annual-report" width="341" height="226" class="alignright size-full wp-image-3839" />Now, don't get me wrong, spreadsheet errors are a very real and serious problem, and there are all sorts of data applications that should never be done in Excel or other ad-hoc, user-driven tools. Ever.  Formal documented processes are critically important, and there are lots of places where you had better be using the right tools and professionals.  </p>
<p>I have seen the culture of the spreadsheet completely undermine initiatives that would have driven better data quality, data analysis and business processes.  The spreadsheet certainly has its dark side.</p>
<p>But the problem is that FUD paints with a broad brush.  People take it as "Spreadsheets with data in them? Bad news. Don't do it.  Individuals able to get at the data, and quickly transform it, analyze it?  Who knows what they'll do- shut them down!"</p>
<p>Sadly, from a data quality point of view, sometimes the spreadsheets have the BEST data quality- because people have fixed the issues they can't fix in the transactional system due to constraints or IT department delays.</p>
<h2>Encourage positive change with reasonable controls.</h2>
<p>Intelligent, responsible people should be encouraged to use "informal" methods and tools to do data analysis.  </p>
<p>These people will find things, learn things, and drive positive change (including change in those big formal professional systems).  </p>
<p>They should do it with a reasonable understanding that doing things in an informal way, with spreadsheets or other tools, does introduce errors, and they should consider this when they recommend taking action based on the results. </p>
<h2>Balance between two extremes </h2>
<p><strong>The totalitarian state:</strong> I don't think there is an  IT department in the world that is capable of stopping all unofficial data analysis.  In fact, I would suggest that the moment such an IT department comes into existence, it would kill the host company, a harsh sort of self-regulation.  People interested in data and thinking for themselves would just pack up and leave. So who would be left making the decisions and based on what?</p>
<p><strong>The twisted web of spreadsheets:</strong> Companies that allow an anything-goes environment of Visual Basic code, macros and manual cut and paste straight into the annual report are not going to be long for the world either.  They populate the horror story pages on <a href="http://www.eusprig.org/horror-stories.htm" target="_blank">the spreadsheet risk websites.</a></p>
<h2>The zone of win.</h2>
<p>You want to be somewhere between insane spreadsheet addiction and strict formal big tool paralysis.  </p>
<p>I submit that companies that balance risk while still encouraging their smart people to "play" with the data and do analysis in new and interesting ways with new tools are going to win.</p>
<p>Again, don't let this process generate your profit and loss statement- understand where and what the informal discovery process is for- but do let it discover things.  If it discovers something interesting you'll have the chance to check for the errors.  Make sure it's part of the process to do so.</p>
<p>If you let the FUD get you down, you'll never get that far, and who knows what insights you might be giving up?</p>
<p>Of course, we believe you should go even further and give those intelligent, responsible people new tools that are less error-prone than spreadsheets but still provide as much or even greater flexibility.  That's why we're building Datamartist after all.</p>
<p>Openness, balance, and clear-minded pragmatism will get you further than FUD every time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vlookup, Excel and social media.</title>
		<link>http://www.datamartist.com/vlookup-excel-and-social-media</link>
		<comments>http://www.datamartist.com/vlookup-excel-and-social-media#comments</comments>
		<pubDate>Tue, 17 Nov 2009 16:36:00 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Microsoft Excel]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Life]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3557</guid>
		<description><![CDATA[Somebody is always doing a study on how much lost productivity is costing companies when people do something other than work at work. It used to be that the business intelligence companies were always going on about how much time people "wasted" using Excel- nowadays it's the huge cost of employees using social [...]]]></description>
			<content:encoded><![CDATA[<p>Somebody is always doing a study on how much lost productivity is costing companies when people do something other than work at work.  It used to be that the business intelligence companies were always going on about how much time people "wasted" using Excel-  nowadays it's the huge cost of employees using social media sites like Twitter and Facebook.  Let's look at BOTH.  What happens when people use Excel <em>and</em> tweet about it?  Bankruptcy can't be far...</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/11/vlookup-wasting-time-or-serious-business1.jpg" alt="vlookup-wasting-time-or-serious-business" title="vlookup-wasting-time-or-serious-business" width="300" height="184" class="alignright size-full wp-image-3571" />On a whim (ok, I was procrastinating at work on social media sites when I should have been doing data analysis), I decided to do a Twitter search for Excel-like things, including "vlookup", that granddaddy of all Excel functions.</p>
<p>Heard recently on Twitter: </p>
<blockquote><p>I just discovered VLOOKUP in excel. This may just be the best day of my life.</p></blockquote>
<p>I predict your life will get better and better. (It's obviously been no great shakes so far.)</p>
<blockquote><p>lol, I'm such a banker...I haven't done this stuff in Excel in over 2 yrs, but after like 30 min, I'm back vlookup-in' my a@$ off!</p></blockquote>
<p>Go, go go!</p>
<blockquote><p>omg excel documents are boring!</p></blockquote>
<p>Data is never boring, just misunderstood.</p>
<blockquote><p>If only Excel would stop freezing, things would be better.</p></blockquote>
<p>Amen.</p>
<blockquote><p> Excel crashes everytime I try and make a pie chart. How am I meant to do my report now?</p></blockquote>
<p>Use a bar chart, you should never use pie charts.</p>
<p>And from the overworked department:</p>
<blockquote><p>VLOOKUP + CTRC C + CTRL V for 10 hours today.. What a life <img src='http://www.datamartist.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p></blockquote>
<blockquote><p>starting to see everything in cells and formulas... =IF(ISERROR(VLOOKUP(HUNGRY,DONUTS:BURGER,1,FALSE)),"GO HOME &#038; SLEEP","DINNER")</p></blockquote>
<blockquote><p> I hate spreadsheets and I hate data alignment, thankfully I like VLookup</p></blockquote>
<blockquote><p>My method of teaching VLOOKUP to people involves telling them to look for the banana in the fruit bowl. Strange? Yes. Effective? Absolutely.</p></blockquote>
<p>I tend to use the watermelon analogy, but whatever works for you.</p>
<blockquote><p>I fought Excel and...I won! VLOOKUP function, I own you! (insert evil laugh here)
</p></blockquote>
<blockquote><p>so boring! you wan't to help me on my essey question? Explain how a #VLOOKUP and a #HLOOKUP benifit business? LOL
</p></blockquote>
<blockquote><p>Just did an excel spreadsheet with nested VLOOKUP functions. I feel dirty.</p></blockquote>
<blockquote><p>I've showed you how to do a vlookup 5x already. If you can't figure this out, mebbe you shld look for a new job.
</p></blockquote>
<p>A job that doesn't require Excel skills? Is that possible?</p>
<p>And one final sentiment:</p>
<blockquote><p>Excel should die.</p></blockquote>
<p>There is an Excel tweet about every 15 seconds.  No wonder the global economy is having some issues on the rebound.  Get to work, people!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/vlookup-excel-and-social-media/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Microsoft Version Skip- XP Mode</title>
		<link>http://www.datamartist.com/the-microsoft-version-skip-xp-mode</link>
		<comments>http://www.datamartist.com/the-microsoft-version-skip-xp-mode#comments</comments>
		<pubDate>Mon, 27 Apr 2009 21:26:16 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Microsoft Excel]]></category>
		<category><![CDATA[Software in General]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=2158</guid>
		<description><![CDATA[Microsoft has announced that they will include a license of XP, running on a "seamless" virtual environment that can be run inside Windows 7. It's being dubbed "XP Mode". They have decided not to include a "Vista mode", citing lack of demand. This is a very smart move on the part of Microsoft. Aside from [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft has announced that they will include a license of XP, running on a "seamless" virtual environment that can be run inside Windows 7. It's being dubbed <a href="http://windowsteamblog.com/blogs/business/archive/2009/04/24/coming-soon-windows-xp-mode-and-windows-virtual-pc.aspx" target="_blank">"XP Mode"</a>.</p>
<p>They have decided not to include a "Vista mode", citing lack of demand.</p>
<p>This is a very smart move on the part of Microsoft.  Aside from the general bad press that Vista got (making users less likely to embrace a change), one of the key things that stops an IT department from recommending moving to a new operating system is the concern that certain desktop applications won't run on the new version. XP mode offers a way to ensure that legacy applications will ALWAYS run on this new version of the OS.  Of course, in software "always" should perhaps never be capitalized, but running on a virtual machine, it's highly likely that your desktop apps that run on XP will still run on Windows 7.</p>
<p>Of course, the key here is not the technology side- it's the fact that the XP license will be included in the Windows 7 license.  You could have run a VM on Windows 7- but no company will buy double the OS licenses it needs.  Microsoft has just lit a rocket under Windows 7 in my opinion.</p>
<p>I imagine right now across the world IT departments are convening meetings to plan their Windows 7 rollout- and arguing about whether to wait for Service Pack 1 or not.</p>
<h2>Excel 2007 Adoption</h2>
<p>Although good numbers are hard to come by, it seems like Excel 2007 has suffered a similar fate to Windows Vista, becoming the version for which people decide "we'll just skip this one."</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/04/waiting-for-the-next-excel-version-graph-300x184.jpg" alt="waiting-for-the-next-excel-version-graph" title="waiting-for-the-next-excel-version-graph" width="300" height="184" class="alignleft size-medium wp-image-2164" />Looking at a number of different sites that have surveys or discuss general estimates in terms of Excel 2007, and using a completely unscientific method of making up numbers that seem to be about the average (see chart), we can see the early adopters, about 25% that use only Excel 2007.  Then 25% use both (perhaps because some of their colleagues are in the first 25%, so they have to be able to open those @#@*%#@ Excel 2007 files). And fully 50% are content with their previous version. </p>
<p>What is particularly interesting about these numbers (keeping in mind that my data sources and particularly my methods are suspect) is that <strong>they seem to be relatively stable for the last year</strong>.  Regardless of whether the source data was recent, six months old or a year old, it seems that the early adopters moved, and now everyone else is waiting.  Waiting for the next version?  Waiting for the Windows 7 upgrade that is coming? </p>
<p>The interesting question is "Will the next version of Excel entice them over?" We will find out in 2010.  I'm going out on a limb here, but I don't think it's a question of the new versions not having enough features.  We're seeing a continued evolution in the business intelligence space, including desktop applications such as <a href="http://www.datamartist.com/product">Datamartist</a> because existing tools just don't do what people need, but obviously spreadsheets are a more mature market.  Of course Microsoft is also positioning Excel to be more of a desktop BI tool (in many ways it already is) but at the core it's still a spreadsheet, and people know what they want from a spreadsheet.</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2009/04/waiting-for-the-next-excel-version-300x206.jpg" alt="waiting-for-the-next-excel-version" title="waiting-for-the-next-excel-version" width="300" height="206" class="alignright size-medium wp-image-2161" />Will Excel 2003 become the longest running application ever?  And will every future version of Windows have the older versions supported with a virtual machine feature?  Will Windows 8 have Windows 7 mode, that includes XP mode?  Questionable.</p>
<p>Legacy software has been a reality in the enterprise world for decades, and this is officially sanctioned legacy at the operating system level.  It used to be that the realities of change forced a certain number of the legacy applications to be re-written and made at least the desktop environment refresh relatively quickly.</p>
<p>It may be that that is about to change.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/the-microsoft-version-skip-xp-mode/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MS Access query example and comparison to Datamartist</title>
		<link>http://www.datamartist.com/microsoft-access-query-example-and-comparision-to-datamartist</link>
		<comments>http://www.datamartist.com/microsoft-access-query-example-and-comparision-to-datamartist#comments</comments>
		<pubDate>Tue, 31 Mar 2009 22:59:55 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[MS Access]]></category>
		<category><![CDATA[Microsoft Excel]]></category>
		<category><![CDATA[Access]]></category>
		<category><![CDATA[Data Mart Example]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Personal data mart]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=1321</guid>
		<description><![CDATA[Microsoft Access allows users to create complex queries and analyze large data sets. However, it can be complicated to use compared to Excel. In this post, I'll talk about MS Access queries and the equivalent way to perform the same data transformation in the Datamartist tool- visually and simply. Microsoft Access has a clear role [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft Access allows users to create complex queries and analyze large data sets.  However, it can be complicated to use compared to Excel.  In this post, I'll talk about <a href="/help-support/tutorials/microsoft-access-examples-and-tutorials">MS Access queries</a> and the equivalent way to perform the same data transformation in the <a href="/product">Datamartist tool</a>- visually and simply.</p>
<p>Microsoft Access has a clear role to play when a small, light database application is required.  However, it has a learning curve, and is not necessarily the best tool for data analysis.</p>
<h2>Product Segmentation Query Example</h2>
<p>Let's look at an example MS Access query or two and see how we can do the same thing in Datamartist, only without the queries and without any SQL. For this example, let's say that we have two sets of sales data from different time periods, and a product list, and we want to define some product segments based on color and price.  We want to get a summary of the sales Qty and average price sold by month, broken out by the new categories, which are as follows:</p>
<ul>
<li> "Red and High Priced" if the product is red and its minimum price is more than $1000</li>
<li> "Red Low Price wide price range" if the product is red, has a minimum price of $1000 or less, but has a difference between its minimum and maximum price of more than $200</li>
<li> "Red Low Price small price range" if it's red and not in the first two segments</li>
<li> "Yellow" if the product is yellow</li>
<li> "Other" for all the rest</li>
</ul>
<p>The three data tables we have are as follows:</p>
<ol>
<li> Sales 03-06 with about 120 000 rows, which contains sales data from 2003 - 2006</li>
<li> Sales 2007  with about 30 000 rows, which contains sales data for 2007</li>
<li> Products  which contains the colors for all the products and their minimum and maximum prices</li>
</ol>
<p>The first step is to combine the two sales tables. In Access, this is done with a UNION query, using the following SQL code:</p>
<blockquote><p>select * from [Sales Data 03-06] UNION select * from [Sales Data 2007];</p></blockquote>
<p>In Datamartist, we simply connect the two tables up to a combine block.<br />
<img src="/wp-content/uploads/2009/03/segmentation-example-datamartist-combine1.jpg" alt="segmentation-example-datamartist-combine1" title="segmentation-example-datamartist-combine1" width="264" height="234" class="alignnone size-full wp-image-1394" /></p>
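<p>A side note for anyone who does write the SQL: a plain UNION drops rows that are exact duplicates of each other, while UNION ALL keeps every row. With sales data that can matter, since two identical order lines are not necessarily a mistake. Here's a minimal sketch of the difference. It uses Python's built-in sqlite3 module rather than Access, and the table and column names are made up for illustration.</p>

```python
import sqlite3

# Hypothetical stand-ins for the two sales tables; the names and columns
# are assumptions for illustration, not the real Access schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_03_06 (product_id INTEGER, month TEXT, qty INTEGER)")
conn.execute("CREATE TABLE sales_2007 (product_id INTEGER, month TEXT, qty INTEGER)")

# Two of the older rows are identical on purpose.
conn.executemany(
    "INSERT INTO sales_03_06 VALUES (?, ?, ?)",
    [(1, "2006-12", 5), (1, "2006-12", 5), (2, "2006-12", 3)],
)
conn.executemany("INSERT INTO sales_2007 VALUES (?, ?, ?)", [(1, "2007-01", 4)])

# UNION removes exact duplicate rows from the combined result.
union_rows = conn.execute(
    "SELECT * FROM sales_03_06 UNION SELECT * FROM sales_2007"
).fetchall()

# UNION ALL keeps every row from both tables.
union_all_rows = conn.execute(
    "SELECT * FROM sales_03_06 UNION ALL SELECT * FROM sales_2007"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

<p>If the row count after the combine step is lower than the sum of the two source tables, this is the first place I'd look.</p>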
<p>Next, we need to define the segmentation. In Access this is again done with a query, this time by nesting IIF statements to add a new column called "Product_Segment" to the result.</p>
<blockquote><p>SELECT Products.Product_ID, Products.Product_Name, Products.Product_Group, Products.Product_Category, Products.Product_SubCategory, Products.Shipping_Weight, Products.Color, Products.Price_Min, Products.Price_Max, IIf([Color]="Red" And [Price_Min]>1000,"Red and High Priced",IIf([Color]="Red" And ([Price_max]-[Price_min])>200,"Red Low Price wide price range",IIf([Color]="Red","Red Low Price small price range",IIf([Color]="Yellow","Yellow","Other")))) AS Product_Segment<br />
FROM Products;</p></blockquote>
<p>In Datamartist, we use a segmentation block to do the same thing.  The interface is graphical, and the syntax is the same as you would use in Excel.  There is no need to nest any IF statements, because the overall block is designed to do that.  Here's what the blocks look like-  the MS Access import block on the left, and the segmentation rule block on the right.<br />
<img src="/wp-content/uploads/2009/03/segmentation-example-datamartist-segment-block.jpg" alt="segmentation-example-datamartist-segment-block" title="segmentation-example-datamartist-segment-block" width="418" height="211" class="alignnone size-full wp-image-1428" /><br />
Each segment has the statement that defines whether a row is in the segment or not.   The block tests each segment rule in order, starting at the top- the first statement that evaluates to "TRUE" defines the value for the Product_Segment column for that row. Dragging the segments up and down changes the order in which the rules are checked.</p>
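<p>If it helps to see the "first rule that is TRUE wins" idea written out, here is a rough Python sketch of the same segmentation logic. This is only an illustration of the ordering behaviour, not how Datamartist is implemented; the column names are taken from the Products query above, and the function and variable names are my own.</p>

```python
# Ordered (label, rule) pairs. Order matters: the first rule that is
# true for a product decides its segment, like the nested IIF query.
SEGMENT_RULES = [
    ("Red and High Priced",
     lambda p: p["Color"] == "Red" and p["Price_Min"] > 1000),
    ("Red Low Price wide price range",
     lambda p: p["Color"] == "Red" and p["Price_Max"] - p["Price_Min"] > 200),
    ("Red Low Price small price range",
     lambda p: p["Color"] == "Red"),
    ("Yellow",
     lambda p: p["Color"] == "Yellow"),
]


def product_segment(product, rules=SEGMENT_RULES, default="Other"):
    """Return the label of the first matching rule, or the default."""
    for label, rule in rules:
        if rule(product):
            return label
    return default


example = {"Color": "Red", "Price_Min": 500, "Price_Max": 800}
print(product_segment(example))
```

<p>Moving a pair up or down in the list plays the same role as dragging a segment up or down in the block: it changes which rule gets the first chance to claim a row.</p>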
<p><a href="/resources/images/Segmentation-Example-Product.jpg" target="_blank" onClick="javascript: pageTracker._trackPageview('/screenshots/Segmentation-Example-Product'); "><img src="/resources/images/Segmentation-Example-Product-Thumb.jpg"></a></p>
<p style="padding:8px;">(Click to Enlarge)</p>
<p>Then we have to Join this new product dimension (with the segmentation column) to the sales data, and summarize.</p>
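<p>To make "join and summarize" concrete, here is a small Python sketch of that step. It assumes each sales row carries Product_ID, Month, Qty and Price fields (the sales column names are assumptions, they aren't listed in this post), that rows without a matching product are dropped as in an inner join, and that "average price" means a simple average over the sales rows rather than a quantity-weighted one.</p>

```python
from collections import defaultdict


def summarize_sales(sales_rows, segment_by_product):
    """Join sales rows to a product -> segment lookup, then total the
    quantity and average the price for each (month, segment) pair."""
    totals = defaultdict(lambda: {"qty": 0, "price_sum": 0.0, "rows": 0})
    for row in sales_rows:
        if row["Product_ID"] not in segment_by_product:
            continue  # inner join: skip sales with no matching product
        key = (row["Month"], segment_by_product[row["Product_ID"]])
        totals[key]["qty"] += row["Qty"]
        totals[key]["price_sum"] += row["Price"]
        totals[key]["rows"] += 1
    return {
        key: {"qty": t["qty"], "avg_price": t["price_sum"] / t["rows"]}
        for key, t in totals.items()
    }


# Made-up sample data, purely for illustration.
sales = [
    {"Product_ID": 1, "Month": "2007-01", "Qty": 2, "Price": 1200.0},
    {"Product_ID": 1, "Month": "2007-01", "Qty": 1, "Price": 1300.0},
    {"Product_ID": 2, "Month": "2007-01", "Qty": 4, "Price": 50.0},
    {"Product_ID": 9, "Month": "2007-01", "Qty": 7, "Price": 10.0},
]
segments = {1: "Red and High Priced", 2: "Yellow"}
summary = summarize_sales(sales, segments)
print(summary)
```

<p>The lookup dictionary plays the part of the product dimension with its new segment column; everything else is just grouping.</p>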
<p>In MS Access, this is done with more queries-  Here's what Access looks like when we're done.<br />
<img src="/wp-content/uploads/2009/03/segmentation-example-access-gui1.jpg" alt="segmentation-example-access-gui1" title="segmentation-example-access-gui1" width="450" height="485" class="alignnone size-full wp-image-1405" /><br />
Compare that list of Tables and Queries to the visual, left-to-right layout of the Datamartist data canvas that does the same thing, without ever having to write any SQL code:</p>
<h2>The VISUAL way to do it</h2>
<p><img src="/wp-content/uploads/2009/03/segmentation-example-solved-canvas.jpg" alt="segmentation-example-solved-canvas" title="segmentation-example-solved-canvas" width="406" height="314" class="alignnone size-full wp-image-1403" /></p>
<p><a href="/resources/images/Segmentation-Example-Datamartist-full-app-shot.jpg" target="_blank" onClick="javascript: pageTracker._trackPageview('/screenshots/Segmentation-Example-Datamartist-full-app-shot'); "><img src="/resources/images/Segmentation-Example-Datamartist-full-app-shot-Thumb.jpg" class="alignright size-full wp-image-1430" ></a><br />
In Datamartist you can see the flow of the data, the row counts are clearly displayed, and clicking on the connectors will bring up the underlying data set in the data viewer.  It's clear which block feeds which, and by adding more blocks and connecting them at the desired point in the data flow, new analysis can be created.</p>
<p>Take Datamartist for a trial run-  <a href="/downloads">download it now</a> because maybe you don't have to learn Microsoft Access queries after all.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/microsoft-access-query-example-and-comparision-to-datamartist/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Excel auto-formatting is getting into your genes</title>
		<link>http://www.datamartist.com/excel-auto-formating-is-getting-into-your-genes</link>
		<comments>http://www.datamartist.com/excel-auto-formating-is-getting-into-your-genes#comments</comments>
		<pubDate>Wed, 04 Mar 2009 16:03:54 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Microsoft Excel]]></category>
		<category><![CDATA[Software in General]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Fixing Data]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=1261</guid>
		<description><![CDATA[We often give Excel our data, and trust it to do the right thing. There was a link posted on MetaFilter today that sparked some lively discussion amongst the crowd. The Excel auto-formatting "feature" loves to scramble common genetic nomenclature. It turns out that in the genetics field, common codes get converted to incorrect [...]]]></description>
			<content:encoded><![CDATA[<p>We often give Excel our data, and trust it to do the right thing.</p>
<p>There was a link posted on <a href="http://www.metafilter.com/">MetaFilter</a> today that sparked some lively discussion amongst the crowd.  The Excel auto-formatting "feature" loves to <a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&#038;pubmedid=15214961" target="_blank">scramble common genetic nomenclature.</a><img src="/wp-content/uploads/2009/03/my-gene-therapist-is-an-excel-nut.jpg" alt="my-gene-therapist-is-an-excel-nut" title="my-gene-therapist-is-an-excel-nut" width="300" height="241" class="alignright size-full wp-image-1267" /></p>
<p>It turns out that in the genetics field, common codes get converted to incorrect values regularly.  One example given was the code for tumor suppressor "DEC1", which gets converted to the date December 1.  Another was the code "2310009E13" (apparently a "RIKEN clone identifier") - which would be converted to a number, 2.31E+19.  I'm not a geneticist but I can just see how this wouldn't be helpful.</p>
<p>I checked these examples on Excel 2007, and sure enough, the default will make changes right at import that scramble the mentioned codes- no error, no notification, no problem.   Of course Excel is perfectly capable of handling this data properly- the user needs to specify the field as text, and the conversions won't be done.<br />
The key point brought up in the article (and one that is always true of Excel spreadsheets) is not just that in this case the data gets corrupted, but that, depending on how carefully a user checks, the error may not be detected.<br />
If undetected, what decisions, conclusions and actions will be taken based on the incorrect information?</p>
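<p>One cheap way to catch this before the damage is done is to scan identifiers for the two shapes mentioned above before they go anywhere near a spreadsheet. The sketch below is a rough Python heuristic, not a model of Excel's actual conversion rules; the patterns and function name are my own, and they only cover values that look like a month abbreviation plus a day (DEC1) or digits around an "E" (2310009E13).</p>

```python
import re

# Assumed, deliberately narrow patterns: a three-letter month
# abbreviation followed by one or two digits, or digits-E-digits.
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
SCIENTIFIC_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def risky_for_spreadsheet(identifier):
    """Return why an identifier looks risky, or None if it does not."""
    text = identifier.strip()
    if MONTH_LIKE.match(text):
        return "date-like"
    if SCIENTIFIC_LIKE.match(text):
        return "number-like"
    return None


for code in ["DEC1", "2310009E13", "TP53"]:
    print(code, risky_for_spreadsheet(code))
```

<p>Anything this flags is a candidate for importing as a text column, and anything it misses is a reminder that a heuristic is no substitute for checking the data after import.</p>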
<p>Excel is super powerful, and super useful, but we have to always remind ourselves to balance the ease of use with how critical our data is, and what the impact of errors might be.  In the end, as with all computer use, we have to test, validate and test again at a level consistent with whatever use we are putting the data to.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/excel-auto-formating-is-getting-into-your-genes/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spreadmarts and Data Shadow Systems- The Debate</title>
		<link>http://www.datamartist.com/spreadmarts-and-data-shadow-systems-the-debate</link>
		<comments>http://www.datamartist.com/spreadmarts-and-data-shadow-systems-the-debate#comments</comments>
		<pubDate>Wed, 18 Feb 2009 01:13:28 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[MS Access]]></category>
		<category><![CDATA[Spreadmarts]]></category>
		<category><![CDATA[Access]]></category>
		<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=1017</guid>
		<description><![CDATA[When business users are not getting what they want out of the enterprise business intelligence system they very rarely just give up. Successful business people didn't get where they are by giving up when someone doesn't deliver something; they take things into their own hands and get it done. Knowing this, it's not surprising that [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/02/spreadmarts-another-100-spreadsheets1.jpg" alt="spreadmarts-another-100-spreadsheets1" title="spreadmarts-another-100-spreadsheets1" width="300" height="316" class="alignright size-full wp-image-1043" />When business users are not getting what they want out of the enterprise business intelligence system they very rarely just give up.  Successful business people didn't get where they are by giving up when someone doesn't deliver something; they take things into their own hands and get it done.</p>
<p>Knowing this, it's not surprising that a huge amount of data collection, extraction, and transformation happens in Excel spreadsheets or Access databases that are made without the involvement of (and often under the direct scorn of) the IT department in large companies.  In my previous life I was in the IT department, and I saw some amazing systems generated with hundreds of spreadsheets and databases.  This mix of spreadsheets and databases, created without the involvement of the IT department by power users or external consultants (financed out of departmental budgets), is often referred to as <a href="http://www.doubletongued.org/index.php/citations/spreadmart_1/" target="_blank">Spreadmarts</a> or <a href="http://en.wikipedia.org/wiki/Shadow_system" target="_blank">Shadow Systems</a>.</p>
<p>For an interesting survey on the subject, take a look at <a href="https://www.tdwi.org/research/display.aspx?ID=8874" target="_blank">TDWI's report "Strategies for Managing Spreadmarts: Migrating to a Managed BI Environment".</a>  This report is now a year old, but I'm certain it is as valid as ever.</p>
<p>The title suggests that the solution is managed BI-  I won't get into that right now, but you'll notice the study was sponsored by the likes of Microsoft, Cognos, Microstrategy and SAP- so of course the solution is Big Business Intelligence solutions.</p>
<p>But what's really interesting from the survey is how the different groups within the respondent companies feel about spreadmarts and shadow data systems.  The analysts love them, the executives are unsure, and IT hates them with a passion.  This makes for an interesting mix.<br />
<img src="/wp-content/uploads/2009/02/position-on-spreadsheets.jpg" alt="position-on-spreadsheets" title="position-on-spreadsheets" width="450" height="301" class="alignnone size-full wp-image-1029" /></p>
<p>This is very much what I've seen in my experience.  IT and the Business are at odds with each other, and senior management is either uninterested or forced to take sides.</p>
<p>Where do I stand?  I'm in the "avoid them if you can" camp when we're talking about a tangle of spreadsheets and undocumented MS Access databases that can be error prone and time consuming.  I understand why it's often unavoidable, but I've seen first hand how painful these systems are to maintain.  </p>
<p>On the other hand, I don't subscribe to the school of thought that says "Excel needs to be eliminated- analysts should use the Business Intelligence systems only, otherwise there will be chaos."  Let's not go overboard.  Excel and spreadsheets are useful tools, and have their place.  Additionally, I really feel for business users who simply can't get what they want from the IT departments.  I used to be the IT department, and it was frustrating to not have the resources available to build what people needed.</p>
<p>As one of the authors of the above report, <a href="http://www.athena-solutions.com/index.shtml" target="_blank">Rick Sherman</a>, said in <a href="http://searchcio.techtarget.com/generic/0,295582,sid182_gci1344289,00.html?asrc=SS_CLA_308990&#038;psrc=CLT_182" target="_blank">a recent podcast</a>:</p>
<blockquote><p>"reality is no matter how many IT folks that you have in your company you're not likely to have enough resources or time to meet every business users reporting or analytical requirements..."</p></blockquote>
<p>He presents what is a refreshingly balanced approach to Excel.  In his <a href="http://datadoghouse.typepad.com/data_doghouse/2009/02/business-intelligencedata-warehousing-emerging-trends-but-not-breakouts-9-for-09.html" target="_blank">predictions for trends in 2009</a>, number 5 is "Excel becomes an accepted tool in a BI portfolio". He points out that this may not be mainstream in 2009, but I hope he's right about the trend.  A pragmatic, inclusive strategy with more power to the people while avoiding the chaotic side of spreadmarts is where the solution is.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/spreadmarts-and-data-shadow-systems-the-debate/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Importing Data into Excel</title>
		<link>http://www.datamartist.com/importing-data-into-excel</link>
		<comments>http://www.datamartist.com/importing-data-into-excel#comments</comments>
		<pubDate>Mon, 01 Sep 2008 15:50:43 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Personal Data Marts]]></category>
		<category><![CDATA[Spreadsheet Tips]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Excel Data Import]]></category>
		<category><![CDATA[Excel Performance]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=20</guid>
		<description><![CDATA[I've seen lots of Business Intelligence (BI) solutions, (data marts, data warehouses and the accompanying reports and dashboards) using all sorts of different tools. But I'll tell you- NO tool has yet been as successful as Microsoft Excel for providing a do it yourself data analysis platform to import data into. Now, I'm not suggesting [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-107" title="excelisking3" src="/wp-content/uploads/2008/09/excelisking3.jpg" alt="" width="254" height="269" />I've seen lots of <a href="http://en.wikipedia.org/wiki/Business_intelligence" target="_blank">Business Intelligence</a> (BI) solutions, (<a href="http://en.wikipedia.org/wiki/Data_mart" target="_blank">data marts</a>, <a href="http://en.wikipedia.org/wiki/Data_warehouse" target="_blank">data warehouses</a> and the accompanying reports and dashboards) using all sorts of <a href="http://en.wikipedia.org/wiki/Business_intelligence_tools" target="_blank">different tools</a>. But I'll tell you- NO tool has yet been as successful as Microsoft Excel for providing a do it yourself data analysis platform to import data into. Now, I'm not suggesting that Excel (even when used with the <a href="/product">upcoming Datamartist tool </a> <img src='http://www.datamartist.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ) will make traditional data marts obsolete. Clearly the <a title="Market Growth of Enterprise BI" href="http://www.gartner.com/it/page.jsp?id=580708" target="_blank">billions of dollars being spent on "enterprise BI"</a>are not going to dry up. But there are enough times you have to wait- or your needs are "too specific"- for a large BI project. Often the existing data marts or data warehouses will be the source of raw data. But you will still need to prepare data for Excel import. In the next few posts I'm going to discuss various aspects of using excel for data analysis. In this first part, I'll talk about data size in excel and performance which is important - when should you import the data? Import the HUGE raw file, or treat it before import to reduce its size?</p>
<h2>Data Size Limits in Excel</h2>
<p>There are different types of limits-</p>
<ol>
<li>The size in rows and columns the actual spreadsheet has.</li>
<li>Excel's (and your PC's) ability to crunch the numbers in a reasonable time. (RAM, CPU)</li>
<li>The size of the files involved and load and save times.</li>
</ol>
<p>In Excel 2003, a spreadsheet has rows 1 to 65 536 and columns A to IV. This makes it a grid 256 X 65 536. In Excel 2007 the spreadsheet is much, much larger, with rows from 1 to 1 048 576 and columns from A to XFD. (Making a grid 16 384 X 1 048 576).<a href="/wp-content/uploads/2008/09/importtoexcel1.jpg"><img class="alignright size-medium wp-image-63" title="importtoexcel1" src="/wp-content/uploads/2008/09/importtoexcel1-300x227.jpg" alt="" width="300" height="227" /></a> Now before you get too excited about how much space you have in 2007, the reality is that limits 2 and 3 in the list above define how you can actually use that space. But it is more space, and more is good.</p>
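<p>As a quick check of those grid sizes, here is a small Python sketch that multiplies rows by columns for each version. The names <code>GRID_LIMITS</code>, <code>total_cells</code> and <code>growth_factor</code> are just illustrative, not part of any Excel API.</p>

```python
# Worksheet dimensions quoted above: (rows, columns) per Excel version.
GRID_LIMITS = {
    "Excel 2003": (65_536, 256),
    "Excel 2007": (1_048_576, 16_384),
}


def total_cells(version):
    """Return the number of cells on one worksheet for the given version."""
    rows, columns = GRID_LIMITS[version]
    return rows * columns


def growth_factor():
    """How many times more cells a 2007 sheet has than a 2003 sheet."""
    return total_cells("Excel 2007") // total_cells("Excel 2003")


if __name__ == "__main__":
    for version in GRID_LIMITS:
        print(version, format(total_cells(version), ","), "cells")
    print("Growth factor:", growth_factor())
```

<p>That works out to roughly 16.8 million cells per sheet in 2003 against about 17.2 billion in 2007, a 1024-fold increase in theoretical capacity.</p>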
<p>So let's kick the tires on large data sets in Excel 2007. For these very informal tests I'm using a quad-core workstation with 4 GB of RAM, so the results I get represent a best case compared to a typical laptop or desktop PC. First of all- putting a million rows of data in Excel 2007 (even a "narrow table" of only 3-4 columns) slows everything down. Delete a column, and you'll often see a 5-10 second freeze-up while Excel churns away in the background- roughly the same amount of time needed to save the file. Plus, when I push it I've had it lock up on me a few times- requiring some Ctrl-Alt-Del action to kill it. Even a narrow table such as this makes the Excel file at least 15-20 megabytes. For the particular text file I used, the .txt version was 9 MB and the .xlsx file was double the size at 18 MB. I added a few columns and the file quickly became 80 MB.</p>
<p>Also, strangely, doing exactly the same thing multiple times results in very different times to complete- when I mention times, it's the average of 2-3 trials (see graph).</p>
<p> <a href="/wp-content/uploads/2008/09/excel-operations-times.jpg"><img class="size-medium wp-image-65 alignleft" title="excel-operations-times" src="/wp-content/uploads/2008/09/excel-operations-times-300x169.jpg" alt="" width="300" height="169" /></a>All in all, although Excel 2007 can technically store a million rows, I'd advise against it. There are other reasons it's a pain: scroll bars and Page Up/Page Down don't scale well to 1M rows. It's just hard to copy 250 000 rows accurately- it takes forever to get to the end, then you overshoot by a mile and page up again forever to find it, etc. (And yes, you can use the Go To command under Home&gt;Editing&gt;Find and Select&gt;Go To- but a model of ease it's not.)</p>
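<p>One way to treat a large raw file before import is to summarize it outside Excel first. The following is only a rough sketch under assumptions of my own: a tab-delimited text file with a header row, and hypothetical column names <code>region</code> and <code>amount</code>. It uses Python's standard <code>csv</code> module to total one column per key, so only the small summary needs to be imported.</p>

```python
import csv
from collections import defaultdict


def summarize(lines, key_column, value_column, delimiter="\t"):
    """Total value_column for each distinct key_column value."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines, delimiter=delimiter):
        totals[row[key_column]] += float(row[value_column])
    return dict(totals)


def write_summary(totals, out_path, key_column, value_column):
    """Write the totals as a small CSV file that Excel can open directly."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([key_column, value_column])
        for key in sorted(totals):
            writer.writerow([key, totals[key]])


if __name__ == "__main__":
    # Tiny in-memory stand-in for a huge raw export (hypothetical columns).
    sample = ["region\tamount", "East\t100.5", "West\t20", "East\t4.5"]
    print(summarize(sample, "region", "amount"))
```

<p>For a real file you would pass an open file handle to <code>summarize</code> instead of the sample list; the file is then read one row at a time, so the million raw rows never have to fit on a worksheet.</p>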
<p>I can tell you, however, that using all the other features on more reasonable data sets (up to, say, 100 k rows), I LOVE what it can do in terms of analysis and reporting. Once you have the data in reasonable result sets, there is, in my opinion, no better place to have it than in Excel if you want full control. But how do you get it there? Next posts: how to link to data in Access and build a mini personal data mart. We'll learn how to make a personal data mart given the currently available tools. (And you just know there will be some posts later where I show you how to do the same thing, but using Datamartist. ) <strong>Update:  Datamartist now available.</strong>  <a href="/downloads">Download the tool now</a>, and find a whole new way to transform and manage your data, including <strong>managing huge data imports into Excel</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/importing-data-into-excel/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The wonderous resistance to CHANGE</title>
		<link>http://www.datamartist.com/the-wonderous-resistance-to-change</link>
		<comments>http://www.datamartist.com/the-wonderous-resistance-to-change#comments</comments>
		<pubDate>Fri, 06 Jun 2008 03:39:22 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Software in General]]></category>
		<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=14</guid>
		<description><![CDATA[Its not easy to get good figures on just how many people have switched to Excel 2007, but from what I've been able to find its sounding like not more than 20%- this figure a near random guess based on complex methods such as random Google searches.  (There are of course power user, or early [...]]]></description>
			<content:encoded><![CDATA[<p>Its not easy to get good figures on just how many people have switched to Excel 2007, but from what I've been able to find its sounding like not more than 20%- this figure a near random guess based on complex methods such as random Google searches.  (There are of course power user, or early adopter communities where use is much higher- or companies where its been mandated which will provide samples higher than this, but I'm thinking about the entire community of Excel users.)  This is evolving quickly too- since many get a new PC every three years or so, Vista and office 2007 will arrive by default to some extent.</p>
<p>Interestingly, there seems also to be a significant population who use both versions at once. (Its possible-  THANK YOU MICROSOFT- to install both versions at the same time.)</p>
<p>Being in the software development business, and since my company is a registered Microsoft Partner, I have access to lots of Microsoft software, for test and internal use purposes.  I have lots of different combinations of XP, Vista, Office 2003, Office 2007 with various options installed on my virtual machines for testing and design work.</p>
<p>But for my personal use? The long and the short of it is that I had Excel 2007 installed on my primary machine for about 48 hours. </p>
<p>It wasn't that I didn't like it- I just didn't have a pressing need right then, and so I didn't want to spend the time and effort learning the new interface- yet.  I will soon, as I move into the next phases and let the Datamartist tools take advantage of its new features.  But at this point I'm doing data transformation and management work- Excel is better at the presentation side- and it seems to me that that's where most of the new features in Excel 2007 are focused.  I'm sure it's awesome-  I'm working on tools for getting data into a state to be used in it first.</p>
<p>And I wouldn't be surprised if many of you are in the same boat.  You do all sorts of analysis in Excel, you have come to know its interface inside and out, and well, although you have nothing against the new way of doing things... it's just different, is all.</p>
<p>And let's face it- unless some specific need drives you to a specific new feature, Excel 2003 is pretty powerful for what it does. </p>
<p>But we can be assured that we are not alone in this resistance to change- some of the most popular add-ins for Excel 2007 provide <a href="http://www.addintools.com/english/menuexcel/" target="_blank">retro-menus</a> to avoid searching the ribbon.</p>
<p>So don't feel that you are behind the times- we'll all get there when it makes sense, and when there is payback- not just <a href="http://www.askoxford.com/worldofwords/quotations/quotefrom/mallory/" target="_blank">because it's there</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/the-wonderous-resistance-to-change/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Setting the stage: managing data issues</title>
		<link>http://www.datamartist.com/setting-the-stage-managing-data-issues</link>
		<comments>http://www.datamartist.com/setting-the-stage-managing-data-issues#comments</comments>
		<pubDate>Thu, 01 May 2008 17:56:55 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Spreadsheet Tips]]></category>
		<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=12</guid>
		<description><![CDATA[Anyone who has done any data analysis with more than a few lines of data knows that some of the biggest time wasters are data quality issues. What is bad data?  Well, some of it is easy to see, some is downright impossible to find. Lets look at an easy example; a row of data where the country [...]]]></description>
			<content:encoded><![CDATA[<p>Anyone who has done any data analysis with more than a few lines of data knows that some of the biggest time wasters are data quality issues.</p>
<p>What is bad data?</p>
<p> Well, some of it is easy to see, some is downright impossible to find. Let's look at an easy example: a row of data where the country is "US" and the state/province is "Ontario".</p>
<p> You just know both those values can't be right.  So why did the source system let it happen?  Good question- when the programmers of the application tell you, let me know...</p>
<p>So for this easy one, should we assume that the country is right, and change the state to Michigan since that's close to Ontario?  Or maybe Ohio because they both start with "O"?</p>
<p>The right answer is of course to go back to the source and fix the problem-  but if you've got hundreds or thousands of users, an application that can't be modified to stop this type of error at the source, and an IT department that is overworked, then that is probably not an option.</p>
<p>Say your company does $500 million of business in a year, and the Ontario, US data represents $1500 of sales- for practical analysis purposes it just doesn't really matter where it goes-  just so you don't see Ontario as a State in the report you give the CEO.</p>
<p>Often the solution in Excel is to just "fix it"- but if you reload the data each month, you have to go back and fix it again and again.</p>
<p>Or maybe you only load in the new month's data, so you don't overwrite last month's.  This works until some definitions change, and now all the historical data is out of sync with the new month's data.  A good example of this is if you have sales regions.  If the regions are changed (new ones added, existing ones split up or merged) then the historical data you have on your machine will have to be dumped to get the sales region codes corrected for the past.  But then all your fixes have to be redone-  could be a real nightmare if you've been using it for a while. </p>
<p>Another issue is that eventually the data in the source system might get fixed- and it turns out the IT department's fix wasn't the same as yours-  Ohio rather than Michigan- so you have more discrepancies to chase down.</p>
<p>On top of that, even though the amounts are small, it's disconcerting to see different spreadsheets show different totals because you "fixed it" in one of them and not in another, even just within your own spreadsheets- not to mention the spreadsheets of your colleagues.  I hate it when my boss sees three different numbers.</p>
<p>The key is to create a single data-set from all your sources, fix the problem once, and do it in a way that the entire data set can be refreshed automatically, the fixes (those that still apply) can be "re-run" and then all your spreadsheets link to THAT single version of your truth.  The more you share this "master" sheet with your co-workers the better.</p>
<p>Of course, that's the whole point of having data warehouses and data marts and Enterprise business intelligence systems.  But what if the analysis you need to do hasn't been covered, or isn't scheduled to go live for another eight months?</p>
<p>If it's just you and your best friend Excel, then here are some pointers:</p>
<ul>
<li><strong>Stage your data.</strong>  Don't make 10 spreadsheets that all take data from the raw source, rather make one spreadsheet that is the "Fixed data", and have all other spreadsheets link to this. This will mean you will have a single "staging" spreadsheet, and then a number of "reporting" spreadsheets.</li>
<li><strong>Record all your "fixes" on a sheet in this staging spreadsheet</strong> called "Known Issues".  The ideal would be to automate it so that the fixes get applied each time you reload the entire data-set, but by at least having a clear record you will be able to quickly get the data where you want it each time you reload.</li>
<li><strong>Don't think about report layout or formatting in your staging spreadsheet</strong>-  keep the data in simple tables that are more aligned with how you get the raw data- every column has the same kind of values in it all the way down.  If you have different data-sets, use different sheets.  Do the reporting in your reporting spreadsheets where you can have tables with different mixes of data from different sources.</li>
</ul>
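<p>To make the "Known Issues" idea concrete, here is a minimal Python sketch of fixes that are recorded once and re-run on every reload. Everything in it is an assumption for illustration: the rule format, the column names, and the choice to relabel the bad state as "Unknown" are mine, and this is not how Datamartist or Excel does it.</p>

```python
# Each known issue: which rows it matches and which values to overwrite.
# The rule below is a made-up example based on the "Ontario, US" case.
KNOWN_ISSUES = [
    {
        "match": {"country": "US", "state": "Ontario"},
        "set": {"state": "Unknown"},
        "note": "State does not belong to country; amount too small to chase.",
    },
]


def apply_known_issues(rows, issues):
    """Return fixed copies of rows plus a count of how often each fix applied."""
    applied = [0] * len(issues)
    fixed_rows = []
    for row in rows:
        fixed = dict(row)  # never modify the raw data in place
        for index, issue in enumerate(issues):
            if all(fixed.get(col) == val for col, val in issue["match"].items()):
                fixed.update(issue["set"])
                applied[index] += 1
        fixed_rows.append(fixed)
    return fixed_rows, applied
```

<p>Because the raw rows are left untouched, the same list of fixes can be re-run after every reload. A fix whose count comes back as zero probably no longer applies, for example because the source system has since been corrected, and can be reviewed or retired.</p>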
<p>The more macros or scripts you can use the better. But even if you just have a set order of cut and paste and follow it to the letter (maybe write it down on a sheet called "Refresh Steps" in the staging spreadsheet), it will reduce the time it takes to update each time new data is available. You only do the fixes once, and if you've linked the staging spreadsheet cleanly to the reporting spreadsheets, the rework will be reduced.</p>
<p> Of course, it's still a lot of work.  In a nutshell, that's why I'm working on the Datamartist tool, which will automate this and much more, allowing you to manage your spreadsheets much more effectively, easily and without programming.  Stay tuned, and in the meantime happy staging.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/setting-the-stage-managing-data-issues/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
