<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datamartist.com</title>
	<atom:link href="http://www.datamartist.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.datamartist.com</link>
	<description>Reduce cost with self serve data transformation</description>
	<lastBuildDate>Wed, 10 Mar 2010 21:46:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Reduce Business Intelligence cost through better data migration</title>
		<link>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean</link>
		<comments>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean#comments</comments>
		<pubDate>Tue, 09 Mar 2010 18:49:29 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[Data migration]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Business Intelligence]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4390</guid>
		<description><![CDATA[Managing Business Intelligence cost is not an easy task.  But poorly or inconsistently structured data can make the task even harder.  Unfortunately, a lazy data migration project can generate all sorts of headaches that will cause your Business Intelligence cost to explode.  Of course, bad data quality also has many other costs [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/tell-the-ceo-forget-the-merger-data-is-read-only.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/tell-the-ceo-forget-the-merger-data-is-read-only.jpg" alt="" title="tell-the-ceo-forget-the-merger-data-is-read-only" width="363" height="209" class="alignright size-full wp-image-4400" /></a>Managing Business Intelligence cost is not an easy task.  But poorly or inconsistently structured data can make the task even harder.  Unfortunately, a lazy data migration project can generate all sorts of headaches that will cause your Business Intelligence cost to explode.  Of course, bad data quality also has many other costs and risks associated with it in its own right, but I'm going to focus in on business intelligence today.  </p>
<p>In current business intelligence practice, most of the development cost usually goes into three steps. The first is getting the data out of source systems (Extract). The second is transforming it so that it is consistent across all the dimensions needed (Transform). The third is putting it into a model that is easy to query and analyse (Load).  Building these ETL jobs becomes dramatically harder if the data in the source systems is not consistent. </p>
<h2>Change is the challenge</h2>
<p>Companies are not static: they grow, diversify, change strategies, reorganize, rename and restructure.  They acquire other companies or are acquired. The structure and content of the data in their systems often tell this story. If the proper work is not done to keep the data consistent, both with itself and with the new situation, then this story will be painful and complex.</p>
<blockquote><p>Remember ten years ago when we acquired company X, but decided not to change their customer codes to our standard, so every one of their codes got an "X" prefix to avoid duplicates?  Well, those X's are still there, and all our queries have to deal with multiple code structures.</p></blockquote>
<blockquote><p>Remember how we used to have three independent databases, one for each region, then when we went to the new data center and put everything into a single database, we ended up with multiple schemas and all those crazy views rather than consolidating into a single instance?</p></blockquote>
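<p>To make the first of these stories concrete, here is a minimal Python sketch. The customer codes, the <code>split_code</code> and <code>migrate_once</code> helpers and the renumbering scheme are all hypothetical. The point is the contrast between logic that every query has to repeat and a cleanup that a migration could have done once.</p>

```python
# Hypothetical customer codes: company X's codes were kept with an "X" prefix,
# so two code structures now coexist in one customer table.
raw_codes = ["C1001", "C1002", "XC1001", "XC2040"]


def split_code(code):
    """Logic every query or ETL job must repeat while the source is left as-is."""
    if code.startswith("X"):
        return ("company_x", code[1:])
    return ("ours", code)


def migrate_once(codes, next_free_number):
    """One-time cleanup sketch: renumber prefixed codes into our standard
    "C" format so only one structure remains. Returns an old-to-new mapping."""
    mapping = {}
    for code in codes:
        if code.startswith("X"):
            mapping[code] = "C" + str(next_free_number)
            next_free_number += 1
        else:
            mapping[code] = code
    return mapping


print([split_code(c) for c in raw_codes])
print(migrate_once(raw_codes, 5000))
```

<p>The prefixed codes get fresh numbers rather than having the "X" stripped, because "XC1001" and "C1001" would otherwise collide. That collision is exactly why the prefix was added in the first place.</p>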
<p>When the data migration project decided to reduce project cost by not addressing data consistency, it simply pushed this cost into the future, most likely turning a one-time expense into an ongoing and expanding annual business intelligence cost.</p>
<p>You end up with crazy ETL jobs that parse the same field in different ways depending on the date of the transaction, or on other fields:  "If the transaction is before 2002, then the first digit of the product code means X, otherwise it means Y, unless of course it's from the western division, who do it differently, so then you need to look at field A and use the CASE statement..."</p>
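<p>Written out as code, that rule looks something like the Python sketch below. The code meanings, the "WEST" division value and the <code>field_a</code> parameter are invented for illustration. The real rules would come from the source systems. The last function shows what the same step might shrink to if the migration had stored the product line explicitly.</p>

```python
from datetime import date

# Invented code meanings, standing in for the real ones.
PRE_2002_MEANING = {"1": "HARDWARE", "2": "SOFTWARE"}
POST_2002_MEANING = {"1": "SERVICES", "2": "HARDWARE"}


def product_line(txn_date, division, product_code, field_a):
    """The kind of rule an ETL job accumulates when the source was never cleaned."""
    if division == "WEST":
        # The western division records the product line in a separate field.
        return field_a
    if txn_date < date(2002, 1, 1):
        # Before 2002 the first digit meant one thing...
        return PRE_2002_MEANING.get(product_code[0], "UNKNOWN")
    # ...and afterwards the same digit means something else.
    return POST_2002_MEANING.get(product_code[0], "UNKNOWN")


def product_line_after_cleanup(row):
    """If the migration had stored the product line explicitly, the rule is a lookup."""
    return row["product_line"]
```

<p>Every report, extract and change request has to carry the first version forward. The second version is the cost of doing the work once at migration time.</p>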
<h2>Reduce Business Intelligence cost through data cleanup</h2>
<p>If your data is cleaner you'll reduce business intelligence cost across your entire BI architecture.</p>
<ul>
<li>Reduce ETL and report development cost- both initial, and the cost of ongoing maintenance.  Every change request will take more time if all the models are complex due to underlying data complexity.</li>
<li>Reduce hardware costs- complex queries require more processing, and bigger servers, to meet that nightly load window.</li>
<li>Reduce time spent reconciling numbers. Complex ETL means that chances are business intelligence reports don't match up easily with the operational reports from the source systems.  People will spend time constantly double checking these discrepancies, and it will undermine confidence in all data.</li>
</ul>
<h2>Fix the problem at the source.  Not in the Business Intelligence.</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/lazy-data-migration-get-jackets-business-intelligence-pays-the-bill.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/lazy-data-migration-get-jackets-business-intelligence-pays-the-bill.jpg" alt="" title="lazy-data-migration-get-jackets-business-intelligence-pays-the-bill" width="420" height="285" class="alignright size-full wp-image-4396" /></a>Business intelligence is far too often left to fix all the issues in the source systems- and then becomes the focus of dissatisfaction when costs and delays become unacceptable.  </p>
<p>I've heard people argue, "That's what ETL is for, right?  Why are you complaining?"</p>
<p>Assuming that the ETL will fix the sins of the source system is an inefficient and costly strategy.</p>
<p>Everything is a balance, and perfection does not exist. But when deciding what to fix and what to leave, don't let a lazy data migration project saddle you with years of business intelligence costs. When it's time to bulk load data into the system, make it as right as you can.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/reduce-business-intelligence-cost-by-keeping-master-data-clean/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data.gov Hey, where&#8217;s the RAW data?</title>
		<link>http://www.datamartist.com/data-gov-hey-wheres-the-raw-data</link>
		<comments>http://www.datamartist.com/data-gov-hey-wheres-the-raw-data#comments</comments>
		<pubDate>Tue, 09 Mar 2010 01:02:24 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Public Data]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4410</guid>
		<description><![CDATA[I've been poking around data.gov to find a fun dataset to play with in Datamartist, and I have to report that it's a frustrating experience.
I'm often finding that many of the files are not really raw data at all.  They are reports, rendered as files.  So when you look into the file, [...]]]></description>
			<content:encoded><![CDATA[<p>I've been poking around data.gov to find a fun dataset to play with in <a href="/">Datamartist</a>, and I have to report that it's a frustrating experience.</p>
<p>I'm often finding that many of the files are not really raw data at all.  They are reports, rendered as files.  So when you look into a file, you'll find that it won't parse as a delimited file, and contains subtotals, spaces, blank lines and so on.  Or it is an Excel file with merged cells, a copy and paste from some sort of OLAP tool, or perhaps a pivot table.  Add to this the complication of multiple files, all with varying header sizes and spacing because they were designed to be printed and read, not analyzed. What becomes clear is that, faced with deadlines to publish data, agencies just dumped reports into files and uploaded them.  Looking at some DOD "datasets", I found that different years had different formats: HTML for earlier years, then a switch to PDF, with of course different columns, metrics and summary levels.</p>
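<p>Getting the rows back out of such a file usually means writing throwaway rules like the ones in this Python sketch. The sample report, its column names and the "Subtotal"/"Total" labels are hypothetical. Each real file would need its own rules, which is exactly the problem.</p>

```python
import csv
import io

# A hypothetical "report rendered as a file": a title line, blank lines,
# a header repeated for each printed page, and subtotal rows mixed in.
report_text = """Spending by Agency, FY2009

Agency,Program,Amount
Agency A,Program 1,100
Agency A,Program 2,250
Subtotal,,350

Agency,Program,Amount
Agency B,Program 3,75
Total,,425
"""


def report_to_rows(text):
    """Keep only detail rows: skip blanks, titles, repeated headers and totals."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) != 3:                    # blank lines and the title line
            continue
        if record[0] == "Agency":               # header repeated on every "page"
            continue
        if record[0] in ("Subtotal", "Total"):  # report arithmetic, not data
            continue
        rows.append({"agency": record[0], "program": record[1], "amount": int(record[2])})
    return rows


rows = report_to_rows(report_text)
print(rows)
```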
<p>Of course, the data is there, and we can get it out- but there is a pile of work to do, with the data spread far and wide across multiple files, spreadsheets and formats. </p>
<p>In a lot of these cases it probably took more time for someone to generate these reports than it would have to just publish the raw data.  If you can make a pile of pivot table reports, then you must be able to just dump the raw data from somewhere.</p>
<p>RAW is OK; we data junkies can deal with raw. Just give us one row per line, delimited or fixed width, and a good data dictionary that tells us what each column is.  We'll take it from there.</p>
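<p>Here is why a data dictionary is enough. With column positions and types documented, a fixed-width file can be read with a few lines of generic code, as in this Python sketch. The dictionary entries and sample lines are hypothetical.</p>

```python
# Hypothetical data dictionary: (column name, start offset, width, type).
DATA_DICTIONARY = [
    ("state", 0, 2, str),
    ("year", 2, 4, int),
    ("amount", 6, 8, int),
]


def parse_fixed_width(lines, dictionary):
    """Slice each non-blank line into typed columns using the data dictionary."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        record = {}
        for name, start, width, cast in dictionary:
            record[name] = cast(line[start:start + width].strip())
        records.append(record)
    return records


# Sample lines built with right-aligned, 8-character amount fields.
sample = [f"CA2009{1500:8d}", f"NY2009{820:8d}", ""]
records = parse_fixed_width(sample, DATA_DICTIONARY)
print(records)
```

<p>Nothing in the parser is specific to the file. Publish a different dataset with a different dictionary and the same code still works.</p>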
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/data-gov-hey-wheres-the-raw-data/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Let&#8217;s admit it- centralized business intelligence alone just doesn&#8217;t work</title>
		<link>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work</link>
		<comments>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work#comments</comments>
		<pubDate>Wed, 03 Mar 2010 21:10:27 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Meta Data]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>
		<category><![CDATA[Business Intelligence Workspace]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4342</guid>
		<description><![CDATA[One version of the truth.  Data warehouses.  Centralized business intelligence teams.  This has been the best practice for business intelligence for the last two decades.  
Users taking the initiative with data has been seen as the enemy of a successful business intelligence program.  
This needs to change.  In a [...]]]></description>
			<content:encoded><![CDATA[<p>One version of the truth.  Data warehouses.  Centralized business intelligence teams.  This has been the best practice for business intelligence for the last two decades.  </p>
<p>Users taking the initiative with data has been seen as the enemy of a successful business intelligence program.  </p>
<p>This needs to change.  In a world of ever increasing data volumes and complexity, faster business processes and more data savvy knowledge workers, a purely centralized solution is doomed to fail.</p>
<p>A consensus is starting to form that the best architecture is one that blends centralized methods with more distributed and (gasp) free form, user guided ones.  In fact, when we look at what actually exists in most enterprises and take into account the unofficial shadow systems, we're already there, but in two separate camps that aren't talking. </p>
<p>The amount of freedom to allow ranges from letting the users have at it, to opening up the possibility of <a href="http://tdwi.org/blogs/wayneeckerson/2010/02/zen-bi-and-the-wisdom-of-letting--go.aspx" target="_blank">departmental data marts</a>, but the buzz out of TDWI clearly indicates a growing acknowledgement that a rigid top down architecture is not tenable.</p>
<p>What are Oracle, IBM, Microsoft, SAP and SAS (who together hold more than 70% of the business intelligence market) advising as the right approach?</p>
<p>They advocate big architectures, centralized meta data management, big databases and lots of command and control. They talk about "self serve", but they mean self serve access to existing reports or report interfaces. To be fair, they need to sell the tools they have.</p>
<p>For a refreshing change from this, I very much enjoyed reading <a href="http://events.tdwi.org/Events/Las-Vegas-World-Conference-2010/Sessions/Thursday/Keynote-Stop-Paving-the-Cowpath.aspx" target="_blank">Mark Madsen's keynote at TDWI</a>, "Stop paving the cow path".</p>
<p>We enjoy reading things that we agree with, and I nodded my way through his slide deck.</p>
<p>In his presentation, Madsen points out that centralization won't work, because it:</p>
<ul>
<li>Creates bottlenecks</li>
<li>Causes scale problems</li>
<li>Enforces a single model</li>
</ul>
<h2>Bottlenecks and Scale</h2>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-super-popular-or-big-backlog.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-super-popular-or-big-backlog.jpg" alt="" title="data-warehouse-super-popular-or-big-backlog" width="377" height="275" class="alignright size-full wp-image-4363" /></a>In a centralized system, all requests go into the queue, and the backlog starts piling up. </p>
<p>The size of the department/team that is responsible for making it all work becomes the number one bottleneck. </p>
<p>Are there enough people able to prioritize and analyse the payback on analysis requests? Because in a centralized organisation, the gatekeepers are necessary, and how do they KNOW which requests are the good ones?  How does anyone really know?</p>
<p>I'm not sure any company can afford to staff a centralized data warehouse team large enough to handle all the requests as they are generated. Prioritization therefore becomes a single point of failure.  Get it wrong, and it can be all wrong.  In a more distributed structure, decisions are made at multiple points, some good, some bad. That diversity will often bring more innovative and experimental behavior, opening up avenues of analysis that an overly static central team might avoid.</p>
<p>For an indication of how well users think the central team is listening to them, look at how many Excel spreadsheets there are, and how many shadow systems grow like mushrooms throughout the typical enterprise.  People think their analysis is important, and even if IT won't or can't help, they find a way to get it done.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-not-used-convert-storage-for-spreadsheets.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/03/data-warehouse-not-used-convert-storage-for-spreadsheets.jpg" alt="" title="data-warehouse-not-used-convert-storage-for-spreadsheets" width="373" height="271" class="alignleft size-full wp-image-4364" /></a>In terms of scaling, I can hear the technical types starting to explain how their servers, infrastructure and approach scale, with diagrams and MPP theories pulled out with pride.  "Centralizing lets it be scalable- what are you talking about?"</p>
<p>Maybe. But there are traps here too: centralized organisations always want to put everything in one database.  Having everything in a single repository starts to become the goal, rather than the cost efficient analysis of the right data.  Not centralizing is very scalable: stand-alone machines can simply keep being added.</p>
<p>It may in fact be that data can remain distributed and diverse at certain levels of detail, and more federated approaches can be used, resulting in cheaper hardware and software, and more importantly avoiding a lot of really hard master data management work.  Consolidation can sometimes happen at summary levels that make sense from a business point of view- not just blindly following the "one version" mantra.</p>
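<p>One way to picture consolidating at a summary level is shown in this Python sketch. Each regional source keeps its own detail and its own product codes and summarizes to an agreed business category. The central step only adds up the summaries. The regions, codes and categories are hypothetical, and a real federated setup would involve far more than this.</p>

```python
from collections import defaultdict

# Hypothetical regional sources, each kept in its own system at full detail.
# Each region uses its own product codes, which are never reconciled centrally.
east = [("E-100", "Widgets", 120.0), ("E-101", "Gadgets", 80.0)]
west = [("W-7", "Widgets", 200.0), ("W-9", "Widgets", 50.0)]


def summarize(rows):
    """Each region summarizes locally to an agreed business-level category."""
    totals = defaultdict(float)
    for _code, category, amount in rows:
        totals[category] += amount
    return dict(totals)


def consolidate(*summaries):
    """The central step only combines summaries, so only the category
    names need to be agreed across regions, not the detailed codes."""
    combined = defaultdict(float)
    for summary in summaries:
        for category, amount in summary.items():
            combined[category] += amount
    return dict(combined)


company = consolidate(summarize(east), summarize(west))
print(company)
```

<p>The master data work shrinks to agreeing on the category list, rather than mapping every regional product code onto one enterprise-wide code set.</p>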
<h2>Enforcing a single model</h2>
<p>Isn't having a single data model good?  We've been told that it is.  In a way, this is the holy grail.  </p>
<p>But is there a single, correct, slowly changing model that satisfies everyone in an organisation?  </p>
<p>Why do I say slowly changing?  Because if there is only one for the entire enterprise, it will change slowly, if at all.  </p>
<p>Even if you happen to understand what the right model is (and by model I mean data model, analysis model, process model, any model), and you manage to implement it while it's still the right model, in a year it's not going to be the right one.  And a centralized, high cost, committed architecture won't and can't adapt.  You'll still be paying the mortgage on the data warehouse.</p>
<p>Very large centralized models cannot be comprehensive and up to date, because to be comprehensive they have to be so complex as to be difficult to change, and as a result they quickly become out of date.  It's sort of a Heisenberg uncertainty principle for common meta data repositories.</p>
<h2>"Giving people their flying cars"</h2>
<p>Madsen of course doesn't solve the entire problem in his keynote, but he points out some directions that make sense.  And his graphic depicting a happy couple blasting off in their very locally controlled flying car sends the message- users can do their analysis without central oversight or interaction. (Although, one would imagine that some sort of air traffic control would be necessary, and the refueling stations for the cars would probably be run centrally- we're not advocating anarchy here.)</p>
<p>Having built data warehouses, established a data warehouse competency center, and provided business intelligence services for thousands of users, I can testify from first-hand experience that centralizing alone is just not going to work.  People who worked with me a decade ago will remember the significant amount of time spent creating meta data repositories.  Are they still needed?  Yes.  But they simply can't do everything.  Use them with care, and be wary of your ambition for them.</p>
<p>First, accept the fact that users are not mindless consumers.  Learn from the fact that they use Excel constantly, and that they don't just read reports: they build things, adding data, fixing data, re-organizing data.  They think.  Give them tools that include them as part of the data processing.</p>
<p>Business intelligence cannot be solely a process where formal requirements are gathered, followed by a publishing exercise of delivering the reports on time.</p>
<p>Are there some reports where this is the case? Sure.  Monthly management reports and dashboards shouldn't change every month.  The model can work for some amount of the delivered data analysis.  </p>
<p>The entire architecture isn't getting ripped out- but if the new architecture is successful in bringing the pent up demand that is currently being satisfied by shadow systems into the light, then distributed, user centric, user driven business intelligence will become a significant percentage of the total.</p>
<p>But the old way of thinking has to change.  Don't "Crack down on shadow systems".  </p>
<p>Find a way to provide better service, be it self, assisted or centralized service that makes the shadow systems simply a less effective way to do it.</p>
<p>The existence of shadow systems, and the extent of them, is the clearest argument that centralized business intelligence alone is simply not up to the task.</p>
<p>Once you have people doing whatever they want in the self directed part of your architecture, DO watch what they are doing- not to control it, but to learn from it.  Everyone constantly re-structuring the customer dimension?  Obviously it's time for an update.  By watching what users edit, what gaps they fill in, you can find the data quality issues, identify the fuel to put on the self directed fire.</p>
<p>Tools like Lyzasoft, <a href="/">our own Datamartist tool</a>, and Microsoft's Power Pivot in Excel 2010 and others are all going to drive power to the users, and introduce a new balanced approach between centralized and local parts of business intelligence architectures.  Visualization tools like Tableau will further give people the ability to create powerful, consumable analysis in a self serve mode.</p>
<p>Will there be challenges with data quality, risk management and wasted time doing pointless analysis? Most likely.  </p>
<p>Will the information we gather and the payoff from the successful bottom up analysis efforts make it hugely valuable overall? I for one think so.</p>
<p>We need to learn to trust our colleagues with the data, while at the same time managing the reality of data quality and risk of errors that more free form techniques can create.</p>
<p>Companies that include both top down and bottom up capabilities in their architecture will stop wasting time fighting internally, and start to take advantage of all that data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/centralized-business-intelligence-alone-does-not-work/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Datamartist V1.2 now available</title>
		<link>http://www.datamartist.com/datamartist-v1-2-now-available</link>
		<comments>http://www.datamartist.com/datamartist-v1-2-now-available#comments</comments>
		<pubDate>Tue, 02 Mar 2010 14:45:33 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Transformation]]></category>
		<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4266</guid>
		<description><![CDATA[nModal solutions is pleased to announce that Datamartist V1.2 is now available.
In this version, we've introduced a Standard and Pro edition, letting customers get the features they need at the right price. 

Datamartist Standard:       $349
Datamartist Professional:   $745

A comparison of the feature sets explains the details.
What's new in [...]]]></description>
			<content:encoded><![CDATA[<p>nModal solutions is pleased to announce that Datamartist V1.2 is now available.</p>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/02/Sales-example-full-screen-shot-profiler-perspective-300w.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Sales-example-full-screen-shot-profiler-perspective-300w.jpg" alt="" title="Sales-example-full-screen-shot-profiler-perspective-300w" width="300" height="228" class="alignright size-full wp-image-4302" /></a>In this version, we've introduced a Standard and Pro edition, letting customers get the features they need at the right price. </p>
<ul>
<li>Datamartist Standard:       $349</li>
<li>Datamartist Professional:   $745</li>
</ul>
<p>A <a href="/product/datamartist-pricing-and-edition-comparison">comparison of the feature sets explains</a> the details.</p>
<h1>What's new in V1.2</h1>
<h2>Data source import enhancements</h2>
<ul style="margin-top:10px;">
<li>Ability to cut and paste between Excel, Text files, the Datamartist canvas and any Datamartist data viewer.</li>
<li>New integrated data source repository with drag and drop to canvas.</li>
<li>SQL Editor to allow the creation of SQL queries to get data from databases.</li>
</ul>
<p><a href="http://www.datamartist.com/wp-content/uploads/2010/02/Edit-SQL-Datamartist1.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Edit-SQL-Datamartist1.jpg" alt="" title="Edit-SQL-Datamartist" width="609" height="342" class="aligncenter size-full wp-image-4307" /></a></p>
<h2>Running Datamartist canvases automatically</h2>
<p>Now that Datamartist can be run from the command line, it is possible to schedule Datamartist transforms, even running them on a Windows server.  Details about the logging and options <a href="/resources/datamartist-doc-files/V1_0_Documentation/DM-running-from-cmd-line-Doc.html">are here</a>.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2010/02/Running-datamartist-from-the-command-line-610w.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Running-datamartist-from-the-command-line-610w.jpg" alt="" title="Running-datamartist-from-the-command-line-610w" width="610" height="308" class="aligncenter size-full wp-image-4310" /></a></p>
<h2>Edit Internal data sets.</h2>
<p>The addition of fully editable internal data sets, stored within the DMC file itself, gives a powerful new ability to create "What if" scenarios.  Imagine you want to see the effect of changing the sales regions slightly. Just copy the existing regions from a data viewer and paste them onto the canvas, which gives you an internal data set block containing that data.  Now you can add a "New Region" column or rename the column, edit some values, join it back into the original data with a join block, and be trying different scenarios in no time.<br />
<a href="http://www.datamartist.com/wp-content/uploads/2010/02/Internal-edit-regions-list.jpg"><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Internal-edit-regions-list.jpg" alt="" title="Internal-edit-regions-list" width="547" height="389" class="aligncenter size-full wp-image-4313" /></a></p>
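<p>The logic of that what-if scenario can be sketched in plain Python. The idea is to copy the region lookup, add a "New Region" column, edit one value, and join it back to the sales on state. The sales rows, states and region names are hypothetical. This only mirrors the steps described above. It is not how Datamartist stores or runs internal data sets.</p>

```python
# Hypothetical sales data and region lookup, standing in for the canvas data sets.
sales = [
    {"customer": "Acme", "state": "WA", "amount": 100},
    {"customer": "Birch", "state": "OR", "amount": 50},
    {"customer": "Cedar", "state": "NV", "amount": 75},
]
regions = {"WA": "West", "OR": "West", "NV": "West"}

# The editable copy: keep the original Region and add a "New Region" column.
scenario = {state: {"Region": r, "New Region": r} for state, r in regions.items()}
scenario["NV"]["New Region"] = "Southwest"  # the what-if edit


def totals_by(column):
    """Join each sale to the scenario table on state, then total by the chosen column."""
    result = {}
    for row in sales:
        key = scenario[row["state"]][column]
        result[key] = result.get(key, 0) + row["amount"]
    return result


print(totals_by("Region"), totals_by("New Region"))
```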
<p>We're excited about this new release, and thanks to all our customers and testers for their feedback- we're glad to be incorporating some of those great ideas into the product.</p>
<p>If you haven't tried Datamartist yet, <a href="/downloads">this is the perfect time</a>, and now with two editions to choose from you can get the features you need at the right price.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/datamartist-v1-2-now-available/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tableau Public- great visualization now where do we get the data?</title>
		<link>http://www.datamartist.com/tableau-public-feature-review-and-use-with-datamartist</link>
		<comments>http://www.datamartist.com/tableau-public-feature-review-and-use-with-datamartist#comments</comments>
		<pubDate>Fri, 12 Feb 2010 03:44:53 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[Business Intelligence]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=4121</guid>
		<description><![CDATA[Good news on the visualization front this week when Tableau announced that it was making its well received visualization software available in a free public version, as well as providing a structure to allow users to integrate Tableau visualizations into their websites.  
Tableau has received a fair amount of positive response from the visualization [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2010/02/tableau-public-logo-300x65.jpg" alt="tableau-public-logo" title="tableau-public-logo" width="300" height="65" class="alignright size-medium wp-image-4135" />Good news on the visualization front this week when Tableau announced that it was making its well received visualization software available in <a href="http://www.tableausoftware.com/public/" target="_blank">a free public version</a>, as well as providing a structure to allow users to integrate Tableau visualizations into their websites.  </p>
<p>Tableau has received a fair amount of positive response from the visualization world.  Even <a href="http://www.perceptualedge.com/" target="_blank">Stephen Few</a>, who isn't shy about pointing out when visualizations stray from the straight and narrow, has been supportive of Tableau from the start.</p>
<p>We're excited about this new access to such a great data visualization tool because we know that people who do visualization have to transform their data- and if users of Tableau want a flexible, visual data transformation tool, the <a href="/">Datamartist tool</a> is an obvious choice.</p>
<h2> Free Tableau vs Professional version</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Tableau-public-intro-data-source-selection.jpg" alt="Tableau-public-intro-data-source-selection" title="Tableau-public-intro-data-source-selection" width="300" height="299" class="alignright size-full wp-image-4122" /></p>
<p>There are of course <a href="http://www.tableausoftware.com/forum/data-requirements-and-limitations-tableau-public" target="_blank">some limitations</a> with the free version of Tableau, compared with the full-featured professional version ($1,600 USD per seat).</p>
<ul>
<li> Data import: only MS Access, Excel and text files</li>
<li> 100,000-row limit per table</li>
<li> 50 MB limit per organization on the web server</li>
</ul>
<p>None of these limitations will stop you from making some pretty fantastic visualizations as long as your final summarized data set fits within the limits and you put it in the right format.  But it does mean that to use this version of Tableau, you need to use another tool to get those large data sets summarized.  This is probably something you are doing anyway, because there is almost always some data cleanup to do.</p>
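<p>As a rough illustration of that summarizing step, this Python sketch collapses a detail table that is too big for the free version into one row per region and month. It then checks the result against the 100,000-row limit listed above. The detail data and the choice of summary level are hypothetical. In practice the summary level should match what the visualization needs to show.</p>

```python
ROW_LIMIT = 100_000  # Tableau Public per-table row limit listed above
REGIONS = ["East", "West", "North", "South"]

# Hypothetical detail data: 250,000 transactions of 1.0 each,
# spread over 4 regions and 12 months.
detail = [(REGIONS[i % 4], (i // 4) % 12 + 1, 1.0) for i in range(250_000)]


def summarize(rows):
    """Aggregate to one row per (region, month), returned as sorted tuples."""
    totals = {}
    for region, month, amount in rows:
        totals[(region, month)] = totals.get((region, month), 0.0) + amount
    return [(region, month, total) for (region, month), total in sorted(totals.items())]


def fits_limit(rows, limit=ROW_LIMIT):
    return len(rows) <= limit


summary = summarize(detail)
print(len(detail), fits_limit(detail))
print(len(summary), fits_limit(summary))
```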
<h2>Get your data sets ready- there are going to be some beautiful viz getting made</h2>
<p>The trick with visualization of course, is that you need data to visualize.  With tools like Tableau, as good as they are at making the pictures, you have to get the data set to them first- and we all know how many pre-formatted, all is well, no data quality issues data sets there are lying around the real world (Hint: none.).</p>
<p>But I'll tell you, once you do have the data, what a fantastic bit of interactive web based visualization Tableau can do.</p>
<p>Often, people use a combination of MS Excel and Access to create the datasets they want, then connect Tableau to the Access database.  Of course, we suggest you try Datamartist instead.  Datamartist and Tableau are a powerful combination. First, use Datamartist to pull data from multiple sources (Datamartist loads from SQL Server, Oracle and MySQL, at a license price much lower than Tableau Professional's).</p>
<p>Datamartist lets you join tables visually (using a Venn diagram interface that we're proud of), segment data using rule sets, summarize millions of rows if needed, and generally parse and transform with an easy-to-use calculation engine.  Once you have the data where you want it, export it easily to an Access database, and let Tableau Public generate the visualizations that you need.</p>
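<p>For readers wondering what "segment data using rule sets" means in practice, here is a generic Python sketch. It uses an ordered list of conditions where the first match wins, with a catch-all at the end. The segment names, thresholds and customer fields are hypothetical, and this is not Datamartist's rule engine.</p>

```python
from collections import Counter

# Hypothetical rule set, evaluated top to bottom; the first matching rule wins.
RULES = [
    ("Key account", lambda c: c["annual_sales"] >= 1_000_000),
    ("Growth", lambda c: c["annual_sales"] >= 100_000 and c["years_active"] <= 2),
    ("Standard", lambda c: True),  # catch-all so every row gets a segment
]


def segment(customer, rules=RULES):
    """Return the name of the first rule whose condition the customer meets."""
    for name, condition in rules:
        if condition(customer):
            return name
    return None


customers = [
    {"name": "Acme", "annual_sales": 2_500_000, "years_active": 12},
    {"name": "Birch", "annual_sales": 150_000, "years_active": 1},
    {"name": "Cedar", "annual_sales": 150_000, "years_active": 6},
    {"name": "Dune", "annual_sales": 40_000, "years_active": 1},
]
counts = Counter(segment(c) for c in customers)
print(counts)
```

<p>Rule order matters. Because "Key account" is checked first, a large new customer lands there rather than in "Growth".</p>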
<p>I'm excited about Tableau's decision to make its power available in this public version.  I intend to do some serious data crunching with the <a href="/">Datamartist Tool</a>, followed by some interactive visualizations with Tableau Public.  The beauty of the Tableau Public setup is that I can then publish the visualizations right here in the blog, and highlight what the combination of these two tools can do.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/tableau-public-feature-review-and-use-with-datamartist/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inner and outer joins SQL examples and the Join block</title>
		<link>http://www.datamartist.com/sql-inner-join-left-outer-join-full-outer-join-examples-with-syntax-for-sql-server</link>
		<comments>http://www.datamartist.com/sql-inner-join-left-outer-join-full-outer-join-examples-with-syntax-for-sql-server#comments</comments>
		<pubDate>Wed, 10 Feb 2010 16:13:45 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Datamartist Tool]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[SQL Code]]></category>
		<category><![CDATA[Joining data]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3966</guid>
		<description><![CDATA[
In this post I'll show you how to do all the main types of Joins with clear SQL examples.  The examples are written for Microsoft SQL Server, but very similar syntax is used in Oracle, MySQL and other databases.
Joins can be said to be INNER or OUTER joins, and the two tables involved are [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2010/02/join-block-venn-diagram-datamartist.jpg" alt="join-block-venn-diagram-datamartist" title="join-block-venn-diagram-datamartist" width="212" height="188" class="alignright size-full wp-image-4068" /><br />
In this post I'll show you how to do all the main types of Joins with clear SQL examples.  The examples are written for Microsoft SQL Server, but very similar syntax is used in Oracle, MySQL and other databases.</p>
<p>Joins can be said to be INNER or OUTER joins, and the two tables involved are referred to as LEFT and RIGHT.  By combining these two concepts you get all the various types of joins in join land: Inner, left outer, right outer, and the full outer join.  </p>
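<p>To make the four join types concrete, here is a minimal sketch in Python (not SQL, and not how a database engine actually executes joins) that spells out the semantics with a nested loop. The student and advisor rows are hypothetical stand-ins, not the data in the screenshots. Unmatched rows are padded with None, the way SQL pads them with NULL.</p>

```python
# Hypothetical sample rows: (name, advisor_id). Not the data in the screenshots.
students = [("Alice", 1), ("Bob", 2), ("Carol", None)]
advisors = [("Dr. Smith", 1), ("Dr. Jones", 3)]


def join(left, right, keep_left=False, keep_right=False):
    """Nested-loop join on the second field (advisor_id) of each row.

    With both flags False this behaves like an INNER join; keep_left pads
    unmatched left rows with None (LEFT OUTER), keep_right does the same for
    unmatched right rows (RIGHT OUTER), and both together give a FULL OUTER join.
    """
    result = []
    matched_right = set()
    for left_row in left:
        found = False
        for j, right_row in enumerate(right):
            # Like SQL, a NULL (None) key never matches anything, not even another NULL.
            if left_row[1] is not None and left_row[1] == right_row[1]:
                result.append((left_row, right_row))
                matched_right.add(j)
                found = True
        if keep_left and not found:
            result.append((left_row, None))
    if keep_right:
        for j, right_row in enumerate(right):
            if j not in matched_right:
                result.append((None, right_row))
    return result


inner = join(students, advisors)
left_outer = join(students, advisors, keep_left=True)
right_outer = join(students, advisors, keep_right=True)
full_outer = join(students, advisors, keep_left=True, keep_right=True)

for label, rows in [("inner", inner), ("left outer", left_outer),
                    ("right outer", right_outer), ("full outer", full_outer)]:
    print(label, len(rows))
```

<p>With these rows the inner join returns 1 row, the left outer join 3, the right outer join 2 and the full outer join 4. Each outer join is the inner join plus the padded rows from its "kept" side.</p>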
<h2>Tables used for SQL Examples</h2>
<p><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Join-Example-Students-And-Advisors-Tables.jpg" alt="Join-Example-Students-And-Advisors-Tables" title="Join-Example-Students-And-Advisors-Tables" width="606" height="214" class="aligncenter size-full wp-image-4057" /></p>
<p>In the screenshots I've configured Datamartist to show only the name columns to save space.  The SQL code shown is "Select *", so it will return all the columns.  You can see that in the <a href="/">Datamartist tool</a> the type of join is selected by just checking the parts of the Venn diagram that contain the rows you want.</p>
<h2>1) Inner Join SQL Example</h2>
<p><code>select * from dbo.Students S INNER JOIN dbo.Advisors A ON S.Advisor_ID=A.Advisor_ID</code></p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Join-Example-Students-And-Advisors-Inner-Join.jpg" alt="Join-Example-Students-And-Advisors-Inner-Join" title="Join-Example-Students-And-Advisors-Inner-Join" width="560" height="234" class="aligncenter size-full wp-image-4058" /></p>
<h2>2) Left Outer Join SQL Example</h2>
<p><code>select * from dbo.Students S LEFT OUTER JOIN dbo.Advisors A ON S.Advisor_ID=A.Advisor_ID</code></p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2010/02/Join-Example-Students-And-Advisors-Left-Outer-Join.jpg" alt="Join-Example-Students-And-Advisors-Left-Outer-Join" title="Join-Example-Students-And-Advisors-Left-Outer-Join" width="625" height="265" class="aligncenter size-full wp-image-4059" /></p>
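<h2>3) Right Outer Join SQL Example</h2>
<p><code>select * from dbo.Students S RIGHT OUTER JOIN dbo.Advisors A ON S.Advisor_ID=A.Advisor_ID</code></p>
<h2>4) Full Outer Join SQL Example</h2>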
<p><code>select * from dbo.Students S FULL OUTER JOIN dbo.Advisors A ON S.Advisor_ID=A.Advisor_ID</code><br />
<img src="http://www.datamartist.com/wp-content/uploads/2010/02/Join-Example-Students-And-Advisors-Full-Outer-Join.jpg" alt="Join-Example-Students-And-Advisors-Full-Outer-Join" title="Join-Example-Students-And-Advisors-Full-Outer-Join" width="581" height="291" class="aligncenter size-full wp-image-4063" /></p>
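<p>One portability caveat: MySQL does not support FULL OUTER JOIN, and SQLite only added RIGHT and FULL OUTER JOIN in version 3.39.0. A common workaround is to take a LEFT JOIN and UNION ALL it with the right-side rows that found no match. Here is a minimal sketch of that pattern, run through Python's built-in sqlite3 module. The table layouts and rows are hypothetical, simplified versions of the Students and Advisors tables.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (Student_ID INTEGER, Name TEXT, Advisor_ID INTEGER);
    CREATE TABLE Advisors (Advisor_ID INTEGER, Name TEXT);
    INSERT INTO Students VALUES (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Carol', NULL);
    INSERT INTO Advisors VALUES (1, 'Dr. Smith'), (3, 'Dr. Jones');
""")

# Emulated FULL OUTER JOIN: every student, matched or not ...
full_outer_sql = """
    SELECT S.Name, A.Name
    FROM Students S LEFT JOIN Advisors A ON S.Advisor_ID = A.Advisor_ID
    UNION ALL
    -- ... plus the advisors that matched no student.
    SELECT S.Name, A.Name
    FROM Advisors A LEFT JOIN Students S ON S.Advisor_ID = A.Advisor_ID
    WHERE S.Student_ID IS NULL
"""
rows = conn.execute(full_outer_sql).fetchall()
for row in rows:
    print(row)
```

<p>UNION ALL is used rather than UNION so that legitimately repeated rows are not collapsed. The WHERE clause on the second branch already keeps matched pairs from being counted twice.</p>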
<h2>5) SQL example for just getting the rows that don't join</h2>
<p><code>select * from dbo.Students S FULL OUTER JOIN dbo.Advisors A ON S.Advisor_ID=A.Advisor_ID where A.Advisor_ID is null or S.Student_ID is null</code><br />
<img src="http://www.datamartist.com/wp-content/uploads/2010/02/Join-Example-Students-And-Advisors-non-joining-Join.jpg" alt="Join-Example-Students-And-Advisors-non-joining-Join" title="Join-Example-Students-And-Advisors-non-joining-Join" width="638" height="227" class="aligncenter size-full wp-image-4065" /></p>
<h2>6) SQL example for just the rows from one table that don't join</h2>
<p><code>select * from dbo.Students S FULL OUTER JOIN dbo.Advisors A ON S.Advisor_ID=A.Advisor_ID where A.Advisor_ID is null</code><br />
<img src="http://www.datamartist.com/wp-content/uploads/2010/02/Join-Example-Students-And-Advisors-left-exlusive-Join.jpg" alt="Join-Example-Students-And-Advisors-left-exlusive-Join" title="Join-Example-Students-And-Advisors-left-exlusive-Join" width="615" height="228" class="aligncenter size-full wp-image-4070" /></p>
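<p>The section 6 result can also be produced without FULL OUTER JOIN. A LEFT OUTER JOIN filtered on a NULL right-side key (an "anti-join") returns the same students with no matching advisor, and it works in databases that lack FULL OUTER JOIN. Here is a minimal sqlite3 sketch with hypothetical rows. It assumes Advisor_ID is never NULL in the Advisors table, so a NULL there can only mean "no match".</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (Student_ID INTEGER, Name TEXT, Advisor_ID INTEGER);
    CREATE TABLE Advisors (Advisor_ID INTEGER, Name TEXT);
    INSERT INTO Students VALUES (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Carol', NULL);
    INSERT INTO Advisors VALUES (1, 'Dr. Smith'), (3, 'Dr. Jones');
""")

# Left anti-join: keep only students whose advisor lookup found nothing.
unmatched_students = conn.execute("""
    SELECT S.Student_ID, S.Name
    FROM Students S
    LEFT OUTER JOIN Advisors A ON S.Advisor_ID = A.Advisor_ID
    WHERE A.Advisor_ID IS NULL
    ORDER BY S.Student_ID
""").fetchall()
print(unmatched_students)
```

<p>Bob points to an advisor ID that does not exist and Carol has no advisor at all, so both are returned.</p>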
<h1>But what about the duplicate row thing?</h1>
<p>Because this example has a simple one-to-one relationship, the row counts returned match the Venn diagrams and add up as you would expect from tables one and two.</p>
<p>What happens if the tables are not in a simple one-to-one relationship?  For example, what if we add a duplicate advisor with the same ID but a different name?<br />
<img src="http://www.datamartist.com/wp-content/uploads/2010/02/Join-Example-Students-And-Advisors-duplicate-advisors.jpg" alt="Join-Example-Students-And-Advisors-duplicate-advisors" title="Join-Example-Students-And-Advisors-duplicate-advisors" width="431" height="184" class="aligncenter size-full wp-image-4080" /></p>
<p>A join creates a row for every combination of rows that join together.  So if there are two advisors with the same key, every student record with that key produces two rows in the inner part of the join: the duplicate advisor duplicates the record of every student assigned to that advisor.</p>
<p>You can see how this could add up to a lot of extra rows.  For each key value, the number of output rows is the number of matching rows in the first table multiplied by the number of matching rows in the second.  If the tables get big, just a few duplicates can make the result of a join much larger than the total number of rows in the input tables.  This is something you have to watch very carefully when joining: check your row counts.</p>
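<p>To see the multiplication in action, here is a small Python sketch with hypothetical rows in which one advisor ID appears three times. It also predicts the result size from per-key counts. This is a cheap row-count check you could run before a large join.</p>

```python
from collections import Counter

# Hypothetical rows: advisor ID 1 appears three times (same ID, different names).
students = [("Alice", 1), ("Bob", 1), ("Carol", 1), ("Dan", 2)]
advisors = [("Dr. Smith", 1), ("Dr. Smyth", 1), ("Dr. Smithe", 1), ("Dr. Jones", 2)]

# Inner join: one output row for every (student, advisor) pair whose keys match.
inner = [(s, a) for s in students for a in advisors if s[1] == a[1]]

# Predict the row count before joining: for each key, left count times right count.
left_counts = Counter(key for _, key in students)
right_counts = Counter(key for _, key in advisors)
expected_rows = sum(n * right_counts[key] for key, n in left_counts.items())

# Keys that are not unique on the advisor side are the ones that multiply rows.
duplicate_keys = sorted(key for key, n in right_counts.items() if n != 1)

print(len(students) + len(advisors), len(inner), expected_rows, duplicate_keys)
```

<p>Eight input rows produce ten joined rows: three students times three advisors for key 1, plus one row for key 2.</p>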
<p>So there you have it.  If you want to try joining tables with the Datamartist tool- <a href="/downloads">give it a try</a>.  It's a super fast install, and you'll be joining like a pro in no time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/sql-inner-join-left-outer-join-full-outer-join-examples-with-syntax-for-sql-server/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Estimating the cost of Business Intelligence</title>
		<link>http://www.datamartist.com/estimating-the-cost-of-business-intelligence</link>
		<comments>http://www.datamartist.com/estimating-the-cost-of-business-intelligence#comments</comments>
		<pubDate>Mon, 08 Feb 2010 22:27:49 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Cost Reduction]]></category>
		<category><![CDATA[Forrester Research]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3975</guid>
		<description><![CDATA[How much does a single Business Intelligence report cost a company?  Well, obviously there is no single answer- but Boris Evelson of Forrester took a shot at it recently in a blog post.  Even when it's not an easy question, it is worth pursuing, and Boris lays out a useful discussion.

 $150 000 [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2010/02/is-it-the-last-truck-load-of-money-for-the-data-warehouse.jpg" alt="is-it-the-last-truck-load-of-money-for-the-data-warehouse" title="is-it-the-last-truck-load-of-money-for-the-data-warehouse" width="423" height="287" class="alignright size-full wp-image-4044" />How much does a single Business Intelligence report cost a company?  Well, obviously there is no single answer- but Boris Evelson of Forrester took a shot at it recently <a href="http://blogs.forrester.com/business_process/2010/01/bottom-up-and-top-down-approaches-to-estimating-cost-for-a-single-bi-report.html" target="_blank">in a blog post</a>.  Even when it's not an easy question, it is worth pursuing, and Boris lays out a useful discussion.</p>
<ul>
<li>$150,000 is the AVERAGE cost of business intelligence software for a DEPARTMENT</li>
<li>ETL software (extract, transform and load) also costs $150,000 on average.</li>
</ul>
<p>And the rule of thumb for the cost of effort and services is <strong>5 times the software cost</strong>.</p>
<p>I'm not making this up. Check the link.</p>
<p>In the end, Boris suggests that the cost of a single, fairly straightforward report might be <strong>$20,000</strong>.  Of course, as he rightly points out, there are lots of variables and it's a classic case of "it depends", but even so, clearly you want to be sure the reports add value when you are using a process that requires that kind of investment.</p>
<p>Boris mentions in passing that the cost of a single day of an external developer he uses for estimating is $800 USD.  You can buy two licenses of Datamartist and take a friend out for dinner for that.</p>
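<p>To put these figures side by side, here is some back-of-the-envelope arithmetic using only the numbers quoted above. It assumes (my reading, not necessarily Boris's exact model) that the 5x services rule applies to the combined BI and ETL software cost. It also expresses the $20,000 report estimate in external developer days, even though his estimate may include more than developer time.</p>

```python
# Figures quoted from the Forrester post as summarized above (USD).
bi_software = 150_000       # average departmental BI software cost
etl_software = 150_000      # average ETL software cost
services_multiplier = 5     # rule of thumb: effort and services = 5x software
report_cost = 20_000        # estimate for one fairly straightforward report
developer_day_rate = 800    # external developer, per day

software_total = bi_software + etl_software
# Assumption: the multiplier applies to the combined software spend.
services_total = services_multiplier * software_total
program_total = software_total + services_total

# How many external developer days one report estimate corresponds to.
report_in_developer_days = report_cost / developer_day_rate

print(f"software ${software_total:,}, services ${services_total:,}, total ${program_total:,}")
print(f"one report is about {report_in_developer_days:.0f} developer days")
```

<p>Under that assumption, a departmental setup lands around $1.8 million, and one report is in the range of 25 external developer days.</p>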
<p>Don't get me wrong: for a number of applications you need the big enterprise stuff, but in my mind it makes sense to avoid it when you can.  Enterprise business intelligence has its place, but there are alternatives.  The rampant use of Excel spreadsheets is evidence of the huge demand for data out there.  <a href="/downloads">Try Datamartist</a> and find another, even more powerful way to get the data for those cases where you need to do more than a spreadsheet can, but it's not yet time to kick off a data warehouse project.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/estimating-the-cost-of-business-intelligence/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mystery or Junk data warehouse dimensions</title>
		<link>http://www.datamartist.com/mystery-or-junk-data-warehouse-dimensions</link>
		<comments>http://www.datamartist.com/mystery-or-junk-data-warehouse-dimensions#comments</comments>
		<pubDate>Mon, 18 Jan 2010 17:10:46 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Modelling]]></category>
		<category><![CDATA[Data warehouse]]></category>
		<category><![CDATA[Dimension Tables]]></category>
		<category><![CDATA[ETL]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=2933</guid>
		<description><![CDATA[Sometimes, when you are designing a star schema model, you'll find yourself in a dilemma.  You've come up with a beautiful design, right out of the pages of a Ralph Kimball book with 5 dimensions, and 5 measures, and you are on your way to star schema heaven when suddenly the users start asking [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.datamartist.com/wp-content/uploads/2010/01/ralph-kimball-on-the-phone-too-many-dimensions1.jpg" alt="ralph-kimball-on-the-phone-too-many-dimensions" title="ralph-kimball-on-the-phone-too-many-dimensions" width="317" height="199" class="alignright size-full wp-image-3952" />Sometimes, when you are designing a star schema model, you'll find yourself in a dilemma.  You've come up with a beautiful design, right out of the pages of a Ralph Kimball book with 5 dimensions, and 5 measures, and you are on your way to star schema heaven when suddenly the users start asking akward questions- where is such and such flag?  Where's the transaction type?  Why can't I sort based on the "e7" code from the system?</p>
<p>You can try to explain to them that pure star schemas should not be cluttered with a bunch of tiny dimensions, that your fact table just won't stand for 100 million rows of the e7 code, and that computery things like transaction codes don't belong in a business-savvy data model.  But face it: after some digging you determine the user is right (this happens quite often, in fact).  They really do use that information, it is critical that you include it, and you don't have the time or budget to build the perfect data warehouse.</p>
<p>So how do you deliver to them what they need, and avoid messing up your dimensional model?</p>
<p>One answer is to create one or more Junk dimensions, sometimes also referred to as a mystery dimension. </p>
<p>In the end, although the content of a mystery dimension may or may not be mysterious, there is nothing particularly mysterious about how to implement this type of dimension table.  </p>
<p>Even if it's perfectly clear what the columns are, there are often a number of them with very low cardinality (that is, they have very few distinct values).  It really does not make sense to add a column to the fact table for each one, and to have a bunch of tiny dimension tables with only a handful of rows in them.</p>
<p>Faced with this the data architect can wrap all these columns up into a junk dimension.</p>
<p>A junk dimension is a dimension that holds all the unique combinations of a set of columns, and assigns a unique key.  This key is what is stored in the fact table, in the mystery dimension column.</p>
<p>Let's look at a mystery dimension example.  We'll make up an example dimension that's very small for simplicity's sake.  Let's say that the transactional table used to generate one of our facts has three columns, "Zortz", "a3" and "uudl", which fully satisfy our mystery dimension criteria (i.e. we don't know what they are, but people use them in queries).</p>
<p>"Zortz" is a true/false value, "a3" is one of two values "Confirmed" or "Pending" and "uudl" is either "" or "k".  All the possible combinations of these values would be put into a dimension table and assigned an integer surrogate key.  Thus the mystery dimension table would look like this:<br />
<img src="http://www.datamartist.com/wp-content/uploads/2009/11/mystery-dimension-example-data-set.jpg" alt="mystery-dimension-example-data-set" title="mystery-dimension-example-data-set" width="425" height="200" class="aligncenter size-full wp-image-3548" /></p>
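<p>Here is a minimal sketch of building the example mystery dimension in Python: generate every combination of the three columns' values and number them. The key order here is arbitrary and may not match the screenshot. In a real warehouse this would be a table loaded by your ETL rather than a Python dictionary.</p>

```python
from itertools import product

# Possible values of the three example columns (uudl is either "" or "k").
zortz_values = [True, False]
a3_values = ["Confirmed", "Pending"]
uudl_values = ["", "k"]

# One dimension row per combination, keyed by an integer surrogate key from 1.
junk_dimension = {
    key: {"Zortz": z, "a3": a3, "uudl": u}
    for key, (z, a3, u) in enumerate(product(zortz_values, a3_values, uudl_values), start=1)
}

# Reverse lookup used when loading facts: column values map to a surrogate key.
key_for = {(row["Zortz"], row["a3"], row["uudl"]): key for key, row in junk_dimension.items()}

print(len(junk_dimension))                 # 2 * 2 * 2 = 8 rows
print(key_for[(False, "Pending", "k")])    # surrogate key to store in the fact row
```

<p>The row count is the product of the columns' cardinalities, which is why the number of combinations is the first thing to check.</p>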
<p>A key consideration when forming mystery dimensions is how many combinations exist.  If the number of combinations is too high, the mystery dimension's size may become unmanageable.</p>
<p>And be careful about assuming that every combination already exists in the dimension.  You are safe if each column has a fixed set of values (like Boolean, or codes from a known finite set), because you can be sure you've created a dimension row for every possible combination.</p>
<p>But if there are free-form string columns, then you need to make sure your ETL is able to generate new dimension rows and surrogate keys as new combinations are created in the source system.  This might still be worthwhile, depending on how many new combinations get created.</p>
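<p>When a column is free-form, the ETL has to assign keys on the fly. Here is a minimal sketch of the "look it up, or insert it with a new key" step as an in-memory Python class. The class and method names are hypothetical. A real implementation would persist the dimension table and handle concurrent loads.</p>

```python
class JunkDimension:
    """Hypothetical in-memory junk dimension that grows as new combinations arrive."""

    def __init__(self):
        self.key_for = {}   # tuple of column values mapped to its surrogate key
        self.rows = []      # dimension rows (key, values...) in key order

    def get_or_create_key(self, combination):
        """Return the surrogate key for a combination, adding a new row if unseen."""
        key = self.key_for.get(combination)
        if key is None:
            key = len(self.rows) + 1
            self.key_for[combination] = key
            self.rows.append((key,) + combination)
        return key


dim = JunkDimension()
# Hypothetical source rows: (Zortz, a3, uudl), where uudl is free-form text.
source_rows = [(True, "Confirmed", ""), (False, "Pending", "k"),
               (True, "Confirmed", ""), (True, "Pending", "new code")]
fact_keys = [dim.get_or_create_key(r) for r in source_rows]
print(fact_keys)      # [1, 2, 1, 3]: the junk-dimension key stored on each fact row
print(len(dim.rows))  # 3: only distinct combinations become dimension rows
```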
<p>You can also manage the size of the mystery dimension tables by having 2 or more mystery dimensions, which might reduce the overall number of dimensional rows depending on the makeup of the data.  Different columns and values may tend to cluster together and you will find that grouping them correctly makes say, two small mystery dimensions rather than one huge one.</p>
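<p>A rough way to compare one junk dimension against two is to count the distinct combinations each option would actually store. The sketch below does this for hypothetical source rows in which the column pairs (c1, c2) and (c3, c4) vary independently of each other. Real data may cluster differently, so measure your own data before deciding.</p>

```python
from itertools import product

# Hypothetical data: (c1, c2) come from 4 observed pairs and (c3, c4) from 5,
# and the two pairs vary independently of each other.
pairs_12 = [("A", 1), ("A", 2), ("B", 1), ("C", 3)]
pairs_34 = [("x", "p"), ("x", "q"), ("y", "p"), ("z", "r"), ("w", "s")]
source_rows = [p12 + p34 for p12, p34 in product(pairs_12, pairs_34)]

# Option 1: one junk dimension over all four columns.
one_dimension_rows = len(set(source_rows))

# Option 2: two junk dimensions, one for (c1, c2) and one for (c3, c4).
two_dimension_rows = (len({row[:2] for row in source_rows})
                      + len({row[2:] for row in source_rows}))

print(one_dimension_rows, two_dimension_rows)  # 20 versus 4 + 5 = 9
```

<p>The trade-off is one extra key column on the fact table in exchange for smaller dimensions.</p>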
<p>If, however, the number of rows is manageable, a mystery dimension allows all the columns to be queryable while adding only one column to the fact table.  That is a much more efficient solution than either creating multiple dimensions or leaving all the data in the fact table.  </p>
<p>By moving these columns into a junk or "mystery" dimension, you also end up with fewer indexes on the fact table, which might be important depending on its size.  </p>
<p>So if you find yourself telling your end users that they will just have to do without a column, think twice about it.  The role of a data warehouse is to deliver the data- sometimes you just have to find the right packaging to get the job done.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/mystery-or-junk-data-warehouse-dimensions/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spreadsheet errors- Fear, uncertainty and doubt</title>
		<link>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt</link>
		<comments>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt#comments</comments>
		<pubDate>Mon, 11 Jan 2010 18:54:46 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Business Intelligence Architecture]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[MS Excel]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[data culture]]></category>
		<category><![CDATA[Business Intelligence trends]]></category>
		<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3831</guid>
		<description><![CDATA[I love the acronym FUD which stands for "Fear, uncertainty and doubt".  What I don't love is the underhanded use of FUD to manipulate people's behavior.  Spreading FUD is not about creating something new, but destroying- destroying someone's confidence in something, clouding the real issue, stopping a new or creative direction from being [...]]]></description>
			<content:encoded><![CDATA[<p>I love the acronym FUD, which stands for "Fear, uncertainty and doubt".  What I don't love is the underhanded use of FUD to manipulate people's behavior.  Spreading FUD is not about creating something new, but about destroying: destroying someone's confidence in something, clouding the real issue, stopping a new or creative direction from being taken.  FUD is often used to block reform and change because FUD can cause people to do nothing, and doing nothing is good for the incumbent.</p>
<p>In the data analysis realm, spreadsheet errors are often used to try to dissuade companies from letting their people "work with the data directly".  Software vendors of all sizes, but particularly the really big ones (those incumbents), spread FUD because if they can stop people from getting at the data themselves, it increases the chance that companies will buy more business intelligence suites.</p>
<p>The argument goes something like this:</p>
<blockquote><p>Spreadsheets have been shown to be plagued with errors, with many studies showing error rates above 90%.  You need to reduce the risk that spreadsheets are creating in your organization by establishing formal, documented processes that are created and managed by professionals using sophisticated tools.</p></blockquote>
<p>Then the usual nightmare scenarios are brought out, all involving rabid auditors, Sarbanes-Oxley, governance failures, etc.</p>
<p><img src="http://www.datamartist.com/wp-content/uploads/2010/01/accidently-put-last-years-spreadsheet-number-into-annual-report1.jpg" alt="accidently-put-last-years-spreadsheet-number-into-annual-report" title="accidently-put-last-years-spreadsheet-number-into-annual-report" width="341" height="226" class="alignright size-full wp-image-3839" />Now, don't get me wrong, spreadsheet errors are a very real and serious problem, and there are all sorts of data applications that should never be done in Excel or other ad-hoc, user driven tools. Ever.  Formal documented processes are critically important, and there are lots of places where you better be using the right tools and professionals.  </p>
<p>I have seen the culture of the spreadsheet completely undermine initiatives that would have driven better data quality, data analysis and business processes.  The spreadsheet certainly has its dark side.</p>
<p>But the problem is that FUD paints with a broad brush.  People take it as "Spreadsheets with data in them? Bad news. Don't do it.  Individuals able to get at the data, and quickly transform it, analyze it?  Who knows what they'll do- shut them down!"</p>
<p>Sadly, from a data quality point of view, sometimes the spreadsheets have the BEST data quality- because people have fixed the issues they can't fix in the transactional system due to constraints or IT department delays.</p>
<h2>Encourage positive change with reasonable controls.</h2>
<p>Intelligent, responsible people should be encouraged to use "informal" methods and tools to do data analysis.  </p>
<p>These people will find things, learn things, and drive positive change (including change in those big formal professional systems).  </p>
<p>They should do it with a reasonable understanding that doing things in an informal way, with spreadsheets or other tools does introduce errors, and should consider this when they recommend taking action based on the results. </p>
<h2>Balance between two extremes </h2>
<p><strong>The totalitarian state:</strong> I don't think there is an IT department in the world that is capable of stopping all unofficial data analysis.  In fact, I would suggest that the moment such an IT department came into existence, it would kill the host company, a harsh sort of self-regulation.  People interested in data and thinking for themselves would just pack up and leave.  So who would be left making the decisions, and based on what?</p>
<p><strong>The twisted web of spreadsheets:</strong> Companies that allow an anything-goes environment of Visual Basic code, macros, and manual cut-and-paste straight into the annual report are not going to be long for the world either.  They populate the horror story pages on <a href="http://www.eusprig.org/horror-stories.htm" target="_blank">the spreadsheet risk websites.</a></p>
<h2>The zone of win.</h2>
<p>You want to be somewhere between insane spreadsheet addiction and strict formal big tool paralysis.  </p>
<p>I submit that companies that balance risk while still encouraging their smart people to "play" with the data and do analysis in new and interesting ways with new tools are going to win.</p>
<p>Again, don't let this process generate your profit and loss statement; understand where, and for what, the informal discovery process is meant to be used, but do let it discover things.  If it discovers something interesting, you'll have the chance to check for errors.  Make sure it's part of the process to do so.</p>
<p>If you let the FUD get you down, you'll never get that far, and who knows what insights you might be giving up?</p>
<p>Of course,  we believe you should go even further and give those intelligent, responsible people new tools that are less error prone than spreadsheets but still provide as much or even greater flexibility.  That's why we're building Datamartist after all.</p>
<p>Openness, balance, and clear minded pragmatism will get you further than FUD every time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/spreadsheet-risk-and-errors-fear-uncertainty-and-doubt/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The tragedy of anti-data leadership and dataphobia</title>
		<link>http://www.datamartist.com/anti-data-leadership-the-lies-of-non-fact-based-management</link>
		<comments>http://www.datamartist.com/anti-data-leadership-the-lies-of-non-fact-based-management#comments</comments>
		<pubDate>Thu, 07 Jan 2010 17:44:34 +0000</pubDate>
		<dc:creator>James Standen</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Reality Check]]></category>
		<category><![CDATA[data culture]]></category>

		<guid isPermaLink="false">http://www.datamartist.com/?p=3769</guid>
		<description><![CDATA[There has been a lot of discussion in the last year or so about how important data analysis is becoming.  
IBM made a major move into data analytics by establishing a new organisation "Business Analytics &#038; Optimization Services" with 4000 people in it.
There was the much quoted Hal Varian of Google who predicted that [...]]]></description>
			<content:encoded><![CDATA[<p>There has been a lot of discussion in the last year or so about how important data analysis is becoming.  </p>
<p>IBM made a major move into data analytics by establishing a new organisation <a href="http://www.businessweek.com/technology/content/apr2009/tc20090414_322525.htm?chan=top+news_top+news+index+-+temp_news+%2B+analysis" target="_blank">"Business Analytics &#038; Optimization Services"</a> with 4000 people in it.</p>
<p>There was the <a href="http://www.wired.com/culture/culturereviews/magazine/17-06/nep_googlenomics?currentPage=1" target="_blank">much quoted Hal Varian</a> of Google who predicted that the sexy new job this century will be some sort of data analyst/statistician.</p>
<p>But I believe there is a powerful force in many businesses that will slow down our headlong rush towards a fact based, analytical thinking, data quality focused future.</p>
<p>As a group they are generally referred to as "Upper management" or "Leadership".<img src="http://www.datamartist.com/wp-content/uploads/2010/01/the-data-days-no-the-ceo-says-yes-300x222.jpg" alt="the-data-days-no-the-ceo-says-yes" title="the-data-days-no-the-ceo-says-yes" width="300" height="222" class="alignright size-medium wp-image-3815" /></p>
<p>Now to be fair, there are obviously great leaders and executives that understand that data is important.  </p>
<p>But the fact that making decisions based on facts and data is actually defined as a school of thought, "Fact based management" or "Evidence based management" (in medicine it's called "evidence based medicine"), illustrates that too many alternatives still exist.</p>
<h2>The lies and dirty tricks of anti-data leadership</h2>
<p>They make comments that equate analysis with "delay".<br />
They confuse considering options with "indecisiveness".<br />
They don't invite people who actually have seen or understand the data to their meetings.</p>
<p>They come up with all sorts of alternate ways to make decisions- and defend their position even when the data clearly does not support them:</p>
<h3>Call it strategic</h3>
<blockquote><p>I know the numbers don't add up right now, but this is strategic.  </p></blockquote>
<p>What does that mean- our strategy is to do things without ROI?</p>
<h3>Go with consensus perception</h3>
<blockquote><p>We don't have time to get the actual data- we're going to have to make a decision based on what the people on the ground are seeing.</p></blockquote>
<p>If you ignore data, create a hypothesis and then go looking for supporting "evidence" in the form of people "on the ground" thinking it's a good idea, you'll find it.  </p>
<p><a href="http://agora.stanford.edu/sjls/Issue%20One/fisher&#038;tversky.htm" target="_blank">People take suggestions from your questions</a> and generate a matching memory/perception of what they think is happening in the real world.  This effect is well understood; the accuracy of eyewitness testimony, for example, is known to be poor.</p>
<h3>Blame the data quality</h3>
<blockquote><p>You know we have issues with that data.  I don't think we can risk relying on it.</p></blockquote>
<p>So what's the alternative? Tea leaves?  Might be some risk in that too.</p>
<p>And why is the data quality an issue? Probably because leadership didn't approve the budget and support the process changes that would have improved it.  If the top executives aren't responsible for data quality in their organisation and have decided not to use the data then a company is in a sad, dysfunctional state.</p>
<h1>Moving forward- fight the anti-data forces of evil</h1>
<p>Now, no-one can analyse forever- eventually a decision needs to be made.<br />
Often, not all the analysis we want to do can be done.  The number one reason anti-data leadership will likely reject doing detailed analysis is that it takes too long. They want to "pull the trigger" and get going, even if the decision is clueless (literally).</p>
<h2>Always be working on fixing the structural issues that slow analysis down</h2>
<p>These kinds of issues can slow you down:</p>
<ul>
<li>If you have bad quality data in your systems, any analysis must first fix it- causing delays.  </li>
<li>If you don't have the people on staff to do the analysis, you have to hire consultants, adding delay and cost.</li>
<li>If your data definitions are inconsistent across the company and with industry standards, mixing data between operating units and other data sources takes forever.</li>
</ul>
<h2>Create a culture of data</h2>
<p>Some examples of beliefs that need to be openly stated and shared:</p>
<ul>
<li>the best way to make decisions, if possible, is by looking at actual data.</li>
<li>firing off decisions made on the basis of hunches isn't being "aggressive and decisive".  It's sloppy and incompetent.</li>
<li>data management and analysis is a key competency for ALL employees in ALL departments not just information technology.</li>
</ul>
<h2>Create data analysis SWAT teams</h2>
<p>On top of this, new techniques are needed to make data analysis fast enough to support timely decisions.  It is just not possible to launch a waterfall project and then try to find a date three weeks from now when everyone can get together for a functional requirements meeting.</p>
<p>Companies need to create teams (perhaps virtual, coming together when needed) that are able to use fast, flexible tools to do analysis quickly.  I am hoping that the <a href="/product/datamartist-for-developers">Datamartist tool</a> is one of the new tools that such SWAT teams would have in their toolkit.</p>
<p>The bottom line is that companies with leaders who "get" data are going to run circles around companies with executive dinosaurs whose eyes glaze over if anyone starts actually talking about facts and figures that can't fit on a single three-dimensional pie chart in PowerPoint.</p>
<p>The future is data, but can we overcome the anti-data forces and their dataphobia?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datamartist.com/anti-data-leadership-the-lies-of-non-fact-based-management/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
