Datamartist gives you data profiling and data transformation in one easy to use visual tool.

« | »

Avoid the Spaghetti

spaghettiHas this ever happened to you? You have built some fantastic analytics- you are the Business Intelligence hero, the fixer of data, the creator of reports.

They ask you for something, you fix it. Data quality issue? No problem, you add filters, views, and report edits that hide the dirt. Its fast, its easy. You are the King or Queen of Data.

Then they ask you for a new analysis, and looking into it, you realize that there is no way to fix that.

And then you have to face the pile of spaghetti.

You have built so many reports on the existing structure that even the smallest change to the data model will break them all- and all the data quality patches, so fast to implement at the time, have to be reproduced for the new report, or none of the numbers are going to match.

Often, when moving fast, we make poor choices. Choices like adding that data fix in the report rather than in the database. Choices like pre-calculating ratios, pre-aggregating away detail and generally taking data steps that “we are sure will always be the case”.

Then we have to look at the spaghetti.

In my 20 years of messing with data, one of the most important lessons learned is this- don’t fix data quality problems in the reporting layer. Sure it works at first, but even medium term its a nightmare.

Reports and analysis should be views of clean, ready data in the data layer. Reports don’t contain data- they define a window on the data. When we add data fixes into a report, we are turning the report into a data transformation step. Any other report will not have access to this version of the data- as a result, any fixes must be reproduced accross every report. This creates a source of extra effort, and increases the likely hood of errors and omissions.

What this means is that while many tools have great features that let you do data cleansing right in the dashboard, and practicality means of course we use these features from time to time, every time we do we have to remember its a temporary patch. We need to fix the underlying data, either in the source system or at least in our repository, and then restore the report to its proper function.

Reports and Analytics tools should not fix data- they should display it.

Twitter

« | »

Leave a Response