Datamartist gives you data profiling and data transformation in one easy-to-use visual tool.


Data to the people: why self-serve ETL

As regular readers of this blog know, I believe in a balance between formal and informal data analysis tools.

I believe in an approach that places people firmly at the center of a new way of looking at the data analysis process.

In the past, “big business intelligence” created an infrastructure-heavy, highly centralised and technology-focused approach to getting data from source systems into reports in the hands of the users. Under this regime, users were not to be trusted with raw data, but were given tightly controlled, managed and aggregated reports in order to protect the “single version of the truth”.

The theory and practice were tightly defined, and had been honed over decades of business intelligence and data warehouse orthodoxy. Giving raw data to end users would lead to chaos. Letting end users define new ways to look at the data would corrupt the master data, and lead to everyone looking at something different.

You can guess the sort of response this “don’t give them the raw data” approach gets from capable, curious people who want to get down to some real analysis.

But to be fair, you can see why these concerns are thought to be well founded. Almost every large enterprise is awash in a sea of Excel files and a tangle of links and formulas. Excel is a wonderful tool, but it only offers the illusion of solving the data transformation problem. It is a much better reporting/dashboard tool than an ETL tool. (Although in the right hands it can do remarkable things.)

And this is the true state of affairs now. When the “official” system does not provide the answers that the business needs, the people who need to make decisions get the data anyway, and they do it themselves. They do it in Excel, they take night courses in Structured Query Language (SQL), they hire consultants (or even summer students) to build rogue databases that they run on servers hidden under desks to get at the answers they need.

It is easy for the data warehouse theorists to highlight the clear issues with “spreadmarts” and “shadow systems”.

But we need to be pragmatic. A centralized structure that imposes strict formal rules and change management processes often does ensure that there is only one version of the truth. But it is a version of the truth that no one can use: it has been so formalized, aggregated, compromised and delayed that, by the time it is delivered, the pressing business questions have changed and the meaning has been expunged. The data warehouse becomes reporting rather than analysis.

It’s clear that enterprises need this kind of reporting. I’m not advocating abandoning the existing approach, but augmenting it. Up till now, the solution has often been “more of the same”.

The regime decided that the solution was to add more technology to the central systems, increase enforcement, and search out and repress all the dissident data manipulators. The data resistance was forced to go underground, to hide their spreadsheets, to outwardly appear to be following the official line.

It is very true that there are some risks in allowing people to analyze their own data, but there is also a reward. There is a small group of people who love data, who understand the business questions, who work to tease insight out of a steaming pile of raw data and can find things that are game changing. Massive, formal, designed-by-committee data warehouses can deliver a powerful and useful view of things, but they rarely offer flashes of insight. When they do, it is often during the design and discovery process, rarely by users using the system after it has gone live.

The Datamartist tool has been built on the belief that both formal, centralized systems AND local, personal data transformation have a place in the architecture, and that both should be official places.

People can be trusted with the data. In fact, I think that for an organisation to truly be successful at mastering its information, they have to be.

We have to realize that we can’t allow our obsession with the quest for a single version of the truth to turn us into totalitarian regimes, certain that OUR truth is THE truth, and that messing around with the data is by its very nature subversive and dangerous.

Data to the people.
