
Why you should profile your data

Imagine that you have bought a new home, and you’ve decided to do some landscaping. So you pick three landscapers, draw a rough sketch of what you want, and ask them to bid on the job.

But you don’t allow them to come see your property, and your sketch doesn’t specify anything about the existing landscaping, just the final configuration. Do you think the landscapers would be willing to offer a reasonable price?

Unlikely. What if there are existing patio stones to remove, or an in-ground swimming pool that’s got to go?

No landscaper would take on a job without understanding the lay of the land, and the existing conditions. It would be impossible to estimate the job. Anyone who did would give you a huge price to cover themselves, or demand extras upon discovering the extra work.

Yet when companies hire consultants to build business intelligence solutions or do data migrations, it often happens with only the roughest outline of the existing data sets. Certainly, a data model is often included, but knowing what a table SHOULD contain is not the same as knowing what it actually does contain. It never ceases to amaze me that the simple, cost-effective practice of data profiling is so often missing from the initial phases of business intelligence and data migration projects.

With the right data profiling tool and just a few days’ work, it’s possible to gain a huge amount of insight into the data quality in your systems and, as a result, to make radically more accurate estimates of the cost to go from the “as is” to the “to be”.
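
To make that concrete, here is a minimal sketch in Python of what profiling a single column might look like. It is an illustration only, not how Datamartist or any other profiling tool works: the `profile_column` function and the `customers` sample data are hypothetical, and the choice to treat blank strings as missing is an assumption.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column of a table held as a list of dicts."""
    values = [row.get(column) for row in rows]
    # Assumption: None and blank/whitespace-only strings both count as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical sample data: a "country" column that SHOULD hold clean codes.
customers = [
    {"id": 1, "country": "CA"},
    {"id": 2, "country": "ca"},
    {"id": 3, "country": ""},
    {"id": 4, "country": "CA"},
    {"id": 5, "country": None},
]

print(profile_column(customers, "country"))
```

Even on five rows, the summary shows missing values and inconsistent casing that a data model alone would never reveal. Multiply that by every column in every source table and you have the real scope of the job.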

Phil Simon talked about this in a great post on the DataFlux blog called “What Consultants Don’t tell you”, and raised an important and somewhat ugly truth: many times, service providers don’t WANT to do data profiling because it reveals the true extent of the work to be done, which increases the budget requirement and makes the project less likely to be approved.

Now certainly, we can’t paint all consultants with the same broad brush, but this incentive does reduce how often valuable tools such as data profiling are recommended, even though in my opinion they are a low-cost, no-brainer, do-it-unless-you-are-crazy first step for any major project.

You are going to spend potentially millions of dollars on a business intelligence or data migration project. Spend a few weeks looking at the data with the right tools first, for goodness’ sake!

If you want to get a reasonable cost estimate, and you want to go into your business intelligence or data migration project with open eyes, don’t imagine you can know what it will cost to get from here to there if you don’t take a good look at where here really is.

Full disclosure: of course, you are reading the Datamartist blog, and Datamartist has lots of data profiling functionality, so you have to understand that we are incredibly biased on this topic. If you are able to overlook our inherent bias, give the tool a try. You’ll discover things about your data you might not have wanted to know, but it’s better to face the truth prepared than to rely on wishful thinking and then discover the bad news when you’re well into the project and your budget is almost gone.



  1. Excellent post James,

    I really enjoyed the landscaping analogy. It has always been surprising to me to witness so many data initiatives (migration, integration, warehousing, BI, MDM, etc.) treating the data as an afterthought.

    So much discussion about data models, business requirements, expected system functionality, resource allocations, delivery dates, and of course financial costs–and all of these topics are vitally important–but little to no discussion about the data itself.

    Like you said, with just a few days of data profiling, it’s possible to gain a huge amount of insight into your data quality, and leverage this insight to produce more accurate estimates of the effort required.

    Best Regards,


  2. This is a really great post, James.

    Apart from better estimates, a thorough data profile would give savvy companies and users the opportunity to better assess what data they actually have, how much is worth maintaining, and what gaps exist, and would provide a more accurate portrayal of the effort necessary to maintain data in the future. Unfortunately, the importance of data profiling and management too often takes a back seat to “flash and sizzle” features in the project.

    I look forward to the opportunity to give Datamartist a try.