

A New Year’s resolution to data profile

Well, it’s the time of making and breaking resolutions, a time when setting realistic goals is sometimes hard to do with all the optimism of the new year.

Sometimes, we make decisions NOT to set a goal, because we don’t want to break it.

You might be thinking you really should step up your data quality monitoring and get some data profiling underway to help identify the data domains and areas you most want to tackle in 2012. But you might also be thinking that, with all the pressures and cutbacks many companies are facing, you don’t have the resources to implement a full-scale profiling and monitoring effort, and so you might decide to delay.

Don’t wait. Just do it. The perfect is the enemy of the good.

Rather than worrying about how much of your data you are going to be able to cover, or that you can’t devote enough resources to tackle all of your reference areas at once, work at the problem from another direction.

First, start with master data.

Master data is the data that all your other data is made from. It’s the data everyone uses to view the massive piles of transactional data, so the impact of one bad row in a master data table is felt across perhaps hundreds of reports and multiple time periods. If you have a product in the wrong category, then every transaction for that product, across perhaps hundreds of customers and all time, will be miscategorized, and every total, sub-total and calculated metric using it will suffer.
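To make that concrete, here is a small Python sketch of the effect. The product master, the transactions and the category names are all made up for illustration; the point is only that every transaction is correct, yet one wrong master row shifts the category totals.

```python
from collections import defaultdict

# Hypothetical product master: product id -> category.
# "P2" is really a beverage, but it was keyed into "Snacks".
product_master = {"P1": "Snacks", "P2": "Snacks", "P3": "Beverages"}

# Hypothetical transactions: (customer, product id, amount).
# Every one of these rows is correct.
transactions = [
    ("C1", "P1", 10.0),
    ("C1", "P2", 25.0),
    ("C2", "P2", 40.0),
    ("C3", "P3", 5.0),
    ("C3", "P2", 15.0),
]


def totals_by_category(master, rows):
    """Sum transaction amounts by the category looked up in the master."""
    totals = defaultdict(float)
    for _customer, product_id, amount in rows:
        totals[master[product_id]] += amount
    return dict(totals)


# Totals as the reports would show them with the bad master row.
reported = totals_by_category(product_master, transactions)

# Totals after fixing the single bad row in the master.
corrected_master = dict(product_master, P2="Beverages")
corrected = totals_by_category(corrected_master, transactions)

# Number of good transactions that the one bad row touched.
affected = sum(1 for _c, product_id, _a in transactions if product_id == "P2")

print("reported: ", reported)
print("corrected:", corrected)
print("transactions affected by one bad master row:", affected)
```

In this toy data, three of the five transactions land in the wrong category because of a single master row, and fixing that one row corrects all of them at once.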

While bad transactions are bad, bad reference data is deadly. Bad reference data takes a good transaction and messes it up.

Worst first!

Make a list of your reference tables and areas: customer, product, chart of accounts, and so on. Which are the most important for your business? This isn’t something I can tell you; you have to think about what is most critical.

If you are a company that purchases large amounts of materials from many vendors, and purchasing decisions are fast paced and critical, then maybe it’s your vendor master, and your accounts payable.

On the other hand, if you have lots of interaction with your customers, and errors in the customer master cost you business, then start with that.

The key is to first make the list, and then think to yourself “if I have bad quality data, where am I most afraid it will be?” Start profiling there. You want to find the worst first, and fixing that will have the greatest positive impact.

Get to know your data

Don’t worry about setting complex or work-intensive goals right away. Sometimes data profiling is simply about data discovery. You need to wade into your reference data, play with it, and tease out patterns and relationships. As you get to know your data, you will be better able to identify where there are issues to tackle, and where the root causes of data quality issues might lie.
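If you want a feel for what "wading in" can look like without any special tooling, here is a minimal Python sketch of a first-pass column profile. It is not how any particular profiling tool works; the `profile_column` helper, the sample customer rows and the choice to treat blank strings as missing are all assumptions made for this example.

```python
from collections import Counter


def profile_column(rows, column):
    """Return basic profile counts for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Assumption: None and blank strings both count as missing.
    present = [str(v).strip() for v in values
               if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "top": counts.most_common(3),
    }


# Hypothetical customer master rows, made up for illustration.
customers = [
    {"id": "1", "country": "US"},
    {"id": "2", "country": "USA"},
    {"id": "3", "country": "US"},
    {"id": "4", "country": ""},
    {"id": "5", "country": "us"},
]

result = profile_column(customers, "country")
print(result)
```

Even a profile this simple raises useful questions: why is one country blank, and are "US", "USA" and "us" really three different values? Those are the patterns you start to notice once you look.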

One approach might be to simply resolve to spend an hour a week, every week, profiling some data. If you aren’t doing that now, you will find that even a bit of time set aside gives huge insight. Sometimes we get too busy to do the basics, and we miss opportunities to make significant improvements in our data with relatively little effort.
