Automated data profiling and reporting: data quality behavioral modification?
Recently, Jim Harris of Obsessive-Compulsive Data Quality speculated about whether the concept of the "swear jar" could be used to improve data quality. It was an interesting post, and the discussion in the comments underlined the reality of data quality: much of the time, the problem is not about changing bits in a database, but about flipping neurons in the brains of the people putting the bad data in there. And that's hard.
It's hard because, although methods like data profiling can identify data quality problems, working out exactly who is to "blame" and how to manage that is difficult.
In thinking about this a bit more, I realised that the discussion was all about sticks, and not much about carrots. We discussed different ideas about how to apportion the cost (which makes sense, as the swear jar is about putting money IN, or punishing the offender).
What about working it the other way around? By automating data profiling, and making the resulting metrics time-based and readily available, we could track data quality in key data sets, both in absolute terms (number of rows with issues) and in relative ones (percentage of customer records with problems, etc.). This would allow you to build data quality dashboards.
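To make the idea of time-based metrics a little more concrete, here is a minimal Python sketch of a single profiling run that records both the absolute and the relative figure, stamped with a date. Everything in it is an assumption for illustration: the QualitySnapshot class, the profile and has_issue functions, the "address" and "postcode" field names, and the rule that a blank field counts as an issue. A real profiling tool would apply far richer rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class QualitySnapshot:
    """One dated profiling result for one data set (hypothetical structure)."""
    taken_on: date
    total_rows: int
    rows_with_issues: int  # absolute metric

    @property
    def issue_rate(self) -> float:
        """Relative metric: share of rows with at least one issue."""
        if self.total_rows == 0:
            return 0.0
        return self.rows_with_issues / self.total_rows


def has_issue(record: dict) -> bool:
    # Assumed rule for illustration only: a customer record has an issue
    # if its address or postcode is missing or blank.
    address = (record.get("address") or "").strip()
    postcode = (record.get("postcode") or "").strip()
    return not address or not postcode


def profile(records: list[dict], taken_on: date) -> QualitySnapshot:
    """Count the rows with issues and stamp the result with the run date."""
    bad = sum(1 for r in records if has_issue(r))
    return QualitySnapshot(taken_on, len(records), bad)


# Made-up sample data, purely to show the call.
customers = [
    {"address": "1 High St", "postcode": "AB1 2CD"},
    {"address": "", "postcode": "AB1 2CD"},
    {"address": "3 Low Rd", "postcode": None},
    {"address": "4 Mid Ln", "postcode": "EF3 4GH"},
]
snapshot = profile(customers, date(2024, 1, 1))
print(snapshot.rows_with_issues, f"{snapshot.issue_rate:.0%}")
```

Storing one such snapshot per data set per run gives you the history a dashboard needs to plot a trend.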
It would then be possible to establish a bonus plan based on data quality. It could pay out for improvements (as a percentage), or make recurring payments that would decrease and then stop if data quality fell below a target level.
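The two payout styles could be sketched roughly as follows. This is only one possible reading: the function names, the idea of measuring quality as an issue rate (lower is better), the linear taper, and the separate cutoff rate at which payments stop are all my assumptions, not a worked-out scheme.

```python
def improvement_bonus(previous_rate: float, current_rate: float, pool: float) -> float:
    """Pay a share of the pool equal to the relative drop in the issue rate.

    Assumption: rates are fractions of rows with issues, so lower is better.
    """
    if previous_rate <= 0 or current_rate >= previous_rate:
        return 0.0  # nothing to improve on, or no improvement made
    return pool * (previous_rate - current_rate) / previous_rate


def recurring_bonus(current_rate: float, target_rate: float,
                    cutoff_rate: float, full_payment: float) -> float:
    """Recurring payment that shrinks as quality slips past the target.

    Assumption: full payment at or better than the target, tapering
    linearly to zero at a (hypothetical) cutoff rate.
    """
    if current_rate <= target_rate:
        return full_payment
    if current_rate >= cutoff_rate:
        return 0.0
    return full_payment * (cutoff_rate - current_rate) / (cutoff_rate - target_rate)
```

For example, if the issue rate drops from 20% to 15%, improvement_bonus(0.20, 0.15, 1000) pays out a quarter of a 1000 pool, because the issue rate fell by a quarter of its previous value.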
It is still necessary to identify who is responsible for the quality of each data set and, as with all reward schemes, the people responsible must also have the means to improve data quality and so reap the reward. Even so, I think this would be possible in a number of situations. For example, data entry clerks would undoubtedly double-check each address more carefully if they knew there was a tangible reward for doing so.
It seems that two key things are necessary to make this kind of bonus plan work. The first is the ability to automate data profiling and to have meaningful metrics that can't be "gamed" (because if there is a way, people will find it). The second is the ability to identify the savings due to improvements in data quality, because that's what's funding the bonus pool, after all.
Have any readers implemented or heard of such a plan? Do you track data quality using an automated data profiling tool and data quality dashboards?