Datamartist gives you data profiling and data transformation in one easy-to-use visual tool.


Data profiling: a search or a code to crack?

Tracking down data quality issues is often presented as a search for bad data, but sometimes the data isn't so much bad as not understood. In legacy systems, your first job may be to work out what the data means, in effect decoding it as if it had been encrypted (which, in a way, time and a lack of documentation may well have done).

You know that all that data means something, but what?

One of my favorite code-breaking stories is the epic victory over the Enigma code during the Second World War. One reason it's of interest is that it was one of the early applications of computing, but I think the key lesson comes not from the brute-force computation but from the strategies used to crack the code.

When you are trying to crack a code, one of the key things you need is a "crib": a sample of coded message for which you also have the clear text. Cribs can radically reduce the number of possible ways a code can be decoded.
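
To make the idea concrete, here is a minimal sketch using a toy shift cipher. This is not how Enigma worked; the cipher, the function names and the sample message are all assumptions chosen to show how one known pair of coded and clear text cuts the candidate keys from 26 down to one.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_decode(ciphertext, key):
    """Decode a simple shift cipher; characters outside A-Z pass through."""
    out = []
    for ch in ciphertext:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - key) % 26])
        else:
            out.append(ch)
    return "".join(out)


def shift_encode(plaintext, key):
    """Encoding is just decoding with the opposite shift."""
    return shift_decode(plaintext, -key)


def keys_consistent_with_crib(ciphertext, crib_plaintext):
    """Return every key that turns the ciphertext into the known clear text."""
    return [k for k in range(26) if shift_decode(ciphertext, k) == crib_plaintext]


# Hypothetical intercepted message whose clear text we happen to know.
intercepted = shift_encode("SUNNY AND CALM", 7)
surviving_keys = keys_consistent_with_crib(intercepted, "SUNNY AND CALM")
```

Without the crib, all 26 keys are candidates; with it, only the keys in `surviving_keys` remain.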

In the case of Enigma, the Allies would listen for German U-boat radio transmissions while also using direction-finding equipment to estimate the boats' locations. Standard procedure was for a U-boat to first radio a weather report.

By painstakingly backtracking the known weather conditions and the locations of U-boats when they transmitted, it was possible to take advantage of that first weather report: there were only so many ways to say "Sunny and calm". Having this crib gave the codebreakers a way to break into the code.

What is the point in terms of data profiling? While it's critical to have the right tools to analyse the data (a data profiler like Datamartist, for example), it's also important to get out there and talk to people and understand what's going on: collect some cribs that will help it all make sense.
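
As a rough illustration of how a few cribs might be put to work on a legacy column, the sketch below takes a handful of records whose real meaning someone in the business has confirmed and uses them to propose a decode table for a cryptic code field. It is plain Python, not Datamartist functionality, and the record layout, the field names (`id`, `status`) and the sample values are all hypothetical.

```python
from collections import Counter, defaultdict


def decode_table_from_cribs(records, cribs, code_field):
    """Propose a meaning for each code, based on records with a known meaning.

    records: list of dicts, each with an "id" and the coded field.
    cribs:   {record_id: meaning} collected by talking to people.
    Returns  {code: (most_common_meaning, number_of_supporting_cribs)}.
    """
    votes = defaultdict(Counter)
    for rec in records:
        meaning = cribs.get(rec["id"])
        if meaning is not None:
            votes[rec[code_field]][meaning] += 1
    return {code: counts.most_common(1)[0] for code, counts in votes.items()}


def codes_without_cribs(records, table, code_field):
    """Codes that appear in the data but that no crib explains yet."""
    return {rec[code_field] for rec in records} - set(table)


# Hypothetical legacy extract with an undocumented status code.
records = [
    {"id": 1, "status": "A3"},
    {"id": 2, "status": "A3"},
    {"id": 3, "status": "Z9"},
    {"id": 4, "status": "Z9"},
    {"id": 5, "status": "Q1"},
]
# Hypothetical cribs: what people told us about three specific records.
cribs = {1: "active", 3: "closed", 4: "closed"}

table = decode_table_from_cribs(records, cribs, "status")
still_unknown = codes_without_cribs(records, table, "status")
```

The codes left in `still_unknown` are a reasonable list of questions to take to the next conversation.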
