Tag archive for ‘Data profiling’
-
A new year's resolution to data profile
Well, it's the time of making and breaking resolutions, a time when, with all the optimism of the new year, setting realistic goals can be hard. Sometimes we decide NOT to set a goal, because we don't want to break it. You might be thinking you really should step up your data [...]
-
Data profiling: a search or a code to crack?
Often, tracking down data quality issues is presented as a search for bad data, but sometimes the data isn't so much bad as misunderstood. In legacy systems, you might first have to find the meaning of the data: in effect, decoding it as if it had been encrypted (which in a way, time [...]
-
An introduction to using regular expressions for data quality validation
Regular expressions (sometimes referred to as regex or regexp) are a powerful formal language that can be used to match text strings against patterns. The way regular expressions work is like this: a pattern is defined. This is a string of symbols that act as a set of rules. A text string to test, and [...]
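As a rough illustration of the idea (a pattern acting as a rule, with each value tested against it), here is a minimal sketch using Python's standard `re` module. The postal-code-style rule and the `validate` helper are hypothetical examples chosen for this sketch, not rules taken from the post or from any particular tool.

```python
import re

# Hypothetical rule for illustration: a postal-code-style value such as "K1A 0B1"
# (letter, digit, letter, optional space, digit, letter, digit).
POSTAL_CODE = re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d")

def validate(values, pattern):
    """Split values into those that fully match the pattern and those that don't."""
    matches, failures = [], []
    for value in values:
        # fullmatch requires the entire string to match, not just a prefix
        if pattern.fullmatch(value):
            matches.append(value)
        else:
            failures.append(value)
    return matches, failures

sample = ["K1A 0B1", "M5V3L9", "12345", "k1a 0b1"]
good, bad = validate(sample, POSTAL_CODE)
print(good)  # ['K1A 0B1', 'M5V3L9']
print(bad)   # ['12345', 'k1a 0b1']
```

Using `fullmatch` rather than `search` matters for validation: with `search`, a value that merely contains a valid code somewhere inside it would be counted as passing.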
-
Data profiling rules and data format strings
A very useful technique in data profiling is data format analysis. Rather than looking at the individual values in a given column, you profile the structure of those values, which gives you a higher-level view of the quality of the data. This technique is primarily used for string-based data. Can't find the forest because [...]
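One way to picture format analysis: replace every character with a symbol for its class, then count how many values share each resulting format string. The sketch below assumes a common convention (letters become `A`, digits become `9`, everything else is kept as-is); the helper names are hypothetical, and this is not necessarily how Datamartist builds its format strings.

```python
from collections import Counter

def format_string(value):
    """Map each character to a class symbol: letters -> 'A', digits -> '9', others kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)  # punctuation and spaces are kept as-is
    return "".join(out)

def profile_formats(values):
    """Count how many values share each format string."""
    return Counter(format_string(v) for v in values)

phones = ["555-1234", "555-9876", "5551234", "555-12A4"]
for fmt, count in profile_formats(phones).most_common():
    print(fmt, count)
# 999-9999 2
# 9999999 1
# 999-99A9 1
```

Four raw values collapse into three formats, and the rare ones (a missing dash, a letter where a digit belongs) stand out immediately, which is the "forest rather than trees" view the technique is after.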
-
When should you data profile? Morning, Noon and Night!
Data profiling is an important part of any data-related project. The question often arises: when is the best time to data profile? As you would expect from a software company that sells a really cool visual data profiling tool, our view is "all the time". Using data profiling tools before the project: data profiling [...]
-
What is data profiling? Data in the real world.
First of all, full disclosure: if you haven't already noticed, this blog is written by a software company that makes a pretty cool data profiling tool. We've just released our new version, V1.3.0, so obviously we can't pretend to be completely objective in the "should you data profile" debate. But bear with me, because I [...]
-
Datamartist V1.3.0 Value Distribution data profiling
This video gives a quick (under two minutes) look at the Datamartist data profiler's ability to explore the distribution of numeric values in a data set by counting the number of values that fall into a series of equal-size buckets. It highlights Datamartist's calculation, visualization, selection and drill-down features using a simple [...]
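The counting step described above (splitting the range of a numeric column into equal-size buckets and counting the values in each) can be sketched in a few lines. This is an illustrative equal-width histogram under our own assumptions, including the hypothetical `bucket_counts` helper and sample data; it says nothing about how Datamartist implements the feature internally.

```python
def bucket_counts(values, n_buckets):
    """Count values falling into n_buckets equal-width buckets spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything goes in the first bucket
        else:
            # the maximum would land at index n_buckets, so clamp it into the last bucket
            idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    # bucket boundaries: n_buckets + 1 edges from min to max
    edges = [lo + i * width for i in range(n_buckets + 1)]
    return edges, counts

ages = [3, 15, 22, 27, 31, 38, 45, 47, 49, 90]
edges, counts = bucket_counts(ages, 4)
print(edges)   # [3.0, 24.75, 46.5, 68.25, 90.0]
print(counts)  # [3, 4, 2, 1]
```

Note how the single value of 90 stretches the range so that a whole bucket holds just one value; spotting outliers like this, and then drilling down into the bucket to see which records they are, is a typical reason to look at the distribution in the first place.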
-
Why you should data profile.
Imagine that you have bought a new home, and you've decided to do some landscaping. So you pick three landscapers, draw a rough sketch of what you want, and ask them to bid on the job. But you don't allow them to come see your property, and your sketch doesn't specify anything about the existing [...]
-
Automated data profiling and reporting: data quality behavioral modification?
Recently, Jim Harris of Obsessive-Compulsive Data Quality speculated as to whether the concept of the "swear jar" could be used to improve data quality. It was an interesting post, and the discussion in the comments underlined the reality of data quality: much of the time, the problem is not about changing bits in a [...]
-
Data Profiling and Data Completeness
There are various steps in data analysis; for me, the very first one is always "what have we got?" You have a data set, and some broad requests or ideas about what you want to get out of it, but the first question is: how good is the data? In the end, the first thing [...]
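A simple first answer to "what have we got?" is a completeness measure per column: the fraction of records that actually contain a value. The sketch below is one possible definition, not a standard one: it assumes rows are plain dictionaries and treats a missing key, `None`, or a blank string as "not filled". The `completeness` helper and the sample records are hypothetical.

```python
def completeness(rows, columns):
    """Fraction of rows with a non-blank value, per column."""
    result = {}
    for col in columns:
        filled = 0
        for row in rows:
            value = row.get(col)  # a missing key counts the same as None
            # Assumption: None and empty/whitespace-only strings are treated as missing
            if value is not None and str(value).strip() != "":
                filled += 1
        result[col] = filled / len(rows) if rows else 0.0
    return result

customers = [
    {"name": "Acme", "phone": "555-1234", "email": ""},
    {"name": "Globex", "phone": None, "email": "info@globex.example"},
    {"name": "Initech"},
    {"name": " ", "phone": "555-0000", "email": None},
]
print(completeness(customers, ["name", "phone", "email"]))
# {'name': 0.75, 'phone': 0.5, 'email': 0.25}
```

Whether a blank string or a placeholder such as "N/A" should count as missing is a judgment call for each data set; deciding that is part of understanding what you've got.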



