Category archive for ‘Data Quality’
-
A New Year's resolution to data profile
Well, it's the time of making and breaking resolutions, a time when setting realistic goals is sometimes hard to do with all the optimism of the new year. Sometimes, we make decisions NOT to set a goal, because we don't want to break it. You might be thinking you really should step up your data [...]
-
Data Quality Rules
What's the difference between good data and bad data? It is much like the difference between good children and bad children: the bad data doesn't follow the rules. But what are the rules? Unlike the rules for kids, which have been fixed in stone for decades (or at least, parents wish it were so), the [...]
-
Data quality sizzle
I'm an engineer. Being an engineer, I'm pretty product focused, pretty technology focused, and pretty "does it work or not" focused. Having technical things like tools work is useful, and good. But just because you build it does not mean they will come. The challenge in Data Quality is that often what has to [...]
-
Data profiling: a search or a code to crack?
Often, tracking down data quality issues is presented as a search for bad data, but sometimes the data isn't so much bad as not understood. In legacy systems, you might first be trying to find the meaning of the data, in effect decoding it as if it had been encrypted (which in a way, time [...]
-
Good Data is a force for good.
The United Nations has declared that today is the first World Statistics Day, "celebrating the many contributions and achievements of official statistics". It's the kind of holiday that those of us in the data wrangling profession can really get behind. Data about people in general, and their well-being, their needs and challenges is a [...]
-
Data quality challenges: behavioral inertia and its evil opposite
Often, I hear someone say something like "this would be much easier if users would just..." or "If only we could convince the sales people that...". Technology folks are often frustrated by the people component of the complex systems they are trying to install. People are not a problem solved by technology. Some try to [...]
-
Using regular expressions to check data quality Part 2
Regular expressions are a powerful way to test if strings match a given pattern or rule set. They can be used to validate the structure of a string field in data, highlighting any obviously incorrect string values. Note: In a previous post, I introduced regular expressions and went through a simple example using Canadian Postal [...]
-
An introduction to using regular expressions for data quality validation
Regular expressions (sometimes referred to as regex or regexp) are a powerful formal language that can be used to match text strings to patterns. The way regular expressions work is like this: A pattern is defined. This is a string of symbols that act as a set of rules. A text string to test, and [...]
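As a rough illustration of the idea in this excerpt (define a pattern, then test text strings against it), here is a minimal Python sketch. It is not taken from the post itself. The pattern, the function names `is_valid_postal_code` and `find_invalid`, and the sample values are all assumptions made for illustration, and the pattern only checks the letter-digit-letter digit-letter-digit shape of a Canadian postal code, not which letters are actually permitted.

```python
import re

# Assumed, simplified pattern: letter-digit-letter, an optional space,
# then digit-letter-digit. It does not check which letters are allowed.
POSTAL_CODE = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")


def is_valid_postal_code(value: str) -> bool:
    """Return True if the whole string matches the assumed pattern."""
    # Normalise case and surrounding whitespace before testing.
    return POSTAL_CODE.fullmatch(value.strip().upper()) is not None


def find_invalid(values):
    """Return the values that do not match, in their original order."""
    return [v for v in values if not is_valid_postal_code(v)]


# Hypothetical sample field values.
sample = ["K1A 0B1", "k1a0b1", "12345", "K1A-0B1"]
print(find_invalid(sample))  # ['12345', 'K1A-0B1']
```

Using `fullmatch` rather than `search` means a value with extra characters around an otherwise valid code is still flagged, which is usually what you want when checking a single field.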
-
When should you data profile? Morning, Noon and Night!
Data profiling is an important part of any data related project. The question often arises of when the best time to data profile is. As you would expect from a software company that sells a really cool visual data profiling tool, our view is "all the time". Using data profiling tools before the project: Data profiling [...]
-
What is data profiling? Data in the real world.
First of all, full disclosure: if you haven't already noticed, this blog is written by a software company that makes a pretty cool data profiling tool. We've just released our new version, V1.3.0, so obviously we can't pretend to be completely objective in the "should you data profile?" debate. But bear with me, because I [...]


