Author Archives 
-
The hitchhiker's guide to Data Quality
The hitchhiker's guide to data quality has this to say about enterprise data: "Enterprise Data is big. Really big. I mean, you might think that the spreadsheet you saw accounting using last Thursday was huge, but that's just nothing compared to enterprise data, just listen..." After a while, the style settles down a bit and [...]
-
Using regular expressions to check data quality Part 2
Regular expressions are a powerful way to test if strings match a given pattern or rule set. They can be used to validate the structure of a string field in data, highlighting any obviously incorrect string values. Note: In a previous post, I introduced regular expressions and went through a simple example using Canadian Postal [...]
-
An introduction to using regular expressions for data quality validation
Regular expressions (sometimes referred to as regex or regexp) are a powerful formal language that can be used to match text strings to patterns. The way regular expressions work is like this: A pattern is defined. This is a string of symbols that act as a set of rules. A text string to test, and [...]
-
Data profiling rules and data format strings
A very useful technique in data profiling is data format analysis. Rather than looking at the actual individual values for a given column, you profile the structure of the values, which lets you understand the quality of the data at a higher level. This technique is primarily used for string-based data. Can't find the forest because [...]
-
Bill’s Epic data project Fail- a cautionary tale
In which Bill decides that data profiling is not necessary.
We sit down to keyboard to tell a sad tale
Of data project manager Bill, and his epic huge fail
Bill thought he had all that it took
He looked through the data model, "It reads like a book"
He happily flipped through tables and [...]
-
When should you data profile? Morning, Noon and Night!
Data profiling is an important part of any data-related project. The question of when the best time to data profile is often arises. As you would expect from a software company that sells a really cool visual data profiling tool, our view is "all the time". Using data profiling tools before the project Data profiling [...]
-
What is data profiling? Data in the real world.
First of all, full disclosure: if you haven't already noticed, this blog is written by a software company that makes a pretty cool data profiling tool. We've just released our new version V1.3.0, so obviously we can't pretend to be completely objective in the "should you data profile" debate. But bear with me, because I [...]
-
Data profiling enhanced- Datamartist V1.3 Released
We are very pleased to announce that Datamartist V1.3.0 is now available, and want to thank all our Beta testers. This release represents a major step forward, particularly in the data profiling tools area for the professional edition. Datamartist gives you a highly visual, drag-and-drop, drill-down, free-form data profiling and transformation [...]
-
Data granularity- avoid going against the grain
In the world of data warehousing, the grain of a fact table defines the level of detail that is stored, and the dimensions that are included make up this grain. Obviously, the finer (more detailed) the grain the better, although source systems and data volume/performance may intervene. Using the example in the Wikipedia article on fact tables, a [...]
-
Too much data storage hurts data quality- the toothpaste effect
When I brush my teeth there is a wide range in the amount of toothpaste that is acceptable to me. This is not a profound statement, but bear with me. Only as the tube of toothpaste starts getting near to its end do I start conserving toothpaste because I know I need to make it [...]


