Category archive for ‘Data Quality’
-
Data quality from a four year old
I think my four-year-old would make a good data quality dude. He explained to me recently why it's better to use stickers than crayons "for the things people use a lot". "Dad, if you use crayons, you might draw it different, but stickers, they are all the same." He then pointed to the [...]
-
Data integration is like a pizza
I enjoy a slice of pizza as much as the next person (perhaps a bit more). The key to a good pizza is the raw materials: use the right stuff, and you'll be happy every time. What's great about pizza is that it has all sorts of great stuff on it, and presents them all [...]
-
Spreadsheet errors- Fear, uncertainty and doubt
I love the acronym FUD, which stands for "Fear, uncertainty and doubt". What I don't love is the underhanded use of FUD to manipulate people's behavior. Spreading FUD is not about creating something new, but about destroying: destroying someone's confidence in something, clouding the real issue, stopping a new or creative direction from being taken. FUD [...]
-
The tragedy of anti-data leadership and dataphobia
There has been a lot of discussion in the last year or so about how important data analysis is becoming. IBM made a major move into data analytics by establishing a new organisation, "Business Analytics & Optimization Services", with 4000 people in it. There was the much-quoted Hal Varian of Google, who predicted that [...]
-
Data migration – Part 2 – Determining data quality is the first key step
Any discussion on data migration needs to include data quality as a core topic. Migrating data from one set of applications to another, particularly when the applications were never designed to interact and share little or no common structure or definitions, is a complex task. This task is made even more complex by the data [...]
-
Data quality at the burger joint
I have noticed that when I go to a fast-food outlet, no matter what I get to drink with my meal, it is almost always listed as "Cola" on the receipt. But I didn't order Cola. Ever. Usually I get juice, or milk. So every time I order a burger, I'm clearly a source [...]
-
Self Serve Business Intelligence
Self-serve business intelligence dreams of letting everyone whip up any report or analysis they want. The reality is that it's often not the report that's the problem; it's the underlying data and model. So self-serve business intelligence is a wonderful idea; the problem is that it's not all about pretty [...]
-
Excel auto-formatting is getting into your genes
We often give Excel our data and trust it to do the right thing. There was a link posted on MetaFilter today that sparked some lively discussion amongst the crowd. The Excel auto-formatting "feature" loves to scramble common genetic nomenclature. It turns out that in the genetics field, common codes get converted to incorrect [...]
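The widely reported case is gene symbols such as SEPT2 or MARCH1 being turned into dates. Before a gene list goes through a spreadsheet, a quick check can flag symbols that are likely to be converted. It can also flag values that already look converted, such as "2-Sep". The following is a minimal Python sketch of that check. The month list and the two regular expressions are illustrative assumptions, not a complete description of Excel's conversion rules, and `classify_symbol` and `audit` are hypothetical helper names.

```python
import re

# Month tokens that spreadsheet auto-formatting commonly reads as dates.
# Illustrative assumption only: not an exhaustive list of Excel's rules.
_MONTHS = r"(JAN|FEB|MARCH|MAR|APRIL|APR|MAY|JUNE|JUN|JULY|JUL|AUG|SEPT|SEP|OCT|NOV|DEC)"

# A symbol like SEPT2 or MARCH1: a month token followed directly by 1-2 digits.
AT_RISK = re.compile(rf"^{_MONTHS}\d{{1,2}}$", re.IGNORECASE)

# A value that already looks converted, like "2-Sep" or "1-Mar".
ALREADY_MANGLED = re.compile(rf"^\d{{1,2}}-{_MONTHS}$", re.IGNORECASE)


def classify_symbol(symbol: str) -> str:
    """Return 'mangled', 'at_risk', or 'ok' for one gene symbol string."""
    s = symbol.strip()
    if ALREADY_MANGLED.match(s):
        return "mangled"
    if AT_RISK.match(s):
        return "at_risk"
    return "ok"


def audit(symbols):
    """Group symbols by classification so suspicious ones can be reviewed by hand."""
    report = {"mangled": [], "at_risk": [], "ok": []}
    for sym in symbols:
        report[classify_symbol(sym)].append(sym)
    return report


if __name__ == "__main__":
    print(audit(["SEPT2", "MARCH1", "2-Sep", "BRCA1", "TP53"]))
```

A check like this only finds the problem. The usual way to avoid it is to import the symbol column as text rather than letting the spreadsheet guess its type.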
-
Connecting the dimension table to the fact table – Vendor Example (Part 3)
In parts one and two of this series, we introduced our challenge (to make a data mart to analyze the Acme Company's spending) and showed how the Datamartist tool could import millions of rows of data and then turn it into a fact table we can use in Excel. Now we need to create a [...]
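In general terms, a fact table connects to a dimension table through a shared key. Each spending row carries a vendor key, and the vendor dimension supplies the descriptive attributes for that key. The following is a minimal Python sketch of that join. It is not how the Datamartist tool does it. The column names (`vendor_id`, `amount`, `vendor_name`, `category`) and the sample rows are hypothetical. The sketch also keeps "orphan" fact rows whose key has no match in the dimension, because silently dropping them would understate spending.

```python
from collections import defaultdict


def join_facts_to_dimension(facts, dimension, key="vendor_id"):
    """Attach dimension attributes to each fact row via the shared key.

    Returns (joined_rows, orphan_rows). Orphans are fact rows whose key has
    no matching dimension row; they are returned separately, not dropped.
    """
    joined, orphans = [], []
    for row in facts:
        attrs = dimension.get(row[key])
        if attrs is None:
            orphans.append(row)
        else:
            joined.append({**row, **attrs})
    return joined, orphans


def total_by(rows, attribute, measure="amount"):
    """Sum a numeric measure grouped by one dimension attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[attribute]] += row[measure]
    return dict(totals)


# Hypothetical sample data, for illustration only.
vendor_dim = {
    "V1": {"vendor_name": "Vendor A", "category": "Office supplies"},
    "V2": {"vendor_name": "Vendor B", "category": "IT services"},
}
spend_facts = [
    {"vendor_id": "V1", "amount": 120.0},
    {"vendor_id": "V2", "amount": 900.0},
    {"vendor_id": "V1", "amount": 80.0},
    {"vendor_id": "V9", "amount": 50.0},  # key missing from the dimension
]

joined, orphans = join_facts_to_dimension(spend_facts, vendor_dim)
print(total_by(joined, "category"))  # {'Office supplies': 200.0, 'IT services': 900.0}
print(len(orphans))                  # 1
```

Here the dimension is a dictionary keyed by vendor, so each lookup is a single key access. In a real data mart, the same role is played by a keyed lookup or join between the two tables.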
-
Data Profiling and Data Completeness
There are various steps in data analysis; for me, the very first one is always "what have we got?". You have a data set and some broad requests or ideas about what you want to get out of it, but the first question is: how good is the data? In the end, the first thing [...]
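One simple way to start answering "how good is the data?" is to measure completeness: for each column, the share of rows that actually have a value. The following is a minimal Python sketch that works on a list of dictionaries, the shape that `csv.DictReader` produces. It treats `None` and blank or whitespace-only strings as missing. Whether placeholders like "N/A" should also count as missing is a judgment call, and the sketch leaves it out. The function name and sample columns are hypothetical.

```python
def completeness(rows, columns):
    """Return the fraction of rows with a non-blank value in each column.

    A value counts as missing if it is None or a string that is empty after
    stripping whitespace. With no rows, every column reports 0.0.
    """
    total = len(rows)
    result = {}
    for col in columns:
        filled = sum(
            1
            for row in rows
            if row.get(col) is not None and str(row.get(col)).strip() != ""
        )
        result[col] = filled / total if total else 0.0
    return result


# Hypothetical sample rows, shaped like csv.DictReader output.
rows = [
    {"customer_id": "1", "email": "a@example.com", "phone": ""},
    {"customer_id": "2", "email": "", "phone": "555-0100"},
    {"customer_id": "3", "email": "c@example.com", "phone": None},
    {"customer_id": "4", "email": "  ", "phone": ""},
]

for col, frac in completeness(rows, ["customer_id", "email", "phone"]).items():
    print(f"{col}: {frac:.0%}")
# customer_id: 100%
# email: 50%
# phone: 25%
```

A column that is only 25% filled is a strong hint to check how that field was captured before building any analysis on it.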


