Tag archive for ‘Duplicate Data’

  • Data quality from a four year old

    I think my four-year-old would make a good data quality dude. He explained to me recently why it's better to use stickers than crayons "for the things people use a lot": "Dad, if you use crayons, you might draw it different, but stickers, they are all the same." He then pointed to the [...]

  • Connecting the dimension table to the fact table- Vendor Example (Part 3)

    In parts one and two of this series we introduced our challenge (to make a data mart to analyze the Acme Company's spending) and showed how the Datamartist tool could import millions of rows of data and then turn it into a fact table we can use in Excel. Now we need to create a [...]

  • Duplicate Data and removing duplicate records

    Duplicate records, doubles, redundant data, duplicate rows: it doesn't matter what you call them, they are one of the biggest problems in any data analyst's life. There are lots of different types of data quality problems, but in this post I'll focus on duplicates. I'll share some hints on how to find duplicate records and remove duplicate records, [...]