Datamartist gives you data profiling and data transformation in one easy-to-use visual tool.

Category archive for ‘Data Quality’

  • Data Profiling and Data Completeness

    There are various steps in data analysis; for me the very first one is always "what have we got?". You have a data set, and some broad requests or ideas about what you want to get out of it, but the first question is: how good is the data? In the end, the first thing [...]

  • Data modelling Hierarchies: how to make a dimension

    One of the most useful data model structures in a data mart is a hierarchy (also called a tree structure). Tree structures let us take a large number of things and organise them in a way that makes sense. More importantly, a tree structure lets us “drill down” into information. Hierarchy Rules: In a simple tree [...]

  • Duplicate Data and removing duplicate records

    Duplicate records, doubles, redundant data, duplicate rows; it doesn't matter what you call them, they are one of the biggest problems in any data analyst's life. There are lots of different types of data quality problems, but in this post I'll focus on Duplicates. I'll share some hints on how to find duplicate records and remove duplicate records, [...]

  • Fake Data

    It's amazing how hard it is to make fake data. If you don't believe me, you probably haven't tried it. By this I mean data that provides a reasonable test set for data analysis, not data faked for some shady or illicit purpose. (I imagine, in some ways that might actually be [...]

  • Setting the stage: managing data issues

    Anyone who has done any data analysis with more than a few lines of data knows that some of the biggest time wasters are data quality issues. What is bad data? Well, some of it is easy to see; some is downright impossible to find. Let's look at an easy example: a row of data where the country [...]