Datamartist gives you data profiling and data transformation in one easy to use visual tool.

« | »

Data modelling Hierarchies- how to make a dimension

One of the most useful data model structures in a data mart is a Hierarchy (also called a Tree structure).  Tree structures let us take a large number of things and organise them in a way that makes sense.  More importantly, a tree structure lets us “drill down” into information.  

Hierarchy Rules

In a simple tree structure, every object has one and only one parent, or it is at the top level of the tree.
For each level of the tree, all the objects are the same type.

All fine in theory, but what do the actual table structures look like? 

Parent Child Relationships

The most efficient way to store a tree structure of objects is in a parent child type structure. 

Parent Child Structure

Parent Child Structure

For every object you store one row recording the parent of the object.  This means that every relationship in the tree is stored only once. 

This is the best form to store the “master copy” of the tree- because there is no ambiguity- one row, one object, one parent.  Rule number one is enforced strictly by the physical model in this case- and that’s a good thing. 

The downside of this structure is that it requires looking at multiple rows to summarise data.  And its just not easy to read.
To find out which country a city is in, we have to first look up the parent (the state province), then we have to look up the parent of that to find the country.  If a hierarchy has 10 levels, we have to look at ten rows for every row that we want to summarise to the top level.  Not so good.

Dimensional Tables

In a dimensional table, we store one row for each object at the bottom of the hierarchy.  In that row, we store its parent, its grand parent, its great grand parent, its great great grand parent etc. etc.   Here’s what that table looks like for our example:

This way, we have everything right there and it makes it easy to summarise.  To find the totals for a country just add up every row with a given value in the country field.  The advantages of this form are clear when it comes time to do the analysis- but what are the disadvantages?  Well, if you want to change a parent child relationship between level 1 and 2, then you have to change lots of rows- the relationship between a country and a state/province is repeated many times .

Depending on where the data is, and what applications have access to read and write it’s also possible to have inconsistencies-  you could have some rows that say Michigan is in the USA, and others that put it in Canada.

The ideal solution is to store the master copy of the tree as a Parent Child relationship, and generate the Dimensional table automatically so that when the analysis is run, it’s fast and easy, and users can view it in spreadsheet tools in an easy to read format, knowing  that it is guaranteed to be consistent.
This is what is done by the Datamartist tool– but rather than worrying about data models and table structures,  managing tree structures is done with drag and drop.

Then dimensional tables are generated that are in the “everything in one row” format that is so easy to use in excel, either through an auto-filter, or with pivot tables. 

Find out more about Datamartist– and download a free trial version.

Tagged as: , ,

Twitter

« | »

1 Comment

  1. It’s been over 15 several hours since EA announced the newest Road Rash was birth. Finally some more rumors hit the gaming news pages. Hopefully they’re working on it as we speak. A PS3 version could be great. Hopefully they can do better than a mediocre Speed Kings-type game all of us saw for PS2.