Datamartist gives you data profiling and data transformation in one easy to use visual tool.

« | »

Data granularity- avoid going against the grain

In the world of data warehousing, the grain of a fact table defines the level of detail that is stored, and which dimensions are included make up this grain. Obviously, the higher the grain the better- although source systems and data volume/performance may intervene.

Using the example in the Wikipedia article on fact tables, a sales fact table holding sales transactions might have a grain of day, store and product. This means that with this grain in place, you can't analyze inter-day patterns, or which checkout was used, or which shelf the product was on. Makes sense, you have to stop somewhere.


But what if you want more detail? "Going against the grain" might be good for societal change and rebellious youth but accepting the grain in data is usually the right thing to do.

It is possible, if you are a rebel, and don't worry too much about accuracy, to "generate a finer grain" by "allocating" or "interpolating" data between points from multiple data sets.

The request might come something like; "I know we only collect data Y at region level, but can we allocate it down to stores to have more detail so we can put it in the cube?"

This is a slippery slope. It is always possible to allocate- but based on what? Sales? Shipping costs? Units sold? Employees? some mix of things which should correlate to data Y? Your sisters shoe size? The bottom line is you are making approximations and assumptions.

In the end, while sometimes it must be done, it is better to avoid going against the grain of the data. Spending effort on complex data fabrication processes will probably not drive real insight, and might even risk creating misleading information. Another potential issue is that if people know the data is "massaged" they will not treat it as credible, thus wasting your time as no-one is using your magic numbers.

If you don't have the right grain, and you need it, then try to go get it. Change the extraction from the source system, or if needed, increase the level of detail the data is captured at the source.

Real data is always best- trying to generate details you don't have is likely to lead you astray.

Tagged as:

Twitter

« | »

1 Comment

Trackbacks

  1. Tweets that mention Data granularity- avoid going against the grain | Datamartist.com -- Topsy.com

Leave a Response