Degenerate Dimensions in Datamarts
Not all dimensions are created equal. A typical dimension is defined by a table that holds the reference data that is being joined to the fact data. So in the fact table, for example, we have the product ID, or the product code, and in the product dimension table we have a single row for each product, that lists all the attributes of that product (its size, its color, its category, its segment, etc. etc.)
So it would follow, then, that there must be a dimension table for every dimension, right? Well, not if the dimension is degenerate. In fact, you could argue that calling it a dimension at all is pushing it, but I think the idea was to keep things tidy.
In any well structured data mart (a star schema), every column in the fact table should be either a measure or a dimension. If it's a measure, then it's storing a value for that particular fact- usually a number, and we use it for calculations and aggregations. If it's a dimension, then we join it to the appropriate dimension table and thereby look up all the interesting things about that fact on that dimension.
Where degenerate dimensions come in is that there are often some columns that we want to have, but that are not measures, and don't have a table of stuff we want to join to. Example: a purchase order number. These columns store something that we want to have (the purchase order number), but to create an empty dimension table would only slow things down. So, to ensure we don't feel bad about breaking the "only a measure or a dimension in the fact table" rule, we just CALL them dimensions- even without the table.
In the fact itself, any attribute of the purchase order that was of interest, and that therefore had values that would each have more attributes we would be interested in would have been turned into a dimension, and a dimension table would have been created.
But to create a dimension table that contains a row for every purchase order would create a very large dimension with nothing in it (since there are lots of purchase orders, possibly as many as there are facts if the grain your fact table is one per purchase order). But our users would not be happy if they could not get a list of the purchase orders included in a given total, or be able to drill down to that bottom level of detail that we've gone to all the trouble to include.
So, when we create transactional level fact tables, it is normal, in fact, necessary to include some degenerate dimensions- include columns that have useful information (very often referencing back to the source system) but that do not join to any dimension table. Plus you can just impress everyone with your dimensional modelling knowledge when you say "degenerate dimension".
Since we are very close to closing out 2008 and starting the new year, I'll share with you one of my new year's resolutions (there are many)- I'm going to start a data mart data modelling 101 series of blog posts in January, in which I will go through a complete data mart example. My intention is to both explain the data model concepts, and illustrate how they are executed using datamartist. And I think I'll run with the purchase order example, because given the economic situation we're going to have in 2009, identifying unnecessary spending, and finding ways to cut costs is one of the most important uses of a data mart- and one with potentially a huge payback.
Update: I've posted more recently on junk or mystery dimensions which might be of interest too.
Download the free, no risk Datamartist trial now and try it out on your own data. You'll be amazed whats possible. No registration required, and the install takes just minutes.