Data Warehouse vs Data Mart
Very often, the question is asked- what's the difference between a data mart and a data warehouse- which of them do I need?
Data warehouse or Data Mart?
- Holds multiple subject areas
- Holds very detailed information
- Works to integrate all data sources
- Does not necessarily use a dimensional model but feeds dimensional models.
- Often holds only one subject area- for example, Finance, or Sales
- May hold more summarised data (although many hold full detail)
- Concentrates on integrating information from a given subject area or set of source systems
- Is built focused on a dimensional model using a star schema.
More Info about Data mart models
More Detail regarding Data Warehouse Vs Datamart: and Inmon vs Kimball
As the concept of decisional systems, and data warehouses and data marts evolved, two major points of view came into existence. There are two giants in this field. Bill Inmon, and Ralph Kimball.
There are some that argue the best approach is to start with data marts, department by department, then merge them together to form a data warehouse- this is more in line with Kimballs approach.
Now, Bill Inmon is an advocate of the Data warehouse. Here's one of his articles, which contains the following quote that makes it clear what he thinks about the idea:
"You can catch all the minnows in the ocean and stack them together and they still do not make a whale," Bill Inmon, January 8, 1998.
Ralph Kimball, on the other hand, advocates what he calls a bus architecture data warehouse. His methodology specifies conformed dimensions, where multiple fact tables share common dimensional tables. For me, each of these fact tables represents a data mart. The row of dimensional tables that all the fact tables plug into is the bus, and because, for example, the finance and the sales data marts both use the same product dimension table there is integration between departments.
The more extreme data mart strategy is that of the completely stand alone data mart, the concept being that its fast, easy, cheap, and delivers value immediately. I'm a supporter of this at the desktop level- thats the point of the Datamartist tool afterall. But I don't buy this for server based architectures- what is really fast, easy and cheap when you have to buy servers, create a project and form a commitee? In my mind if you've decided you need a central server solution, some level of integration is needed, and don't pretend its going to be magic.
The interesting thing about these approaches, is that the harder you work on really conforming your dimensions, the more your data marts look like the data warehouse that Inmon advocates. (Data modellers in the know will be jumping up and down right now shouting NO they don't- but this is a high level conversation...) But the reality is, even in a data warehouse, issues will arise that require compromise- things that just don't map or conform, and budget, schedule and business reality will mean that nothing is ever perfect, and in the end the world is full of data warehouses that are less conformed than some data mart clusters. Its just not simple.
Generally, it is probably true that data warehouses provide a solution that is closer to the "single version of the truth", but they do take a HUGE amount of effort, and an ability to coordinate across the entire organisation. If you have not already built at least half a dozen data marts, don't think you can estimate how much effort a data warehouse will take. You can't. And bring your cheque book.
Whereas data marts might deliver some value early, if built without sufficient effort on cross functional mapping and data cleanup they are just more silo systems and have their own set of costs and issues. Don't measure payback on datamarts in years- nothing is the same in a few years, you'll be back to the drawing board shortly.
It's a real dilemma. So which one? Data warehouse? Data mart? In my view, the right answer is "it depends" and "yes". However, never launch a data warehouse project as your first shot. A successful strategy will balance the fast, pain point addressing solutions, with a medium and long term plan, and investment in infrastructure and competencies to build the technology, processes and culture that a company needs to manage information. Depending on your industry and how sucessful you are, a massive data warehouse might be in your future. But sorry, no magic bullet.
Build a multi-level data strategy
- Level 1- Get the data to the people
- Level 2- Build Departmental Data Marts
- Level 3- Plan long term infrastructure and architecture
Don't do these things in order- this isn't step 1, 2, 3- actively work on all three levels at once and ensure the plans at each level are coordinated.
Data to the People
People are building spreadsheets, and spending money on data base development now- you know they are. Give them better tools, help them better use the spreadsheets, and formalize the way they do. The do-it-yourself exists- but it doesn't have to be completely informal.
The Datamartist tool is adding another capability that can accelerate the process- letting you move more quickly, proto-typing and analysing to determine which areas are ready for additional analysis capability.
In some cases Datamartist itself might simply be the best choice for certain types of analysis, cutting costs dramatically. In addition, if your end users are building their own data marts, when it comes time to build server based data marts they know the concept, they understand the structure, and can even provide concrete examples of the dimensions they need.
But whatever you do- don't make the mistake of thinking this is all you need. Work on all levels at once.
Build Departmental Data marts
If a whole department is flying blind, and big money is on the table, then don't launch a three year data warehouse project- create some departmental data marts. These projects should be designed to be 3-6 months long, and be sold to management honestly and clearly as being for short term gains, and as part of a broader discovery process. The hardware and software licenses will be reusable- but be clear that the data marts will have a limited lifespan- they always do.
And trust me, when you build these data marts you will discover all sorts of things about your data, your organisation, and your definitions and business processes. You will discover that the sales organisation needs to analyse product segments in a way that is fundamentally inconsistent with how finance has been reporting it for years- and neither group is willing to budge (and they may both be right). You will discover that 80% of your sales orders have errors on them or are incomplete. These discoveries will help you build the next data mart, and assess if a data warehouse is possible. They should also send you back to your transactional system and business processes to work to clean up the problems.
Build the infrastructure and deal with the foundation
There are lots of pitfalls in creating a decisional architecture- this short list of from Gartner resonates with me- I've seen and battled these issues on project after project.
Set standards in terms of tools, project management etc. Buy infrastructure for multiple projects, not on a project by project basis- don't have multiple servers when one with multiple data marts is radically cheaper.
And probably most important, I honestly think that you can build anything you want, any way you want, but it will not succeed if you don't have your definitions, both data and information, under control. (See Gartner issue #8) In the end, it's not what language you speak, it's if you have a dictionary or not, and if everyone is using the same dictionary across your organisation.
You should have common reference data sets that are used by all levels. Datamartist can load in and use reference data that is coordinated with departmental data marts and the eventual warehouse. Make these data sets available to everyone- you'll be amazed that if they are easy to get and use, people will put them in their spreadsheets, and things might actually start matching up.