Wednesday, May 4, 2011

How many cubes

I am often surprised how many cubes some BI developers create. It would appear that each time they have a requirement for a piece of information, they will create a new cube.  Sometimes this is amplified because they are using denormalised summary tables, and creating new ones when they need a bit more dimensionality or a new measure, leading to a plethora of overlapping data.

Since SQL Server 2005, it is possible to put dozens of fact tables (cubes in the old days) into one cube with separate measure groups for each fact table. This is a great feature. For example, Sales and Budget have slightly different dimensionality as Sales is by cashier and hour of day, whereas, budget does not have cashier and time granularity is only available down to week.  Having these in the same cube is great, since it makes querying of sales against budget (by all the common dimensions) very easy. Also, if a user is only interested in one measure group, say sales, they can ask the browser that measure group, and they will on see the associated measures and dimensions.  In the same way they might browse a separate cube.

You might ask, are there any performance implications of having one super cube? 
  • In terms of cube size and processing overhead, the design alternatives are comparable. 
  • MDX queries can be broken down to FE (formula engine) and SE (storage engine) components.   The SE component of a query isn't materially affected by unrelated dimensions, since the storage is comparable.  However, the FE component of a query is affected by the unrelated dimensions as it is works within the entire cube space (larger with more dimensions). 
There is a LookupCube function in MDX that allows you to get the value from another cube.  This might be useful in a design that exploits multiple cubes.  But, don't see this as a panacea for multiple cubes.  It has its own performance overhead and is not as functional as having the measures in one cube.

There might also be operational considerations.  The design alternative I have discussed is having multiple cubes in the same database, but if the cubes really are unrelated, you might consider having separate SSAS databases for them.  This would lead to more operational options including, parallel processing, and even hosting databases on separate servers.

So, as a cube designer, you must weigh up the performance, functional and maintenance costs.  As an over simplified summary, on the side of mini cubes is the minimisation of FE performance costs, while on side of super cubes are increased user functionality.

For another discussion on this topic see Chris Webb's blog

No comments: