Saturday, January 30, 2010

Aggregations Won't Fix Cube Performance

There appears to be a temptation for SQL sales people to demonstrate the aggregation wizard with the impression that it is the tool to solve all performance issues. Unfortunately, it isn't, and I need to dispel that myth.

The aggregation design wizard is not a performance panacea, but it can help. It's like tire pressure in a racing car. The optimal pressure will help performance, but it won't turn a slow car into a fast car.

Let's think about what the aggregation wizard does. The base cube cells only contain values at the intersection of all dimension keys. When a query asks for data at a higher level, AS must derive this data from the base cells by aggregating them. The aggregation wizard determines which of these aggregations to calculate at cube processing time and store on disk. So designing aggregations will impact your cube processing time. The more aggregations you have the longer the cube processing time, and the larger the cube on disk. At query time, if AS does not have the pre-aggregated cell on disk, it will get the lower level cells (these might be aggregations too) and calculate the aggregated cell. This new aggregation will reside in AS memory like any other aggregation or base cell. So there's the rub, even if you don't have any aggregations, AS will create them at query time (or cache warming) and try and keep them in memory for subsequent queries. So, if AS is not memory constrained, aggregations will only help the queries while the cache is cold. In essence, a cube warming query (CREATE CACHE) will have a similar effect as aggregation designs for a non-memory constrained cube.

Cubes that benefit the most from aggregation design are ones that are updated frequently (for example, every minute or so during busy times) and are larger than the memory available to the AS data cache. Note, when a cube partition is processed, all the cells and aggregations in memory are dropped.

Another important note; by default, the aggregation wizard will only create aggregations on key attributes and attributes in natural hierarchies. For example, if you have a Date dimension with date as the key, the aggregation wizard by default, will only build aggregations on "date" and natural hierarchies. If these other attributes are commonly used and you would like the aggregation wizard to consider them, you can ask for it in the Aggregation Usage window (or properties in the cube design). The "Default" usage for a key is "Unrestricted", while the "Default" usage for a non-key property is "None". However, don't see this as a silver bullet, by enabling all properties as "Unrestricted"! That will only result in very thin aggregation design. Also, aggregations on keys, tend to be the most important ones, so the default options suit most cubes.

When using "Distinct Count" measure in a cube you need to be even more careful with aggregations, since each aggregation (and base cell) will include every distinct value of the Distinct Count. This can make aggregations very large and costly to build/query.

For very small non volatile cubes such as the New Zealand census cube http://EasternMining.com.au/sites/Demonstrations/Pages/NZCensus_Excel.aspx there is no benefit from aggregations, which is why that cube has no aggregations.

For really good OLAP cube performance you should ensure that the design is optimal, and use aggregation design (or usage based aggregation design) as a final operational tuning exercise.

3 comments:

chris said...

You're not quite right about the AggregationUsage property and non-key attributes: all attributes that are included in natural user hierarchies are treated as Unrestricted as well.

Richard Lees said...

Good point Chris. Attributes in natural hierarchies will also be considered by the aggregation wizard by default.

Sam Kane said...

Here are this and some other articles on SSAS Aggregations: http://ssas-wiki.com/w/Articles#Aggregations