Richard Lees on BI: SQL Server Analysis Services Partitions

Like relational partitions, SSAS partitions can be very useful in easing operations and, sometimes, improving performance. Like any performance tuning, you really need to know the ramifications of the feature to get the best out of it.

Firstly, partitioning is an Enterprise version, so you will need to have SQL Server EE to make use of it. Secondly, performance doesn't just improve by virtue of having partitions. It depends on what is in the partitions, how many there are and how they are processed/queried. SSAS knows the low and high dimension key attributes for each partition. So if you have partitioned on date, and your query is looking for a particular date value (or range) SSAS will only query the partition (or partitions) relevant for your query. Will your query run faster with partitioning in this instance? Maybe, but not necessarily. I would generally not anticipate an improvement in this scenario, unless there is some caching effect. For example, if the current month (partition) is the "hottest" it will tend to stay in the data cache at the expense of the less queried months. Note, this tends to happen in a large cube without partitions.

On the other hand, if your query does not include a filter that aligns to the partitioning strategy, SSAS will break your query into multiple sub queries, with at least one sub-query in each partition. This can allow the query to be executed in a parallel fashion, which might, but, again, not necessarily, improve performance. If the parallel queries are essentially breaking the workload (as opposed to all performing the same workload for different slices) then there is likely to be a significant performance gain.

Just like a relational database, if there is only one query active at a time, parallelisation is more likely to achieve a performance gain than if there are many concurrent queries. Many concurrent parallised queries can easily saturate a server and degrade overall performance.

One of my favourite strategies for partitioning is not the classic one (which is to partition by date, say one partition per month/year) and it comes with a very high probability of performing well. It is to partition into static and active partitions, with a large static partition(s) and a very small, but volatile, active partition. If you want to keep your cube very close to real-time, evey time you incrementally update a partition the partition is washed out of the data cache. It has to be washed out, since it is now out of date. Unfortunately, if your cube is very large, the next few queries (including cache warmer) will be very slow, while they read the cube (and aggregations) off disk and into memory. So a good potential strategy to to have all of the historical data in one partition and a second, very small, partition, with current data that is getting incrementally updated every minute or so. That way, each minute, the large static partition will remain in memory. Only the small dynamic partition will get washed out. This is good, we are only washing out the small amount of data that we are changing. This strategy requires an addition operational process, where the dynamic data is merged into the static partition. There are a number of ways to do this, including merging partitions, or reprocessing the partitions during the quiet part of the day. Also, this strategy does not negate the possibility of having a classic partitioning strategy also. I.e your large partition may actually be a number of medium sized partitions, but the volatile data is isolated into one small partition.

My second favourite reason for partitioning is, like relational partitions, it makes some operational procedures easier. For example, with a classic partitioning strategy, to drop a month or year's worth of data, simply involves dropping one or more partitions. No reprocessing require.

My suggestion to you, if you are considering partitioning (and you should for large cubes) is to create a few test cubes and benchmark the performance with various partitioning strategies. Monitor your IO, memory and cpu utilisation. Ensure that your tests involve the caching (or washing out of cache) effect, and ensure that you are using data volumes (and hardware) that reflect what will be used in production.

Richard Lees on BI

Tuesday, August 25, 2009

SQL Server Analysis Services Partitions

1 comment:

About Me

Followers