Wednesday, August 10, 2011

Improving MDX join performance

Analysis Services will tend to instantiate sets created from joins. Some sets can be extremely large and will actually fail with insufficient resources.
Server: The operation has been cancelled.


The solution is to write the MDX query in such a way that the intermediate sets are smaller. Here is an example of a problematic query that is asking for the top 10 hour/City/Date combinations. The output is simply 10 rows, but AS must create a set with hundreds of Millions of members to get the top 10.

select
[Measures].[Bytes Total] on Columns,
topcount(
[Client Host].[Client Geography].[City]
*[HoursOfDay].[HoursOfDay].[Hour Of Day]
*[Date].[Year Month Day].[Day]
,10,[Measures].[Bytes Total])
on Rows
from EasternMining


Here is a way to improve performance using the generate() function, another one of my favourite MDX functions.

select
[Measures].[Bytes Total] on Columns,
topcount(
generate([Client Host].[Client Geography].[City]
,{[Client Host].[Client Geography].currentmember}
*[HoursOfDay].[HoursOfDay].[Hour Of Day]
*[Date].[Year Month Day].[Day])
,10,[Measures].[Bytes Total])
on Rows
from EasternMining


The generate statement is breaking the query into a couple of steps, so that the largest set will probably be the Days*Hours, from which it is only keeping the top 10 for each city. The outside set, for every city, will only have 10 rows per city. The query is logically equivalent to the first query, meaning it will return the same results.

We could take this logic further with a second generate statement so the inside query only has 1o rows per city day combination.

select
[Measures].[Bytes Total] on Columns,
topcount(
generate([Client Host].[Client Geography].[City]
,topcount(
generate([HoursOfDay].[HoursOfDay].[Hour Of Day]
,topcount({[HoursOfDay].[HoursOfDay].currentmember}
*{[Client Host].[Client Geography].
currentmember}
*[Date].[Year Month Day].[Day]
,10,[Measures].[Bytes Total]))
,10,[Measures].[Bytes Total]))
,10,[Measures].[Bytes Total])
on Rows
from EasternMining

We can improve this query further by requesting nonempty() sets

select
[Measures].[Bytes Total] on Columns,
topcount(
generate(nonempty([Client Host].[Client Geography].[City],[Measures].[Bytes Total])
,topcount(
generate(nonempty([HoursOfDay].[HoursOfDay].[Hour Of Day],[Measures].[Bytes Total])
,topcount({[HoursOfDay].[HoursOfDay].currentmember}
*{[Client Host].[Client Geography].currentmember}
*nonempty([Date].[Year Month Day].[Day],[Measures].[Bytes Total])
,10,[Measures].[Bytes Total]))
,10,[Measures].[Bytes Total]))
,10,[Measures].[Bytes Total])
on Rows
from EasternMining

The NonEmpty() function is a very simple way to reduce the set size, and of course it can be done without using the generate() function. So going back to the original query, we might have found that the following change was sufficient to avoid resource errors.


select
[Measures].[Bytes Total] on Columns,
topcount(
nonempty(
[Client Host].[Client Geography].[City]
*[HoursOfDay].[HoursOfDay].[Hour Of Day]
*[Date].[Year Month Day].[Day]
,[Measures].[Bytes Total])
,10,[Measures].[Bytes Total])
on Rows
from EasternMining


However, I wanted to show the generate() function first, as it can improve the query in a way that the nonempty() cannot.

I should add that generate() and nonempty() functions are not panaceas for MDX query performance. The more you know MDX and the way SSAS executes the query, the more you will have in your toolkit to improve query performance.

No comments: