Thursday, November 9, 2017

Multidimensional and Tabular Compared

Microsoft's SSAS offers a choice of technology for OLAP cubes - the original Multidimensional, and the newer, Tabular. I compared them back in 2012 http://richardlees.blogspot.com.au/2012/05/sql-server-2012-tabular-versus.html. It is now safe to assume now that Tabular is the strategic technology. Not officially strategic, but with the dearth of Multidimensional enhancements since 2012 and Tabular recently becoming the default install, it is clear to see where Microsoft is heading.

It has been pleasing to see enhancements to Tabular in each SQL release, which has reduced the need for Multidimensional. However, there are still a few features that Tabular does not support, necessitating new developments with Multidimensional. The standout feature that Tabular does not support is composite primary keys. Ralph Kymball, needs to take some responsibility. His Star Schema modeling methodology employs single column surrogate keys for all dimension tables. Kymball has advanced data modeling a great deal, but I think it's wrong to blindly generate surrogate keys when natural primary keys exist. Kymball methodology requires fact and dimension tables to be physically generated from underlying normalized tables. This involves a lot of work, reconciliation and performance cost, when the underlying tables could be used directly, with their composite primary keys.

The really sad fact is SSAS documentation, comparing Multidimensional and Tabular, does not even list composite keys as a feature worth comparing. It's as if, no one would require a cube technology supporting composite primary keys. See https://docs.microsoft.com/en-us/sql/analysis-services/comparing-tabular-and-multidimensional-solutions-ssas




Since SQL 2012, I have chosen Tabular where possible, but it has been on rare occasions that Tabular could succeed. Mostly due to Tabular not supporting composite keys, and my unwillingness to generate redundant star schemas. Why would you generate copies of fact and dimension tables, just to build surrogate keys? Surrogate keys have no meaning. Business keys have meaning, and are typically part of the table already. Also, when you generate a redundant star schema, you need to continually append (or rewrite) to these tables as your DW data arrives. And, you should continue to reconcile these generated tables with the source tables. One might argue that the resultant star schema is more efficient than a star schema with composite keys, but that argument is marginal and rarely overcomes the increased cost and lag of building star schemas. And there can be significant performance loss in flattening (denormalising) the snowflake dimensions.

Even something simple like a Type 2 dimension already has a perfectly good natural key consisting of the source system's business key (eg ClientId) and EffectiveFromDate. So, why not just add EffectiveFromDate_Client to the fact table? A complete dimension and fact table rewrite is unnecessary!

I'm hoping that data warehouse developers see the sense of natural composite keys and that Tabular supports them in the near future. In the meantime, I will only be creating Tabular models on the rare occasions where composite keys are not part of the data structures.

If you would like to see Tabular support composite keys, please vote on this connect item. https://connect.microsoft.com/SQLServer/feedback/details/3144082

For a performance comparison of Tabular versus Multidimensional see https://richardlees.blogspot.com/2019/08/aws-redshift-performance-versus-sql.html

2 comments:

Christian Gräfe said...

Hi Richard,

your blog arbticle is very interesting and you pointed out an important drawback of the tabluar model.
Have you filed a connect item?

regards
Christian

Richard said...

Thank you Christian, Yes, a connect item is a good idea, although my experience hasn't been great with connect. I have one here https://connect.microsoft.com/SQLServer/feedback/details/760107/multidimensional-cube-is-not-retained-in-memory-even-though-there-is-plenty-of-ram for SSAS cache, which I consider the biggest performance limitation of Multidimensional. It has 140 votes.

Anyway, I like your suggestion, and I have created a connect item https://connect.microsoft.com/SQLServer/feedback/details/3144082
Please support this suggestion by voting on the connect item.

Thank you again,