- 44 Million row fact table
- 3 measures containing values between -99 and 1039.22
- 5 dimensions
- SQL Server 2008
- Zero aggregations
I used 6 datatypes in SQL Server tables. The SQL data types I employed were
- Money
- Float
- Decimal
- Bigint
- Integer
- Real
For the Money data type, I created two cubes, since Analysis Services defaulted the Money data type to Double. I created one cube with the default Double data type, and another with the cube data type as Currency.
The results show that the best performing (in terms of size) were
- Bigint and Integer (41% smaller than Double)
- Currency (25% smaller than Double)
- Single (3% smaller than Double)
- Double
Note, Sometimes the data type you choose is determined (or constrained) by the domain of the numbers you are aggregating. For example, if your numbers contain significant digits after the decimal point, then Bigint and Integer are inappropriate, unless you are happy to round them. Likewise, for imprecise numbers, Double gives you the most precision, which you might need. But just because the underlying SQL table uses a single or double byte precision data type, if the number is always an integer, you might as well tell Analysis Services to use a Bigint and save on space.
It shouldn't surprise you that the binary numbers (Bigint and Integer) are the smallest on disk, nor that they took the same space as each other. Analysis Services compresses all the cells before writing to disk, and the binary numbers will tend to have lots of leading zeroes, which compress very well. Similarly, but to a lessor extent, Currency (really a packed decimal) compresses next best.
Just in case you are wondering why bother fiddling with data types when the cube is only a gigabyte or two in size, and disk space is plentiful. The reason is that if you can save 40% in your cube size, processing and querying will be noticeably faster. With a smaller cube, your cache will warm faster, and if the entire cube doesn't fit in RAM, you will have a greator proportion of your cube in RAM with a smaller cube. Ie. your queries will run much faster.
Richard
for real-time online OLAP demonstrations go to http://RichardLees.com.au/Sites/Demonstrations