- What is the total value of sales for transactions that include products x (and y, and z...)?
- What is the average transaction value for transactions that include products x (and y, and z...)?
- What proportion of transactions include products x (or y or z)? And what proportion of transactions that include products a (and b, and c...) also include products x (and y, and z...)?
- How did the attach rate of product x change during the sales promotion series? And was there a lasting effect.
- Any analysis that requires mix of products within transactions.

- Rather than creating a dimension on transaction id, create a dimension on basket id. A basket is defined by the unique combination of items. So all the transactions that include items m, n, o and p, would be considered the same basket. Essentially, the basket id can be used as a dimension like the transaction id, but the basket it dimension is a fraction of the size. This makes the solution much more scalable and faster. A basket id dimension table can be created quite simply by creating a bit string with one bit representing each item. For example an 8000 byte string will support 64000 items. There is a little bit of math to create the bit string. but essentially each ItemID can create an exponential of 2 to get its position in the string. ie

POWER(convert(bigint,2),ItemID). Of course, a bigint can only support 2 to the power of 64, so you will need to do this by a series of 8 byte strings. This would be very very tedious SQL to write, but I found that it is quite easy to write a little SQL to generate the full SQL. Also note, there is no need to keep the full 8000 bytes, it can be a varbinary to the right most byte with an item. See below. - The second major suggestion I have, is to create a dimension using an identity column from the basket id table (there is no need to have the varbinary(8000) column in the cube, and write calculated measures for the basket analysis. ie. I would resist using the distinct count physical measures in the cube. You really need a Basket dimension, so that you can identify specific combinations of items.

The result is a relatively small cube, with a relatively large basket id dimension. The cube will have two measure groups, one for transactions with transaction count and sale value, while the item measure group will have item sales value and item count measures. Here is the short SQL query that will generate the full query for the basket ids. It is only a few lines of SQL, but it will generate over 1100 lines of SQL that will form the query for calculating the unique basket id. You will want to run this query incrementally, so just put a where clause at the bottom to ensure it is over the high water mark. I would also encourage you to put the query in an SSIS package with an OLEDB call to a procedure to look up the identity column on the basket table, so that can be stored with the transaction. The rest, including building the cube, should be easy.

A complementary application to this basket analysis cube is a data mining model targeted at the relationships between sales items and useful for predicting sales items.

If you are having performance issues with your basket analysis solutions or want help with your application, don't hesitate to ask me to help. This is the type of BI activity I really enjoy.

For real-time demonstrations of OLAP, data mining and related BI technologies see http://EasternMining.com.au/Sites/Demonstrations

## 4 comments:

Here is a soft copy of the SQL. Note, you will need to change the name of your TransactionItem table and columns.

select 'Select TransactionID,sum(ItemID) ItemIDSum,count(*) ItemIDCount,convert(varbinary(max),left('

union all

select ' +CONVERT(BINARY(8),sum(s'+CONVERT(VARCHAR(5),dWord)+'))'

from (select dWord,dWord*64 Topn

from (select top 999 row_number()

over(order by TransactionID) dWord

from TransactionItem) as x) as y

union all

select ',max(ItemID)/8+8)) BasketBitMap From ('

union all

select 'Select TransactionID,ItemID'

union all

select ',max(CASE WHEN ItemID='+convert(varchar(50),TopN-1)

+ ' THEN convert(bigint,-9223372036854775808) WHEN ItemID<'

+convert(varchar(50),TopN-64)

+ ' THEN convert(bigint,0) WHEN ItemID <'

+convert(varchar(50),TopN)

+' THEN POWER(convert(bigint,2),ItemID-'

+convert(varchar(50),TopN-64)

+') ELSE 0 End) S'+convert(varchar(5),dWord)

from (select dWord,dWord*64 Topn from

(select top 999 row_number()

over(order by TransactionID) dWord

from TransactionItem with (nolock)) as x) as y

union all

select 'from TransactionItem group by TransactionID,ItemID'

union all

select ') as q group by TransactionID'

Hi Richard,

Nice entry. I wonder what you think of my implementation of Market Basket Analysis in SSAS:

http://www.codeproject.com/KB/database/Market-Basket-Analysis.aspx

Here are this and some other articles on SSAS Market Basket Analysis:

http://ssas-wiki.com/w/Articles#Market_Basket_Analysis

I was wondering if you can expand on this a little. Can you show the resulting cube structure and calculations? You mentioned

What proportion of transactions include products x (or y or z)? And what proportion of transactions that include products a (and b, and c...) also include products x (and y, and z...) and I'm not sure how you would figure this.

Thanks

Post a Comment