Wednesday, February 29, 2012

Getting Previous Day's Value Efficiently

I have this saying; It is always possible to make something go faster.  Whether it is a database load, cube process, SQL query, MDX query or running a marathon.  If you put enough effort into it, you can almost always find a way of going faster (a corollary of the rule is a diminishing return effect).  I came across an interesting example of this today when a cube process was scanning a daily stocktake table, and it needed to also return the previous day's value.  That is the same product, but for the previous day.  The natural way to do this would be to join on the same table using the DATEADD() function with a DD of -1.  However, since this is a function, it obscures from the optimiser the sequence of the values returned and the optimiser thinks it needs to use a sort or hash table to cater for out of sequence.  This makes the query much slower.
Here is the query and its query plan
 select
  st.StockTakeDate
 ,st.StockOnHand
 ,st.StockOnHand-stPrev.StockOnHand StockChange
from tbStockTake st
inner join tbStockTake stPrev
on stPrev.ProductId=st.ProductId
and stPrev.StockTakeDate=DATEADD(dd,-1,st.StockTakeDate)








Notice how the join on the second tbStockTake table is sorted so it can be merged with the first tbStockTake table.  I am scanning 750 Million rows, so this sort is not trivial.  We know there is no need to sort the data, since the join will be in the same sequence as the first table.

Here is a very simple enhancement to the query
select
  st.StockTakeDate
 ,st.StockOnHand
 ,st.StockOnHand-stPrev.StockOnHand StockChange
from tbStockTake st
inner join tbStockTake stPrev
on stPrev.ProductId=st.ProductId
and stPrev.StockTakeDate=st.StockTakeDate-1


Notice how this new query plan is able to Merge  Join the two tables without a sort.  This dramatically improves the query performance and tempdb resources.  The table is simply scanned with two cursors and merged.

The SQL Server optimiser is very clever in understanding that a clustered scan by ProductId, StocktakeDate is in the same sequence (so can be merged) with st.ProductId=stPrev.ProductId and st.StocktakeDate=stPrev.StockTakeDate-1.  It is not quite clever enough to determine this for st.StocktakeDate=DATEADD(dd,-1,stPrev.StocktakeDate), somewhat understandably since a function is involved.

Datetime in SQL Server (at least up until the current version 2008 R2) can have arithmetic applied to it.  1 is equivalent to 1 day, and fractions of 1 are equivalent to portions of the day.  For example, 0.25 is equivalent to 6 hours. 

Interestingly (or should I say unfortunately) we cannot use the same trick where a Date datatype is used instead of the Datetime datatype.

So, whenever you have something that you would like to go faster, simply apply some focus on it, since the chances are, there is a faster option.

No comments: