Thursday, December 11, 2008

Book Review: Data Mining with SQL Server 2008

I remember when data mining algorithms were first included in SQL Server 2000. It was very exciting, and I immediately went to the municipal library and talked them into giving me an extract of their database for data mining demonstrations. That version of SQL Server had no data mining documentation, but that didn't worry me, and I helped a few organisations exploit data mining. However, not many people went to the trouble of learning SQL Server's data mining technology. SQL Server 2005 had some documentation, but data mining still wasn't used to its potential. SQL Server 2008 has been greatly beefed up in its data mining capability, primarily in the ancillary tools that professional data miners demand, such as lift charts and validity testing. Now SQL Server 2008 has, arguably, the best set of data mining tools on the market. So if you have been putting off data mining, don't delay any longer.

This book by Jamie MacLennan, ZhaoHui Tang and Bogdan Crivat (all developers of the product in Redmond) is a very practical guide and quite readable by someone new to data mining. It starts with the data mining tools included in Excel 2007 and goes on to detail all the algorithms, the syntax of the DMX language and embedding data mining in your applications. Experienced data miners will also find the book useful; I found many useful tips. For example, I only just learnt that you can nest MDX in your DMX queries, although it is more common to embed SQL in data mining queries.

To learn about data mining, I really believe that you need some real data to explore. The book has a download site where readers can download databases and demonstrations to experiment with.

If you are using Analysis Services and haven't yet started data mining, I suggest that you get a copy of this book and teach yourself data mining. Data mining is going to be really big.

I did have trouble with the Wiley download URL, but here is a direct link for the exercise data.,descCd-DOWNLOAD.html

By the way, for anyone who is interested, there are a couple of live data mining demonstrations on my site, where data mining is embedded in Reporting Services (also covered in the book). There is a book suggestion tool using the library data I mentioned above, and another that predicts response times for the last 50 HTTP requests on my web server. The second demonstration has no practical value, but hopefully you can draw the analogy to a similar model that predicts customer profitability etc.

