Tuesday, September 30, 2008

Data Mining Demonstration using real-time data

I have created many business intelligence demonstrations and will be documenting them on this blog.

There are two data mining demonstrations on http://RichardLees.com.au/Sites/Demonstrations

Library Book Suggestion Tool
http://RichardLees.com.au/sites/Demonstrations/Pages/LibrariesSuggestions.aspx
This demonstration uses real borrower and book information from a municipal library. I have created a data mining model that uses each borrower's age and sex, combined with the books they have already borrowed, to predict (suggest) other books that they might borrow. The tool is fronted by a SQL Server Reporting Services report. Behind the report there is a SQL Server relational database and a SQL Server data mining model. You can select any borrower by name (the names have been fictionalised). When you click "Run Report", a query runs to get the demographics and loan history of the borrower, which become the input to the outer data mining query. The output of the data mining query is a list of books, each with the associated probability that the suggestion will be successful.
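To make the idea of "books plus a probability" concrete, here is a minimal Python sketch. It is not the SQL Server data mining model behind the demonstration. It is a simple co-occurrence calculation that I am assuming for illustration, it ignores the age and sex inputs, and the function name `suggest_books` and the loan histories are hypothetical.

```python
from collections import Counter

def suggest_books(history, all_loans, top_n=3):
    """Rank books the borrower has not read by the share of peers who borrowed them.

    A peer is any borrower who shares at least one book with `history`.
    This is an illustrative stand-in, not the algorithm used by the demonstration.
    """
    history = set(history)
    # Keep only borrowers who overlap with the target borrower's history
    peers = [set(loans) for loans in all_loans if history & set(loans)]
    if not peers:
        return []
    # Count how many peers borrowed each book the target has not borrowed yet
    counts = Counter(book for loans in peers for book in loans - history)
    # Highest count first; ties broken alphabetically for a stable order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(book, n / len(peers)) for book, n in ranked[:top_n]]

# Hypothetical loan histories, invented for this example
loans = [
    ["Dune", "Emma", "Ulysses"],
    ["Dune", "Emma"],
    ["Dune", "Hamlet"],
    ["Persuasion"],
]

for book, probability in suggest_books(["Dune"], loans):
    print(f"{book}: {probability:.2f}")
```

With this toy data, "Emma" ranks first because two of the three borrowers who also borrowed "Dune" went on to borrow it. A real model would also weigh the demographic attributes.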

The book suggestions are real books based on the borrowing history of real borrowers. This tool could be useful for any library or bookshop. Indeed, online stores like Amazon do something very similar. Hopefully, this tool also shows that data mining queries can be run by someone who does not fully understand the technology, and that the queries are very fast. All of the demonstrations on http://richardlees.com.au/ are running on a 32-bit desktop PC (at least as of October 2008).


Web Request Latency Prediction Tool
http://RichardLees.com.au/sites/Demonstrations/Pages/PredictingLatenciesusingDataMining.aspx
This demonstration gets real data from the IIS logs on the web server. I have created a data mining model using the last 12 months of web logs. The model is trained to predict HTTP response latency from request attributes such as country, city, resource, bytes, client agent, operation and status. The model is scheduled for retraining every Sunday morning. When you request the data mining report, it uses the last 50 requests to the web server, so every run of this tool works on fresh data.

There is no obvious application for this particular data mining model, but I hope it does demonstrate that data mining can be real-time, that it can be used by non-technical people, and that queries are very fast. This particular data mining model uses Decision Trees with regression.
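For readers unfamiliar with regression trees, the following Python sketch shows the core step: choosing a split that minimises the squared error of a numeric target. It is a single-split "stump" on one attribute, written only to illustrate the idea. It is not the SQL Server implementation, and the field names `bytes` and `latency_ms` and the sample rows are hypothetical.

```python
def fit_stump(rows, feature, target):
    """Fit a one-split regression tree: pick the threshold on `feature`
    that minimises the summed squared error of `target` on both sides."""
    def sse(values):
        # Sum of squared deviations from the mean
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    targets = [r[target] for r in rows]
    overall_mean = sum(targets) / len(targets)
    best = None
    thresholds = sorted({r[feature] for r in rows})
    # Skip the largest value so the right-hand side is never empty
    for t in thresholds[:-1]:
        left = [r[target] for r in rows if r[feature] <= t]
        right = [r[target] for r in rows if r[feature] > t]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        # Only one distinct feature value: no split possible, predict the mean
        return None, overall_mean, overall_mean
    return best[1], best[2], best[3]

def predict(stump, value):
    """Return the mean latency of the side of the split that `value` falls on."""
    threshold, left_mean, right_mean = stump
    if threshold is None or value <= threshold:
        return left_mean
    return right_mean

# Hypothetical log rows, invented for this example
requests = [
    {"bytes": 500, "latency_ms": 20},
    {"bytes": 800, "latency_ms": 30},
    {"bytes": 50000, "latency_ms": 200},
    {"bytes": 60000, "latency_ms": 220},
]

stump = fit_stump(requests, "bytes", "latency_ms")
print(stump)
print(predict(stump, 700), predict(stump, 55000))
```

A full decision tree repeats this split search recursively on each side and across all attributes, including categorical ones such as country or resource.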

Richard
http://RichardLees.com.au/Sites/Demonstrations
