Saturday, June 13, 2009

Leaving Footprints All Over the Internet

Someone recently asked me whether my web analytics dashboard (http://RichardLees.com.au/Sites/Demonstrations) records where they had been on my site. The answer is an emphatic yes. Not that my site does anything special in the logging department: I am just using Windows IIS, which by default logs every HTTP request. The only extra thing I do is put this data in a cube so that it is easy to query and visualise. Remember that every time you browse to a page on the internet, there is probably a log on that server recording your IP address, what you requested, the date, the time and so on. There will often be additional information too, such as your browser configuration, and even your operating system and its version number. If you access the internet via a proxy server, your IP address is shared with the other users of that proxy, so you are somewhat camouflaged. But then you should consider that the proxy server itself may well be logging every request you make over the internet, and possibly all your intranet browsing too.
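To make "a log on that server" concrete: IIS can write its logs in the W3C extended log format, where a `#Fields:` directive names the columns and each following line is one request, separated by spaces (IIS replaces spaces inside the User-Agent with `+`). The sketch below parses such a log into one dictionary per request. The sample lines, paths and IP addresses (taken from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are invented for illustration, and the field list is only one possible configuration; a real server's `#Fields:` line may differ.

```python
# Illustrative only: a made-up W3C extended log in the style IIS writes.
SAMPLE = """#Software: Microsoft Internet Information Services 7.0
#Version: 1.0
#Date: 2009-06-13 00:00:00
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)
2009-06-13 08:15:02 192.0.2.10 GET /Sites/Demonstrations 200 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1)
2009-06-13 08:15:04 192.0.2.10 GET /images/chart.jpg 200 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1)
"""


def parse_w3c_log(lines):
    """Parse W3C extended log lines into dicts keyed by the #Fields names."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives (#Software, #Version, #Date, #Fields) are not requests;
            # only #Fields matters, because it defines the column order.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip lines that do not match the declared columns
        records.append(dict(zip(fields, values)))
    return records


if __name__ == "__main__":
    for r in parse_w3c_log(SAMPLE.splitlines()):
        # Every request carries the client IP, what was asked for, and when.
        print(r["date"], r["time"], r["c-ip"], r["cs-uri-stem"], r["cs(User-Agent)"])
```

Keying each record by the declared field names, rather than by position, means the parser keeps working if the server is configured to log a different set of columns.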

So, as I was saying, my site puts all the web logs into a cube so that you can visualise all the activity on my site, including my own. I built it, quite simply, as a demonstration of OLAP technology.
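The idea of "putting the logs in a cube" can be illustrated with a toy in-memory version. This is not how Analysis Services stores data; it is a minimal sketch, assuming requests have already been parsed into dictionaries with W3C field names such as `c-ip` and `cs-uri-stem`. Requests are counted for each combination of two hypothetical dimensions (client IP and file extension), and a roll-up sums away one dimension. The records below are invented.

```python
from collections import Counter
import posixpath

# Invented parsed requests, using W3C extended log field names as keys.
RECORDS = [
    {"c-ip": "192.0.2.10", "cs-uri-stem": "/Sites/Demonstrations"},
    {"c-ip": "192.0.2.10", "cs-uri-stem": "/images/chart.jpg"},
    {"c-ip": "192.0.2.10", "cs-uri-stem": "/images/map.JPG"},
    {"c-ip": "198.51.100.7", "cs-uri-stem": "/images/chart.jpg"},
]


def extension(uri_stem):
    """Return the lower-cased file extension, or "(none)" for pages without one."""
    ext = posixpath.splitext(uri_stem)[1].lower()
    return ext or "(none)"


def build_cube(records):
    """Count requests for every (client IP, file extension) combination."""
    cube = Counter()
    for r in records:
        cube[(r["c-ip"], extension(r["cs-uri-stem"]))] += 1
    return cube


def roll_up(cube, axis):
    """Sum the counts over the other dimension, keeping only `axis` (0 = IP, 1 = extension)."""
    totals = Counter()
    for key, count in cube.items():
        totals[key[axis]] += count
    return totals


if __name__ == "__main__":
    cube = build_cube(RECORDS)
    print(cube)
    print(roll_up(cube, 0))  # requests per IP address
    print(roll_up(cube, 1))  # requests per file type
```

A real OLAP cube pre-aggregates along many more dimensions (date, time, geography, status code and so on), which is what makes ad hoc slicing fast; the toy version only shows the shape of the idea.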

You can query the activity on my site ad hoc using ThinSlicer or PerformancePoint. Both are thin-client tools: they require only IE6 (or above) and no client-side controls. So if you have been to my site (anything on http://richardlees.com.au/), you will be able to browse the cube and see where you have been and what resources you used on my server at the time. Simply go to the Interactive Chart Grid on http://RichardLees.com.au/Sites/Demonstrations, navigate to your IP address and display the requests you have made. This uses Microsoft's PerformancePoint dashboard technology, which takes a little getting used to, but it is a fully functional cube browser. For example, the picture above is a query I made, drilling into IP address 84.151.222.71, which happens to be in Munich, Germany. I then drilled into .jpg files to see which pictures had been downloaded. If you don't know your IP address, you can drill down to your geographic location and select the IP address that was active at the time you were on the site. You could even browse to my site requesting a resource that does not exist, such as HelenClark.jpg. Since you will be the only person requesting this resource, you will be able to drill into it and list your own IP address. You could then pivot on that IP address and see what else you requested. The cube remains available for querying at all times, even though it is updated with new logs every 15 minutes or so.
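The two queries described above, drilling into one IP address and then into .jpg files, and using a unique "marker" request to find your own IP and pivot on it, can be sketched over parsed log records. This is a minimal illustration, not how ThinSlicer or PerformancePoint work internally; the records, the `/HelenClark.jpg` path and the helper names are hypothetical. It relies on the fact that IIS also logs requests for missing files, with status 404.

```python
# Invented parsed requests, using W3C extended log field names as keys.
RECORDS = [
    {"c-ip": "192.0.2.10", "cs-uri-stem": "/Sites/Demonstrations", "sc-status": "200"},
    {"c-ip": "192.0.2.10", "cs-uri-stem": "/images/chart.jpg", "sc-status": "200"},
    {"c-ip": "192.0.2.10", "cs-uri-stem": "/HelenClark.jpg", "sc-status": "404"},
    {"c-ip": "198.51.100.7", "cs-uri-stem": "/images/chart.jpg", "sc-status": "200"},
]


def drill(records, ip=None, ext=None):
    """Filter requests by client IP and/or file extension (a simple drill-down)."""
    out = []
    for r in records:
        if ip is not None and r["c-ip"] != ip:
            continue
        if ext is not None and not r["cs-uri-stem"].lower().endswith(ext.lower()):
            continue
        out.append(r)
    return out


def pivot_on_marker(records, marker_uri):
    """Find the IPs that requested `marker_uri`, then list everything each of them requested."""
    # IIS URLs are case-insensitive, so compare in lower case.
    ips = {r["c-ip"] for r in records if r["cs-uri-stem"].lower() == marker_uri.lower()}
    return {ip: [r["cs-uri-stem"] for r in records if r["c-ip"] == ip] for ip in sorted(ips)}


if __name__ == "__main__":
    # Drill into one IP, then into .jpg files, as in the PerformancePoint example.
    for r in drill(RECORDS, ip="192.0.2.10", ext=".jpg"):
        print(r["cs-uri-stem"], r["sc-status"])
    # Use the non-existent marker file to find "your" IP and see everything else it requested.
    print(pivot_on_marker(RECORDS, "/HelenClark.jpg"))
```

The marker trick works because a request for a file nobody else asks for is effectively a unique key; once it identifies the IP address, every other request from that address falls out of the same log.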

Please note, I do not use these web logs for anything other than providing a demonstration of OLAP and data mining technologies. There is no reason to feel paranoid about what I store on my server, but it might make you think about what information other web sites have about your activity.
