Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Analysing SQLBits 7 Data, Part 1: Session Selections


As I’ve said before, I’m involved with the organisation of the SQLBits conferences here in the UK and at the moment the SQLBits committee is busy preparing for SQLBits 8 in April (make sure you come – it’s going to be great!). This eats up a lot of my spare time – spare time that I usually spend blogging – so I thought I’d kill two birds with one stone and blog about some of the BI-related stuff I’m doing for SQLBits (I’ve done this before but there’s plenty more mileage in this subject). It turns out that a lot of the things SQLBits needs to do require classic ‘self-service BI’: solving a business problem as best you can with whatever data and tools are to hand. It’s good to see things from the end user’s point of view for a change!

First of all, let’s take a look at scheduling: how can we make sure that we don’t run two sessions in the same time slot that are interesting to the same type of attendee? If attendees are forced to choose between two sessions they want to see, they won’t be happy, so we want to create a schedule with as few difficult choices as possible. Unfortunately we don’t collect data about which sessions attendees actually go to, and even if we did it would be no use because, of course, by the time a session runs it’s too late to fix the agenda. However, well before the conference we allow people to vote for the ten sessions they’d most like to see out of all those that have been submitted (voting has just opened for SQLBits 8, incidentally), and we use these votes to help us decide which sessions make it onto the agenda; we can therefore use the same data to help avoid overlaps.

This data can be visualised very effectively using NodeXL. To do this, I ran a SQL query on the SQLBits database that gave me every combination of two sessions that had been picked by the same user; for example, if a user had selected sessions A, B and C, my query returned the pairs A-B, A-C and B-C. This gave me my list of edges for the graph, and for the width of each edge I used the number of times that combination of sessions occurred, so I could see the most popular combinations. Unfortunately, with 107 sessions on the list and thousands of edges, I got something that looked like one of my four-year-old daughter’s scribbles rather than a useful visualisation, so I decided to filter the data and look at one session at a time. Here’s what I got for my session ‘Implementing Common Business Calculations in DAX’:

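[Image: NodeXL graph of the sessions picked together with ‘Implementing Common Business Calculations in DAX’]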

Still not great, but at least with the thicker lines you can see where the strongest relationships are and when you select these relationships it highlights them and the nodes on either end, so you can read the names of the sessions. I then realised you could use the ‘dynamic filters’ functionality to filter out the weaker relationships, making it even easier to pick out the strongest ones:

[Image: the same NodeXL graph with dynamic filters applied to hide the weaker relationships]

So we can now see that the strongest relationships were with the sessions “You can create UK maps with SSRS 2008 R2” and “Data Mining with SQL Server 2008”. I’m still getting to grips with NodeXL which, I have to say, I like more and more and which deserves more visibility in the MS BI world.
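
As a rough sketch of the pair-counting step described above, here is one way to do it in Python. This is not the SQL query that was actually run against the SQLBits database, which isn’t shown in this post; the `votes` data and both function names are made up for illustration, and the `min_weight` cut-off only loosely imitates what NodeXL’s dynamic filters do.

```python
from collections import Counter
from itertools import combinations

# Hypothetical votes (not real SQLBits data): user -> set of sessions picked.
votes = {
    "user1": {"A", "B", "C"},
    "user2": {"A", "B"},
    "user3": {"B", "C"},
}


def session_pair_counts(votes):
    """Count how many users picked each unordered pair of sessions."""
    counts = Counter()
    for sessions in votes.values():
        # Sorting gives each pair a canonical order, so A-B and B-A
        # are counted as the same edge.
        for pair in combinations(sorted(sessions), 2):
            counts[pair] += 1
    return counts


def edges_for_session(counts, session, min_weight=1):
    """Edges touching one session, strongest first, weak ones dropped."""
    edges = [
        (pair, weight)
        for pair, weight in counts.items()
        if session in pair and weight >= min_weight
    ]
    # Sort by descending weight, then by pair name for a stable order.
    return sorted(edges, key=lambda edge: (-edge[1], edge[0]))


pair_counts = session_pair_counts(votes)
for pair, weight in edges_for_session(pair_counts, "A"):
    print(pair, weight)
```

Each (pair, weight) row corresponds to one edge in the graph, with the weight playing the role of the line thickness.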

Anyway, since this is a basket analysis problem I also thought of using the Data Mining Addin for Excel, but I have Office 2010 64-bit so I couldn’t. Luckily though the nice people at Predixion do have a version of their addin that works on 64-bit, and they gave me another eval license to use on my data. Getting useful results out of Predixion turned out to be ridiculously easy: I just copied the raw data into Excel, clicked the ‘Shopping Basket Analysis’ button on the ribbon and it spat out a pair of nicely-formatted reports. The first shows ‘Shopping Basket Recommendations’, i.e. if you select one session it recommends another one you might like:

[Image: Predixion ‘Shopping Basket Recommendations’ report]

And the second shows the most commonly-occurring ‘bundles’ of sessions that were picked together:

[Image: Predixion report of the most commonly-occurring session bundles]

It almost feels too easy… but I think you can see that the results look correct and to be honest it’s much easier to do something useful with this than the NodeXL graph. When we close the voting for SQLBits 8 I’ll repeat the exercise and hand the results over to Allan, who’s in charge of speakers, and he’ll be able to use them to put together our agenda for Saturday April 9th.
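
For anyone curious about what a shopping basket analysis computes, here is a minimal sketch in Python. It is an assumption-laden illustration, not Predixion’s algorithm: the `baskets` data and the function name are invented, and lift is used as a generic stand-in for a measure of how surprising a rule is (Predixion’s own scoring isn’t described in this post). For each rule ‘picked X, so recommend Y’ it calculates the probability of Y given X, and the lift, which compares that probability with how popular Y is overall.

```python
from collections import Counter
from itertools import permutations

# Hypothetical baskets (not real SQLBits data): one set of sessions per voter.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A", "D"},
]


def recommendation_rules(baskets):
    """Return (lhs, rhs, probability, lift) for every rule lhs -> rhs."""
    total = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # Ordered pairs, because X -> Y and Y -> X are different rules.
        pair_counts.update(permutations(sorted(basket), 2))

    rules = []
    for (lhs, rhs), both in pair_counts.items():
        # P(rhs | lhs): share of voters who picked lhs and also picked rhs.
        probability = both / item_counts[lhs]
        # Lift > 1 means rhs is picked more often alongside lhs than overall.
        lift = probability / (item_counts[rhs] / total)
        rules.append((lhs, rhs, probability, lift))
    # Strongest lift first; ties broken by name for a stable order.
    return sorted(rules, key=lambda rule: (-rule[3], rule[0], rule[1]))


rules = recommendation_rules(baskets)
for lhs, rhs, probability, lift in rules:
    print(f"{lhs} -> {rhs}: probability={probability:.2f}, lift={lift:.2f}")
```

In this toy data the rule A -> B has a fairly high probability but a lift below 1, because B is popular with everyone; a high probability on its own doesn’t make a rule interesting.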

Written by Chris Webb

February 2, 2011 at 10:14 am

Posted in Data Mining, Visualisation


6 Responses


  1. Hi Chris,
    Are there any BI-specific events in North America besides SQL PASS?

    Farhan

    February 2, 2011 at 1:27 pm

    • Have you checked out SQL Saturday? They have lots of BI content.

      Chris Webb

      February 2, 2011 at 1:29 pm

      • Thanks Chris. I am also looking for training with a concentration on the SSAS administration side. I am in the Connecticut area but can go to NYC. Any recommendations?

        Farhan

        February 2, 2011 at 2:29 pm

      • No, sorry – I don’t know who does good SSAS training in the US.

        Chris Webb

        February 3, 2011 at 9:11 pm

  2. Really interesting to see how it’s done. As a SQLBits attendee I know my session voting is quite diverse (sorry, there’s a lot of interesting sessions).

    Am very glad the sessions are recorded, as choosing between sessions on the day becomes ‘what do I want to learn first’ rather than missing the content totally.

    rich

    February 2, 2011 at 9:31 pm

  3. In the shopping basket example (the Shopping Basket Recommendations report) I would also like to point out that probability is not the main measure; the importance score is. It looks as though the most important rule is last in your list.

    thomas ivarsson

    February 7, 2011 at 9:08 am

