NodeXL, Network Graphs and the Eurovision Song Contest
Via the Perceptual Edge blog, earlier this week I came across a cool new open-source Excel addin, developed by Microsoft Research and various people at universities around the world, called NodeXL. You can download it from Codeplex here:
Marc Smith’s blog also has a lot of information on it here:
Basically it’s a tool for displaying and analysing network graphs. That sounds a lot more complicated than it actually is – really all it means is that you can use it for analysing the relationships between things. For example, if you had a list of people who were registered on a social networking site like Facebook, you could use NodeXL to display which people were friends with with other people; in a business setting you could use it to display which products were bought together in the same ‘basket’. Although it’s somewhat buggy and crashed on me a few times, it’s a lot of fun to use and I can see that there are a lot of potential uses in the BI space, especially now that Excel is being pushed as Microsoft’s BI client tool of choice.
So anyway, after I downloaded it I had a think about what data I could with it. There are built-in options to import data from Outlook (to analyse which email contacts appear together on the To and CC lines of emails) and social networking sites like Twitter, Flickr and YouTube. But I wanted something a bit more fun – and then it suddenly occurred to me, why not use it to analyse voting patterns in the Eurovision Song Contest? Apologies to readers outside Europe who don’t know what this is: basically it’s an annual competition where each country in Europe enters the worst pop song they can possibly come up with, and the winner is chosen by a vote; what everyone knows, of course, is that countries vote for the other countries they are most friendly to rather than on the basis of the songs themselves, for example with Greece and Cyprus always voting for each other (see here and here for some examples of detailed analyses of voting patterns).
I downloaded the raw data for voting in the 2009 competition from here, and with a bit of hacking got it into the template that’s bundled with NodeXL. Here’s an example of the output, using the impressively-named Fruchterman-Reingold layout, with the UK highlighted to show who voted for the UK and who the UK voted for:
I’ll admit this particular graph is a bit busy, but in NodeXL itself you can zoom in and out, filter and analyse the relationships very easily; even here, though, we can see here things like the fact that the UK voted for Switzerland but Switzerland didn’t vote for the UK. Here’s a circular layout with the winners, Norway, highlighted:
Last of all, here’s the data filtered to show only Greece, with the points that Greece awarded to each other country shown on the vertices:
There are a lot of different options and this is a very complex product, so the fact it’s free is particularly amazing. It’s this kind of thing that makes Excel such a rich platform for data analysis – imagine using this with PowerPivot or the data mining addin, or other open-source tools like Sparklines for Excel.