Analysing Voting In the Power BI Competition – With Power BI!
As I mentioned the other day, the public voting round of the Power BI competition has now started and the entrants are now competing for a place in the top ten and a whole bunch of cool prizes including XBoxes and Surface Pros. Having entered a demo myself I’m naturally very keen to see how I’m doing, but unfortunately doing this is a bit of a pain – you have to go to www.powerbicontest.com, click on the Entries tab, sort them, find your entry, and so on… So I thought to myself, isn’t there a better way? And of course there is… using Power BI!
Power Query is of course the tool to use to scrape the voting data from the Power BI site. It wasn’t straightforward to do but on the other hand it wasn’t impossible: there’s no single table of results, and indeed the results are spread over four different pages. To load the data into the Excel Data Model I:
- Created a Power Query function to scrape the data from the first page
- First I created a query to scrape the data from the first page using the “From Web” button
- I then did a *lot* of searching around in the HTML to get the data (for this reason I’m not going to paste the code of the function here because it’s ugly)
- I then deleted a lot of columns, created some new custom columns, and pivoted the data until in was in a nice tabular form
- Finally, once I was sure the query was working properly I turned it into a function that could get data from any page
- Next I created a table with five rows in it, called the function five times, once for each row, to get the data from each page
- Then I merged all the data into a single table of results
- I created another function to calculate the rank of an entry (which itself was an interesting challenge) and added a rank column to the merged table
- Last of all, I created some Power View sheets to analyse the data
You can download the entire workbook here. I’ll warn you, the code isn’t pretty but it does the job and it’s got a few interesting features in for all you Power Query fans.
Now let’s have a look at what the data shows.
At the time of writing this I’m currently in 16th place – and not in line for an XBox alas. You can of course change this by
and that’s even if you’ve voted for me already – you can vote once every 24 hours! But I digress…
The obvious first thing to do was to create a leader board sorted by rank:
As you can see, at the moment Carlos Costa is way out in front with a massive 180 votes; after that there’s a group of people fighting for a top five finish; and after that there’s a large group of people who might just squeeze into the lower reaches of the top ten.
Here’s a column chart showing this more clearly:
What are the top-ranked guys doing to get so many votes? Is it something to do with the number of people seeing their entry? If you put votes and views next to each other, it’s not easy to say what’s going on:
I’ve got the second largest number of views, but I’m in the middle of the pack. The problem for me is that I was one of the first people to enter and most of my views were well before voting opened; unfortunately there’s no way of knowing how many views each entry had after voting opened so I can’t say for sure what’s going on here. Certainly Alexander Pinkus submitted his demo “Few Facts About Dinosaurs” around the same time as me, has around the same number of views, and he’s currently ranked number 5. From that I can only deduce that his demo is better than mine (and having watched it I can say it is very good); he certainly has more exciting subject matter. Who doesn’t love dinosaurs? There are several other demos that have got particularly eye-catching subject matter and/or titles and they’re also doing well.
Here are views and votes plotted on a scatter chart, with the top 10, 20, 30 and so on shown in different colours:
I think this makes it a little easier to disregard outliers like me, and it’s probably fair to say that there is some kind of link between views and votes, at least for those in the top 20.
Now here’s one last column graph, showing the average number of views per vote for the top 20 ranked entrants:
You can see how badly I’m doing in this respect, despite my begging and pleading both here and on Twitter; conversely you can see that Mike Tetreault at the other end is obviously very good at getting his vote out.
I would like to say at this point that it’s tempting to moan and complain that people have ‘cheated’ and got their friends, family and colleagues to vote for them. To be honest, I think that everyone who’s got any significant number of votes will have ‘cheated’ like this to some extent. I certainly have – I’ve used my blog and other social media to try to get as many votes as possible. Indeed this post itself is a ruse to try to get more votes! At the end of the day the whole point of this competition is for Microsoft to get as many people as possible to see Power BI, so more people want to buy it. Therefore this mad scramble by entrants to get as many votes as possible will benefit all of us in the end.
Anyway, by the time you read this post there’s a strong chance that the patterns here will have changed completely. It’s a shame there isn’t more data available to play with – it would be great to have the time and date of each vote cast, and even the location of the person casting the vote. Given that you have to have a Facebook account to vote I suspect that someone somewhere does have all of this data, and more… so I wonder if they’re using Power BI too?