Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for February 2010

NodeXL, Network Graphs and the Eurovision Song Contest

with 3 comments

Via the Perceptual Edge blog, earlier this week I came across a cool new open-source Excel addin, developed by Microsoft Research and various people at universities around the world, called NodeXL. You can download it from Codeplex here:
http://nodexl.codeplex.com/
Marc Smith’s blog also has a lot of information on it here:
http://www.connectedaction.net/

Basically it’s a tool for displaying and analysing network graphs. That sounds a lot more complicated than it actually is – really all it means is that you can use it for analysing the relationships between things. For example, if you had a list of people who were registered on a social networking site like Facebook, you could use NodeXL to display which people were friends with with other people; in a business setting you could use it to display which products were bought together in the same ‘basket’. Although it’s somewhat buggy and crashed on me a few times, it’s a lot of fun to use and I can see that there are a lot of potential uses in the BI space, especially now that Excel is being pushed as Microsoft’s BI client tool of choice.

So anyway, after I downloaded it I had a think about what data I could with it. There are built-in options to import data from Outlook (to analyse which email contacts appear together on the To and CC lines of emails) and social networking sites like Twitter, Flickr and YouTube. But I wanted something a bit more fun – and then it suddenly occurred to me, why not use it to analyse voting patterns in the Eurovision Song Contest? Apologies to readers outside Europe who don’t know what this is: basically it’s an annual competition where each country in Europe enters the worst pop song they can possibly come up with, and the winner is chosen by a vote; what everyone knows, of course, is that countries vote for the other countries they are most friendly to rather than on the basis of the songs themselves, for example with Greece and Cyprus always voting for each other (see here and here for some examples of detailed analyses of voting patterns).

I downloaded the raw data for voting in the 2009 competition from here, and with a bit of hacking got it into the template that’s bundled with NodeXL. Here’s an example of the output, using the impressively-named Fruchterman-Reingold layout, with the UK highlighted to show who voted for the UK and who the UK voted for:

image 

I’ll admit this particular graph is a bit busy, but in NodeXL itself you can zoom in and out, filter and analyse the relationships very easily; even here, though, we can see here things like the fact that the UK voted for Switzerland but Switzerland didn’t vote for the UK. Here’s a circular layout with the winners, Norway, highlighted:

image

Last of all, here’s the data filtered to show only Greece, with the points that Greece awarded to each other country shown on the vertices:

image

There are a lot of different options and this is a very complex product, so the fact it’s free is particularly amazing. It’s this kind of thing that makes Excel such a rich platform for data analysis – imagine using this with PowerPivot or the data mining addin, or other open-source tools like Sparklines for Excel.

Written by Chris Webb

February 22, 2010 at 2:23 pm

Posted in Excel

Subselects and Calculated Members in R2

with 48 comments

As Darren noted the other week, there was a recent thread on the MSDN Forum that detailed the few enhancements for traditional SSAS users in SQL 2008 R2. I thought I’d pick up on the one MDX-related change, which is to do with calculated members and subselects. From a pure language point of view you’d be right in thinking that this doesn’t sound all that exciting, but it does fix one long-running issue that has caused a lot of people a lot of pain over the last couple of years – namely the way that Excel can’t handle calculated members on non-measures dimension. For a bit of background, see:
http://sqlblog.com/blogs/marco_russo/archive/2007/01/31/excel-2007-pivottable-with-calculated-members.aspx
http://sqlblog.com/blogs/marco_russo/archive/2007/03/07/ssas-2005-sp2-breaks-excel-calculated-member-selection.aspx
http://sqlblog.com/blogs/marco_russo/archive/2008/12/08/ssas-2008-calculated-members-still-don-t-love-excel-2007.aspx
http://blogs.msdn.com/excel/archive/2008/02/05/common-questions-around-excel-2007-OLAP-PivotTables.aspx

If you’re using Excel 2010 with Analysis Services 2008 R2 you’ll now be able to filter on individual calculated members again. Frankly, it’s a disgrace that it’s taken this long to fix and that you have to use Office 2010 and R2 before it works (especially when every other client tool worth its salt does not have this problem), but I suppose we should be grateful that it at least has been fixed.

Anyway, let’s take a look at how this change has actually been implemented. First of all, you only get the new behaviour when you use the Subqueries connection string property. It’s been possible to set Subqueries=1, which allows the use of calculated members in subselects, since SSAS 2008 (subqueries=0, which is the default, does not allow this) but I’m told this didn’t solve all of the Excel team’s problems; therefore the new setting Subqueries=2 was introduced in R2.

Now let’s add a calculated member to the Adventure Works cube as follows:

CREATE MEMBER CURRENTCUBE.[Date].[Calendar].[Calendar Year].&[2004].CALC1
AS 111;

As you can see, it’s on the Calendar hierarchy of the Date dimension, on the Calendar Semester level underneath the year 2004.

If we run the following query with no special connection string properties set:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004].[CALC1]
ON 0
FROM [Adventure Works])

We get the error “A set has been encountered that cannot contain calculated members”. However, as you would expect, when you set Subqueries=1 or Subqueries=2 the query runs successfully and you get the following results:

image

Now, if we change the query to ask for all the members at the Year level as follows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Year].MEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004].[CALC1]
ON 0
FROM [Adventure Works])

When we use Subqueries=1 we get an empty set returned on rows:

image

When we use Subqueries=2 we get the year 2004, the parent of the calculated member, and a null for the measure value:

image

Why the null? Although there is data for 2004 in the cube, in our subselect we’ve only included a calculated member child of 2004, and calculated members’ values don’t aggregate up to their parents.

For the following query, where the year is in the subselect and the semesters are on rows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004]
ON 0
FROM [Adventure Works])

For both Subqueries=1 and Subqueries=2 you get the following result:

image

Interestingly, if you include a Year and Semester in the subselect as follows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
{[Date].[Calendar].[Calendar Year].&[2004],
[Date].[Calendar].[Calendar Semester].&[2004]&[2]}
ON 0
FROM [Adventure Works])

For Subqueries=1 you get this:

image

And for Subqueries=2 you get this:

image

I won’t go on (though there are more examples I could give) because I’m sure no-one outside the Excel team will ever care about any of this, but it’s interesting to note nonetheless and I doubt it will ever get properly documented anywhere. I’ve also been told there are some changes to how the DrillDownX family of functions work with regard to calculated members in R2, but I’ll save that for another post.

Written by Chris Webb

February 18, 2010 at 11:15 pm

Posted in MDX

Tableau Public

leave a comment »

In a clever marketing move, Tableau have just released a new free version of their tool called Tableau Public. I’ve been a fan of Tableau since I saw it a while back (was it really 2005?) but never seen it actually used at one of my customers, I suppose because of its hefty price. Anyway, what Tableau have done is created a basic version which can only connect to Access, Excel and Text files and can only save to Tableau’s own web gallery; however, it’s then possible to embed the visualisations you create in your own website/blog etc for sharing with the world. The stated aim is for this site to be the YouTube of data visualisation; coming from Tableau this is a half-credible claim because they really do understand this area and have some great tools. Also, I’m sure the extra publicity it will generate for the company will do no harm for sales of the paid version either.

And I’m sure it will be great when they get over their teething difficulties, because at the time of writing I can’t actually save anything… maybe they underestimated the amount of interest this would generate?

OK, it’s working now. But of course Windows Live Spaces doesn’t allow me to embed a ‘viz’ in a blog post (grr); instead, here’s a simple example I just uploaded using stats on the last 30 days of traffic on the front page of this blog:
http://public.tableausoftware.com/views/Blogstats/Sheet1

Here’s what it looks like in the client:

image

So Tuesday is the best day for page loads, but Thursday is slightly better for unique visitors… and so on. But it’s a cool tool and definitely worth checking out.

Written by Chris Webb

February 15, 2010 at 2:53 pm

Posted in Cloud

SQLBits VI Dates Announced

leave a comment »

It’s SQLBits time yet again! After a long absence we’re back in the South East (of England, that is) and we’ll be going to central London for the first time: Church House Conference Centre. It’s also going to be a one day event, but still free, and it’s going to be on Friday April 16th rather than a Saturday. As always, check http://www.sqlbits.com/ for more details in the coming weeks…

Written by Chris Webb

February 14, 2010 at 11:35 pm

Posted in Events

OData and Microsoft BI

with 6 comments

I first came across OData last year when Jamie Thomson blogged about it; since then I hadn’t really thought about it much until I came across Douglas Purdy’s blog and specifically his post from yesterday which really brought home how important it is to the future of Microsoft BI. I would urge you to watch the video “OData: The Movie” that he mentions in his post because it gives a really good introduction to the way that OData can be used in BI scenarios; if you don’t have the time, all you really need to know is that it’s OData that makes it possible for PowerPivot to consume data from SSRS reports and Sharepoint lists.

Just watching this video made my mind boggle with the possibilities of OData, although since I’m a long way from being an expert in this area I won’t bother to detail all of these fantasies as they’re probably rubbish. However, just consider how much easier life would be for the PowerPivot user of the future if the internet was full of sites that supported OData; certainly, when I’ve looked at the new UK government websites such as the recently-launched http://data.gov.uk/ and http://data.london.gov.uk/, or even sites like the Guardian Data Store that I blogged about last year, I’ve felt that the lack of a standard format to consume this data (a badly-formatted Google spreadsheet is not ideal) has seriously limited these sites’ usefulness. And what if Bing came up with a service like the sadly-useless Google Squared, where you could search for data on the web and return it in a structured, OData format?

I’d also love to see Analysis Services support OData too, in some shape or form. Perhaps it could be used to solve some of the same problems with XMLA that Julian Hyde lists in his recent post on xmla4js. Wouldn’t it also be cool if there was a standard interface for publishing an MDX query up to SSAS from any client tool, and then be able to consume it via OData (similar to what I suggested here)? You’d then enable scenarios like this: power user creates query using an advanced SSAS client tool, then publishes it up to SSAS whereupon it becomes something similar to a SQL stored proc with the results available as an OData feed, and therefore can be consumed either in your own code (eg in a website), by PowerPivot users, or by something like SSRS (which would then know nothing about the query used but would just be used to format the resultset).

Written by Chris Webb

February 3, 2010 at 9:56 am

Posted in BI

SQLBits V Videos Now Available

leave a comment »

Those nice people at MS sent a film crew to record all of the sessions at SQLBits V last November, and we’re pleased to announce that the videos are now available on the site for viewing/download:

http://www.sqlbits.com/News.aspx?Title=SQLBits%20Videos%20available%20now%20available

Enjoy!

Written by Chris Webb

February 1, 2010 at 11:56 am

Posted in Events

Follow

Get every new post delivered to your Inbox.

Join 3,296 other followers