Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for October 2010

New Release of PowerPivot

with 2 comments

And Vidas has just noticed there’s a new release of PowerPivot that integrates with Windows Azure Marketplace – Microsoft’s data marketplace that was previously known as Project Dallas.

There’s also a separate Excel addin that allows you to import data from Windows Azure Marketplace direct into Excel:
https://datamarket.azure.com/addin

One last thing to note – among all the cool data that’s available at the Windows Azure Marketplace, I see that you can get access to data from Wolfram Alpha too!

Written by Chris Webb

October 28, 2010 at 10:09 pm

Posted in Uncategorized

BI User Group London – November 17th

leave a comment »

At long last I’ve got round to organising another BI user group meeting in London, on the evening of November 17th. We’ve got Ian Marrit talking about Master Data Services in the first session, that’s followed by Jamie Thomson and me talking about all the cool new stuff that will be announced for BI at PASS the week after next. For more details and registration see:

http://sqlserverfaq.com/events/248/Business-Intelligence-Introduction-to-Master-Data-Services-Ian-Marrit-Whats-new-for-BI-in-Denali-Chris-Webb-and-Jamie-Thomson.aspx

Hope to see you there!

Written by Chris Webb

October 28, 2010 at 8:46 pm

Posted in Uncategorized

SQL Azure Reporting: Reporting Services in the Cloud

leave a comment »

So SSRS in the cloud has just been announced! See this post on the SQL Azure team blog:
http://blogs.msdn.com/b/sqlazure/archive/2010/10/28/10082293.aspx

…and also this 20 minute video from PDC giving a lot more detail:
http://player.microsoftpdc.com/Session/5007e9c3-03cd-41b4-9e1c-4eb17cd60e37

Basically it’s the SSRS you know and love with only a few limitations: for example it only supports SQL Azure as a data source and there’s none of the developer extensibility options (like custom data extensions) available yet.

I can’t wait for SSAS in the cloud…

Written by Chris Webb

October 28, 2010 at 8:34 pm

Basket Analysis, PowerPivot and NodeXL

leave a comment »

While I was thinking about basket analysis last week I started wondering what kind of visualisations would be useful for this problem, and I remembered NodeXL (which I blogged about earlier this year). After all, isn’t basket analysis pretty similar to network analysis? It’s all just connections between things…

Anyway, taking the PowerPivot model and calculated measures from my last post, I started to look at how to import that data into the NodeXL Excel template. The first problem was how to present the data in a tabular form and the answer was the new Excel 2010 flattened pivot table that Kasper and Rob have blogged about before, with subtotals turned off. This gave an output that looked like this:

The second problem was slightly more tricky. NodeXL wants a set of edges to draw its graph, and in our case an edge will represent an instance where two products were bought together. However in the results above we have rows where the two products are identical, eg showing the number of customers who bought Apples with Apples, and we also have rows that show the same data in different ways, eg one row showing the number of customers who bought Apples and Cake and another row showing the number of customers who bought Cake with Apples. We need to get rid of these unwanted rows and we can do that with Excel 2010’s ability to use custom MDX to generate a named set. Here’s the set expression I used:

GENERATE(
[Product].[Product].[Product].MEMBERS
, {[Product].[Product].CURRENTMEMBER}
*
{LINKMEMBER(
[Product].[Product].CURRENTMEMBER
, [Product Bought With].[And Product]).NEXTMEMBER : NULL})

What I’m doing here is to taking the set of all Products, then using the Generate function to crossjoin each Product with the set of all Products from the And Product dimension, starting from the And Product that’s immediately after the And Product that has the same name as the current Product, to the end of the level.

This gives us the following set of rows:

We can now paste this into the Edges worksheet of the NodeXL template, so that NodeXL can work its magic. I’m not going to pretend to be an expert on NodeXL and indeed this data isn’t the most exciting in the world to visualise, but once I’d got the graph drawn a quick look through the NodeXL tutorial (and especially the section on analysing voting patterns in the Senate) showed me how to use the value of the measure ‘Customers buying both Products’ to control the opacity of the lines in the graph. And here it is:

From this we can see clearly that Bread and Cake were never bought together and that Apples and Bread were bought together more often that any other combination. Job done! With real data, I think NodeXL would prove very useful indeed for this kind of analysis and it would be great if NodeXL could work direct with data in PowerPivot (hint, hint) in the future. If anyone out there does try using NodeXL with their data for basket analysis, I’d be very interested to hear from them…

Written by Chris Webb

October 26, 2010 at 11:04 pm

Posted in DAX, PowerPivot

Tagged with

SQLBits VI Videos Now Available

with 2 comments

OK, I know this seems very late and we’ve just done SQLBits VII, but the videos from the sessions at SQLBits VI are now available to view online at last. Just select any of the sessions here:

http://www.sqlbits.com/information/event6/PublicSessions.aspx

…and you’ll be able to view the video for that session.

 

Written by Chris Webb

October 20, 2010 at 11:12 pm

Posted in Events

Tagged with

Simple Basket Analysis in DAX

with 2 comments

Continuing the theme of finding the distinct number of customers who’ve done something specific, I’ve recently been having a go at implementing simple basket analysis in DAX. So, for example, as well as finding the distinct number of customers who bought Apples, we may also want to find the distinct number of customers who bought both Apples and Oranges, or even those who bought Apples and Oranges but not Cake.

To illustrate this problem I used the following sample data entered into Excel tables as the basis for a PowerPivot model:



So what we have here is essentially a Product dimension table, a Customer dimension table and a Sales fact table. I did the obvious thing and set up relationships going from Sales to Product and Sales to Customer, but I also created two extra tables in PowerPivot that were built from the Product table: one to allow the selection of Products for the AND filter, and one to allow for the selection of Products in the NOT filter, neither of which were related to any other table. So when building the query to show the number of customers that bought Apples and Oranges but not Cake, the user would select Apples on the Product table, Oranges on the AND Product table and Cake on the NOT Product table.

The best and fastest way I found of writing the DAX I needed was to break the problem up into several smaller calculated measures. The starting point, to get the number of customers that bought Apples, is a simple calculated measure like this:

=COUNTROWS(DISTINCT(Sales[Customer Key]))

I then created two calculated measures: one that returned the distinct number of customers who had bought the AND Product, ignoring the selections made on Product and NOT Product; and one that returned the number of customers who had bought the NOT Product, ignoring the selections made on Product and AND Product. Here they are:

=CALCULATE(
COUNTROWS(DISTINCT(Sales[Customer Key]))
, FILTER(ALL(Product),
COUNTROWS(
FILTER(DISTINCT(‘Product Bought With’[And Product Key]), Product[Product Key] = ‘Product Bought With’[And Product Key]))
>0))

=CALCULATE(
COUNTROWS(DISTINCT(Sales[Customer Key]))
, FILTER(ALL(Product),
COUNTROWS(
FILTER(DISTINCT(‘Product Not Bought’[Not Product Key]), Product[Product Key] = ‘Product Not Bought’[Not Product Key]))
>0))

They both work in the same way: they find the distinct count but shift the context on the Product table (which has a relationship with Sales remember, whereas AND Product and NOT Product don’t) so that whatever has been selected on Product is ignored and, the selections made on AND Product and NOT Product are, respectively, used instead. To do this I’m using two nested filters – one which loops over every row in the Product table, and then an inner filter that loops over every selected member on AND/NOT Product; they return the rows from Product where the Product Key appears in the list of distinct Product Keys selected on AND/NOT Product.

At this point we can build a pivot table that looks like this:

With Customers on rows, obviously, no distinct count is greater than 1, but we can now see that Chris, Helen and Mimi bought Apples; that Chris and Natasha bought Oranges; and that Mimi and Natasha bought Cake. We can now use these measures to create our calculated measure that returns the distinct number of customers that bought Apples and Oranges but did not buy Cake:

=CALCULATE(COUNTROWS(DISTINCT(Sales[Customer Key])), FILTER(DISTINCT(Sales[Customer Key]), [Customers Buying Not Product]=0 && [Customers Buying And Product]>0))

It’s a variation on the original distinct count measure, but now we’re using Calculate once again to shift the context so we take the distinct customers and apply an additional filter to ensure they didn’t buy whatever was selected on NOT Product but did buy whatever was selected on AND Product.

At last we can see that Chris was the only customer that bought Apples and Oranges but did not buy Cake. We can also run queries like this that don’t display individual customers:

There’s one last problem to solve: what happens if we don’t want to apply the AND or NOT filter for some queries? It doesn’t really make much sense to select the All member on either of these tables (for example, you’d never want to find the number of Customers that bought Apples and bought any Product and did not buy any Product) so we can alter the calculations to turn off the AND and NOT filter when the All member has been selected on either the AND or NOT Product table. Here are the new versions of the two calculated measures that return the distinct counts for the AND and NOT products:

=IF(
COUNTROWS(DISTINCT(‘Product Bought With’[And Product Key]))=COUNTROWS(ALL(‘Product Bought With’[And Product Key]))
, 1
, CALCULATE(
COUNTROWS(DISTINCT(Sales[Customer Key])),
FILTER(ALL(Product),
COUNTROWS(
FILTER(DISTINCT(‘Product Bought With’[And Product Key]), Product[Product Key] = ‘Product Bought With’[And Product Key]))
>0)))

=IF(
COUNTROWS(DISTINCT(‘Product Not Bought’[Not Product Key]))=COUNTROWS(ALL(‘Product Not Bought’[Not Product Key]))
, 0
, CALCULATE(
COUNTROWS(DISTINCT(Sales[Customer Key])),
FILTER(ALL(Product),
COUNTROWS(
FILTER(DISTINCT(‘Product Not Bought’[Not Product Key]), Product[Product Key] = ‘Product Not Bought’[Not Product Key]))
>0)))

Here I’ve wrapped the previous expressions in an IF that checks if the number of distinct Product Keys in the selection matches the number of distinct Product Keys in the entire table – if it does then the All member has been selected, and the first expression then returns a 1 and the second returns a 0 to ensure that the filter conditions on these measures in the final calculated measure return true when the All member is selected. We can then run simpler queries like this one showing the customers who bought both Apples but not Cake, with no AND filter applied:

As always I’d be interested in hearing from anyone that’s got a better way of solving these problems. I’m aware I’ve used the evil Filter function all over the place and that that isn’t always good for performance (although on larger data sets this approach seemed to perform ok) so I wonder if there’s a way to avoid its use anywhere?

Written by Chris Webb

October 20, 2010 at 10:47 pm

Posted in DAX

Two PowerPivot Books

leave a comment »

There’s no way I could ever pretend to be an impartial reviewer of Marco Russo and Alberto Ferrari’s new book, “PowerPivot for Excel 2010: Give Your Data Meaning”. First of all, they’re good friends of mine (we wrote “Expert Cube Development with SQL Server Analysis Services 2008” together last year); and secondly, I got a freebie copy of the book. But all that aside, I do honestly think it is a really good book. Just as an example, I’ve been playing around with some DAX problems over the last few days with a view to writing a some blog posts and had been struggling to get the EARLIER function to work in the way I’d been expecting; there are only a few examples of its use on the web but I found the book had a very detailed explanation of how it works. Indeed a large part of the book is concerned with DAX and it’s probably the best resource on that subject that I’ve seen, so that’s reason enough to buy it.

It’s important to point out, though, that Marco and Alberto’s book doesn’t really go into any detail on PowerPivot for Sharepoint – there is one chapter at the end, but it’s main focus is on PowerPivot for Excel. If you want to learn more about the Sharepoint side of things I can recommend “Professional PowerPivot for Excel and Sharepoint” by Siva Harinath, Ron Pihlgren and Denny Lee (and yes, I got this book as a freebie too – it’s one of the perks of being a blogger that you get loads of free books!). There’s very little overlap between the two books – the only thing they both cover is the basic info on how to build a PowerPivot model, and that subject is so basic most people will be able to work it out for themselves – so it’s probably worth getting both if you’re serious about learning PowerPivot.

One last bit of advertising: Marco and Alberto are running a two day PowerPivot workshop in the Netherlands at the beginning of December. More details can be found here:
http://www.sqlbi.com/workshop/

Written by Chris Webb

October 13, 2010 at 2:49 pm

Posted in Books

Tagged with ,

Follow

Get every new post delivered to your Inbox.

Join 3,131 other followers