Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for July 2013

Ordering Of Named Sets In Excel

with 3 comments

A bit of an obscure one, this, but it’s come up twice this week so worth mentioning. When you define a named set on your SSAS Multidimensional cube, Excel doesn’t respect the order of items in that set by default when you use it in a PivotTable. Consider the following named set defined on the Adventure Works cube (on the Calculations tab of the cube, not in defined in Excel itself):

CREATE SET [MY COUNTRIES] AS
{[Customer].[Country].&[France], [Customer].[Country].&[Canada], [Customer].[Country].&[Australia]};

Note that the countries are in the order France, Canada, Australia. When you use this named set in Excel, this order is overridden and the countries come out in hierarchy order, that’s to say the order that they appear on the Country hierarchy: Australia, Canada, France.

image 

image

How can you stop this? After all, in a lot of cases the order of members in a named set is important. If you have Excel 2010 or Excel 2013 (I believe this option isn’t available in Excel 2007), you need to click on the name of the set in the PivotTable Field List pane and select Field Settings:

image

Then in the Field Settings dialog go to the Layout and Print tab and uncheck the “Automatically order and remove duplicates from the set” option:

image

When you do that, the order of your set is respected:

image

Written by Chris Webb

July 31, 2013 at 2:12 pm

New MDX Divide() Function

with 5 comments

A few weeks ago I saw Rob Collie’s blog post about the DAX Divide() function, and I was a bit surprised that I hadn’t seen it before. Then, yesterday, I found that the same function has appeared in MDX in SSAS 2012 Multidimensional. Here’s the entry in BOL:
http://msdn.microsoft.com/en-us/library/jj873944.aspx

The syntax is:
Divide (<numerator>, <denominator> [,<alternateresult>])

Numerator and Denominator are self-explanatory; alternateresult is the constant value you want to return in case of division by zero, and if it is not specified a null is returned.

It turns out that it was added after RTM of SSAS 2012, and officially first appeared in SSAS 2012 SP1. This is the first new MDX function since… what, 2005? 2008? Five long years at least.

[A loud bump is heard as Chris falls off his chair in amazement]

This is important because there have been requests for a safe divide function in MDX for a long time. I posted a request on Connect, for example, and Darren Gosbell blogged about it here. It would have been nice if someone had mentioned to me that this feature had been added…

Why should you care about this? Well, anyone with any experience of MDX will know that you need to trap for division-by-zero and division-by-null when writing calculations. Consider the following query in Adventure Works:

with
member measures.[France Sales] as
([Measures].[Internet Sales Amount],
[Customer].[Country].&[France])
member measures.[US Sales] as
([Measures].[Internet Sales Amount],
[Customer].[Country].&[United States])

member measures.[France as a % of US] as
measures.[France Sales]/measures.[US Sales]
, format_string=’0.00%’

select
{measures.[France Sales],
measures.[US Sales],
measures.[France as a % of US]}
on 0,
[Date].[Date].[Date].members
on 1
from [Adventure Works]

The measure I’ve highlighted divides two other measures, and returns the value 1.#INF (infinity) when the measure [US Sales] is null:

image

1.#INF is not something you want to show to your end users. Furthermore, Mosha showed us all a long time ago that when dividing we should always use the pattern
iif(measures.x=0, null, measures.y/measures.x)
to get the best performance.

It now looks like this pattern has been superseded by the Divide() function. Here’s the measure in bold from the query above rewritten to use it:

member measures.[France as a % of US] as
divide(measures.[France Sales],measures.[US Sales])
, format_string=’0.00%’

From my limited testing on Adventure Works performance seems to be the same as with the iif() pattern, but I have heard that in other scenarios it may perform better. So I would recommend you try testing it on your cube, and use it in all your MDX from now on.

Written by Chris Webb

July 26, 2013 at 11:16 am

Posted in MDX, Multidimensional

Point-In-Time Dimension Reporting In DAX

with 8 comments

Before I start, I have to state that the technique shown in this post isn’t mine but was developed by my colleague Andrew Simmans, who has very kindly allowed me to blog about it.

Over the last few months I’ve been working on an SSAS Tabular project that has not only presented some interesting modelling challenges, but has shown how DAX can offer some new and interesting solutions to these challenges. Consider the following scenario: a supermarket sells products, and we have a fact table showing sales of products by day. Here’s some sample data:

image

To complicate matters, each product has one product manager but product managers for particular products change from time to time. Normally this might be solved by adding the product manager name to the Product dimension table and implementing a Type 2 Slowly Changing Dimension. In this case, though, we want something slightly different: instead of seeing sales attributed to the product manager who was in charge of the product at the time of the sale, and therefore seeing sales for the same product attributed to different product managers on different dates, we want to attribute all sales for a product to a single product manager but be able to use a second date dimension to be able to determine the point in time, and therefore the product manager in charge of each product at that point in time, that we want to report as of. To put it another way, we want to be able to find the state of a dimension on any given date and use that version of the dimension to do our analysis.

For example, we have the following table showing which product manager was in charge of each product at any given point in time:

image

Between January 1st 2013 and January 3rd 2013 Jim was the product manager for Orange, but from January 4th 2013 onwards Rob took over as product manager for Oranges; Fred was the product manager for Apples the whole time. We want a PivotTable that looks like this when we choose to report as of January 2nd 2013:

image

Notice how Jim is shown as the product manager for Oranges. If we wanted to report using the managers as of January 5th 2013, we would want to see Rob shown as the product manager for Oranges like so:

image

The solution to this problem involves a combination of two DAX techniques that have already been blogged about quite extensively and which I’d encourage you to read up on:

  • Many-to-many relationships, in this case the solution developed by Gerhard Brueckl, described on his blog here.
  • ‘Between’ date filters, which I wrote about recently but which Alberto has recently improved on in his must-read white paper here.

Here are the table relationships I’ve used for the sample scenario:

image

I’ve added a second date table called ReportingDate which contains the same rows as the Date table shown above; note that it has no relationship with any other table.

This problem is very similar to a many-to-many relationship in that a product can have many managers across time, and a manager can have many products. Indeed we could model this as a classic many-to-many relationship by creating a bridge table with one row for each valid combination of product and manager for each possible reporting date; on my project, however, this was not a viable solution because it would have resulted in a bridge table with billions of rows in it. Therefore, instead of joining the ReportingDate table directly to the ProductManager table, we can instead filter ProductManager using the between date filter technique.

Here’s the DAX of the Sum of Sales measure used in the PivotTables show above:

Sum of Sales:=

IF(

HASONEVALUE(ReportingDate[ReportingDate]),

CALCULATE(

SUM(Sales[Sales]), 

FILTER(ProductManager, MIN(ReportingDate[ReportingDate])>=ProductManager[StartDate] 

&& 

IF(ISBLANK(ProductManager[EndDate]), TRUE(), 

MIN(ReportingDate[ReportingDate])<=ProductManager[EndDate])

))

, BLANK()

)

 

This is not necessarily the best way to write the code from a performance point of view but it’s the most readable – if you need better performance I recommend you read Alberto’s white paper. What I’m doing is this:

  • Only return a value if a single reporting date is selected
  • Filter the ProductManager table so only the rows where the selected reporting date is between the start date and the end date are returned, ie we only get the rows where a manager was in charge of a product on the reporting date
  • Use the filtered ProductManager table to filter the main fact table using the Calculate() function, in exactly the same way that you would with a many-to-many relationship

You can download my sample workbook here.

Written by Chris Webb

July 19, 2013 at 11:41 pm

Posted in DAX, PowerPivot, Tabular

Defining DAX Measures In The With Clause Of An MDX Query

with 14 comments

It’s a little-known fact (but certainly not completely unknown – it was mentioned in Marco, Alberto and my SSAS Tabular book I think) that you can define measures using DAX in the WITH clause of an MDX query. This means you can write queries like the following against an SSAS Tabular model:

with
measure ‘Date’[Demo Calc] =
countrows(‘Date’)

select {measures.[Demo Calc]} on 0,
[Date].[Calendar Year].members on 1
from [Model]

image

The official documentation, such as it is, is here:
http://msdn.microsoft.com/en-us/library/hh758441.aspx

Unfortunately you can’t use it from Excel 2013 using the new ‘create calculated measure’ functionality; I also talked to the nice people behind OLAP PivotTable Extensions and there are some very good reasons why they can’t support this either.

What use is this then? You’re only going to be able to use it in scenarios where you control the generation of the MDX on the client side, such as SSRS reports, which may not be all that often; in fact, in these situations you might be better off writing the whole query in DAX. It’s only going to be useful when you need the power of MDX and DAX in the same query. For example, you might want to take advantage of DAX’s superior ability to detect multiselects, but write all your other calculations in MDX. I’m clutching at straws here though! Still, it’s an interesting thing to know about. Please leave a comment if you can thing of a situation where you can use it…

Written by Chris Webb

July 14, 2013 at 11:43 pm

Some Thoughts About Power BI

with 51 comments

By now you’ll probably have seen the Power BI announcement from Microsoft. It’s an important one, and if you haven’t seen it I suggest you take a look at the official announcements and website here:

http://blogs.office.com/b/office-news/archive/2013/07/08/announcing-power-bi-for-office-365.aspx
http://blogs.office.com/b/office365tech/archive/2013/07/07/what-powers-power-bi-in-office-365.aspx
http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/08/introducing-power-bi-for-office-365.aspx?WT.mc_id=Social_TW_OutgoingEvents_20130708_25941_SQLServer
http://office.microsoft.com/en-us/excel/power-bi-FX104080667.aspx
http://blogs.msdn.com/b/powerbi/archive/2013/07/07/getting-started-with-pq-and-pm.aspx

Andrew Brust also has a good summary of the news here:
http://www.zdnet.com/microsoft-announces-power-bi-for-office-365-7000017746/

There’s no point me repeating what’s already been said, so I thought I’d post my initial reaction to it:

  • Proper Mobile BI! HTML 5 Power View! It works on iPads too! Hurray! At-bloody-last! The mobile BI solution will allow you to find, interact with and share “Excel and Power View content”, so I guess that includes Excel worksheets (with PivotTables, slicers etc) as well as Power View reports (a close look at the screenshots in the blog posts above back this up).
  • Making Power BI available only via Office 365 is going to be a very controversial strategy in the MS BI partner community. To be clear: as far as I understand it, Data Explorer (now Power Query), GeoFlow (now Power Map), mobile BI and all of the cool stuff that’s just been announced is only going to be available through Power BI, and therefore Office 365. Unfortunately the biggest companies, with the biggest BI budgets, are often the ones who are slowest to upgrade to the latest versions of Office and a lot of cases this won’t change just because someone wants to see their reports on an iPad. Where IT department inertia, worries about data privacy and company politics mean that Office 365 is not an option, Microsoft will lose out to the pure-play BI vendors who offer standalone solutions.
  • If you look at this from a different point of view, though, some of the things that I (as a BI Pro) feel least comfortable about in Microsoft’s BI strategy are also its greatest strengths. The way I see it, MS is not treating self-service BI as a solution in its own right, but selling self-service BI as a feature of Office. This makes a lot of sense from Microsoft’s point of view – it’s building on the fact that Excel is the tool of choice for data analysis for 99% of users. What I think is going to happen as a result of this is that, rather than partners selling BI as a standalone solution as in the past, we’re going to be talking to people who have already got Office 2013 or Office 365 and are looking to make the most of the BI features that come as part of that. The MS BI partner community is going to have to adjust to this because I doubt MS is going to change this strategy soon.
  • The same can be said of the Office 365-only strategy. If Microsoft is going to be successful with its cloud-first strategy then it’s going to have to prioritise cloud functionality over on-prem functionality. I think MS is doing the right thing with its cloud-first strategy, so therefore, even though I’m going to find it painful when I have to deal with customers that won’t or can’t move to Office 365, I can understand why MS is making Power BI Office 365 only. Beyond the hype (MS says that 1 in 4 of its customers is already on Office 365), it does seem like the uptake of Office 365 is quite strong, especially in SMBs, so hopefully there will be a large potential customer base.
  • I’ve been presenting sessions on the Excel 2013 BI stack at various user groups and conferences over the last few months and it’s gone down very well indeed. A lot of people have come up to me after seeing my session and said that they had been looking at QlikView and Tableau for BI, and would now consider Office 2013 as another alternative. The BI functionality on its own is pretty good, and good enough for a lot of customers even if it isn’t as mature as some of the competing offerings; the fact that the BI functionality comes baked into Office is the killer. While it may be expensive to upgrade to Office 2013/Office 365 this is a cost that many businesses will be considering anyway regardless of their BI requirements; you also have to compare this with the cost of QlikView and Tableau licenses and remember that not every user will need the most expensive SKUs of Office 2013.
  • The ability to refresh data in Excel workbooks deployed to Sharepoint Online, even when the data sources are on-prem, is a key feature and one that I’ve been waiting for. I wonder what the performance will be like?
  • For anyone of a certain age, the first reaction to the news of the natural language querying capability is two words: English Query. I haven’t played with it so I can’t pass judgement, but it’s going to have to be pretty impressive if anyone is going to use it for more than just sales demos. We shall see…
  • I am quite curious about the enterprise data search capabilities. Leaving aside the ability to query them in natural language, the ability to search for data across the enterprise will be useful. I think this is what happened to Project Barcelona.
  • Similarly, it seems as though this search capability is going to be significantly expanded on the public internet. At the moment, in Data Explorer Power Query we’ve seen the ability to query Wikipedia for data. Being able to query many other public data in the same way will be very powerful. There are a number of sites like Quandl that already make public datasets very easy to find and download, and the new search and query capabilities could leapfrog them.
  • No announcement on pricing has been made as yet. Please, please be ridiculously cheap!
  • We don’t have a date for the preview yet, but if you sign up here you’ll be notified when it’s available. It’s meant to be coming “later this summer”.

UPDATE: Some more things to add…

  • Something I didn’t pick up on at the time, but which emerged on Twitter later, is that PowerPivot has had a name change: it’s now Power Pivot with a space, to make it consistent with PowerV View and so on. This might seem minor, but for those of us who write books and have to sweat these details, it’s quite important!
  • I sense I’ve hurt a few feelings at MS with my comments on the natural language query. Let me be clear about my position here: I’ve not played with it, so I can’t pass judgement yet. I can imagine that natural language search for data will work well, but I will be very, very impressed if natural language query works well enough to be used on real data by real users. Real data is dirty and complex and user expectations will be very high and easily dashed. My guess is that the main issue will be that users can’t distinguish between what is a query and what is a calculation, and while the product can probably do queries well (eg “Show me the sales of widgets this year”) it may struggle with calculations (eg “Show me the customer churn by month this year”). But as I said, we shall see.

Finally, and as always, I reread this post this morning and worried that I sound too negative when in fact I’m very positive overall. To summarize:

  • The fact this is is all in Office and specifically Excel = the killer feature.
  • Mobile BI = good, even if it’s very late.
  • Power Query = very, very, very good indeed. I love it already and I think it’s going to be as big, if not bigger than Power Pivot. Power BI is worth buying for this alone.
  • Office 365 requirement = a problem for some customers, maybe, but understandable from Microsoft’s point of view.
  • Cloud requirement = again, a problem for some, but understandable and a big advantage for SMEs who can’t afford the cost of hardware and time to configure Sharepoint on-prem. The ability to refresh in the cloud from on-prem data sources is the key feature here.
  • Power Map = OK. Useful for customers who need geographic analysis, but it’s main use is that it’s great for demos (and this should not be underestimated – all products need some wow).
  • Power View goes HTML5 = relief. The Silverlight dependency undermined its credibility no end.
  • Natural language search for data sources = potentially very useful.
  • Natural language query of those data sources = see above. I remain to be convinced.
  • Data stewardship features = I haven’t seen enough of these to be able to comment.

More blog posts worth reading on this subject:
http://www.jenunderwood.com/blog.htm#O365PowerBI
http://www.jenstirrup.com/2013/07/power-business-intelligence-for.html

UPDATE #2: even more things to add…!

You can see the video of Amir’s demo from WPC here: http://www.youtube.com/watch?v=Jsa-5LGx_IY&feature=youtu.be

Some more details (which should be accurate) from a friend of mine at the conference:

  • The Power View standalone app is still Silverlight based, it’s only the mobile app that uses HTML5
  • Power Query queries can be shared between users via BI Sites, but the execution of these queries always takes place in desktop Excel. Queries will appear in searches when they’re published to a BI Site.
  • Excel workbooks connected to on-prem SSAS (Tabular and Multidimensional) will also be refreshable from a BI Site
  • Natural Language queries (what I think is being called “Q&A”) will also work against SSAS Tabular models. The Categorization property that’s in the Advanced tab in the PowerPivot window, and also in SSDT, is partly used to help Q&A do this.
  • External users can be given access to reports in BI Sites.
  • There will be a “Data Steward Portal” which will help you monitor who is doing what on your BI Site.
  • No comment on when this will arrive in Sharepoint on-prem. My feeling is that MS have no plans to do this at all, or maybe will only do it if a lot of people complain…?

Written by Chris Webb

July 8, 2013 at 5:10 pm

Another Look At Google BigQuery

with 12 comments

About a year ago I wrote a post looking at Google BigQuery which finished on a bum note when I ran into a limitation with the size of tables that could be used in a join. Recently I found out that that particular limitation had been lifted with the introduction of the new JOIN EACH operator, and since my account was still active and the data was still uploaded, I thought I’d see if my query could be made to run.

Just as a bit of background (if you can’t be bothered to read my previous post): the query runs on a sample data set originally created by Vertica (see here) that consists of a 1.2GB csv file with two integer columns and 86,220,856 rows. The rows represent the edges in a social network, and the aim of the query is to find the number of triangles – the number of cases where person A is connected to person B, B is connected to C and A is also connected to C. The query joins this large table to itself not once, but twice.

Anyway, to cut a long story short the query did work this time. Here’s the new version, rewritten to use JOIN EACH:

select count(*)
from
(select
e2.Source as e2Source, e2.Destination as e2Destination,
e3.Source as e3Source, e3.Destination as e3Destination
from
(select * from [Edges.EdgesFull]) as e2
join each
(select * from [Edges.EdgesFull]) as e3
on e2.destination = e3.source
where e2.source < e3.source) as e4
join each
(select * from [Edges.EdgesFull]) as e1
on e1.destination = e4.e2source
and e4.e3destination = e1.source
where e1.source < e4.e2source

Here’s the evidence:

image

It took 98.7 seconds to run, which I think is pretty good. Some remarks:

  • I’m not 100% sure this is the correct result, but I’m fairly confident.
  • I’m a long way from being 100% sure this is the most efficient way to write the query.
  • When the original Vertica blog post came out it sparked a p*ssing contest between vendors who wanted to to show they could do better than Vertica. You can see an example of other people’s results here; clearly other platforms could do a lot better, and remember these results are two years old.
  • Obviously you can’t judge the performance of a database on one query.

That’s all. My mind is now at rest, and I’ll go back to being a Microsoft fanboy tomorrow.

Written by Chris Webb

July 5, 2013 at 12:00 pm

Posted in Google

Tagged with

Building A Simple BI Dashboard With Visio 2013 And Visio Services

with 4 comments

Let me start this post by saying that I am a long way away from being a Visio expert – I’ve used Visio, of course, to create diagrams and I’ve also played with its BI capabilities in the past, but nothing more than that. A recent post on the Visio team blog reminded me about Visio’s BI capabilities and Jen Underwood then mentioned that Visio 2013 has some new functionality for BI, so I thought I’d check it out in more detail and blog about what I found. I’ve never seen Visio used to build dashboards or reports in the real world, but a quick search shows that the Visio pros out there have been doing this for years, so maybe it’s time us BI folks learned a few tricks from them? Visio 2013 is by no means a perfect tool for BI but I was pleasantly surprised at what it can do: you can create data-linked diagrams/dashboards in Visio on the desktop very easily, and then publish them to Visio Services in Sharepoint where they can be viewed in the browser and where the data can be refreshed.

First of all, the PowerPoint deck here is a good place to start to learn about Visio and Visio Services 2013 dashboards, as is the Visio team blog and Chris Hopkins’ blog. There’s also a walkthrough of how to link data to shapes here, and a lot of other good posts out there on creating charts and graphs in Visio such as this one.

Here’s what I did. To start, I created a few tables with data in in Excel to act as a data source, then published the workbook up to Excel Services in Sharepoint Online (I have an Office 365 E4 subscription). The data looked like this:

image

I then opened up Visio 2013, connected to the workbook in Excel Services and imported the data from these two tables. With that done, I was able to select a shape, and then drag a row of data from the External Data pane onto the sheet, which gave me a data-linked shape. It was then fairly easy to configure the data graphic associated with each shape – for example, in the diagram below, I selected a City shape, then dragged the row containing sales for London onto the sheet, which gave me a City with the data for London linked to it, and next to the City shape I had an associated data graphic which I configured as a Data Bar of type Multi-bar Graph.

image

The text next to the frowning face is also linked to data from Excel. I could then publish this to Sharepoint Online, and view the diagram in the browser just by opening it from the document library:

image

All very straightforward. In Visio Services you can add comments, and also refresh the data. Data can be refreshed manually or on a schedule; I used Excel Services data in this demo because in Office 365 only Excel Services and Sharepoint List data sources can be refreshed but the story is much better in on-prem Sharepoint (the PowerPoint deck I linked to above has all the details). Weirdly, I found that if I modified my Excel data source in the Excel Web App it took a few minutes for the new data to come through in Visio even with me clicking Refresh, although it did eventually.

Obviously this is a very basic (and badly designed) dashboard that works within the limitations of Visio and Visio Services, and if you want to learn how to do this properly I suggest you check out the links above. But there are two important questions that now need to be answered:

Why, as a BI pro, would I want to create a dashboard in Visio rather than, say, Excel or Power View?

I suspect Visio isn’t used more widely in MS BI circles is 20% down to ignorance of what it can do, 30% the cost of licensing and the Sharepoint dependency, and 50% the fact that there are only a limited number of scenarios where it is the right tool to use. So when would you actually want to use it? The risk of using Visio is that you end up with a visually appealing infographic that is actually very bad at conveying the information you want to convey, the kind of thing Stephen Few is complaining about here. You’d probably only want to use it if the nature of the diagram contributed to your understanding of the data. For example if you wanted to look at which seats were filled more frequently than others in a theatre or an aeroplane, it might be useful to have a diagram showing the seat layout and colour the seats that get filled. I guess these scenarios are very similar to the kind of scenarios where it makes sense to plot geographical data on a map.

What functionality is Visio missing for it to be a serious BI tool?

Quite a lot. Leaving aside PivotDiagrams, there is no proper support for SSAS or PowerPivot for data linking and that’s a big problem in these days of self-service BI. I also don’t link the way you have to import data into Visio before it can be used: I’d want to be able to select the data I want using a PivotTable-like interface (generating MDX or DAX queries) and then bind it to shape, so that I could slice and filter my data inside Visio without having to keep on importing it; I imagine being very similar to Power View today, but where you could drag data-driven shapes onto a canvas instead of tables and graphs. Maybe Power View and Visio need to get together and have children?

 

I don’t want to finish this post on a critical note, though, because I’ve had a lot of fun learning more about Visio and its BI capabilities, and I hope to use it on a project soon. Now that Sharepoint and especially Office 365 are being pushed so heavily for BI (and being used more widely), maybe we’ll see a lot more of Visio dashboards?

Written by Chris Webb

July 4, 2013 at 11:38 am

Posted in BI, Visio

Follow

Get every new post delivered to your Inbox.

Join 3,131 other followers