Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for December 2012

Eighth Blog Birthday

with 4 comments

Today marks eight years since my first ever post on this blog, and every year on this date I write a review of what’s happened to me professionally and what’s gone on in the world of Microsoft BI in the previous year.

For me, 2012 has been yet another busy year. The SSAS Tabular book that Marco, Alberto and I wrote – “SQL Server Analysis Services 2012: The BISM Tabular Model” – was published in July and has been selling well, and the balance of my consultancy and training work has started to move away from Multidimensional and MDX towards Tabular, PowerPivot and DAX. It’s always exciting to learn something new and, frankly, the lack of any significant new functionality in Multidimensional and MDX has meant they have got a bit boring for me; at the same time, though, moving out of my comfort zone has been disconcerting. It seems like I’m not the only Microsoft BI professional feeling like this though: the most popular post on my blog by a long chalk was this one on Corporate and Self-Service BI, and judging by the comments it resonated with a lot of people out there.

Whether or not Microsoft is neglecting corporate BI (and I’m not convinced it is), it’s definitely making a serious investment in self-service BI. The biggest Microsoft BI release of this year was, for me, not SQL Server 2012 but Office 2013. That’s not to say that SQL Server 2012 wasn’t a big release for BI, but that Office 2013 was massive because of the amount of functionality that was packed into it and because the functionality was so well executed. You can read this post if you want details on why I think it’s significant, but I’ve really enjoyed playing with Excel 2013, PowerPivot, Power View and Office 365; there’s more cool stuff in the form of Mobile BI, GeoFlow and Data Explorer coming next year, all of which are very much part of the Office 2013 story too. No Microsoft BI professional can afford to ignore all this.

The other big theme in Microsoft BI this year, and indeed BI as a whole, was Big Data. I reckon that 90% of everything I read about Big Data at the moment is utter b*llocks and as a term it’s at the peak of its hype cycle; Stephen Few has it right when he says it’s essentially a marketing campaign. However, as with any over-hyped technological development there’s something important buried underneath all the white papers, and that’s the increasing use of tools like Hadoop for analysing the very large data sets that traditional BI/database tools can’t handle, and the convergence of the role of business analyst and BI professional in the form of the data scientist. I’m still not convinced that Hadoop and the other tools that currently get lumped in under the Big Data banner will take over the world though: recently, I’ve seen a few posts like this one that suggest that most companies don’t have the expertise necessary for using them. Indeed, Google, the pioneer of MapReduce, felt the need to invent Dremel/BigQuery (which is explicitly referred to as an OLAP tool here and elsewhere) to provide the easy, fast analysis of massive datasets that MapReduce/Hadoop cannot give you. My feeling is that the real future of Big Data lies with tools like Dremel/BigQuery and Apache Drill rather than Hadoop; certainly, when I played with BigQuery it clicked with me in a way that Hadoop/HDInsight didn’t. I hope someone at Microsoft has something similar planned… or maybe this is the market that PDW and Polybase are meant to address? In which case, I wonder if we’ll see a cloud-based PDW at some point?

Written by Chris Webb

December 31, 2012 at 12:43 am

Posted in Random Thoughts

Unnecessary All Members and Performance Problems

with 4 comments

Maybe an obscure problem, this one, but worth recording nonetheless. The other week I was performance tuning some queries on a customer’s SSAS 2008R2 instance and came across a very strange issue related to the presence of unnecessary All Members in tuples. In this case it was in machine-generated MDX, but it’s certainly the case that people new to MDX often include All Members in tuples when they are not actually needed; it’s not a good idea to do this because it can sometimes have unexpected effects as a result of attribute overwrite and because, as I found, it can also cause severe performance problems.

The problem can be reproduced very easily against Adventure Works on the Customer dimension. Consider the following query that returns a list of customers who bought more than $1000 of goods in 2003:

with
set filteredcustomers as
filter(
[Customer].[Customer Geography].[Customer].members
, ([Measures].[Internet Sales Amount]
, [Date].[Calendar Year].&[2003])>1000)
select
{}
on columns,
filteredcustomers 
on rows
from [Adventure Works]

Pretty straightforward, and it returns instantly on my laptop as I’d expect. However, adding the All Member from the City hierarchy into the tuple used in the filter() function makes the query run very slowly indeed (I killed it after several minutes):

with
set filteredcustomers as
filter(
[Customer].[Customer Geography].[Customer].members
, ([Measures].[Internet Sales Amount]
, [Customer].[City].[All Customers]
, [Date].[Calendar Year].&[2003])>1000)
select
{}
on columns,
filteredcustomers 
on rows
from [Adventure Works]

The All Member here isn’t necessary: it won’t affect how the filter works or the set returned at all. Looking in Profiler it seems as though its presence triggers cell-by-cell mode, which is the cause of the awful performance. Interestingly, the performance got worse the more attributes were on the hierarchy – deleting attributes, even when they weren’t used in the query, improved query performance. I’m told the problem could be the result of attribute decoding (which Mosha referred to here, but which I don’t know much else about) as a result of attribute overwrite.

Anyway, in my case it wasn’t possible to change the MDX because it was being generated by a client tool – the All Member was there because the City hierarchy was being used as a parameter in the query, although in this case nothing had been selected on it. There was a workaround that I found, though: it turns out the problem does not occur for user hierarchies that include the key attribute as their lowest level. So, I renamed the City attribute, hid it, and then created a new user hierarchy called City that had Customer as its lowest level:

image

With this done, both of the queries above return instantly.

Written by Chris Webb

December 22, 2012 at 2:52 pm

Introduction to MDX for PowerPivot Users, Part 5: MDX Queries

with 10 comments

In part 4 of this series (sorry for the long wait since then!) I finished off looking at what you can do with named sets. Now, before I go on to more important topics like Excel cube functions and calculated members I’d like to take a high-level look at what you can do with MDX queries running against PowerPivot – high level, because there’s much more to MDX queries than can be covered in a single post and, as I explain below, you probably won’t want to do this very often.

So why would I need to write whole queries against a PowerPivot model?

This is a very good question, given that in my opinion 99% of the time you can achieve what you want when building Excel reports using either PivotTables (with or without named sets) or Excel cube functions. Having said that, the post I wrote a few years ago about binding a table in Excel to an MDX query has been one of the most popular I’ve ever written, so maybe I’m wrong about how frequently people need to do this…

I’d say that you would probably only want to write your own queries when you needed complete control over the MDX and didn’t mind that it made linking the query up to filters or slicers very difficult – for example, if you wanted a list of unpaid invoices, or a list of customers that met some specific criteria, in a dashboard.

Also, when you run MDX queries in Excel you’re going to use an Excel table to show the results rather than a PivotTable. This is actually the format you need to use to pass data to other Excel-based tools like the Excel Data Mining Addins (as well as PowerPivot), so writing your own MDX queries might actually save you having to convert to formulas, as Kasper does here, or cutting/pasting in cases like this.

Why use MDX instead of DAX?

From PowerPivot V2, PowerPivot models can be queried in either MDX or the DAX query language (if you want to learn about DAX queries take a look at the posts I wrote on this topic last year, starting here), and if you’ve already learned a lot of DAX for PowerPivot you’re probably going to be more comfortable using DAX queries. However, I know there are a lot of old SSAS-fans out there doing work with PowerPivot who prefer MDX, and there are still a few things that MDX can do that DAX can’t, so choosing MDX over DAX is a legitimate choice. Examples would be when you want to pivot your resultset and put something other than measures on columns, or show a calculated member on rows, and I show how to do both of these things below.

How do I display the results of an MDX query in Excel?

As I said, when you display the results of an MDX query in Excel you’ll need to use an Excel table to do so. I blogged about a few ways to do this here but there’s actually a better way now: using DAX Studio. DAX Studio is a free Excel addin for people who want to write DAX queries against a PowerPivot model, but it can run MDX queries too. Unfortunately it doesn’t display any MDX metadata for you to use – only DAX metadata – but it’s still a much more convenient way of running MDX queries than doing a drillthrough and then editing the query property of a table.

The DAX Studio documentation gives you a good overview of how to use the tool and I won’t repeat that here, but to prove it does work here’s a screenshot of an MDX query run against a PowerPivot model:

image

OK, so get on with it and tell me how to write an MDX query…

The basic MDX query is quite simple. Books Online has all the details:

http://msdn.microsoft.com/en-us/library/ms146002.aspx
http://msdn.microsoft.com/en-us/library/ms144785.aspx

…but really all you need to know is this:

Each MDX query needs a SELECT clause. Inside the SELECT clause you need to define one or two axes, either just a columns axis or a columns axis and a rows axis, and the way you define what appears on an axis is using a set, an object we’ve seen a lot of in the last few posts in this series. Each MDX query also needs a FROM clause, with the name of the cube that is to be queried; for PowerPivot the name of the ‘cube’ is always [Model].
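As a minimal sketch of the simplest case described above, this is a query with only a columns axis. It assumes the model contains the [Sum of SalesAmount] measure used in the other examples in this post; against a different model you would substitute one of your own measures:

```mdx
-- One axis only: a single measure on columns, no rows axis
SELECT
{[Measures].[Sum of SalesAmount]}
ON COLUMNS
FROM [Model]
```

With no rows axis and no WHERE clause, a query of this shape should return a single value: the measure evaluated across the whole model.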

Here’s an example of a simple MDX query on a PowerPivot model built on Adventure Works DW that returns a measure on columns and three years on rows:

SELECT
{[Measures].[Sum of SalesAmount]}
ON COLUMNS,
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}
ON ROWS
FROM [Model]

image

Everything you do on columns, you can do on rows, and vice versa, so:

SELECT 
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}  
ON COLUMNS,
{[Measures].[Sum of SalesAmount]}
ON ROWS
FROM [Model]

Returns this:

image

Using a set of tuples on rows and/or columns gives a crosstabbed effect:

SELECT 
{[Measures].[Sum of SalesAmount]}
*
{[DimProductCategory].[EnglishProductCategoryName].[EnglishProductCategoryName].&[Bikes]
, [DimProductCategory].[EnglishProductCategoryName].[EnglishProductCategoryName].&[Clothing]}
ON COLUMNS,
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]} *
{[DimProduct].[Color].&[Black]
, [DimProduct].[Color].&[Red]}
ON ROWS
FROM [Model]

image

After the FROM clause, you can add a WHERE clause to slice the resultset. Do not confuse the MDX WHERE clause with the SQL WHERE clause: it does something similar but it doesn’t directly affect what appears on rows or columns, it filters the values returned inside the query. For example:

SELECT 
{[Measures].[Sum of SalesAmount]}
ON COLUMNS,
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}  
ON ROWS
FROM [Model]
WHERE(
[DimProductCategory].[EnglishProductCategoryName].[EnglishProductCategoryName].&[Bikes]
, [DimProduct].[Color].&[Black])

…returns sales for Black Bikes for the years 2005 to 2007:

image

Notice that the Colour Black and the Product Category Bikes don’t appear anywhere on rows or columns, but the values that are shown are for Black Bikes nonetheless.
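For comparison, here is a sketch of what happens if the same two members are moved out of the WHERE clause and crossjoined onto the rows axis instead. It reuses only the member names from the query above, and I would expect the values to be the same; the difference is that Bikes and Black should now appear as part of each row:

```mdx
-- Same slice as the WHERE clause example, but expressed on the rows axis
SELECT
{[Measures].[Sum of SalesAmount]}
ON COLUMNS,
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}
*
{([DimProductCategory].[EnglishProductCategoryName].[EnglishProductCategoryName].&[Bikes]
, [DimProduct].[Color].&[Black])}
ON ROWS
FROM [Model]
```

Which form you choose is mostly a matter of whether you want the slicing members to be visible in the resultset or not.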

The WITH clause

You can define your own calculated members (which I’ll talk about in a future post) and named sets inside a query if you add a WITH clause before your SELECT clause. Here’s an example of this:

WITH
SET [MY YEARS] AS
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}
MEMBER [DimDate].[CalendarYear].[Total 2005-7] AS
AGGREGATE([MY YEARS])
MEMBER [Measures].[Percent of Total] AS
([Measures].[Sum of SalesAmount])
/
([Measures].[Sum of SalesAmount]
, [DimDate].[CalendarYear].[Total 2005-7])
, FORMAT_STRING='PERCENT'
SELECT 
{[Measures].[Sum of SalesAmount]
,[Measures].[Percent of Total]}
ON COLUMNS,
{[MY YEARS], [DimDate].[CalendarYear].[Total 2005-7]}
ON ROWS
FROM [Model]

image

Here I’ve defined a named set called [MY YEARS] which I’ve then used to define what goes on the rows axis, and two calculated members, [Total 2005-7] which returns the subtotal of the years 2005 to 2007, and a new measure [Percent of Total] that shows the percentage that each row makes up of this subtotal. Incidentally, even though DAX can do this kind of subtotalling, it’s only in MDX that you can define any calculation you want on any axis in your query.
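A named set in the WITH clause doesn’t have to be a hard-coded list of members. As a rough sketch, the set below uses the Filter() function to return only the colours whose sales exceed a threshold. The set name [BIG COLOURS] and the threshold of 1000000 are purely illustrative, and the [Color] level name is an assumption based on the naming pattern of the other hierarchies in this post:

```mdx
WITH
-- Hypothetical set: colours with sales above an arbitrary threshold
SET [BIG COLOURS] AS
FILTER(
[DimProduct].[Color].[Color].MEMBERS
, [Measures].[Sum of SalesAmount] > 1000000)
SELECT
{[Measures].[Sum of SalesAmount]}
ON COLUMNS,
[BIG COLOURS]
ON ROWS
FROM [Model]
```

This is the kind of query you might use for a list of items that meet some specific criteria, which is hard to achieve with a hard-coded set.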

Flattened Rowsets

You might be wondering, looking at the examples above, why the column headers are all in human-unfriendly MDX and why the [Percent of Total] measure hasn’t had any formatting applied. You will also notice in this query how the name of the All Member on the [CalendarYear] hierarchy doesn’t get returned, and you get a blank row name instead:

SELECT 
{[Measures].[Sum of SalesAmount]}
ON COLUMNS,
{[DimDate].[CalendarYear].[All]
,[DimDate].[CalendarYear].&[2005]}
ON ROWS
FROM [Model]

image

This is because when you run queries that get bound to an Excel table, the results are returned as flattened rowsets and not cellsets (which is how most SSAS client tools and SQL Server Management Studio return MDX query results). Basically, this means your nice, multidimensional resultset gets squashed into something tabular – and when this happens, a lot of useful stuff gets lost along the way. Here’s the official documentation on how flattened rowsets are generated:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms716948(v=vs.85).aspx

This is a pain but, unfortunately, there’s no way around it unless you want to write your own code to render a cellset in Excel.

Conclusion

Writing your own MDX queries against a PowerPivot model isn’t exactly something you’ll need to do every day, but it’s a useful addition to your PowerPivot toolbox and I wanted to mention it in this series for the sake of completeness. In my next post I’ll be taking a look at MDX calculated members.

Written by Chris Webb

December 18, 2012 at 3:41 pm

Posted in MDX, PowerPivot

Up-to-date list of VBA Functions in MDX

with 6 comments

Some of you may be aware that a few VBA functions have been implemented as native MDX functions to improve performance. I blogged about this a few years ago, but I’ve now received an up-to-date list of all the VBA functions that this has happened for as of SSAS 2012 SP1 from those nice people on the SSAS dev team:

CDate
CDbl
CInt
CLng
CStr
Int
Month
Now
IsArray
IsError
Abs
Round
InStr
LCase
Left
Len
Mid
Right

As far as I can see, it’s Month() and LCase() that are the new ones on this list, and which were added in 2012 SP1. Still no Log10() function though, alas.
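To illustrate how these functions are typically called, here is a small sketch against Adventure Works that uses Month(), Now() and Round() from the list above inside calculated measures. The calculated measure names are made up for the example, and whether the native implementation or the VBA assembly is actually used will depend on your version of SSAS:

```mdx
WITH
-- Hypothetical measure: the month number of the current date
MEMBER [Measures].[Current Month Number] AS
Month(Now())
-- Hypothetical measure: Internet Sales Amount rounded to zero decimal places
MEMBER [Measures].[Rounded Sales] AS
Round([Measures].[Internet Sales Amount], 0)
SELECT
{[Measures].[Current Month Number]
, [Measures].[Rounded Sales]}
ON COLUMNS
FROM [Adventure Works]
```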

Thanks to Akshai and Marius for their help with this.

Written by Chris Webb

December 12, 2012 at 7:07 am

Posted in MDX

Why Corporate BI and Self-Service BI Are Both Necessary

with 60 comments

I was chatting to a friend of mine a few days ago, and the conversation turned to Microsoft’s bizarre decision to make two big BI-related announcements (about Mobile BI and GeoFlow) at the Sharepoint Partner Conference and not at PASS the week before. I’d been content to write this off as an anomaly but he put it to me that it was significant: he thought it was yet more evidence that Microsoft is abandoning ‘corporate’ BI and that it is shifting its focus to self-service BI, so that BI is positioned as a feature of Office and not of SQL Server.

My first response was that this was a ridiculous idea, and that there was no way Microsoft would do something so eye-poppingly, mind-bogglingly stupid as to abandon corporate BI – after all, there’s a massive, well-established partner and customer community based around these tools. I personally don’t think it would ever happen and I don’t see any evidence of it happening. My friend then reminded me that the Proclarity acquisition was a great example of Microsoft making an eye-poppingly, mind-bogglingly stupid BI-related decision in the past and that it was perfectly capable of making another similar mistake in the future, especially when Office BI and SQL Server BI are fighting over territory. That forced me to come up with some better arguments about why Microsoft should not, and hopefully would not, ever abandon corporate, SQL Server BI in favour of an exclusively Office-BI approach. Some of these might seem blindingly obvious, and it might seem strange that I’m taking the time to even write them down, but conversations like this make me think that the time has come when corporate BI does need to justify its continued existence.

  • From a purely technical point-of-view, while most BI Pros have been convinced that the kind of self-service BI that PowerPivot and Excel 2013 enable is important, it’s never going to be a complete replacement for corporate BI. PowerPivot might be useful in scenarios where power users want to build their own models but the vast majority of users, even very sophisticated users, are not interested in or capable of doing this. This is where BI Pros and SSAS are still needed: centralised models (whether built in SSAS Tabular or Multidimensional) give users the ability to run ad hoc queries and build their own reports without needing to know how to model the data they use.
  • Even when self-service BI tools are used it’s widely accepted (even by Rob Collie) that you’ll only get good results if you have clean, well-modelled data – and that usually means some kind of data warehouse. Building a data warehouse is something that you need BI Pros for, and BI Pros need corporate BI tools like SSIS to do this. Self-service BI isn’t about power users working in isolation, it’s really about power users working more closely with BI Pros and sharing some of their workload.
  • Despite all the excitement around data visualisation and self-service, the majority of BI work is still about running scheduled, web-based or printed reports and sending them out to a large user base who don’t have the time or know-how to query an SSAS cube via a PivotTable, let alone build a PowerPivot model. Microsoft talks about bringing BI to the masses – well, this is what the masses want for their BI most of the time, however unsexy it might seem. This is of course what SSRS is great for and this is why SSRS is by far the most widely used of Microsoft’s corporate BI tools; you just can’t do the same things with Excel and Sharepoint yet.
  • Apart from the technical arguments about why corporate BI tools are still important, there’s another reason why Microsoft needs BI Pros: we’re their sales force. One of the ways in which Microsoft is completely different from most other technology companies is that it doesn’t have a large sales force of its own, and instead relies on partners to do its selling and implementation for it. To a certain extent Microsoft software sells itself and gets implemented by internal IT departments, but in a lot of cases, especially with BI, it still needs to be actively ‘sold’ to customers. The BI Partner community have, for the last ten years or so, been making a very good living out of selling and implementing Microsoft’s corporate BI tools but I don’t think they could make a similar amount of money from purely self-service BI projects. This is because selling and installing Office in general and Sharepoint in particular is something that BI partners don’t always have expertise in (there’s a whole different partner community for that), and if self-service BI is all about letting the power users do everything themselves then where is the opportunity to sell lots of consultancy and SQL Server licenses? If partners can’t make money doing this from Microsoft software they might instead turn to other BI vendors; I’ve seen some evidence of this happening recently. And then there’ll be nobody to tell the Microsoft BI story to customers, however compelling it might be.

These are just a few of the possible reasons why corporate BI is still necessary; I know there are many others and I’d be interested to hear what you have to say on the matter by leaving a comment. As I said, I think it’s important to rehearse these arguments to counter the impression that some people clearly have about Microsoft’s direction.

To be clear, I’m not saying that it should be an either/or choice between self-service/Office BI and corporate/SQL Server BI, I’m saying that both are important and necessary and both should and will get an equal share of Microsoft’s attention. Neither am I saying that I think Microsoft is abandoning corporate BI – it isn’t, in my opinion. I’m on record as being very excited about the new developments in Office 2013 and self-service but that doesn’t mean I’m anti-corporate BI, far from it – corporate BI is where I make my living, and if SSAS died I very much doubt I could make a living from PowerPivot or Excel instead. Probably the main reason I’m excited about Office 2013 is that it finally seems like we have a front-end story that’s as good as our back-end, corporate BI story, and the front-end has been the main weakness of Microsoft BI for much too long. If Microsoft went too far in the direction of self-service we would end up with the opposite problem: a great front-end and neglected corporate BI tools. I’m sure that won’t be the case though.

Written by Chris Webb

December 5, 2012 at 11:46 pm

Posted in BI

The PASS Business Analytics Conference is not the PASS Business Intelligence Conference!

with 8 comments

The call for speakers for the new PASS Business Analytics Conference (to be held April 10-12 next year in Chicago) is now live here:
http://passbaconference.com/Speakers/CallForSpeakers.aspx

Since I think this conference is a Very Good Thing, and because I’ve been asked to help shape the agenda in an advisory capacity, I thought I’d do a little bit of promotion for it here.

The important thing I’d like to point out is that this is not just a SQL Server BI conference: it covers the whole SQL Server BI stack, certainly, but really it aims to cover any Microsoft technology that can be used for any kind of business analytics. Which other technologies actually get covered depends a lot on who submits sessions but there are no end of possibilities if you think about it. I’d love to see sessions on topics such as F#, Cloud Numerics, Sharepoint, NodeXL, GeoFlow and especially non-BI Excel topics such as array formulas, Solver and techniques like Monte Carlo simulation, for example.

This brings me to the point of this post. Obviously I’d like all the SQL Server BI Pros out there who read my blog to consider submitting a session (or if you can’t travel to Chicago, the call for speakers for SQLBits is open too) and to attend. However what I’d really like is if the SQL Server BI community could reach out to the wider Microsoft Business Analytics community to encourage them to submit sessions and to attend too. This is where your help is needed! Who do you think should be speaking at the PASS BA Conference? Do you know experts outside the realms of SQL Server BI who you could persuade to come? What topics do you think should be covered? If you’ve got any ideas or feedback, please leave a comment…

Written by Chris Webb

December 3, 2012 at 9:35 pm

Posted in BI, PASS
