Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘MDX’ Category

Defining DAX Measures In The With Clause Of An MDX Query

with 7 comments

It’s a little-known fact (but certainly not completely unknown – it was mentioned in Marco, Alberto and my SSAS Tabular book I think) that you can define measures using DAX in the WITH clause of an MDX query. This means you can write queries like the following against an SSAS Tabular model:

with
measure ‘Date’[Demo Calc] =
countrows(‘Date’)

select {measures.[Demo Calc]} on 0,
[Date].[Calendar Year].members on 1
from [Model]

image

The official documentation, such as it is, is here:
http://msdn.microsoft.com/en-us/library/hh758441.aspx

Unfortunately you can’t use it from Excel 2013 using the new ‘create calculated measure’ functionality; I also talked to the nice people behind OLAP PivotTable Extensions and there are some very good reasons why they can’t support this either.

What use is this then? You’re only going to be able to use it in scenarios where you control the generation of the MDX on the client side, such as SSRS reports, which may not be all that often; in fact, in these situations you might be better off writing the whole query in DAX. It’s only going to be useful when you need the power of MDX and DAX in the same query. For example, you might want to take advantage of DAX’s superior ability to detect multiselects, but write all your other calculations in MDX. I’m clutching at straws here though! Still, it’s an interesting thing to know about. Please leave a comment if you can thing of a situation where you can use it…

Written by Chris Webb

July 14, 2013 at 11:43 pm

Optimising Returning Customers Calculations in MDX

with 11 comments

One of the more popular blog posts from my archives (86 comments so far) is the one I wrote on “Counting New and Returning Customers in MDX”. The trouble with all of the calculations in there is that they execute in cell-by-cell mode, and therefore perform quite badly.

For example, take the following query on Adventure Works to find the number of returning customers (customers who have bought from us today and have also bought something before in the past):

with

member measures.[Returning Customers V1] as

count(

intersect(

nonempty([Customer].[Customer].[Customer].members, [Measures].[Internet Sales Amount])

,

nonempty([Customer].[Customer].[Customer].members, 

    [Measures].[Internet Sales Amount] * {null : [Date].[Date].currentmember.prevmember})

)

)

 

select {measures.[Returning Customers V1]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

 

On a cold cache this takes 47 seconds on my laptop and a quick look in Profiler shows this executes in cell-by-cell mode. In the comments on the original post Deepak Puri suggested an alternative approach using the Customer Count distinct count measure:

with

member measures.customerstodate as

aggregate(null:[Date].[Date].currentmember, [Measures].[Customer Count])

 

member measures.customerstoprevdate as

([Date].[Date].currentmember.prevmember, [Measures].customerstodate)

 

member measures.newcustomers as

measures.customerstodate - measures.customerstoprevdate

 

member measures.[Returning Customers V2] as

[Measures].[Customer Count] - measures.newcustomers

 

select {measures.[Returning Customers V2]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

Interestingly, this performs even worse than the previous query (although I would have expected it to be better). So how can we write a query that returns in a reasonable amount of time?

I haven’t found a way to do this for a calculated measure defined on the server, to be used in a true ad hoc query environment like Excel (any suggestions welcome – please leave a comment if you can do it), but I have got a way of optimising this calculation for scenarios where you have control over the MDX being used, such as in SSRS.

Here’s the query:

with

 

set customerdates as

nonempty(

[Date].[Date].[Date].members

*

[Customer].[Customer].[Customer].members

, [Measures].[Internet Sales Amount])

 

set nondistinctcustomers as

generate(

customerdates,

{[Customer].[Customer].currentmember}, all)

 

member measures.customercountsum as

sum(null:[Date].[Date].currentmember, [Measures].[Customer Count])

 

member measures.[Returning Customers V3] as

count(

intersect(

subset(nondistinctcustomers

    , (measures.customercountsum, [Date].[Date].currentmember.prevmember)

    , [Measures].[Customer Count])

,

head(nondistinctcustomers

    , (measures.customercountsum, [Date].[Date].currentmember.prevmember))

)

)

 

 

select {measures.[Returning Customers V3]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

 

On my laptop, this query executes in around 5 seconds on a cold cache. The reason it’s so much faster is also the reason it can’t be used in ad hoc queries – it uses named sets to find all the combinations of customer date needed by the query in one operation. Here’s a step-by-step explanation of how it works:

  • First of all, the customerdates set gets a set of tuples containing every single combination of day and customer where a purchase was made, using a simple Nonempty().
  • Next, the nondistinctcustomers set takes the customerdates set and removes the dates, so what we are left with is a list of customers. It’s not a list of distinct customers, however – a given customer may appear more than once. This still represents a list of the customers that bought something each day, it’s just that we no longer have any information about which day we’re looking at.
  • The customercountsum measure allows us to take the list of customers in the nondistinctcustomers set and find out which customers bought something in any given day. It’s a running sum of the Customer Count measure. This is a distinct count measure, and usually you wouldn’t use the Sum() function on a distinct count, but it’s important we do here. How is it used? For example, let’s imagine we had just three days of data: on the first day we had three customers, on the second four customers and on the third five customers. That would result in the nondistinctcustomers set containing twelve (not necessarily distinct) customers. We can then use the running sum of a distinct count of customers to find out the index of the item in nondistinctcustomers that is the last customer in the list for each day. So on day two we would have a running sum of seven, and therefore the seventh item in nondistinctcustomers gives us the last customer in the list for that day.
  • Finally, the Returning Customers V3 measure gives us the number of returning customers each day. It uses the customercountsum measure to find the subsets of the nondistinctcustomers set that represent the customers that bought on the current day and the customers that bought on all days up to yesterday, then uses the Intersect() function to find the returning customers.

Written by Chris Webb

June 28, 2013 at 3:48 pm

Aggregating the Result of an MDX Calculation Using Scoped Assignments

with 12 comments

I don’t usually like to blog about topics that I think other people have blogged about already, but despite the fact that Mosha blogged about this several years ago (in fact more than eight years ago, blimey) this particular problem comes up so often with my customers and on the MSDN Forum that I thought I should write something about it myself. So apologies if you know this already…

Here’s the problem description. If you define a calculated measure in MDX, that calculation will take place after the real measure values have all aggregated. For example, consider a super-simple cube with a Year dimension, two real measures called A and B and a calculated measure called [A * B] that returned the value of A multiplied by B. In a PivotTable you’d see the following result:

image

Note how the Grand Total for the [A * B] calculated measure is 12*16=192, and not 12+12+12+12=48. This is expected behaviour for calculated measures, and indeed the way you want your calculations to behave most of the time.

However, there are some scenarios where you want to do the calculation first and then aggregate up the result of that calculation; in our previous example that means you’d get 48 for the Grand Total instead. Currency conversion and weighted averages are common examples of calculations where this needs to happen. How can you handle this in MDX?

Let’s look at a slightly more complex example than the one above. In the following cube, based on Adventure Works data, I created a Date dimension that looks like this:

image

image

…and a Product dimension that looks like this:

image

image

I also created two measures on a fact table called A and B:

image

Now, let’s say that once again you want to calculate the value of A*B at the Date and Product granularity, and aggregate the result up. Probably the easiest way of handling this would be to do the calculation in the fact table, or in the DSV, and then bringing the result in as a new real measure. However this may not be possible with some types of calculation, or if the granularity that you want to do the calculation is not the same as the granularity of the fact table.

One way of approaching this in MDX would be to create a calculated measure like this:

CREATE MEMBER CURRENTCUBE.MEASURES.[CALC] AS
SUM(
DESCENDANTS([Date].[Calendar].CURRENTMEMBER, [Date].[Calendar].[Date])
*
DESCENDANTS([Product].[Category - Product].CURRENTMEMBER, [Product].[Category - Product].[Product])
, [Measures].[A] * [Measures].[B]);

image

The big problem with this approach (apart from the fact that it may break when you do a multi-select in certain client tools – but you could work around that) is that it is usually very, very slow indeed. Depending on the calculation, it may be unusably slow. So you need a different approach.

This is where scoped assignments come in. If you make a scoped assignment to a real measure, as opposed to a calculated measure, then the value of the assignment will aggregate up outside of the original scope. So, in this case, since you want the calculation to take place at the Date and Product granularity, if you scope on a real measure at that granularity the result of the calculation will aggregate up automatically.

The first step here is to create a new real (ie not calculated) measure for the calculation. This can be done in the DSV by creating a named calculation on your fact table which returns the value NULL:

image

You then need to create a new real measure on your measure group from this new named calculation:

image

In this example, I’ve left the AggregateFunction property of the measure to be the default of Sum, but you could use a different setting if you wanted a different type of aggregation. The next step is to process the cube, and once you’ve done that you’ll see a new measure that only returns the value 0:

image

Next, you need to create the scoped assignment on the Calculations tab of the Cube Editor. If you remember in my post last week about scoped assignments, I recommended avoiding writing scopes using user hierarchies; using only attribute hierarchies the scope statement becomes:

SCOPE([Measures].[A Multiplied By B]);
    SCOPE([Date].[Date].[Date].MEMBERS);
        SCOPE([Product].[Product].[Product].MEMBERS);
            THIS = [Measures].[A] * [Measures].[B];
        END SCOPE;
    END SCOPE;
END SCOPE;

image

One very important thing to notice: the sets I’ve used for scoping on the Dates and Products do not include the All Member: for example, [Date].[Date].[Date].MEMBERS. If you use a set that includes the All Member, such as [Date].[Date].MEMBERS, the calculation will not aggregate up correctly.

Here’s the result:

image

This is going to be much more efficient than the pure MDX calculated measure approach, though just how well the calculation performs will depend on the complexity of the calculation and the size of the area that you are scoping on.

Written by Chris Webb

May 29, 2013 at 5:11 pm

Using Scoped Assignments To Show Calendar and Financial Year-To-Date Values In The Same Calculated Measure

with 12 comments

I’ve had a lot of requests for more MDX content on my blog, so here’s something I’ve been meaning to write up for a long time: a worked example of how to use scoped assignments to implement two different types year-to-date calculation on two different hierarchies in the same dimension. Knowledge of how to use scoped assignments is the sign of a true MDX master (you can watch a video of a session I gave on the basics of scoped assignments at SQLBits here if you’re unfamiliar with them) but that’s because they can be very difficult to write and there’s surprisingly little information out there on the internet about them. They are incredibly powerful, though, and often they provide the most elegant and best-performing way to solve a problem.

Let’s start by looking at the Date dimension in the Adventure Works DW sample database, and more specifically the attributes, user hierarchies and attribute relationships:

image

image

Notice how we have two user hierarchies for Calendar Years (which start on January 1st) and Fiscal Years (which start on July 1st), called Calendar and Fiscal. Now, let’s say that you have a requirement to to show year-to-date values for a measure for both the Calendar and Fiscal hierarchies. It would be very easy to implement this as two separate calculated measures but what if you needed to show both types of year-to-date in the same calculated measure, showing Fiscal year-to-dates when the Fiscal hierarchy was used in a query and Calendar year-to-dates when the Calendar hierarchy was used in a query?

This is possible using scoped assignments. The first thing to point out, though, is that this is only going to be possible if you change the structure of the dimension. Why? Well, take a look at the Date levels of both hierarchies: they are both built using the Date attribute. If you were running a query with your YTD calculation on columns and only the Date attribute hierarchy on rows, would you expect to see Calendar or Fiscal YTD values? Certainly you couldn’t see both in the same cell, and this is the problem: if you expect to see Calendar YTD values at the bottom of your Calendar hierarchy, and Fiscal YTD values at the bottom of your Fiscal hierarchy, you need two separate Date attributes to do this. If you overwrite the values in the same cells twice using a scoped assignment, then you will only see the result of the second assignment.

Therefore, what we need to do is to create two new attributes, Calendar Date and Fiscal Date, to serve as the lowest levels of the Calendar and Fiscal hierarchies instead of the Date attribute. Here’s what the new attribute relationships look like:

image

From the end-user’s point of view nothing seems to have changed (these new attributes can have their AttributeHierarchyVisible property set to False) but this now means we have two attributes, two different slices of the cube, whose values we can overwrite separately instead of just one.

Now for the MDX. A good rule to follow when writing scoped assignments is to always use attribute hierarchies and never use user hierarchies; this is because there are rules you have to obey about the shape of the subcube of data you are overwriting with your scoped assignment. When defining a scope using only attribute hierarchies, you can only use the following types of set:

  • Every single member on the attribute hierarchy
  • Just one member on the attribute hierarchy
  • Multiple members on the attribute hierarchy not including the All Member

I wrote up a detailed set of rules for defining scopes in MDX Solutions, if you’re interested; if you don’t follow these rules you’ll get the infamous “An arbitrary shape of the sets is not allowed in the current context” error (I know a joke about that, incidentally).

For this calculation, we need to make two scoped assignments on a single calculated measure called [YTD Sales]: one to show the Fiscal YTD calculation for the Fiscal Date, Fiscal Month Name, Fiscal Quarter, Fiscal Semester and Fiscal Year attributes; and one to show the Calendar YTD calculation for the Calendar Date, Calendar Month Name, Calendar Quarter, Calendar Semester and Calendar Year attributes. When scoping on ranges of attributes like this, there’s another easy rule to follow: scope on the set of every member, including the All Member, from the attribute hierarchy at the lowest granularity attribute, and the set of every member, not including the All Member, from the highest granularity attribute. These two sets also need to be in the same, rather than separate, SCOPE statements for reasons I explain here.

This results in the following MDX:

CREATE MEMBER CURRENTCUBE.MEASURES.[YTD Sales] AS NULL;

 

SCOPE(MEASURES.[YTD Sales]);

    --Calendar YTD

    SCOPE([Date].[Calendar Date].MEMBERS, [Date].[Calendar Year].[Calendar Year].MEMBERS);

        THIS = AGGREGATE(

                PERIODSTODATE([Date].[Calendar].[Calendar Year], [Date].[Calendar].CURRENTMEMBER), 

                [Measures].[Internet Sales Amount]);

    END SCOPE;

    --Fiscal YTD

    SCOPE([Date].[Fiscal Date].MEMBERS, [Date].[Fiscal Year].[Fiscal Year].MEMBERS);

        THIS = AGGREGATE(

                PERIODSTODATE([Date].[Fiscal].[Fiscal Year], [Date].[Fiscal].CURRENTMEMBER),

                [Measures].[Internet Sales Amount]);

    END SCOPE;

END SCOPE;

Now you can see the YTD Sales calculated measure returns Calendar YTD values for the Calendar hierarchy:

image

…and it returns Fiscal YTD values for the Fiscal hierarchy:

image

There’s one last trick I want to show. It can be very difficult to know that your scoped assignment is covering the subcube you want it to cover, so while debugging scoped assignments I find it helps to assign values to the BACK_COLOR cell property so I can see exactly where I’m scoping. Here’s the MDX above with extra assignments to set the cell background colour to orange for the Calendar YTD calculation and blue for the Fiscal YTD calculations:

CREATE MEMBER CURRENTCUBE.MEASURES.[YTD Sales] AS NULL;

 

SCOPE(MEASURES.[YTD Sales]);

    --Calendar YTD

    SCOPE([Date].[Calendar Date].MEMBERS, [Date].[Calendar Year].[Calendar Year].MEMBERS);

        THIS = AGGREGATE(

                PERIODSTODATE([Date].[Calendar].[Calendar Year], [Date].[Calendar].CURRENTMEMBER), 

                [Measures].[Internet Sales Amount]);

        BACK_COLOR(THIS) = RGB(255,128,64);

    END SCOPE;

    --Fiscal YTD

    SCOPE([Date].[Fiscal Date].MEMBERS, [Date].[Fiscal Year].[Fiscal Year].MEMBERS);

        THIS = AGGREGATE(

                PERIODSTODATE([Date].[Fiscal].[Fiscal Year], [Date].[Fiscal].CURRENTMEMBER),

                [Measures].[Internet Sales Amount]);

        BACK_COLOR(THIS) = RGB(64,128,255);

    END SCOPE;

END SCOPE;

This now shows up in an Excel PivotTable as you can see below:

image

image

It can also help you understand what’s going on in the scenarios where the scopes overlap, for example if you put the Calendar and Fiscal hierarchies on rows and columns in the same PivotTable: the Fiscal hierarchy takes precedence, because it’s defined second.

image

Written by Chris Webb

May 23, 2013 at 3:27 pm

Cloud-Based DAX and MDX Formatter

with 4 comments

It’s been away for a while, but now it’s back – the site where you can enter your MDX or DAX code and have it formatted for you! Here’s the link:

http://formatmdx.azurewebsites.net/

image

Thanks to Nick Medveditskov, the man responsible for it.

Written by Chris Webb

April 17, 2013 at 11:06 pm

Posted in DAX, MDX

The Dangers of Non_Empty_Behavior

with 5 comments

One thing I see quite often when I’m reviewing MDX is the incorrect use of the Non_Empty_Behavior property. 99% of the time it’s set without the developer really knowing what it does, and luckily in most cases it makes no difference to performance or what the calculation returns. However… in this post I’m going to argue that you should not use it unless you really know what you’re doing and you can prove that it’s beneficial – because if it’s set incorrectly it can sometimes mean you get incorrect results from your queries.

First of all, what does Non_Empty_Behavior actually do? Basically, it’s a hint to the SSAS Formula Engine that says that a calculation will return a null value if a given tuple also returns a null value. Moreover it’s a hint that may or may not be ignored. For example, if you had an MDX calculated measure that was defined as

CREATE MEMBER CURRENTCUBE.[Measures].[MyMeasure] as [Measures].[Internet Sales Amount]*2;

It would be true to say that this calculated measure would always return a null value when the real measure [Measures].[Internet Sales Amount] returned null, so you would be able to say:

CREATE MEMBER CURRENTCUBE.[Measures].[MyMeasure] as [Measures].[Internet Sales Amount]*2
, NON_EMPTY_BEHAVIOR=[Measures].[Internet Sales Amount];

The late, great Mosha covered its use in detail in a number of blog posts from a few years ago:
http://sqlblog.com/blogs/mosha/archive/2006/11/05/budget-variance-a-study-of-mdx-optimizations-evaluation-modes-and-non-empty-behavior.aspx
http://sqlblog.com/blogs/mosha/archive/2007/03/06/averages-ratios-division-by-zero-and-non-empty-behavior.aspx

These posts all refer to SSAS 2005, and from SSAS 2008 onwards a lot of work went on to ensure that most of the scenarios where Non_Empty_Behavior had been necessary to get good performance were optimised away. Certainly, in the simple example I showed above there would be no need to set Non_Empty_Behavior at all – the Formula Engine will already be able to optimise it as much as it can. This means that if you’re on SSAS 2005 then Non_Empty_Behavior can be very useful for tuning calculations, but if you are using SSAS 2008 or greater it’s highly likely that you do not need to use Non_Empty_Behavior at all. In fact, Mosha’s advice back in 2009 when he did a preconference seminar at the PASS Summit on SSAS performance tuning was summed up in these two slides:

image

image

I think the message here is pretty clear…

So, it probably won’t help your query performance, but why is it dangerous? Consider the following query:

WITH
MEMBER MEASURES.SomeTest AS
IIF(Now()>0, "Internet", "Reseller")
MEMBER MEASURES.Demo as
STRTOMEMBER("[Measures].[" + measures.sometest + " Sales Amount]")
SELECT
{[Measures].Demo} 
ON 0,
NON EMPTY
[Product].[Category].[Category].members
*
[Date].[Date].[Date].members
ON 1
FROM [Adventure Works]

This returns 1887 rows. However, if I set Non_Empty_Behavior incorrectly, like so:

WITH
MEMBER MEASURES.SomeTest AS
IIF(Now()>0, "Internet", "Reseller")
MEMBER MEASURES.Demo as
STRTOMEMBER("[Measures].[" + measures.sometest + " Sales Amount]")
, NON_EMPTY_BEHAVIOR=[Measures].[Reseller Sales Amount]
SELECT
{[Measures].Demo}
ON 0,
NON EMPTY
[Product].[Category].[Category].members
*
[Date].[Date].[Date].members
ON 1
FROM [Adventure Works]

The query now returns only 64 rows! By saying that the calculation will return a null value when Reseller Sales Amount is null – which is not the case – then rows that do in fact have values will get filtered out by the non empty clause. Clearly this is a bad thing, and also one that may be very hard to see happening when you are testing.

The example above was quite obviously wrong, and in fact it’s quite hard on SSAS 2012 to find a query where Non_Empty_Behavior returns incorrect results. More often I see calculations like this:

WITH
MEMBER MEASURES.PreviousSales AS
([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER)
, NON_EMPTY_BEHAVIOR=[Measures].[Internet Sales Amount]
SELECT {[Measures].[Internet Sales Amount], measures.PreviousSales} ON 0,
[Date].[Calendar].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]

…where Non_Empty_Behavior is set incorrectly (the calculation returns the Internet Sales Amount for the previous time period, and it’s not true to say that the previous time period’s sales will be null if the current time period’s sales are null) but the query still returns the correct results. This is because, luckily, Non_Empty_Behavior is being ignored.

So, to sum up, if you are using SSAS 2008 or greater Non_Empty_Behavior may not solve your performance problem and may result in incorrect results. That is not to say that there are still one or two isolated scenarios where it can still be useful. Jorg Klein came across one fairly recently and blogged about it here:
http://sqlblog.com/blogs/jorg_klein/archive/2013/02/18/implement-ssas-md-cell-security-using-dimension-security-with-blazing-performance.aspx

But these scenarios are few and far between. Therefore don’t use it if you’re not sure what it does!

Written by Chris Webb

March 11, 2013 at 1:30 pm

MDX on Cloudera Impala

with 5 comments

You may have seen today’s announcement (and comment from Mary-Jo Foley) on the official SQL Server blog about HortonWorks Data Platform for Windows; I won’t repeat what it says, but it’s clear Microsoft is taking Hadoop and Big Data (yuk, I hate that term, but sometimes I can’t avoid using it…) very seriously. However, yesterday I saw another, very interesting announcement from the guys at Simba that presents an alternative vision for Microsoft BI and Big Data. You can see a demo of it here:

Simba MDX Provider for Impala

There are some more details here:
https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/impala-user/BniN1ELYNDs

Basically, Simba have built an OLEDB for OLAP Provider that translates the MDX generated by Excel PivotTables into HiveQL queries against Cloudera Impala. You can read a good, short overview of what Impala is here:
http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/
and there’s more detailed information here:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Impala+Frequently+Asked+Questions

In summary, what this gives you is fast, interactive analysis from within Excel going direct against large amounts of data stored via Impala, with no need for users to have to write Hive queries or stage the data in PowerPivot in the way the current Microsoft/Hortonworks solution does. Even more interesting is the fact that Simba support MDX calculated members and not just MDX queries, so given that Excel 2013 allows you to define your own calculated members you could do some very powerful analysis this way. Well, those of us who know some MDX, at least :-)

Impala is open source and isn’t shy about the debt it owes to Google’s Dremel, which of course is available publicly now as BigQuery. When I first saw BigQuery I thought putting an MDX interface over the top would make it appealing to a much wider audience – maybe not a good thing for Microsoft, but it would be a clever move on the part of Google certainly. Microsoft hasn’t announced that it’s working on anything comparable to BigQuery, alas; already, BI tools like Tableau and BIME can connect to and query BigQuery (Tableau also connects to Amazon’s new Redshift database too), and it’s these tools that are Excel’s big competitors in the BI client tool space. I guess SSAS 2012 Tabular in DirectQuery mode going against PDW would be the only vaguely similar Microsoft solution, but PDW is on-prem only and pretty expensive. Translating MDX to the SQL used by tools like Impala, in the way that the Simba MDX Provider does, puts Excel on a more equal footing with Tableau et al. I don’t know how Simba/Cloudera will be making this MDX Provider available but I would be surprised if they didn’t charge for it; Microsoft’s close relationship with Hortonworks, a competitor to Cloudera, makes me think that Microsoft might not want to promote this particular tool either, which is a shame. Maybe an acceptable solution for MS would be to build new cartridges for SSAS 2012 and enable DirectQuery for data sources other than SQL Server? It’s unlikely to happen, I think, but it would be an option.

UPDATE: While I was writing this post, Cathy Dumas of Simba (you were wondering what happened to her, weren’t you?) also blogged about this

http://blogs.simba.com/simba_technologies_ceo_co/2013/02/demo-microsoft-excel-pivottables-on-cloudera-impala-via-simba-mdx-provider.html

Written by Chris Webb

February 25, 2013 at 11:38 pm

Posted in Excel, MDX

Tagged with

Counting Consistent Customers in MDX

with 4 comments

The post I wrote a few years ago on counting new and returning customers in MDX has proved to be one of the most popular here; it’s certainly a topic I’ve returned to a number of times for DAX, and other people (most recently Gerhard Brueckl) seem equally interested in solving this problem. However I had a comment from Sohrab Niramwalla yesterday which raised an interesting question: how do you could the number of customers who have bought from you in every time period from the beginning of time until the current date?

The Customer dimension in Adventure Works isn’t actually very good for illustrating this because customers very rarely buy more than once, but if you look at Countries then you can recreate the scenario. Consider the following query:

select
{[Measures].[Internet Sales Amount]} *
[Customer].[Country].[Country].members on 0,
[Date].[Date].[Date].members
on 1
from [Adventure Works]

image

From the screenshot you can see that on July 1st 2001 there were purchases in 4 out of 6 countries; of those 4 countries, only 2 (Australia and USA) had purchases on July 2nd; both of those had purchases on July 3rd; but by July 4th only USA had seen purchases on every day since the beginning of time.

How then is is possible to get this count of the number of countries that have seen purchases since the beginning of time? I can think of a few approaches. One would involve using recursion and strtoset/settostr, but I generally like to avoid recursion and strtoset because performance can be unpredictable and bad. Probably the best way is to think of the problem like this: if a country has had sales on every day since the beginning of time, then count of days that had sales since the beginning of time will be the same as the total count of days since the beginning of time. Therefore, you can write a query something like this:

with
–count the number of days since the beginning of time
member measures.daycount as
count(null:[Date].[Date].currentmember)

–count the number of non empty days for sales since the beginning of time
member measures.nonemptydaycount as
count(
nonempty(
null:[Date].[Date].currentmember
, [Measures].[Internet Sales Amount])
)

–count the number of countries that have nonemptydaycount = daycount
member measures.[Consistent Customers] as
count(
filter(
[Customer].[Country].[Country].members
, measures.nonemptydaycount = measures.daycount)
)
select measures.[Consistent Customers] on 0,
head([Date].[Date].[Date].members, 15) on 1
from [Adventure Works]

image

Mosha’s technique for optimising count(filter()) calculations might also be useful here, although I have to admit I’ve not tested it to see if it does improve performance.

However, for those of you who are fans of obscure MDX, here’s another solution:

with
member measures.[Consistent Customers] as
count(
–iterate over every date since the beginning of time
generate(
{null : {{[Date].[Date].currentmember} as currentdate}.item(0)}
,
{
–if the current date in the iteration is the first date
iif([Date].[Date].currentmember is [Date].[Date].[Date].members.item(0)
,
–then define the inline named set customerset as
–all the nonempty customers
intersect(
nonempty(
[Customer].[Country].[Country].members
, [Measures].[Internet Sales Amount]) as customerset
, {})
,
–else, redefine the set customerset as the intersection of
–customerset and the nonempty countries in the current time period
intersect(
intersect(
customerset
, nonempty(
[Customer].[Country].[Country].members
, [Measures].[Internet Sales Amount])
) as customerset
, {})
)
–note that the intersect function is used to ensure only an empty
–set is ever returned from these expressions
,
–if the current date in the iteration is the current time period
–ie we are at the final iteration
iif([Date].[Date].currentmember is currentdate.item(0)
–then return the contents of the named set customerset
, customerset
, {})
}
))

select {measures.[Consistent Customers]} on 0,
head(
[Date].[Date].[Date].members
, 15)
on 1
from [Adventure Works]

 

It uses the generate() function to loop over ever date from the beginning of time to the current time period, and then redefines an inline named set (called customerset) on every step of the iteration to find the non empty countries. It doesn’t perform as well as the previous solution in this particular case, but if there were more countries that might change; a few tweaks to the code might also speed it up. I thought it was worth mentioning, though, for the novelty value.

Written by Chris Webb

February 24, 2013 at 9:25 pm

Posted in Analysis Services, MDX

TopCounts With Ties In MDX

leave a comment »

Pretty much everyone that knows MDX knows the TopCount() function, which is used to find the top n items in a set – it can be used for finding your top 10 products, your top 20 salespeople, and so on. However most people don’t consider it’s biggest drawback: it always returns n items, and doesn’t take ties into account.

Consider the following query on Adventure Works, which returns the top 9 dates by Internet Order Count:

SELECT {[Measures].[Internet Order Count]} ON 0,
TOPCOUNT(
[Date].[Date].[Date].MEMBERS
, 9
,[Measures].[Internet Order Count])
ON 1
FROM [Adventure Works]

image

Now look at this query, which does the same thing but returns the top 10 dates:

SELECT {[Measures].[Internet Order Count]} ON 0,
TOPCOUNT(
[Date].[Date].[Date].MEMBERS
, 10
,[Measures].[Internet Order Count])
ON 1
FROM [Adventure Works]

image

Notice how June 11 2004 and June 17 2004 both have the same Internet Order Count of 86, but the latter date isn’t included in the first query. From this you can see that it’s important, when you’re doing a top n query, not to always return n items but to take tied values into account.

Luckily there’s a way of doing this. If you can download and install the dll from the Analysis Services Stored Procedure Project, you can use the TopCountWithTies() function that it provides, like so:

SELECT {[Measures].[Internet Order Count]} ON 0,
ASSP.TOPCOUNTWITHTIES(
[Date].[Date].[Date].MEMBERS
, 9
,[Measures].[Internet Order Count])
ON 1
FROM [Adventure Works]

image

Unfortunately, installing third-party dlls is not always allowed in a production environment, and SSAS 2012 Tabular doesn’t support dlls at all. However there is a way of getting the same result in pure MDX. Here’s an example:

WITH
SET TOP9 AS
TOPCOUNT(
[Date].[Date].[Date].MEMBERS
, 9
,[Measures].[Internet Order Count])
MEMBER MEASURES.TIEDRANK AS
RANK(
[Date].[Date].CURRENTMEMBER
, TOP9
, [Measures].[Internet Order Count])
SET TOP9WITHTIES AS
ORDER(
FILTER(
[Date].[Date].[Date].MEMBERS
, MEASURES.TIEDRANK>0 AND MEASURES.TIEDRANK<=9)
, MEASURES.TIEDRANK
, BASC)

SELECT {[Measures].[Internet Order Count], MEASURES.TIEDRANK} ON 0,
TOP9WITHTIES
ON 1
FROM [Adventure Works]

image

What I’m doing here is first using the TopCount() function to find the set of the top 9 dates (this step isn’t strictly necessary, but it has a significant positive impact on performance), and then using the Rank() function with the third parameter to find the tied rank. The Rank() function should find the position of a member in a set, but when the third parameter for it is specified it doesn’t match members based on the members themselves but on the values of the tuple specified in the third parameter. Hence, even when I have  a date that doesn’t appear in the set TOP9, the Rank() function can still return a value for it if that date has a value for Internet Order Count that does appear in that set.

Written by Chris Webb

February 9, 2013 at 11:05 am

Posted in MDX

Validating MDX Queries Without Running Them

with 3 comments

Here’s something interesting I came across while looking over some old documentation the other day: a way of checking whether an MDX query is syntactically correct without actually having to run it. You can do this by setting the Content connection string property. The default value for this property is:

Content=SchemaData

And this runs your queries as normal. For example, take the following query on the Adventure Works cube:

select {[Measures].[Internet Sales Amount]} on 0,
[Date].[Calendar Year].members on 1
from [Adventure Works]

On a cold cache I can see lots of activity on Profiler when it’s run, as I’d expect:

image

However, with the connection string set as follows (see here for how to do this in SQL Server Management Studio; but beware – this bug is still around in 2012) :

Content=Schema

When I rerun the query on a cold cache I can see the MDX Script being evaluated but nothing happening for the query:

image

In SQL Server Management Studio the query is executed successfully but no results are returned; notice that in the Resource Usage event all the values are zero too.

If I modify the query to include an error, however:

select {[Measures].[Internet Sales Amount]} on 0,
blah blah blah
[Date].[Calendar Year].members on 1
from [Adventure Works]

I do see an error in SQL Server Management Studio:

image

This functionality could be useful in situations where you wanted to test the syntax of an MDX query or indeed just a calculation – it would allow you to do this without actually running the query and then killing it (and some queries don’t die immediately when they’re cancelled, as you might know).

Written by Chris Webb

January 29, 2013 at 11:53 am

Posted in Analysis Services, MDX

Follow

Get every new post delivered to your Inbox.

Join 2,868 other followers