Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Counting Consistent Customers in MDX

with 4 comments

The post I wrote a few years ago on counting new and returning customers in MDX has proved to be one of the most popular here; it’s certainly a topic I’ve returned to a number of times for DAX, and other people (most recently Gerhard Brueckl) seem equally interested in solving this problem. However I had a comment from Sohrab Niramwalla yesterday which raised an interesting question: how do you could the number of customers who have bought from you in every time period from the beginning of time until the current date?

The Customer dimension in Adventure Works isn’t actually very good for illustrating this because customers very rarely buy more than once, but if you look at Countries then you can recreate the scenario. Consider the following query:

select
{[Measures].[Internet Sales Amount]} *
[Customer].[Country].[Country].members on 0,
[Date].[Date].[Date].members
on 1
from [Adventure Works]

image

From the screenshot you can see that on July 1st 2001 there were purchases in 4 out of 6 countries; of those 4 countries, only 2 (Australia and USA) had purchases on July 2nd; both of those had purchases on July 3rd; but by July 4th only USA had seen purchases on every day since the beginning of time.

How then is is possible to get this count of the number of countries that have seen purchases since the beginning of time? I can think of a few approaches. One would involve using recursion and strtoset/settostr, but I generally like to avoid recursion and strtoset because performance can be unpredictable and bad. Probably the best way is to think of the problem like this: if a country has had sales on every day since the beginning of time, then count of days that had sales since the beginning of time will be the same as the total count of days since the beginning of time. Therefore, you can write a query something like this:

with
–count the number of days since the beginning of time
member measures.daycount as
count(null:[Date].[Date].currentmember)

–count the number of non empty days for sales since the beginning of time
member measures.nonemptydaycount as
count(
nonempty(
null:[Date].[Date].currentmember
, [Measures].[Internet Sales Amount])
)

–count the number of countries that have nonemptydaycount = daycount
member measures.[Consistent Customers] as
count(
filter(
[Customer].[Country].[Country].members
, measures.nonemptydaycount = measures.daycount)
)
select measures.[Consistent Customers] on 0,
head([Date].[Date].[Date].members, 15) on 1
from [Adventure Works]

image

Mosha’s technique for optimising count(filter()) calculations might also be useful here, although I have to admit I’ve not tested it to see if it does improve performance.

However, for those of you who are fans of obscure MDX, here’s another solution:

with
member measures.[Consistent Customers] as
count(
–iterate over every date since the beginning of time
generate(
{null : {{[Date].[Date].currentmember} as currentdate}.item(0)}
,
{
–if the current date in the iteration is the first date
iif([Date].[Date].currentmember is [Date].[Date].[Date].members.item(0)
,
–then define the inline named set customerset as
–all the nonempty customers
intersect(
nonempty(
[Customer].[Country].[Country].members
, [Measures].[Internet Sales Amount]) as customerset
, {})
,
–else, redefine the set customerset as the intersection of
–customerset and the nonempty countries in the current time period
intersect(
intersect(
customerset
, nonempty(
[Customer].[Country].[Country].members
, [Measures].[Internet Sales Amount])
) as customerset
, {})
)
–note that the intersect function is used to ensure only an empty
–set is ever returned from these expressions
,
–if the current date in the iteration is the current time period
–ie we are at the final iteration
iif([Date].[Date].currentmember is currentdate.item(0)
–then return the contents of the named set customerset
, customerset
, {})
}
))

select {measures.[Consistent Customers]} on 0,
head(
[Date].[Date].[Date].members
, 15)
on 1
from [Adventure Works]

 

It uses the generate() function to loop over ever date from the beginning of time to the current time period, and then redefines an inline named set (called customerset) on every step of the iteration to find the non empty countries. It doesn’t perform as well as the previous solution in this particular case, but if there were more countries that might change; a few tweaks to the code might also speed it up. I thought it was worth mentioning, though, for the novelty value.

Written by Chris Webb

February 24, 2013 at 9:25 pm

Posted in Analysis Services, MDX

4 Responses

Subscribe to comments with RSS.

  1. Chris, thanks for the answer, and for taking the time to provide a thorough explanation. I tried to bend the original query to suit my needs for a while, and realized that I was on the wrong path.

    Sohrab Niramwalla (@scrampy)

    February 25, 2013 at 2:41 pm

  2. Chris, after testing both methods on my implementation over the past day, I found that though your first answer was “more readable”, it did not perform well (at all?) with my customer dimension that has about 120,000 members. Pondering your “obscure MDX” for a while was more what I was looking for: “sliding a set through time, and modifying its contents as we go”. I must admit, I have not really used generate() beyond textbook exercises, but with a little tweaking, and redifining “consistent” to mean “having purchased in each of the past three months”, the generate/intersect approach just FLIES! I got my result set back, and
    in just under three seconds. Perhaps I botched something on translating the count(filter()) approach, but thanks for posting multiple approaches!

    Am I right in guessing that the second approach will also let me use the set of consistent customers in future steps, whereas the count(filter()) will only ever let me know how many, but not who?

    Sohrab Niramwalla (@scrampy)

    February 26, 2013 at 2:56 pm

    • Hi Sohrab,

      That’s interesting – I’m glad one of the solutions worked out!

      Re your question, it depends on what you mean by ‘future steps’. You certainly wouldn’t be able to refer to the inline named set in another calculation; you might also have problems referring to the named set in other parts of the same calculation. You could copy the calculation (I think you’d need to use a different name for the set if you had two calcs using this approach) and alter it so you showed the names of the customers, using the version of Generate() that concatenates strings (see http://www.mdxpert.com/Functions/MDXFunction.aspx?f=109).

      Chris

      Chris Webb

      February 26, 2013 at 3:02 pm

    • Sohrab, Could you post your tweaked MDX?

      Thanks

      Kerry

      October 15, 2013 at 4:18 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 2,868 other followers

%d bloggers like this: