Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Last Ever Non Empty – a new, fast MDX approach

with 46 comments

The last non empty semi-additive measure aggregation functionality in SSAS enterprise edition is very useful, but it doesn’t support one common business requirement: while it will give you the last non empty value within any given time period, it doesn’t handle the variation where you want to get the last non empty value of a measure from all preceding time periods (this is what I’m calling the ‘last ever non empty’ value). There are a number of business scenarios where you’d want to do this, for example finding the value of the last purchase a customer made, the last price you sold a product at, and the stock level of a product in a shop the last time a sales rep visited. Traditional MDX solutions to this problem have suffered from poor performance but in this blog post I’ll describe a new approach that performs much better; I think it will be very useful to a lot of people, and I’m quite proud of it!

Let’s take the following MDX query on Adventure Works as an example of the problem:

SELECT
HEAD([Customer].[Customer].[Customer].MEMBERS, 10)
*
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
ON 1
FROM [Adventure Works]

Here’s part of the results:

image

From this we can see that individual customers only bought from us once or twice. Now, for any date, let’s create a calculation that will find what the value of the last purchase by any given customer was, regardless of however long ago it was. Up until last week I’d have tackled this problem using a combination of the NonEmpty and Tail functions – for each customer and date, get the set of all preceding dates, find the dates which had values and find the value of the last date. Here’s the code:

WITH
MEMBER MEASURES.[Last Sale Original] AS
TAIL(
NONEMPTY({NULL:[Date].[Date].CURRENTMEMBER} * [Measures].[Internet Sales Amount])
).ITEM(0)

SELECT
HEAD([Customer].[Customer].[Customer].MEMBERS, 10)
*
{[Measures].[Internet Sales Amount],MEASURES.[Last Sale Original]}
ON 0,
[Date].[Date].[Date].MEMBERS
ON 1
FROM [Adventure Works]

And here’s the part of the results dealing with the first customer, Aaron A. Allen:

image

On my laptop the query takes 14 seconds to run, and that’s with only 10 customers on columns (it executes in cell-by-cell mode, I think); in many real world scenarios this kind of performance isn’t acceptable and that was certainly the case with the customer I was working with last week. So I came up with the following new MDX that does the same thing much faster:

WITH 

MEMBER MEASURES.DAYSTODATE AS
COUNT(NULL:[Date].[Date].CURRENTMEMBER)-1

MEMBER MEASURES.HADSALE AS
IIF([Measures].[Internet Sales Amount]=0, NULL, MEASURES.DAYSTODATE)

MEMBER MEASURES.MAXDATE AS
MAX(NULL:[Date].[Date].CURRENTMEMBER, MEASURES.HADSALE)

MEMBER MEASURES.LASTSALE AS
IIF(ISEMPTY(MEASURES.MAXDATE), NULL,
([Measures].[Internet Sales Amount],
[Date].[Date].[Date].MEMBERS.ITEM(MEASURES.MAXDATE)))

SELECT
HEAD([Customer].[Customer].[Customer].MEMBERS, 10)
*
{[Measures].[Internet Sales Amount]
,MEASURES.[LASTSALE]}
ON 0,
[Date].[Date].[Date].MEMBERS
ON 1
FROM [Adventure Works]

On my laptop this query now executes in 3 seconds. Here’s what it’s doing:

  • First of all the DaysToDate measure returns the zero-based index of the current date within the set of all dates, so the first date in the time dimension would have index 0, the second 1 and so on. This could be replaced by a real measure to get slightly better performance but I left it as a calculated measure for the sake of clarity.
  • Next, the measure HadSale returns the index of the current date if it has a value and null otherwise.
  • Next, the measure MaxDate returns the maximum value of HadSale for the set of all dates from the beginning of time up to the current date. This will give us the index of the last date which had a value.
  • Finally we can take this index and, using the Item function, get the value of Internet Sales Amount for the last date that had a value.

If we want to take this approach and apply it to a server-based calculation, and make it work at all levels on the Date dimension, we need a slight variation. Again using the Adventure Works cube to illustrate, here’s what you need to do…

First of all, you need to create a new column in your fact table that contains only null values and use this as the basis of a new real (ie not calculated) measure, which should be called MaxDate. This should have the aggregation function Max.

image

You then need to add the following code to the MDX Script of the cube:

CREATE MEMBER CURRENTCUBE.MEASURES.DAYSTODATE AS
COUNT(NULL:[Date].[Date].CURRENTMEMBER)-1
, VISIBLE=FALSE;

CREATE MEMBER CURRENTCUBE.MEASURES.HADSALE AS
IIF([Measures].[Internet Sales Amount]=0, NULL, MEASURES.DAYSTODATE)
, VISIBLE=FALSE;

SCOPE(MEASURES.MAXDATE, [Date].[Date].[Date].MEMBERS);
    THIS = MAX(NULL:[Date].[Date].CURRENTMEMBER, MEASURES.HADSALE);
END SCOPE;

CREATE MEMBER CURRENTCUBE.MEASURES.LASTSALE AS
IIF(ISEMPTY(MEASURES.MAXDATE), NULL,
([Measures].[Internet Sales Amount],
[Date].[Date].[Date].MEMBERS.ITEM(MEASURES.MAXDATE)));

This does basically the same as the previous example only now MaxDate is a real measure instead of a calculated measure, and we’re using a scoped assignment to overwrite its value at the Date level. Above the Date level the default aggregation method of the MaxDate measure kicks in and we see the Max value of MaxDate for all dates in the current time period – which means at the month, quarter and year level we once again get the index of the last non empty date. Here’s what the result looks like in the cube browser:

image

Advertisement

Written by Chris Webb

March 24, 2011 at 10:34 pm

Posted in MDX

46 Responses

Subscribe to comments with RSS.

  1. Very useful script!! thanks for that Chris.

    There are lots of practical examples where this will be useful in the retail sector; with the examples that you cited, but also on tables that contain only “Starting Dates” for prices for example (to find the latest “ever” active price).

    Also thinking about it, it might open the door for more in depth basket analysis…

    Philippe

    Philippe Harel

    March 24, 2011 at 11:29 pm

  2. HI Chris this is beautifull discovery :) I recently used the “lookup” technique on some ssas forum thread but never thought of using it this way. Great finding.
    Hrvoje

    Hrvoje Piasevoli

    March 25, 2011 at 11:10 am

  3. Really innovative way of thinking… Pretty sure this would come into use in a lot of manufacturing/retail scenarios. I really couldn’t believe it when the results came in less than 3 seconds.
    Great work Chris!!! ( as usual ;) )

    Jason Thomas

    March 25, 2011 at 12:25 pm

  4. Cool :) Wouldn’t a recursion do similarly well?

    WITH y AS
    IIF([Internet Sales Amount]=0,
    ([Date].[Date].PrevMember,y),
    [Measures].[Internet Sales Amount])

    Boyan Penev

    March 25, 2011 at 3:02 pm

    • I’ve had a lot of bad experiences with recursive calculations in the past – in some cases they perform ok, but in many cases the performance is bad and unpredictable – so I try to avoid them.

      In any case, I’ve just tested the same query with 1000 customers and with my approach it returns in 1:28; with a recursive calculation it returns in 1:37, so there’s still a slight advantage to my method.

      Chris Webb

      March 25, 2011 at 4:16 pm

  5. Very nice Chris. I will use this implemenntation. Thanks for helping us out with faster/better MDX.

    Paul Goldy – WhiteCloudAnalytics

    Paul Goldy

    March 25, 2011 at 8:23 pm

  6. Very nice and once again innovative thinking outside-of-the-box. I have used recursive in the past and have not been a fan of that approach at all. Will definitely review my process with this approach. Thanks for sharing.

    denglishbi

    March 26, 2011 at 3:18 pm

  7. Its interesting that recursion seems to work better for smaller column sizes and chris’s method works better for larger ones. I have the cutover at ~40-50. Anyway novel approach!

    Paul te Braak

    March 26, 2011 at 10:53 pm

    • I think I need to do some more research on recursive calculations – there are definitely some scenarios where they will outperform this approach but I don’t think the number of columns is a good guide on its own. Possibly the sparsity of the data is also a factor.

      One problem I do know of with recursive calculations is that they cause ‘cache fragmentation’, where the SE and FE caches get filled up with a large number of small subcubes.

      Chris Webb

      March 27, 2011 at 10:23 am

  8. [...] Chris Webb has come up with a really fast way of performing this traditionally poor performaning requirement. You can find a full explanation of this here [...]

  9. [...] Webb posted a wonderful solution to the Last “ever” non empty calculation in MDX. I never thought to this solution and it is really a nice approach. I think that DAX might be [...]

  10. Actually is not so new )) first it was discussed here
    http://www.sql.ru/forum/actualthread.aspx?tid=595614
    2.5 years ago and several times later

    But you have provided a good overview of the method.

    Sergey

    March 29, 2011 at 8:44 am

    • Interesting – as far as I can tell from Google Translate it looks similar, although not quite the same (it’s using a Max measure on the date key, which won’t work if there’s no data at all in the current time period).

      Chris Webb

      March 29, 2011 at 9:42 am

      • Yes, not quite the same.
        Actually i would start the observation of the methods from the
        1. ability to calculate dimension keys to obtain dimension members. Then we can note that base of the
        2. key calculation can be obtained from the result of regular measures calculation

        but as i have noticed already you have done a great job to bring the idea to the wide world. Just the idea is not so new ))

        Sergey

        March 29, 2011 at 6:51 pm

  11. [...] calculations recently (in fact, I’ve had similar problems in the past as I say in the comments here) and Jeffrey has been helping me; he’s also written up his explanation in a blog post here: [...]

  12. Thanks for great post.

    You know that lastnonempty semi-additive measure aggregation is not sopported In SSAS Standard Edition

    With this approach it will be available in SSAS STD.
    It is easier and faster to find Inventory Levels of Products by this method.
    And it also support MultiSelect in Excel. (Famous MultiSelect problem in Excel).

    Thank you very much.

    cemuney

    July 13, 2011 at 10:07 am

  13. Hi Chris,

    Always helpful as usual. I am currently stuck with a problem related to LastNonEmpty, not sure this is related to this specific post, but definitely need your help (an MDX Guru).
    For each inventory movement, I am calculating the onhandPrice. This involves many dimensions. To aggregate this Onhand, I chose LastNonEmpty. But since not all members of each dimension are present on each date (combination of products/stores/colors..etc.), thus when trying to calculate the onhand for e.g. 2010-01-01, the LastNonEmpty is only summing up the onhand in the Leaves that happened to have movements on that date.

    This is an example, say these are all movements:
    Date Product Store OnhandPrice
    2009-12-30 P1 S1 900$
    2009-12-31 P1 S1 1000$
    2010-01-01 P2 S2 500$
    2010-01-01 P3 S3 800$

    Ideally, total Onhand for 2010-01-01 should be 1000+500+800.=2300 But the lastnonempty for the date of 2010-01-01 only brings the onhand for the movements that occured on 2010-01-01, being 500+800=1300.

    I found a similar problem here:
    http://www.sqldev.org/sql-server-analysis-services/lastnonempty-does-not-sum-across-non-time-dimensions-32191.shtml

    But doesn’t seem to have found a solution. I hope I explained well my problem.

    Greatly appreciating your help.

    Thanks,
    Amir.

    Amir

    August 4, 2011 at 5:25 pm

    • Hi Amir,

      This is a similar problem to the one I describe. It’s difficult to come up with an exact solution without working on the cube itself, but I think the approach would be to use something like the scoped assignment version of the calculation above but scope at the root of every dimension, not just the time dimension.

      Chris

      Chris Webb

      August 8, 2011 at 5:33 pm

  14. Thank you Chris. I am trying the scope assignment but it doesn’t seem to affect the measure. Here is what I made: I defined [measures].[onhand last], as the net of INs – OUTs of the inventory sliced by Product/Store/Color/Size, with LastNonEmpty aggregation.

    The following solution fills the future dates for each movement for a certain product appropriately (just like your measure Original fills 3399.99 for dates later than 4 June), but doesn’t aggregate when applied on many products/stores/colors/sizes.

    member [measures].[lastnonemptyOnhand] as
    tail(nonempty({null:[Time].[Calendar].currentmember}
    *[measures].[onhand last])).item(0)

    For example:

    –where {(Product1, Store1, Size1, Color1)}
    [onhand last] [lastnonemptyOnhand]
    June-7 60 60
    June-8 100 100

    –where {(Product1, Store1, Size1, Color2)}
    [onhand last] [lastnonemptyOnhand]
    June-7 50 50
    June-8 NULL 50

    –where {(Product1, Store1, Size1, Color1), (Product1, Store1, Size1, Color2)}
    [onhand last] [lastnonemptyOnhand]
    June-7 110 110
    June-8 100 100 (it should be 150)

    So the problem is: calculated measure ([lastnonemptyOnhand]) aggregates its measuregroup-measure ([onhand last]) then applies its formula based on the aggregated value. What needed is: making the calculation (formula above) on each member (each color in my example) then making aggregation.

    Thanks again for your help.

    Amir.

    Amir

    August 17, 2011 at 4:54 pm

    • Yes, a calculated member won’t aggregate. That’s why in the article I specifically say you have to create a new real measure (not a calculated measure) because that’s the only way you can get the results to aggregate up.

      Chris Webb

      August 18, 2011 at 8:18 pm

  15. As for the scopes. I tried on an MDX query:
    with cell calculation NonEmptyOnhand
    for ‘({[measures].[onhand last]},leaves())’
    as ‘tail(nonempty((periodstodate([Time].[Calendar].[All Time].level,
    [Time].[Calendar].currentmember),[measures].[onhand last])))’

    But this timed Out.

    And I tried the scope: SCOPE ({[measures].[onhand last]},leaves());
    THIS = tail(nonempty({null:[Time].[Calendar].currentmember}
    *[measures].[onhand last])).item(0); End Scope;

    But this didn’t even replicate the Onhand for individual cells -Color1- to future dates (didn’t perform as [lastnonemptyOnhand] or your measure [Last Sale Original]). Recalling that [onhand Last] is a measuregroup measure, and I scoped it rather than scoping [lastnonemptyOnhand], hoping it bring last nonempty onhand then aggregate properly.

    Appreciating your advice. Thanks again.

    Amir

    August 17, 2011 at 5:33 pm

    • Ah, ok, yes I see you’re scoping on a real measure now. But you’re using the sub-optimal version of the algorithm so this is probably why it timed out. Unfortunately it’s not going to be easy to work out what’s going on here without seeing your cube…

      Chris Webb

      August 18, 2011 at 8:28 pm

  16. Thanks, this was really helpful.

    Now how would one go about retrieving the last filled date caption for each day where the facts are empty?

    Like in this example http://cwebbbi.files.wordpress.com/2011/03/image_thumb2.png?w=307&h=279
    How can I retrieve the caption of June 4 on all the other days following afterwards?

    Minse Blom

    August 24, 2011 at 3:09 pm

    • You’d simply get the name of the member, instead of use it in a tuple:

      MEMBER MEASURES.LASTSALE AS
      IIF(ISEMPTY(MEASURES.MAXDATE), NULL,
      [Date].[Date].[Date].MEMBERS.ITEM(MEASURES.MAXDATE).NAME)

      Chris Webb

      August 25, 2011 at 9:24 pm

  17. Excellent post Chris – has helped me a great deal.

    As I am a relative newbie to MDX and AS2008, how would I go about reusing your approach to be able to have more than 1 measure in the Measure group that does the same sort of thing?

    In my case, I have a number of different cash ledgers that I need to bring the most recent balances forward for, a bit like having many different internet sales amounts columns.

    Also, are there any considerations to take into account if you are using a type 2 scd in either the scoping or the calculations? e.g. getting 2 different rows for the same customer, or product etc for a given time period where an historical change has occurred on the dimension but we have a different SK for the product in the same timeframe?

    thanks,

    Rodd

    Rodd

    September 22, 2011 at 5:09 pm

    • Hi Rodd,

      There shouldn’t be any issues with multiple measures – although you do need to repeat the logic for every single measure, I suspect. There shouldn’t be any problems with type 2 dimensions either.

      Chris

      Chris Webb

      September 22, 2011 at 5:30 pm

  18. I can confirm this. I’ve used this logic with multiple measures.
    I was able to reuse the days to date calculation, but needed to duplicate all the other calculations.

    Minse Blom

    September 23, 2011 at 8:17 am

  19. Chris,

    I have repeated the logic for another measure and this seems to follow fine.

    I do seem to be having a minor problem with my figures however, and I’m pretty sure it’s related to the scope statements to be used against each measure. Hopefully the following will illustrate my question / problem.

    In my basic Fact table, i have the following structure:

    DateDimID,
    ProductDimID,
    LedgerCurrencyDimID,
    BranchDimID,
    Ledger1CashBalanceSnapshotGBP,
    Ledger1CashBalanceSnapshot,
    Ledger2CashBalanceSnapshot2GBP,
    Ledger2CashBalanceSnapshot2

    So from the above, I have snapshots of different Cash ledger balances in GBP (and their native equivalent) for different products and Branches. The balances change on an irregular basis but are recorded at the day level.

    What I would like to be able to do is be able to find out the valuation of the respective cash ledgers at different points in time, and as you would expect drill up or drill down per Product type, Branch and so on at different time periods.

    At the moment in my test cube, I have only 2 products. However, I seem to be only returning most recent values for one of the Products, and not the aggregation of both Products. The values I have coming back for the individual Product seem ok when I check against the Fact table underlying data, but this doesnt represent the true Valuation of the individual ledger or total cash valuation at a given point in time.

    Any suggestions on how best to proceed, or will using this approach not give me what I am looking for?

    Thanks,

    Rodd

    Rodd

    September 23, 2011 at 12:22 pm

  20. Hi Rodd,

    I think I can see the problem. What you need to do is calculate the last balances at the product level and then aggregate up; at the moment the aggregation is happening first, then the last balance calculation, because the last balance calculation is taking place in a calculated measure.

    What you’ll need to do is use a real measure instead, and then use a scoped assignment on that measure at the Product and Date level to perform the last value calculation, and then let the results of the calculation aggregate up. To do this you need to use the approach I describe here of creating a dummy measure:
    http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/84eb78dd-c69d-4d8b-a79c-2bdcc89aafca

    HTH,

    Chris

    Chris Webb

    September 23, 2011 at 1:01 pm

    • Chris,

      I’ve looked at the link, and created the dummy measure ‘Z’ in my Fact table and Measure group, however could you clarify the following please:

      Does the solution described completely replace your approach described above, that utilises the DaysToDate, HadSale, LastSale measures from? For example, the “[Measures].[Measure A] * [Measures].[Measure Percent];” calculation gets replaced with with equivalent of Measures.HadSale or Measures.LastSale ? Am just getting in a pickle tying all of it together in terms of knowing what measures I need to keep or discard and relate to my Measures.

      Similarly for the Scope’ing of Z, do i presumably include all my other related dimensions in the scope statement, and not just Date and Product as alluded to in the article? e.g. add in Currency, and Branch.

      In more general terms, in my measure group itself, do my base measures (Ledger1CashBalanceSnapshotGBP and so on) need to be defined as Sum or as LastNonEmpty?

      Again, many thanks,

      Rodd

      Rodd

      September 23, 2011 at 4:17 pm

      • Rodd,

        No, this doesn’t replace the logic for calculating the last available measure value. The link I gave you simply shows how you can do any calculation at a low level of granularity and have the result aggregate up efficiently.

        Chris

        Chris Webb

        September 24, 2011 at 10:05 pm

  21. Chris,

    Thanks for your help.

    I still can’t see the wood for the trees at present, probably due to my relative inexperience of all things MDX related. I will have to do some further experimentation as I am getting nowhere fast!

    Rodd

    Rodd

    October 5, 2011 at 8:45 pm

  22. Hello Chris,

    Very good info here. Just one question, what if the recursion occurs on a time dimension and for example all the previous members (days) are null? Will it iterate for all previous members and all days before that date?

    Example i have a Time Hierarchy called Time – Year – Month – Day, and for the month level i iterate to find the latest member with values before the last day of that month even if the day belongs to previous months.

    The problem is, if i have null for the first 3 years of data for example, it will iterate over and over because previous is always null… How to prevent this? Can recursive have a maximum iteration?

    Thank you

    Luis Simoes

    October 24, 2011 at 2:59 pm

    • The recursion won’t stop automatically on null values – you’d need to explicitly code a test to see if there is a null value and then stop the recursion. There’s no way to specify the maximum depth for recursion.

      Chris Webb

      October 24, 2011 at 3:03 pm

  23. Chris, thAnks for your blog, I need your help, do you know an mdx code for retrieving data for dates in 2 different dimensions where one date is less than or equal to a member and the other is greater than the same member. I know how to write this in sql but I’m struggling to accomplish this in mdx.

    Thank for your help.

    Daniel

    January 4, 2012 at 12:33 am

    • Hi Daniel,

      I can probably help you, yes, but you’ll need to be a bit more specific about what you want to accomplish. Can you give an example based on Adventure Works?

      Chris

      Chris Webb

      January 4, 2012 at 9:04 am

  24. Hi,
    How can we use ORDER BY in View?Is it necessary to ORDER the data before building a cube?

    Ali

    February 7, 2012 at 10:11 pm

    • No, there’s no foolproof way of ordering data so it can be loaded into SSAS: you can’t change the SQL that SSAS generates during processing to include an Order By clause, and you can’t use an Order By clause in a view unless you do the old SELECT TOP 100 PERCENT trick (Google for it). While ordering data before loading it into a cube can improve compression – I think the latest SSAS Performance Guide or Operations Guide covers this – it isn’t necessary otherwise.

      Chris Webb

      February 7, 2012 at 10:14 pm

  25. Hi,How can we use order by clause in View statement? Is it necessary to order data before building a cube?

    Ali

    February 7, 2012 at 10:13 pm

  26. Hi Chris,

    Slightly off topic but just in case I miss at the conference, wanted to run this past you please.

    I have one dimension with 100,000 rows each of which is a journal number plus other dimension with account codes. Each journal refers to many journal numbers in the fact table and the same for the GL code dimension.

    There approx 49 million records in the fact table and 30 columns. The dimensions do not all relate to eachother and so one dimension is dropped on another in a pivot table. This then can cause major performance issues.

    I have built aggregation and and also used the attribute hierarchies for the individual dimensions.

    Performance is terrible. I have read your previous blogs but my case seems a bit different as the data is being aggregated but when one dimensions is dropped onto another it has a long snooze..

    Any pointers for this?

    Thanks

    J

    J

    February 9, 2012 at 8:48 pm

  27. Thanks Chris.

    No calculations, I have taken them out to narrow down.

    Yes aggregations as in SSAS aggregations.

    Thanks for the link

    Ta

    J

    February 11, 2012 at 8:18 pm

  28. Usually if you have null values for measures in a fact table, it indicates you’ve made a mistake somewhere with your dimensional modelling. However in the short term if you want to make sure your null values stay null when they are brought into SSAS (rather than get converted to zeroes) you can set the Null Processing property of the measure to Preserve (as shown here: http://thomasivarssonmalmo.wordpress.com/2008/06/27/null-processing-of-measures-in-ssas2005/).

    Chris Webb

    January 31, 2012 at 3:25 pm

  29. Yes, we need null values in the cube, but I don’t think a null value in a measure column in a fact table is ever justified unless it’s a value that is late arriving and will be filled in later.

    Chris Webb

    January 31, 2012 at 4:30 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 531 other followers