Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Solving the ‘Events in Progress’ problem in MDX, Part 2–Role Playing Measure Groups

with 20 comments

In my last post I described a simple solution to the ‘events in progress’ problem in SSAS and MDX. It had one drawback, though: it involved loading all the data from the same fact table into two different measure groups in the same cube, which of course potentially doubles both the time taken to process the cube and its size on disk. I used the same technique in my recent post on improving the performance of currency conversion, and indeed I have used it successfully with several customers in the past; but it seems rather inelegant, so is there a way we can avoid it and have only one measure group? It doesn’t seem possible at first glance, and I actually opened a Connect item a while ago asking for this functionality (‘role playing measure groups’) to be implemented. I was having a good week last week, though, and at the same time as solving the ‘events in progress’ problem I came up with a solution to this problem too…

Here’s what I did, using the ‘events in progress’ problem as an illustration:

First of all, I created and processed a simple cube called ‘EventsInProgressSource’ with a single measure group, a single Count measure, and two dimensions, Order Date and Ship Date, with regular relationships joining on the OrderDateKey and ShipDateKey columns of the fact table:

[screenshot]

[screenshot]

I then created a second cube, ‘EventsInProgress2’, using the ‘Create Empty Cube’ option in the Cube Wizard.

[screenshot]

I then started the New Linked Object wizard and copied everything from the ‘EventsInProgressSource’ cube to the EventsInProgress2 cube:

[screenshot]

This resulted in a cube with one linked measure group and two linked dimensions:

[screenshot]

Now here’s the fun bit. I then deployed and processed the cube, closed BIDS and went to SQL Server Management Studio. There I scripted the Internet Sales Facts linked measure group in ‘EventsInProgress2’ to an XMLA Create statement, manually updated the XMLA by adding a 1 to the end of the name and ID of the measure group itself and of the Line Item Count measure, then executed the script against the ‘EventsInProgress2’ cube. This created a second, identical linked measure group – something that, again, BIDS doesn’t let you do. I then reopened BIDS and connected directly to the cube in online mode (I could also have reimported the project back into BIDS), went to the Dimension Usage tab, and deleted the relationships between Ship Date and the first linked measure group and between Order Date and the second linked measure group, leaving the relationships like this:

[screenshot]
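The XMLA edit itself is small: only the ID and Name elements need the 1 appended. Here’s a rough sketch of what the edited Create script might look like (heavily abbreviated; the scripted definition contains many more elements, and the DatabaseID shown is just a placeholder for whatever your project uses), to give an idea of what to look for:

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>MyDatabase</DatabaseID>
    <CubeID>EventsInProgress2</CubeID>
  </ParentObject>
  <ObjectDefinition>
    <MeasureGroup>
      <!-- the 1 suffix distinguishes this copy from the original linked measure group -->
      <ID>Internet Sales Facts1</ID>
      <Name>Internet Sales Facts1</Name>
      <Measures>
        <Measure>
          <ID>Line Item Count1</ID>
          <Name>Line Item Count1</Name>
          <!-- remaining measure properties as scripted -->
        </Measure>
      </Measures>
      <!-- remaining measure group properties (source bindings etc.) as scripted -->
    </MeasureGroup>
  </ObjectDefinition>
</Create>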

I then added another Date dimension and set up referenced relationships (which had to be non-materialised) with each measure group via the Date attributes of the Ship Date and Order Date dimensions:

[screenshot]

With this done we have achieved our goal: the same fact table appears twice in the same cube as two different measure groups with different dimensionality, but we are only processing the data once. The last step in solving the ‘events in progress’ problem is to add what is essentially the same MDX as last time to the cube:

CREATE MEMBER CURRENTCUBE.MEASURES.OrderedToDate AS
    SUM(NULL: [Date].[Calendar].CURRENTMEMBER,
        ([Measures].[Line Item Count]));

CREATE MEMBER CURRENTCUBE.MEASURES.ShippedToPrevDate AS
    SUM(NULL: [Date].[Calendar].CURRENTMEMBER.PREVMEMBER,
        ([Measures].[Line Item Count1]));

CREATE MEMBER CURRENTCUBE.MEASURES.OrderedNotShipped AS
    MEASURES.OrderedToDate - MEASURES.ShippedToPrevDate;
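To see the calculation working, a simple query along these lines (assuming the cube name from this walkthrough, and a Date attribute hierarchy on the new Date dimension as in Adventure Works) returns the number of orders in progress on each date:

SELECT MEASURES.OrderedNotShipped ON 0,
[Date].[Date].[Date].MEMBERS ON 1
FROM [EventsInProgress2]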

And we get the results we want out of the cube:

[screenshot]

Now I haven’t tested this with anything other than the Adventure Works data, and there are some areas where I’d like to do more testing:

  • Non-materialised reference relationships don’t always perform that well. Materialised reference relationships aren’t allowed with linked measure groups, though, so using an m2m relationship to join the Date and Order Date/Ship Date dimensions might be an alternative.
  • There are various ways of calculating a total-to-date in MDX and I’ve not spent any time working out if the version I’ve shown above is the most efficient.
  • There is going to be an overhead from querying a linked measure group rather than a regular measure group, and also probably an overhead from trying to query the same source measure group twice rather than two different measure groups, but I don’t know how significant it is.

If anyone out there does decide to try this at scale with their own data, please leave a comment and tell me about your experiences.

Written by Chris Webb

January 22, 2011 at 10:46 pm

Posted in Analysis Services, MDX

20 Responses


  1. Two minutes after I posted this, Marco came and asked me the obvious question: did I really need to do all this, and couldn’t I use the LinkMember function instead? I should have included some benchmarks…

    If we take the ‘EventsInProgressSource’ cube and add the following calculations (which give us the results we want, but use LinkMember):

    create member currentcube.measures.test1 as
    sum(null:[Order Date].[Calendar].currentmember,[Measures].[Line Item Count]);

    create member currentcube.measures.test2 as
    sum(null:
    linkmember([Order Date].[Calendar].currentmember.prevmember, [Ship Date].[Calendar])
    ,([Measures].[Line Item Count], [Order Date].[Calendar].[All Periods]));

    create member currentcube.measures.test3 as
    measures.test1 - measures.test2;

    Then run the following query:

    select measures.test3 on 0,
    [Order Date].[Date].[Date].members on 1
    from [EventsInProgressSource]

    The query runs in around 4.4 seconds on a cold cache on my machine. If we run the same query using the approach described above, it runs in 1.4 seconds. This is the overhead of using LinkMember…

    Chris Webb

    January 23, 2011 at 10:02 am

  2. Oh, and as far as I can see, the approach used in this post performs the same as the duplicated measure group approach described in part 1, after a very limited amount of testing.

    Chris Webb

    January 23, 2011 at 10:08 am

  3. Would SQL Server benefit from something like Materialized Views as in Oracle? That way, you don’t have to duplicate your data and storage. I am not sure about the processing time though.

    Bruce

    January 24, 2011 at 6:49 pm

  4. Hi Chris. I think you can solve this problem in the dimensional model using the many to many approach, http://smithicus.wordpress.com/2011/01/03/point-in-time-analysis/ using the order date as the start date equivalent and the ship date as the end date equivalent. Please let me know what you think.

    Ian smith

    January 24, 2011 at 8:19 pm

  5. The alternative solution I proposed does need a little tweaking, but I still think it’s worthwhile looking at.

    Ian smith

    January 24, 2011 at 8:42 pm

    • Using m2m relationships is what Richard Tkachuk proposed in his (now unavailable, unfortunately) paper on this subject. In fact he went one step further: to work around the problem that a large number of different intervals makes the intermediate measure group unfeasibly large, he compressed its contents. It complicates the ETL a lot, but it may well perform better than my approach in some scenarios. Which one works best will depend on your data; the advantage of my approach is that no extra ETL development/processing is involved :-)

      Chris Webb

      January 24, 2011 at 9:20 pm

      • Ah, I didn’t know Richard had already mentioned this. I will blog about your approach and the one Richard and I proposed (linking to Richard’s and your blog pages for the credits), just to outline the two.

        In short: yes, as always, it depends on the situation :) The advantage of using the m2m approach is that the solution lies in the dimensional model and is therefore more transparent and less technology-dependent.

        Ian Smith

        January 25, 2011 at 7:39 am

  6. Is this the whitepaper you are talking about – from Richard Tkachuk?

    http://msdn.microsoft.com/en-us/library/ms345139.aspx

    Naveen Das

    January 24, 2011 at 10:12 pm

  7. Hmm, cool method. I’ve used an mdx script similar to the one here: http://consultingblogs.emc.com/christianwade/archive/2006/04/30/MDX-Sprocs-and-Scripting_3A00_-An-Interesting-Example.aspx

    But I’ll try this and see how it compares!

    Claire

    January 27, 2011 at 9:35 pm

  8. Hi Chris,

    after reading your two posts I also started thinking about the problem of “Events in Progress” and came up with the following:
    for the relatively simple scenario of just counting orders that are ordered but not yet shipped at a given date, you could “unpivot” your fact table as:

    SELECT
    OrderDate AS Date,
    1 AS OrderCount
    FROM MyFactTable
    UNION ALL
    SELECT
    ShipDate AS Date,
    -1 AS OrderCount
    FROM MyFactTable

    the rest is similar to your calculations, except that you only need one calculated member:

    CREATE MEMBER CURRENTCUBE.MEASURES.OrderedNotShipped AS
    SUM({NULL:[Date].[Date].CURRENTMEMBER}, [Measures].[OrderCount]);

    though this also doubles your data …

    Gerhard Brueckl

    February 2, 2011 at 9:44 pm

  9. Chris,

    This is an interesting approach. We solved this problem by denormalizing the ship date into the order line record. If an order line does not have a ship date then it gets mapped to the “Unknown” member. Then it becomes (order lines - ([Ship Date].&[Unknown], order lines)). It may be an over-simplification, and it has its own set of drawbacks, but it scales very well for us.

    James Rogers

    March 9, 2011 at 6:11 pm

    • We have used this approach for imported and forecasted data. Those two measure groups get used in every cube so we set them up in their own cube and linked the measure groups. Seems to scale very well. I will try to pull together some statistics to quantify the performance.

      James Rogers

      March 9, 2011 at 6:18 pm

  10. I appreciate this is old now, but I’ve used this approach several times with dates very successfully. I’m now having to implement it down to hours (it’s bed occupancy in a hospital scenario) and have degraded the data so I can provide a key at DateHour level, which seems to be working OK. But it feels untidy. I wondered if I could use two separate dimensions, Date and Hour, and cheat somehow with a reference dimension. Has anyone done something similar?

    stevep

    October 10, 2012 at 3:03 pm

  11. […] What grain are admissions? Given the goal of calculating ‘bed days‘ we might need to model them as events in progress. […]

  12. Or pivot the fact table which has the event dates; it’s much easier to manage.

    ganeshvelt

    October 30, 2013 at 11:51 pm

  13. Hey Chris, if LINKMEMBER is that costly, why not use STRTOMEMBER as shown below?

    CREATE MEMBER CURRENTCUBE.MEASURES.TEST2 AS
    SUM(
    NULL:STRTOMEMBER("[Ship Date].[Calendar].&[" + [Order Date].[Calendar].CurrentMember.Member_Key + "]"),
    ([Measures].[Line Item Count], [Order Date].[Calendar].[All Periods])
    );

    Also you could use a physical measure and scope statement to calculate the OrderedNotShipped value to allow users to drill through from Excel if need be.

    James Mburu

    June 16, 2014 at 6:45 pm

    • STRTOMEMBER is usually even worse for performance than LINKMEMBER, especially on older versions of SSAS.

      Chris Webb

      June 16, 2014 at 7:32 pm

