Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

A New Events-In-Progress DAX Pattern

with 25 comments

I’ve been working on a very complex SSAS Tabular implementation recently, and as a result I’ve learned a few new DAX tricks. The one that I’m going to blog about today takes me back to my old favourite, the events-in-progress problem. I’ve blogged about it a lot of times, looking at solutions for MDX and DAX (see here and here), and for this project I had to do some performance tuning on a measure that uses a filter very much like this.

Using the Adventure Works Tabular model, the obvious way of finding the number of Orders on the Internet Sales table that are open on any given date (ie where the Date is between the dates given in the Order Date and the Ship Date column) is to write a query something like this:

EVALUATE

ADDCOLUMNS (

    VALUES ( 'Date'[Date] ),

    "OpenOrders",

    CALCULATE (

        COUNTROWS ( 'Internet Sales' ),

        FILTER( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Date'[Date] ),

        FILTER( 'Internet Sales', 'Internet Sales'[Order Date] <= 'Date'[Date] )

    )

)

ORDER BY 'Date'[Date]

On my laptop this executes in around 1.9 seconds on a cold cache. However, after a bit of experimentation, I found the following query was substantially faster:

EVALUATE

ADDCOLUMNS (

    VALUES ( 'Date'[Date] ),

    "OpenOrders",

    COUNTROWS(

        FILTER(

            'Internet Sales',

            CONTAINS(

                DATESBETWEEN('Date'[Date]

                    , 'Internet Sales'[Order Date]

                    , DATEADD('Internet Sales'[Ship Date],-1, DAY))

                , [Date]

                , 'Date'[Date]

            )

        )

    )

)

ORDER BY 'Date'[Date]

On a cold cache this version executes in just 0.2 seconds on my laptop. What’s different? In the first version of the calculation the FILTER() function is used to find the rows in Internet Sales where the Order Date is less than or equal to the Date on rows, and where the Ship Date is greater than the Date. This is the obvious way of solving the problem. In the new calculation the DATESBETWEEN() function is used to create a table of dates from the Order Date to the day before the Ship Date for each row on Internet Sales, and the CONTAINS() function is used to see if the Date we’re interested in appears in that table.

I’ll be honest and admit that I’m not sure why this version is so much faster, but if (as it seems) this is a generally applicable pattern then I think this is a very interesting discovery.

Thanks to Marco, Alberto and Marius for the discussion around this issue…

UPDATE: Scott Reachard has some some further testing on this technique, and found that the performance is linked to the size of the date ranges. So, the shorter your date ranges, the faster the performance; if you have large date ranges, this may not be the best performing solution. See https://twitter.com/swreachard/status/349881355900952576

UPDATE: Alberto has done a lot more research into this problem, and come up with an even faster solution. See: http://www.sqlbi.com/articles/understanding-dax-query-plans/

Written by Chris Webb

June 13, 2013 at 10:32 am

25 Responses

Subscribe to comments with RSS.

  1. Chris, I was running your scripts against AdventureWorksDW2012 and getting unintended results. I think that the numbers in the OpenOrders column are not the number of Open Orders, but the total number of days between OrderDay and ShipDay. The Grand Total that I get, running your first script is 422,786 and the Grand Total of the second script is 440,504.

    So, I decided to create a PivotTable. First, I created a Relashionhip between FactInternetSales[OrderDate] and DimDate[FullDateAlternateKey]. Then I added a new calculated column to the FactInternetSales table, OrdersPending:=If([ShipDate]-[OrderDate]>0,1,0). Adding those two columns to the PivotTable gives me the correct number of Pending Orders for every date and a Grand Total of 60,398, which is the number of rows of the FactInternetSales table, given that there in no row with [OrderDate] = [ShipDate]. I also added another calculated column, TotalDays:=[ShipDate]-[OrderDate].

    Then I used the same logic of ([ShipDate]-[OrderDate]>0) on your script. I added two more columns. [TotalDays] to calculate the sum of Pending Days for each date. [AverageDays] to calculate the average Pending Days for the Orders Pending on each date. The number of days between [ShipDate] and [OrderDate] is always 7 for the whole FactInternetSales table. The [OpenOrders] Grand Total is 60,398 and [TotalDays] Grand Total is 422,786.

    To test the script a bit more I added two more calculated columns to insert into the pivot table, ShipDate1:=If([OrderDateKey]0),’FactInternetSales’[OrderDate])),
    “TotalDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])),
    “AverageDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate])),

    “OpenOrders1″,
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate])),
    “TotalDays1″,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])),
    “AverageDays1″,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate]))
    )
    ORDER BY ‘DimDate’[FullDateAlternateKey]

    David Canales

    June 16, 2013 at 3:35 am

  2. Chris, the second part of my comment was not pasted correctly, I don’t know why, but this is the second part:

    To test the script a bit more I added two more calculated columns to insert into the pivot table, ShipDate1:=If([OrderDateKey]0),’FactInternetSales’[OrderDate])),
    “TotalDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])),
    “AverageDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate])),

    “OpenOrders1″,
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate])),
    “TotalDays1″,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])),
    “AverageDays1″,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate]))
    )
    ORDER BY ‘DimDate’[FullDateAlternateKey]

    David Canales

    June 16, 2013 at 3:41 am

  3. Chris, I pasted a second time but the text is not right. Parts of the comment and the code are missing. When I pasted both times, the text looked right, but when I press the button “Post Comment” the results are not right. What should I do to get it right?

    David Canales

    June 16, 2013 at 3:57 am

    • I posted my comments and complete code on my blog: inteligencia-de-negocios.com.
      There is a link to translate the post into english.

      David Canales

      June 16, 2013 at 5:09 pm

      • Hi David,

        I have to admit I’ve not tested this properly as a measure (so I don’t know what values it returns for Grand Totals), but it does return the correct values at the day level in the query I gave. Maybe part of the problem is that you’re not using the official Adventure Works Tabular model and building your own?

        Chris

        Chris Webb

        June 16, 2013 at 10:14 pm

  4. Chris, I just realized that I was not getting my math correctly. If we have 60,398 orders on the FactInternetSales table and it takes on average 7 days for every order to be shipped, then we have every order repeating for 7 days on average. That give us 60,398 * 7 = 422,786 days in total. So, your script is correct and I was on the wrong track. Sorry for that.

    Leaving my previous script aside, I added some lines to your script to calculate the average days that took to ship the orders that were pending on each day. I think that this time I got this right.

    EVALUATE
    ADDCOLUMNS (
    VALUES ( ‘DimDate’[FullDateAlternateKey] ),
    “OpenOrders”,
    CALCULATE (
    COUNTROWS ( ‘FactInternetSales’ ),
    FILTER( ‘FactInternetSales’, ‘FactInternetSales’[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] )),
    "AverageDays",
    CALCULATE(
    SUMX( 'FactInternetSales', 'FactInternetSales'[shipDate] – 'FactInternetSales'[OrderDate] ),
    FILTER( 'FactInternetSales', 'FactInternetSales'[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] )) /
    CALCULATE (
    COUNTROWS ( 'FactInternetSales' ),
    FILTER( 'FactInternetSales', 'FactInternetSales'[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] ))
    )
    ORDER BY 'DimDate'[FullDateAlternateKey]

    David Canales

    June 16, 2013 at 10:44 pm

  5. Hello Chris, i have another question and was wondering if you could point me in the direction to start searching for a solution, I have a case similar to this Events in Progress scenario however, I need the COUNTROWS() function to be run on a table generated by a Summarize function.

    - I have Item Inventory table which have detailed movement of all items and I have a measure on this table that calculates the stock for any given day
    - I created a summarize function that shows (Day – Item – Quantity ) and I’m only showing rows that have 0 stock
    - I am trying to do an aggregation on the result of the summarize function to show me the no of days I had items out of stock per item.

    I understand I am getting into too much details and you are probably very busy but I would really appreciate if you can point me to the direction I can use to resolve the issue, been trying to resolve it for a couple of weeks with no luck at all :(

    Thanks a lot

    Omar Sultan

    June 28, 2013 at 5:33 pm

    • Hi Omar,

      The Summarize function returns a table, so you should be able to use it inside CountRows() and find the number of rows in that table?

      Chris

      Chris Webb

      June 28, 2013 at 5:40 pm

  6. Chris I can’t thank you enough, your advice got it working :) :)

    Your blog and examples has made me accomplish today what I’ve been trying to finish for the past 8 days. Thank you

    Omar Sultan

    June 28, 2013 at 5:58 pm

  7. […] date filters, which I wrote about recently but which Alberto has recently improved on in his must-read white paper […]

  8. […] date filters, which I wrote about recently but which Alberto has recently improved on in his must-read white paper […]

  9. […] problem, one that I have blogged about many times in the past (see here, here, here and here for example) and see how we can solve it in Power Query. It’s a very common problem that you […]

  10. […] problem, one that I have blogged about many times in the past (see here, here, here and here for example) and see how we can solve it in Power Query. It’s a very common problem that you […]

  11. Chris I’m wondering if you have any guidance for implementing this as a measure. You’re initial post was on this solution was very easy to apply to a model I’ve built but I’d like to know if this new method (or Alberto’s variation) would speed things up. However, I’m not totally sure how to make it all work as a measure. Any help would be appreciated.

    William Weber

    June 16, 2014 at 10:42 pm

    • What kind of problems are you having implementing it?

      Chris Webb

      June 17, 2014 at 2:58 pm

      • In the previous solution you essentially built two measures. The running total of active values then subtracted the running total of inactive values to get the current events-in-progress at any point in time. I’m not sure how this method gets implemented as a measure. The query above seems to me to function more as a calculated column. For whatever reason I just can’t seem to wrap my head around converting the logic to a calculated measure in my fact table that returns the same values as the earlier solution.

        William Weber

        June 17, 2014 at 5:31 pm

      • OK, yes, I see. Let me have a think and see if I can come up with a version that is a measure.

        Chris Webb

        June 19, 2014 at 9:44 am

  12. Chris,

    Your content is always much appreciated. I have a presumably simple question which I have not been able to track down an answer for. It is a simple “count all rows between dates”, very similar to your example. Two tables ; Customers ; id, fromdate, todate And Calendar ; date, week, month, year.

    So, user will want to trend and calculate # of customers in any which way, meaning count by week, month, year over year, etc.

    I guess I will need the same setup as you, unrelated tables. I have about 200k customers in the table over 5 years, so data volumes are not too bad.

    Would appreciate input. Cheers

    Stian

    June 27, 2014 at 2:27 pm

    • I guess the first question you’d need to answer is how the users would choose their From and To dates. Would you want to make them separate tables, and separate selections, or would you want to make the selection on just one table?

      Chris Webb

      June 29, 2014 at 11:45 am

      • Thanks for the reply.

        The users will choose period from time table, and a measure will show how many customers were active at that time. This is not customers per se, rather subscriptions with a from date and a to date if it was ended. If a user selects a date, the sql query for returning a count for that date would be extremely simple. I know however that once you start selecting a group of dates, for example month, the business requirements needs to be clear on how it should behave.

        Stian

        June 29, 2014 at 10:28 pm

      • Yes, but I probably wasn’t clear about my question – do you want your users to makes two selections, a “from” date and a “to” date, from two different tables, or do you want them to make the selection as a range of dates from a single table?

        Chris Webb

        June 29, 2014 at 10:45 pm

      • Definitely as a selection from one date table. Storing data this way is quite common for a lot of the sources involved, and I am hoping there is an elegant and powerful solution to it with tabular.

        Stian

        June 29, 2014 at 11:24 pm

      • Maybe I can fill in a bit.

        Creating a calculated column on date table :

        =CALCULATE(COUNTROWS(‘Subscription’);FILTER(Subscription;Subscription[FromDate]>=Date[Date] && Subscription[ToDate]=Date[Date])) — Returns correct number for each day, but is of course not the number I am looking for.

        So, one filter works fine.

        ToDate are blank for active subscriptions, but I have set them to the last date of date table to not have more factors than necessary for solving this basic problem.

        Another requirement is that the user would want to slice/filter the data based on columns in the Subscription table, for example type or product.

        Coming from the multidimensional side where this would be modeled quite differently is maybe not helping me much either. :)

        Stian

        June 30, 2014 at 12:33 pm

      • Like a previous user here, wordpress removed some of my post.

        CALCULATE(COUNTROWS(‘Subscription’);FILTER(Subscription;Subscription[FromDate]>=Date[Date] && Subscription[ToDate]=Date[Date])) – Returns correct number for each day, but is of course not the number I am looking for.

        Stian

        June 30, 2014 at 12:35 pm

      • I think WordPress has lost some of your code again :-)

        Anyway, I get the idea. Calculated columns aren’t going to do what you want; you need to create a measure here. I think what you’re missing is in the Filter() function you’d need to find the first and last dates in the selection made by the user (so if the user selected a year, you’d want to find the first and last dates in the year and then find the number of subscriptions active in that year). Therefore you’d need to use the FirstDate and LastDate functions to on your Date[Date] column.

        Chris Webb

        June 30, 2014 at 1:15 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,072 other followers

%d bloggers like this: