Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

A New Events-In-Progress DAX Pattern

with 11 comments

I’ve been working on a very complex SSAS Tabular implementation recently, and as a result I’ve learned a few new DAX tricks. The one that I’m going to blog about today takes me back to my old favourite, the events-in-progress problem. I’ve blogged about it a lot of times, looking at solutions for MDX and DAX (see here and here), and for this project I had to do some performance tuning on a measure that uses a filter very much like this.

Using the Adventure Works Tabular model, the obvious way of finding the number of Orders on the Internet Sales table that are open on any given date (ie where the Date is between the dates given in the Order Date and the Ship Date column) is to write a query something like this:

EVALUATE

ADDCOLUMNS (

    VALUES ( 'Date'[Date] ),

    "OpenOrders",

    CALCULATE (

        COUNTROWS ( 'Internet Sales' ),

        FILTER( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Date'[Date] ),

        FILTER( 'Internet Sales', 'Internet Sales'[Order Date] <= 'Date'[Date] )

    )

)

ORDER BY 'Date'[Date]

On my laptop this executes in around 1.9 seconds on a cold cache. However, after a bit of experimentation, I found the following query was substantially faster:

EVALUATE

ADDCOLUMNS (

    VALUES ( 'Date'[Date] ),

    "OpenOrders",

    COUNTROWS(

        FILTER(

            'Internet Sales',

            CONTAINS(

                DATESBETWEEN('Date'[Date]

                    , 'Internet Sales'[Order Date]

                    , DATEADD('Internet Sales'[Ship Date],-1, DAY))

                , [Date]

                , 'Date'[Date]

            )

        )

    )

)

ORDER BY 'Date'[Date]

On a cold cache this version executes in just 0.2 seconds on my laptop. What’s different? In the first version of the calculation the FILTER() function is used to find the rows in Internet Sales where the Order Date is less than or equal to the Date on rows, and where the Ship Date is greater than the Date. This is the obvious way of solving the problem. In the new calculation the DATESBETWEEN() function is used to create a table of dates from the Order Date to the day before the Ship Date for each row on Internet Sales, and the CONTAINS() function is used to see if the Date we’re interested in appears in that table.

I’ll be honest and admit that I’m not sure why this version is so much faster, but if (as it seems) this is a generally applicable pattern then I think this is a very interesting discovery.

Thanks to Marco, Alberto and Marius for the discussion around this issue…

UPDATE: Scott Reachard has some some further testing on this technique, and found that the performance is linked to the size of the date ranges. So, the shorter your date ranges, the faster the performance; if you have large date ranges, this may not be the best performing solution. See https://twitter.com/swreachard/status/349881355900952576

UPDATE: Alberto has done a lot more research into this problem, and come up with an even faster solution. See: http://www.sqlbi.com/articles/understanding-dax-query-plans/

Written by Chris Webb

June 13, 2013 at 10:32 am

11 Responses

Subscribe to comments with RSS.

  1. Chris, I was running your scripts against AdventureWorksDW2012 and getting unintended results. I think that the numbers in the OpenOrders column are not the number of Open Orders, but the total number of days between OrderDay and ShipDay. The Grand Total that I get, running your first script is 422,786 and the Grand Total of the second script is 440,504.

    So, I decided to create a PivotTable. First, I created a Relashionhip between FactInternetSales[OrderDate] and DimDate[FullDateAlternateKey]. Then I added a new calculated column to the FactInternetSales table, OrdersPending:=If([ShipDate]-[OrderDate]>0,1,0). Adding those two columns to the PivotTable gives me the correct number of Pending Orders for every date and a Grand Total of 60,398, which is the number of rows of the FactInternetSales table, given that there in no row with [OrderDate] = [ShipDate]. I also added another calculated column, TotalDays:=[ShipDate]-[OrderDate].

    Then I used the same logic of ([ShipDate]-[OrderDate]>0) on your script. I added two more columns. [TotalDays] to calculate the sum of Pending Days for each date. [AverageDays] to calculate the average Pending Days for the Orders Pending on each date. The number of days between [ShipDate] and [OrderDate] is always 7 for the whole FactInternetSales table. The [OpenOrders] Grand Total is 60,398 and [TotalDays] Grand Total is 422,786.

    To test the script a bit more I added two more calculated columns to insert into the pivot table, ShipDate1:=If([OrderDateKey]0),’FactInternetSales’[OrderDate])),
    “TotalDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])),
    “AverageDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate])),

    “OpenOrders1″,
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate])),
    “TotalDays1″,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])),
    “AverageDays1″,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate]))
    )
    ORDER BY ‘DimDate’[FullDateAlternateKey]

    David Canales

    June 16, 2013 at 3:35 am

  2. Chris, the second part of my comment was not pasted correctly, I don’t know why, but this is the second part:

    To test the script a bit more I added two more calculated columns to insert into the pivot table, ShipDate1:=If([OrderDateKey]0),’FactInternetSales’[OrderDate])),
    “TotalDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])),
    “AverageDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate])),

    “OpenOrders1″,
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate])),
    “TotalDays1″,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])),
    “AverageDays1″,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales’[ShipDate1] – ‘FactInternetSales’[OrderDate])>0),’FactInternetSales’[OrderDate]))
    )
    ORDER BY ‘DimDate’[FullDateAlternateKey]

    David Canales

    June 16, 2013 at 3:41 am

  3. Chris, I pasted a second time but the text is not right. Parts of the comment and the code are missing. When I pasted both times, the text looked right, but when I press the button “Post Comment” the results are not right. What should I do to get it right?

    David Canales

    June 16, 2013 at 3:57 am

    • I posted my comments and complete code on my blog: inteligencia-de-negocios.com.
      There is a link to translate the post into english.

      David Canales

      June 16, 2013 at 5:09 pm

      • Hi David,

        I have to admit I’ve not tested this properly as a measure (so I don’t know what values it returns for Grand Totals), but it does return the correct values at the day level in the query I gave. Maybe part of the problem is that you’re not using the official Adventure Works Tabular model and building your own?

        Chris

        Chris Webb

        June 16, 2013 at 10:14 pm

  4. Chris, I just realized that I was not getting my math correctly. If we have 60,398 orders on the FactInternetSales table and it takes on average 7 days for every order to be shipped, then we have every order repeating for 7 days on average. That give us 60,398 * 7 = 422,786 days in total. So, your script is correct and I was on the wrong track. Sorry for that.

    Leaving my previous script aside, I added some lines to your script to calculate the average days that took to ship the orders that were pending on each day. I think that this time I got this right.

    EVALUATE
    ADDCOLUMNS (
    VALUES ( ‘DimDate’[FullDateAlternateKey] ),
    “OpenOrders”,
    CALCULATE (
    COUNTROWS ( ‘FactInternetSales’ ),
    FILTER( ‘FactInternetSales’, ‘FactInternetSales’[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] )),
    "AverageDays",
    CALCULATE(
    SUMX( 'FactInternetSales', 'FactInternetSales'[shipDate] – 'FactInternetSales'[OrderDate] ),
    FILTER( 'FactInternetSales', 'FactInternetSales'[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] )) /
    CALCULATE (
    COUNTROWS ( 'FactInternetSales' ),
    FILTER( 'FactInternetSales', 'FactInternetSales'[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] ))
    )
    ORDER BY 'DimDate'[FullDateAlternateKey]

    David Canales

    June 16, 2013 at 10:44 pm

  5. Hello Chris, i have another question and was wondering if you could point me in the direction to start searching for a solution, I have a case similar to this Events in Progress scenario however, I need the COUNTROWS() function to be run on a table generated by a Summarize function.

    - I have Item Inventory table which have detailed movement of all items and I have a measure on this table that calculates the stock for any given day
    - I created a summarize function that shows (Day – Item – Quantity ) and I’m only showing rows that have 0 stock
    - I am trying to do an aggregation on the result of the summarize function to show me the no of days I had items out of stock per item.

    I understand I am getting into too much details and you are probably very busy but I would really appreciate if you can point me to the direction I can use to resolve the issue, been trying to resolve it for a couple of weeks with no luck at all :(

    Thanks a lot

    Omar Sultan

    June 28, 2013 at 5:33 pm

    • Hi Omar,

      The Summarize function returns a table, so you should be able to use it inside CountRows() and find the number of rows in that table?

      Chris

      Chris Webb

      June 28, 2013 at 5:40 pm

  6. Chris I can’t thank you enough, your advice got it working :) :)

    Your blog and examples has made me accomplish today what I’ve been trying to finish for the past 8 days. Thank you

    Omar Sultan

    June 28, 2013 at 5:58 pm

  7. […] date filters, which I wrote about recently but which Alberto has recently improved on in his must-read white paper […]

  8. […] date filters, which I wrote about recently but which Alberto has recently improved on in his must-read white paper […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 2,868 other followers

%d bloggers like this: