Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Using Non_Empty_Behavior with YTD calculations

with 3 comments

I always believe in giving credit where credit’s due, and before I go on with this blog entry I have to declare that this isn’t my idea. I heard about the general approach from Eric Jacobsen (or possibly Richard Tkachuk) at the BI Conference last year, and he said it originally came from Thomas Kejser, who is now on the SQLCat team; I’ll be working with him on a project fairly soon so I’ll ask him about it! I’ve not actually seen it implemented or written about anywhere though, and since I’ve only just got around to testing it out and seeing how useful it can be, I thought I’d blog about it.

Let’s start by taking a look at the following Adventure Works query:

with member measures.ytdsales as
sum(
periodstodate([Date].[Calendar].[Calendar Year],[Date].[Calendar].currentmember)
, [Measures].[Internet Sales Amount])

select {[Date].[Calendar].[Month].&[2003]&[12].children} on 0,
non empty
[Customer].[Customer Geography].[Customer].members
*
[Promotion].[Promotion].[Promotion].members
on 1
from [adventure works]
where(measures.ytdsales, [Product].[Subcategory].&[2])

It runs in 1 minute 10 seconds on my laptop and returns 33 columns and 2585 rows. YTD calculations are now very efficient in AS2005 SP2, and as far as I can see the real cause of the problem is the extremely large crossjoin on rows and the associated NON EMPTY. Relatively few customers actually bought anything in this particular subcategory in 2003, and when they did it was usually associated with no discount, so there are a lot of empty tuples that we need to remove on rows; but in order to filter them out we currently have to calculate the YTD sales for every single one of them. Now if we had control over the MDX of this query we could probably do some clever stuff with the NonEmpty function that would improve performance a lot (there’s a sketch of what I mean below), but what if users are querying with an ad hoc tool like Excel or ProClarity? It would be nice if we could set the Non_Empty_Behavior property on this calculated member, but that involves having a real measure on the cube whose value is null when YTD sales is null, and we don’t have one of those…
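
Just to give you an idea of the kind of rewrite I mean, here’s a sketch (I haven’t tested how much difference this particular rewrite makes, and it assumes we know in advance that 2003 is the year on the columns axis) which uses NonEmpty to cut the rows set down to just the customer/promotion combinations that have some Internet Sales Amount in 2003, so the calculated member only ever gets evaluated for them:

with member measures.ytdsales as
sum(
periodstodate([Date].[Calendar].[Calendar Year],[Date].[Calendar].currentmember)
, [Measures].[Internet Sales Amount])

select {[Date].[Calendar].[Month].&[2003]&[12].children} on 0,
non empty
nonempty(
[Customer].[Customer Geography].[Customer].members
*
[Promotion].[Promotion].[Promotion].members
, {([Measures].[Internet Sales Amount], [Date].[Calendar].[Calendar Year].&[2003])})
on 1
from [adventure works]
where(measures.ytdsales, [Product].[Subcategory].&[2])

But again, that kind of thing only helps when we control the MDX ourselves; for ad hoc tools we still need a real measure that we can point Non_Empty_Behavior at.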

So why don’t we build one? In fact it’s much easier to do this than you might think, especially when you remember that we’re not interested in the value of this measure at all, just whether it’s null or not. What we need to do is take our existing fact table, do a group by to find the minimum date in each year for every key combination, and then generate a row in the new fact table for every period from that date to the end of the year. So, to take a simplified example, if you had a fact table with two dimensions, Product and Time, and the granularity of Time was Month, you would start with this:

Product | Time | Sales
Apples | October | 100
Oranges | November | 150

and want to get to this:

Product | Time
Apples | October
Apples | November
Apples | December
Oranges | November
Oranges | December

Here’s a SQL query which does just that for the FactInternetSales fact table in Adventure Works DW:

select
f.ProductKey, f.DueDateKey, f.ShipDateKey,
f.CustomerKey, f.PromotionKey, f.CurrencyKey, f.SalesTerritoryKey,
c.TimeKey, f.SalesOrderNumber, f.SalesOrderLineNumber
from
--first, find the earliest order date in each calendar year for every combination of keys
(select i.ProductKey, i.DueDateKey, i.ShipDateKey,
i.CustomerKey, i.PromotionKey, i.CurrencyKey, i.SalesTerritoryKey
, i.SalesOrderNumber, i.SalesOrderLineNumber
, min(i.OrderDateKey) as mindate, t.CalendarYear
from
dbo.FactInternetSales i inner join dbo.DimTime t
on i.orderdatekey = t.timekey
--where SalesOrderNumber='SO43701'
group by t.CalendarYear, i.ProductKey, i.DueDateKey, i.ShipDateKey,
i.CustomerKey, i.PromotionKey, i.CurrencyKey, i.SalesTerritoryKey, i.SalesOrderNumber, i.SalesOrderLineNumber) f
--then generate one row for every date from that first order date up to the end of the same year
inner join
DimTime c on f.CalendarYear = c.CalendarYear
and f.mindate<=c.TimeKey

You’ll have to excuse my SQL – it’s pretty ropey – but even though I’m sure it could be optimised it still runs very quickly; if you uncomment the line with the where clause in the middle, so the table is filtered down to a single SalesOrderNumber, you’ll see more easily what it’s doing. Now you’ll probably be worried, quite rightly, that doing this creates a fact table that’s much larger than your original one, and that’s certainly true. In this case FactInternetSales has just over 60,000 rows and the query above returns almost 7.5 million; however, depending on the granularity of your fact table and the nature of your data, the explosion could be a lot smaller, and since we’re not interested in any of the measure columns in the original fact table the new table will not be as wide either.

The next step is to take the query above and use it as the basis of a named query in the Adventure Works DSV, and then build a new measure group in the Adventure Works cube from it containing a single Count measure. Partitioning and aggregations might be useful here too, but I didn’t bother with them and just went ahead and processed it; I was pleasantly surprised at how quickly processing finished, I guess because of the lack of measures.

Anyway, now we’ve got a measure in our cube we can use for our Non_Empty_Behavior property. I called it [Fact NEB Count] and here’s the new version of the calculation in the original query:

with member measures.ytdsales as
sum(
periodstodate([Date].[Calendar].[Calendar Year],[Date].[Calendar].currentmember)
, [Measures].[Internet Sales Amount])
, non_empty_behavior=[Measures].[Fact NEB Count]
select {[Date].[Calendar].[Month].&[2003]&[12].children} on 0,
non empty
[Customer].[Customer Geography].[Customer].members
*
[Promotion].[Promotion].[Promotion].members
on 1
from [adventure works]
where(measures.ytdsales, [Product].[Subcategory].&[2])

The query now runs in, wait for it, 7 seconds on a cold cache in SQL Server Management Studio, and if you look in Profiler you’ll see that in fact the majority of that time is taken up by Management Studio rendering the resultset – it takes just over 2.5 seconds to actually run on the server. A lot of effort, certainly, but a big improvement. Of course I’ve cheated a little bit and used a query that shows a particularly large change – the larger the proportion of rows in the crossjoin that actually contain data, the smaller the difference will be. That said, I’m working with a customer at the moment for whom this is a very valid scenario: they have a lot of customers and a lot of products, but each customer only tends to buy one or two products, and the users love to run big queries with YTD measures.
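
One last thought: since the whole point of the exercise is to help users of ad hoc tools like Excel and ProClarity, in practice you wouldn’t define the calculation in the WITH clause of a query at all, you’d define it in the cube’s MDX Script and set the Non_Empty_Behavior property there. Something along these lines should do it (just a sketch; I’ve called the cube-level version of the calculation [YTD Sales] here):

create member currentcube.[Measures].[YTD Sales] as
sum(
periodstodate([Date].[Calendar].[Calendar Year],[Date].[Calendar].currentmember)
, [Measures].[Internet Sales Amount]),
non_empty_behavior = [Measures].[Fact NEB Count],
visible = 1;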

Written by Chris Webb

February 27, 2008 at 6:30 pm

Posted in Analysis Services

3 Responses


  1. Hi Chris,
     
    Shouldn\’t the min, rather than the max date in the year be used, because YTD would not be null on the min and higher dates? In your query, it probably doesn\’t matter, since the grouping includes the fact table (SalesOrder/Line) keys, where there would be a single order date. But that\’s my other question – do the fact table keys need to be included in the grouping?
     
     – Deepak

    Deepak

    March 3, 2008 at 6:53 am

  2. Where would I be without you, Deepak? You're right as always, and I've updated the SQL appropriately. I included the Sales Order/Line keys so I could join the Internet Sales Order Details and Sales Reasons dimensions to my new measure group; if I hadn't needed those then you're right, I wouldn't have had to add them to the group by.

    Chris

    March 3, 2008 at 9:31 am

  3. Hi Chris,
    I\’m a young (and not so expert) SSAS developer, and I\’m trying to increase performance of a cube with 15 dimensions, about a hundred of measures and YTD calculations (7 measure groups), and a small fact table containing 1.7 million rows. Last week I tryied to find a way to use NEB with YTD calculations and I found your solution. Creating the named query in dsv, I got another fact table containing 218 million rows (granularity of the Time dimensione is Day). Now my problem is the processing time: it passed from 15 minutes to about 2 hours.
    If you want my e-mail address is "meli"+ "spa@hot" + "mail.it".
     
    Thank you for your help and excuse me for my bad english.
     
    Paolo

    paolo

    August 28, 2008 at 6:24 pm

