Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for March 2011

Dynamic Named Sets and the Formula Cache

with 6 comments

Here’s another scenario where the Formula Engine is unable to cache the results of calculations longer than the lifetime of a query (thanks to Akshai Mirchandani yet again for pointing this one out): when you use dynamic named sets inside calculations. To see this happening add the following calculations to the MDX Script of the Adventure Works cube:

CREATE SET NONDYNAMICSET AS [Date].[Day Name].MEMBERS;

CREATE DYNAMIC SET DYNAMICSET AS [Date].[Day Name].MEMBERS;

CREATE MEMBER CURRENTCUBE.MEASURES.NONDYNAMICCALC AS COUNT(NONDYNAMICSET);

CREATE MEMBER CURRENTCUBE.MEASURES.DYNAMICCALC AS COUNT(DYNAMICSET);

Now, clear the cache and run the following query twice:

SELECT MEASURES.NONDYNAMICCALC ON 0
FROM [Adventure Works]

Looking in Profiler, you can see that on the second run the query was answered from the formula engine’s flat cache:

image

When you run the following query using the calculation that references the dynamic named set:

SELECT MEASURES.DYNAMICCALC ON 0
FROM [Adventure Works]

…you can see that the cache does not get hit:

image

So be careful if you ever need to use them inside your calculations! Personally I’ve never done much with dynamic named sets at all, but Mosha has of course blogged extensively about several ways they can be useful.

Written by Chris Webb

March 31, 2011 at 9:50 pm

Posted in Analysis Services, MDX

Last Ever Non Empty – a new, fast MDX approach

with 165 comments

The last non empty semi-additive measure aggregation functionality in SSAS enterprise edition is very useful, but it doesn’t support one common business requirement: while it will give you the last non empty value within any given time period, it doesn’t handle the variation where you want to get the last non empty value of a measure from all preceding time periods (this is what I’m calling the ‘last ever non empty’ value). There are a number of business scenarios where you’d want to do this, for example finding the value of the last purchase a customer made, the last price you sold a product at, and the stock level of a product in a shop the last time a sales rep visited. Traditional MDX solutions to this problem have suffered from poor performance but in this blog post I’ll describe a new approach that performs much better; I think it will be very useful to a lot of people, and I’m quite proud of it!

Let’s take the following MDX query on Adventure Works as an example of the problem:

SELECT 
HEAD([Customer].[Customer].[Customer].MEMBERS, 10)
*
{[Measures].[Internet Sales Amount]} 
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
ON 1
FROM [Adventure Works]

Here’s part of the results:

image

From this we can see that individual customers only bought from us once or twice. Now, for any date, let’s create a calculation that will find what the value of the last purchase by any given customer was, regardless of however long ago it was. Up until last week I’d have tackled this problem using a combination of the NonEmpty and Tail functions – for each customer and date, get the set of all preceding dates, find the dates which had values and find the value of the last date. Here’s the code:

WITH 
MEMBER MEASURES.[Last Sale Original] AS
TAIL(
NONEMPTY({NULL:[Date].[Date].CURRENTMEMBER} * [Measures].[Internet Sales Amount])
).ITEM(0)

SELECT 
HEAD([Customer].[Customer].[Customer].MEMBERS, 10)
*
{[Measures].[Internet Sales Amount],MEASURES.[Last Sale Original]} 
ON 0,
[Date].[Date].[Date].MEMBERS
ON 1
FROM [Adventure Works]

And here’s the part of the results dealing with the first customer, Aaron A. Allen:

image

On my laptop the query takes 14 seconds to run, and that’s with only 10 customers on columns (it executes in cell-by-cell mode, I think); in many real world scenarios this kind of performance isn’t acceptable and that was certainly the case with the customer I was working with last week. So I came up with the following new MDX that does the same thing much faster:

WITH 

MEMBER MEASURES.DAYSTODATE AS 
COUNT(NULL:[Date].[Date].CURRENTMEMBER)-1

MEMBER MEASURES.HADSALE AS 
IIF([Measures].[Internet Sales Amount]=0, NULL, MEASURES.DAYSTODATE)

MEMBER MEASURES.MAXDATE AS 
MAX(NULL:[Date].[Date].CURRENTMEMBER, MEASURES.HADSALE)

MEMBER MEASURES.LASTSALE AS
IIF(ISEMPTY(MEASURES.MAXDATE), NULL, 
([Measures].[Internet Sales Amount],
[Date].[Date].[Date].MEMBERS.ITEM(MEASURES.MAXDATE)))


SELECT 
HEAD([Customer].[Customer].[Customer].MEMBERS, 10)
*
{[Measures].[Internet Sales Amount]
,MEASURES.[LASTSALE]} 
ON 0,
[Date].[Date].[Date].MEMBERS
ON 1
FROM [Adventure Works]

On my laptop this query now executes in 3 seconds. Here’s what it’s doing:

  • First of all the DaysToDate measure returns the zero-based index of the current date within the set of all dates, so the first date in the time dimension would have index 0, the second 1 and so on. This could be replaced by a real measure to get slightly better performance but I left it as a calculated measure for the sake of clarity.
  • Next, the measure HadSale returns the index of the current date if it has a value and null otherwise.
  • Next, the measure MaxDate returns the maximum value of HadSale for the set of all dates from the beginning of time up to the current date. This will give us the index of the last date which had a value.
  • Finally we can take this index and, using the Item function, get the value of Internet Sales Amount for the last date that had a value.

If we want to take this approach and apply it to a server-based calculation, and make it work at all levels on the Date dimension, we need a slight variation. Again using the Adventure Works cube to illustrate, here’s what you need to do…

First of all, you need to create a new column in your fact table that contains only null values and use this as the basis of a new real (ie not calculated) measure, which should be called MaxDate. This should have the aggregation function Max.

image

You then need to add the following code to the MDX Script of the cube:

CREATE MEMBER CURRENTCUBE.MEASURES.DAYSTODATE AS 
COUNT(NULL:[Date].[Date].CURRENTMEMBER)-1
, VISIBLE=FALSE;

CREATE MEMBER CURRENTCUBE.MEASURES.HADSALE AS 
IIF([Measures].[Internet Sales Amount]=0, NULL, MEASURES.DAYSTODATE)
, VISIBLE=FALSE;

SCOPE(MEASURES.MAXDATE, [Date].[Date].[Date].MEMBERS); 
    THIS = MAX(NULL:[Date].[Date].CURRENTMEMBER, MEASURES.HADSALE);
END SCOPE;

CREATE MEMBER CURRENTCUBE.MEASURES.LASTSALE AS
IIF(ISEMPTY(MEASURES.MAXDATE), NULL, 
([Measures].[Internet Sales Amount],
[Date].[Date].[Date].MEMBERS.ITEM(MEASURES.MAXDATE)));

This does basically the same as the previous example only now MaxDate is a real measure instead of a calculated measure, and we’re using a scoped assignment to overwrite its value at the Date level. Above the Date level the default aggregation method of the MaxDate measure kicks in and we see the Max value of MaxDate for all dates in the current time period – which means at the month, quarter and year level we once again get the index of the last non empty date. Here’s what the result looks like in the cube browser:

image

Written by Chris Webb

March 24, 2011 at 10:34 pm

Posted in MDX

Public MDX Course: London, 12th-13th May 2011

with 11 comments

I’ve finally got round to organising a two-day public MDX course, to be held in the UK, in central London on the 12th and 13th of May. It’s the standard ‘Introduction to MDX’ course that I’ve been teaching for several years now as a private course (indeed I’ve given it five or six times already this year) and aims to take you from the point of being a complete beginner with no previous knowledge of MDX up to the point where you can write 90% of the MDX calculations and queries you’ll ever need to write.

Here’s the agenda:

Day 1

  • What is MDX and why should I use it?
  • Understanding the structure of a cube: recognizing cubes, measure groups, dimensions and hierarchies
  • Understanding the concepts of members, tuples and sets
  • Writing simple queries: using SQL Management studio to write SELECT statements
  • Creating simple calculated members: when to use query, session and cube-scoped calculations, using simple MDX expressions and functions

Day 2

  • Standard Calculations: looking at the best ways to implement common calculations, such as time series, market shares and rankings
  • Using set functions: using common MDX functions such as Crossjoin, Filter, Order, Generate to create more complex queries
  • Advanced concepts: autoexists, solve order and subselects
  • MDX Script Assignments: making scoped assignments, understanding how assignments affect aggregation, using assignments with calculated members

The cost is £499 plus VAT, including lunch and drinks/snacks. There will be some practical exercises but to do them you’ll need to bring your own laptop with SSAS and Adventure Works on – I’m not providing any PCs.

Here’s the link to register:
http://www.regonline.co.uk/Register/Checkin.aspx?EventID=940936

UPDATE I’m running the same course on October 26th-28th 2011 in London:

http://cwebbbi.wordpress.com/2011/06/28/public-mdx-courselondonoctober-26-to-28-2011/

UPDATE #2 – I’m now running all my public MDX and SSAS training courses through my new company, Technitrain. Check out the website for more details:
http://www.technitrain.com/index.php?trackingcode=CWB

 

Written by Chris Webb

March 16, 2011 at 8:34 pm

Posted in Events

Referencing Named Sets in Calculations

with 13 comments

I was recently involved in an interesting discussion about the negative performance impact of referencing named sets inside calculated members. It’s an issue that’s dealt with in this topic in BOL, along with lots of other useful tips for things to avoid when writing MDX calculations:
http://msdn.microsoft.com/en-us/library/bb934106.aspx

Since I see lots of people making this mistake, though, I thought it was nonetheless worth a blog post; it’s certainly very easy to reproduce in Adventure Works. Take the following set of calculations:

CREATE SET ALLCUSTS AS [Customer].[Customer].[Customer].MEMBERS;

CREATE MEMBER CURRENTCUBE.MEASURES.TEST1 AS 
COUNT(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount])
);

CREATE MEMBER CURRENTCUBE.MEASURES.TEST2 AS 
COUNT(
NONEMPTY(
ALLCUSTS
, [Measures].[Internet Sales Amount])
);

CREATE MEMBER CURRENTCUBE.MEASURES.TEST3 AS 
SUM(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount]);

CREATE MEMBER CURRENTCUBE.MEASURES.TEST4 AS 
SUM(
ALLCUSTS
, [Measures].[Internet Sales Amount]);

 

You’ll notice that TEST1 and TEST2 are essentially the same calculation, as are TEST3 and TEST4; the only difference between them is that the set expressions in TEST1 and TEST3 have been replaced by references to the named set ALLCUSTS in TEST2 and TEST4.

Now run the following query four times on a cold cache, each time putting a different calculated measure from the list above in the WHERE clause:

SELECT [Date].[Calendar Year].MEMBERS ON 0,
[Product].[Product].MEMBERS ON 1
FROM [Adventure Works]
WHERE(MEASURES.TEST1)
 

On my machine the query with TEST1 took 874ms to run; the query with TEST2 took 6302ms; the query with TEST3 took 234ms; and the query with TEST4 I ended up killing after a few minutes.

So, clearly, as the article says referencing a named set inside one of the MDX aggregation functions in a calculation is a Very Bad Thing for performance and something to be avoided at all costs. While it might seem an appealing thing to do for readability, the downsides are significant.

Written by Chris Webb

March 16, 2011 at 8:12 pm

Posted in MDX

‘Events in Progress’ Part 3 – Average Age of Unpaid Invoices

with 21 comments

Sorry for the relative quiet on here recently, I’ve been caught up with helping to organise SQLBits and other stuff… But anyway, this week I’m working with a company who asked me to tackle what must be a very common business problem: they have a fact table where one row represents an invoice they have generated, and they need to know at any point in time what the average age of all their unpaid invoices is, and monitor whether this value goes up or down. The point of this is, of course, is that if the average age of outstanding invoices starts to go up over time then it means that their customers are getting slower to pay them and they may potentially have a cashflow problem.

The solution is a variation on the ‘events in progress’ model that I blogged about recently but with an interesting twist that I thought was worth describing. First of all, we need to prepare some data so we can reproduce the scenario in Adventure Works – let’s pretend that the FactInternetSales table actually contains invoice data. We’ll need to massage the data a bit for two reasons:

  • In Adventure Works if you look at the OrderDate, ShipDate and DueDate columns there’s always exactly the same number of days between the dates, which doesn’t make for a great demo
  • In the real world some invoices will be unpaid when the fact table is loaded, others will be paid. We need to be able to handle both scenarios properly.

Once again, we need two fact tables. The first of these will contain all of our invoices, and for this I used the following named query in the Adventure Works DSV:

SELECT        CustomerKey, OrderDateKey
FROM            dbo.FactInternetSales

The second will be a fact table that contains information on invoices that have been paid:

SELECT       
CustomerKey,
OrderDateKey,
CASE WHEN SalesTerritoryKey > 5 THEN ShipDateKey ELSE DueDateKey END AS PaidDateKey,
CASE WHEN SalesTerritoryKey > 5 THEN 7 ELSE 12 END AS DaysToPayment
FROM            dbo.FactInternetSales
WHERE        (PromotionKey < 3)

A few things to notice in this second query:

  • I’ve filtered the fact table to remove a very small number of rows where PromotionKey is less than 3 – this will give us a few unpaid invoices to make the numbers more interesting.
  • I’ve created a new date key called PaidDateKey to make the invoices have different payment lengths.
  • Because these are paid invoices, we know how long they took to be paid and we can precalculate this and put it in the fact table in the DaysToPayment column.

We can now build the cube. It needs two measure groups, one based on each fact table, and we can add the Customer dimension too. Three measures need to be built: one count measure on each measure group and a sum measure on the DaysToPayment column. The Date dimension joins to the OrderDateKey on the All Invoices measure group, and to the PaidDateKey on the Paid Invoices measure group.

Now for the MDX. The first thing we need to do is find the total number of invoices generated to date and the total number of invoices that have been paid to date (and also to make sure that these totals aggregate up properly through the time dimension); we can then subtract the second from the first to find the total number of unpaid invoices:

CREATE MEMBER CURRENTCUBE.MEASURES.[Total Invoices To Date] AS NULL;

SCOPE(MEASURES.[Total Invoices To Date]);

    SCOPE([Date].[Date].[Date].MEMBERS);
        THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[All Invoices Count]);
    END SCOPE;

    SCOPE([Date].[Date].[All Periods]);
        THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0)
                , MEASURES.[Total Invoices To Date]);
    END SCOPE;

END SCOPE;


CREATE MEMBER CURRENTCUBE.MEASURES.[Total Paid Invoices To Date] AS NULL;

SCOPE(MEASURES.[Total Paid Invoices To Date]);

    SCOPE([Date].[Date].[Date].MEMBERS);
        THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[Paid Invoices Count]);
    END SCOPE;

    SCOPE([Date].[Date].[All Periods]);
        THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0)
                , MEASURES.[Total Paid Invoices To Date]);
    END SCOPE;

END SCOPE;

CREATE MEMBER CURRENTCUBE.MEASURES.[Unpaid Invoices Count] AS
MEASURES.[Total Invoices To Date] - MEASURES.[Total Paid Invoices To Date];

 

This is the easy bit. In order to find out what the average of the number of days the unpaid invoices have actually been unpaid, we need to find the number of days that each unpaid invoice has been open for, sum that up, and divide it by the total number of unpaid invoices. And to find this sum of the number of days that all unpaid invoices are open, we need to do the following:

  • Find the total up to yesterday of the [Unpaid Invoices Count] measure. This gives the total number of days that every invoice has ever been open.
  • Then sum the total to date of the Days to Payment measure we created on the Paid Invoices fact table, to find the total number of days that all paid invoices were ever open
  • Then subtract this second figure from the first, which will give us the sum of the number of days all currently unpaid invoices have been open

Here’s the code:

CREATE MEMBER CURRENTCUBE.MEASURES.[Sum of Ages of Invoices To Date] AS NULL;

SCOPE(MEASURES.[Sum of Ages of Invoices To Date]);

    SCOPE([Date].[Date].[Date].MEMBERS);
        THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER.PREVMEMBER
                , MEASURES.[Unpaid Invoices Count]);
    END SCOPE;

    SCOPE([Date].[Date].[All Periods]);
        THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0)
                , MEASURES.[Sum of Ages of Invoices To Date]);
    END SCOPE;

END SCOPE;

CREATE MEMBER CURRENTCUBE.MEASURES.[Sum of Ages of Paid Invoices To Date] AS NULL;

SCOPE(MEASURES.[Sum of Ages of Paid Invoices To Date]);

    SCOPE([Date].[Date].[Date].MEMBERS);
        THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[Days To Payment]);
    END SCOPE;

    SCOPE([Date].[Date].[All Periods]);
        THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0)
                , MEASURES.[Sum of Ages of Paid Invoices To Date]);
    END SCOPE;

END SCOPE;

CREATE MEMBER CURRENTCUBE.MEASURES.[Difference of Sum of Ages] AS
MEASURES.[Sum of Ages of Invoices To Date] - MEASURES.[Sum of Ages of Paid Invoices To Date];

 

Finally, we can calculate the average like so:

CREATE MEMBER CURRENTCUBE.MEASURES.[Average Age of Unpaid Invoices] AS
IIF(MEASURES.[Unpaid Invoices Count]=0, NULL,
MEASURES.[Difference of Sum of Ages]/MEASURES.[Unpaid Invoices Count])
, FORMAT_STRING='#,#.00';

 

It’s pretty complex I’ll admit, but it does perform well. Here’s what the results look like in Excel:

image

I only wish I could come up with a way of getting some of my customers to pay this quickly…!

Written by Chris Webb

March 15, 2011 at 11:01 pm

Posted in Uncategorized

SQLBits 8 Agenda Published

with 3 comments

The agenda for SQLBits 8, taking place in the UK in Brighton on April 7th-9th, has now been published:
http://sqlbits.com/information/Agenda.aspx

I’m speaking on the Friday – a session on scoped assignments in MDX – along with many other SQL Server experts from around the world. If you’d like to come you’ll need to register quick though, since we’ll be full soon!

Don’t forget there’s also our new event, SQLBits Insight, for CIOs, technical architects and senior IT decision makers taking place on April 7th too. The big draw is the opportunity to meet Steve Wozniak but we’ve also got some great speakers from Microsoft too, covering topics like consolidation, virtualisation and Parallel Data Warehouse. It’s a great opportunity to show the top people in your company, or your customers, all of the cool new stuff in the world of SQL Server.

Written by Chris Webb

March 6, 2011 at 9:24 pm

Posted in Events

Tagged with

Follow

Get every new post delivered to your Inbox.

Join 3,301 other followers