# Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

## Avoiding an expensive TOPCOUNT

with one comment

An interesting MDX problem came up the other day, the solution to which I thought would be worth sharing. It involved a query which for a set of months showed the top n products plus a member which showed the sum of all other products outside the top n – the Foodmart 2000 equivalent is something like this:

WITH
SET MYDATES AS ‘{[Time].[1997].[Q1].[1]:[Time].[1997].[Q2].[5]}’
MEMBER PRODUCT.OTHERS AS ‘([Product].[All Products], TIME.CURRENTMEMBER) – SUM(TOPCOUNT([Product].[PRODUCT NAME].MEMBERS, 3, (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER)), (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER))’
SELECT MEASURES.MEMBERS ON 0,
GENERATE(
MYDATES,
CROSSJOIN(
{TIME.CURRENTMEMBER},
, PRODUCT.OTHERS}
) )
ON 1
FROM SALES

The question was, could the query be optimised? What struck me was that in the query the top n was worked out in two places: first in the rows axis, then in the calculated member. And doing a TOPCOUNT is usually a very expensive operation. So how could you make sure you only did it once? At first I thought you might be able to declare a set containing the top n inside the GENERATE statement and then reference it from the calculated member, but I couldn’t make that work. String manipulation turned out to be the key and the Foodmart 2000 version of the optimised query is as follows:

WITH
SET MYDATES AS ‘{[Time].[1997].[Q1].[1]:[Time].[1997].[Q2].[5]}’
MEMBER MEASURES.MYTOPSTR AS ‘SETTOSTR(TOPCOUNT([Product].[PRODUCT NAME].MEMBERS, 3, (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER)))’
SELECT MEASURES.MEMBERS ON 0,
GENERATE(
MYDATES,
CROSSJOIN(
{TIME.CURRENTMEMBER},
STRTOSET(CSTR(MEASURES.MYTOPSTR))
, PRODUCT.OTHERS}
) )
ON 1
FROM SALES

What I did was create a calculated measure which returned the top n set in string form; using STRTOSET you could then reference it from both the rows axis and the OTHERS calculated member. The reason this second query is faster than the first – and the guy who came to me with the problem says that the second version runs in almost a quarter of the time – is that the values returned by calculated members are cached so the second time that Measures.MyTopStr is called it doesn’t need to do the topcount again.

The one drawback to this approach is something that is common to all MDX that relies on STRTOSET and other similar functions – if you are trying to turn a very large set into a string, you may exceed the maximum string length and get an error. However if, as in this example, you’re only dealing with sets containing a few members you’ll never run into this problem.

Written by Chris Webb

March 30, 2005 at 11:48 am

Posted in MDX