Bucketisation

I couldn’t think of a better title for this entry, but I’m sure there must be a better term for what I’m writing about… Anyway, the original solution I posted for the ‘get the names of the currentmembers on all dimensions’ problem last week reminded me of some MDX I came up with last year, for dynamic bucketisation. By this I mean the ability to run queries in MDX whereby you can count (or sum, or whatever) the members that fall within dynamically defined ranges. Imagine that you’re the manager of the Foodmart 2000 stores and you want to see a histogram showing the number of product categories that had sales of between $0 and $500, $500 and $1000, $1000 and $1500 and so on (each of these ranges being the ‘buckets’ of the title). Now, in AS2K, the standard approach would be to create these calculated members manually either on the cube or in the WITH clause, which is pretty time-consuming if you have lots of buckets. But what if you want a report with this histogram on where you can dynamically define the number of buckets, and the range that each bucket contained? That makes things even more difficult!

Here’s my solution in a Foodmart 2000 query:

WITH
/*FIRST DEFINE THE SIZE OF EACH BUCKET*/
MEMBER MEASURES.SIZE AS ‘500’
/*THEN, CHOOSE A DIMENSION LEVEL WHICH YOU AREN’T USING IN YOUR QUERIES AND WHICH HAS MORE
MEMBERS THAN THE NUMBER OF BUCKETS YOU WANT. THEN DEFINE A SET WITH THE SAME NUMBER
OF MEMBERS IN AS THE NUMBER OF BUCKETS DESIRED*/
SET BUCKETHACK AS ‘HEAD([Promotions].[Promotion Name].MEMBERS, 10)’
/*NEXT, CREATE A CALCULATED MEMBER THAT FINDS THE RANK OF THE CURRENTMEMBER WITHIN THE SET JUST CREATED*/
MEMBER MEASURES.RANKING AS ‘RANK([Promotions].CURRENTMEMBER, BUCKETHACK) – 1’
/*NOW, USE THE CREATEPROPERTYSET FUNCTION TO CREATE A SET OF CALCULATED MEMBERS WHICH REPRESENT
THE BUCKETS. THE NAME OF EACH CALCULATED MEMBER IS DERIVED FROM THE RANK OF EACH MEMBER OF THE BUCKETHACK SET
I AM USING THE CUSTOMERS DIMENSION HERE, BUT YOU SHOULD USE A NEW DIMENSION CREATED SPECIFICALLY FOR THIS PURPOSE
WITH ONLY AN ALL MEMBER*/
SET MYBUCKETS AS ‘CREATEPROPERTYSET([Customers].[All Customers], BUCKETHACK,
"$" + CSTR(MEASURES.RANKING * MEASURES.SIZE) + " – $" + CSTR((MEASURES.RANKING+1) * MEASURES.SIZE))’
/*THEN CREATE A NEW CALCULATED MEASURE WHICH FINDS THE RANK OF THE CURRENT BUCKET WITHIN THE LIST OF ALL BUCKETS
TO MAKE THE MDX EASIER TO READ*/
MEMBER MEASURES.RANKING2 AS ‘RANK(CUSTOMERS.CURRENTMEMBER, MYBUCKETS)-1’
/*AND FINALLY CREATE A NEW CALCULCATED MEASURE WHICH COUNTS THE NUMBER OF PRODUCT CATEGORIES IN EACH BUCKET*/
MEMBER MEASURES.[PROD CAT COUNT] AS ‘COUNT(
FILTER([Product].[Product Category].MEMBERS,
(MEASURES.[UNIT SALES], [Customers].[All Customers])>=(MEASURES.RANKING2*MEASURES.SIZE)
AND
(MEASURES.[UNIT SALES], [Customers].[All Customers])<((MEASURES.RANKING2+1)*MEASURES.SIZE)
))’
/*JUST TO SHOW WHICH MEMBERS FALL INTO WHICH BUCKETS*/
MEMBER MEASURES.[BUCKET MEMBERS] AS ‘GENERATE(
FILTER([Product].[Product Category].MEMBERS,
(MEASURES.[UNIT SALES], [Customers].[All Customers])>=(MEASURES.RANKING2*MEASURES.SIZE)
AND
(MEASURES.[UNIT SALES], [Customers].[All Customers])<((MEASURES.RANKING2+1)*MEASURES.SIZE)
), PRODUCT.CURRENTMEMBER.NAME, ", ")’
/*THE DEMO QUERY*/
SELECT {MEASURES.RANKING2, MEASURES.[PROD CAT COUNT], MEASURES.[BUCKET MEMBERS]} ON 0,
MYBUCKETS ON 1
FROM SALES

You could adapt this MDX for a various other similar types of analysis – Pareto curves come to mind – and I guess it would come in handy for situations where generating the query itself dynamically is either difficult or impossible, such as in a Reporting Services report.

I admit that it’s not elegant, but I thought it was interesting nonetheless and worth sharing. It manages the problem of the dynamic number of buckets by the same method as the MDX last week – simulating a for/next loop by creating a set with a given number of members and then using GENERATE to iterate over it. Of course, this suffers from the need to have a level with at least as many members on as you need iterations. It then manages the problem of creating calculated members for each of the buckets dynamically by using the undocumented CREATEPROPERTYSET MDX function. Hopefully there will be much better solutions to both of these problems in Yukon – I assume a stored procedure would do the job, but I’m not sure.

3 thoughts on “Bucketisation

  1. This won\’t work on 2005 or 2008, unfortunately – there\’s no way of generating calculated members dynamically.

  2. Hi,

    Could u pls help me with alternate function to the CREATEPROPERTYSET as i m using Sql 2008 and it is not supported in this version.

    Thanks,
    Shakila.

Leave a Reply