Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for April 2008

Google App Engine

with 2 comments

I see Google have announced their own web application platform:
http://www.techcrunch.com/2008/04/07/google-jumps-head-first-into-web-services-with-google-app-engine/#comments
which includes BigTable as part of it (remember this post?). There’s speculation over whether Microsoft has something similar up its sleeve:
http://blogs.zdnet.com/microsoft/?p=1320

This and the Panorama and Good Data stuff I blogged about over the last few weeks make me quite excited. What will the next generation of OLAP/BI tools be like? Surely it’s a mistake to think of them as hosted versions of what we’ve got today. As I’ve said before, the attribute-based approach of databases like BigTable remind me of Analysis Services dimensions; wouldn’t it be cool just to be able to grab data from a number of these stores and use them as dimensions and fact tables? Maybe through a front-end that was something like Lotus Improv (thanks to Andew Wiles for directing me to this in a recent conversation) on the web? But with an XMLA interface too? And since in the cloud hardware scale-out will presumably be just a matter of paying a bit more cash, you’d want an engine that could handle that transparently in the way I understand something like Teradata does? Ahh, if only I had a couple of million USD$s of VC funding to waste I’d hire Mosha and set him to work on this…

Written by Chris Webb

April 8, 2008 at 12:33 pm

Posted in On the internet

Can your sum be a subtraction? Or can you avoid it altogether?

with 4 comments

Quite often you’ll find yourself writing calculations that need to sum up large sets; in fact, they might be summing up all of the members on a level apart from one or two. In that situation it’s worth using the structure of your dimension to your advantage. Take the following query, which sums up the Internet Sales of all customers apart from one and then does a TopCount based on this:

WITH
MEMBER MEASURES.TEST AS
SUM(
EXCEPT(
[Customer].[Customer].[Customer].MEMBERS
, {[Customer].[Customer].&[20075]})
, [Measures].[Internet Sales Amount])

SELECT MEASURES.TEST ON 0,
TOPCOUNT(
[Date].[Date].[Date].MEMBERS
, 10, MEASURES.TEST) ON 1
FROM [Adventure Works]

On my laptop using AS2005 the query runs in 2 minutes 2 seconds on a cold cache (Katmai does no better with this query, incidentally). Yuck. But if we change the calculation around, so that rather than summing the customers we do want we take the sales for all customers and subtract the sales for the customer we don’t want (which is fine if the measure we’re looking at has an aggregate function of Sum) then we can do the following:

WITH
MEMBER MEASURES.TEST AS
[Measures].[Internet Sales Amount] –
([Customer].[Customer].&[20075], [Measures].[Internet Sales Amount])

SELECT MEASURES.TEST ON 0,
TOPCOUNT(
[Date].[Date].[Date].MEMBERS
, 10, MEASURES.TEST) ON 1
FROM [Adventure Works]

…which executes in 1 second on a cold cache. Taking this further, if you have a set like this you’re frequently summing up in calculations it might be a good idea to create a new attribute on your dimension to avoid having to do any work in MDX at all. In the Adventure Works example above, maybe Customer 20075 is the Sultan of Brunei and he ordered 5000 new bikes for all his friends – which means that including him in your calculations would skew the results. What you could do is create a new boolean attribute on Customer called something like ‘Exclude from Calculations’, which would then mean you could rewrite the query like this:

SELECT [Measures].[Internet Sales Amount] ON 0,
TOPCOUNT(
[Date].[Date].[Date].MEMBERS
, 10, [Measures].[Internet Sales Amount]) ON 1
FROM [Adventure Works]
WHERE ([Customer].[ExcludeFromCalculations].&[False])

and probably get even better performance.

Written by Chris Webb

April 3, 2008 at 6:21 pm

Posted in MDX

Follow

Get every new post delivered to your Inbox.

Join 2,867 other followers