Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for April 2005

MDX Solutions Second Edition


I’ve been quite busy recently working on the second edition of MDX Solutions, along with the legendary George Spofford and two others: Siva Harinath and Francesco Civardi. I’ve only got a few chapters to update and one to write from scratch but it’s already dominating my evenings – yesterday, for example, I spent hours trying to get a complex CREATE CUBE statement working and found that the problem was a missing underscore in a column name. Argh! No pain, no gain I suppose.

I have to say that I was honoured to be asked by George to work on this project because MDX Solutions is the only AS2K book that I use regularly, and in my opinion there’s a real need for advanced books on Analysis Services rather than ones that try to teach you the basics. I had been worried that after his move to Hyperion (a big loss to the MS BI community) it wouldn’t be updated, but since Essbase now supports MDX we have the chance to add the details of that implementation into the book too.

Anyway, we are currently looking for feedback so that we can make the new edition as good as possible – we want to make sure it maintains its place as the definitive guide to the MDX language. Here’s a snippet of the post I’ve just made to the microsoft.public.sqlserver.olap newsgroup which sums up the kind of feedback we’d like:

Our aims for this edition are to:

  • Update the book to include SQL Server 2005’s new MDX features, and
  • Expand the content to include more detail on advanced MDX topics

As a result we’re interested in hearing your opinions on the first edition: what you liked and what you didn’t like, mistakes that need correcting, existing topics that need more detail and new topics that you think should be included. The more specific you are the more it will help us – think of the functions you’ve had trouble understanding (NonEmptyCrossjoin is on our list!), the calculated members you’ve had trouble implementing and the queries you’ve had trouble writing.

So, please help us! If you could either post your feedback to the newsgroup or send it to the hotmail alias I use for newsgroup postings (onlyforpostingtonewsgroups@hotmail.com) then we would be much obliged.

Looking forward to hearing from you…

Written by Chris Webb

April 28, 2005 at 11:04 am

Posted in MDX

Building cubes from SAP data


Courtesy of Mat Stephen, whose blog I’ve just come across and which contains a lot of interesting MS BI-related content, I’ve just read the following paper on building AS2K cubes from SAP BW. Now I don’t know much about SAP BW or R/3, but I do know that this is the data that people most want to analyse and that SAP’s own BI tools have a history of being a bit rubbish (I’ve heard this from a lot of people, but the best summaries can be found in the OLAP Survey and the OLAP Report – both fine reads if you have the cash). However, I also know that one of the main reasons why SAP’s own attempt to use AS2K to speed up BW’s query performance was never widely used was that AS2K simply couldn’t handle the number of dimensions and levels present in a typical BW InfoCube: the only time I’ve heard anyone complain about AS2K supporting a maximum of 128 dimensions and 256 levels in a cube was in precisely this scenario, although in my opinion anyone expecting a user to understand a cube that complex is nuts…

Anyway, to get onto my main point, it strikes me that the new attribute-based dimension model of AS2005 is going to make it much more suitable for analysing SAP data. If I didn’t suspect that Microsoft were going to beat everyone to it with Maestro I’d say that using AS2005 and RS2005 to do exactly this job could be the basis for a killer third-party product, although I’m sure there’ll still be a lot of expensive consulting needed even for Maestro implementations. There must be plenty of disaffected BW users out there who are looking for a replacement BI solution.

Also on this subject, the paper above stresses several times that extracting data from R/3 and BW directly from the relational tables in which it’s stored is risky and unsupported, because the tables are subject to change in future versions and so on. That said, I understand that doing this is much quicker than going through any of the official interfaces. So I thought I’d mention a tool called Saphir, produced by a company I got to know when I worked in Switzerland, which makes exploring and extracting from SAP’s (and other ERP tools’) relational tables much easier. I know of at least one project where it was used to extract data to populate Analysis Services cubes.

Update: no sooner do I write this entry than I read this article – if you follow the link to the SAP site, you’ll see a mention of ‘Business Analytics delivered through Microsoft Excel’ – hmm, I wonder if this has anything to do with Maestro?

Written by Chris Webb

April 26, 2005 at 10:52 am

Posted in Analysis Services

PrevMember bug


An interesting bug came up on the newsgroup this week, concerning the PREVMEMBER function. You can read the original thread here, which includes Mosha’s explanation of exactly what the problem is (full credit to him for taking the time to investigate this and for posting the details back to the newsgroup – he deserves a medal for his community involvement).

I wonder how many cubes out there have calculated members which are suffering from poor performance as a result of this? Not many, I guess. The only scenario I could think of when this bug might be a problem would be if you had a previous period growth calc, a reasonably large time dimension (probably going down to day or hour level and consisting of several years) and a query with a lot of tuples on rows or columns. But if you do think you are experiencing this problem, how can it be worked around? Well, knowing that PREVMEMBER is the problem, I rewrote the query in the thread to avoid it and used a combination of RANK and ITEM instead:

with
member measures.myrank as 'rank([Customers].[Name].[Amy Petranoff], [Customers].[Name].[Amy Petranoff].PARENT.CHILDREN)-2'
member [Customers].[CM] as '[Customers].[Name].[Amy Petranoff].PARENT.CHILDREN.ITEM(MEASURES.MYRANK)'
SELECT { [Customers].[CM] } ON COLUMNS ,
NON EMPTY { { [Store].[Store Name].Members } * {
[Product].[Product Name].Members } } ON ROWS
FROM [Sales]
where ([Measures].[Unit Sales])

This performs much better than the PREVMEMBER version (only 4 seconds rather than 75), although it doesn’t look very elegant. Note the -2 in the rank calculation: RANK is 1-based while ITEM is 0-based, so subtracting 2 from a member’s rank gives the zero-based index of its previous sibling. But what about a more realistic scenario? Going back to the previous period growth example, consider the following query in Foodmart:

WITH MEMBER MEASURES.[PREVMONTH SALES] AS 'MEASURES.[UNIT SALES] - (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER.PREVMEMBER)'
SELECT { MEASURES.[UNIT SALES], MEASURES.[PREVMONTH SALES] } ON COLUMNS ,
NON EMPTY
CROSSJOIN( 
CROSSJOIN([Store].[Store Name].Members ,[Product].[Product Name].Members)
,{[Time].[1997].[Q1].CHILDREN,[Time].[1997].[Q2].CHILDREN })
ON ROWS
FROM [Sales]

This to me seems like a very common query: find the current month’s sales and the growth from the previous month. I therefore rewrote the query to use RANK and ITEM instead of PREVMEMBER, so that it looks like this:

WITH
MEMBER MEASURES.PREVTIME AS '(MEASURES.[UNIT SALES], TIME.CURRENTMEMBER.PARENT.CHILDREN.ITEM(RANK(TIME.CURRENTMEMBER, TIME.CURRENTMEMBER.SIBLINGS)-2))'
MEMBER MEASURES.[PREVMONTH SALES] AS 'MEASURES.[UNIT SALES] - MEASURES.PREVTIME'
SELECT { MEASURES.[UNIT SALES], MEASURES.[PREVMONTH SALES] } ON COLUMNS ,
NON EMPTY
CROSSJOIN( 
CROSSJOIN([Store].[Store Name].Members ,[Product].[Product Name].Members)
,{[Time].[1997].[Q1].CHILDREN,[Time].[1997].[Q2].CHILDREN })
ON ROWS
FROM [Sales]

Running both these queries on my laptop, from a cold cache, the first executes in 73 seconds and the second executes in 65 seconds. Not an enormous difference, then, but it might be more significant on a larger time dimension or a much larger query. I don’t think I’ll be changing the way I write any of my standard MDX calculations as a result…

Written by Chris Webb

April 15, 2005 at 11:09 am

Posted in MDX

CustomData() MDX function


Courtesy of Richard Tkachuk, I’ve just found out about another new MDX function in AS2005: CustomData(). You use it as follows:

  • You assign any value to the CustomData connection string property, eg as follows: CustomData=Hello World!
  • You can then retrieve this value in your queries, eg
    WITH MEMBER MEASURES.TEST AS CUSTOMDATA()
    SELECT MEASURES.TEST ON 0 FROM
    [Adventure Works]

What’s the point of this? As Richard pointed out to me, it should prove useful in middle tier scenarios. Imagine you have a web application using Reporting Services 2005 which in turn connects to Analysis Services 2005. Anyone connecting to your web application is authenticated using whatever method your company uses, and you can then pass in a user name through the connection string RS uses to connect to AS (RS2005, I understand, allows you to have dynamically generated connection strings); AS can then use the CustomData function to pick up this user name and use it in a dynamic cell or dimension security definition. I’m sure this functionality will come in handy for other things too, but I just haven’t thought what those things are yet…
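To make that concrete, here’s a hypothetical sketch of my own rather than anything from Richard’s example: the server name, database and the [Employee].[Login] attribute below are all invented. RS might connect to AS using a connection string along these lines:

Provider=MSOLAP;Data Source=MyServer;Initial Catalog=Adventure Works DW;CustomData=jsmith

…and the allowed member set expression in a role’s dimension security definition could then pick the user name up like this:

// resolve whatever the middle tier put in CustomData to a member
{ STRTOMEMBER("[Employee].[Login].&[" + CUSTOMDATA() + "]") }

…so that each user sees a different set of members depending on the value passed in on the connection string.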

Written by Chris Webb

April 12, 2005 at 12:41 pm

Posted in MDX

Tableau


I have to admit that as far as the market for client tools for Analysis Services went, I thought I’d seen it all. Now that Proclarity et al are relatively mature, who would want to challenge them? And if someone was going to launch a new tool, they’d need a new USP – and after the glut of tools that did cell-by-cell analysis in Excel (eg Intelligentapps, MIS Plain) that came along a few years ago, I didn’t think there were any new USPs left. Well, a new client called Tableau has come along to change my mind…

My interest was aroused by this article by Stephen Few, a visualisation guru whose other articles on data presentation in BI I’ve always enjoyed. His main theme is that data must be properly presented if it’s going to be understood by the people it’s intended for, and I have to admit that I (along with the rest of the BI industry) haven’t always taken this as seriously as we should have done. In fact, as techies, we tend to be disparaging of solutions that merely look nice, especially if the technology behind them is less than impressive. But at the end of the day it’s the data that’s important to our end users, and anyone who has spent any time with users knows that they just love pretty, easy-to-understand output. Tableau’s USP is that it is very, very good at data visualisation.

Take a look for yourself here, or even better take a look at the gallery of screenshots here for an idea of what you can produce. Nice, eh? And the thing is, it’s not difficult to produce this kind of output. The UI and workflow are smooth and fast, and while it took me a few minutes to ‘get’ some of the concepts, once I knew what I was doing I found it very easy to produce some quite impressive-looking reports. Some of the more mature clients for Analysis Services out there look decidedly dated and clunky in comparison…

The only downside I can see is the price: at $999 for Standard Edition and $1799 for Professional Edition (and it seems like you need the latter to be able to connect to Analysis Services), it is VERY expensive. They are going to have to be able to market themselves very well to compete, but I think the gap is there for this kind of niche product. I hope they do well.

Written by Chris Webb

April 8, 2005 at 2:48 pm

Posted in Client Tools

Data Mining Programmability white paper


Just noticed this new white paper on Data Mining Programmability, available here. Even if, like me, you don’t know an awful lot about data mining (although I think the new functionality in this area is very exciting and I can’t wait to spend some time trying it out on my data), this white paper contains a lot of information which is relevant to Analysis Services programming in general, on topics such as ADOMD.Net, AMO etc. 

Written by Chris Webb

April 7, 2005 at 9:59 am

Posted in On the internet

Declaring Named Sets Inline


I mentioned in my last entry on the ‘expensive topcount’ that I’d originally tried to solve the problem by declaring a set inside the axis definition, and after a chat about this with my colleague Jon I thought it might be worth going into more detail on what I was trying to do and why it failed. What I was talking about was the fact that MDX allows you to declare a named set anywhere a set definition appears, not just in a WITH SET or CREATE SET statement, and this ability is very important when it comes to optimising queries which use complex set operations.
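Before getting to the failed attempts, here’s a minimal illustration of the technique itself: a sketch of my own against the Foodmart Sales cube, not a query taken from the articles below. The TOPCOUNT is aliased inline with AS and then reused later in the same axis expression, so it should only need to be evaluated once (the weakest of the top five will appear twice on rows, which is fine for illustration):

SELECT { MEASURES.[UNIT SALES] } ON COLUMNS,
// declare the set inline, then reuse the alias within the same expression
{ TOPCOUNT([Product].[Product Name].MEMBERS, 5, MEASURES.[UNIT SALES]) AS MYTOPPRODUCTS,
BOTTOMCOUNT(MYTOPPRODUCTS, 1, MEASURES.[UNIT SALES]) } ON ROWS
FROM [Sales]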

Part of the reason why I’ve not blogged about this before is because two short articles I wrote on this subject a while ago are now publicly available on the ‘BI Best Practices’ blog:
http://blogs.msdn.com/bi_systems/articles/162840.aspx
covering the performance benefits, and
http://blogs.msdn.com/bi_systems/articles/162850.aspx
covering their use in solving the problem of currentmembers from different contexts. Between them they cover the basics of the subject; in the first you can see how you can use them to optimise queries very similar to the ones in the ‘expensive topcount’ problem. But why couldn’t I use named sets to solve that particular problem, then? The answer is because in that case I was trying to reuse a set between an axis definition and a calculated member and this just isn’t possible. The following query is what my first, unsuccessful attempt at the problem looked like:

WITH
SET MYDATES AS '{[Time].[1997].[Q1].[1]:[Time].[1997].[Q2].[5]}'
MEMBER PRODUCT.OTHERS AS '([Product].[All Products], TIME.CURRENTMEMBER) - SUM(STRTOSET("MYTOPCOUNT"), (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER))'
MEMBER Measures.Test as 'SetToStr(STRTOSET("MyTopCount"))'
SELECT {MEASURES.MEMBERS, Test} ON 0,
GENERATE(
MYDATES,
CROSSJOIN(
{TIME.CURRENTMEMBER},
{ [Product].[All Products], TOPCOUNT([Product].[PRODUCT NAME].MEMBERS, 3, (MEASURES.[UNIT SALES])) as MyTopCount, PRODUCT.OTHERS}
) )
ON 1
FROM SALES

You can see that I’m declaring the set MyTopCount in the rows axis definition and then attempting to refer to it in the Product.Others calculated member. Once I’d got round the fact that the query wouldn’t initially run because MyTopCount hadn’t been declared when the calculated member was parsed (by wrapping it in a StrToSet), I found that by the time Product.Others got calculated the contents of MyTopCount had been lost. Similarly, if you turn the query round the other way as follows:

WITH
SET MYDATES AS '{[Time].[1997].[Q1].[1]:[Time].[1997].[Q2].[5]}'
MEMBER PRODUCT.OTHERS AS '([Product].[All Products], TIME.CURRENTMEMBER) - SUM(TOPCOUNT([Product].[PRODUCT NAME].MEMBERS, 3, (MEASURES.[UNIT SALES])) as MyTopCount, (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER))'
MEMBER Measures.Test as 'SetToStr(MyTopCount)'
SELECT {MEASURES.MEMBERS, Test} ON 0,
GENERATE(
MYDATES,
CROSSJOIN(
{TIME.CURRENTMEMBER},
{ [Product].[All Products], MyTopCount, PRODUCT.OTHERS}
) )
ON 1
FROM SALES

…you can see that while the values for Product.Others are correct, and you can see the contents of MyTopCount in Measures.Test, the members that make up MyTopCount don’t appear on rows. The reason why neither of these queries works is fairly obvious when you think about it: AS has to fully evaluate the sets that appear on both axes before the values of the cells in the cellset can be calculated, so we have two separate iterations rather than one, and therefore no opportunity to reuse the sets (incidentally, if you’re interested in finding out more about the steps that AS goes through when running a query, take a look at MDX Solutions chapter 5, ‘Cell Context and Resolution Order in Queries’).

Written by Chris Webb

April 5, 2005 at 5:06 pm

Posted in MDX
