Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for August 2006

Playing away from home with Pentaho and Mondrian

with 2 comments

Back in March, when I announced on this blog that I was going to set myself up as an independent consultant, I mentioned that I would be interested in doing some MDX work on other OLAP servers that support the language. Very soon afterwards I got an email from Nick Goodman at Pentaho asking me if I was interested in finding out a bit more about the open source OLAP server Mondrian and helping the open source BI community come to grips with MDX. How could I refuse?
 
Anyway, Nick pointed me in the direction of a quick and easy-to-install demo of the whole Pentaho platform, including Mondrian:
You literally download it (it's about 100Mb), unzip it, click on 'start-pentaho.bat', go to http://localhost:8080/pentaho/ and you're there. Click on the 'Steel Wheels' sample and you can see demos of their reporting, OLAP and dashboarding tools. I've only really had a look at the OLAP stuff so far: in the Steel Wheels sample you click on 'Pentaho Analysis' and then 'Territory Analysis by Year' and you'll see a pivot table (JPivot) and a graph; click on the MDX button at the top and you'll see, and can alter, the MDX behind it. My first impression of this – and Pentaho Analysis is just rebranded Mondrian, as far as I understand it – is just how conceptually similar it is to AS2K. Anyone coming from a Microsoft BI background will feel very much at home with it. There's a ton of documentation on Mondrian here:
…if you’d like to find out more.
 
Of course now I feel all guilty and adulterous (well, ok, not that bad) having strayed away from Analysis Services, but don't worry – it will always remain my true love. But given that there's such a shortage of people who understand MDX even in the Microsoft BI world, I think there must be massive untapped demand out there for people who have cross-platform MDX skills: even though platforms like Essbase and SAP BW support MDX, I'm sure you could count the number of Essbase and SAP BW consultants out there with anything more than a superficial knowledge of MDX on the fingers of one hand.

Written by Chris Webb

August 31, 2006 at 2:21 pm

Posted in Open Source BI

Adapting SQLIS to work with tuples and sets as a data source

with 3 comments

It’s been a long time since I posted in my ‘random thoughts’ category… but I just had such an interesting idea I thought I’d post it up (even if there’s 0% chance I’ll ever get round to implementing this).
 
I was looking at a (non-Microsoft platform) BI tool today and got thinking about MDX, how people find it hard to work with, how most client tools don't really expose the power of MDX sets, and how handy it would be to be able to do some procedural things in MDX too. This particular tool had some cool set-based selection functionality and I reflected that even though I'd seen similar set-based selection tools, some on AS (didn't ProClarity have something in this area?), they'd never really taken off; I also thought about the much-missed MDX Builder tool which had a similarly visual approach to building MDX expressions. I started thinking about whether it would be worth building another client tool which took this approach, but quickly came to the conclusion that the world needed another AS client tool like a hole in the head – and then realised how much a tool like this would resemble Integration Services if I did build it. And then I had my idea: why not extend Integration Services so it can treat MDX sets and tuples as a data source, and then use its existing functionality and create new transformations to implement MDX set-based operations?
 
Let me explain in more detail. I'm not talking about simply getting data out of AS in the same way you'd get it out of a SQL Server table, using an MDX query. What I'm saying is that what would be flowing through the IS data flow tasks would be members, sets and tuples: each 'row' of data would be an MDX expression returning a member, tuple or set. So you'd create a custom data source where you could define a set as your starting point – probably at this point you'd just select a whole level, or the children of a member, or some such simple set of members. For example you might select the [Customer].[Customer].[Customer] level in your Customer dimension; the output from this would be a single text column and a single row containing the set expression [Customer].[Customer].[Customer].Members. You could then put this through an Exists() transform to return only the customers in the UK and France, the output from which would be the set expression Exists([Customer].[Customer].[Customer].Members, {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]}). You could then put this through a Crossjoin() transform to crossjoin this set with the set of all your Products, and then put the result through a NonEmpty() transform to remove all the empty combinations from the set. At this point your output would still be a single row and column, consisting of the MDX expression:

NonEmpty(
Crossjoin(
Exists(
[Customer].[Customer].[Customer].Members
, {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]})
, [Product].[Product].[Product].Members)
, [Measures].[Internet Sales Amount])

So far, so dull though. All we've got is a way of building up a string containing an MDX set expression, and SQLIS brings little to the party. But the real fun would start with two more custom transformations: SetToFlow and FlowToSet. The former would take an input containing MDX set expressions (and conceivably there could be more than one row, although we've only got one so far) and would output a flow containing all the tuples in the set(s) we've passed in. Taking the set above, the output would be the contents of measures.outputdemo in the following query on Adventure Works:

with member measures.outputdemo as TupleToStr(
([Customer].[Customer].Currentmember, [Product].[Product].Currentmember)
)
select {measures.outputdemo} on 0,
NonEmpty(
Crossjoin(
Exists(
[Customer].[Customer].[Customer].Members
, {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]})
, [Product].[Product].[Product].Members)
, [Measures].[Internet Sales Amount])
on 1
from
[Adventure Works]

The FlowToSet transform would do the opposite, i.e. take an input containing tuples and return a single row containing the set represented by the entire input. For the above example, this would be a big set:
{([Customer].[Customer].&[12650],[Product].[Product].&[214]), ([Customer].[Customer].&[12650],[Product].[Product].&[225]),…}
But the point of this would be that you could then apply more MDX set expressions to the whole set efficiently, although of course there's no reason why you can't apply MDX set expressions to individual tuples in a data flow. The final important custom transform you'd need would be an Evaluate transform, which would append one or more numeric or text columns to a tuple or set dataflow: each of these columns would be populated by evaluating an MDX expression which returns a value against the set or tuple in each row. So, for example, if a row contained the set we've been using we could apply the Count function to it and get the value 12301 back; if a row contained the tuple ([Customer].[Customer].&[12650],[Product].[Product].&[214]) we could ask for the value of this tuple for the measure [Internet Freight Cost] and get the value 0.87 back; or for the same tuple we could ask for the value of [Customer].[Customer].CurrentMember.Name and get back the value "Aaron L. Wright".
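To make the Evaluate transform a little more concrete, here's a rough sketch of the kind of MDX such a transform might generate behind the scenes to populate those three columns, reusing the Adventure Works names from the examples above; this is purely illustrative, and the calculated measure names SETCOUNT, FREIGHT and CUSTOMERNAME are of course made up:

WITH
//Column 1: the count of the set we've been building up
MEMBER MEASURES.SETCOUNT AS
NonEmpty(
Crossjoin(
Exists(
[Customer].[Customer].[Customer].Members
, {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]})
, [Product].[Product].[Product].Members)
, [Measures].[Internet Sales Amount]).Count
//Column 2: the value of the example tuple for the measure [Internet Freight Cost]
MEMBER MEASURES.FREIGHT AS
([Customer].[Customer].&[12650], [Product].[Product].&[214], [Measures].[Internet Freight Cost])
//Column 3: the customer name from the same tuple
MEMBER MEASURES.CUSTOMERNAME AS
[Customer].[Customer].&[12650].Name
SELECT {MEASURES.SETCOUNT, MEASURES.FREIGHT, MEASURES.CUSTOMERNAME} ON 0
FROM [Adventure Works]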
 
Of course the beauty of this is that once you've got a flow containing sets, tuples and numeric values retrieved from the cube for them, you can then use all the cool existing SQLIS functionality too, like multicasts, lookups, UnionAlls, Aggregates etc to do stuff with your sets that is hard in pure MDX; and of course you can easily integrate other forms of data such as relational or XML, and do useful things at the end of it all like send an email to all your male customers in the UK who bought three or more products in the last year, or who live in London and have incomes in excess of £50,000 and have averaged over £50 per purchase, or who have been identified as good customers by a data mining model, and who aren't on the list of bad debtors that you've got from the Accounts department's Excel spreadsheet.
 
Now of course all of this is possible using only relational data with SQLIS, or even without SQLIS at all, just using pure MDX. I guess the point of this is, as always, that it provides an easier way to do stuff: build MDX expressions without having to know much MDX, integrate AS data with other data and other applications without doing (much) coding, and so on.
 
So, as ever, I’d be interested in your comments on this. I have the distinct feeling that this is a solution in search of a problem… but if you can think of some problems it might solve, then let me know!

Written by Chris Webb

August 25, 2006 at 3:52 pm

Posted in Random Thoughts

Debunking the Beer and Diapers Story

with one comment

Mark Whitehorn (of 'Fast Track to MDX' fame) debunks, in the Register today, that old 'beer and diapers' story that I've used plenty of times in data mining presentations:
Not that I’m going to let the small matter of the truth get in the way of a good story in future presentations!
 
Thinking about it, though, the fact that I've never seen beer and nappies together in the supermarket (or for that matter other equally incongruous combinations) should have given the game away. I'm a young father – daughter #2 was born three weeks ago – and I've spent more Friday nights down at Tesco buying nappies than I'd like to admit; I always find the beer in the beer aisle and the nappies in the nappy aisle at opposite ends of the store.
 
 

Written by Chris Webb

August 15, 2006 at 3:04 pm

Posted in Data Mining

Announcing the Analysis Services Stored Procedure Project

leave a comment »

Back in April, I posted an entry on Analysis Services stored procedures asking if anyone would be interested in joining me to create a few useful examples that people could download. I had a good response and now, after several months of work behind the scenes, I’m happy to announce that you can see beta 1 of the fruits of our labours here:
I won't go into any detail about what's in there – it's reasonably well documented, and you can download the source code as well as the dll. Suffice to say that there are several cool functions to extend MDX and hopefully you'll find at least one or two of them useful. If you have any feedback, suggestions, bugs to report etc then please post them on CodePlex rather than here.
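For anyone who hasn't used Analysis Services stored procedures before, once the dll has been registered as an assembly on the server its functions can be called inline in MDX just like the built-in functions. Here's a rough sketch of the calling pattern – the assembly and function names below are purely hypothetical placeholders rather than anything that actually ships with the project, so check the documentation for the real ones:

//Hypothetical placeholder names – substitute the real assembly and function names
SELECT [Measures].[Internet Sales Amount] ON 0,
MyAssembly.MySetFunction([Product].[Product].[Product].MEMBERS) ON 1
FROM [Adventure Works]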
 
Thanks are due to Jon Axon, Christian Wade, Mosha Pasumansky and especially Greg Galloway and Darren Gosbell for the effort they’ve put in to this.

Written by Chris Webb

August 14, 2006 at 7:27 am

Posted in MDX

Executing Multiple MDX Statements

with 6 comments

Just a quickie, but here’s something I didn’t know was possible – open an MDX query window in SQLMS, connect to Adventure Works and paste in the following:
 

SELECT MEASURES.MEMBERS ON 0 FROM
[Adventure Works]

GO

SELECT [Measures].[Internet Sales Amount] ON 0,
[Date].[Day of Week].MEMBERS ON 1
FROM [Adventure Works]

Apparently, you can string multiple MDX statements together with the GO keyword in SQLMS (but only in SQLMS – this isn't valid MDX, it's just SQLMS splitting the query text into batches itself).
 
Very useful…

Written by Chris Webb

August 13, 2006 at 9:15 pm

Posted in MDX

Optimising NON EMPTY filtering in queries

leave a comment »

Recently, on a few engagements, I’ve found myself optimising queries which do non-empty filtering and thought I’d share some tips (apologies if some of these seem a bit obvious to the more advanced MDXers among you). This will probably turn into an ongoing series – I’ve got some other ideas that need a bit more testing/research before I write them up…
 
The first thing to do is to set the NON_EMPTY_BEHAVIOR property on your calculated members, if you can. Take the following query on the Adventure Works cube, which runs in around 1 minute 20 seconds on my laptop:

WITH MEMBER MEASURES.DEMO AS [Measures].[Internet Sales Amount] * 2
SELECT {[Measures].[Internet Sales Amount], MEASURES.DEMO} ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
*
[Product].[Product].[Product].MEMBERS
*
[Customer].[Postal Code].[Postal Code].MEMBERS
ON 1
FROM [Adventure Works]
WHERE([Date].[Calendar Year].&[2004])

Simply add the NON_EMPTY_BEHAVIOR property to the calculated member, like so:

WITH MEMBER MEASURES.DEMO AS [Measures].[Internet Sales Amount] * 2
,NON_EMPTY_BEHAVIOR=[Measures].[Internet Sales Amount]
SELECT {[Measures].[Internet Sales Amount], MEASURES.DEMO} ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
*
[Product].[Product].[Product].MEMBERS
*
[Customer].[Postal Code].[Postal Code].MEMBERS
ON 1
FROM [Adventure Works]
WHERE([Date].[Calendar Year].&[2004])

The query now returns the same results in just 6 seconds. Analysis Services can do non-empty filtering very efficiently on regular measures, but has to use a more time-consuming algorithm on calculated measures which don't have NON_EMPTY_BEHAVIOR set; this property just tells AS that it can use the more efficient algorithm when doing non-empty filtering on the calculated measure, and treat it exactly the same as the given regular measure. Clearly, whenever [Measures].[Internet Sales Amount] is null then [Measures].[Internet Sales Amount]*2 will be null too.

But what about calculated measures where we can’t set NON_EMPTY_BEHAVIOR? Take the following query, similar to the one we’ve just been looking at, which again executes in 1 minute 20 seconds on a cold cache:

WITH MEMBER MEASURES.DEMO AS ([Measures].[Internet Sales Amount], [Date].[Calendar Year].PREVMEMBER)
SELECT {[Measures].[Internet Sales Amount], MEASURES.DEMO} ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
*
[Product].[Product].[Product].MEMBERS
*
[Customer].[Postal Code].[Postal Code].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2004])

The first thing to note is that if you can do away with the calculated measure altogether, then do so. The following query returns the same values, but does so by crossjoining the years we're interested in with [Measures].[Internet Sales Amount]:

SELECT
{[Measures].[Internet Sales Amount]}
*
{[Date].[Calendar Year].&[2004], [Date].[Calendar Year].&[2003]}
ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
*
[Product].[Product].[Product].MEMBERS
*
[Customer].[Postal Code].[Postal Code].MEMBERS
ON 1
FROM
[Adventure Works]

If you're generating the entire query dynamically then this might be more work, but it is the cleanest option. Often, though, we can't get rid of the calculated measure, or don't want to for reasons of simplicity. We can't set NON_EMPTY_BEHAVIOR on our calculated measure because we can't say that it will be null whenever [Measures].[Internet Sales Amount] or any other single regular measure is null, but we can do something similar. The following query returns the same results as the preceding query but again brings the performance back down to 6 seconds:

WITH MEMBER MEASURES.DEMO AS
([Measures].[Internet Sales Amount], [Date].[Calendar Year].PREVMEMBER)
SELECT {[Measures].[Internet Sales Amount], MEASURES.DEMO} ON 0,
NON EMPTY
NONEMPTY(
[Product].[Category].[Category].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
*
[Product].[Product].[Product].MEMBERS
*
[Customer].[Postal Code].[Postal Code].MEMBERS
, ([Measures].[Internet Sales Amount],[Date].[Calendar Year].[All Periods]) )
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2004])

What we're doing here is a two-step filtering operation: first we remove the tuples on rows which would be empty for all years using the NONEMPTY function (this allows AS to use the more efficient algorithm again, because we're filtering by a regular measure), and then we get rid of the empty rows that are still left using NON EMPTY.

Written by Chris Webb

August 13, 2006 at 9:21 am

Posted in MDX

More on Visio 12 as an AS Client

leave a comment »

At the end of last year I linked to a post on Nick Barclay’s blog entitled "Don’t forget about Visio for BI":
…which highlighted some of the features coming in Visio 2007 for BI. I naturally then forgot all about Visio for BI until I saw the following post on Mauro Cardarelli’s blog:
Apart from an online course on what's new in Visio 2007 (https://www.microsoftelearning.com/eLearning/offerContent.aspx?offerPriceId=99493), he also points back to Eric Rockey's Visio 2007 blog, which is definitely worth checking out again, especially this post on PivotDiagrams:
There are some interesting points raised in the comments section on whether this stuff could be integrated with Reporting Services; I agree that while these capabilities are very cool, just having them in Visio is a bit of a waste. Perhaps this is a case where a Reporting Services Custom Report Item could be written (see ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/rptsprg9/html/402910f7-7e45-4e22-8182-fb58a96bfec0.htm ) but I’m not sure you’d want to use Visio on a server in this way. For further reading, there is even more information on PivotDiagrams on Bill Morein’s blog here:

https://blogs.msdn.com/wmorein/

 

Written by Chris Webb

August 6, 2006 at 8:35 am

Posted in Client Tools
