Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Adapting SQLIS to work with tuples and sets as a data source

with 3 comments

It’s been a long time since I posted in my ‘random thoughts’ category… but I just had such an interesting idea I thought I’d post it up (even if there’s 0% chance I’ll ever get round to implementing this).
 
I was looking at a (non-Microsoft platform) BI tool today and got thinking about MDX: how people find it hard to work with, how most client tools don’t really expose the power of MDX sets, and how handy it would be to be able to do some procedural things in MDX too. This particular tool had some cool set-based selection functionality, and I reflected that even though I’d seen similar set-based selection tools, some on AS (didn’t ProClarity have something in this area?), they’d never really taken off; I also thought about the much-missed MDX Builder tool, which had a similarly visual approach to building MDX expressions. I started thinking about whether it would be worth building another client tool which took this approach, but quickly came to the conclusion that the world needed another AS client tool like a hole in the head. I did realise, though, how much such a tool would resemble Integration Services if I ever built it. And then I had my idea: why not extend Integration Services so it can treat MDX sets and tuples as a data source, and then use its existing functionality and create new transformations to implement MDX set-based operations?
 
Let me explain in more detail. I’m not talking about simply getting data out of AS in the same way you’d get it out of a SQL Server table, using an MDX query. What I’m saying is that what would be flowing through the IS data flow tasks would be members, sets and tuples: each ‘row’ of data would be an MDX expression returning a member, a tuple or a set. So you’d create a custom data source where you could define a set as your starting point – probably at this point you’d just select a whole level, or the children of a member, or some such simple set of members. For example you might select the [Customer].[Customer].[Customer] level in your Customer dimension; the output from this would be a single text column and a single row containing the set expression [Customer].[Customer].[Customer].Members. You could then put this through an Exists() transform to return only the customers in the UK and France, the output from which would be the set expression Exists([Customer].[Customer].[Customer].Members, {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]}). Similarly you could then put this through a Crossjoin() transform to crossjoin this set with the set of all your Products, then put the result through a NonEmpty() transform to remove all the empty combinations from the set. At this point your output would still be a single row and column, consisting of the MDX expression:

NonEmpty(
Crossjoin(
Exists(
[Customer].[Customer].[Customer].Members
, {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]})
, [Product].[Product].[Product].Members)
, [Measures].[Internet Sales Amount])
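To make the chaining concrete, here is a minimal Python sketch of the three transforms as plain string builders. It is only an illustration of the idea, not an Integration Services component: the function names (members, exists, crossjoin, non_empty) are made up for this example, and nothing is sent to Analysis Services.

```python
# Hypothetical helpers: each "transform" takes an MDX set expression (a string)
# and returns a new MDX set expression. No cube is queried here.

def members(level: str) -> str:
    """Source step: all members of a level."""
    return f"{level}.Members"

def exists(set_expr: str, filter_members: list[str]) -> str:
    """Exists() transform: keep members that exist with the filter set."""
    return f"Exists({set_expr}, {{{', '.join(filter_members)}}})"

def crossjoin(left: str, right: str) -> str:
    """Crossjoin() transform: combine two sets."""
    return f"Crossjoin({left}, {right})"

def non_empty(set_expr: str, measure: str) -> str:
    """NonEmpty() transform: drop combinations with no value for the measure."""
    return f"NonEmpty({set_expr}, {measure})"

customers = members("[Customer].[Customer].[Customer]")
uk_and_france = exists(
    customers,
    ["[Customer].[Country].&[United Kingdom]", "[Customer].[Country].&[France]"],
)
with_products = crossjoin(uk_and_france, members("[Product].[Product].[Product]"))
result = non_empty(with_products, "[Measures].[Internet Sales Amount]")
print(result)
```

The printed string is the same expression as the one above, written on a single line; at every step the 'flow' is still one row with one text column.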

So far, so dull though. All we’ve got is a way of building up a string containing an MDX set expression and SQLIS brings little to the party. But the real fun would start with two more custom transformations: SetToFlow and FlowToSet. The former would take an input containing MDX set expressions (and conceivably there could be more than one row, although we’ve only got one so far) and would output a flow containing all the tuples in the set(s) we’ve passed in. Taking the set above, the output would be the contents of measures.outputdemo in the following query on AdventureWorks:

with member measures.outputdemo as TupleToStr(
([Customer].[Customer].Currentmember, [Product].[Product].Currentmember)
)
select {measures.outputdemo} on 0,
NonEmpty(
Crossjoin(
Exists(
[Customer].[Customer].[Customer].Members
, {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]})
, [Product].[Product].[Product].Members)
, [Measures].[Internet Sales Amount])
on 1
from
[Adventure Works]
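As a rough sketch of what a SetToFlow component might do internally, the Python below wraps an incoming set expression in a query of the same shape as the one above. The function name set_to_flow_query and its parameters are assumptions made for illustration; in particular, the caller is assumed to know which hierarchies appear in the set, and actually running the query against Analysis Services is left out.

```python
# Hypothetical sketch: build the MDX query a SetToFlow step could run.
# Only the query text is generated; executing it is out of scope here.

def set_to_flow_query(set_expr: str, hierarchies: list[str], cube: str) -> str:
    # One CurrentMember reference per hierarchy present in the set's tuples.
    current = ", ".join(f"{h}.CurrentMember" for h in hierarchies)
    lines = [
        f"with member measures.outputdemo as TupleToStr(({current}))",
        "select {measures.outputdemo} on 0,",
        f"{set_expr} on 1",
        f"from {cube}",
    ]
    return "\n".join(lines)

query = set_to_flow_query(
    "Crossjoin([Customer].[Customer].[Customer].Members, "
    "[Product].[Product].[Product].Members)",
    ["[Customer].[Customer]", "[Product].[Product]"],
    "[Adventure Works]",
)
print(query)
```

Each row returned by such a query would then become one row in the data flow, with the outputdemo value as its tuple column.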

The FlowToSet transform would do the opposite, i.e. take an input containing tuples and return a single row containing the set represented by the entire input. For the above example, this would be a big set:
{([Customer].[Customer].&[12650],[Product].[Product].&[214]), ([Customer].[Customer].&[12650],[Product].[Product].&[225]),…}
But the point of this would be that you could then apply more MDX set expressions efficiently, although of course there’s no reason why you can’t apply MDX set expressions to individual tuples in a data flow. The final important custom transform you’d need would be an Evaluate transform, which would append one or more numeric or text columns to a tuple or set dataflow: each of these columns would be populated by evaluating an MDX expression which returned a value against the set or tuple for each row. So, for example, if a row contained the set we’ve been using we could apply the Count function to it and get the value 12301 back; if a row contained the tuple ([Customer].[Customer].&[12650],[Product].[Product].&[214]) we could ask for the value of this tuple for the measure [Internet Freight Cost] and get the value 0.87 back; or for the same tuple we could ask for the value of [Customer].[Customer].CurrentMember.Name and get back the value "Aaron L. Wright".
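The FlowToSet and Evaluate steps can be sketched in Python along the same lines. Everything here is hypothetical: flow_to_set and evaluate are illustrative names, and the run_mdx callable stands in for whatever would really send an expression to Analysis Services. The stub used below simply returns the two example values quoted above rather than querying a cube.

```python
from typing import Callable, Iterable

def flow_to_set(tuples: Iterable[str]) -> str:
    """FlowToSet: collapse a flow of tuple expressions into one set expression."""
    return "{" + ", ".join(tuples) + "}"

def evaluate(rows: Iterable[str], expression: str,
             run_mdx: Callable[[str, str], object]) -> list[tuple[str, object]]:
    """Evaluate: append one column holding the value of `expression` per row."""
    return [(row, run_mdx(row, expression)) for row in rows]

TUPLE_214 = "([Customer].[Customer].&[12650],[Product].[Product].&[214])"
TUPLE_225 = "([Customer].[Customer].&[12650],[Product].[Product].&[225])"

# Stub evaluator (assumption): a real component would query the cube instead.
STUB_VALUES = {
    (TUPLE_214, "[Measures].[Internet Freight Cost]"): 0.87,
    (TUPLE_214, "[Customer].[Customer].CurrentMember.Name"): "Aaron L. Wright",
}

def stub_run_mdx(row: str, expression: str) -> object:
    return STUB_VALUES[(row, expression)]

as_set = flow_to_set([TUPLE_214, TUPLE_225])
freight = evaluate([TUPLE_214], "[Measures].[Internet Freight Cost]", stub_run_mdx)
names = evaluate([TUPLE_214], "[Customer].[Customer].CurrentMember.Name", stub_run_mdx)
print(as_set)
print(freight)
print(names)
```

Passing the evaluator in as a parameter keeps the sketch independent of any particular client library; the shape of the output (the original row plus an appended value column) is the part that matters.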
 
Of course the beauty of this is that once you’ve got a flow containing sets, tuples and numeric values retrieved from the cube for them then you can use all the cool existing SQLIS functionality too, like multicasts, lookups, UnionAlls, Aggregates etc to do stuff with your sets that is hard in pure MDX; and of course you can easily integrate other forms of data such as relational or XML, and do useful things at the end of it all like send an email to all your male customers in the UK who bought three or more products in the last year, or who live in London and have incomes in excess of £50000 and have averaged over £50 per purchase, or who have been identified as good customers by a data mining model, and who aren’t on the list of bad debtors that you’ve got from the Accounts department’s Excel spreadsheet.
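As one small illustration of mixing the tuple flow with outside data, the Python sketch below drops every tuple whose customer appears on an exclusion list, which is roughly what a lookup against the bad debtors spreadsheet would do. The helper names, the customer key 99999 and the contents of the exclusion list are placeholders invented for the example, not real data.

```python
import re

# Pulls the customer key out of a tuple expression such as
# ([Customer].[Customer].&[12650],[Product].[Product].&[214]).
CUSTOMER_KEY = re.compile(r"\[Customer\]\.\[Customer\]\.&\[(\d+)\]")

def customer_key(tuple_expr: str) -> str:
    match = CUSTOMER_KEY.search(tuple_expr)
    if match is None:
        raise ValueError(f"no customer member in {tuple_expr!r}")
    return match.group(1)

def exclude_customers(rows: list[str], excluded_keys: set[str]) -> list[str]:
    """Keep only the rows whose customer key is not on the exclusion list."""
    return [row for row in rows if customer_key(row) not in excluded_keys]

rows = [
    "([Customer].[Customer].&[12650],[Product].[Product].&[214])",
    "([Customer].[Customer].&[12650],[Product].[Product].&[225])",
    "([Customer].[Customer].&[99999],[Product].[Product].&[214])",  # placeholder key
]
bad_debtors = {"12650"}  # placeholder list, e.g. loaded from a spreadsheet

kept = exclude_customers(rows, bad_debtors)
print(kept)
```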
 
Now of course all of this is possible using only relational data with SQLIS, or even without using SQLIS and just using pure MDX. I guess the point of this is, as always, that it provides an easier way to do stuff: build MDX expressions without having to know much MDX, integrate AS data with other data and other applications without doing (much) coding, and so on.
 
So, as ever, I’d be interested in your comments on this. I have the distinct feeling that this is a solution in search of a problem… but if you can think of some problems it might solve, then let me know!

Written by Chris Webb

August 25, 2006 at 3:52 pm

Posted in Random Thoughts

3 Responses


  1. I like this more in theory than in practice. If you are targeting business (or even super) users with this, I think the SSIS flow metaphors (and the interface) are too alien. With that said, someone really needs to come up with a decent MDX client. The best I’ve seen so far is ProClarity, and even that only utilizes like 5% of MDX’s potential. I built an MDX client for a BI portal application a couple of years back and know how hard it is to create an intuitive interface for MDX. The main problem is the complexity of the tuple. What I did was to ignore tuples and only allow for one-dimensional sets in each axis. Another problem is that MDX is so bloody hard to parse. I’ve yet to see a client reverse engineering a hand-typed MDX statement.

    peter

    August 25, 2006 at 9:19 pm

  2. I think a valuable thing for novice MDX developers would be to have the SQL statement equivalents for an MDX problem.  I believe most people who start using MDX have a fairly decent SQL background, and often writing relational queries is straightforward, but MDX is a bit trickier.  I realize that SQL does not have all the flexibility that MDX does, but there is rarely a query that cannot be written in SQL using subqueries and advanced analytic SQL functions (such as PIVOT, CUBE, PARTITION BY, RANK, etc.).
     
    I was thinking of creating a set of SQL queries based on the AdventureWorks database, and trying to write the MDX equivalents, to keep a library of "snippets" that I can use for various occasions.  Would anyone else find this valuable?
     
    -Kory

    Kory

    August 26, 2006 at 5:39 pm

  3. Chris, you’re on to something here on the broader applicability of MDX!!!!
     

    http://frustratedprogramming.blogspot.com/2006/08/on-broader-applicability-of-mdx.html

    John

    August 26, 2006 at 9:10 pm

