Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

The rather-too-many ways to crossjoin in MDX

with 20 comments

In my last post I made the point that it’s a bit too easy to write and MDX query that works, even if you don’t really understand why it works, and in this post I’m going to address a similar issue. In MDX one of the commonest set operations is a crossjoin, and while most people understand what this operation does there are so many ways of writing a crossjoin in MDX that it can hurt readability and make the language even more confusing for newcomers. So what are all these different ways of crossjoining, and which one is to be preferred?

First of all, let’s look at what a crossjoin actually does. Imagine we have two sets, each with two members in them: {A,B} and {X,Y}. If we crossjoin these two sets together, we get a set of tuples containing every possible combination of A and B and X and Y, ie the set {(A,X), (A,Y), (B,X), (B,Y)}.

As an example of this, let’s look at the first way of doing a crossjoin in MDX: the Crossjoin() function. Here’s a query against the Adventure Works cube that returns the crossjoin of the two sets {Married, Single} and {Female, Male} on the rows axis:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
CROSSJOIN(
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
,
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
)
ON 1
FROM [Adventure Works]

Here’s the output:

image

As you’d expect, you get four rows for each of the four tuples in the set: {(Married, Female), (Married, Male), (Single, Female), (Single, Male)}.

What are the pros and cons of using the Crossjoin() function then? Well, one thing it’s worth stating is that all of the different ways of doing crossjoins in MDX perform just as well as the others, so it’s purely a question of readability and maintainability. On those criteria its main advantage is that it’s very clear you’re doing a crossjoin – after all, that’s the name of the function! However, because it carries an overhead in terms of the numbers of brackets and commas and the name of the function itself, which isn’t so good for readability, and this is why I generally don’t use it. When you’re crossjoining a lot of sets together, for example:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
CROSSJOIN(
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
,
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
,
[Customer].[Education].[Education].MEMBERS
,
[Customer].[Total Children].[Total Children].MEMBERS
)
ON 1
FROM [Adventure Works]

…you might need to look a long way up to the top of the query to find out you’re doing a crossjoin.

The most popular alternative to the Crossjoin() function is the * operator. This allows you to crossjoin several sets by simply putting an asterisk between them, for example:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
*
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
ON 1
FROM [Adventure Works]

It’s more concise than the Crossjoin() function and I think easier to read; it also has the advantage of being the most frequently-used syntax. However there are rare cases when it can be ambiguous because an asterisk is of course also used for multiplication. Consider the following calculated measure in the following query:

WITH
MEMBER MEASURES.DEMO AS
([Measures].[Internet Sales Amount]) * ([Customer].[Gender].&[F])
SELECT {[Measures].DEMO} ON 0,
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
ON 1
FROM [Adventure Works]

Are we crossjoining or multiplying here? Actually, we’re multiplying the result of the two tuples together, rather than returning the result of the tuple ([Measures].[Internet Sales Amount], [Customer].[Gender].&[F]), but it’s not easy to tell.

The third way of doing a crossjoin is one I particularly dislike, and it’s the use of brackets and commas on their own as follows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
({[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
,
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]})
ON 1
FROM [Adventure Works]

This is for me the least readable and most ambiguous syntax: in my mind round brackets denote a tuple and here we’re getting a set of tuples. I’d therefore advise you not to use this syntax.

Last of all, for maximum MDX geek points, is the Nest() function. Almost completely undocumented and unused, a hangover from the very earliest days of MDX, it works in exactly the same way as the Crossjoin() function:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
NEST(
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
,
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
)
ON 1
FROM [Adventure Works]

Of course you’d never want to use it unless you were either showing off or wanted to confuse your colleagues as much as possible…

In summary, I’d recommend using the * operator since it’s probably the clearest syntax and also the most widely-accepted. Equally importantly, I’d advise you to be consistent: choose one syntax, stick with it and make sure everyone on the project does the same.

Written by Chris Webb

July 5, 2011 at 4:51 pm

Posted in MDX

20 Responses

Subscribe to comments with RSS.

  1. I have to say that after 12 years of MDX experience, I didn’t know the existence of the last two syntaxes!
    Thanks Chris!

    Marco Russo

    July 5, 2011 at 6:58 pm

  2. Hello Chris,

    Nice overview.

    When you want to show not all the tuples from a crossjoin result, but only some specific tuples from a crossjoin result, then you can also immediately specify them tuples. E.g.
    SELECT
    {[Measures].[Reseller Sales Amount]} on 0
    , {([Alabama],[Bikes]),([Maryland].[Clothing])} on 1
    FROM [Adventure Works];
    I agree, this is not a crossjoin anymore (see your definition above).

    Thx
    Franky L.

    Franky Leeuwerck

    July 6, 2011 at 8:19 am

    • Yes, this is just explicitly specifying the set of tuples you want, not a crossjoin.

      Chris Webb

      July 6, 2011 at 8:37 am

  3. Hello Chris,

    I dint get what you want to say here, if i use normal set also i will get same 4 rows , i mean to say,

    SELECT {} ON COLUMNS,
    ({Male, Female} * {Single, Married}) ON ROWS FROM [CUBE NAME]

    (OR)
    SELECT {} ON COLUMNS,
    ({Male, Female} , {Single, Married}) ON ROWS FROM [CUBE NAME]

    RESULT IS SAME

    Married Female
    Married Male
    Single Female
    Single Male

    My question is, when both provide same result then what is the use of cross join?

    vinay

    December 30, 2011 at 11:35 am

    • Hi Vinay,

      Neither of your examples show a ‘normal’ set – they both show different ways of performing a crossjoin on two sets and returning the same set of four tuples.

      Chris

      Chris Webb

      December 30, 2011 at 11:54 am

  4. […] Chris Webb (b | t) has an excellent blog post on this topic: The rather-too-many ways to crossjoin in MDX. In that post, he examines each option and settles on Option 3 as his preferred method. He lays out […]

  5. […] (B,X), (B,Y)}. There are in fact several ways to write a crossjoin in MDX as I showed in this post, and I prefer to use the * operator over the Crossjoin() function because it’s less verbose. […]

  6. how to get results for married female and single males only. i.e. 2 rows only using MDX???

    Nimit Goyal

    July 22, 2013 at 8:47 am

  7. I am stuck in one MDX query.

    I want to implement cross join in MDX query.

    Example, I have 1 table for gender (M,F) and 1 for marital status (M,S).

    The below query gives me all combinations of Married Males and Females and single males and females.

    SELECT {[Measures].[Internet Sales Amount]} ON 0,
    {[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
    *
    {[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
    ON 1
    FROM [Adventure Works]

    But, I want to get only Married females and single males and also measures are different. I don’t need to show them on axis..The condition would be in where in MDX.

    Please revert if anyone knows it.

    Nimit Goyal

    July 22, 2013 at 9:19 am

    • In this scenario, you don’t need a condition, you just need to know how to write the set of tuples you want. For example to get married females and single males you’d just say:

      SELECT {[Measures].[Internet Sales Amount]} ON 0,
      {
      ([Customer].[Marital Status].&[M], [Customer].[Gender].&[F]) ,
      ([Customer].[Marital Status].&[S], [Customer].[Gender].&[M])
      }
      ON 1
      FROM [Adventure Works]

      Chris Webb

      July 22, 2013 at 1:53 pm

  8. […] you can cross join different hierarchies using three different ways. Chris Webb has a blog here, The rather-too-many ways to crossjoin in MDX, which is better than anything I can write about CROSS […]

  9. Hi Chris,

    Does a Dim1.Hier1.members * Dim2.Hier2.members * Dim3.Hier3.member implicitly get converted to a CROSSJOIN(CROSSJOIN(Dim1.Hier1.members, Dim2.Hier2.members), Dim3.Hier3.members)? Or does any optimization go on in the background when using the * way of cross-joining. Is there any difference in execution times?

    Sourav

    January 21, 2015 at 2:16 pm

    • One doesn’t get converted to the other – they’re just two different ways of writing the same thing and the engine handles them in the same way. There should be no difference in performance provided the sets are used in the same order.

      Chris Webb

      January 21, 2015 at 2:23 pm

      • Please ignore my prev reply(I wanted to delete it, but I don’t see a delete button).

        So you mean no real difference between CROSSJOIN(, , ) and CROSSJOIN(CROSSJOIN(, ), )? Does the engine work on the crossjoins in an exactly similar fashion?

        Sourav

        January 21, 2015 at 2:44 pm

      • Yes, it should do.

        Chris Webb

        January 21, 2015 at 2:49 pm

      • But remember the order of the sets in the crossjoin can be important if two or more sets come from different hierarchies on the same dimension. So
        Crossjoin(a,b,c)
        is the same as
        Crossjoin(Crossjoin(a,b),c)
        but might not perform the same as
        Crossjoin(a,c,b)

        Chris Webb

        January 21, 2015 at 2:51 pm

      • I saw an intriguing question on one of the online forums that made me curious and knock your door? You can see it here –> http://stackoverflow.com/q/28069076/2993606

        Sourav

        January 21, 2015 at 2:46 pm

      • This question is about Mondrian, not SSAS. I don’t know whether Mondrian might handle this differently.

        Chris Webb

        January 21, 2015 at 2:51 pm

  10. If I want to display members of 2 hierarchies on an axis, which is the better way of displaying them? Crossjoin or a tuple. e.g. (Dim1.Hier1.members, Dim1.Hier2.members) or (Dim1.Hier1.members * Dim1.Hier2.members). Is there any general rule about it?

    Sourav

    January 21, 2015 at 2:20 pm

    • Both of your examples are crossjoins – (Dim1.Hier1.members, Dim1.Hier2.members) is not a tuple, it is another way of writing a crossjoin. I generally prefer using the * and always avoid writing (Dim1.Hier1.members, Dim1.Hier2.members) because I think the latter is very confusing: it looks like a tuple but it isn’t.

      Chris Webb

      January 21, 2015 at 2:24 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,865 other followers

%d bloggers like this: