Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for July 2008

Named Sets, AutoExists and Katmai

with 6 comments

A couple of months ago Radim Hampel pointed out to me some very weird stuff happening with named sets and the Where clause. Since it turned out that Darren had run into the same issue and also been thrown by it, and since I tested it out on Katmai CTP6 and could see that it was behaving differently from AS2005, I opened an item on Connect:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=331186

And now, after a long and detailed email thread involving Mosha, Edward Melomed, Marius Dumitru, Darren and Deepak we’ve got to the stage where I understand what’s going on, Katmai RC0 does roughly what I want, and I can blog about it!

Let me explain what I saw first. What would you expect the calculated member in following query to return?

–Query 1
WITH
  SET mytestset AS
    PeriodsToDate
    (
      [Date].[Calendar].[Calendar Year]
     ,[Date].[Calendar].[Month].&[2004]&[6]
    )
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM [adventure works]
WHERE
  [Date].[Calendar].[Month].&[2004]&[7]

Just from looking at the code I would have set the calculated member should return the string representation of the set from January 2004 to June 2004. But if you run the query you will in fact see that it returns the set containing the member July 2004 on both AS2005 and Katmai. To me that made absolutely no sense… Now, take a look at this query:

–Query 2
WITH
  SET mytestset AS
    [Date].[Calendar].[Month].&[2004]&[6]
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM [adventure works]
WHERE
  [Date].[Calendar].[Month].&[2004]&[7]

On AS2005 the calculation returns June as I would expect, on Katmai it returns an empty set. Now run this query:

–Query 3
WITH
  SET mytestset AS
    {
      [Date].[Calendar].[Month].&[2004]&[5]
     ,[Date].[Calendar].[Month].&[2004]&[6]
    }
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM [adventure works]
WHERE
  [Date].[Calendar].[Month].&[2004]&[7]

and this query:

–Query 4
WITH
  SET mytestset AS
    (
      [Date].[Calendar].[Month].&[2004]&[5]
     :
      [Date].[Calendar].[Month].&[2004]&[6]
    )
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM [adventure works]
WHERE
  [Date].[Calendar].[Month].&[2004]&[7]

…which to me should do the same thing. On AS2005 query 3 returns the set May and June but Katmai returns an empty set; query 4 returns an empty set on both AS2005 and Katmai. At this point I could see that something funny was happening that I didn’t like!

What are the practical implications of this? Take the following query from Mosha’s blog entry on ranking:

WITH
  SET OrderedEmployees AS
    Order
    (
      [Employee].[Employee].[Employee].MEMBERS
     ,[Measures].[Reseller Sales Amount]
     ,BDESC
    )
  MEMBER [Measures].[Employee Rank] AS
    Rank
    (
      [Employee].[Employee].CurrentMember
     ,OrderedEmployees
    )
SELECT
  [Measures].[Employee Rank] ON 0
,[Employee].[Employee].[Employee].MEMBERS ON 1
FROM [Adventure Works]

Run it and you’ll see that the Employee with the key 46, A Scott Wright, has a rank of 18. Now let’s slice by this Employee:

WITH
  SET orderedemployees AS
    Order
    (
      [Employee].[Employee].[Employee].MEMBERS
     ,[Measures].[Reseller Sales Amount]
     ,bdesc
    )
  MEMBER measures.[employee rank] AS
    Rank
    (
      [Employee].[Employee].CurrentMember
     ,orderedemployees
    )
SELECT
  measures.[employee rank] ON 0
FROM [adventure works]
WHERE
  [Employee].[Employee].&[46]

Run this and you’ll see that A Scott Wright has now supposedly got a rank of 1. Whatever the logic behind this, it doesn’t make sense from an end user perspective does it?

So how can we explain what’s happening here? It’s all to do with autoexists: in some cases it makes sense to apply autoexists to named sets, but in others (mostly when the set is intended for use in a calculation) then it doesn’t. Let’s forget about trying to understand what AS2005 does because it tries to guess when it should apply autoexists and gets very confused, but Katmai is mostly consistent and logical: by default it applies autoexists to all named sets. That explains why queries 2,3 and 4 all return empty sets on Katmai: May and June don’t exist with July.

After my initial item on Connect was opened those nice people in Redmond (who agreed with me that the way things were working wasn’t ideal) added a new connection string property, Autoexists, which can have the following values:

0 – default (same as 1)
1 – Apply deep autoexists for query axes and named sets (with WHERE clause and subselects)
2 – Apply deep autoexists for query axes and no autoexists for named sets (with WHERE clause and subselects)
3 – Apply shallow autoexists for query axes with WHERE clause, deep autoexists for query axes with subselects, no autoexists for named sets with WHERE clause and deep autoexists for named sets with subselects

Here’s the explanation I got from Marius about what’s meant by ‘deep’ and ‘shallow’ autoexists:

Suppose a query axis or named set involves a set expression of the form F(G(s)), with F and G being set functions (e.g. TopCount, Tail etc.)
Let SSW denote the Where clause slice and Subselect restrictions present in the query.
“Shallow autoexists” evaluates the set expression as Exists(F(G(s)), SSW).
“Deep autoexists” evaluates the set expression as Exists( F( Exists( G( Exists(s, SSW) ), SSW ), SSW) – it applies autoexists with the Where and Subselects at every intermediary step. Deep autoexists applies to all set-valued subexpressions being evaluated in the root context of the query (i.e. the context of default member coordinates, or Where clause coordinates, if a Where clause is present).
For many/most set functions (e.g. Union), the rules above produce the same result.
For others (e.g. TopCount, Head/Tail), the results differ in the general case.

So the behaviour I describe above for RC0 is also what you get when you put Autoexists=1 in the connection string. But what about Query 1 – why does the PeriodsToDate function return July with this setting? Hmm, well I think this is a bug and it should return an empty set. I opened another Connect about this:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=356193
and I’ll update this post when I get an answer on it.

What happens with Autoexists=2 and Autoexists=3 then? In both cases, Query 1 returns the set of months from January to June; Query 2 returns June, Query 3 and Query 4 both return May and June. To see the difference between these two settings take a look at this query:

WITH
  SET mytestset AS
    (
      [Date].[Calendar].[Month].&[2004]&[5]
     :
      [Date].[Calendar].[Month].&[2004]&[6]
    )
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM (SELECT
  [Date].[Calendar].[Month].&[2004]&[7] ON 0
FROM [adventure works] )

With Autoexists=1 and Autoexists=3 the set here is empty; with Autoexists=2 then it contains May and June.

Written by Chris Webb

July 15, 2008 at 5:08 pm

Posted in MDX

Microsoft BI Conference 2008 session list now posted

with 4 comments

You can see the session list for this year’s BI Conference here:

http://www.msbiconference.com/pages/tracksandsessions.aspx

Looks good. Hmm, wish I could go to this and PASS, but I just can’t justify the expense and the time…

Written by Chris Webb

July 14, 2008 at 5:09 pm

Posted in Events

Caching: can there be too much of a good thing?

with 2 comments

I was doing some load testing on a cube recently and noticed an interesting thing. When my load tests started, the average query response time was high for the first minute or so, which is reasonable given that all queries were being run on a cold cache; subsequently the average query response time fell dramatically, as you would expect when the cache warmed up. But over a long period (and I was running a lot of queries over several hours) I noticed that the average query response time started creeping up again. Not by a massive amount, it’s true, but enough to be noticeable: say a rise from three to four seconds.

Naturally I fired off an email to various people who I thought might help, and as usual Mosha came up with the goods. It turns out he’d noticed the same thing happening on the stress tests that MS run internally but come to the conclusion that it wasn’t going to be a problem in the real world. I won’t try to repeat too much of the technical detail he gave in case I end up garbling it but essentially there were two reasons why this performance degradation was happening:

  • Most stress tests use template queries into which random parameters are passed. So, for example, I had captured several ‘typical’ queries with four or five dimensions in the Where clause which in my test I parameterised;I then selected a random member on each of these dimensions to pass in to each query. This resulted in a lot of queries being run which returned no data at all, which in turn meant that the cache filled up with entries in the index that were very small and not much use, which in turn made cache lookups slower. In the real world the proportion of queries that return no data is much smaller.
  • The AS cache is optimised for ad-hoc query patterns, where a user starts in one place then drills down/up, slices, dices and so on, and where the data that a query returns is very closely connected to the data retrieved by the last query that was run. This is very different from randomised queries, which as a result perform worse.

So all in all, it’s nothing to worry about. I guess if you had a large cube, lots of users, random query patterns and didn’t process your cube very often then it might be a slight problem, but that’s a lot of ifs.

As an aside, if you’re thinking of doing stress testing I wouldn’t waste any time trying to get the AS Stress tool I blogged about here working – I found it a real pain and ended up building my own load test tool in SSIS using an approach similar to my cache warmer package.

Written by Chris Webb

July 9, 2008 at 5:40 pm

Posted in Analysis Services

PASS Camp Germany 2008

leave a comment »

Are you looking for a public MDX course in Europe? I teach private MDX courses (I’m in the process of building a new website with all the details on, now I’ve left Solid Quality – drop me an email if you’re interested in one in the meantime) but I’ll also be teaching a 3-day public MDX course at PASS Camp Germany from the 2nd to the 4th of September this year:
http://www.sqlpasscamp.de/default.html

Although the website’s in German my session will be in English, and although the event is aimed at the German market I’m sure anyone from anywhere will be more than welcome!

UPDATE: here are the details in English:
http://www.sqlpasscamp.de/mail/news1a.html

Written by Chris Webb

July 4, 2008 at 5:13 pm

Posted in Events

Dynamically generating session calculated members in code with Analysis Services 2008

with 3 comments

One problem I come across from time to time is the need to be able to group members on a dimension into buckets, for example if you have a Customer dimension you may want to group your Customers together by age group and run a query that shows sales for the 0-9 age group, the 10-19 age group and so on. If you know what buckets you want then you can simply build this into your dimension in advance by creating another attribute; but what if your users want to be able to change the definition of the buckets themselves from query to query – perhaps they wanted to 5 year age groups rather than 10? I described a bit of a hack you could use on AS2K to do this a while ago, but it doesn’t work any more on AS2005 or AS2008 because it uses the unsupported CreatePropertySet function and in any case it was pretty nasty; but the only alternative is to be able to have your client tool dynamically create large numbers of calculated members (one for each bucket) with the appropriate definition and then use these calculated members in your query.

However with Analysis Services 2008 there’s now another possibility – you can create a stored procedure that can create calculated members within the current session. The "Analysis Services Personalization Extensions" sample shows off a lot of the new possibilities for server-side coding and Michel Caradec blogged about them a while ago too, but neither show how to create calculated members so I thought I’d put an example up here.

The following class generates the calculated members needed to draw a Lorenz curve (as always please excuse the code – I’m no great C# coder and I just threw it together to make the point):

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.AnalysisServices.AdomdServer;

namespace testdc
{
    public class testdc
    {
        public static void Lorenz(string cubeName, int incr, string setToGroup, string numericValue)
        {
            AdomdCommand cmd = new AdomdCommand();
            try
            {
                //drop set
                cmd.CommandText = "drop set [" + cubeName + "].[Lorenz]";
                cmd.ExecuteNonQuery();
            }
            catch
            {
                //if the set isn't there we'll get an error we can ignore
            }
            StringBuilder s = new StringBuilder();
            s.Append("{");
            //create the calculated member for the total of the set
            //try to drop the member first, in case it exists
            try
            {
                cmd.CommandText = "drop member [" + cubeName + "].measures.LorenzTotal";
                cmd.ExecuteNonQuery();
            }
            catch
            {
                //if the member isn't there we'll get an error we can ignore
            }
            cmd.CommandText = "create member [" + cubeName + "].measures.LorenzTotal ";
            cmd.CommandText += "as aggregate(" + setToGroup + " , " + numericValue + ")";
            cmd.CommandText += ", visible=false;";
            cmd.ExecuteNonQuery();
            //create the calculated members for the Lorenz curve
            for (int i = incr; i <= 100; i += incr)
            {
                //try to drop the member first, in case it exists
                try
                {
                    cmd.CommandText = "drop member [" + cubeName + "].measures.Lorenz";
                    cmd.CommandText += i.ToString();
                    cmd.ExecuteNonQuery();
                }
                catch 
                {
                    //if the member isn't there we'll get an error we can ignore
                }
                cmd.CommandText = "create member [" + cubeName + "].measures.Lorenz"; 
                cmd.CommandText +=  i.ToString();
                cmd.CommandText += " as aggregate(bottomcount(" + setToGroup ;
                cmd.CommandText += ", count(" + setToGroup + ") * " + i.ToString() + "/100 , ";
                cmd.CommandText += numericValue + "), " + numericValue + ")/measures.LorenzTotal";
                cmd.CommandText += ", visible=false, format_string='percent';";
                cmd.ExecuteNonQuery();
                s.Append("measures.Lorenz" + i.ToString() + ",");    
            }
            s.Append("{}}");
            //Create set containing all the calculated members
            cmd.CommandText = "create hidden set [" + cubeName + "].[Lorenz] as " + s.ToString();
            cmd.ExecuteNonQuery();
            cmd.Dispose();
        }
    }
}

You need to call the as follows:

call
testdc!Lorenz(
//The name of the cube you’re using
"Adventure Works",
//The % increment for each grouping member
//For example, 5 will result in 100/5=20 members
5,
//The set to group
"[Customer].[Customer].[Customer].Members",
//The measure to use
"[Measures].[Internet Sales Amount]")

When it executes it creates a number of calculated members (the exact number depends on the second parameter) and a named set called [Lorenz] with them all in. Having done that you can run a query and request all the members in the named set:

select [Lorenz] on 0 from [adventure works]

You can call the sproc many times in the same session with different parameters and it should always work, although I’ll admit it could do a better job of dropping old calculated members. It would also have been able to create a function that did much the same thing but which returned a set containing all these calculated members, so you could call it direct from your query and not have to call the sproc beforehand, but I couldn’t get that to work unfortunately.

Written by Chris Webb

July 3, 2008 at 6:25 pm

Posted in MDX

Follow

Get every new post delivered to your Inbox.

Join 3,202 other followers