Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for July 2008

Many-to-Many Dimension bug

with 2 comments

Jon Axon mailed me recently with an interesting bug he’d come across concerning many-to-many dimensions, where it looks like the AS engine is trying to be a bit too clever for its own good as far as query optimisation goes. It’s reproducible on AS2005 SP2 and Katmai RC0, and I logged it on Connect here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357469

Here’s an example on Adventure Works that Jon gave me. The following query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]

Shows the total number of customers as being 18484. Looking at the Sales Reason dimension (which has a many-to-many relationship with the measure group Customer Count is from), if you run a relational query on the underlying data source you can see that not every order was associated with a sales reason – this is the key point here. Now if we run the query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons]

We can see that it returns the same result as the first query, 18484, as you would expect. But what if we want to find the number of customers who specified a sales reason? You would think that the following query would do that:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons].Children

But it doesn’t, it returns 18,484 again. However if you run this query which should be equivalent to the previous query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE {[Sales Reason].[Sales Reason].&[1],
       [Sales Reason].[Sales Reason].&[2],
       [Sales Reason].[Sales Reason].&[3],
       [Sales Reason].[Sales Reason].&[4],
       [Sales Reason].[Sales Reason].&[5],
       [Sales Reason].[Sales Reason].&[6],
       [Sales Reason].[Sales Reason].&[7],
       [Sales Reason].[Sales Reason].&[8],
       [Sales Reason].[Sales Reason].&[9],
       [Sales Reason].[Sales Reason].&[10]}

It returns the value we’re looking for, 17022. Not good! What I suspect is happening here is that when the query optimiser sees the expression [Sales Reason].[Sales Reason].[All Sales Reasons].Children in the Where clause, it assumes that it’s equivalent to [Sales Reason].[Sales Reason].[All Sales Reasons] which it is in most cases, just not here. Interestingly if you look in Profiler and run the queries on a cold cache, you see exactly the same activity when [Sales Reason].[Sales Reason].[All Sales Reasons].Children is in the Where clause as when [Sales Reason].[Sales Reason].[All Sales Reasons] is there; but when you explicitly list all of the individual sales reasons, as in the last query, because Sales Reason has to resolve itself through the Internet Sales Order Details dimension which is ROLAP, you see SQL queries being fired off against the relational database.

Written by Chris Webb

July 30, 2008 at 4:09 pm

Posted in MDX

Microsoft buys DATAllegro

with one comment

Woweee, great news: Microsoft is buying the DW appliance vendor DATAllegro! This is big, big news for Microsoft BI and I am super-excited. Curt Monash has a few posts that explain why this is so big on his blog:
See also:
 
I’ve long wanted to be able to stick Analysis Services in ROLAP/HOLAP mode on top of one of these MPP babies, and maybe now I’ll get the chance. Potentially in the long term we might get MPP MOLAP too, which would be even better…

Written by Chris Webb

July 24, 2008 at 8:37 pm

Posted in On the internet

XLCubed (and a rant about Microsoft’s client tool strategy)

with 14 comments

The other week I stopped by in Maidenhead to see the guys at XLCubed, and to take a look at their latest stuff. XLCubed have been around a long time and their Excel addin AS client has always been one of the best out there, but with the improved Analysis Services support in Excel 2007 (especially with the introduction of ‘convert to formulas’) and the Proclarity acquisition has put a squeeze on the client tools sector. A lot of the third party client tools out there, XLCubed included, are better in a lot of ways than the equivalent Microsoft offerings but it’s often hard to explain to someone who isn’t very experienced with Analysis Services what the advantages are and why they represent a good reason to buy a non-Microsoft product. So, in order to survive, you need a clear, unique selling point and XLCubed now have one in the form of Microcharts after they bought Bonavista Systems last year (I blogged about the Microcharts product in its original form here). Microcharts gives you the ability to create sparklines, bullet graphs and other in-cell charts, which is not only impressive when used in conjunction with regular Excel and Reporting Services (with or without AS as a data source) but enters the realm of extreme coolness when you see how it’s been integrated with XLCubed.

Here’s just one example of the kind of dashboard you can build with XLCubed:

CIODashboard_550X 

You can see a whole page of sample dashboards here:
http://www.xlcubed.com/en/Demo_Overview.html

Nice, eh? I should also mention they have an excellent data visualisation blog that’s well worth a read:
http://blog.xlcubed.com/

While on the subject of client tools, can I veer off on a tangent here and criticise Microsoft’s strategy in this area? In my opinion (and just about everyone I’ve met agrees with me, not least disgruntled ex-Proclarity employees) what they’ve done has actually harmed the core Microsoft BI market over the last two years. Before the Proclarity acquisition it wasn’t an ideal situation, for sure, since telling customers that they had to buy their client tools from a third party looked bad. But what Microsoft have done is bought the leading third-party client tool and effectively chucked it in the bin, saying people should use Excel and PerformancePoint instead. Excel 2007 is a good client tool but a) a lot of companies are still on Excel 2003 and before, and are not going to upgrade just for the sake of a BI project, b) it has nowhere near the kind of advanced functionality that the Proclarity desktop tool had and never will, and c) it still has a few glaring problems (see here for example); PerformancePoint too is encouraging but very much a version 1.0. Microsoft’s long release cycles for both mean that we have to wait way too long for any upgrade in functionality, and in the meantime we’re left with a vacuum: the third party client tool market has been weakened because now all customers will want to use Microsoft client tools as a first choice, but these client tools are not yet up to scratch. Why on earth didn’t they carry on developing the Proclarity product line for a few more years until a smoother transition could be made? Why the prejudice against standalone client tools? Once again I’m left with the feeling that senior people in Redmond have little idea what’s going on in the real world and more importantly are insulated from the impact that their decisions have on the bottom line. On the positive side, though, Microsoft’s actions have given companies like XLCubed the breathing space they needed to innovate and survive.

Written by Chris Webb

July 24, 2008 at 5:18 pm

Posted in Client Tools

Show Hidden Objects in the Calculations tab

with one comment

Here’s a new button I’ve just noticed in the toolbar on the Calculations tab in BIDS 2008: Show Hidden Objects. Quite often when you’re writing MDX you want to reference hierarchies that are hidden to the end user, and in BIDS 2005 you had to unhide them to be able to see them (and so find out their unique name) in the metadata pane, which was a pain; now in BIDS 2008 you just need to click the new ‘Show Hidden Objects’ button to be able to see them. For some bizarre reason it doesn’t seem to allow me to see hidden sets or calculated members though – why?

showhiddenobjects

Written by Chris Webb

July 23, 2008 at 5:05 pm

Posted in Analysis Services

Kalido Universal Information Director now generates Analysis Services cubes

with 2 comments

Here’s the press release from Kalido:
http://www.kalido.com/5a04ea36-c50e-4617-b25d-85ea56a22690/news-and-events-press-center-press-releases-detail.htm

I don’t have any direct experience with Kalido’s products (although I’ve heard good things) but I’d be interested to see the cubes it generates. I wonder to what extent it’s possible to optimise automatically generated cubes?

Written by Chris Webb

July 21, 2008 at 5:04 pm

Posted in On the internet

SQLBits III – Registration Now Open!

with one comment

Registration for SQLBits III, to be held at Hatfield in the UK on September 13th, is now open! SQLBits is the UK’s finest SQL Server technical conference, and it’s 100% free to attend too. Take a look at:
http://www.sqlbits.com
…to see what we’ve got lined up. We’ve had an amazing 63 session submissions this time (the BI track is the strongest yet, in my opinion) so we’re asking everyone who registers to vote on the sessions they’d like to see. We’re also running a competition to design a logo for the day too, and you can vote on the logos here:
http://www.sqlbits.com/LogoCompetition.aspx
There are also SQLBits groups on Facebook and LinkedIn if you’d like to keep in touch with other people who are going or share photos etc. And as I’ve said before, we’d still like to talk to any companies who are interested in sponsoring the event.

Incidentally, Hatfield is just north of London and right next to Luton airport, so it’s very easy to get to if you’re coming from Europe. Why not pop over on Easyjet, and let your significant other go shopping in London for the day while you enjoy the conference?

One other thing to mention is that we’re having a training day at the same venue the day before, September 12th. There are a number of reasonably-priced one-day seminars on SQL Server related subjects to choose from:
http://www.sqlbits.com/information/TrainingDay.aspx
…including, as you can see, an introduction to SQL Server BI called ‘Making the most of data through business intelligence’ run by me and Allan Mitchell (SSIS MVP and one of the guys behind http://www.sqlis.com/). If you know a DBA, business analyst or manager who wants to learn more about BI then it’s the ideal way to get them out of the office for the day – just tell them it’s a trip to see Mitchell and Webb perform live.

Written by Chris Webb

July 18, 2008 at 11:53 am

Posted in Events

Last night’s BI evening event

leave a comment »

I just wanted to say a quick thank you to everyone who attended last night’s BI evening event at TVP, especially to Jes Kirkup of TAH (TAH also very kindly provided the beer and pizza during the break, which I’m very grateful for) and Jeremy Kashel of Adatis. I’ve asked them both to post their slides up on the web so once they’ve done that I’ll post the links here.

I promise that it won’t be quite so long before we have another event at TVP – I’ll try to do a better job of alternating between central London and TVP because I know we get a completely different group of people at each venue.

UPDATE: You can get hold of Jeremy Kashel’s slides on "PEL vs MDX" here -
http://blogs.adatis.co.uk/blogs/jeremykashel/archive/2008/07/18/bi-evening-slides-for-pel-vs-mdx.aspx

UPDATE #2: You can get hold of Jes Kirkup’s slides, along with a detailed write up of what he presented, on his new blog:
http://jeskirkup.spaces.live.com/blog/cns!55BF48702D38E629!167.entry
http://jeskirkup.spaces.live.com/blog/cns!55BF48702D38E629!169.entry

Written by Chris Webb

July 18, 2008 at 11:52 am

Posted in Events

Named Sets, AutoExists and Katmai

with 6 comments

A couple of months ago Radim Hampel pointed out to me some very weird stuff happening with named sets and the Where clause. Since it turned out that Darren had run into the same issue and also been thrown by it, and since I tested it out on Katmai CTP6 and could see that it was behaving differently from AS2005, I opened an item on Connect:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=331186

And now, after a long and detailed email thread involving Mosha, Edward Melomed, Marius Dumitru, Darren and Deepak we’ve got to the stage where I understand what’s going on, Katmai RC0 does roughly what I want, and I can blog about it!

Let me explain what I saw first. What would you expect the calculated member in following query to return?

–Query 1
WITH
  SET mytestset AS
    PeriodsToDate
    (
      [Date].[Calendar].[Calendar Year]
     ,[Date].[Calendar].[Month].&[2004]&[6]
    )
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM [adventure works]
WHERE
  [Date].[Calendar].[Month].&[2004]&[7]

Just from looking at the code I would have set the calculated member should return the string representation of the set from January 2004 to June 2004. But if you run the query you will in fact see that it returns the set containing the member July 2004 on both AS2005 and Katmai. To me that made absolutely no sense… Now, take a look at this query:

–Query 2
WITH
  SET mytestset AS
    [Date].[Calendar].[Month].&[2004]&[6]
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM [adventure works]
WHERE
  [Date].[Calendar].[Month].&[2004]&[7]

On AS2005 the calculation returns June as I would expect, on Katmai it returns an empty set. Now run this query:

–Query 3
WITH
  SET mytestset AS
    {
      [Date].[Calendar].[Month].&[2004]&[5]
     ,[Date].[Calendar].[Month].&[2004]&[6]
    }
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM [adventure works]
WHERE
  [Date].[Calendar].[Month].&[2004]&[7]

and this query:

–Query 4
WITH
  SET mytestset AS
    (
      [Date].[Calendar].[Month].&[2004]&[5]
     :
      [Date].[Calendar].[Month].&[2004]&[6]
    )
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM [adventure works]
WHERE
  [Date].[Calendar].[Month].&[2004]&[7]

…which to me should do the same thing. On AS2005 query 3 returns the set May and June but Katmai returns an empty set; query 4 returns an empty set on both AS2005 and Katmai. At this point I could see that something funny was happening that I didn’t like!

What are the practical implications of this? Take the following query from Mosha’s blog entry on ranking:

WITH
  SET OrderedEmployees AS
    Order
    (
      [Employee].[Employee].[Employee].MEMBERS
     ,[Measures].[Reseller Sales Amount]
     ,BDESC
    )
  MEMBER [Measures].[Employee Rank] AS
    Rank
    (
      [Employee].[Employee].CurrentMember
     ,OrderedEmployees
    )
SELECT
  [Measures].[Employee Rank] ON 0
,[Employee].[Employee].[Employee].MEMBERS ON 1
FROM [Adventure Works]

Run it and you’ll see that the Employee with the key 46, A Scott Wright, has a rank of 18. Now let’s slice by this Employee:

WITH
  SET orderedemployees AS
    Order
    (
      [Employee].[Employee].[Employee].MEMBERS
     ,[Measures].[Reseller Sales Amount]
     ,bdesc
    )
  MEMBER measures.[employee rank] AS
    Rank
    (
      [Employee].[Employee].CurrentMember
     ,orderedemployees
    )
SELECT
  measures.[employee rank] ON 0
FROM [adventure works]
WHERE
  [Employee].[Employee].&[46]

Run this and you’ll see that A Scott Wright has now supposedly got a rank of 1. Whatever the logic behind this, it doesn’t make sense from an end user perspective does it?

So how can we explain what’s happening here? It’s all to do with autoexists: in some cases it makes sense to apply autoexists to named sets, but in others (mostly when the set is intended for use in a calculation) then it doesn’t. Let’s forget about trying to understand what AS2005 does because it tries to guess when it should apply autoexists and gets very confused, but Katmai is mostly consistent and logical: by default it applies autoexists to all named sets. That explains why queries 2,3 and 4 all return empty sets on Katmai: May and June don’t exist with July.

After my initial item on Connect was opened those nice people in Redmond (who agreed with me that the way things were working wasn’t ideal) added a new connection string property, Autoexists, which can have the following values:

0 – default (same as 1)
1 – Apply deep autoexists for query axes and named sets (with WHERE clause and subselects)
2 – Apply deep autoexists for query axes and no autoexists for named sets (with WHERE clause and subselects)
3 – Apply shallow autoexists for query axes with WHERE clause, deep autoexists for query axes with subselects, no autoexists for named sets with WHERE clause and deep autoexists for named sets with subselects

Here’s the explanation I got from Marius about what’s meant by ‘deep’ and ‘shallow’ autoexists:

Suppose a query axis or named set involves a set expression of the form F(G(s)), with F and G being set functions (e.g. TopCount, Tail etc.)
Let SSW denote the Where clause slice and Subselect restrictions present in the query.
“Shallow autoexists” evaluates the set expression as Exists(F(G(s)), SSW).
“Deep autoexists” evaluates the set expression as Exists( F( Exists( G( Exists(s, SSW) ), SSW ), SSW) – it applies autoexists with the Where and Subselects at every intermediary step. Deep autoexists applies to all set-valued subexpressions being evaluated in the root context of the query (i.e. the context of default member coordinates, or Where clause coordinates, if a Where clause is present).
For many/most set functions (e.g. Union), the rules above produce the same result.
For others (e.g. TopCount, Head/Tail), the results differ in the general case.

So the behaviour I describe above for RC0 is also what you get when you put Autoexists=1 in the connection string. But what about Query 1 – why does the PeriodsToDate function return July with this setting? Hmm, well I think this is a bug and it should return an empty set. I opened another Connect about this:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=356193
and I’ll update this post when I get an answer on it.

What happens with Autoexists=2 and Autoexists=3 then? In both cases, Query 1 returns the set of months from January to June; Query 2 returns June, Query 3 and Query 4 both return May and June. To see the difference between these two settings take a look at this query:

WITH
  SET mytestset AS
    (
      [Date].[Calendar].[Month].&[2004]&[5]
     :
      [Date].[Calendar].[Month].&[2004]&[6]
    )
  MEMBER measures.test AS
    SetToStr(mytestset)
SELECT
  measures.test ON 0
FROM (SELECT
  [Date].[Calendar].[Month].&[2004]&[7] ON 0
FROM [adventure works] )

With Autoexists=1 and Autoexists=3 the set here is empty; with Autoexists=2 then it contains May and June.

Written by Chris Webb

July 15, 2008 at 5:08 pm

Posted in MDX

Microsoft BI Conference 2008 session list now posted

with 4 comments

You can see the session list for this year’s BI Conference here:

http://www.msbiconference.com/pages/tracksandsessions.aspx

Looks good. Hmm, wish I could go to this and PASS, but I just can’t justify the expense and the time…

Written by Chris Webb

July 14, 2008 at 5:09 pm

Posted in Events

Caching: can there be too much of a good thing?

with 2 comments

I was doing some load testing on a cube recently and noticed an interesting thing. When my load tests started, the average query response time was high for the first minute or so, which is reasonable given that all queries were being run on a cold cache; subsequently the average query response time fell dramatically, as you would expect when the cache warmed up. But over a long period (and I was running a lot of queries over several hours) I noticed that the average query response time started creeping up again. Not by a massive amount, it’s true, but enough to be noticeable: say a rise from three to four seconds.

Naturally I fired off an email to various people who I thought might help, and as usual Mosha came up with the goods. It turns out he’d noticed the same thing happening on the stress tests that MS run internally but come to the conclusion that it wasn’t going to be a problem in the real world. I won’t try to repeat too much of the technical detail he gave in case I end up garbling it but essentially there were two reasons why this performance degradation was happening:

  • Most stress tests use template queries into which random parameters are passed. So, for example, I had captured several ‘typical’ queries with four or five dimensions in the Where clause which in my test I parameterised;I then selected a random member on each of these dimensions to pass in to each query. This resulted in a lot of queries being run which returned no data at all, which in turn meant that the cache filled up with entries in the index that were very small and not much use, which in turn made cache lookups slower. In the real world the proportion of queries that return no data is much smaller.
  • The AS cache is optimised for ad-hoc query patterns, where a user starts in one place then drills down/up, slices, dices and so on, and where the data that a query returns is very closely connected to the data retrieved by the last query that was run. This is very different from randomised queries, which as a result perform worse.

So all in all, it’s nothing to worry about. I guess if you had a large cube, lots of users, random query patterns and didn’t process your cube very often then it might be a slight problem, but that’s a lot of ifs.

As an aside, if you’re thinking of doing stress testing I wouldn’t waste any time trying to get the AS Stress tool I blogged about here working – I found it a real pain and ended up building my own load test tool in SSIS using an approach similar to my cache warmer package.

Written by Chris Webb

July 9, 2008 at 5:40 pm

Posted in Analysis Services

Follow

Get every new post delivered to your Inbox.

Join 2,866 other followers