Archive for the ‘Multidimensional’ Category
Using Scoped Assignments To Show Calendar and Financial Year-To-Date Values In The Same Calculated Measure
I’ve had a lot of requests for more MDX content on my blog, so here’s something I’ve been meaning to write up for a long time: a worked example of how to use scoped assignments to implement two different types year-to-date calculation on two different hierarchies in the same dimension. Knowledge of how to use scoped assignments is the sign of a true MDX master (you can watch a video of a session I gave on the basics of scoped assignments at SQLBits here if you’re unfamiliar with them) but that’s because they can be very difficult to write and there’s surprisingly little information out there on the internet about them. They are incredibly powerful, though, and often they provide the most elegant and best-performing way to solve a problem.
Let’s start by looking at the Date dimension in the Adventure Works DW sample database, and more specifically the attributes, user hierarchies and attribute relationships:
Notice how we have two user hierarchies for Calendar Years (which start on January 1st) and Fiscal Years (which start on July 1st), called Calendar and Fiscal. Now, let’s say that you have a requirement to to show year-to-date values for a measure for both the Calendar and Fiscal hierarchies. It would be very easy to implement this as two separate calculated measures but what if you needed to show both types of year-to-date in the same calculated measure, showing Fiscal year-to-dates when the Fiscal hierarchy was used in a query and Calendar year-to-dates when the Calendar hierarchy was used in a query?
This is possible using scoped assignments. The first thing to point out, though, is that this is only going to be possible if you change the structure of the dimension. Why? Well, take a look at the Date levels of both hierarchies: they are both built using the Date attribute. If you were running a query with your YTD calculation on columns and only the Date attribute hierarchy on rows, would you expect to see Calendar or Fiscal YTD values? Certainly you couldn’t see both in the same cell, and this is the problem: if you expect to see Calendar YTD values at the bottom of your Calendar hierarchy, and Fiscal YTD values at the bottom of your Fiscal hierarchy, you need two separate Date attributes to do this. If you overwrite the values in the same cells twice using a scoped assignment, then you will only see the result of the second assignment.
Therefore, what we need to do is to create two new attributes, Calendar Date and Fiscal Date, to serve as the lowest levels of the Calendar and Fiscal hierarchies instead of the Date attribute. Here’s what the new attribute relationships look like:
From the end-user’s point of view nothing seems to have changed (these new attributes can have their AttributeHierarchyVisible property set to False) but this now means we have two attributes, two different slices of the cube, whose values we can overwrite separately instead of just one.
Now for the MDX. A good rule to follow when writing scoped assignments is to always use attribute hierarchies and never use user hierarchies; this is because there are rules you have to obey about the shape of the subcube of data you are overwriting with your scoped assignment. When defining a scope using only attribute hierarchies, you can only use the following types of set:
- Every single member on the attribute hierarchy
- Just one member on the attribute hierarchy
- Multiple members on the attribute hierarchy not including the All Member
I wrote up a detailed set of rules for defining scopes in MDX Solutions, if you’re interested; if you don’t follow these rules you’ll get the infamous “An arbitrary shape of the sets is not allowed in the current context” error (I know a joke about that, incidentally).
For this calculation, we need to make two scoped assignments on a single calculated measure called [YTD Sales]: one to show the Fiscal YTD calculation for the Fiscal Date, Fiscal Month Name, Fiscal Quarter, Fiscal Semester and Fiscal Year attributes; and one to show the Calendar YTD calculation for the Calendar Date, Calendar Month Name, Calendar Quarter, Calendar Semester and Calendar Year attributes. When scoping on ranges of attributes like this, there’s another easy rule to follow: scope on the set of every member, including the All Member, from the attribute hierarchy at the lowest granularity attribute, and the set of every member, not including the All Member, from the highest granularity attribute. These two sets also need to be in the same, rather than separate, SCOPE statements for reasons I explain here.
This results in the following MDX:
CREATE MEMBER CURRENTCUBE.MEASURES.[YTD Sales] AS NULL;
SCOPE(MEASURES.[YTD Sales]);
--Calendar YTD
SCOPE([Date].[Calendar Date].MEMBERS, [Date].[Calendar Year].[Calendar Year].MEMBERS);
THIS = AGGREGATE(
PERIODSTODATE([Date].[Calendar].[Calendar Year], [Date].[Calendar].CURRENTMEMBER),
[Measures].[Internet Sales Amount]);
END SCOPE;
--Fiscal YTD
SCOPE([Date].[Fiscal Date].MEMBERS, [Date].[Fiscal Year].[Fiscal Year].MEMBERS);
THIS = AGGREGATE(
PERIODSTODATE([Date].[Fiscal].[Fiscal Year], [Date].[Fiscal].CURRENTMEMBER),
[Measures].[Internet Sales Amount]);
END SCOPE;
END SCOPE;
Now you can see the YTD Sales calculated measure returns Calendar YTD values for the Calendar hierarchy:
…and it returns Fiscal YTD values for the Fiscal hierarchy:
There’s one last trick I want to show. It can be very difficult to know that your scoped assignment is covering the subcube you want it to cover, so while debugging scoped assignments I find it helps to assign values to the BACK_COLOR cell property so I can see exactly where I’m scoping. Here’s the MDX above with extra assignments to set the cell background colour to orange for the Calendar YTD calculation and blue for the Fiscal YTD calculations:
CREATE MEMBER CURRENTCUBE.MEASURES.[YTD Sales] AS NULL;
SCOPE(MEASURES.[YTD Sales]);
--Calendar YTD
SCOPE([Date].[Calendar Date].MEMBERS, [Date].[Calendar Year].[Calendar Year].MEMBERS);
THIS = AGGREGATE(
PERIODSTODATE([Date].[Calendar].[Calendar Year], [Date].[Calendar].CURRENTMEMBER),
[Measures].[Internet Sales Amount]);
BACK_COLOR(THIS) = RGB(255,128,64);
END SCOPE;
--Fiscal YTD
SCOPE([Date].[Fiscal Date].MEMBERS, [Date].[Fiscal Year].[Fiscal Year].MEMBERS);
THIS = AGGREGATE(
PERIODSTODATE([Date].[Fiscal].[Fiscal Year], [Date].[Fiscal].CURRENTMEMBER),
[Measures].[Internet Sales Amount]);
BACK_COLOR(THIS) = RGB(64,128,255);
END SCOPE;
END SCOPE;
This now shows up in an Excel PivotTable as you can see below:
It can also help you understand what’s going on in the scenarios where the scopes overlap, for example if you put the Calendar and Fiscal hierarchies on rows and columns in the same PivotTable: the Fiscal hierarchy takes precedence, because it’s defined second.
SSAS on Windows Azure Virtual Machines
You may have already seen the announcement about Windows Azure Virtual Machines today; what isn’t immediately clear (thanks to Teo Lachev for the link) is that Analysis Services 2012 Multidimensional and Reporting Services are installed on the new SQL Server images. For more details, see:
http://msdn.microsoft.com/en-us/library/jj992719.aspx
SSAS 2012 Tabular is also supported but not initially installed.
The Dangers of Non_Empty_Behavior
One thing I see quite often when I’m reviewing MDX is the incorrect use of the Non_Empty_Behavior property. 99% of the time it’s set without the developer really knowing what it does, and luckily in most cases it makes no difference to performance or what the calculation returns. However… in this post I’m going to argue that you should not use it unless you really know what you’re doing and you can prove that it’s beneficial – because if it’s set incorrectly it can sometimes mean you get incorrect results from your queries.
First of all, what does Non_Empty_Behavior actually do? Basically, it’s a hint to the SSAS Formula Engine that says that a calculation will return a null value if a given tuple also returns a null value. Moreover it’s a hint that may or may not be ignored. For example, if you had an MDX calculated measure that was defined as
CREATE MEMBER CURRENTCUBE.[Measures].[MyMeasure] as [Measures].[Internet Sales Amount]*2;
It would be true to say that this calculated measure would always return a null value when the real measure [Measures].[Internet Sales Amount] returned null, so you would be able to say:
CREATE MEMBER CURRENTCUBE.[Measures].[MyMeasure] as [Measures].[Internet Sales Amount]*2
, NON_EMPTY_BEHAVIOR=[Measures].[Internet Sales Amount];
The late, great Mosha covered its use in detail in a number of blog posts from a few years ago:
http://sqlblog.com/blogs/mosha/archive/2006/11/05/budget-variance-a-study-of-mdx-optimizations-evaluation-modes-and-non-empty-behavior.aspx
http://sqlblog.com/blogs/mosha/archive/2007/03/06/averages-ratios-division-by-zero-and-non-empty-behavior.aspx
These posts all refer to SSAS 2005, and from SSAS 2008 onwards a lot of work went on to ensure that most of the scenarios where Non_Empty_Behavior had been necessary to get good performance were optimised away. Certainly, in the simple example I showed above there would be no need to set Non_Empty_Behavior at all – the Formula Engine will already be able to optimise it as much as it can. This means that if you’re on SSAS 2005 then Non_Empty_Behavior can be very useful for tuning calculations, but if you are using SSAS 2008 or greater it’s highly likely that you do not need to use Non_Empty_Behavior at all. In fact, Mosha’s advice back in 2009 when he did a preconference seminar at the PASS Summit on SSAS performance tuning was summed up in these two slides:
I think the message here is pretty clear…
So, it probably won’t help your query performance, but why is it dangerous? Consider the following query:
WITH
MEMBER MEASURES.SomeTest AS
IIF(Now()>0, "Internet", "Reseller")
MEMBER MEASURES.Demo as
STRTOMEMBER("[Measures].[" + measures.sometest + " Sales Amount]")
SELECT
{[Measures].Demo}
ON 0,
NON EMPTY
[Product].[Category].[Category].members
*
[Date].[Date].[Date].members
ON 1
FROM [Adventure Works]
This returns 1887 rows. However, if I set Non_Empty_Behavior incorrectly, like so:
WITH
MEMBER MEASURES.SomeTest AS
IIF(Now()>0, "Internet", "Reseller")
MEMBER MEASURES.Demo as
STRTOMEMBER("[Measures].[" + measures.sometest + " Sales Amount]")
, NON_EMPTY_BEHAVIOR=[Measures].[Reseller Sales Amount]
SELECT
{[Measures].Demo}
ON 0,
NON EMPTY
[Product].[Category].[Category].members
*
[Date].[Date].[Date].members
ON 1
FROM [Adventure Works]
The query now returns only 64 rows! By saying that the calculation will return a null value when Reseller Sales Amount is null – which is not the case – then rows that do in fact have values will get filtered out by the non empty clause. Clearly this is a bad thing, and also one that may be very hard to see happening when you are testing.
The example above was quite obviously wrong, and in fact it’s quite hard on SSAS 2012 to find a query where Non_Empty_Behavior returns incorrect results. More often I see calculations like this:
WITH
MEMBER MEASURES.PreviousSales AS
([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER)
, NON_EMPTY_BEHAVIOR=[Measures].[Internet Sales Amount]
SELECT {[Measures].[Internet Sales Amount], measures.PreviousSales} ON 0,
[Date].[Calendar].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]
…where Non_Empty_Behavior is set incorrectly (the calculation returns the Internet Sales Amount for the previous time period, and it’s not true to say that the previous time period’s sales will be null if the current time period’s sales are null) but the query still returns the correct results. This is because, luckily, Non_Empty_Behavior is being ignored.
So, to sum up, if you are using SSAS 2008 or greater Non_Empty_Behavior may not solve your performance problem and may result in incorrect results. That is not to say that there are still one or two isolated scenarios where it can still be useful. Jorg Klein came across one fairly recently and blogged about it here:
http://sqlblog.com/blogs/jorg_klein/archive/2013/02/18/implement-ssas-md-cell-security-using-dimension-security-with-blazing-performance.aspx
But these scenarios are few and far between. Therefore don’t use it if you’re not sure what it does!
Technitrain 2013 Course Schedule: SSAS, PowerPivot, MDX, SSIS, TSQL and SQL Server Engine Training in London
As you probably know, when I’m not blogging or wrestling with SSAS I run a small SQL Server training company here in the UK called Technitrain. If you’ve got some training budget to spare and London is convenient for you, you might want to check out the 2013 course schedule below:
- Introduction to StreamInsight with Allan Mitchell. London, 27-28 February http://www.technitrain.com/coursedetail.php?c=20&trackingcode=CWB
- Advanced TSQL with Dave Ballantyne. London, 19-20 March http://www.technitrain.com/coursedetail.php?c=19&trackingcode=CWB
- Introduction to PowerPivot with Chris Webb and Bob Phillips. London, 5-6 June http://www.technitrain.com/coursedetail.php?c=24&trackingcode=CWB
- Super-scaling SQL Server with Thomas Kejser. London, 13 June http://www.technitrain.com/coursedetail.php?c=21&trackingcode=CWB
- Real-world SQL Server Analysis Services Cube Design and Performance Tuning with Chris Webb. London, 18-19 June http://www.technitrain.com/coursedetail.php?c=23&trackingcode=CWB
- SQL Server Integration Services Design Patterns with Andy Leonard. London, 9-12 September http://www.technitrain.com/coursedetail.php?c=25&trackingcode=CWB
- Introduction to SQL Server Analysis Services 2012 Tabular Models with Chris Webb. London, 26-27 September http://www.technitrain.com/coursedetail.php?c=26&trackingcode=CWB
- Introduction to MDX with Chris Webb. London, 13-15 November http://www.technitrain.com/coursedetail.php?c=22&trackingcode=CWB
As you can see, I’m doing a fair bit of teaching myself but I’m also proud to have Thomas Kejser, Allan Mitchell, Dave Ballantyne, Bob Phillips and Andy Leonard teaching courses for me too. The aim is to deliver reasonably-priced, expert-level Microsoft BI and SQL Server training of the sort the regular training companies don’t provide; it’s basically the kind of training you’d get at a SQLBits or SQL Saturday pre-con but in a more traditional classroom environment and not limited to one day. If you’re reading my blog, you’re my target audience for these courses – so I hope to see you at one of them this year!
Unnecessary All Members and Performance Problems
Maybe an obscure problem, this one, but worth recording nonetheless. The other week I was performance tuning some queries on a customer’s SSAS 2008R2 instance and came across a very strange issue related to the presence of unnecessary All Members in tuples. In this case it was in machine-generated MDX but it’s certainly the case the people new to MDX often include All Members in tuples when they are not actually needed; it’s a not good idea to do this because it can sometimes have unexpected effects as a result of attribute overwrite and because, as I found, it can also cause severe performance problems too.
The problem can be reproduced very easily against Adventure Works on the Customer dimension. Consider the following query that returns a list of customers who bought more than $1000 of goods in 2003:
with
set filteredcustomers as
filter(
[Customer].[Customer Geography].[Customer].members
, ([Measures].[Internet Sales Amount]
, [Date].[Calendar Year].&[2003])>1000)
select
{}
on columns,
filteredcustomers
on rows
from [Adventure Works]
Pretty straightforward, and it returns instantly on my laptop as I’d expect. However, adding the All Member from the City hierarchy into the tuple used in the filter() function makes the query run very slowly indeed (I killed it after several minutes):
with
set filteredcustomers as
filter(
[Customer].[Customer Geography].[Customer].members
, ([Measures].[Internet Sales Amount]
, [Customer].[City].[All Customers]
, [Date].[Calendar Year].&[2003])>1000)
select
{}
on columns,
filteredcustomers
on rows
from [Adventure Works]
The All Member here isn’t necessary at all; it won’t affect how the filter works or the set returned at all. Looking in Profiler it seems as though its presence triggers cell-by-cell mode, which is the cause of the awful performance. Interestingly, the performance got worse the more attributes were on the hierarchy – deleting attributes, even when they weren’t used in the query, improved query performance. I’m told the problem could be the result of attribute decoding (which Mosha referred to here, but which I don’t know much else about) as a result of attribute overwrite
Anyway in my case it wasn’t possible to change the MDX because it was being generated by a client tool – the All Member was there because the City hierarchy was being used as a parameter in the query, although in this case nothing had been selected on it. There was a workaround that I found though: it turns out the problem does not occur for user hierarchies that include the key attribute as their lowest level. So, I renamed the City attribute, hid it, and then created a new user hierarchy called City that had Customer as its lowest level:
With this done, both of the queries above return instantly.
Storage Engine Caching, Measures and Measure Groups
I’ve been doing some performance tuning work on SSAS Multidimensional recently that has forced me to look at some behaviour I’ve observed several times but never properly understood: what happens with Storage Engine caching when you are querying multiple measures in the same measure group. Here are some of my findings (thanks, as always, to Akshai and Marius for answering my questions on this) although this post only deals with a few basic scenarios…
Consider the following, quite basic cube built from Adventure Works. It has one measure group and two measures, Sales Amount and Tax Amount, that both have AggregateFunction Sum:
And a single Date dimension with the following attribute relationships:
If I run a Profiler trace, clear the cache and run the following query twice:
SELECT
{[Measures].[Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]
I can see that the first time the query is run it doesn’t hit cache, and the second time the query (in the second red box below) is run it does hit the Storage Engine cache:
This is as you’d expect. However, now look what happens when I run a query that returns the Tax Amount measure – which was not in the original query – without clearing the cache:
SELECT
{[Measures].[Tax Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]
Even though this is the first time I’ve queried for this measure since the cache was cleared, this query still hits the cache. This is because when you query for one measure, the SSAS Storage Engine will retrieve data for all other measures in the same measure group for the granularity of data requested.
This means that the AggregateFunction property of a measure is significant here. If I add a new measure to the cube with AggregateFunction set to Count instead of Sum:
I see the same thing happening, ie queries that request data for Sales Amount or Tax Amount also warm the SE cache with values for Internet Sales Count. This is because a query for Internet Sales Count can be answered with data of the same granularity as a query for Sales Amount. However, if I add a new measure called Last Sales Amount with AggregateFunction Last Non-Empty:
And then clear the cache, and run the two following queries one after the other:
SELECT
{[Measures].[Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]
SELECT
{[Measures].[Last Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]
I can see that the first query does not warm the cache for the second query – both queries go to disk:
Why is this happening? Why isn’t the cache being used? A clue lies in the Query Subcube Verbose event for both queries. For the first query, using Sales Amount, the following granularity of data is being requested:
Dimension 0 [Date] (0 0 0 *) [Date]:0 [Month]:0 [Quarter]:0 [Year]:*
Whereas the second query, using Last Sales Amount, requests this granularity:
Dimension 0 [Date] (* 0 0 *) [Date]:* [Month]:0 [Quarter]:0 [Year]:*
Both queries have Years on rows, but because Last Sales Amount is semi-additive the values returned are actually from the Year and Date granularity. So, when the semi-additive measure is requested in the second query the data needed for it is not in the Storage Engine cache: the first query requested data at the Year granularity only.
From what I understand, the logic governing this behaviour is very complex and the exact query plan that gets generated will depend on the overall design of your cube, the AggregateFunction used for the measures in each measure group (measures with measure expressions are going to work in a similar way to semi-additive measures) and the queries you’re running. However it is useful to be aware of this kind of behaviour when designing and tuning SSAS cubes. For example, it could be that if you have a large number of measures (tens or even a hundred) in the same measure group it could be worth splitting them out into separate measure groups to improve performance, especially if some measures are never queried together – you would need to test this thoroughly first though. This behaviour would also be relevant in cases where you’re designing aggregations manually.
Power View on SSAS Multidimensional
Earlier this week I mentioned that the announcement about Power View working on SSAS Multidimensional had leaked out. There was a full session on it yesterday giving all the details and I thought I’d summarise them here for your enjoyment. Here are the main points:
- This has not shipped yet – it is not in SSAS 2012 SP1. No release date had been announced but it sounds like it is coming very soon.
- It will come as a new build of SSAS 2012, so to use Power View you will need to upgrade to that build and no earlier version of SSAS will work.
- It will also require an update to Power View. This means:
- Power View in Sharepoint (ie SSRS) will need to be updated too
- Power View in Excel will still not work initially, even when the new build of SSAS has been released. We’ll have to wait for another update for Office (perhaps a service pack?) before Power View in Excel works on SSAS Multidimensional too.
- In technical terms, what has happened is that SSAS Multidimensional now
- Exposes Tabular metadata
- Can be queried in DAX
- There is no translation going on from DAX to MDX, SSAS Multidimensional supports DAX natively.
- The consensus in the session room was that the SSAS team had done a really good job thinking through all the details of how this will work. In general your existing cube will not need to be redesigned, it will just work and all your queries and calculations will return the results you expect. In the session a lot of time was spent showing how things like default members will work.
- But some things are not supported: if you have named sets or actions they will not be exposed; in some scenarios calculated members on non-measures dimensions will not be exposed either, but time utility dimensions should work; and cell security is not supported either – if a user is a member of a role that has cell security applied, they will not be able to run DAX queries.
- There’s a minor new feature in SSAS that allows you to mark an attribute as containing a URL for an image, so that Power View can display the images automatically.
UPDATE: the public CTP is now available http://blogs.msdn.com/b/analysisservices/archive/2012/11/29/power-view-for-multidimensional-models-preview.aspx
Storage and the NameColumn and KeyColumns Properties
Those of you who have worked with SSAS Multidimensional for a reasonable amount time will, no doubt, be very familiar with the NameColumn and KeyColumns properties of an attribute (if you’re not, see here and here) and how they should be used. You will probably also know that when the KeyColumns property has been set to only one column, then the NameColumn property can be left unset and the key will be used as the name of the attribute’s members.
However, while onsite with a customer recently I noticed something strange. Here’s an illustration: if you create a simple dimension based on the DimCustomer table in Adventure Works and create a single attribute based on the CustomerKey column, this is what you see in the Properties for that attribute:
However, if you deploy the database then import it into a new project in SSDT/BIDS, then you see that the NameColumn property has been set:
My first thought was that this was a bit dangerous, because it might mean that the imported version of the database would start storing extra strings for the names. But this was incorrect because a look at the data directories for the two versions of the dimensions showed they contained the same files and were using the same amount of storage:
I’m very grateful to Akshai Mirchandani of the dev team for confirming that in this situation, it is irrelevant whether you set the NameColumn or not – data duplication will always happen, and the key values will be stored again as strings. The only time it doesn’t happen is when the key and the name of the attribute are both the same column and that column is a string.
This means that if you have a very large attribute that is in danger of exceeding the infamous 4GB limit (although this is of course fixed in SSAS 2012) and which never needs to be visible, you can use the trick that Greg Galloway describes here to reduce the size of the string store. This involves creating a dummy column in your DSV (or underlying view or table) that contains only an empty string and then setting this as the NameColumn of your attribute. For the example above, this is the result:
For this example, the overall amount of storage used for the dimension has gone down from 1.24MB to 1.04MB, and although you can see the .ahstore file (the hash store) for the Customer Key attribute have grown, the size of the string store, Customer Key.asstore (note: don’t get confused between .asstore and .astore files), has reduced from 362KB to 1KB.
Storage Engine Cache Aggregation and its Implications for Dimension Design
SSAS Multidimensional does caching in a number of places, but the most important type of caching for most cubes happens in the Storage Engine (SE) – the part of SSAS that reads data from disk and aggregates it up to a given granularity. After the SE has answered a single request for data it stores the resulting subcube in cache so that if the same request is made in the future then it can be answered from there very quickly. In addition, in some cases the SE is also able to answer requests at a higher granularity from data that is currently in its cache, and increasing the likelihood of this happening can have a significant positive impact on the performance of large cubes.
Let’s see an example of how SE caching works using a very simple cube built on the Adventure Works database, with one Sum measure and just one dimension, a Date dimension with the following attribute relationships:
Running the following query on a cold cache:
select {[Measures].[Sales Amount]} on 0,
[Date].[Year].[Year].members
on 1
from [SECacheDemo]
…and then looking in Profiler shows that the SE has to go to disk to get the data it needs (as shown by the Progress Report Begin/End events):
Running the same query immediately afterwards shows the SE can get the data it needs from cache:
Running the following query, where I’m getting the All Member from the Year hierarchy, shows that the SE is also able to answer this request from cache:
select {[Measures].[Sales Amount]} on 0,
{[Date].[Year].[All]}
on 1
from [SECacheDemo]
This is as you’d expect because, of course, the All Member on the Years hierarchy represents the aggregated total of all the years returned in the first query.
There are several limits on the ability of the SE to derive aggregated totals from data it already has in cache. For a start, the SE cannot aggregate multiple cache entries to derive a single figure. So, for example, if I run the following three queries:
select {[Measures].[Sales Amount]} on 0,
{[Date].[Year].&[2001],[Date].[Year].&[2002]}
on 1
from [SECacheDemo]
select {[Measures].[Sales Amount]} on 0,
{[Date].[Year].&[2003],[Date].[Year].&[2004]}
on 1
from [SECacheDemo]
select {[Measures].[Sales Amount]} on 0,
{[Date].[Year].&[2006]}
on 1
from [SECacheDemo]
…which together return all the years, when I run the query to get the All Member that will not be answered from the SE cache. Each of the three queries above create separate entries in the SE cache; this is one of the reasons why, when cache warming, it’s better to use a few very large queries rather than lots of small, filtered queries.
Furthermore (and this is something that surprised me a little when I found out about it recently), despite the presence of attribute relationships, the SE cannot always work out how to derive higher-level values from lower-level cached data. Running the following query on a cold cache:
select {[Measures].[Sales Amount]} on 0,
[Date].[Date].[Date].members
on 1
from [SECacheDemo]
…populates the SE cache with data at the Date granularity, but the following query to get the values for all years:
select {[Measures].[Sales Amount]} on 0,
[Date].[Year].[Year].members
on 1
from [SECacheDemo]
…is not answered from cache, even though the year values could be derived from the date values already in cache.
Why is this happening? Well, the SE does not (at least at the time of writing) decode the attribute relationships when working out which granularities can be aggregated from cache. If you look at the granularities as represented in the Query Subcube Verbose events shown in the previous two screenshots, the granularity for the request at Date is
* 0 0 0
…and the granularity for the request at Year is
0 0 0 *
The four characters in this representation of the granularity stands for the four attributes on the dimension: Date, Month, Quarter and Year. The 0 character shows that a request is not at the granularity of that attribute, any other value shows that it is, and the asterisk character shows the request returns all the values at the specified granularity (this white paper gives more detail on how to interpret these values). So, without knowing anything about attribute relationships, the SE can say that the granularity
0 0 0 0
can be aggregated from
* 0 0 0
but it cannot say that
0 0 0 *
can be aggregated from
* 0 0 0
Luckily these limitations on what can be aggregated do not apply to aggregations: if I was to build an aggregation at the Date granularity, my query at the Year granularity would be able to make use of that aggregation.
Also, the use of natural user hierarchies can work around this limitation. Consider the following user hierarchy built on the dimension:
Querying at the Date level of this user hierarchy, like so:
select {[Measures].[Sales Amount]} on 0,
[Date].[Calendar].[Date].members
on 1
from [SECacheDemo]
…results in a request at the granularity
* * * *
which can then be aggregated up to many more granularities – querying at a level in a natural user hierarchy automatically includes the granularities of all the attributes used for the levels above in the user hierarchy.
Therefore, both the following queries:
select {[Measures].[Sales Amount]} on 0,
[Date].[Calendar].[Year].members
on 1
from [SECacheDemo]
select {[Measures].[Sales Amount]} on 0,
[Date].[Year].[Year].members
on 1
from [SECacheDemo]
…can be answered from the cache created by the query at the [Date].[Calendar].[Date] level.
The conclusion must be, then, that using natural user hierarchies will allow for much better SE cache reuse than using the attribute hierarchies on their own. Also, if you have a natural user hierarchy, it might be a good idea to hide the underlying attribute hierarchies so users and other developers do not reference them in their queries and calculations. You may not notice the performance difference that better SE cache reuse gives you on most cubes, but on very large cubes or cubes that are very SE-intensive (for example, because they are reprocessed frequently) this could make a noticeable difference to your overall query performance.
Thanks to Akshai Mirchandani and Hrvoje Piasevoli for their help in understanding this.
Tuning Queries with the WITH CACHE Statement
One of the side-effects of the irritating limitations that SSRS places on the MDX you can use in your reports is the widespread use of calculated measures to get the columns you want. For example, a query like this (note, this query isn’t on the Adventure Works cube but on a simpler cube built on the Adventure Works DW database):
SELECT
{[Measures].[Sales Amount]}
*
[Date].[Day Number Of Week].[Day Number Of Week].MEMBERS
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]
…which wouldn’t be allowed in SSRS, could be rewritten like so:
WITH
MEMBER MEASURES.D1 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[1])
MEMBER MEASURES.D2 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[2])
MEMBER MEASURES.D3 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[3])
MEMBER MEASURES.D4 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[4])
MEMBER MEASURES.D5 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[5])
MEMBER MEASURES.D6 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[6])
MEMBER MEASURES.D7 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[7])
SELECT
{MEASURES.D1,MEASURES.D2,MEASURES.D3,MEASURES.D4,MEASURES.D5,MEASURES.D6,MEASURES.D7}
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]
…to get it in an SSRS-friendly format with only measures on columns.
For the last few days I’ve had the pleasure of working with Bob Duffy (a man so frighteningly intelligent he’s not only an SSAS Maestro but a SQL Server MCM as well) on tuning a SSRS report like this on a fairly large cube. As Bob found, the problem with this style of query is that it isn’t all that efficient: if you look in Profiler at what happens on a cold cache, you can see there are seven separate Query Subcube events and seven separate partition scans (indicated by the Progress Report Begin/End events) for each calculated measure on columns.
The first thing that Bob tried to tune this was to rewrite the query something like this:
SELECT
{[Measures].[Sales Amount]}
ON 0,
NON EMPTY
[Product].[Product].[Product].MEMBERS
*
[Date].[Day Number Of Week].[Day Number Of Week].MEMBERS
ON 1
FROM [Adventure Works DW]
…and pivot the data in the SSRS tablix to get the desired layout with the Day Numbers on columns. The interesting thing, though, is that for this particular report while rewriting the query in this way made it run faster (there is only one Query Subcube event and partition scan now) it actually made the SSRS report run slower overall, simply because SSRS was taking a long time to pivot the values.
Instead, together we came up with a way to tune the original query using the WITH CACHE statement like so:
WITH
CACHE AS
‘([Measures].[Sales Amount]
, [Product].[Product].[Product].MEMBERS
, [Date].[Day Number Of Week].[Day Number Of Week].MEMBERS)’
MEMBER MEASURES.D1 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[1])
MEMBER MEASURES.D2 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[2])
MEMBER MEASURES.D3 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[3])
MEMBER MEASURES.D4 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[4])
MEMBER MEASURES.D5 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[5])
MEMBER MEASURES.D6 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[6])
MEMBER MEASURES.D7 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[7])
SELECT
{MEASURES.D1,MEASURES.D2,MEASURES.D3,MEASURES.D4,MEASURES.D5,MEASURES.D6,MEASURES.D7}
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]
What WITH CACHE statement does here is load all the data needed for the query into the Storage Engine cache before anything else happens. So even though there are still seven different Query Subcube events for each column, there’s now only one partition scan and each of the seven Query Subcube events now hits cache:
There’s no guarantee that this approach will result in the best performance even when you have a query in this form, but it’s worth testing if you have. It’s certainly the first time in a long while that I’ve used the WITH CACHE statement in the real world – so it’s interesting from an MDX point of view too.
