Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Analysis Services’ Category

Optimising SSAS Many-To-Many Relationships By Adding Redundant Dimensions

with one comment

The most elegant way of modelling your SSAS cube doesn’t always give you the best query performance. Here’s a trick I used recently to improve the performance of a many-to-many relationship going through a large fact dimension and large intermediate measure group…

Consider the following cube, built from the Adventure Works DW database and showing a many-to-many relationship:

image

The Fact Internet Sales measure group contains sales data; the Product, Date and Customer dimensions are what you would expect; Sales Order is a fact dimension with one member for each sales transaction and therefore one member for each row in the fact table that Fact Internet Sales is built from. Each Sales Order can be associated with zero to many Sales Reasons, and the Sales Reason dimension has a many-to-many relationship with the Fact Internet Sales measure group through the Fact Internet Sales Reason measure group. Only the Sales Order dimension connects directly to both the Fact Internet Sales Reason and Fact Internet Sales measure groups.

There’s nothing obviously wrong with the way this is modelled – it works and returns the correct figures – and the following query shows how the presence of the many-to-many relationship means you can see the Sales Amount measure (from the Fact Internet Sales measure group) broken down by Sales Reason:

select
{[Measures].[Sales Amount]} on 0,
non empty
[Sales Reason].[Sales Reason].[Sales Reason].members
on 1
from m2m1
where([Date].[Calendar Year].&[2003], 
[Product].[Product Category].&[3],
[Customer].[Country].&[United Kingdom])

 

image

However, to understand how we can improve the performance of a many-to-many relationship you have to understand how SSAS resolves the query internally. At a very basic level, in this query, SSAS starts with all of the Sales Reasons and then, for each one, finds the list of Sales Orders associated with it by querying the Fact Sales Reason measure group. Once it has the list of Sales Orders for each Sales Reason, it queries the Fact Internet Sales measure group (which is also filtered by the Year 2003, the Product Category Clothing and the Customer Country UK) and sums up the value of Sales Amount for those Sales Orders, getting a single value for each Sales Reason. A Profiler trace shows this very clearly:

image

The Resource Usage event gives the following statistics for this query:

READS, 7

READ_KB, 411

WRITES, 0

WRITE_KB, 0

CPU_TIME_MS, 15

ROWS_SCANNED, 87299

ROWS_RETURNED, 129466

Given that the Sales Order dimension is a large one (in this case around 60000 members – and large fact dimensions are quite common with many-to-many relationships) it’s likely that one Sales Reason will be associated with thousands of Sales Orders, and therefore SSAS will have to do a lot of work to resolve the relationship.

In this case, the optimisation comes with the realisation that in this case we can add the other dimensions present in the cube to the Fact Sales Reason measure group to try to reduce the number of Sales Orders that each Sales Reason is resolved to. Since Sales Order is a fact dimension, with one member for each sales transaction, then since each sales transaction also has a Date, a Product and a Customer associated with it we can add the keys for these dimensions to the fact table on which Fact Sales Reasons is built and join these dimensions to it directly:

image

This is not an assumption you can make for all many-to-many relationships, for sure, but it’s certainly true for a significant proportion.

The Product, Date and Customer dimensions don’t need to be present for the many-to-many relationship to work, but adding a Regular relationship between them and Fact Internet Sales Reason helps SSAS speed up the resolution of the many-to-many relationship when they are used in a query. This is because in the original design, in the test query the selection of a single member on Sales Reason becomes a selection on all of the Sales Orders that have ever been associated with that Sales Reason; with the new design, the selection of a single member on Sales Reason becomes a selection on a combination of Dates, Customers, Products and Sales Orders – and since the query itself is also applying a slice on Date, Customer and Product, this is a much smaller selection than before. For the query shown above, with the new design, the Resource Usage event now shows:

READS, 11

READ_KB, 394

WRITES, 0

WRITE_KB, 0

CPU_TIME_MS, 0

ROWS_SCANNED, 47872

ROWS_RETURNED, 1418

The much lower numbers for ROWS_SCANNED and ROWS_RETURNED shows that the Storage Engine is doing a lot less work. For the amount of data in Adventure Works the difference in query performance is negligible, but in the real world I’ve seen this optimisation make a massive difference to performance, resulting in queries running up to 15 times faster.

Don’t forget that there are many other ways of optimising many-to-many relationships such as the those described in this white paper. Also, if you have a large fact dimension, if it does not need to be visible to the end user and is only needed to make the many-to-many relationship work, you can reduce the overhead of processing it by breaking it up into multiple smaller dimensions as described here.

Written by Chris Webb

February 21, 2015 at 6:38 pm

If I Could Have New Features In SSAS Multidimensional, What Would They Be?

with 35 comments

Indulge me for a moment, please. Let’s imagine that somewhere in Microsoft, someone is planning for SQL Server v.next and is considering investing in new features for SSAS Multidimensional (don’t laugh – I wouldn’t be writing this post if I didn’t think it was a possibility). What features should they be?

Before I answer that question, it’s worth pointing out that despite what you might think there has been some investment in SSAS Multidimensional over the last few years. This post lists what was new in SSAS 2012 Multidimensional; since then support for DAX queries has been added and, umm, the new Divide() function. This must have been a lot of work for someone – but why does it get overlooked? One reason: none of these changes have made much difference to the ordinary SSAS Multidimensional developer’s life. DAX query support is great if you’re one of the few people that uses the SharePoint version of Power View; shockingly, it still doesn’t work in Excel 2013 Power View yet (though I guess it will be the way the new Power BI connects to on-prem Multidimensional). NUMA support is great if you work for an investment bank and have vast amounts of data and a high-spec server, but that’s only about 0.1% of the installed base.

So from this we can learn that the main consideration when choosing new features to implement should be that they should be relevant to the majority of SSAS Multidimensional developers, otherwise they’ll be ignored and MS may as well have not bothered doing anything. To that we can add these other considerations:

  • These features should provide compelling reasons to upgrade from earlier versions of SSAS to the new version
  • While some features should be available in all editions, there should also be some features that encourage customers to upgrade from Standard Edition to Enterprise Edition
  • There are a limited resources (time and developers) available and Power Pivot/SSAS Tabular will be the priority, so only a few features can be delivered.
  • Features that are only there to support Power BI don’t count

With all of that borne in mind, here’s what I would choose to implement based on what I see as a consultant and from the popularity of particular topics on my blog.

Last-Ever Non Empty

One of the most popular posts I’ve ever written – by a gigantic margin – is this one on the last-ever non-empty problem. Given that so many people seem to come up against this, and that the MDX solution is complex and still doesn’t perform brilliantly, I think it should be built into the engine as a new semi-additive aggregation type. Since semi-additive measures are Enterprise Edition only, this would be my sole Enterprise Edition feature.

MDX Calculation Parallelism

Ever since I’ve been working with SSAS, people have always asked why the Formula Engine has been single-threaded. I understand why the SSAS dev team have ignored this question and instead concentrated on tuning specific scenarios: doing parallelism properly would be extremely difficult given the way MDX calculations can be layered over each other, and in plenty of cases it could lead to worse performance, not better. However I’m not asking for a ‘proper’ implementation of parallelism. I just want something dumb: a boolean property that you can set on a calculation that tells the Formula Engine to do this calculation on a separate thread. If it makes performance better then great; if not, then don’t set it. My guess is that even a crude implementation like this could make a gigantic difference to performance on many calculation-heavy cubes.

Drillthrough

Drillthrough is one of those features that almost everyone wants to use, but for some reason has been left in a semi-broken state ever since 2005. Here’s what needs to change:

  • It should work with calculated members. I don’t expect SSAS to understand magically how to work out which rows to display for any given MDX calculation, but I would like a way of specifying in MDX what those rows should be.
  • Those stupid, ugly column names – SSDT should let us specify readable column names and let us have complete control over the order they appear in.
  • Excel should allow drillthrough on multiselect filters.

‘Between’ Relationships

This might seem a bit of a strange choice, and I suspect it may not be easy to implement, but another problem that I come across a lot in my consultancy is the ‘events-in-progress’ problem. I’ve blogged about solving it in MDX and DAX, as have many others. I would love to see a new ‘between’ dimension/measure group relationship type to solve this. In fact, competing OLAP vendor iccube already implemented this and you can see how it works on that platform here and here. My feeling is that this would open up a massive number of modelling opportunities, almost as many as many-to-many relationships.

 

And that’s it, four features that I think could make SSAS Multidimensional v.next a must-have upgrade. I’m not so naive to believe that any or all of these will be implemented, or even that we’ll get any new features at all, but who knows? If you have any other suggestions, please leave a comment.

Written by Chris Webb

February 2, 2015 at 11:02 pm

Deprecated/Discontinued Functionality In SSAS 2014

with one comment

Last week while reading Bill Anton’s blog (which is, by the way, highly recommended) I came across a link to a page in Books Online that I hadn’t seen before: a list of deprecated and discontinued functionality in SSAS 2014. Here it is:

https://msdn.microsoft.com/en-us/library/ms143479.aspx

The most interesting point is that the Non_Empty_Behavior property on calculations will not be supported in SSAS v.next. I still see this property being used a lot, and as I show here if you use it incorrectly it can give you bad results. Although I have seen a few cases where it has been necessary to set Non_Empty_Behavior (for example here) they have been very, very rare and I think deprecating it is the right decision. Other than that, remote partitions, linked dimensions and dimension writeback will also be no longer supported in a ‘future’ version, but I don’t think anyone will be too worried about those features.

Written by Chris Webb

January 27, 2015 at 9:30 am

Posted in Analysis Services

A Closer Look At Power Query/SSAS Integration

with 3 comments

In the November release of Power Query the most exciting new feature was the ability to connect to SSAS. I blogged about it at the time, but having used it for a month or so now I thought it was worth writing a more technical post showing how it works in more detail (since some things are not immediately obvious) as well as to see what the MDX it generates looks like.

This post was written using Power Query version 2.18.3874.242, released January 2015; some of the bugs and issues mentioned here will probably be fixed in later versions.

Connecting to SSAS

Power Query officially supports connecting to all versions of SSAS from 2008 onwards, although I’ve heard from a lot of people they have had problems getting the connection working. Certainly when I installed the version of Power Query with SSAS support in on my laptop, which has a full install of SQL Server 2014, it insisted I install the 2012 version of ADOMD.Net before it would work (I also needed to reboot). My guess is that if you’re having problems connecting you should try doing that too; ADOMD.Net 2012 is available to download in the SQL Server 2012 Feature Pack.

After clicking From Database/From SQL Server Analysis Services the following dialog appears, asking you to enter the name of the server you want to connect to.

image

If this is the first time you’re connecting to SSAS the following dialog will appear, asking you to confirm that you want to use Windows credentials to connect.

image

Unfortunately, if you’re connecting via http and need to enter a username and password you won’t be able to proceed any further. I expect this problem will be fixed soon.

Initial Selection

Once you’ve connected the Navigator pane appears on the right-hand side of the screen. Here you see all of the databases on the server you’ve connected to; expand a database and you see the cubes, and within each cube you see all of the measure groups, measures, dimensions and hierarchies.

image

The previous build of Power Query does not display any calculated measures that aren’t associated with measure groups (using the associated_measure_group property); this has been fixed in version 2.18.3874.242.

When you start to select measures and hierarchies the name of the cubes you have chosen items from will appear in the Selected items box. If you hover over the name of the cube the peek pane will appear and you’ll see a preview of the results of the query.

image

At this point you can either click the Load button to load the data either to the worksheet or the Excel Data Model, or click the Edit button to edit the query further.

You cannot specify your own MDX query to use for the query as yet.

The Query Editor

Once the Power Query Query Editor opens you’ll see the output of the query as it stands, and also on the Cube tab in the ribbon two new buttons: Add Items and Collapse Columns.

image

Here’s the MDX (captured from Profiler) showing the MDX generated for the query in the screenshot above:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
[Date].[Calendar Year].[Calendar Year].allmembers
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,50)
properties member_caption,member_unique_name 
on 1 
from [Adventure Works]

 

The MDX Subset() function is used here to ensure that the query doesn’t return more than 50 rows.

Adding Items

Clicking on the Add Items button allows you to add extra hierarchies and measures to the query. When you click the button the following dialog appears where you can choose what you want to add:

image

In this case I’ve added the Day Name hierarchy to the query, and this hierarchy appears as a new column on the right-hand edge of the query after the measures:

image

You can easily drag the column to wherever you want it though.

Here’s the MDX again:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
crossjoin(
[Date].[Calendar Year].[Calendar Year].allmembers,
[Date].[Day Name].[Day Name].allmembers)
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,50)
properties member_caption,member_unique_name 
on 1 
from [Adventure Works]

 

Collapsing Columns

Selecting the Day Name column and then clicking the Collapse Columns button simply rolls back to the previous state of the query. However, there’s more to this button than meets the eye. If you filter the Day Name column (for example, by selecting Saturday and Sunday as in the screenshot below) and then click Collapse and Remove, the filter will still be applied to the query even though the Day Name column is no longer visible.

image

Here’s what the Query Editor shows after the filter and after the Day Name column has been collapsed:

image

Compare the measure values with those shown in the original query – it’s now showing values only for Saturdays and Sundays, although that’s not really clear from the UI. Here’s the MDX generated to prove it – note the use of the subselect to do the filtering:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
[Date].[Calendar Year].[Calendar Year].allmembers
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,1000)
properties member_caption,member_unique_name 
on 1 
from(
select
({[Date].[Day Name].&[7],[Date].[Day Name].&[1]})
on 0 
from 
[Adventure Works])

 

From studying the MDX generated I can tell that certain other operations such as sorting and filtering the top n rows are folded back to SSAS.

It’s also important to realise that using the Remove option to remove a column from the query does not have the same effect as collapsing the column:

image

Using Remove just hides the column; the number of rows returned by the query remains the same.

image

User Hierarchies

In the examples above I’ve only used attribute hierarchies. User hierarchies aren’t much different – you can select either an individual level or the entire hierarchy (which is the same as selecting all of the levels of the hierarchy).

image

image

Parent-Child Hierarchies

Parent-child hierarchies work very much like user hierarchies, except that you will see some null values in columns to accommodate leaf members at different levels:

image

M Functions

There are a lot of M functions relating to cube functionality, although the documentation in the Library Specification document is fairly basic and all mention of them disappeared from the online help a month or so ago for some reason. Here’s the code for the query in the Collapsing Columns section above:

let
    Source = AnalysisServices.Databases("localhost"),
    #"Adventure Works DW 2008" = Source{[Name="Adventure Works DW 2008"]}[Data],
    #"Adventure Works1" = #"Adventure Works DW 2008"{[Id="Adventure Works"]}[Data],
    #"Adventure Works2" = #"Adventure Works1"{[Id="Adventure Works"]}[Data],
    #"Added Items" = Cube.Transform(#"Adventure Works2", {
             {Cube.AddAndExpandDimensionColumn, 
             "[Date]", {"[Date].[Calendar Year].[Calendar Year]"}, {"Date.Calendar Year"}}, 
             {Cube.AddMeasureColumn, "Internet Sales Amount", 
             "[Measures].[Internet Sales Amount]"}, 
             {Cube.AddMeasureColumn, "Internet Order Quantity", 
             "[Measures].[Internet Order Quantity]"}}),
    #"Added Items1" = Cube.Transform(#"Added Items", {
              {Cube.AddAndExpandDimensionColumn, "[Date]", 
             {"[Date].[Day Name].[Day Name]"}, {"Date.Day Name"}}}),
    #"Filtered Rows" = Table.SelectRows(#"Added Items1", each (
             Cube.AttributeMemberId([Date.Day Name]) = "[Date].[Day Name].&[7]" 
             meta [DisplayName = "Saturday"] 
             or 
             Cube.AttributeMemberId([Date.Day Name]) = "[Date].[Day Name].&[1]" 
             meta [DisplayName = "Sunday"])),
    #"Collapsed and Removed Columns" = Cube.CollapseAndRemoveColumns(
             #"Filtered Rows", 
             {"Date.Day Name"})
in
    #"Collapsed and Removed Columns"

It’s comprehensible but not exactly simple – yet another example of how difficult it is to shoe-horn multidimensional concepts into a tool that expects to work with relational data (see also SSRS). I doubt I’ll be writing any M code that uses these functions manually.

Written by Chris Webb

January 13, 2015 at 11:21 pm

News On SSAS Data Source Support In Power BI

with 14 comments

Yesterday we heard (again) that SSAS will be supported as a data source for cloud-based reports in Power BI. Today, in a session, two new important details on this emerged:

  • It will work with both Tabular and Multidimensional
  • It will connect as the user running the query, so SSAS security (eg dimension security) will work just the same as it does on-premises. No special setup will be needed; there were no details apart from the fact it will work using the EffectiveUserName connection string property.

I’m sure a lot of people will be interested to hear this…

Written by Chris Webb

November 6, 2014 at 10:41 pm

MDX Solve Order, SCOPE_ISOLATION and the Aggregate() function

with 8 comments

Solve order in MDX is a mess. Back in the good old days of Analysis Services 2000 it was a difficult concept but at least comprehensible; unfortunately when Analysis Services 2005 was released a well-intentioned attempt at making it easier to work with in fact ended up making things much, much worse. In this post I’m going to summarise everything I know about solve order in MDX to try to make this complicated topic a little bit easier to understand.

If you’re an experienced MDXer, at this point you’ll probably lose interest because you think you know everything there is to know about solve order already. Up until two weeks ago that’s what I though too, so even if you know everything I say in the first half of this post keep reading – there’s some new stuff at the end I’ve only just found out about.

Let’s start with a super-simple cube built from a single table, with two measures (Sales Amount and Cost Amount) and a Product dimension containing a single attribute hierarchy with two members (Apples and Oranges). Everything is built from the following table:

image

Solve Order and calculated members in the WITH clause

To understand what solve order is and how it can be manipulated, let’s start off looking at an example that uses only calculated members in the WITH clause of a query. Consider the following:

WITH

 

MEMBER [Measures].[Cost %] AS

DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),

FORMAT_STRING='0.0%'

 

MEMBER [Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

There are two calculated members here:

  • Cost % divides Cost Amount by Sales Amount to find the percentage that costs make up of the sales amount
  • Total Fruit sums up the values for Apples and Oranges

The output of the query is as follows:

image

Solve order controls the order that MDX calculations are evaluated when two or more of them overlap in the same cell. In this case Cost % and Total Fruit are both evaluated in the bottom right-hand cell; Total Fruit is calculated first, giving the values of 30 for Sales Amount and 21 for Cost Amount, and Cost % is calculated after that. The bottom right-hand cell is the only cell where these two calculations overlap and the only cell where solve order is relevant in this query.

In this case, 70% is the value you would expect to get. You, however, can control solve order for calculations in the WITH clause by setting the SOLVE_ORDER property for each calculated member, like so:

WITH

 

MEMBER [Measures].[Cost %] AS

DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),

FORMAT_STRING='0.0%',

SOLVE_ORDER=1

 

MEMBER [Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]}),

SOLVE_ORDER=2

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

image

Now the value in the bottom right-hand corner is 135% instead of 70%: Cost % is calculated first, then Total Fruit second so 60%+75%=135%. The SOLVE_ORDER property of a calculated member is an integer value, and the lower the SOLVE_ORDER value the earlier the calculation will be evaluated, so with Cost % having a solve order of 1 and Total Fruit having a solve order of 2, this forces Cost % to be calculated first now even though in this case it gives what is clearly an ‘incorrect’ result.

Solve Order and calculated members defined on the cube

Things now get a bit more complicated. There’s a different way of controlling solve order if your calculations are defined on the cube itself: in this case, solve order is determined by the order that the calculations appear on the Calculations tab. So if the calculations tab of the Cube Editor contains the calculations in this order:

CREATE MEMBER CURRENTCUBE.[Measures].[Cost %] AS

DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),

FORMAT_STRING='0.0%';

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

image

…and you run the following query:

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

You get the incorrect result again:

image

…but if you change the order of the calculations so that Total Fruit comes first:…and rerun the same query, you get the correct results:

image

image

The SOLVE_ORDER property can also be used with calculations defined on the cube to override the effect of the order of calculations. So defining the following calculations on the cube:

CREATE MEMBER CURRENTCUBE.MEASURES.[Cost %] AS

DIVIDE([Measures].[Cost Amount], [Measures].[Sales Amount]),

FORMAT_STRING='PERCENT', SOLVE_ORDER=2;

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]}), SOLVE_ORDER=1;

image

…means that, even though Total Fruit comes after Cost % on the Calculations tab, because it has a lower solve order set using the SOLVE_ORDER property it is evaluated before Cost % and the query still returns the correct value:

image

Solve order and calculations defined in the WITH clause and on the cube

What happens if some calculations are defined on the cube, and some are defined in the WITH clause of a query? By default, calculations defined on the cube always have a lower solve order than calculations defined in the WITH clause of a query; the SOLVE_ORDER property has no effect here. So if Total Fruit is defined in the WITH clause and Cost % on the cube, you get the incorrect result:

image

WITH

 

MEMBER [Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

 

image

Of course, if Total Fruit is defined on the cube and Cost % is defined in the WITH clause you will get the correct answer. However, usually measures like Cost % are defined on the cube and it’s calculations like Total Fruit, which define custom groupings, that are defined on an ad hoc basis in the WITH clause. This is a problem.

The SCOPE_ISOLATION property

This default behaviour of calculations defined on the cube always having a lower solve order than calculations in the WITH clause can be overridden using the SCOPE_ISOLATION property. Setting SCOPE_ISOLATION=CUBE for a calculated member defined in the WITH clause will give that calculated member a lower solve order than any calculations defined on the cube. So, with Cost % still defined on the cube the following query now gives the correct results:

WITH

 

MEMBER [Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]}),

SCOPE_ISOLATION=CUBE

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

image

The Aggregate() function

Using the MDX Aggregate() function (and in fact also the VisualTotals() function – but you probably won’t ever want to use it) inside a calculation has a similar effect to the SCOPE_ISOLATION property in that it forces a calculation to be evaluated at a lower solve order than anything else. Therefore, in the previous example, instead of using the SCOPE_ISOLATION property you can change the calculation to use the Aggregate() function instead of Sum() and get the correct results:

WITH

 

MEMBER [Product].[Product].[Total Fruit] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

image

The general rule is, therefore, whenever you are creating custom-grouping type calculated members like Total Fruit in the WITH clause of a query, to use the Aggregate() function rather than Sum(). The fact that Aggregate() takes into account the AggregateFunction property of each measure on the cube (so that distinct count, min and max measures are dealt with correctly) is another good reason to use it.

Using the Aggregate() function in calculations defined on the cube has the same effect. Even when the Total Fruit calculated member is defined after Cost % on the Calculations tab, as here:

image

…so long as Total Fruit uses the Aggregate() function, running the test query gives the correct result:

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

image

 

There are some very interesting details about the way Aggregate() changes solve order though.

First of all, using the Aggregate() function in a calculated member doesn’t change the solve order of the whole calculation, just the part of the calculation that uses the Aggregate() function. With the following calculations defined on the cube:

CREATE MEMBER CURRENTCUBE.[Measures].[Cost %] AS

DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),

FORMAT_STRING='0.0%';

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Aggregate] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Sum] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[Two Aggregates] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

+

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[Two Sums] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

+

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Aggregate One Sum] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

+

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

…running the following query:

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[One Aggregate],

[Product].[Product].[One Sum],

[Product].[Product].[Two Aggregates],

[Product].[Product].[Two Sums],

[Product].[Product].[One Aggregate One Sum]}

ON ROWS

FROM SALES

…gives these results:

image

The value returned for the calculation [One Aggregate One Sum], which contains an Aggregate() and a Sum(), shows that the value returned by the Aggregate() is evaluated at a different solve order than the value returned by Sum(), even if they are inside the same calculated member.

Furthermore, in some very obscure cases the contents of the set passed to the Aggregate() function determine whether its special solve order behaviour happens or not. I don’t know for sure what all those cases are but I have seen this happen with time utility (aka date tool aka shell) dimensions. Here’s an example.

The demo cube I’ve been using in this post has been changed to add a new dimension, called Data Type, which has just one hierarchy with one member on it called Actuals; Data Type is a fairly standard time utility dimension. The Cost % calculation has also been changed so that it’s now a calculated member on the Data Type dimension, although it is still defined on the cube. Here’s its new definition:

CREATE MEMBER CURRENTCUBE.[Data Type].[Data Type].[Cost %] AS

DIVIDE(

([Measures].[Cost Amount],[Data Type].[Data Type].&[Actuals]),

([Measures].[Sales Amount],[Data Type].[Data Type].&[Actuals])),

FORMAT_STRING='0.0%';

Now if I run the following query:

WITH

 

MEMBER [Product].[Product].[Simple Set] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

 

MEMBER [Product].[Product].[Nextmember Function Used] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Apples].NEXTMEMBER})

 

MEMBER [Product].[Product].[Descendants Function Used] AS

AGGREGATE(DESCENDANTS({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]}))

 

MEMBER [Product].[Product].[Descendants Function Used Twice] AS

AGGREGATE({

DESCENDANTS([Product].[Product].&[Apples]),

DESCENDANTS([Product].[Product].&[Oranges])

})

 

MEMBER [Product].[Product].[Descendants Function Used Twice With Union] AS

AGGREGATE(

UNION(

DESCENDANTS([Product].[Product].&[Apples]),

DESCENDANTS([Product].[Product].&[Oranges])

))

 

SELECT

{[Measures].[Sales Amount]}

*

[Data Type].[Data Type].ALLMEMBERS

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Simple Set],

[Product].[Product].[Nextmember Function Used],

[Product].[Product].[Descendants Function Used],

[Product].[Product].[Descendants Function Used Twice],

[Product].[Product].[Descendants Function Used Twice With Union]}

ON ROWS

FROM [Sales With Data Type]

I get these results:

image

Note that for some of the calculations, the Aggregate() function results in a lower solve order in the way we’ve already seen, but not for all of them. Using the NextMember() function, or having two Descendants() functions without wrapping them in a Union() function, seems to stop SSAS assigning the calculation a lower solve order. Ugh. Luckily, though, I have only been able to replicate this with calculated members from two non-measures dimensions; if Cost % is a calculated measure Aggregate() always gives the lower solve order. Apparently this is something that SSAS does on purpose to try to recognise ‘visual total’-like calculated members and make them work the way you want automatically. This is definitely something to beware of if you are using time utility dimensions and calculations on other dimensions though, as it may result in incorrect values being displayed or performance problems if you’re not careful.

[Thanks to Gabi Münster for showing me how Aggregate() works with different sets and Marius Dumitru for confirming that this is intended behaviour]

Written by Chris Webb

October 14, 2014 at 2:30 pm

The DirectSlice Property And Aggregations

with 2 comments

A bit of an obscure one, this, but useful to know nonetheless…

You may or may not be aware of the DirectSlice property, which can be set on a many-to-many relationship in the Dimension Usage pane in Visual Studio:

image

One of the few good descriptions of it comes from this old white paper:

http://technet.microsoft.com/en-us/library/ms345139(v=sql.90).aspx

I quote:

When a measure expression is defined, it may be expensive to compute the result for the all member of the Many-to-Many dimension if the product needs to be computed across an enormous number of rows. And in examples like this, the value for the all member is the simple sum of the fact table. There is one more property on a many-to-many dimension that accommodates this, the Direct Slice. This is a tuple on the Many-to-Many dimension, where the server is instructed to not compute the measure expression except where the cube author guarantees that the aggregate value of the facts matches the result of this expression. For example, in a scenario using measure expression for currency conversion where the sales facts are stored in US Dollars, Currency.USD would be the DirectSlice in the Currency dimension.

Basically, what it does is let you tell SSAS to not evaluate the m2m relationship if a specific selection has been made and if doing so does not change the output of your query. Doing this could improve query performance if evaluating the m2m relationship is expensive – which it often is.

Here’s an example. The screenshot above shows a simplified example based on the Adventure Works cube containing two measure groups, Currency Rate Facts and Internet Sales Facts, and three dimensions, Source Currency, Destination Currency and Date. Destination Currency has a m2m relationship with the Internet Sales Facts measure group that is resolved through the Currency Rate Facts measure group; it also has IsAggregatable=False set and the US Dollar member is the default member.

Now, if you run a query to get data for a measure by year, like this:

select

{[Measures].[Sales Amount]} on 0,

[Date].[Calendar].[Calendar Year].members on 1

from 

[DirectSliceTest]

…and look in Profiler, you’ll see that both the measure groups are being queried by the Storage Engine – as you would expect.

image

What’s more, if you build an aggregation on the Internet Sales Facts measure group at the correct granularity, you’ll see this aggregation being used by the query:

image

Incidentally, there’s a widely-repeated myth that you can’t use aggregations with m2m relationships or measure expressions – this is not true, although you do have to understand the granularity of data that the Storage Engine is querying and building your aggregation in the right place.

Now it’s time to set the DirectSlice property. In this case I set it to a tuple containing US Dollars, which is as I said earlier the default member on the Destination Currency dimension:

([Destination Currency].[Destination Currency].&[US Dollar])

Remember that by setting this property, I’m telling SSAS that it doesn’t need to evaluate the m2m relationship when this member is selected on Destination Currency because the results of any queries will be identical.

If you rerun the query then there are two things to notice in Profiler:

image

First of all, the Currency Rates Facts measure group is no longer being queried, and so the m2m relationship is not being evaluated. This is good. Second, the aggregation seen above is no longer being used. In fact based on my research there’s no way to get an aggregation to be used when the DirectSlice property is set and the m2m relationship is being bypassed (and I’ve tried several variations, even building aggregations that include attributes from the Destination Currency dimension). This is not good. However as soon as you select something other than what is specified in the DirectSlice property, and the m2m relationship is being evaluated, aggregations can be used again.

In summary, then, using the DirectSlice property is a good idea if you have an expensive m2m relationship and you don’t expect your queries to use aggregations. However if you do need to use aggregations to improve query performance you probably should not set the DirectSlice property.

[Thanks to Akshai Mirchandani for helping with this]

Written by Chris Webb

October 6, 2014 at 4:34 pm

Posted in Analysis Services

Follow

Get every new post delivered to your Inbox.

Join 3,714 other followers