Archive for the ‘Multidimensional’ Category
Optimising SSAS ManyToMany Relationships By Adding Redundant Dimensions
The most elegant way of modelling your SSAS cube doesn’t always give you the best query performance. Here’s a trick I used recently to improve the performance of a manytomany relationship going through a large fact dimension and large intermediate measure group…
Consider the following cube, built from the Adventure Works DW database and showing a manytomany relationship:
The Fact Internet Sales measure group contains sales data; the Product, Date and Customer dimensions are what you would expect; Sales Order is a fact dimension with one member for each sales transaction and therefore one member for each row in the fact table that Fact Internet Sales is built from. Each Sales Order can be associated with zero to many Sales Reasons, and the Sales Reason dimension has a manytomany relationship with the Fact Internet Sales measure group through the Fact Internet Sales Reason measure group. Only the Sales Order dimension connects directly to both the Fact Internet Sales Reason and Fact Internet Sales measure groups.
There’s nothing obviously wrong with the way this is modelled – it works and returns the correct figures – and the following query shows how the presence of the manytomany relationship means you can see the Sales Amount measure (from the Fact Internet Sales measure group) broken down by Sales Reason:
select {[Measures].[Sales Amount]} on 0, non empty [Sales Reason].[Sales Reason].[Sales Reason].members on 1 from m2m1 where([Date].[Calendar Year].&[2003], [Product].[Product Category].&[3], [Customer].[Country].&[United Kingdom])
However, to understand how we can improve the performance of a manytomany relationship you have to understand how SSAS resolves the query internally. At a very basic level, in this query, SSAS starts with all of the Sales Reasons and then, for each one, finds the list of Sales Orders associated with it by querying the Fact Sales Reason measure group. Once it has the list of Sales Orders for each Sales Reason, it queries the Fact Internet Sales measure group (which is also filtered by the Year 2003, the Product Category Clothing and the Customer Country UK) and sums up the value of Sales Amount for those Sales Orders, getting a single value for each Sales Reason. A Profiler trace shows this very clearly:
The Resource Usage event gives the following statistics for this query:
READS, 7
READ_KB, 411
WRITES, 0
WRITE_KB, 0
CPU_TIME_MS, 15
ROWS_SCANNED, 87299
ROWS_RETURNED, 129466
Given that the Sales Order dimension is a large one (in this case around 60000 members – and large fact dimensions are quite common with manytomany relationships) it’s likely that one Sales Reason will be associated with thousands of Sales Orders, and therefore SSAS will have to do a lot of work to resolve the relationship.
In this case, the optimisation comes with the realisation that in this case we can add the other dimensions present in the cube to the Fact Sales Reason measure group to try to reduce the number of Sales Orders that each Sales Reason is resolved to. Since Sales Order is a fact dimension, with one member for each sales transaction, then since each sales transaction also has a Date, a Product and a Customer associated with it we can add the keys for these dimensions to the fact table on which Fact Sales Reasons is built and join these dimensions to it directly:
This is not an assumption you can make for all manytomany relationships, for sure, but it’s certainly true for a significant proportion.
The Product, Date and Customer dimensions don’t need to be present for the manytomany relationship to work, but adding a Regular relationship between them and Fact Internet Sales Reason helps SSAS speed up the resolution of the manytomany relationship when they are used in a query. This is because in the original design, in the test query the selection of a single member on Sales Reason becomes a selection on all of the Sales Orders that have ever been associated with that Sales Reason; with the new design, the selection of a single member on Sales Reason becomes a selection on a combination of Dates, Customers, Products and Sales Orders – and since the query itself is also applying a slice on Date, Customer and Product, this is a much smaller selection than before. For the query shown above, with the new design, the Resource Usage event now shows:
READS, 11
READ_KB, 394
WRITES, 0
WRITE_KB, 0
CPU_TIME_MS, 0
ROWS_SCANNED, 47872
ROWS_RETURNED, 1418
The much lower numbers for ROWS_SCANNED and ROWS_RETURNED shows that the Storage Engine is doing a lot less work. For the amount of data in Adventure Works the difference in query performance is negligible, but in the real world I’ve seen this optimisation make a massive difference to performance, resulting in queries running up to 15 times faster.
Don’t forget that there are many other ways of optimising manytomany relationships such as the those described in this white paper. Also, if you have a large fact dimension, if it does not need to be visible to the end user and is only needed to make the manytomany relationship work, you can reduce the overhead of processing it by breaking it up into multiple smaller dimensions as described here.
If I Could Have New Features In SSAS Multidimensional, What Would They Be?
Indulge me for a moment, please. Let’s imagine that somewhere in Microsoft, someone is planning for SQL Server v.next and is considering investing in new features for SSAS Multidimensional (don’t laugh – I wouldn’t be writing this post if I didn’t think it was a possibility). What features should they be?
Before I answer that question, it’s worth pointing out that despite what you might think there has been some investment in SSAS Multidimensional over the last few years. This post lists what was new in SSAS 2012 Multidimensional; since then support for DAX queries has been added and, umm, the new Divide() function. This must have been a lot of work for someone – but why does it get overlooked? One reason: none of these changes have made much difference to the ordinary SSAS Multidimensional developer’s life. DAX query support is great if you’re one of the few people that uses the SharePoint version of Power View; shockingly, it still doesn’t work in Excel 2013 Power View yet (though I guess it will be the way the new Power BI connects to onprem Multidimensional). NUMA support is great if you work for an investment bank and have vast amounts of data and a highspec server, but that’s only about 0.1% of the installed base.
So from this we can learn that the main consideration when choosing new features to implement should be that they should be relevant to the majority of SSAS Multidimensional developers, otherwise they’ll be ignored and MS may as well have not bothered doing anything. To that we can add these other considerations:
 These features should provide compelling reasons to upgrade from earlier versions of SSAS to the new version
 While some features should be available in all editions, there should also be some features that encourage customers to upgrade from Standard Edition to Enterprise Edition
 There are a limited resources (time and developers) available and Power Pivot/SSAS Tabular will be the priority, so only a few features can be delivered.
 Features that are only there to support Power BI don’t count
With all of that borne in mind, here’s what I would choose to implement based on what I see as a consultant and from the popularity of particular topics on my blog.
LastEver Non Empty
One of the most popular posts I’ve ever written – by a gigantic margin – is this one on the lastever nonempty problem. Given that so many people seem to come up against this, and that the MDX solution is complex and still doesn’t perform brilliantly, I think it should be built into the engine as a new semiadditive aggregation type. Since semiadditive measures are Enterprise Edition only, this would be my sole Enterprise Edition feature.
MDX Calculation Parallelism
Ever since I’ve been working with SSAS, people have always asked why the Formula Engine has been singlethreaded. I understand why the SSAS dev team have ignored this question and instead concentrated on tuning specific scenarios: doing parallelism properly would be extremely difficult given the way MDX calculations can be layered over each other, and in plenty of cases it could lead to worse performance, not better. However I’m not asking for a ‘proper’ implementation of parallelism. I just want something dumb: a boolean property that you can set on a calculation that tells the Formula Engine to do this calculation on a separate thread. If it makes performance better then great; if not, then don’t set it. My guess is that even a crude implementation like this could make a gigantic difference to performance on many calculationheavy cubes.
Drillthrough
Drillthrough is one of those features that almost everyone wants to use, but for some reason has been left in a semibroken state ever since 2005. Here’s what needs to change:
 It should work with calculated members. I don’t expect SSAS to understand magically how to work out which rows to display for any given MDX calculation, but I would like a way of specifying in MDX what those rows should be.
 Those stupid, ugly column names – SSDT should let us specify readable column names and let us have complete control over the order they appear in.
 Excel should allow drillthrough on multiselect filters.
‘Between’ Relationships
This might seem a bit of a strange choice, and I suspect it may not be easy to implement, but another problem that I come across a lot in my consultancy is the ‘eventsinprogress’ problem. I’ve blogged about solving it in MDX and DAX, as have many others. I would love to see a new ‘between’ dimension/measure group relationship type to solve this. In fact, competing OLAP vendor iccube already implemented this and you can see how it works on that platform here and here. My feeling is that this would open up a massive number of modelling opportunities, almost as many as manytomany relationships.
And that’s it, four features that I think could make SSAS Multidimensional v.next a musthave upgrade. I’m not so naive to believe that any or all of these will be implemented, or even that we’ll get any new features at all, but who knows? If you have any other suggestions, please leave a comment.
MDX Solve Order, SCOPE_ISOLATION and the Aggregate() function
Solve order in MDX is a mess. Back in the good old days of Analysis Services 2000 it was a difficult concept but at least comprehensible; unfortunately when Analysis Services 2005 was released a wellintentioned attempt at making it easier to work with in fact ended up making things much, much worse. In this post I’m going to summarise everything I know about solve order in MDX to try to make this complicated topic a little bit easier to understand.
If you’re an experienced MDXer, at this point you’ll probably lose interest because you think you know everything there is to know about solve order already. Up until two weeks ago that’s what I though too, so even if you know everything I say in the first half of this post keep reading – there’s some new stuff at the end I’ve only just found out about.
Let’s start with a supersimple cube built from a single table, with two measures (Sales Amount and Cost Amount) and a Product dimension containing a single attribute hierarchy with two members (Apples and Oranges). Everything is built from the following table:
Solve Order and calculated members in the WITH clause
To understand what solve order is and how it can be manipulated, let’s start off looking at an example that uses only calculated members in the WITH clause of a query. Consider the following:
WITH
MEMBER [Measures].[Cost %] AS
DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),
FORMAT_STRING='0.0%'
MEMBER [Product].[Product].[Total Fruit] AS
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]})
SELECT
{[Measures].[Sales Amount],
[Measures].[Cost Amount],
MEASURES.[Cost %]}
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[Total Fruit]}
ON ROWS
FROM SALES
There are two calculated members here:
 Cost % divides Cost Amount by Sales Amount to find the percentage that costs make up of the sales amount
 Total Fruit sums up the values for Apples and Oranges
The output of the query is as follows:
Solve order controls the order that MDX calculations are evaluated when two or more of them overlap in the same cell. In this case Cost % and Total Fruit are both evaluated in the bottom righthand cell; Total Fruit is calculated first, giving the values of 30 for Sales Amount and 21 for Cost Amount, and Cost % is calculated after that. The bottom righthand cell is the only cell where these two calculations overlap and the only cell where solve order is relevant in this query.
In this case, 70% is the value you would expect to get. You, however, can control solve order for calculations in the WITH clause by setting the SOLVE_ORDER property for each calculated member, like so:
WITH
MEMBER [Measures].[Cost %] AS
DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),
FORMAT_STRING='0.0%',
SOLVE_ORDER=1
MEMBER [Product].[Product].[Total Fruit] AS
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]}),
SOLVE_ORDER=2
SELECT
{[Measures].[Sales Amount],
[Measures].[Cost Amount],
MEASURES.[Cost %]}
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[Total Fruit]}
ON ROWS
FROM SALES
Now the value in the bottom righthand corner is 135% instead of 70%: Cost % is calculated first, then Total Fruit second so 60%+75%=135%. The SOLVE_ORDER property of a calculated member is an integer value, and the lower the SOLVE_ORDER value the earlier the calculation will be evaluated, so with Cost % having a solve order of 1 and Total Fruit having a solve order of 2, this forces Cost % to be calculated first now even though in this case it gives what is clearly an ‘incorrect’ result.
Solve Order and calculated members defined on the cube
Things now get a bit more complicated. There’s a different way of controlling solve order if your calculations are defined on the cube itself: in this case, solve order is determined by the order that the calculations appear on the Calculations tab. So if the calculations tab of the Cube Editor contains the calculations in this order:
CREATE MEMBER CURRENTCUBE.[Measures].[Cost %] AS
DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),
FORMAT_STRING='0.0%';
CREATE MEMBER CURRENTCUBE.[Product].[Product].[Total Fruit] AS
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]});
…and you run the following query:
SELECT
{[Measures].[Sales Amount],
[Measures].[Cost Amount],
MEASURES.[Cost %]}
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[Total Fruit]}
ON ROWS
FROM SALES
You get the incorrect result again:
…but if you change the order of the calculations so that Total Fruit comes first:…and rerun the same query, you get the correct results:
The SOLVE_ORDER property can also be used with calculations defined on the cube to override the effect of the order of calculations. So defining the following calculations on the cube:
CREATE MEMBER CURRENTCUBE.MEASURES.[Cost %] AS
DIVIDE([Measures].[Cost Amount], [Measures].[Sales Amount]),
FORMAT_STRING='PERCENT', SOLVE_ORDER=2;
CREATE MEMBER CURRENTCUBE.[Product].[Product].[Total Fruit] AS
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]}), SOLVE_ORDER=1;
…means that, even though Total Fruit comes after Cost % on the Calculations tab, because it has a lower solve order set using the SOLVE_ORDER property it is evaluated before Cost % and the query still returns the correct value:
Solve order and calculations defined in the WITH clause and on the cube
What happens if some calculations are defined on the cube, and some are defined in the WITH clause of a query? By default, calculations defined on the cube always have a lower solve order than calculations defined in the WITH clause of a query; the SOLVE_ORDER property has no effect here. So if Total Fruit is defined in the WITH clause and Cost % on the cube, you get the incorrect result:
WITH
MEMBER [Product].[Product].[Total Fruit] AS
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]})
SELECT
{[Measures].[Sales Amount],
[Measures].[Cost Amount],
MEASURES.[Cost %]}
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[Total Fruit]}
ON ROWS
FROM SALES
Of course, if Total Fruit is defined on the cube and Cost % is defined in the WITH clause you will get the correct answer. However, usually measures like Cost % are defined on the cube and it’s calculations like Total Fruit, which define custom groupings, that are defined on an ad hoc basis in the WITH clause. This is a problem.
The SCOPE_ISOLATION property
This default behaviour of calculations defined on the cube always having a lower solve order than calculations in the WITH clause can be overridden using the SCOPE_ISOLATION property. Setting SCOPE_ISOLATION=CUBE for a calculated member defined in the WITH clause will give that calculated member a lower solve order than any calculations defined on the cube. So, with Cost % still defined on the cube the following query now gives the correct results:
WITH
MEMBER [Product].[Product].[Total Fruit] AS
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]}),
SCOPE_ISOLATION=CUBE
SELECT
{[Measures].[Sales Amount],
[Measures].[Cost Amount],
MEASURES.[Cost %]}
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[Total Fruit]}
ON ROWS
FROM SALES
The Aggregate() function
Using the MDX Aggregate() function (and in fact also the VisualTotals() function – but you probably won’t ever want to use it) inside a calculation has a similar effect to the SCOPE_ISOLATION property in that it forces a calculation to be evaluated at a lower solve order than anything else. Therefore, in the previous example, instead of using the SCOPE_ISOLATION property you can change the calculation to use the Aggregate() function instead of Sum() and get the correct results:
WITH
MEMBER [Product].[Product].[Total Fruit] AS
AGGREGATE({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]})
SELECT
{[Measures].[Sales Amount],
[Measures].[Cost Amount],
MEASURES.[Cost %]}
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[Total Fruit]}
ON ROWS
FROM SALES
The general rule is, therefore, whenever you are creating customgrouping type calculated members like Total Fruit in the WITH clause of a query, to use the Aggregate() function rather than Sum(). The fact that Aggregate() takes into account the AggregateFunction property of each measure on the cube (so that distinct count, min and max measures are dealt with correctly) is another good reason to use it.
Using the Aggregate() function in calculations defined on the cube has the same effect. Even when the Total Fruit calculated member is defined after Cost % on the Calculations tab, as here:
…so long as Total Fruit uses the Aggregate() function, running the test query gives the correct result:
SELECT
{[Measures].[Sales Amount],
[Measures].[Cost Amount],
MEASURES.[Cost %]}
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[Total Fruit]}
ON ROWS
FROM SALES
There are some very interesting details about the way Aggregate() changes solve order though.
First of all, using the Aggregate() function in a calculated member doesn’t change the solve order of the whole calculation, just the part of the calculation that uses the Aggregate() function. With the following calculations defined on the cube:
CREATE MEMBER CURRENTCUBE.[Measures].[Cost %] AS
DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),
FORMAT_STRING='0.0%';
CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Aggregate] AS
AGGREGATE({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]});
CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Sum] AS
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]});
CREATE MEMBER CURRENTCUBE.[Product].[Product].[Two Aggregates] AS
AGGREGATE({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]})
+
AGGREGATE({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]});
CREATE MEMBER CURRENTCUBE.[Product].[Product].[Two Sums] AS
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]})
+
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]});
CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Aggregate One Sum] AS
AGGREGATE({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]})
+
SUM({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]});
…running the following query:
SELECT
{[Measures].[Sales Amount],
[Measures].[Cost Amount],
MEASURES.[Cost %]}
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[One Aggregate],
[Product].[Product].[One Sum],
[Product].[Product].[Two Aggregates],
[Product].[Product].[Two Sums],
[Product].[Product].[One Aggregate One Sum]}
ON ROWS
FROM SALES
…gives these results:
The value returned for the calculation [One Aggregate One Sum], which contains an Aggregate() and a Sum(), shows that the value returned by the Aggregate() is evaluated at a different solve order than the value returned by Sum(), even if they are inside the same calculated member.
Furthermore, in some very obscure cases the contents of the set passed to the Aggregate() function determine whether its special solve order behaviour happens or not. I don’t know for sure what all those cases are but I have seen this happen with time utility (aka date tool aka shell) dimensions. Here’s an example.
The demo cube I’ve been using in this post has been changed to add a new dimension, called Data Type, which has just one hierarchy with one member on it called Actuals; Data Type is a fairly standard time utility dimension. The Cost % calculation has also been changed so that it’s now a calculated member on the Data Type dimension, although it is still defined on the cube. Here’s its new definition:
CREATE MEMBER CURRENTCUBE.[Data Type].[Data Type].[Cost %] AS
DIVIDE(
([Measures].[Cost Amount],[Data Type].[Data Type].&[Actuals]),
([Measures].[Sales Amount],[Data Type].[Data Type].&[Actuals])),
FORMAT_STRING='0.0%';
Now if I run the following query:
WITH
MEMBER [Product].[Product].[Simple Set] AS
AGGREGATE({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]})
MEMBER [Product].[Product].[Nextmember Function Used] AS
AGGREGATE({[Product].[Product].&[Apples],
[Product].[Product].&[Apples].NEXTMEMBER})
MEMBER [Product].[Product].[Descendants Function Used] AS
AGGREGATE(DESCENDANTS({[Product].[Product].&[Apples],
[Product].[Product].&[Oranges]}))
MEMBER [Product].[Product].[Descendants Function Used Twice] AS
AGGREGATE({
DESCENDANTS([Product].[Product].&[Apples]),
DESCENDANTS([Product].[Product].&[Oranges])
})
MEMBER [Product].[Product].[Descendants Function Used Twice With Union] AS
AGGREGATE(
UNION(
DESCENDANTS([Product].[Product].&[Apples]),
DESCENDANTS([Product].[Product].&[Oranges])
))
SELECT
{[Measures].[Sales Amount]}
*
[Data Type].[Data Type].ALLMEMBERS
ON COLUMNS,
{[Product].[Product].&[Apples],
[Product].[Product].&[Oranges],
[Product].[Product].[Simple Set],
[Product].[Product].[Nextmember Function Used],
[Product].[Product].[Descendants Function Used],
[Product].[Product].[Descendants Function Used Twice],
[Product].[Product].[Descendants Function Used Twice With Union]}
ON ROWS
FROM [Sales With Data Type]
I get these results:
Note that for some of the calculations, the Aggregate() function results in a lower solve order in the way we’ve already seen, but not for all of them. Using the NextMember() function, or having two Descendants() functions without wrapping them in a Union() function, seems to stop SSAS assigning the calculation a lower solve order. Ugh. Luckily, though, I have only been able to replicate this with calculated members from two nonmeasures dimensions; if Cost % is a calculated measure Aggregate() always gives the lower solve order. Apparently this is something that SSAS does on purpose to try to recognise ‘visual total’like calculated members and make them work the way you want automatically. This is definitely something to beware of if you are using time utility dimensions and calculations on other dimensions though, as it may result in incorrect values being displayed or performance problems if you’re not careful.
[Thanks to Gabi Münster for showing me how Aggregate() works with different sets and Marius Dumitru for confirming that this is intended behaviour]
Calculating The Value Of Overdue Invoices Using ManyToMany Relationships in SSAS Multidimensional
Recently I had to solve the same, rather complex, problem for three different customers within the space of two weeks, a variation of the ‘eventsinprogress’ problem I’ve blogged about a lot in the past. It’s this one: how can you calculate the value of your overdue invoices on any given date? It’s important to stress that we are not talking about the invoices that are overdue today – we want to be able to pick any date in the past and find out the value of invoices that were overdue at that point in time.
Let’s imagine you have a fact table containing payments against invoices: each row has an invoice number, the date of the payment, the payment amount, and the date that the invoice is due. A positive value in the Amount column indicates that this is the opening amount of the invoice; negative values in the Amount column are payments against the invoice. Multiple payments can be made against an invoice before it is fully closed. Here’s some example data:
Looking at the rows highlighted for invoice 5, you can see in the upper box that there is a positive amount of £35 shown on January 1st 2014 – this is the date that the invoice was opened, and £35 is the full value of the invoice. The invoice is due on January 10th 2014. In the lower box you can see there were four separate payments of £5, £5, £10 and £15 before the invoice was fully paid off on January 12th 2014.
Now, if you were to do a totaltodate in MDX (similar to what I describe here) it would be pretty easy to calculate the outstanding amount on all invoices on any given date. However the requirement here is not only to do that, but to break the value down so that you can see what the total value for overdue invoices and nonoverdue (which I’m going to call backlog in this post) invoices is on any given date. This is tricky to do because we have to take the Due Date of each invoice into account as well as the Payment Date. The desired output for invoice 5 is this:
As you can see in this PivotTable, for invoice 5 £25 was still outstanding on January 9th 2014, but this is shown as backlog because this is before the due date of January 10th. On January 11th, one day after the due date, the remaining outstanding amount of £15 is shown as overdue. Of course, we also want to be able to calculate the correct values for all invoices:
One way of solving this problem would be to calculate the overdue and backlog values for each date that each invoice is open in your ETL, and store these values in a snapshot fact table. This works, and will give you the best possible query performance, but it has two major drawbacks: first, it makes your ETL much slower, and second it makes your fact table much larger. This post describes how you can calculate the overdue and nonoverdue values on any given date using manytomany relationships instead, without having to blow out the fact table.
To go along with the fact table (called FactInvoice) shown above, I have a date dimension table called DimDate (I have deliberately reduced the number of rows here to the dates I have data for, for reasons that will become clear soon):
I have a dimension table containing all of the days that an invoice can be overdue or not overdue for (which has one less than double the number of rows as the date dimension table), plus a second column classifying each row as ‘Backlog’ or ‘Overdue’:
… and an invoice dimension table that just contains the distinct invoice numbers called DimInvoice.
Now, let me explain how to build the SSAS cube.
Step 1
 Build dimensions from all of the tables shown above, so you have dimensions called Date, Overdue Days and Invoice.
 Build a cube with one measure group, based on FactInvoice, and create one measure with AggregateFunction Sum based on the Amount column in that table.
 Add the Invoice dimension to the cube with a regular relationship. Add the Date dimension to the cube twice, as Payment Date and Due Date, with regular relationships on the PaymentDate and DueDate columns. The Dimension Usage tab should look like this:
Step 2
 Create the following view in SQL Server:
 Add this view to the DSV and create a new measure group from it; you’ll need to create a measure here, but it can just be a Count measure.
 Add the Date dimension to the cube once again, this time leaving the name as Date (you will now have three roleplaying copies of the Date dimension in the cube).
 Set up regular relationships between the Date and Payment Date dimensions and the new measure group, then a manytomany relationship between Date and the Invoice measure group. This means that when you query the cube by the Date dimension, the manytomany relationship will mean you see the sum of all Amounts whose payment date is up to and including the date selected. The Dimension Usage tab will look like this:
CREATE VIEW [dbo].[FactDateToPaymentDate]
AS
SELECT a.DateKey, b.DateKey AS PaymentDateKey
FROM dbo.DimDate AS a INNER JOIN
dbo.DimDate AS b ON a.DateKey >= b.DateKey
This view returns all of the combinations of a given date and all dates up to and including the date.
Step 4
 Create another view in SQL Server with this definition:
 Add this view to the DSV and build another measure group from it, again with a single Count measure on it.
 Set up a regular relationship between this new measure group and the Due Date dimension.
 Add the Overdue Days dimension to the cube, give it a regular relationship with the new measure group and a manytomany relationship with the Invoice measure group through the new measure group.
 Add the Date dimension to the cube yet again, creating a new roleplaying dimension called As Of Date, give it a regular relationship with the new measure group and a manytomany relationship with the Invoice measure group through the new measure group. The Dimension Usage tab will now look like this:
CREATE VIEW [dbo].[FactDueDateToAsOfDate]
AS
SELECT a.DateKey AS DueDateKey, b.DateKey AS AsOfDateKey, DATEDIFF(dd, a.FullDate, b.FullDate) AS OverDueDays
FROM dbo.DimDate AS a CROSS JOIN
dbo.DimDate AS b
Yes, I am cross joining the DimDate table with itself and yes, this could return a lot of rows. However you should find that the view is very fast to execute.
The purpose of this measure group is this: if you select a date on the As Of Date dimension, you will be able to select ‘Overdue’ on the Overdue Days dimension and this will give you all of the dates on Due Date that were overdue on that date.
Step 5
 You only want to have to select one date in your PivotTable, so create the following MDX calculated measures that take your selection on the Date dimension and applies it to the As Of Date dimension too:
CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount Hidden] AS
([Measures].[Amount],
LINKMEMBER([Date].[Date].CURRENTMEMBER, [As Of Date].[Date])
), VISIBLE=FALSE;
CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount] AS
IIF(MEASURES.[Open Amount Hidden]=0, NULL, MEASURES.[Open Amount Hidden]);
 Hide all measures apart from the calculated measures you’ve just created
 Hide the Payment Date, Due Date and As Of Date dimensions
Conclusion
This is a very complex pattern, I know, and this is after I’ve simplified it a lot (if you need currency conversion as well then things get even worse) but I also know it’s extremely useful from a business point of view. Query performance is also reasonably good, at least in the places where I have implemented this.
You can download my sample SQL Server database and VS 2012 project here.
Optimising MDX Calculations With The Unorder() Function
The Unorder() function is probably one of the least used functions in the whole of MDX. It exists only as a query performance hint and, since I had never up to now found a scenario where it did improve the performance of a calculation I had pretty much forgotten about it (as Books Online says, the optimisation it performs is applied automatically in many cases). However I was playing around with some calculations last week and found out that it does have its uses…
What does the Unorder() function do? It’s a function that takes a set and returns a set, and what it does is remove any implicit ordering from that set. By default all sets in MDX are ordered, but for some types of operation that ordering is unimportant and ignoring it can result in faster query performance.
Take, for example, the following query on the Adventure Works cube which shows the number of customers who have bought something up to the current date:
WITH
MEMBER MEASURES.CUSTOMERSTODATE AS
COUNT(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
, {[Measures].[Internet Sales Amount]}
*
{NULL : [Date].[Calendar].CURRENTMEMBER})
)
SELECT
{MEASURES.CUSTOMERSTODATE}
ON COLUMNS,
[Date].[Calendar].[Date].MEMBERS
ON ROWS
FROM
[Adventure Works]
On my laptop it executes in 35 seconds on a cold cache. We can optimise the calculation here simply by wrapping the set of all members on the Customer level of the Customer hierarchy with the Unorder() function, so:
UNORDER([Customer].[Customer].[Customer].MEMBERS)
The following query now executes in 27 seconds on a cold cache:
WITH
MEMBER MEASURES.CUSTOMERSTODATE AS
COUNT(
NONEMPTY(
UNORDER([Customer].[Customer].[Customer].MEMBERS)
, {[Measures].[Internet Sales Amount]}
*
{NULL : [Date].[Calendar].CURRENTMEMBER})
)
SELECT
{MEASURES.CUSTOMERSTODATE}
ON COLUMNS,
[Date].[Calendar].[Date].MEMBERS
ON ROWS
FROM
[Adventure Works]
As far as I can tell, Unorder() only makes a difference on calculations when used in combination with NonEmpty(), and when it is used over a large set (here the set of customers has around 18000 members). If you have calculations like this I would recommend testing to see if Unorder() makes a difference – if it does, please leave a comment and let me know what you find!
SSAS Multidimensional Formula Engine Caching and LocaleDependent Properties
One subject I have blogged about many times here is how the use of certain MDX functions and features prevents the Formula Engine from caching the result of a calculation for longer than the lifetime of a query (see here and here for just two examples). Reading the new SSAS 2012/2014 Performance Guide, I spotted the following:
The use of MDX functions that are localedependent (such as CAPTION or .Properties) prevents the use of the global cache, because different sessions may be connected with different locales and cached results for one locale may not be correct for another locale.
It’s a bit vague but here’s an example of it happening. Take the following calculated measure for the Adventure Works cube:
CREATE MEMBER CURRENTCUBE.MEASURES.TEST AS
IIF(
[Date].[Date].CURRENTMEMBER.PROPERTIES("MEMBER_NAME")="July 1, 2001",
[Measures].[Internet Sales Amount],
[Measures].[Internet Tax Amount]);
If you run the following query, first on a cold cache and then on a warm cache:
select
{measures.test}
on 0,
[Date].[Date].[Date].members
on 1
from
[Adventure Works]
In Profiler you will see that the second time it is run, the values for the calculated measure are returning from the Formula Engine cache:
Here, the Get Data From Cache event is showing that on the second execution of the query the Formula Engine cache is being used.
However, if you change the calculation so it uses the MEMBER_CAPTION property instead of the MEMBER_NAME property, like so:
CREATE MEMBER CURRENTCUBE.MEASURES.TEST AS
IIF(
[Date].[Date].CURRENTMEMBER.PROPERTIES("MEMBER_CAPTION")="July 1, 2001",
[Measures].[Internet Sales Amount],
[Measures].[Internet Tax Amount]);
Then run the query again, on a cold cache then a warm cache, you will see the following:
Here the Storage Engine cache is being used – and since the query only contains a single calculated measure, this means that the Formula Engine cache is not being used, and may result in noticeably worse overall query performance if your calculation is expensive.
The reason there is a difference between MEMBER_NAME and MEMBER_CAPTION is that the former property will return the same value for all users, but the latter may return different values for users in different locales. Therefore it is not safe to cache the results of calculations that use the MEMBER_CAPTION property because these calculations could return different values for different users for the same query.
Calculating Ages In MDX
One area where I see people have a lot of problems when they’re learning MDX is when they are working with dates. SQL has a lot of great functionality for working with dates while MDX, to be honest, does not – therefore people tie themselves in knots writing complex MDX expressions (which often don’t perform all that well) to recreate what they would do in SQL. However, as is often the case with MDX, approaching the questions from a completely different angle yields much better results.
For example, let’s imagine that you wanted to calculate the age in years of any customer on any given date. Using the Adventure Works cube you have a Date dimension, which will be how you want to select the date that the age is to be calculated at; there is also, on the Customer dimension, an attribute called Birth Date that gives you each customer’s date of birth. You need a calculated measure that calculates the age of each customer for any date selected on the Date dimension.
If you think about this as a SQL date problem, you’ll probably want to use functions like DateDiff() – one of the VBA functions that is available in MDX (but is not native MDX). The following article shows how DateDiff() and other VBA functions can be used to calculate ages in VBA code:
http://msdn.microsoft.com/enus/library/aa227466(v=vs.60).aspx
Here’s a query showing how this calculation can be implemented in MDX:
with
calculate the difference between the years of the
current date and the customer's birth date
member measures.yeardiff as
DateDiff("yyyy",
[Customer].[Customer].currentmember.properties("Birth Date", typed),
[Date].[Date].currentmember.member_value
)
calculate a value which is the month number
multiplied by 100 plus the day number of the month
for the current date
member measures.datemonthday as
(month([Date].[Date].currentmember.member_value) * 100)
+
day([Date].[Date].currentmember.member_value)
calculate a value which is the month number
multiplied by 100 plus the day number of the month
for the customer birth date
member measures.birthmonthday as
(month([Customer].[Customer].currentmember.properties("Birth Date", typed)) * 100)
+
day([Customer].[Customer].currentmember.properties("Birth Date", typed))
calculate customer age as the difference in years
minus 1 if the customer's birthday this year is
after the current date
member measures.customerage as
measures.yeardiff 
iif(measures.datemonthday>=measures.birthmonthday,0,1)
select
return all dates in 2002 on columns
descendants(
[Date].[Calendar].[Calendar Year].&[2002],
[Date].[Calendar].[Date])
on 0,
return all customers in Coff's Harbour on rows
descendants(
[Customer].[Customer Geography].[City].&[Coffs Harbour]&[NSW],
[Customer].[Customer Geography].[Customer])
on 1
from
[Adventure Works]
where(measures.customerage)
The output of the query is this:
On my laptop this query, which returns 365 columns and 106 rows, returns in around 3.5 seconds. A few things to point out:
 I’m using the .Member_Value function instead of the .MemberValue function to get the typed date value associated with each date, as Bill Anton’s recent blog post points out that it is marginally faster.
 I’m using .Properties(“Birth Date”, typed) to get the typed date value for each customer’s birth date. If I set the ValueColumn property on the Customer attribute of the Customer dimension, or enabled the hierarchy on the Birth Date attribute, I would be able to use .Member_Value here too and get a very minor performance improvement.
 Most of time spent by the query is taken by the yeardiff calculated member, which uses the DateDiff() function. This isn’t surprising because calling any VBA function in MDX is expensive and something to be avoided.
Performance here is ok, but we can do better. The most important thing to do when tuning any MDX calculation is to try to precalculate as much as possible. In this case there are two calculations we can move out of MDX into the structure of the cube:
 The DateDiff() function here just calculates the difference between the years of the two dates. Therefore if you create a new attribute on the Customer dimension to hold the integer representing the year of the customer’s birth date, and use the existing Calendar Year attribute on the Date dimension, you can avoid using DateDiff() completely and do a simple subtraction.
 Likewise, rather than calculating the combination of month and date in the MDX, these values can be precalculated in the dimension and then accessed as properties.
Here’s the rewritten version of the calculation, assuming that the following new attributes have been added to the Customer and Date dimensions:
 [Customer].[Birth Year] holds the year of the customer’s birth as an integer
 [Customer].[Birth Month Day] is an integer containing the month number multiplied by 100 plus the day number of the month for the customer’s birth date. For example the date 5th June would have the value 605.
 [Date].[Month Day] is the same as [Customer].[Birth Month Day] but for the Date dimension.
with
member measures.yeardiff as
[Date].[Calendar Year].currentmember.member_value

[Customer].[Birth Year].currentmember.member_value
member measures.customerage as
measures.yeardiff

iif(
[Date].[Month Day].currentmember.member_value
>=
[Customer].[Birth Month Day].currentmember.member_value
,0,1)
select
descendants(
[Date].[Calendar].[Calendar Year].&[2002],
[Date].[Calendar].[Date])
on 0,
descendants(
[Customer].[Customer Geography].[City].&[Coffs Harbour]&[NSW],
[Customer].[Customer Geography].[Customer])
on 1
from
[Adventure Works]
where(measures.customerage)
This query runs in around 1.2 seconds, a significant improvement. Of course if you were to create a completely new fact table that contained the age of every customer on every date in the Date dimension then you wouldn’t need to use any calculated measures and performance would be even better, but that may not be practical if you have a lot of customers and a lot of dates.