Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Multidimensional’ Category

Optimising MDX Calculations With The Unorder() Function

with 7 comments

The Unorder() function is probably one of the least used functions in the whole of MDX. It exists only as a query performance hint and, since I had never up to now found a scenario where it did improve the performance of a calculation I had pretty much forgotten about it (as Books Online says, the optimisation it performs is applied automatically in many cases). However I was playing around with some calculations last week and found out that it does have its uses…

What does the Unorder() function do? It’s a function that takes a set and returns a set, and what it does is remove any implicit ordering from that set. By default all sets in MDX are ordered, but for some types of operation that ordering is unimportant and ignoring it can result in faster query performance.

Take, for example, the following query on the Adventure Works cube which shows the number of customers who have bought something up to the current date:

WITH

MEMBER MEASURES.CUSTOMERSTODATE AS

COUNT(

    NONEMPTY(

        [Customer].[Customer].[Customer].MEMBERS

    , {[Measures].[Internet Sales Amount]}

      *

      {NULL : [Date].[Calendar].CURRENTMEMBER})

)

 

SELECT

{MEASURES.CUSTOMERSTODATE}

ON COLUMNS,

[Date].[Calendar].[Date].MEMBERS

ON ROWS

FROM

[Adventure Works]

 

On my laptop it executes in 35 seconds on a cold cache. We can optimise the calculation here simply by wrapping the set of all members on the Customer level of the Customer hierarchy with the Unorder() function, so:

UNORDER([Customer].[Customer].[Customer].MEMBERS)

The following query now executes in 27 seconds on a cold cache:

WITH

MEMBER MEASURES.CUSTOMERSTODATE AS

COUNT(

    NONEMPTY(

        UNORDER([Customer].[Customer].[Customer].MEMBERS)

    , {[Measures].[Internet Sales Amount]}

      *

      {NULL : [Date].[Calendar].CURRENTMEMBER})

)

 

SELECT

{MEASURES.CUSTOMERSTODATE}

ON COLUMNS,

[Date].[Calendar].[Date].MEMBERS

ON ROWS

FROM

[Adventure Works]

 

As far as I can tell, Unorder() only makes a difference on calculations when used in combination with NonEmpty(), and when it is used over a large set (here the set of customers has around 18000 members). If you have calculations like this I would recommend testing to see if Unorder() makes a difference – if it does, please leave a comment and let me know what you find!

Written by Chris Webb

July 7, 2014 at 9:30 am

SSAS Multidimensional Formula Engine Caching and Locale-Dependent Properties

with 3 comments

One subject I have blogged about many times here is how the use of certain MDX functions and features prevents the Formula Engine from caching the result of a calculation for longer than the lifetime of a query (see here and here for just two examples). Reading the new SSAS 2012/2014 Performance Guide, I spotted the following:

The use of MDX functions that are locale-dependent (such as CAPTION or .Properties) prevents the use of the global cache, because different sessions may be connected with different locales and cached results for one locale may not be correct for another locale.

It’s a bit vague but here’s an example of it happening. Take the following calculated measure for the Adventure Works cube:

CREATE MEMBER CURRENTCUBE.MEASURES.TEST AS

IIF(

[Date].[Date].CURRENTMEMBER.PROPERTIES("MEMBER_NAME")="July 1, 2001",

[Measures].[Internet Sales Amount],

[Measures].[Internet Tax Amount]);

If you run the following query, first on a cold cache and then on a warm cache:

select

{measures.test} 

on 0,

[Date].[Date].[Date].members

on 1

from

[Adventure Works]

In Profiler you will see that the second time it is run, the values for the calculated measure are returning from the Formula Engine cache:

image

Here, the Get Data From Cache event is showing that on the second execution of the query the Formula Engine cache is being used.

However, if you change the calculation so it uses the MEMBER_CAPTION property instead of the MEMBER_NAME property, like so:

CREATE MEMBER CURRENTCUBE.MEASURES.TEST AS

IIF(

[Date].[Date].CURRENTMEMBER.PROPERTIES("MEMBER_CAPTION")="July 1, 2001",

[Measures].[Internet Sales Amount],

[Measures].[Internet Tax Amount]);

Then run the query again, on a cold cache then a warm cache, you will see the following:

image

Here the Storage Engine cache is being used – and since the query only contains a single calculated measure, this means that the Formula Engine cache is not being used, and may result in noticeably worse overall query performance if your calculation is expensive.

The reason there is a difference between MEMBER_NAME and MEMBER_CAPTION is that the former property will return the same value for all users, but the latter may return different values for users in different locales. Therefore it is not safe to cache the results of calculations that use the MEMBER_CAPTION property because these calculations could return different values for different users for the same query.

Written by Chris Webb

June 16, 2014 at 9:30 am

Calculating Ages In MDX

with 4 comments

One area where I see people have a lot of problems when they’re learning MDX is when they are working with dates. SQL has a lot of great functionality for working with dates while MDX, to be honest, does not – therefore people tie themselves in knots writing complex MDX expressions (which often don’t perform all that well) to recreate what they would do in SQL. However, as is often the case with MDX, approaching the questions from a completely different angle yields much better results.

For example, let’s imagine that you wanted to calculate the age in years of any customer on any given date. Using the Adventure Works cube you have a Date dimension, which will be how you want to select the date that the age is to be calculated at; there is also, on the Customer dimension, an attribute called Birth Date that gives you each customer’s date of birth. You need a calculated measure that calculates the age of each customer for any date selected on the Date dimension.

If you think about this as a SQL date problem, you’ll probably want to use functions like DateDiff() – one of the VBA functions that is available in MDX (but is not native MDX). The following article shows how DateDiff() and other VBA functions can be used to calculate ages in VBA code:
http://msdn.microsoft.com/en-us/library/aa227466(v=vs.60).aspx

Here’s a query showing how this calculation can be implemented in MDX:

with

--calculate the difference between the years of the 

--current date and the customer's birth date

member measures.yeardiff as

DateDiff("yyyy",

[Customer].[Customer].currentmember.properties("Birth Date", typed),

[Date].[Date].currentmember.member_value

)

--calculate a value which is the month number

--multiplied by 100 plus the day number of the month

--for the current date

member measures.datemonthday as

(month([Date].[Date].currentmember.member_value) * 100)

+

day([Date].[Date].currentmember.member_value)

--calculate a value which is the month number

--multiplied by 100 plus the day number of the month

--for the customer birth date

member measures.birthmonthday as

(month([Customer].[Customer].currentmember.properties("Birth Date", typed)) * 100)

+

day([Customer].[Customer].currentmember.properties("Birth Date", typed))

 

--calculate customer age as the difference in years

--minus 1 if the customer's birthday this year is

--after the current date

member measures.customerage as

measures.yeardiff -

iif(measures.datemonthday>=measures.birthmonthday,0,1)

 

select

--return all dates in 2002 on columns

descendants(

[Date].[Calendar].[Calendar Year].&[2002],

[Date].[Calendar].[Date])

on 0,

--return all customers in Coff's Harbour on rows

descendants(

[Customer].[Customer Geography].[City].&[Coffs Harbour]&[NSW],

[Customer].[Customer Geography].[Customer])

on 1

from

[Adventure Works]

where(measures.customerage)

The output of the query is this:

image

On my laptop this query, which returns 365 columns and 106 rows, returns in around 3.5 seconds. A few things to point out:

  • I’m using the .Member_Value function instead of the .MemberValue function to get the typed date value associated with each date, as Bill Anton’s recent blog post points out that it is marginally faster.
  • I’m using .Properties(“Birth Date”, typed) to get the typed date value for each customer’s birth date. If I set the ValueColumn property on the Customer attribute of the Customer dimension, or enabled the hierarchy on the Birth Date attribute, I would be able to use .Member_Value here too and get a very minor performance improvement.
  • Most of time spent by the query is taken by the yeardiff calculated member, which uses the DateDiff() function. This isn’t surprising because calling any VBA function in MDX is expensive and something to be avoided.

Performance here is ok, but we can do better. The most important thing to do when tuning any MDX calculation is to try to pre-calculate as much as possible. In this case there are two calculations we can move out of MDX into the structure of the cube:

  • The DateDiff() function here just calculates the difference between the years of the two dates. Therefore if you create a new attribute on the Customer dimension to hold the integer representing the year of the customer’s birth date, and use the existing Calendar Year attribute on the Date dimension, you can avoid using DateDiff() completely and do a simple subtraction.
  • Likewise, rather than calculating the combination of month and date in the MDX, these values can be pre-calculated in the dimension and then accessed as properties.

Here’s the rewritten version of the calculation, assuming that the following new attributes have been added to the Customer and Date dimensions:

  • [Customer].[Birth Year] holds the year of the customer’s birth as an integer
  • [Customer].[Birth Month Day] is an integer containing the month number multiplied by 100 plus the day number of the month for the customer’s birth date. For example the date 5th June would have the value 605.
  • [Date].[Month Day] is the same as [Customer].[Birth Month Day] but for the Date dimension.
with

member measures.yeardiff as

[Date].[Calendar Year].currentmember.member_value

-

[Customer].[Birth Year].currentmember.member_value

 

member measures.customerage as

measures.yeardiff

-

iif(

[Date].[Month Day].currentmember.member_value

>=

[Customer].[Birth Month Day].currentmember.member_value

,0,1)

 

select

descendants(

[Date].[Calendar].[Calendar Year].&[2002],

[Date].[Calendar].[Date])

on 0,

descendants(

[Customer].[Customer Geography].[City].&[Coffs Harbour]&[NSW],

[Customer].[Customer Geography].[Customer])

on 1

from

[Adventure Works]

where(measures.customerage)

 

This query runs in around 1.2 seconds, a significant improvement. Of course if you were to create a completely new fact table that contained the age of every customer on every date in the Date dimension then you wouldn’t need to use any calculated measures and performance would be even better, but that may not be practical if you have a lot of customers and a lot of dates.

Written by Chris Webb

June 9, 2014 at 9:15 am

MDX Cell Properties Supported By Excel

with 5 comments

I was wondering the other day (as you do) which of the MDX cell properties Excel PivotTables actually support. This page has all the details on the cell properties that are available in an MDX query but most client tools don’t bother retrieving all of them and Excel is no different. Of course it retrieves the most important properties and it retrieves one or two others, but I couldn’t resist doing a bit of research to find out the exact situation with Excel 2013.

The first thing to note is that you can control which cell properties Excel retrieves for a given connection in the connection properties dialog, in the OLAP Server Formatting section:

image

If you can live with not getting all of the cell properties back from SSAS there are some scenarios where unchecking all of the boxes in the OLAP Server Formatting section can improve performance:

  • When you have a large number of databases and cubes on your server, and/or complex security, because a side effect of the way Excel retrieves properties is that it causes all cubes in all databases to be loaded, their MDX Scripts executed and security evaluated. See here for more details.
  • When your queries return a lot of data and your network is slow. See page 61 of the SQLCAT Guide to BI and Analytics for more details.

You can also quite easily see which cell properties Excel is retrieving by looking at the MDX queries it generates (thank you OLAP PivotTable Extensions). Here’s an example of a simple PivotTable query run on a connection which has all of the boxes checked in the OLAP Server Formatted section:

SELECT 

{[Measures].[D1],[Measures].[D2]} 

DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME 

ON COLUMNS  

FROM 

[Adventure Works DW2008] 

CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

 

In the cell properties clause of this query you can see the six properties returned. Here’s a breakdown of each of them.

VALUE and FORMAT_STRING

Excel doesn’t actually retrieve the FORMATTED_VALUE cell property, which gives you the measure value with formatting applied by SSAS. Instead, assuming you have the Number Format box checked in the Connection Properties dialog, it retrieves the VALUE property from SSAS (which contains the raw, unformatted measure value) and the FORMAT_STRING property (which contains the format string you defined on the server). It then tries to convert the format string into an Excel format for the PivotTable. Unfortunately it can’t always do the conversion successfully – I blogged about a problem with the Percent format some time ago and this is still a problem with Excel 2013. Excel also doesn’t support formats defined in the fourth section (see here for more details) of the FORMAT_STRING property for null values, and there are probably lots of other relatively obscure types of formatting it doesn’t support too.

LANGUAGE

The language property of a cell controls things like the currency symbol displayed when you are using the “Currency” built-in format string and the thousands and decimal separators used. Whether Excel returns the language property or not is also linked to the Number Format checkbox in the Connection Properties dialog. I strongly recommend that you do not use the “Currency” format string and the Language property if you are working with multiple currencies, for reasons I outlined here.

BACK_COLOR and FORE_COLOR

I’ve never particularly liked using the BACK_COLOR and FORE_COLOR properties on a cell to do traffic light-style reporting, to be honest, although I know some people love it. However I do use these properties a lot when debugging scoped assignments as seen here. BACK_COLOR is retrieved if you have the Fill Color box checked on the connection properties dialog; FORE_COLOR is retrieved if you have the Text Color box checked on the connection properties dialog.

FONT_FLAGS

Probably the only surprise of this whole exercise was the fact that Excel retrieved the FONT_FLAGS property if you have the Font Style box checked on the connection properties dialog; alas it doesn’t support FONT_SIZE or FONT_NAME. For example if you put the following calculated members on your cube:

CREATE MEMBER CURRENTCUBE.MEASURES.BOLD as "This is BOLD", FONT_FLAGS=1;

 

CREATE MEMBER CURRENTCUBE.MEASURES.ITALIC as "This is ITALIC", FONT_FLAGS=2;

 

CREATE MEMBER CURRENTCUBE.MEASURES.UNDERLINE as "This is UNDERLINE", FONT_FLAGS=4;

 

CREATE MEMBER CURRENTCUBE.MEASURES.STRIKEOUT as "This is STRIKEOUT", FONT_FLAGS=8;

If you drop them into a PivotTable, you will see the following returned:

image

Written by Chris Webb

April 25, 2014 at 11:02 am

CREATE SESSION CUBE and disk usage

with 4 comments

You learn all kinds of interesting things on the forums! Recently I saw this post from Nils Rottgardt about a problem with the CREATE SESSION CUBE statement and SSAS Multidimensional:

http://social.msdn.microsoft.com/Forums/sqlserver/en-US/175fc61f-676e-4e3d-bed4-283f151641ec/create-session-cube-command-by-excel-grouping-creates-a-shadow-copy-on-the-server-disk-storage?forum=sqlanalysisservices

Here’s a related Connect:

https://connect.microsoft.com/SQLServer/feedback/details/822778/excel-grouping-create-session-cube-breaks-ssas-envirounment-because-of-phyically-copy-the-cube-data-for-every-pivot-table

Basically, when you run a CREATE SESSION CUBE statement SSAS creates a shadow copy of the cube on disk – and if you have a very large cube, with very large dimensions, this could use up a lot of storage space on the server. When you use the custom grouping functionality in a PivotTable Excel generates a CREATE SESSION CUBE statement behind the scenes, so it’s possible to cause a lot of mischief just by using custom grouping in Excel.

Incidentally, this isn’t a problem for SSAS Tabular because it doesn’t support the CREATE SESSION CUBE statement.

I always knew this particular functionality could be very slow and always warned people against using it, but I didn’t know about the disk space issue… if your users haven’t found out about custom grouping yet, I recommend you don’t tell them about it!

Written by Chris Webb

April 1, 2014 at 9:36 am

How Many Partitions Per Measure Group Are Allowed in SSAS Standard Edition?

with 4 comments

Most people know that the ability to partition a measure group is a feature of Analysis Services Multidimensional Enterprise Edition, but that doesn’t mean that in Standard Edition you are limited to just having one partition per measure group. In fact it is possible to use up to three partitions per measure group in SSAS Multidimensional SE, with some limitations. For a long time I wasn’t sure whether this was legal, as opposed to possible, according to the terms of the SQL Server licence but since this page in Books Online (thanks to Rob Kerr for the link) states that you can have up to three partitions in SE then I assume it is:

http://msdn.microsoft.com/en-us/library/cc645993(v=SQL.110).aspx#BISemModel_multi

HOWEVER

If you do decide to use more than one partition in SSAS SE then you do need to understand the risks involved – and the reason I wanted to write this post is because I see a lot of people using more than one partition per measure group in SE without understanding those risks. Strictly speaking, SE is only designed to work with one partition per measure group. It needs those extra two partitions for two pieces of functionality:

  1. To support writeback, because using this feature requires SSAS to create a separate partition to hold writeback values
  2. To support incremental processing, because when you do incremental processing on a partition in the background SSAS needs to create a new partition, process it and then merge it with your existing partition

Therefore if you create more than one partition per measure group in SE you may find that writeback and/or incremental processing will break.

Written by Chris Webb

February 12, 2014 at 10:34 am

Semi-Additive Measures, Unary Operators, Custom Rollups And The Calculate() Statement

leave a comment »

Last week I was doing some performance tuning on SSAS Multidimensional and saw something very strange: when the MDX Script of the cube was executed (as always happens after the cache has been cleared, or processing has taken place – you can monitor this via the Execute MDX Script Begin/End events in Profiler) the Calculate() statement was taking just over four seconds. I’d never seen this before so I asked the nice people on the SSAS dev team what was going on, and Akshai Mirchandani very kindly filled me in on the details.

There are two types of calculation on an SSAS cube: those explicitly defined in the MDX Script (ie those seen on the Calculations tab of the cube editor); and semi-additive measures, unary operators and custom rollups, which are defined in the model itself. This second type of calculation is added to the cube when the Calculate() statement fires, and the more of them there are the longer it takes SSAS to work out where they should be applied in the space of the cube. In my customer’s case there were several large (80000+ members) parent/child hierarchies with unary operators as well as a few semi-additive measures and so this was the reason why Calculate() was so slow. Up to now I had only known that Calculate() triggers the aggregation of data up through the cube, which is why if you delete it the cube seems to contain no data.

Written by Chris Webb

December 16, 2013 at 4:05 pm

Problems With Calculated Members And Level Names

with one comment

Here’s something strange I came across today: a customer had created a calculated member, not on the measures dimension but on another hierarchy, and even though it had been selected in their client tool it wasn’t appearing in the query results. I tested in Excel and saw some strange error message. I was mystified, but after a bit of thought I found out what was going on…

Take the following query in Adventure Works:

WITH
MEMBER [Customer].[Gender].[CALC] AS 123
SELECT {[Measures].[Internet Sales Amount]} ON 0,
{[Customer].[Gender].&[F],
[Customer].[Gender].&[M],
[Customer].[Gender].[CALC]}
ON 1
FROM [Adventure Works]

When you run it, you get the results you would expect:

image

Now, if you change the calculated member name, you’ll see the problem that my customer was running into:

WITH
MEMBER [Customer].[Gender].[Gender] AS 123
SELECT {[Measures].[Internet Sales Amount]} ON 0,
{[Customer].[Gender].&[F],
[Customer].[Gender].&[M],
[Customer].[Gender].[Gender]}
ON 1
FROM [Adventure Works]

image

Instead of the calculated member, you now see Female and Male repeated twice. The reason why this is happening is down to how SSAS interprets the expression [Customer].[Gender].[Gender]. It’s the name of the calculated member that’s been defined, but it’s also the unique name of the Gender level on the Gender hierarchy of the Customer dimension:

image

As I explained in this blog post, when SSAS sees the unique name of a level it sticks the .MEMBERS function on the end and this returns the set of all members on the Gender level of the Gender hierarchy – and not the calculated member.

Of course, if you rewrite the query to use the ALLMEMBERS function as follows:

WITH
MEMBER [Customer].[Gender].[Gender] AS 123
SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Customer].[Gender].ALLMEMBERS
ON 1
FROM [Adventure Works]

…you’ll see the calculated member returned:

image

So, the moral of this post is don’t create calculated members that have the same name as a level on a hierarchy. It’s unlikely that you’ll do this, but possible – in my customer’s case they had a hierarchy on a time utility dimension called “Periods To Date” and a calculated member on that hierarchy with the same name.

Written by Chris Webb

September 6, 2013 at 10:04 pm

Subselects And Formula Engine Caching In SSAS 2012

with 13 comments

One of the many interesting things that caught my eye in the new SSAS Tabular Performance Tuning white paper is actually about new functionality in the SSAS 2012 Multidimensional and nothing to do with Tabular! It turns out that in the DAXMD release of SSAS 2012, ie SQL Server 2012 SP1 CU4, some work was done to enable the Formula Engine to cache the results of MDX calculations for longer than the lifetime of a query when a query includes a subselect. Here’s an excerpt from the paper:

Certain types of queries cannot benefit from MDX caching. For example, multi-select filters (represented in the MDX query with a subselect) prevent use of the global MDX formula engine cache. However, this limitation was lifted for most subselect scenarios as part of SQL Server 2012 SP1 CU4 (http://support.microsoft.com/kb/2833645/en-us). However, subselect global scope caching is still prevented for arbitrary shapes (http://blog.kejser.org/2006/11/16/arbitrary-shapes-in-as-2005/), transient calculations like NOW(), and queries without a dimension on rows or columns. If the business requirements for the report do not require visual totals and caching is not occurring for your query, consider changing the query to use a set in the WHERE clause instead of a subselect as this enables better caching in some scenarios.

This is a subject I’ve blogged about in the past, both for SSRS reports (which almost always use subselects in the MDX created by the query editor) and Excel PivotTables (which sometimes, but not always, use subselects) and you may want to read this posts to get some background. In my experience, if you have a lot of complex MDX calculations on your cube (financial applications are a great example), this issue can have a major impact on your overall query performance, even if it isn’t immediately obvious that this is the case. On builds of SSAS before 2012 SP1 CU4, even if Storage Engine caching is working properly, if a query references a lot of MDX calculations and includes a subselect it will be consistently slow however many times you run it because the calculations will need to be re-evaluated every time the query is run.

I’ve heard of a few problems with CU4 regarding SSRS so I don’t recommend upgrading your production SSAS servers just yet, but when these problems have been ironed out in the next full service pack I think this could be a compelling reason for many people to move to SSAS 2012. There’s also still a limitation whereby queries that return a single cell value and use a subselect may still not be able to use the global Formula Engine cache, but hopefully this will be dealt with in a future release too. Overall, though, I’m extremely pleased to see yet another improvement to the Multidimensional engine.

Thanks to Jeffrey Wang for answering my questions about this functionality.

Written by Chris Webb

August 7, 2013 at 12:10 am

Ordering Of Named Sets In Excel

with 3 comments

A bit of an obscure one, this, but it’s come up twice this week so worth mentioning. When you define a named set on your SSAS Multidimensional cube, Excel doesn’t respect the order of items in that set by default when you use it in a PivotTable. Consider the following named set defined on the Adventure Works cube (on the Calculations tab of the cube, not in defined in Excel itself):

CREATE SET [MY COUNTRIES] AS
{[Customer].[Country].&[France], [Customer].[Country].&[Canada], [Customer].[Country].&[Australia]};

Note that the countries are in the order France, Canada, Australia. When you use this named set in Excel, this order is overridden and the countries come out in hierarchy order, that’s to say the order that they appear on the Country hierarchy: Australia, Canada, France.

image 

image

How can you stop this? After all, in a lot of cases the order of members in a named set is important. If you have Excel 2010 or Excel 2013 (I believe this option isn’t available in Excel 2007), you need to click on the name of the set in the PivotTable Field List pane and select Field Settings:

image

Then in the Field Settings dialog go to the Layout and Print tab and uncheck the “Automatically order and remove duplicates from the set” option:

image

When you do that, the order of your set is respected:

image

Written by Chris Webb

July 31, 2013 at 2:12 pm

Follow

Get every new post delivered to your Inbox.

Join 3,072 other followers