Archive for the ‘Analysis Services’ Category
One area where I see people have a lot of problems when they’re learning MDX is when they are working with dates. SQL has a lot of great functionality for working with dates while MDX, to be honest, does not – therefore people tie themselves in knots writing complex MDX expressions (which often don’t perform all that well) to recreate what they would do in SQL. However, as is often the case with MDX, approaching the questions from a completely different angle yields much better results.
For example, let’s imagine that you wanted to calculate the age in years of any customer on any given date. Using the Adventure Works cube you have a Date dimension, which will be how you want to select the date that the age is to be calculated at; there is also, on the Customer dimension, an attribute called Birth Date that gives you each customer’s date of birth. You need a calculated measure that calculates the age of each customer for any date selected on the Date dimension.
If you think about this as a SQL date problem, you’ll probably want to use functions like DateDiff() – one of the VBA functions that is available in MDX (but is not native MDX). The following article shows how DateDiff() and other VBA functions can be used to calculate ages in VBA code:
Here’s a query showing how this calculation can be implemented in MDX:
The output of the query is this:
On my laptop this query, which returns 365 columns and 106 rows, returns in around 3.5 seconds. A few things to point out:
- I’m using the .Member_Value function instead of the .MemberValue function to get the typed date value associated with each date, as Bill Anton’s recent blog post points out that it is marginally faster.
- I’m using .Properties(“Birth Date”, typed) to get the typed date value for each customer’s birth date. If I set the ValueColumn property on the Customer attribute of the Customer dimension, or enabled the hierarchy on the Birth Date attribute, I would be able to use .Member_Value here too and get a very minor performance improvement.
- Most of time spent by the query is taken by the yeardiff calculated member, which uses the DateDiff() function. This isn’t surprising because calling any VBA function in MDX is expensive and something to be avoided.
Performance here is ok, but we can do better. The most important thing to do when tuning any MDX calculation is to try to pre-calculate as much as possible. In this case there are two calculations we can move out of MDX into the structure of the cube:
- The DateDiff() function here just calculates the difference between the years of the two dates. Therefore if you create a new attribute on the Customer dimension to hold the integer representing the year of the customer’s birth date, and use the existing Calendar Year attribute on the Date dimension, you can avoid using DateDiff() completely and do a simple subtraction.
- Likewise, rather than calculating the combination of month and date in the MDX, these values can be pre-calculated in the dimension and then accessed as properties.
Here’s the rewritten version of the calculation, assuming that the following new attributes have been added to the Customer and Date dimensions:
- [Customer].[Birth Year] holds the year of the customer’s birth as an integer
- [Customer].[Birth Month Day] is an integer containing the month number multiplied by 100 plus the day number of the month for the customer’s birth date. For example the date 5th June would have the value 605.
- [Date].[Month Day] is the same as [Customer].[Birth Month Day] but for the Date dimension.
This query runs in around 1.2 seconds, a significant improvement. Of course if you were to create a completely new fact table that contained the age of every customer on every date in the Date dimension then you wouldn’t need to use any calculated measures and performance would be even better, but that may not be practical if you have a lot of customers and a lot of dates.
I was wondering the other day (as you do) which of the MDX cell properties Excel PivotTables actually support. This page has all the details on the cell properties that are available in an MDX query but most client tools don’t bother retrieving all of them and Excel is no different. Of course it retrieves the most important properties and it retrieves one or two others, but I couldn’t resist doing a bit of research to find out the exact situation with Excel 2013.
The first thing to note is that you can control which cell properties Excel retrieves for a given connection in the connection properties dialog, in the OLAP Server Formatting section:
If you can live with not getting all of the cell properties back from SSAS there are some scenarios where unchecking all of the boxes in the OLAP Server Formatting section can improve performance:
- When you have a large number of databases and cubes on your server, and/or complex security, because a side effect of the way Excel retrieves properties is that it causes all cubes in all databases to be loaded, their MDX Scripts executed and security evaluated. See here for more details.
- When your queries return a lot of data and your network is slow. See page 61 of the SQLCAT Guide to BI and Analytics for more details.
You can also quite easily see which cell properties Excel is retrieving by looking at the MDX queries it generates (thank you OLAP PivotTable Extensions). Here’s an example of a simple PivotTable query run on a connection which has all of the boxes checked in the OLAP Server Formatted section:
In the cell properties clause of this query you can see the six properties returned. Here’s a breakdown of each of them.
VALUE and FORMAT_STRING
Excel doesn’t actually retrieve the FORMATTED_VALUE cell property, which gives you the measure value with formatting applied by SSAS. Instead, assuming you have the Number Format box checked in the Connection Properties dialog, it retrieves the VALUE property from SSAS (which contains the raw, unformatted measure value) and the FORMAT_STRING property (which contains the format string you defined on the server). It then tries to convert the format string into an Excel format for the PivotTable. Unfortunately it can’t always do the conversion successfully – I blogged about a problem with the Percent format some time ago and this is still a problem with Excel 2013. Excel also doesn’t support formats defined in the fourth section (see here for more details) of the FORMAT_STRING property for null values, and there are probably lots of other relatively obscure types of formatting it doesn’t support too.
The language property of a cell controls things like the currency symbol displayed when you are using the “Currency” built-in format string and the thousands and decimal separators used. Whether Excel returns the language property or not is also linked to the Number Format checkbox in the Connection Properties dialog. I strongly recommend that you do not use the “Currency” format string and the Language property if you are working with multiple currencies, for reasons I outlined here.
BACK_COLOR and FORE_COLOR
I’ve never particularly liked using the BACK_COLOR and FORE_COLOR properties on a cell to do traffic light-style reporting, to be honest, although I know some people love it. However I do use these properties a lot when debugging scoped assignments as seen here. BACK_COLOR is retrieved if you have the Fill Color box checked on the connection properties dialog; FORE_COLOR is retrieved if you have the Text Color box checked on the connection properties dialog.
Probably the only surprise of this whole exercise was the fact that Excel retrieved the FONT_FLAGS property if you have the Font Style box checked on the connection properties dialog; alas it doesn’t support FONT_SIZE or FONT_NAME. For example if you put the following calculated members on your cube:
If you drop them into a PivotTable, you will see the following returned:
I stopped writing book reviews on my blog a long time ago because, frankly, I knew most of the authors of the books I featured so it was hard to be impartial. That doesn’t mean I can’t plug my friends’ books in a totally biased way, though, in the same way that I plug my own books/courses/consultancy etc!
I’ve known Paul te Braak for several years now and he’s one of the best SSAS guys out there. “Microsoft Tabular Modeling Cookbook” is a great introduction to building models in Power Pivot and SSAS Tabular models, and also covers client-side interaction with Excel and Power View. As the name suggests it follows the cookbook format rather than the more verbose, traditional tech book style of, for example, the SSAS Tabular book that Marco, Alberto and I wrote. I like the cookbook format a lot – it gives you information in a concise, easy-to-follow way and doesn’t force you to read the whole book cover-to-cover. Paul has done a superb job of covering all of the important points without getting bogged down with unnecessary detail. Highly recommended.
You learn all kinds of interesting things on the forums! Recently I saw this post from Nils Rottgardt about a problem with the CREATE SESSION CUBE statement and SSAS Multidimensional:
Here’s a related Connect:
Basically, when you run a CREATE SESSION CUBE statement SSAS creates a shadow copy of the cube on disk – and if you have a very large cube, with very large dimensions, this could use up a lot of storage space on the server. When you use the custom grouping functionality in a PivotTable Excel generates a CREATE SESSION CUBE statement behind the scenes, so it’s possible to cause a lot of mischief just by using custom grouping in Excel.
Incidentally, this isn’t a problem for SSAS Tabular because it doesn’t support the CREATE SESSION CUBE statement.
I always knew this particular functionality could be very slow and always warned people against using it, but I didn’t know about the disk space issue… if your users haven’t found out about custom grouping yet, I recommend you don’t tell them about it!
Within a matter of days, “Expert Cube Development with SSAS 2012 Multidimensional Models” will be published. It’s the second edition of the very successful (19 5* reviews on Amazon US as of now) book on SSAS cube development that Marco, Alberto and I wrote a few years ago, updated for SSAS 2012.
Before you rush off to order a copy, there are a three things I’d like to point out:
- This is basically the same book as the first edition with updated screenshots, a few bugs fixed, and several sections updated/expanded for SSAS 2012. There are no substantial changes. If you already have a copy of the first edition it’s probably not worth buying a copy of the second edition.
- The book only covers SSAS Multidimensional models, it does not cover SSAS Tabular models.
- This is not a basic introduction to building SSAS cubes – it’s aimed at intermediate-level SSAS developers who are already familiar with cubes, dimensions and MDX and who want to learn about best practices, design patterns, performance tuning and (most importantly) which features work well and which ones don’t. If you like the material I post here on my blog, you’ll probably like the book.
If you’re OK with that then by all means, go ahead and get your wallet out!
Most people know that the ability to partition a measure group is a feature of Analysis Services Multidimensional Enterprise Edition, but that doesn’t mean that in Standard Edition you are limited to just having one partition per measure group. In fact it is possible to use up to three partitions per measure group in SSAS Multidimensional SE, with some limitations. For a long time I wasn’t sure whether this was legal, as opposed to possible, according to the terms of the SQL Server licence but since this page in Books Online (thanks to Rob Kerr for the link) states that you can have up to three partitions in SE then I assume it is:
If you do decide to use more than one partition in SSAS SE then you do need to understand the risks involved – and the reason I wanted to write this post is because I see a lot of people using more than one partition per measure group in SE without understanding those risks. Strictly speaking, SE is only designed to work with one partition per measure group. It needs those extra two partitions for two pieces of functionality:
- To support writeback, because using this feature requires SSAS to create a separate partition to hold writeback values
- To support incremental processing, because when you do incremental processing on a partition in the background SSAS needs to create a new partition, process it and then merge it with your existing partition
Therefore if you create more than one partition per measure group in SE you may find that writeback and/or incremental processing will break.
New modules are being added to my MDX online training course all the time, and now there’s another free video available: a short introduction to the CurrentMember function. You can view it here (registration required):
The previous free video on MDX SELECT statements is also still available here, now with no registration required. If you’d like to subscribe to this course you can get a 10% discount by using the code TECHNITRAIN2014 when registering.
Don’t forget, if you are looking for classroom training in London for MDX, DAX, Analysis Services Multidimensional or Tabular, Power BI, Integration Services or the SQL Server engine check out http://www.technitrain.com/courses.php