Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Analysis Services’ Category

New Free Video On The MDX CurrentMember Function


New modules are being added to my MDX online training course all the time, and now there’s another free video available: a short introduction to the CurrentMember function. You can view it here (registration required):
https://projectbotticelli.com/knowledge/what-is-mdx-current-member-function-video-tutorial?pk_campaign=tt2013cwb


The previous free video on MDX SELECT statements is also still available here, now with no registration required. If you’d like to subscribe to this course you can get a 10% discount by using the code TECHNITRAIN2014 when registering.

Don’t forget, if you are looking for classroom training in London for MDX, DAX, Analysis Services Multidimensional or Tabular, Power BI, Integration Services or the SQL Server engine, check out http://www.technitrain.com/courses.php

Written by Chris Webb

February 6, 2014 at 11:02 am

PDW and SSAS


One new feature of SQL Server PDW 2012 that hasn’t had the attention it deserves is the fact that it is now officially supported as a data source for Analysis Services, both Multidimensional (in ROLAP and MOLAP modes) and Tabular (in In-Memory and DirectQuery modes). If you are working with extremely large data volumes in SSAS then PDW might be something you want to do some research on. For SSAS Multidimensional in MOLAP mode or Tabular models in In-Memory mode, using PDW as a data source should make processing run extremely quickly. For SSAS Multidimensional in ROLAP mode or Tabular models in DirectQuery mode, it can give you interactive query access to data volumes that MOLAP/In-Memory simply couldn’t handle (remember though that DirectQuery only works with DAX queries, so Excel PivotTables don’t work with it, only Power View).

There are a few public sources of information on PDW/SSAS integration. One is the white paper on PDW that you can download from Henk van der Valk’s blog here:
http://henkvandervalk.com/introduction-to-sql-server-2012-parallel-data-warehouse

Here’s the relevant section:

New in SQL Server 2012 PDW, you can use PDW as a high performance relational data source for building multidimensional OR tabular models with SQL Server Analysis Services (SSAS). For example, you can:

  • Use DirectQuery to perform real-time queries against a SQL Server PDW data source from a tabular model.
  • Reduce ROLAP query times by using clustered columnstore indexes on SQL Server PDW tables.
  • Use new features of SSAS. For example, use EnableRolapDistinctCountOnDataSource to run distinct count operations on PDW tables.
  • Use PDW as a data warehouse for aggregating Hadoop data for use in multidimensional or tabular models.

The DirectQuery page in Books Online says similar things:
http://msdn.microsoft.com/en-us/library/hh230898.aspx

Some quotes:

In contrast, a tabular model in DirectQuery mode uses data that is stored in a SQL Server database, or in a SQL Server PDW data warehouse. At design time, you import all or a small sample of the data into the cache and build your model as usual. When you are ready to deploy the model, you change the operating mode to DirectQuery. After you change the operating mode, any queries against the model will use the specified relational data source (either SQL Server or SQL Server PDW), not the cached data.

DirectQuery can take advantage of provider-side query acceleration, such as that provided by xVelocity memory optimized column indexes. xVelocity columnstore indexes are provided in both SQL Server 2012 and SQL Server PDW, to support improved DirectQuery performance.

It’s a shame there isn’t more information out there though. This post from Michael Mukovskiy has some interesting findings on using PDW v1 as a data source for SSAS:
http://blog.oraylis.de/2013/07/pdw-cube-processing-experience/
…presumably PDW 2012 would give even better results.

Written by Chris Webb

January 31, 2014 at 1:19 pm

Posted in Analysis Services, PDW

Semi-Additive Measures, Unary Operators, Custom Rollups And The Calculate() Statement


Last week I was doing some performance tuning on SSAS Multidimensional and saw something very strange: when the MDX Script of the cube was executed (as always happens after the cache has been cleared, or processing has taken place – you can monitor this via the Execute MDX Script Begin/End events in Profiler) the Calculate() statement was taking just over four seconds. I’d never seen this before so I asked the nice people on the SSAS dev team what was going on, and Akshai Mirchandani very kindly filled me in on the details.

There are two types of calculation on an SSAS cube: those explicitly defined in the MDX Script (i.e. those seen on the Calculations tab of the cube editor); and semi-additive measures, unary operators and custom rollups, which are defined in the model itself. This second type of calculation is added to the cube when the Calculate() statement fires, and the more of them there are the longer it takes SSAS to work out where they should be applied in the space of the cube. In my customer’s case there were several large (80,000+ member) parent/child hierarchies with unary operators, as well as a few semi-additive measures, and this was the reason why Calculate() was so slow. Up to now I had only known that Calculate() triggers the aggregation of data up through the cube, which is why if you delete it the cube seems to contain no data.
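For context, every cube’s MDX Script contains the Calculate() statement, usually as its first statement; a freshly created cube’s script typically contains little more than the following (a generic sketch, not my customer’s script):

/* The CALCULATE command controls the aggregation of leaf cells in the cube.
   If it is deleted or modified, the data within the cube is affected. */
CALCULATE;

Any calculations you add on the Calculations tab appear after this statement, while the semi-additive measure, unary operator and custom rollup calculations described above are added automatically when Calculate() executes.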

Written by Chris Webb

December 16, 2013 at 4:05 pm

My Online MDX Training Course Is Now Live!


Recently I spent a few days with Rafal Lukawiecki recording videos of my MDX training course for Project Botticelli. I’m now pleased to announce that the first two videos from the course (in addition to the free video I recorded last year) are now live, with more to follow soon. You can find the main course page here; the two videos available as of today are:

The MDX SELECT Statement

This video covers writing simple MDX queries and looks at the SELECT clause, the FROM clause and the WHERE clause. The cool thing is that it’s free to view – although you will need to register on the site first.


MDX Calculated Members

This video covers the basics of creating calculated members in MDX in the WITH clause and using the CREATE MEMBER statement. It’s available to subscribers only.


Apart from my course there’s a lot of other great Microsoft BI video training available via Project Botticelli, including several DAX videos by my old friends Marco Russo and Alberto Ferrari. Subscriptions to the site are very reasonably priced, but if you register before the end of December 2013 you can get a 20% discount by using the following promotion code:

TECHNITRAIN20HOLS2013

Of course, if you prefer your training in a classroom, you can always attend one of my Technitrain courses in London next year.

Written by Chris Webb

December 4, 2013 at 10:08 pm

BI Survey 13 Results


As in previous years, in return for promoting their survey the people at BARC have given me a free copy of the latest BI Survey – always an interesting read. I saw a story in the press last week about adoption of mobile BI slowing based on research from the BI Survey, but what does it have to say about Analysis Services? To be honest, much the same as in previous years: it’s a good, cheap, reliable, mature solution. A few points that caught my eye:

  • 11% of SSAS users are still using ProClarity as their front-end. That’s down from 22% two years ago, but still… I guess that the likes of Pyramid Analytics, which specialises in ProClarity migration, will be pleased to hear that.
  • 29% of SSAS users are using Power View. That’s a big surprise for me – I haven’t seen it at any of my customers yet. Presumably this must be the SharePoint version of Power View going against SSAS Tabular.
  • I found the ‘cost per seat’ table (calculated as license fees plus external implementation cost, divided by the number of deployed seats) particularly interesting: SSAS comes out with a score of $1,111, about a quarter of the way from the bottom; Qliktech comes in at $1,499 and Tableau at $1,039. In general, self-service BI tools don’t appear to cost any less to implement overall than traditional corporate BI tools.
  • SSAS has a very respectable win-rate of 64% in competitive evaluations, although this is declining over time (it has to be said that almost all other large vendors are showing a decline too). Again, compare this with a win-rate of 34% for Tableau and 37% for Qliktech.
  • Sadly there’s no mention of PowerPivot anywhere. It would have been good to know how it’s doing…

Written by Chris Webb

December 2, 2013 at 2:17 pm

Posted in Analysis Services

Caching The Rows Returned By An MDX Query


Here’s another tip for those of you struggling with the performance of SSRS reports that run on top of an Analysis Services Multidimensional cube. SSRS reports often require quite complex set expressions on the rows axis of an MDX query, and one of the weaknesses of SSAS is that, while it can (usually) cache the values of cells returned by a query, it can’t cache the structure of the cellset returned by the query. What does this mean exactly? Well, consider the following query:

SELECT
{[Measures].[Internet Sales Amount]} ON 0,
NONEMPTY(
GENERATE(
[Date].[Calendar].[Month].MEMBERS,
{[Date].[Calendar].CURRENTMEMBER}
*
HEAD(ORDER([Customer].[Customer].[Customer].MEMBERS,
[Measures].[Internet Sales Amount],
BDESC),2)
),
[Measures].[Internet Sales Amount])
ON 1
FROM [Adventure Works]
WHERE([Product].[Category].&[3])

Here I’m taking every month on the Calendar hierarchy of the Date dimension and finding the top two customers by Internet Sales Amount for each Month; notice also that I’m slicing the query by a Product Category. The results look like this:

[Screenshot: query results showing the top two customers by Internet Sales Amount for each month]

On my laptop this query takes just over three seconds to run, however many times you run it (and yes, I know there are other ways this query could be optimised, but let’s imagine this is a query that can’t be optimised). The reason it is consistently slow is that the vast majority of the query time goes on evaluating the set used on rows – even when the Storage Engine has cached the values for Internet Sales Amount for all combinations of month and customer, it still takes the Formula Engine a long time to find the top two customers for each month. Unfortunately, once the set of rows has been found it is discarded, and the next time the query is run it has to be re-evaluated.

How can we improve this? SSAS can’t cache the results of a set used on an axis in a query, but it can cache the result of a calculated measure; calculated measures can return strings, and those strings can contain representations of sets. Therefore, if you go into Visual Studio and add the following calculated measure to the MDX Script of the cube, on the Calculations tab of the Cube Editor:

CREATE MEMBER CURRENTCUBE.MEASURES.REPORTROWS AS
SETTOSTR(
NONEMPTY(
GENERATE(
[Date].[Calendar].[Month].MEMBERS,
{[Date].[Calendar].CURRENTMEMBER}
*
HEAD(ORDER([Customer].[Customer].[Customer].MEMBERS,
[Measures].[Internet Sales Amount],
BDESC),2)
),
[Measures].[Internet Sales Amount])
);

You can then use this calculated measure in your query as follows:

SELECT
{[Measures].[Internet Sales Amount]} ON 0,
STRTOSET(MEASURES.REPORTROWS)
ON 1
FROM [Adventure Works]
WHERE([Product].[Category].&[3])

Having done this, on my laptop the query is just as slow as before the first time it is run, but on subsequent executions it returns almost instantly. This is because, the first time the query is run, the set expression used on rows is evaluated inside the calculated measure ReportRows and then turned into a string using the SetToStr() function; this string is returned on the rows axis of the query and converted back to a set using the StrToSet() function. The second time the query is run the string returned by the ReportRows measure has already been cached by the Formula Engine, which explains why it is so fast.

Couldn’t I have used a static named set declared on the cube to do this instead? I could, if I knew that the Where clause of the query would never change, but if I wanted to change the slice and look at a different Product Category I would expect to see a different set of rows displayed. While in theory I could create one gigantic named set containing every set of rows that ever might need to be displayed and then display the appropriate subset based on what’s present in the Where clause, this set could take a very long time to evaluate and thus cause performance problems elsewhere. The beauty of the calculated measure approach is that if you change the Where clause the calculated measure will cache a new result for the new context.
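For comparison, a static named set version of the same logic would look something like this (my own sketch of the approach I’m rejecting; the set name ReportRowsSet is made up for illustration):

CREATE SET CURRENTCUBE.[ReportRowsSet] AS
NONEMPTY(
GENERATE(
[Date].[Calendar].[Month].MEMBERS,
{[Date].[Calendar].CURRENTMEMBER}
*
HEAD(ORDER([Customer].[Customer].[Customer].MEMBERS,
[Measures].[Internet Sales Amount],
BDESC),2)
),
[Measures].[Internet Sales Amount]);

Because a static named set is evaluated just once, when the MDX Script executes, it cannot react to whatever ends up in the Where clause of the query that uses it – which is exactly the problem described above.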

There are some things to watch out for if you use the calculated measure technique, however:

  • It relies on Formula Engine caching to work. That’s why I declared the calculated measure on the cube – it won’t work if the calculated measure is declared in the WITH clause. There are a lot of other things that you can do that will prevent the Formula Engine cache from working too, such as declaring any other calculated members in the WITH clause, using subselects in your query (unless you have SSAS 2012 SP1 CU4), using non-deterministic functions and so on.
  • Remember also that users who are members of different roles can’t share formula engine caches, so if you have a lot of roles then the effectiveness of this technique will be reduced.
  • There is a limit to the size of strings that SSAS calculated measures can return, and you may hit that limit if your set is large. In my opinion an SSRS report should never return more than a few hundred rows at most for the sake of usability, but I know that in the real world customers do love to run gigantic reports…
  • There is also a limit to the size of the Formula Engine flat cache (the cache that is being used here), which is 10% of the TotalMemoryLimit. I guess it is possible that if you run a lot of different queries you could hit this limit, and if you do then the flat cache is completely emptied.

Written by Chris Webb

November 4, 2013 at 9:30 am

Problems With Calculated Members And Level Names


Here’s something strange I came across today: a customer had created a calculated member, not on the measures dimension but on another hierarchy, and even though it had been selected in their client tool it wasn’t appearing in the query results. I tested in Excel and saw a strange error message. I was mystified, but after a bit of thought I found out what was going on…

Take the following query in Adventure Works:

WITH
MEMBER [Customer].[Gender].[CALC] AS 123
SELECT {[Measures].[Internet Sales Amount]} ON 0,
{[Customer].[Gender].&[F],
[Customer].[Gender].&[M],
[Customer].[Gender].[CALC]}
ON 1
FROM [Adventure Works]

When you run it, you get the results you would expect:

[Screenshot: query results showing Female, Male and the CALC calculated member on rows]

Now, if you change the calculated member name, you’ll see the problem that my customer was running into:

WITH
MEMBER [Customer].[Gender].[Gender] AS 123
SELECT {[Measures].[Internet Sales Amount]} ON 0,
{[Customer].[Gender].&[F],
[Customer].[Gender].&[M],
[Customer].[Gender].[Gender]}
ON 1
FROM [Adventure Works]

[Screenshot: query results showing Female and Male repeated twice]

Instead of the calculated member, you now see Female and Male repeated twice. The reason why this is happening is down to how SSAS interprets the expression [Customer].[Gender].[Gender]. It’s the name of the calculated member that’s been defined, but it’s also the unique name of the Gender level on the Gender hierarchy of the Customer dimension:

[Screenshot: the Gender level on the Gender hierarchy of the Customer dimension]

As I explained in this blog post, when SSAS sees the unique name of a level it sticks the .MEMBERS function on the end and this returns the set of all members on the Gender level of the Gender hierarchy – and not the calculated member.
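To put it another way (my own illustration of this behaviour, not a query from the original post), the set on the rows axis of the second query above is effectively interpreted as:

{[Customer].[Gender].&[F],
[Customer].[Gender].&[M],
[Customer].[Gender].[Gender].MEMBERS}

…and since the MEMBERS function only returns real members, never calculated members, the calculated member does not appear.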

Of course, if you rewrite the query to use the ALLMEMBERS function as follows:

WITH
MEMBER [Customer].[Gender].[Gender] AS 123
SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Customer].[Gender].ALLMEMBERS
ON 1
FROM [Adventure Works]

…you’ll see the calculated member returned:

[Screenshot: query results now including the calculated member]

So, the moral of this post is don’t create calculated members that have the same name as a level on a hierarchy. It’s unlikely that you’ll do this, but possible – in my customer’s case they had a hierarchy on a time utility dimension called “Periods To Date” and a calculated member on that hierarchy with the same name.

Written by Chris Webb

September 6, 2013 at 10:04 pm

Subselects And Formula Engine Caching In SSAS 2012


One of the many interesting things that caught my eye in the new SSAS Tabular Performance Tuning white paper is actually about new functionality in SSAS 2012 Multidimensional and has nothing to do with Tabular! It turns out that in the DAXMD release of SSAS 2012, i.e. SQL Server 2012 SP1 CU4, some work was done to enable the Formula Engine to cache the results of MDX calculations for longer than the lifetime of a query when a query includes a subselect. Here’s an excerpt from the paper:

Certain types of queries cannot benefit from MDX caching. For example, multi-select filters (represented in the MDX query with a subselect) prevent use of the global MDX formula engine cache. However, this limitation was lifted for most subselect scenarios as part of SQL Server 2012 SP1 CU4 (http://support.microsoft.com/kb/2833645/en-us). However, subselect global scope caching is still prevented for arbitrary shapes (http://blog.kejser.org/2006/11/16/arbitrary-shapes-in-as-2005/), transient calculations like NOW(), and queries without a dimension on rows or columns. If the business requirements for the report do not require visual totals and caching is not occurring for your query, consider changing the query to use a set in the WHERE clause instead of a subselect as this enables better caching in some scenarios.
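To make that last suggestion concrete, here’s a minimal pair of queries against Adventure Works (my own sketch, not taken from the white paper). The first uses a subselect, the second moves the same filter into a set in the WHERE clause; the second form does not give you visual totals, but it can make better use of the global Formula Engine cache:

//Subselect (multi-select filter) form
SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS ON 1
FROM (SELECT {[Product].[Category].&[1], [Product].[Category].&[3]} ON 0
FROM [Adventure Works])

//WHERE-clause set form
SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]
WHERE({[Product].[Category].&[1], [Product].[Category].&[3]})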

This is a subject I’ve blogged about in the past, both for SSRS reports (which almost always use subselects in the MDX created by the query editor) and Excel PivotTables (which sometimes, but not always, use subselects), and you may want to read those posts to get some background. In my experience, if you have a lot of complex MDX calculations on your cube (financial applications are a great example), this issue can have a major impact on your overall query performance, even if it isn’t immediately obvious that this is the case. On builds of SSAS before 2012 SP1 CU4, even if Storage Engine caching is working properly, if a query references a lot of MDX calculations and includes a subselect it will be consistently slow however many times you run it, because the calculations will need to be re-evaluated every time the query is run.

I’ve heard of a few problems with CU4 regarding SSRS so I don’t recommend upgrading your production SSAS servers just yet, but when these problems have been ironed out in the next full service pack I think this could be a compelling reason for many people to move to SSAS 2012. There’s also still a limitation whereby queries that return a single cell value and use a subselect may still not be able to use the global Formula Engine cache, but hopefully this will be dealt with in a future release too. Overall, though, I’m extremely pleased to see yet another improvement to the Multidimensional engine.

Thanks to Jeffrey Wang for answering my questions about this functionality.

Written by Chris Webb

August 7, 2013 at 12:10 am

SSAS Tabular Performance Tuning White Paper


In case you haven’t already heard via Twitter, a new white paper on performance tuning Analysis Services 2012 Tabular models was released at the end of last week. You can download it here:

http://msdn.microsoft.com/en-us/library/dn393915.aspx

It’s good stuff, required reading for anyone working with SSAS Tabular. Kudos to the authors, John Sirmon, Greg Galloway, Cindy Gross and Karan Gulati!

Written by Chris Webb

August 4, 2013 at 11:31 pm

Ordering Of Named Sets In Excel


A bit of an obscure one, this, but it’s come up twice this week so it’s worth mentioning. When you define a named set on your SSAS Multidimensional cube, Excel doesn’t respect the order of items in that set by default when you use it in a PivotTable. Consider the following named set defined on the Adventure Works cube (on the Calculations tab of the cube, not defined in Excel itself):

CREATE SET [MY COUNTRIES] AS
{[Customer].[Country].&[France], [Customer].[Country].&[Canada], [Customer].[Country].&[Australia]};

Note that the countries are in the order France, Canada, Australia. When you use this named set in Excel, this order is overridden and the countries come out in hierarchy order, that’s to say the order that they appear on the Country hierarchy: Australia, Canada, France.

[Screenshots: PivotTable using the named set, with the countries shown in hierarchy order: Australia, Canada, France]

How can you stop this? After all, in a lot of cases the order of members in a named set is important. If you have Excel 2010 or Excel 2013 (I believe this option isn’t available in Excel 2007), you need to click on the name of the set in the PivotTable Field List pane and select Field Settings:

[Screenshot: selecting Field Settings for the named set in the PivotTable Field List]

Then in the Field Settings dialog go to the Layout and Print tab and uncheck the “Automatically order and remove duplicates from the set” option:

[Screenshot: the Layout & Print tab of the Field Settings dialog]

When you do that, the order of your set is respected:

[Screenshot: PivotTable showing the countries in the original set order: France, Canada, Australia]

Written by Chris Webb

July 31, 2013 at 2:12 pm
