Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Analysis Services’ Category

SSAS on Windows Azure Virtual Machines

with 6 comments

You may have already seen the announcement about Windows Azure Virtual Machines today; what isn’t immediately clear (thanks to Teo Lachev for the link) is that Analysis Services 2012 Multidimensional and Reporting Services are installed on the new SQL Server images. For more details, see:
http://msdn.microsoft.com/en-us/library/jj992719.aspx

SSAS 2012 Tabular is also supported but not initially installed.

Written by Chris Webb

April 16, 2013 at 9:13 pm

UK/US Date Format Bug in PowerPivot and SSAS Tabular

with one comment

I don’t usually blog about bugs, but this one has been irritating me no end for the last year – so I thought it deserved some publicity…

In Excel 2010 PowerPivot and and in SSAS 2012 Tabular models (but not the Excel 2013 Data Model interestingly), if you have an English locale that is not US English (eg UK or Australian English), you may find that date columns appear to be formatted correctly as dd/mm/yyyy inside the PowerPivot window or in SSDT, but when you get to Excel you see the dates formatted in the US mm/dd/yyyy format. So, for example, on my laptop if I import the DimDate table from Adventure Works into Excel 2010 then I see dates formatted as dd/mm/yyyy as I’ve specified in the Formatting section of the ribbon in the PowerPivot window:

image

image

However, in an Excel PivotTable, I see dates formatted as mm/dd/yyyy:

image

There is a workaround though, which I found on the PowerPivot forum (thank you Steve Johnson, if you’re reading) – you can get the dates to format correctly if you go to More Date Formats and choose dd/MM/yy or one of the other formats from the dropdown list that appears:

image

image

Here are the correctly formatted dates in a PivotTable:

image

It seems like there is already a Connect open on this issue here, so please vote to get it fixed!

Written by Chris Webb

March 21, 2013 at 9:30 am

The Dangers of Non_Empty_Behavior

with 5 comments

One thing I see quite often when I’m reviewing MDX is the incorrect use of the Non_Empty_Behavior property. 99% of the time it’s set without the developer really knowing what it does, and luckily in most cases it makes no difference to performance or what the calculation returns. However… in this post I’m going to argue that you should not use it unless you really know what you’re doing and you can prove that it’s beneficial – because if it’s set incorrectly it can sometimes mean you get incorrect results from your queries.

First of all, what does Non_Empty_Behavior actually do? Basically, it’s a hint to the SSAS Formula Engine that says that a calculation will return a null value if a given tuple also returns a null value. Moreover it’s a hint that may or may not be ignored. For example, if you had an MDX calculated measure that was defined as

CREATE MEMBER CURRENTCUBE.[Measures].[MyMeasure] as [Measures].[Internet Sales Amount]*2;

It would be true to say that this calculated measure would always return a null value when the real measure [Measures].[Internet Sales Amount] returned null, so you would be able to say:

CREATE MEMBER CURRENTCUBE.[Measures].[MyMeasure] as [Measures].[Internet Sales Amount]*2
, NON_EMPTY_BEHAVIOR=[Measures].[Internet Sales Amount];

The late, great Mosha covered its use in detail in a number of blog posts from a few years ago:
http://sqlblog.com/blogs/mosha/archive/2006/11/05/budget-variance-a-study-of-mdx-optimizations-evaluation-modes-and-non-empty-behavior.aspx
http://sqlblog.com/blogs/mosha/archive/2007/03/06/averages-ratios-division-by-zero-and-non-empty-behavior.aspx

These posts all refer to SSAS 2005, and from SSAS 2008 onwards a lot of work went on to ensure that most of the scenarios where Non_Empty_Behavior had been necessary to get good performance were optimised away. Certainly, in the simple example I showed above there would be no need to set Non_Empty_Behavior at all – the Formula Engine will already be able to optimise it as much as it can. This means that if you’re on SSAS 2005 then Non_Empty_Behavior can be very useful for tuning calculations, but if you are using SSAS 2008 or greater it’s highly likely that you do not need to use Non_Empty_Behavior at all. In fact, Mosha’s advice back in 2009 when he did a preconference seminar at the PASS Summit on SSAS performance tuning was summed up in these two slides:

image

image

I think the message here is pretty clear…

So, it probably won’t help your query performance, but why is it dangerous? Consider the following query:

WITH
MEMBER MEASURES.SomeTest AS
IIF(Now()>0, "Internet", "Reseller")
MEMBER MEASURES.Demo as
STRTOMEMBER("[Measures].[" + measures.sometest + " Sales Amount]")
SELECT
{[Measures].Demo} 
ON 0,
NON EMPTY
[Product].[Category].[Category].members
*
[Date].[Date].[Date].members
ON 1
FROM [Adventure Works]

This returns 1887 rows. However, if I set Non_Empty_Behavior incorrectly, like so:

WITH
MEMBER MEASURES.SomeTest AS
IIF(Now()>0, "Internet", "Reseller")
MEMBER MEASURES.Demo as
STRTOMEMBER("[Measures].[" + measures.sometest + " Sales Amount]")
, NON_EMPTY_BEHAVIOR=[Measures].[Reseller Sales Amount]
SELECT
{[Measures].Demo}
ON 0,
NON EMPTY
[Product].[Category].[Category].members
*
[Date].[Date].[Date].members
ON 1
FROM [Adventure Works]

The query now returns only 64 rows! By saying that the calculation will return a null value when Reseller Sales Amount is null – which is not the case – then rows that do in fact have values will get filtered out by the non empty clause. Clearly this is a bad thing, and also one that may be very hard to see happening when you are testing.

The example above was quite obviously wrong, and in fact it’s quite hard on SSAS 2012 to find a query where Non_Empty_Behavior returns incorrect results. More often I see calculations like this:

WITH
MEMBER MEASURES.PreviousSales AS
([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER)
, NON_EMPTY_BEHAVIOR=[Measures].[Internet Sales Amount]
SELECT {[Measures].[Internet Sales Amount], measures.PreviousSales} ON 0,
[Date].[Calendar].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]

…where Non_Empty_Behavior is set incorrectly (the calculation returns the Internet Sales Amount for the previous time period, and it’s not true to say that the previous time period’s sales will be null if the current time period’s sales are null) but the query still returns the correct results. This is because, luckily, Non_Empty_Behavior is being ignored.

So, to sum up, if you are using SSAS 2008 or greater Non_Empty_Behavior may not solve your performance problem and may result in incorrect results. That is not to say that there are still one or two isolated scenarios where it can still be useful. Jorg Klein came across one fairly recently and blogged about it here:
http://sqlblog.com/blogs/jorg_klein/archive/2013/02/18/implement-ssas-md-cell-security-using-dimension-security-with-blazing-performance.aspx

But these scenarios are few and far between. Therefore don’t use it if you’re not sure what it does!

Written by Chris Webb

March 11, 2013 at 1:30 pm

Counting Consistent Customers in MDX

with 3 comments

The post I wrote a few years ago on counting new and returning customers in MDX has proved to be one of the most popular here; it’s certainly a topic I’ve returned to a number of times for DAX, and other people (most recently Gerhard Brueckl) seem equally interested in solving this problem. However I had a comment from Sohrab Niramwalla yesterday which raised an interesting question: how do you could the number of customers who have bought from you in every time period from the beginning of time until the current date?

The Customer dimension in Adventure Works isn’t actually very good for illustrating this because customers very rarely buy more than once, but if you look at Countries then you can recreate the scenario. Consider the following query:

select
{[Measures].[Internet Sales Amount]} *
[Customer].[Country].[Country].members on 0,
[Date].[Date].[Date].members
on 1
from [Adventure Works]

image

From the screenshot you can see that on July 1st 2001 there were purchases in 4 out of 6 countries; of those 4 countries, only 2 (Australia and USA) had purchases on July 2nd; both of those had purchases on July 3rd; but by July 4th only USA had seen purchases on every day since the beginning of time.

How then is is possible to get this count of the number of countries that have seen purchases since the beginning of time? I can think of a few approaches. One would involve using recursion and strtoset/settostr, but I generally like to avoid recursion and strtoset because performance can be unpredictable and bad. Probably the best way is to think of the problem like this: if a country has had sales on every day since the beginning of time, then count of days that had sales since the beginning of time will be the same as the total count of days since the beginning of time. Therefore, you can write a query something like this:

with
–count the number of days since the beginning of time
member measures.daycount as
count(null:[Date].[Date].currentmember)

–count the number of non empty days for sales since the beginning of time
member measures.nonemptydaycount as
count(
nonempty(
null:[Date].[Date].currentmember
, [Measures].[Internet Sales Amount])
)

–count the number of countries that have nonemptydaycount = daycount
member measures.[Consistent Customers] as
count(
filter(
[Customer].[Country].[Country].members
, measures.nonemptydaycount = measures.daycount)
)
select measures.[Consistent Customers] on 0,
head([Date].[Date].[Date].members, 15) on 1
from [Adventure Works]

image

Mosha’s technique for optimising count(filter()) calculations might also be useful here, although I have to admit I’ve not tested it to see if it does improve performance.

However, for those of you who are fans of obscure MDX, here’s another solution:

with
member measures.[Consistent Customers] as
count(
–iterate over every date since the beginning of time
generate(
{null : {{[Date].[Date].currentmember} as currentdate}.item(0)}
,
{
–if the current date in the iteration is the first date
iif([Date].[Date].currentmember is [Date].[Date].[Date].members.item(0)
,
–then define the inline named set customerset as
–all the nonempty customers
intersect(
nonempty(
[Customer].[Country].[Country].members
, [Measures].[Internet Sales Amount]) as customerset
, {})
,
–else, redefine the set customerset as the intersection of
–customerset and the nonempty countries in the current time period
intersect(
intersect(
customerset
, nonempty(
[Customer].[Country].[Country].members
, [Measures].[Internet Sales Amount])
) as customerset
, {})
)
–note that the intersect function is used to ensure only an empty
–set is ever returned from these expressions
,
–if the current date in the iteration is the current time period
–ie we are at the final iteration
iif([Date].[Date].currentmember is currentdate.item(0)
–then return the contents of the named set customerset
, customerset
, {})
}
))

select {measures.[Consistent Customers]} on 0,
head(
[Date].[Date].[Date].members
, 15)
on 1
from [Adventure Works]

 

It uses the generate() function to loop over ever date from the beginning of time to the current time period, and then redefines an inline named set (called customerset) on every step of the iteration to find the non empty countries. It doesn’t perform as well as the previous solution in this particular case, but if there were more countries that might change; a few tweaks to the code might also speed it up. I thought it was worth mentioning, though, for the novelty value.

Written by Chris Webb

February 24, 2013 at 9:25 pm

Posted in Analysis Services, MDX

Validating MDX Queries Without Running Them

with 3 comments

Here’s something interesting I came across while looking over some old documentation the other day: a way of checking whether an MDX query is syntactically correct without actually having to run it. You can do this by setting the Content connection string property. The default value for this property is:

Content=SchemaData

And this runs your queries as normal. For example, take the following query on the Adventure Works cube:

select {[Measures].[Internet Sales Amount]} on 0,
[Date].[Calendar Year].members on 1
from [Adventure Works]

On a cold cache I can see lots of activity on Profiler when it’s run, as I’d expect:

image

However, with the connection string set as follows (see here for how to do this in SQL Server Management Studio; but beware – this bug is still around in 2012) :

Content=Schema

When I rerun the query on a cold cache I can see the MDX Script being evaluated but nothing happening for the query:

image

In SQL Server Management Studio the query is executed successfully but no results are returned; notice that in the Resource Usage event all the values are zero too.

If I modify the query to include an error, however:

select {[Measures].[Internet Sales Amount]} on 0,
blah blah blah
[Date].[Calendar Year].members on 1
from [Adventure Works]

I do see an error in SQL Server Management Studio:

image

This functionality could be useful in situations where you wanted to test the syntax of an MDX query or indeed just a calculation – it would allow you to do this without actually running the query and then killing it (and some queries don’t die immediately when they’re cancelled, as you might know).

Written by Chris Webb

January 29, 2013 at 11:53 am

Posted in Analysis Services, MDX

Technitrain 2013 Course Schedule: SSAS, PowerPivot, MDX, SSIS, TSQL and SQL Server Engine Training in London

with one comment

As you probably know, when I’m not blogging or wrestling with SSAS I run a small SQL Server training company here in the UK called Technitrain. If you’ve got some training budget to spare and London is convenient for you, you might want to check out the 2013 course schedule below:

As you can see, I’m doing a fair bit of teaching myself but I’m also proud to have Thomas Kejser, Allan Mitchell, Dave Ballantyne, Bob Phillips and Andy Leonard teaching courses for me too. The aim is to deliver reasonably-priced, expert-level Microsoft BI and SQL Server training of the sort the regular training companies don’t provide; it’s basically the kind of training you’d get at a SQLBits or SQL Saturday pre-con but in a more traditional classroom environment and not limited to one day. If you’re reading my blog, you’re my target audience for these courses – so I hope to see you at one of them this year!

Written by Chris Webb

January 7, 2013 at 11:14 pm

Unnecessary All Members and Performance Problems

with 4 comments

Maybe an obscure problem, this one, but worth recording nonetheless. The other week I was performance tuning some queries on a customer’s SSAS 2008R2 instance and came across a very strange issue related to the presence of unnecessary All Members in tuples. In this case it was in machine-generated MDX but it’s certainly the case the people new to MDX often include All Members in tuples when they are not actually needed; it’s a not good idea to do this because it can sometimes have unexpected effects as a result of attribute overwrite and because, as I found, it can also cause severe performance problems too.

The problem can be reproduced very easily against Adventure Works on the Customer dimension. Consider the following query that returns a list of customers who bought more than $1000 of goods in 2003:

with
set filteredcustomers as
filter(
[Customer].[Customer Geography].[Customer].members
, ([Measures].[Internet Sales Amount]
, [Date].[Calendar Year].&[2003])>1000)
select
{}
on columns,
filteredcustomers 
on rows
from [Adventure Works]

Pretty straightforward, and it returns instantly on my laptop as I’d expect. However, adding the All Member from the City hierarchy into the tuple used in the filter() function makes the query run very slowly indeed (I killed it after several minutes):

with
set filteredcustomers as
filter(
[Customer].[Customer Geography].[Customer].members
, ([Measures].[Internet Sales Amount]
, [Customer].[City].[All Customers]
, [Date].[Calendar Year].&[2003])>1000)
select
{}
on columns,
filteredcustomers 
on rows
from [Adventure Works]

The All Member here isn’t necessary at all; it won’t affect how the filter works or the set returned at all. Looking in Profiler it seems as though its presence triggers cell-by-cell mode, which is the cause of the awful performance. Interestingly, the performance got worse the more attributes were on the hierarchy – deleting attributes, even when they weren’t used in the query, improved query performance. I’m told the problem could be the result of attribute decoding (which Mosha referred to here, but which I don’t know much else about) as a result of attribute overwrite

Anyway in my case it wasn’t possible to change the MDX because it was being generated by a client tool – the All Member was there because the City hierarchy was being used as a parameter in the query, although in this case nothing had been selected on it. There was a workaround that I found though: it turns out the problem does not occur for user hierarchies that include the key attribute as their lowest level. So, I renamed the City attribute, hid it, and then created a new user hierarchy called City that had Customer as its lowest level:

image

With this done, both of the queries above return instantly.

Written by Chris Webb

December 22, 2012 at 2:52 pm

Send Your Feedback to the SSAS Dev Team!

with one comment

It’s been all over Twitter today and Kasper has already blogged about it, but I thought this was worth a blog post from me all the same: the SSAS dev team are looking for feedback on features for the next version of SSAS and have put together a survey here:

http://www.instant.ly/s/Wqdj4mEAIAA

It’s not a list of features that will definitely be delivered, and doesn’t cover everything they’re thinking about, but it’s a way to help them prioritise some features over others and it should take no more than 20 minutes to complete, so why not help them out?

Written by Chris Webb

November 29, 2012 at 5:16 pm

Posted in Analysis Services

Storage Engine Caching, Measures and Measure Groups

with 6 comments

I’ve been doing some performance tuning work on SSAS Multidimensional recently that has forced me to look at some behaviour I’ve observed several times but never properly understood: what happens with Storage Engine caching when you are querying multiple measures in the same measure group. Here are some of my findings (thanks, as always, to Akshai and Marius for answering my questions on this) although this post only deals with a few basic scenarios…

Consider the following, quite basic cube built from Adventure Works. It has one measure group and two measures, Sales Amount and Tax Amount, that both have AggregateFunction Sum:

image

And a single Date dimension with the following attribute relationships:

image

If I run a Profiler trace, clear the cache and run the following query twice:

SELECT
{[Measures].[Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

I can see that the first time the query is run it doesn’t hit cache, and the second time the query (in the second red box below) is run it does hit the Storage Engine cache:

image

This is as you’d expect. However, now look what happens when I run a query that returns the Tax Amount measure – which was not in the original query – without clearing the cache:

SELECT
{[Measures].[Tax Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

image

Even though this is the first time I’ve queried for this measure since the cache was cleared, this query still hits the cache. This is because when you query for one measure, the SSAS Storage Engine will retrieve data for all other measures in the same measure group for the granularity of data requested.

This means that the AggregateFunction property of a measure is significant here. If I add a new measure to the cube with AggregateFunction set to Count instead of Sum:

image

I see the same thing happening, ie queries that request data for Sales Amount or Tax Amount also warm the SE cache with values for Internet Sales Count. This is because a query for Internet Sales Count can be answered with data of the same granularity as a query for Sales Amount. However, if I add a new measure called Last Sales Amount with AggregateFunction Last Non-Empty:

image

And then clear the cache, and run the two following queries one after the other:

SELECT
{[Measures].[Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

SELECT
{[Measures].[Last Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

I can see that the first query does not warm the cache for the second query – both queries go to disk:

image

Why is this happening? Why isn’t the cache being used? A clue lies in the Query Subcube Verbose event for both queries. For the first query, using Sales Amount, the following granularity of data is being requested:

Dimension 0 [Date] (0 0 0 *)  [Date]:0  [Month]:0  [Quarter]:0  [Year]:*

Whereas the second query, using Last Sales Amount, requests this granularity:

Dimension 0 [Date] (* 0 0 *)  [Date]:*  [Month]:0  [Quarter]:0  [Year]:*

Both queries have Years on rows, but because Last Sales Amount is semi-additive the values returned are actually from the Year and Date granularity. So, when the semi-additive measure is requested in the second query the data needed for it is not in the Storage Engine cache: the first query requested data at the Year granularity only.

From what I understand, the logic governing this behaviour is very complex and the exact query plan that gets generated will depend on the overall design of your cube, the AggregateFunction used for the measures in each measure group (measures with measure expressions are going to work in a similar way to semi-additive measures) and the queries you’re running. However it is useful to be aware of this kind of behaviour when designing and tuning SSAS cubes. For example, it could be that if you have a large number of measures (tens or even a hundred) in the same measure group it could be worth splitting them out into separate measure groups to improve performance, especially if some measures are never queried together – you would need to test this thoroughly first though. This behaviour would also be relevant in cases where you’re designing aggregations manually.

Written by Chris Webb

November 27, 2012 at 1:53 pm

Power View on SSAS Multidimensional

with 6 comments

Earlier this week I mentioned that the announcement about Power View working on SSAS Multidimensional had leaked out. There was a full session on it yesterday giving all the details and I thought I’d summarise them here for your enjoyment. Here are the main points:

  • This has not shipped yet – it is not in SSAS 2012 SP1. No release date had been announced but it sounds like it is coming very soon.
  • It will come as a new build of SSAS 2012, so to use Power View you will need to upgrade to that build and no earlier version of SSAS will work.
  • It will also require an update to Power View. This means:
    • Power View in Sharepoint (ie SSRS) will need to be updated too
    • Power View in Excel will still not work initially, even when the new build of SSAS has been released. We’ll have to wait for another update for Office (perhaps a service pack?) before Power View in Excel works on SSAS Multidimensional too.
  • In technical terms, what has happened is that SSAS Multidimensional now
    • Exposes Tabular metadata
    • Can be queried in DAX
  • There is no translation going on from DAX to MDX, SSAS Multidimensional supports DAX natively.
  • The consensus in the session room was that the SSAS team had done a really good job thinking through all the details of how this will work. In general your existing cube will not need to be redesigned, it will just work and all your queries and calculations will return the results you expect. In the session a lot of time was spent showing how things like default members will work.
  • But some things are not supported: if you have named sets or actions they will not be exposed; in some scenarios calculated members on non-measures dimensions will not be exposed either, but time utility dimensions should work; and cell security is not supported either – if a user is a member of a role that has cell security applied, they will not be able to run DAX queries.
  • There’s a minor new feature in SSAS that allows you to mark an attribute as containing a URL for an image, so that Power View can display the images automatically.

UPDATE: the public CTP is now available http://blogs.msdn.com/b/analysisservices/archive/2012/11/29/power-view-for-multidimensional-models-preview.aspx

Written by Chris Webb

November 9, 2012 at 6:20 pm

Follow

Get every new post delivered to your Inbox.

Join 1,800 other followers