Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for February 2012

Natural and Unnatural Hierarchies in the SSAS 2012 Tabular Model

with 13 comments

I’m sure you’re all aware of the difference between natural and unnatural user hierarchies in the Analysis Services Multidimensional model (if you’re confused as to what I mean when I say ‘Multidimensional model’, have a quick read of this post from a few weeks ago which explains the terminology). To recap, natural user hierarchies in Multidimensional look like this in BIDS:

[image: a natural user hierarchy in BIDS]

There is a one-to-many set of attribute relationships between each level, so each Calendar Year has multiple Calendar Semesters but one Calendar Semester has only one Calendar Year, and they are a Good Thing as far as query performance is concerned. Unnatural user hierarchies look like this:

[image: an unnatural user hierarchy in BIDS]

They work, but there aren’t one-to-many relationships between every level and query performance may be worse than with a natural user hierarchy.

But what does all this have to do with the Tabular model? In SSDT when you create a hierarchy there’s no indication as to whether it’s natural or unnatural:

[image: a hierarchy in the SSDT designer]

…and until recently I assumed that this was an issue that simply wasn’t relevant to Tabular. However, after a recent conversation with Marius Dumitru from the dev team and Greg Galloway, I now know this isn’t the case!

If you query the MDSCHEMA_HIERARCHIES DMV as follows:

SELECT
[DIMENSION_UNIQUE_NAME], [HIERARCHY_NAME], [STRUCTURE_TYPE]
FROM $SYSTEM.MDSCHEMA_HIERARCHIES

…you can see whether a hierarchy in a Tabular model is natural or unnatural:

[image: MDSCHEMA_HIERARCHIES query results showing STRUCTURE_TYPE]

In this case you can see that the Calendar hierarchy I created on the DimDate table is unnatural; SSAS determined this automatically during processing (specifically during the Process ReCalc stage) by examining the data itself.

It turns out that natural hierarchies in Tabular can result in faster query performance, because certain MDX and Formula Engine code paths in the SSAS engine are still not optimised for unnatural hierarchies. I don’t have any specific examples of when this occurs at the moment, but if I do find them I’ll be sure to update this post. And if anyone using Tabular, or PowerPivot (I assume this is relevant to PowerPivot too), finds a good example of how changing to a natural user hierarchy improves performance, please leave a comment.

In the example above, I created the Calendar hierarchy in the Tabular model by simply dragging the CalendarYear, CalendarSemester, CalendarQuarter, EnglishMonthName and FullDateAlternateKey columns underneath each other in the new hierarchy. It’s unnatural because there are only two distinct values in CalendarSemester (the semester numbers 1 and 2), four distinct values in CalendarQuarter (the quarter numbers 1 to 4) and only twelve distinct values in EnglishMonthName (the names of the months), so each value in these columns appears under many different parent values: the relationship between levels is many-to-many rather than one-to-many. I can make the hierarchy natural by creating three calculated columns that concatenate CalendarYear with CalendarSemester, CalendarQuarter and EnglishMonthName respectively, as follows:

DimDate[Calendar Semester of Year] = DimDate[CalendarYear] & " " & DimDate[CalendarSemester]

DimDate[Calendar Quarter of Year] = DimDate[CalendarYear] & " Q" & DimDate[CalendarQuarter]

DimDate[Calendar Month of Year] = DimDate[CalendarYear] & " " & DimDate[EnglishMonthName]
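To see why the concatenation trick works, here’s a quick Python sketch that tests whether each level of a hierarchy is in a one-to-many relationship with the level below it. To be clear, this is purely an illustration of the rule, not how the SSAS engine itself detects naturalness, and the data is a made-up two-year table:

```python
from collections import defaultdict

def is_natural(rows, levels):
    """Return True if, for every adjacent pair of levels (top to bottom),
    each value at the lower level maps to exactly one value at the level
    above it - i.e. the relationships are all one-to-many."""
    for parent, child in zip(levels, levels[1:]):
        parents_seen = defaultdict(set)
        for row in rows:
            parents_seen[row[child]].add(row[parent])
        if any(len(parents) > 1 for parents in parents_seen.values()):
            return False
    return True

# A tiny two-year date table
rows = [
    {"CalendarYear": y, "CalendarSemester": s,
     "SemesterOfYear": f"{y} {s}"}
    for y in (2011, 2012) for s in (1, 2)
]

# Semester numbers 1 and 2 repeat under both years -> unnatural
print(is_natural(rows, ["CalendarYear", "CalendarSemester"]))  # False

# The concatenated column is unique to its year -> natural
print(is_natural(rows, ["CalendarYear", "SemesterOfYear"]))    # True
```

The semester value 1 appears under both 2011 and 2012, which is exactly the many-to-many situation described above; “2011 1” can only ever belong to 2011.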

Using these calculated columns for the Semester, Quarter and Month levels of the hierarchy as follows:

[image: the hierarchy rebuilt using the calculated columns]

…will make the Calendar hierarchy natural:

[image: MDSCHEMA_HIERARCHIES query results showing the Calendar hierarchy as natural]

Written by Chris Webb

February 28, 2012 at 6:05 pm

PASS BI Virtual Chapter

leave a comment

Just to echo what Marco said in his blog post from last week, the PASS BI Virtual Chapter is currently looking for presenters, especially from Europe or European-friendly time zones. If you are interested please send some ideas for abstracts plus some information about yourself and your speaking experience to PASSDWBIVC@sqlpass.org. Presenting online is a great way to get speaking experience, especially if you are not able to attend conferences or user groups on a regular basis. We hope to hear from you soon!

Written by Chris Webb

February 27, 2012 at 11:27 am

Posted in PASS

Using the MDX Script Debugger in BIDS

with 4 comments

Almost every time I write MDX calculations, I end up using scoped assignments – it’s by far the easiest way to control where your calculations work inside a cube. However making sure that your scoped assignments work in the places that they’re meant to work and don’t overwrite each other can be very tricky indeed (for a brief introduction to the subject and these problems, see this session I gave at SQLBits a year or so ago), so in this post I’m going to show you how you can use the MDX Script Debugger in BIDS to help you do this.

Despite its name, the MDX Script Debugger doesn’t actually help you to debug individual MDX expressions or calculations. What it does is clear the MDX Script of the cube and then let you step through it, applying each statement one after the other so you can see the cumulative effect on a query. This is only really useful when you’re working with scoped assignments, because it allows you to see which cells in your query are changed by each successive assignment.

To illustrate, let’s use the Adventure Works cube. Comment out everything in the MDX Script (i.e. what’s on the Calculations tab) except the Calculate statement and add the following code to the bottom:

CREATE MEMBER CURRENTCUBE.MEASURES.DEMO AS 1;

SCOPE(MEASURES.DEMO);
    SCOPE([Date].[Date].MEMBERS, [Date].[Calendar Quarter].[Calendar Quarter].MEMBERS);
        THIS=2;
    END SCOPE;
    SCOPE([Date].[Calendar Year].[Calendar Year].MEMBERS);
        THIS=3;
    END SCOPE;
END SCOPE;

SCOPE([Measures].[Internet Sales Amount]);
    SCOPE([Date].[Date].MEMBERS, [Date].[Calendar Quarter].[Calendar Quarter].MEMBERS);
        THIS=2;
    END SCOPE;
    SCOPE([Date].[Calendar Year].[Calendar Year].MEMBERS);
        THIS=3;
    END SCOPE;
END SCOPE;

Then deploy your database, go to the Debug menu in BIDS and click Start Debugging:

[image: Start Debugging on the Debug menu]

The MDX Debugger screen will then be displayed:

[image: the MDX Debugger screen]

In the top pane you can see the Calculate statement highlighted – this is the point in the script that has been reached as you step through the code. In the bottom pane you have a browser control where you can construct a query plus four panes where you can enter your own hand-written MDX queries.

In the browser control, drag the Internet Sales Amount measure onto columns and the Calendar hierarchy from the Date dimension onto rows, until you see something like this:

[image: the browser control with Internet Sales Amount on columns and the Calendar hierarchy on rows]

You don’t see any data at the moment because the Calculate statement hasn’t been executed yet (see here for more details on what the Calculate statement does). If you hit F10 to step to the next statement in the MDX Script you’ll see the values for Internet Sales Amount appear because the Calculate statement has now been executed:

[image: values appearing after the Calculate statement has executed]

If you hit F10 again, the calculated member DEMO will be created and you can now drag it into the browser; at this point you’ll see it always returns the value 1 because none of the scoped assignments have been executed yet:

[image: the DEMO measure returning 1 everywhere]

Hit F10 again until you reach the first END SCOPE statement and you’ll see the following:

[image: the DEMO measure after the first scoped assignment]

You can see that MEASURES.DEMO now returns 2 at the Date, Month and Quarter levels as a result of this first assignment; you can also see that only the cells affected by this assignment have had their values changed. Hit F10 some more to execute the second assignment and you’ll see that DEMO returns 3 at the Year level, and the affected cells are again highlighted:

[image: the DEMO measure returning 3 at the Year level]

Notice how, in this case, because you’re scoping on a calculated measure only the cells you scoped on have their values changed. This is in contrast with scoped assignments on regular measures: because regular measures aggregate up, scoping on a regular measure not only affects the values in the cells you scoped on, those values are then also aggregated up through the cube.

To show what happens when you scope on a regular measure, look at the next set of scoped assignments, on the Internet Sales Amount measure. The first assignment scopes on the Date, Month and Quarter levels and sets their values to 2; the Year level values now show the aggregated totals of all the quarters in the year, so if there are four quarters in a year the year will show 4 * 2 = 8. The All level total is similarly affected.

[image: Internet Sales Amount after the first scoped assignment, with years showing aggregated quarter values]

The final assignment sets the Year level values to 3; this overwrites the values previously aggregated up from the Quarter level, and the Year level values are in turn aggregated up to the All level:

[image: Internet Sales Amount after the final scoped assignment]
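The arithmetic behind those last two steps can be sketched as a toy model in Python. This is only an illustration of the overwrite-then-aggregate behaviour described above, with made-up leaf values, not a model of how the Formula Engine actually evaluates anything:

```python
# Leaf grain of the toy cube: one Internet Sales Amount value
# per (year, quarter), with arbitrary starting values
leaves = {(year, q): 100 for year in (2011, 2012) for q in range(1, 5)}

def year_totals(leaves):
    """Aggregate leaf values up to the Year level, as a regular measure would."""
    totals = {}
    for (year, _), value in leaves.items():
        totals[year] = totals.get(year, 0) + value
    return totals

# First assignment: SCOPE on the Quarter level sets every leaf to 2...
for key in leaves:
    leaves[key] = 2

# ...and because a regular measure aggregates up, each year shows 4 * 2 = 8
print(year_totals(leaves))   # {2011: 8, 2012: 8}

# Second assignment: SCOPE on the Year level overwrites those aggregated totals
years = {year: 3 for year in year_totals(leaves)}

# The All level then aggregates the overwritten year values: 3 + 3 = 6
print(sum(years.values()))   # 6
```

The key point the model captures is the ordering: the later assignment wins at the Year level, and only then does aggregation to the All level take place.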

Hit F10 some more and you’ll reach the end of the MDX Script, whereupon you’ll go back to the beginning and can start all over again. Go to the Debug menu and click Stop Debugging to finish. A useful bit of functionality, isn’t it? Certainly one of the least-known features of BIDS too.

One last point – if you try to use the Debugger and hit the infamous SSAS Locale Identifier bug, check out Boyan Penev’s post here on how to solve this issue.

Written by Chris Webb

February 21, 2012 at 9:56 pm

Posted in MDX

So, what is the BI Semantic Model?

with 37 comments

Over six years ago now I wrote what proved to be a very popular post here on my blog called “So, what is the UDM?”. It was written in response to the widespread confusion around the time of the release of Analysis Services 2005 about what the term “UDM” actually meant. In a sense “UDM” was just another name for an Analysis Services cube, but it also represented the nebulous concept of Analysis Services being a semantic layer suitable for all your reporting needs; however people often thought it was a new product that somehow replaced Analysis Services cubes and got themselves tied in all sorts of knots as a result. Thankfully, use of the term died out pretty quickly as everyone started referring to Analysis Services as, well, Analysis Services.

Fast forward to the present day and I can see a similar amount of confusion about the term “BI Semantic Model”, or BISM, for many of the same reasons. What is the BI Semantic Model exactly? It is… Analysis Services 2012 plus PowerPivot. Let’s be clear though: it is not just the new Analysis Services Tabular model, although the term BISM is all too often used as if it meant that. It’s not even a specific bit of software. Analysis Services 2012 consists of two parts: the Multidimensional model, which is the Analysis Services of cubes and dimensions that we already had in 2008 R2 and earlier versions, and the new Tabular model, which is the Analysis Services of tables, relationships, in-memory storage and the column store. BISM refers to both models plus PowerPivot, or rather it refers to the way that Analysis Services and PowerPivot can be used as a semantic layer on top of other data for reporting purposes.

So what’s the point of a term like BISM then, if it doesn’t refer to something tangible? Why not just call Analysis Services “Analysis Services” and PowerPivot “PowerPivot”? Well, there’s certainly some conceptual stuff going on here (as outlined in the Vision and Roadmap blog post) but, just as with the term UDM, I’d say there’s also some marketing-led obfuscation going on, for three reasons:

  • A single term like BISM suggests that Analysis Services 2012 and PowerPivot are a single, cohesive product, whereas the Tabular and Multidimensional models are actually very different beasts. If you’re going to be working with Analysis Services 2012 on a project the first decision you’ll have to make is which type of model to use, and if you change your mind later you’ll have to start development again from scratch and learn a lot of new skills. I hope one day that the two models will merge again but it won’t happen soon.
  • Microsoft has correctly identified that many people want to do BI but were put off by the complexity of building Multidimensional models in previous versions of Analysis Services. The simplicity of the Tabular model goes a long way to solving this problem; Tabular also replaces Report Builder models, which were really just a simple semantic layer for people who didn’t like SSAS or had valid reasons to stay with relational reporting. In order not to scare off this market a new name is necessary to avoid the negative connotations that come with “Analysis Services” and “cubes”.
  • Calling something a “semantic model” suggests that it’s a nice, light, thin, easy-to-implement layer on top of your relational data warehouse, with no data duplication (which is often seen as a Bad Thing) involved. In actual fact anyone who has used the Multidimensional model will know you almost always use MOLAP storage, which involves all the data being copied into Analysis Services; and I suspect that when people start using the Tabular model they will be using it in Vertipaq mode (where again all the data gets copied into Analysis Services) rather than in DirectQuery mode (where all queries are translated to SQL which is then run against SQL Server).

Now I’m not going to suggest that there’s anything wrong with these marketing objectives – anything that sells more Analysis Services is good for me – or that the Tabular model is bad, or anything like that. All I want to do is suggest that in the SQL Server technical community we stop using the term BISM and instead refer to Analysis Services Multidimensional, Analysis Services Tabular and PowerPivot so we’re always clear about what we’re talking about in blog posts, articles, books and so on. I don’t think using the term BISM is useful in any technical context, just as the term UDM wasn’t in the past, because it is a blanket term for several different things. I also think that so many people are confused about what the term BISM actually means that it is becoming counter-productive to use it: people will not buy into the MS BI stack if they’re not really sure what it consists of.

What does everyone think about this?

Written by Chris Webb

February 14, 2012 at 8:22 pm

Posted in Analysis Services, BISM

DAX Measures, MDX Measures and Type

with one comment

One of the strengths of MDX is that calculated measures are not strongly typed: they return variants. This means that it’s possible to create calculations that return values of several different types, such as the example below that returns text in some cases and integers in others:

WITH
MEMBER MEASURES.DEMO AS
IIF(
[Measures].[Internet Sales Amount]>7000000
, "TOO HIGH!"
, [Measures].[Internet Sales Amount])

SELECT
{[Measures].[Internet Sales Amount], MEASURES.DEMO}
ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]

[image: results of the MDX query, with DEMO returning both numbers and text]

This gives you a lot of flexibility when writing calculations, but it can also be a big problem when you want to extract data from a cube into another system, as anyone who has tried to do this will know.
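For example, here’s the extraction problem in miniature, sketched in Python (the values are made up, purely to stand in for what the variant-typed DEMO measure above might return):

```python
# Hypothetical values extracted from the variant-typed DEMO measure:
# numbers for most years, text for the year over the threshold
demo_values = [3266373.66, 6530343.53, "TOO HIGH!", 5842485.19]

# Downstream code that assumes the measure is numeric fails at runtime,
# because one of the cells turned out to contain a string
try:
    total = sum(demo_values)
except TypeError as exc:
    print("cannot aggregate mixed types:", exc)
```

A strongly typed target (a relational column, a typed dataset and so on) faces exactly this mismatch, which is why variant measures make extraction painful.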

In DAX and the Tabular model, on the other hand, measures, like columns (calculated or otherwise), are strongly typed. Although you can’t see the type of a measure in SQL Server Data Tools when you create it, the automatically inferred type can be found in the MDSCHEMA_MEASURES schema rowset, which can be queried as follows:

SELECT *
FROM $SYSTEM.MDSCHEMA_MEASURES

[image: results of the MDSCHEMA_MEASURES query showing inferred measure types]

If you try to use an expression for a measure that, as in my first example, could return either text or an integer, like this:

testtype:=IF(SUM('Internet Sales'[Sales Amount Value])>1000, 1, "test")
 
then you get the following error:

Measure 'Internet Sales'[testtype] : The second and third arguments of function IF have different data types. This is not supported.

On balance I think I prefer having measures strongly typed, not least because it opens up the possibility of using the Tabular model for certain forms of ETL. I’ve already seen one customer of mine replace a CTE in T-SQL by loading their data into Tabular and using the PATH functions instead, getting some significant performance benefits as a result, and I’m sure there will be plenty of other scenarios where ETL requires complex calculations, and where the incredible performance of DAX will make loading all the data into a Tabular model (even if the final destination of the data is a relational data warehouse) a serious option.
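For anyone unfamiliar with what the PATH functions do, here’s a rough Python sketch of the idea: walk a parent-child table up to the root and return a pipe-delimited list of ancestor keys. The keys here are made up for illustration; this is the general technique, not my customer’s actual solution:

```python
# Parent-child table: each key maps to its parent key
# (None marks the top of the tree); made-up values for illustration
parent_of = {112: None, 14: 112, 3: 14, 271: 14}

def path(key, parent_of):
    """Build a pipe-delimited ancestor path for a key, in the style of
    the DAX PATH function: root first, the key itself last."""
    parts = []
    while key is not None:
        parts.append(str(key))
        key = parent_of[key]
    return "|".join(reversed(parts))

print(path(3, parent_of))    # 112|14|3
print(path(271, parent_of))  # 112|14|271
```

In T-SQL this kind of flattening is what a recursive CTE typically does row by row; in Tabular the PATH functions compute it as a calculated column over the whole table at once.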

Written by Chris Webb

February 8, 2012 at 9:39 pm

Posted in DAX, MDX

DependsOnDimension

with 4 comments

There are lots of properties in SSAS that have no useful purpose, or whose purpose is somewhat obscure, and one property I’ve always wondered about is the DependsOnDimension property of a dimension. You can find it in the Dimension Editor in BIDS but there’s no indication in the description as to what it does, so in a moment of idle curiosity I Googled (sorry, Binged) it; the best information I found was this forum answer from 2007 from Matt Carroll:

http://social.msdn.microsoft.com/forums/en-US/sqlanalysisservices/thread/bd289ca5-0cb0-4279-8947-ffa2690bcd1e/

DependsOnDimension is really just a hint for use when designing aggregations using the Aggregation Design Wizard or the Usage Based Optimization Wizard. Normally the aggregation design algorithm estimates the size of aggregations assuming dimensions are independent, but this property lets it know that one dimension is strongly determined by another dimension. This should not affect your browsing experience in any way.

In Analysis Services 2000, DependsOnDimension was also used as part of the definition of virtual dimensions as a way of specifying the source for virtual dimension’s levels.

I did some quick experiments in BIDS and couldn’t see any evidence that setting this property affected what aggregations were designed by the Aggregation Design Wizard, but they weren’t exhaustive tests by any means. If you’re taking the trouble to enter row counts and set the AggregationUsage property you should probably try setting it if you have any dimensions that are very closely linked, just in case it does have an impact.
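My reading of Matt Carroll’s explanation, sketched in Python with made-up numbers. The actual estimation algorithm isn’t documented, so treat this strictly as an illustration of why the independence assumption matters, not as what the wizard really computes:

```python
from math import prod

fact_rows = 1_000_000

# Distinct member counts for attributes from two dimensions, where in
# reality every Store belongs to exactly one Region
members = {"Store": 500, "Region": 50}

# Assuming the dimensions are independent, the estimated size of an
# aggregation at this granularity is the product of the member counts,
# capped at the number of fact rows
independent_estimate = min(fact_rows, prod(members.values()))
print(independent_estimate)  # 25000

# If the wizard knows Store strongly determines Region, the combination
# can have at most as many distinct members as Store itself
dependent_estimate = min(fact_rows, max(members.values()))
print(dependent_estimate)    # 500
```

A 50x difference in the size estimate could plausibly change which aggregations the wizard considers worth building, which is why the hint exists at all.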

Written by Chris Webb

February 2, 2012 at 11:30 pm

Posted in Analysis Services

