Microsoft SQL Server 2012: The BISM Tabular Model is now available to buy!

I decided to wait until I had a real, physical, made-of-dead-tree copy of it in my hands before blogging, but I’m pleased to announce that the new book that Marco, Alberto and I wrote on SSAS 2012 Tabular models is now available to purchase (even though, as Marco says here, it was actually officially released a few weeks ago).

A sample chapter can be found here:
http://cdn.oreilly.com/oreilly/booksamplers/msp/9780735658189_sampler.pdf

You can buy it at all good bookshops, including Amazon UK. It has two five-star reviews on Amazon already, and Javier Guillén wrote a very detailed review here. Why not buy several copies so you can share it with your colleagues, friends, significant others, children, neighbours etc? It also makes ideal beach reading if you are currently on your holidays.

You may also have noticed there’s another SSAS Tabular book out, by Teo Lachev. Teo is an author I have the utmost respect for and I’m a big fan of everything he’s written; this book lives up to the high standards of his previous work. While it’s true there’s some overlap between his book and ours, the focus of his book is broader, covering topics such as SharePoint, whereas ours has a narrower focus and goes into much greater detail on subjects such as DAX; so (again, as Marco says) you should probably consider buying both.

So, what is the BI Semantic Model?

Over six years ago now I wrote what proved to be a very popular post here on my blog called “So, what is the UDM?”. It was written in response to the widespread confusion, around the time of the release of Analysis Services 2005, about what the term “UDM” actually meant. In a sense “UDM” was just another name for an Analysis Services cube, but it also represented the nebulous concept of Analysis Services being a semantic layer suitable for all your reporting needs; however, people often thought it was a new product that somehow replaced Analysis Services cubes, and got themselves tied in all sorts of knots as a result. Thankfully, use of the term died out pretty quickly as everyone started referring to Analysis Services as, well, Analysis Services.

Fast forward to the present day and I can see a similar amount of confusion around the term “BI Semantic Model”, or BISM, for many of the same reasons. What is the BI Semantic Model, exactly? It is… Analysis Services 2012 plus PowerPivot. Let’s be clear, though: it is not just the new Analysis Services Tabular model, although the term BISM is all too often used as if it meant that. It’s not even a specific piece of software. Analysis Services 2012 consists of two parts: the Multidimensional model, which is the Analysis Services of cubes and dimensions that we already had in 2008 R2 and earlier versions, and the new Tabular model, which is the Analysis Services of tables, relationships, in-memory storage and column stores. BISM refers to both models plus PowerPivot, or rather to the way that Analysis Services and PowerPivot can be used as a semantic layer on top of other data for reporting purposes.

So what’s the point of a term like BISM, then, if it doesn’t refer to something tangible? Why not just call Analysis Services “Analysis Services” and PowerPivot “PowerPivot”? Well, there’s certainly some conceptual stuff going on here (as outlined in the Vision and Roadmap blog post), but just as we had with the term UDM, I’d say there’s also some marketing-led obfuscation here as well, for three reasons:

  • A single term like BISM suggests that Analysis Services 2012 and PowerPivot are a single, cohesive product, whereas the Tabular and Multidimensional models are actually very different beasts. If you’re going to be working with Analysis Services 2012 on a project the first decision you’ll have to make is which type of model to use, and if you change your mind later you’ll have to start development again from scratch and learn a lot of new skills. I hope one day that the two models will merge again but it won’t happen soon.
  • Microsoft has correctly identified that many people want to do BI but have been put off by the complexity of building Multidimensional models in previous versions of Analysis Services. The simplicity of the Tabular model goes a long way towards solving this problem; Tabular also replaces Report Builder models, which were really just a simple semantic layer for people who didn’t like SSAS or had valid reasons to stay with relational reporting. In order not to scare off this market, a new name is necessary to avoid the negative connotations that come with “Analysis Services” and “cubes”.
  • Calling something a “semantic model” suggests that it’s a nice, light, thin, easy-to-implement layer on top of your relational data warehouse, with no data duplication (which is often seen as a Bad Thing) involved. In actual fact, anyone who has used the Multidimensional model will know that you almost always use MOLAP storage, which involves all the data being copied into Analysis Services; and I suspect that when people start using the Tabular model they will mostly be using it in VertiPaq mode (where, again, all the data gets copied into Analysis Services) rather than in DirectQuery mode (where all queries are translated to SQL and run against SQL Server).

Now I’m not going to suggest that there’s anything wrong with these marketing objectives – anything that sells more Analysis Services is good for me – or that the Tabular model is bad, or anything like that. All I want to do is suggest that in the SQL Server technical community we stop using the term BISM and instead refer to Analysis Services Multidimensional, Analysis Services Tabular and PowerPivot so we’re always clear about what we’re talking about in blog posts, articles, books and so on. I don’t think using the term BISM is useful in any technical context, just as the term UDM wasn’t in the past, because it is a blanket term for several different things. I also think that so many people are confused about what the term BISM actually means that it is becoming counter-productive to use it: people will not buy into the MS BI stack if they’re not really sure what it consists of.

What does everyone think about this?

Excel subtotals when querying Multidimensional and Tabular models

As I mentioned briefly in a recent post, the fact that Excel generates some pretty rubbish MDX for detail-level reports has been well documented by Richard Lees and Rui Quintino. The new Excel 2010 named set functionality allows you to work around these problems if you can write your own MDX but, let’s face it, most people who build Excel reports will not be able to do this, so this problem can be a major headache and cause severe performance problems. One interesting point to note, however, is that Excel 2010 generates slightly better MDX when querying a Tabular model (and I’m including PowerPivot models here) than when querying a Multidimensional model (ie an SSAS cube). Take the following pivot table, built using Excel 2010 against the Adventure Works cube running on SQL Server 2008 R2:

[image: pivot table showing Internet Sales Amount by Calendar Year and Day Name]

I’ve put the Internet Sales Amount measure on columns, the Calendar Year and Day Name hierarchies from the Date dimension on rows, and turned off all subtotals and grand totals. Here’s the MDX that Excel generates for this pivot table:

SELECT
NON EMPTY
CrossJoin(
  Hierarchize(
    {DrilldownLevel({[Date].[Calendar Year].[All Periods]},,,INCLUDE_CALC_MEMBERS)}),
  Hierarchize(
    {DrilldownLevel({[Date].[Day Name].[All Periods]},,,INCLUDE_CALC_MEMBERS)}))
DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME
ON COLUMNS
FROM [Adventure Works]
WHERE ([Measures].[Internet Sales Amount])
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

If you run this query you’ll see that it returns 40 columns, and that the first few columns contain the subtotals and grand totals we specifically said we didn’t want, which can be the cause of performance problems:

[image: query results, including the unwanted subtotal and grand total columns]
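Incidentally, this is exactly the kind of situation where the Excel 2010 named set functionality mentioned above helps if you can write your own MDX: a set that crossjoins only the leaf-level members of the two hierarchies returns no subtotals at all. Here’s a minimal sketch of the kind of set expression you could use (the NonEmpty is there to mimic the pivot table’s Non Empty behaviour):

NonEmpty(
  [Date].[Calendar Year].[Calendar Year].MEMBERS
  * [Date].[Day Name].[Day Name].MEMBERS
, [Measures].[Internet Sales Amount])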

However, if you create a PowerPivot model containing DimDate and FactInternetSales and build exactly the same pivot table, you’ll see a different pattern of MDX being produced. To enable easy comparison, I’ve taken the MDX that my PowerPivot model generated and changed the hierarchy names so it will work on the SSAS Adventure Works cube:

SELECT
NON EMPTY
Hierarchize(
  DrilldownMember(
    CrossJoin(
      {[Date].[Calendar Year].[All], [Date].[Calendar Year].[Calendar Year].AllMembers},
      {([Date].[Day Name].[All])}),
    [Date].[Calendar Year].[Calendar Year].AllMembers,
    [Date].[Day Name]))
DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME
ON COLUMNS
FROM [Adventure Works]
WHERE ([Measures].[Internet Sales Amount])
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

If you look at the results, you’ll see that while the grand total is still being returned, most of the unwanted subtotals are not, and we only get 33 columns back:

[image: query results, with most of the subtotal columns no longer returned]

So if all other things were equal (and they’re not – the Tabular and Multidimensional engines are very different), this MDX query would have a big advantage over the first one because it’s doing much less work. Excel knows to use this new type of query by looking at the PREFERRED_QUERY_PATTERNS value returned by the MDSCHEMA_CUBES schema rowset; I’m told that the reason it isn’t used with Multidimensional models is that in many cases it could lead to worse, rather than better, performance. This is another good reason to prefer Tabular models over Multidimensional models for detail-level reporting if you’re using Excel as a front-end.
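If you’re curious, you can check this value yourself with a quick DMV query in an MDX window in SQL Server Management Studio; a minimal sketch:

select cube_name, preferred_query_patterns
from $system.mdschema_cubes

As far as I understand it, a value of 1 here tells the client that it can use the DrilldownMember-style pattern shown above.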

An interesting side-note is that this new style of MDX is only possible in SSAS 2008 R2 because the DrilldownMember function has a new parameter called Target_Hierarchy, which allows you to specify which hierarchy in a tuple you want to drill down on (note that there were some other changes to this type of function that I blogged about here). You can see the documentation here:
http://msdn.microsoft.com/en-us/library/ms145580(v=SQL.105).aspx

Here’s a quick example on Adventure Works. Consider the following query, where we’re drilling down on a tuple containing the All Members from the Calendar Year and Day Name hierarchies:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
DRILLDOWNMEMBER(
{([Date].[Calendar Year].[All Periods],[Date].[Day Name].[All Periods])}
,{([Date].[Calendar Year].[All Periods],[Date].[Day Name].[All Periods])}
)
ON 1
FROM [Adventure Works]

Here are the results:

[image: query results showing the drilldown on Day Name only]

You’ll see that only the Day Name hierarchy has been drilled down on, and this is (as far as I can see) because it’s the last hierarchy that appears in the tuple. However, with the new parameter, we can specify that we want the Calendar Year hierarchy drilled down on instead:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
DRILLDOWNMEMBER(
{([Date].[Calendar Year].[All Periods],[Date].[Day Name].[All Periods])}
,{([Date].[Calendar Year].[All Periods],[Date].[Day Name].[All Periods])}
, [Date].[Calendar Year]
)
ON 1
FROM [Adventure Works]

[image: query results showing the drilldown on Calendar Year instead]

Thanks, as always, to Akshai, Marius, Greg Galloway, Teo Lachev and the usual string of SSAS obsessives for providing the background info for this post…

Documenting dependencies between DAX calculations

There’s an interesting new DMV available for PowerPivot and the Tabular model in Denali, called DISCOVER_CALC_DEPENDENCY, which allows you to list all the dependencies between objects in your model. The full documentation is here:
http://msdn.microsoft.com/en-us/library/gg471590(v=sql.110).aspx

…but I thought it would be good to blog about it because some of its practical uses are not explored in BOL.

For the purposes of illustration, I created a simple Tabular model with two tables, a relationship between the tables, some calculated columns and some measures. In SQL Server Management Studio I then connected to this database with an MDX query window and ran the following query:

select * from $system.discover_calc_dependency

Here’s what I got back:

[image: full results of the DISCOVER_CALC_DEPENDENCY query]

OK, so there’s a lot going on here, and unless you’ve got really good eyesight you won’t be able to make out much of it. I’d like to draw your attention to the second column, though, which shows the types of object we can see dependencies for: Measures, Hierarchies, Calculated Columns and Relationships. We can filter by this column, for example using a query like this:

select distinct [table], [object], [expression] from $system.discover_calc_dependency
where object_type='MEASURE'

(None of that SystemRestrictSchema rubbish needed, thank goodness.) This just returns the measures in the model, and is probably the most interesting thing we can get from this DMV. Here are the results of this query:

[image: the three measures, their tables and their DAX expressions]

This shows us a list of the three measures in our model, the tables they’re on, and the DAX expressions behind them. Pretty useful. Even better, though, if one measure depends on another measure or calculated column, you can find the related object and its expression too. In this case [Sum of Sales After Tax Times 2] is a measure that sums the results of a calculated column, as the following query shows:

select referenced_object_type, referenced_table, referenced_object, referenced_expression
from $system.discover_calc_dependency
where [object]='Sum of Sales After Tax Times 2'

[image: the calculated column the measure depends on, plus a dependency on RowNumber]

(I’m not sure where that dependency on RowNumber is coming from, though…)
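Incidentally, you can list the calculated columns in the model in the same way as the measures; a quick sketch, assuming (as the object types visible in the second column above suggest) that the OBJECT_TYPE value for calculated columns is CALC_COLUMN:

select distinct [table], [object], [expression]
from $system.discover_calc_dependency
where object_type='CALC_COLUMN'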

So this is all very useful for you as a developer, for documentation and so on. But wouldn’t it be useful if your users could see all this too? Well, they can, using a technique very similar to the one I blogged about here. Back in BIDS, after the initial deployment of the database, I added a new Analysis Services connection pointing to the Analysis Services database I’d just deployed – so the SSAS database was using itself as a data source. I was then able to use the first query above,

select * from $system.discover_calc_dependency

to populate a table inside my existing model:

[image: the DMV results loaded into a table in the model]
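There’s nothing special about the connection itself, by the way: it’s just a regular MSOLAP connection that happens to point back at the same server, with a connection string along these lines (the instance and database names here are purely hypothetical):

Provider=MSOLAP;Data Source=localhost\TABULAR;Initial Catalog=CalcDependencyDemo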

After the model had been deployed again, this meant I could browse the results of the DMV using an Excel Pivot Table:

[image: Excel pivot table browsing the DMV results]

I’m sure that in the future many users, especially PowerPivot users feeling a little frustrated at the lack of control they have over the Tabular model you’ve built, will be very interested in seeing these formulas so they can understand how they work and reuse them in their own models. And hopefully, in the long run, the information returned by this DMV will make it much easier not only to import data from Tabular models back into PowerPivot, but also to import parts of existing Tabular models into new PowerPivot models.

Good news on the future of Analysis Services

If you read my blog, you’re probably aware of the post about the future of Analysis Services that I wrote last November while I was at the PASS Summit. It caused an awful lot of fuss, and ever since I’ve not been able to go for more than five minutes at a SQL Server event without somebody stopping me and asking “So, Chris, is SSAS really dead?” (the answer is an emphatic NO, by the way). Anyway, as you may or may not have already seen, there’s been a new announcement on the future of Analysis Services at this year’s TechEd, along with an accompanying blog post from TK Anand here:
http://blogs.msdn.com/b/powerpivot/archive/2011/05/16/analysis-services-vision-amp-roadmap-update.aspx

…and as a result of my involvement in the public debate on this subject, I feel obliged to add my comments on it. I’ll do that by extracting and discussing the main points made in TK’s post, but before I start let me state clearly that this new announcement is extremely positive in my opinion and contains news that I am very glad to hear.

In the past six months, we have been talking to many people in the Microsoft BI community – customers, partners, developers, and MVPs – to get feedback on the roadmap.

The first thing I’d like to say is how much my respect has grown for the SSAS development team over the last six months, as a result of how they’ve handled the debate over the future of their product. There were some heated discussions going on in November, but once everyone had calmed down, all of the misunderstandings had been cleared up and we had apologised to each other – and I certainly had as much to apologise for as anyone else – there were some very fruitful discussions about how to move forward. I don’t expect Microsoft to develop products to my precise specifications, and nor should they; back in November I thought I’d made it clear that there were good reasons for going down the BISM route and that I understood why those decisions were made. However, today’s announcement does address all of the genuine (as opposed to the purely emotional) reasons I had for being upset. It is not a U-turn, more a modification of what was presented before, but it is nonetheless significant and shows that the SSAS team have taken on board the feedback they’ve received.

The BI Semantic Model is one model for all end user experiences – reporting, analytics, scorecards, dashboards, and custom applications

So what is the new direction, then? Reading the quote above, those of you with long memories will remember the promise of the UDM back in SSAS 2005: it too was meant to be a single reporting model for all purposes. It didn’t work out that way, of course, despite the massive success of SSAS 2005; furthermore I suspect there will be plenty of people who read TK’s blog post and are still confused about the exact story because it’s quite complicated (again, I’m reminded of the confusion around what the UDM exactly was in the 2005 release). But the overall aim of BISM in Denali is to make good on this old promise of ‘one model to rule them all’ and I think Microsoft has a much, much better chance of succeeding this time.

In purely technical terms the story is almost, but not quite, the same as it was in November. The BI Semantic Model (BISM) is the new name for Analysis Services. In Denali, when you develop with BISM you will have a choice of two types of project, representing two design experiences: use the Tabular model to get all the advantages of a table-based approach, DAX queries and calculations, and VertiPaq storage; use the Multidimensional model to get the rich functionality of the cubes we have today, MDX and MOLAP storage (I would urge you to read and reread TK’s post to get the full details on this). What has a new emphasis is that these are two complementary models, each with its own strengths and weaknesses, appropriate in different circumstances; it’s not a case of old versus new or deprecated versus the future – they are two sides of the same coin. What’s more, there’s now firm evidence that Microsoft will be paying more than lip-service to this idea:

The multidimensional project lets model developers use the multidimensional modeling experience along with MDX and MOLAP/ROLAP (this is what existing UDM projects get upgraded to).  The tabular project lets model developers use the tabular modeling experience along with DAX and VertiPaq/DirectQuery.  It is important to note that these restrictions in the two projects are not rigid or permanent; they could very well change in future releases based on customer requirements.  For example, we could offer model developers VertiPaq as a storage option for multidimensional projects or MDX scripts for tabular projects. Another limitation in the upcoming CTP release is that models built using the multidimensional project will not support DAX queries (and thereby Crescent, which uses DAX to retrieve data from the model). We recognize that removing this restriction is very important for customers with existing Analysis Services solutions to be able to upgrade to SQL Server “Denali” and leverage Crescent. The product team is actively working on it and is committed to making this functionality available to customers.

This is the important section of the post, and it addresses my two main complaints from last year. The first was that the multidimensional way of modelling should not be jettisoned, because it has inherent advantages for certain types of BI application, such as financial applications with complex calculations. Let me be clear: I’m not talking about the underlying technology here (I couldn’t care less if MOLAP storage disappeared tomorrow and was replaced by VertiPaq if it meant my queries ran faster), but the basic concept of modelling data as cubes, dimensions and hierarchies instead of as a series of tables and relationships. The tabular way of modelling data has many advantages of its own, of course, and in the long run I think that for something like 30% of BI projects the Tabular model will be the correct choice, for 10% the Multidimensional model will be the correct choice, and for the remaining 60% either will work just as well. The point is, though, that we will have a choice between two different but equally valuable ways of modelling data, and that by committing to the delivery of new functionality in the Multidimensional model in the future, Microsoft has ensured that choosing to use it today on a project no longer looks like a bet on a defunct technology.

My second complaint was that there was no good migration roadmap for existing customers who have invested heavily in SSAS over the last few years, beyond telling them to redevelop their applications from scratch in the Tabular model. As I said in the previous paragraph, it’s now clear that there will be no need to redevelop using the Tabular model in the long term if there’s no desire to do so because, even if there’s not much new for these customers in the Denali release, there will be new functionality coming at some point as the technologies underpinning the two models come closer together. For example, now that we know we will one day have DAX support for the Multidimensional model, we know that people will be able to use Crescent on top of their existing cubes just by upgrading them, without completely redeveloping them. The cool new stuff will not be restricted to users of the Tabular model.

Overall, then, I’m very happy with what’s been said. I’ve been working with Analysis Services since the very beginning, when it was OLAP Services, and I’m now getting ready to learn BISM and prepare for the future – a future that looks a lot brighter to me now. Bring on the next CTP of Denali!