Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for July 2011

DAX Queries, Part 4

with 6 comments

I was extremely pleased to see that there was a Crossjoin() function in DAX, if only because it meant that I wouldn’t have to rename my company. Let’s see how to use it…

The Crossjoin() function returns a table that returns the cross join of two table expressions. Here’s a very simple example:

evaluate(
crossjoin(
values(DimDate[CalendarYear])
, values(DimDate[FiscalYear])
)
)

image

In this case the two parameters for Crossjoin() return a table of distinct values from DimDate[CalendarYear] and DimDate[FiscalYear], and the table returned gives every combination of values from those two tables. From an MDX point of view, it’s interesting to note that we really do get every single combination: there’s no auto-exists being applied, and we get combinations like CalendarYear 2001 and FiscalYear 2004 that do not exist in the DimDate table (I have no problem with this – it’s what I’d expect to happen in DAX).

I can imagine using Crossjoin() in a number of different ways, although the most obvious scenario is in a query along with the Summarize() function, for example:

evaluate(
summarize(
crossjoin(
values(DimDate[CalendarYear])
, values(DimProductCategory[EnglishProductCategoryName])
)
, DimDate[CalendarYear]
, DimProductCategory[EnglishProductCategoryName]
, “Sum of Sales”
, sum(FactInternetSales[SalesAmount])
)
)

image

It’s worth comparing the query above with the output of the following query:

evaluate(
summarize(
FactInternetSales
, DimDate[CalendarYear]
, DimProductCategory[EnglishProductCategoryName]
, “Sum of Sales”
, sum(FactInternetSales[SalesAmount])
)
)

image

Notice how, in the first query, you get one row for every distinct combination of Year and Category whether there are any sales or not, whereas in the second query you only see the combinations where sales exist.

In part 5, I’ll look at how to use the Generate() and TopN() functions.

Written by Chris Webb

July 27, 2011 at 10:17 pm

Posted in DAX

The question of Sharepoint and the Microsoft BI Strategy

with 30 comments

It’s been clear for a while now that Sharepoint is at the heart of Microsoft’s BI strategy. The first sign was the way PerformancePoint was touted as a replacement for Proclarity. Then came the news that if you wanted to share models between multiple users in PowerPivot, you needed Sharepoint 2010 Enterprise Edition. In Denali, if you want to use cool new stuff like Crescent or the new SSRS alerting functionality you need Sharepoint. But is this a good thing for Microsoft BI? Behind the scenes a lot of people have been debating this question for a long time, so I thought it was an appropriate subject for a blog post – and you know how I like controversial topics…!

Let me start by saying that I have I have an opinion on this but not one that I feel 100% sure of asserting: I have the nagging feeling that my own view of the market is too limited to know whether the Sharepoint strategy is good or not, so my mind isn’t completely made up (and in fact the more I think about this issue, the more unsure about my opinion I am). Also, I don’t think anyone has objections to the purely technical reasons for the Sharepoint strategy – after all, why should the various Microsoft BI teams be in the business of building portals when Microsoft has its own, extremely successful portal they can integrate with, which gives them a lot of rich functionality for free? The question is essentially a commercial one: will more customers buy Microsoft BI as a result of increased integration with Sharepoint (in turn leading to Microsoft and its partners making more money, which is all anyone really cares about), or will a Sharepoint dependency actually put customers off and drive them into the arms of Microsoft’s competitors?

The argument in favour of the Sharepoint strategy goes something like this:

  • Microsoft’s BI products need portal functionality. Time and money for development of BI products is limited, so if the portal functionality can be got from Sharepoint then it can be delivered quicker, at a lower cost, and with time and money left over for other new functionality that would not be possible otherwise. More and better functionality means the customer is more likely to buy.
  • Integrating with Sharepoint also gives the wider Microsoft BI offering a coherence it wouldn’t otherwise have (and something it has historically lacked), and the whole ends up being greater than its constituent parts. This lack of overlapping functionality looks good in front of the customer, and also increases the opportunity to cross-sell BI to existing Sharepoint customers and vice versa.
  • Sharepoint is massively successful, one of Microsoft’s leading server products, so most customers you want to sell BI to will have Sharepoint anyway; therefore there will be little resistance to buying new tools that have a dependency on Sharepoint. The Sharepoint market is so large that even if only a small percentage of it is interested in or able to use MS BI, that’s still a massive potential market.
  • Presumably, at some point there will be a fully-featured “Sharepoint in the cloud” with all the BI features baked in, which means that it will be even easier for companies to adopt it.
  • Microsoft is well aware of the arguments against Sharepoint that are listed below, and because it wants the Sharepoint strategy to work it is taking action to address these problems of cost, complexity and uptake. One example is the increasing number of Microsoft BI appliances that are available, where all of the tough configuration decisions are made for you.

The argument against is this:

  • Sharepoint is expensive (or at least perceived as expensive) in terms of license costs, infrastructure and administration, so it makes the overall MS BI solution more expensive to have a dependency on it.
  • Sharepoint is a complex product (or at least perceived as complex), and Microsoft’s BI tools are pretty complex as well; integrating the two makes something even more complex. As a result, whereas in the past a single BI guy could just install SSAS, SSRS and so on on a server, now you need a BI guy and a Sharepoint guy to do all the setup and admin, which doubles the cost of labour; the added complexity also makes it more likely that the setup and admin will take longer. Microsoft BI products have traditionally seen a lot of their adoption  come from internal IT departments taking the ‘it’s effectively free, so let’s install it somewhere and see what it does’ path, and this will become much less common because of the added overhead of Sharepoint.
  • The added dependencies between Sharepoint and BI could actually make it slower to deliver new features because now there are multiple MS dev teams that need to work together, co-ordinate functionality and release cycles, and deal with conflicting priorities. History has shown that MS dev teams don’t always do this well (think of Excel and SSAS support), and even when they do some compromises are inevitable.
  • Many customers do have Sharepoint, but not all of them have the editions or versions that the MS BI stack requires. And very often due to political divisions, an internal corporate Sharepoint team have their own agenda to follow which has no place for BI, and aren’t interested in upgrading to a certain version or otherwise accommodating the BI team when it might impact on their own goals.
  • Some customers do not have Sharepoint and have made a conscious decision not to have it; these customers include not only the die-hard anything-but-Microsoft shops but also some who would be interested in a solution with fewer dependencies. For these customers, a Sharepoint dependency removes all question of the use of MS BI.
  • The MS partner ecosystem, at least at the mid-level, is segregated into BI partners and Sharepoint partners, and while there’s a certain amount of convergence you still tend to find that many consulting companies are BI partners who do a bit of Sharepoint on the side or Sharepoint partners who do a bit of BI on the side, so not all of them are capable of selling or implementing an overarching BI-Sharepoint solution.

The nature of my work means that I get to see a lot of different Microsoft BI implementations, probably more than the average consultant. I reckon I work with around 30-40 different customers every year, ranging in size from one-man-bands to the biggest enterprises, and in the five or so years I’ve been in business I’ve only ever seen a relatively small number who actively use Sharepoint in combination with the rest of the Microsoft BI stack. If you work for a large partner that specialises in and actively sells Microsoft BI and Sharepoint you may have seen much greater use of Sharepoint than I have, and if you work for a specialist Sharepoint partner I dare say you only ever work with customers who are very committed to Sharepoint, so I’ll admit my point of view is biased. On the other hand I can’t deny the evidence of my own experience and as a result my natural inclination is to be slightly sceptical about the Sharepoint BI strategy, because I don’t see any basis for the claims that Sharepoint is a ubiquitous platform and one that users actively want to integrate with BI. I’d also add that a couple of years ago I was equally sceptical about Excel’s central role in the Microsoft’s SSAS and wider BI strategy, but now I see Excel used successfully on a wide range of SSAS projects and I’m very much more pro-Excel (although I’m not blind to Excel’s continuing shortcomings as an SSAS client tool for more advanced users). Maybe in a year or two’s time all my customers really will have Sharepoint, the Sharepoint strategy will bear fruit, and my fears will have been proved groundless.

So… what do you think about all this? What are your experiences with Sharepoint, do you have it in-house already or (if you’re a consultant) will you be able to sell a BI platform based on it? Please leave a comment…

Written by Chris Webb

July 21, 2011 at 10:06 am

Posted in BI

Tagged with

DAX Queries, Part 3

with 3 comments

Following on from my previous post, let’s now look at how you can add derived columns to a table expression. Let’s start with a simple query that returns every column in DimDate:

evaluate(
DimDate
)

If I want to add some columns to this query without doing any grouping we can use the AddColumns function like so:

evaluate(
AddColumns(
DimDate
, “Calendar Year Name”
, “Calendar Year ” & DimDate[CalendarYear]
)
)

image

Here I’m adding a new column (shown on the far right in the resultset in the above screenshot) called Calendar Year Name that does a simple concatenation of the string “Calendar Year” with the actual Calendar Year value from the table. You can add multiple columns in this way with more than one column name/expression pair.

The important thing to notice here is that you can’t use this new column to group by in a Summarize() function, so the following query:

evaluate(
Summarize(
AddColumns(
DimDate
, “Calendar Year Name”
, “Calendar Year ” & DimDate[CalendarYear]
)
, [Calendar Year Name]
–, DimDate[CalendarYear]
, “Sum of Sales”
, Sum(FactInternetSales[SalesAmount])
)
)

image

…gives results you may not be expecting: instead of getting the sum for each year, you get the sum for all years repeated; I’m told this scenario may be changed to throw an error at RTM. AddColumns() should only be used for formatting a resultset.

If you uncomment the line grouping by DimDate[CalendarYear], you’ll see you get meaningful results:

image

 

In part 4, I’ll take a look at the Crossjoin() function.

Written by Chris Webb

July 18, 2011 at 4:08 pm

Posted in DAX

New trace events in 2008R2 SP1

leave a comment »

You may have seen that SP1 for SQL 2008R2 got released last week; however (as usual) looking at the master KB article listing all the fixes and changes you’d think there were no significant changes for SSAS. That’s not the case though: SP1 adds several important new trace events that can make monitoring SSAS much easier. They are documented in some detail here:

http://support.microsoft.com/kb/2458438

Highlights include the ability to see how long each command in your MDX Script takes to execute (useful when you’re trying to work out why your MDX Script takes a long time to execute – usually expensive named sets are the problem); more detail on SSAS locks; and the new Resource Usage class.

Written by Chris Webb

July 18, 2011 at 11:48 am

Posted in Analysis Services

DAX Queries, Part 2

with 9 comments

Following on from my last post, let’s now see how we can do group-by style in DAX queries.

The key to aggregating data is the Summarize() function, which is broadly similar to a Group By in SQL. So if the following query returns every row from the FactInternetSales table:

evaluate(
FactInternetSales
)

The following query just returns a list of all the distinct combinations of values from the OrderDateKey and CustomerKey columns:

evaluate(
summarize(
FactInternetSales
, FactInternetSales[OrderDateKey]
, FactInternetSales[CustomerKey]
)
)
order by
FactInternetSales[OrderDateKey]
, FactInternetSales[CustomerKey]

image

Once we’ve specified the name of the table in the first parameter of Summarize, we can then supply a list of columns in the table to group by. This list can contain any number of columns, but we can also do aggregations inside Summarize by supplying a list of column names and DAX numeric expressions after the list of columns. So, for example:

evaluate(
summarize(
FactInternetSales
, FactInternetSales[OrderDateKey]
, “Sum of Sales”
, Sum(FactInternetSales[SalesAmount])
, “Distinct Customers”
, DistinctCount(FactInternetSales[CustomerKey])
)
)
order by
FactInternetSales[OrderDateKey]

Gives us the sum of the SalesAmount column and the number of distinct values in CustomerKey, for each distinct OrderDateKey value, as extra columns in the resultset.

image

We can also reference columns from related tables in our aggregations. So for example this query uses the DimDate table and gives us data aggregated up by year:

evaluate(
summarize(
DimDate
, DimDate[CalendarYear]
, “Sum of Sales”
, Sum(FactInternetSales[SalesAmount])
, “Distinct Customers”
, DistinctCount(FactInternetSales[CustomerKey])
)
)
order by
DimDate[CalendarYear]

image

Finally (at least for today), we can do subtotalling by using the Rollup() function inside our list of group by columns; each column we list inside Rollup() will work like a regular group by column but it will also have a subtotal row added for it in the resultset. So here, for example, is the query above with an extra group by on the days of the week:

evaluate(
summarize(
DimDate
, DimDate[CalendarYear]
, ROLLUP(DimDate[EnglishDayNameOfWeek])
, “Sum of Sales”
, Sum(FactInternetSales[SalesAmount])
, “Distinct Customers”
, DistinctCount(FactInternetSales[CustomerKey])
)
)
order by
DimDate[CalendarYear]

image

I’ve highlighted the subtotals rows here, but we can also identify these rows using the new IsSubTotal() function:

evaluate(
summarize(
DimDate
, DimDate[CalendarYear]
, ROLLUP(DimDate[EnglishDayNameOfWeek])
, “Sum of Sales”
, Sum(FactInternetSales[SalesAmount])
, “Distinct Customers”
, DistinctCount(FactInternetSales[CustomerKey])
, “Is this a SubTotal?”
, if(IsSubtotal(DimDate[EnglishDayNameOfWeek]), “Yes”, “No”)
)
)
order by
DimDate[CalendarYear]

image

 

In part 3, I’ll take a look at how to add derived columns.

Written by Chris Webb

July 15, 2011 at 2:31 pm

Posted in DAX

DAX Queries, Part 1

with 10 comments

So at long last Denali CTP3 has been released and I can blog about all the cool new stuff in it. It feels like years (in fact it has been years) since I’ve had anywhere near this much new functionality to learn and discuss here – it’s better than Christmas! And where better to start than with DAX queries? We’ve had DAX calculations for a while now, of course, but now in BISM Tabular mode we can actually write queries in DAX too. Let’s take a look at how the language works…

For the purposes of this post and the others in this series I built a very simple Tabular mode with two tables from the Adventure Works database, DimDate, DimProduct, DimProductSubCategory, DimProductCategory and FactInternetSales.

image

(I know I’ve not talked about how you would actually go about building a Tabular model yet, but this post from the BISM team is a good starting point and in any case it’s not too different from how you’d build a PowerPivot model)

The simplest DAX query we can write on this model is probably this:

evaluate(
FactInternetSales
)

This returns every row and every column from the FactInternetSales table. Here’s what you see when you run this query in an MDX query window (!) in SQL Server Management Studio:

image

As you can see, the Evaluate statement is at the heart of the new DAX query language: it returns the results of a DAX table expression as the results of a query.

We can filter the rows we receive from a query by using the Filter function:

evaluate(
Filter(
FactInternetSales
, FactInternetSales[OrderDateKey]>20030101)
)

image

Here, I’m getting all the columns from FactInternetSales but only those rows where the OrderDateKey is greater than 20030101.

We can also order our resultset by using an Order By clause, for example:

evaluate(
Filter(
FactInternetSales
, FactInternetSales[OrderDateKey]>20030101)
)
order by
FactInternetSales[OrderDateKey] asc,
FactInternetSales[CustomerKey] asc

image

And there’s also a Start At clause where, for every column that appears in the Order By clause, you can specify to start the result set at the first row where a particular set of values appears; I would imagine that this will be useful for pagination in reports. If you compare the results of the following query with the results of the previous query:

evaluate(
Filter(
FactInternetSales
, FactInternetSales[OrderDateKey]>20030101)
)
order by
FactInternetSales[OrderDateKey] asc,
FactInternetSales[CustomerKey] asc
start at
20030104,
23120

image

…you can see that our resultset now starts at the row where OrderDateKey = 20030104 and CustomerKey=23120.

In part 2, I’ll take a look at the Summarize function…

Written by Chris Webb

July 13, 2011 at 11:59 pm

Posted in DAX

SQLBits 9 Registration Now Open

leave a comment »

Registration for SQLBits 9 opened today, although judging by the number of automatically generated emails in my inbox this evening this isn’t news to lots of you. While the options for the training day are available here, we haven’t yet published the agenda for the Friday or Saturday but if you look at the sessions that have been submitted you’ll see that we’ve got a great selection to choose from and you can be sure that we’ll have an extremely strong line-up.

It’s particularly encouraging to see so many people registering from outside the UK. SQLBits is already the second-largest SQL Server conference in the world, but we know there’s a massive untapped potential market in Europe of SQL Server people and it could get even bigger. So if you’re reading this in France, Germany, the Netherlands, Belgium, Switzerland, or Scandinavia, why not come over? You’ll have a lot of fun and who knows, you might even learn something useful!

Written by Chris Webb

July 11, 2011 at 10:25 pm

Posted in Events

Tagged with

The rather-too-many ways to crossjoin in MDX

with 11 comments

In my last post I made the point that it’s a bit too easy to write and MDX query that works, even if you don’t really understand why it works, and in this post I’m going to address a similar issue. In MDX one of the commonest set operations is a crossjoin, and while most people understand what this operation does there are so many ways of writing a crossjoin in MDX that it can hurt readability and make the language even more confusing for newcomers. So what are all these different ways of crossjoining, and which one is to be preferred?

First of all, let’s look at what a crossjoin actually does. Imagine we have two sets, each with two members in them: {A,B} and {X,Y}. If we crossjoin these two sets together, we get a set of tuples containing every possible combination of A and B and X and Y, ie the set {(A,X), (A,Y), (B,X), (B,Y)}.

As an example of this, let’s look at the first way of doing a crossjoin in MDX: the Crossjoin() function. Here’s a query against the Adventure Works cube that returns the crossjoin of the two sets {Married, Single} and {Female, Male} on the rows axis:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
CROSSJOIN(
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
,
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
)
ON 1
FROM [Adventure Works]

Here’s the output:

image

As you’d expect, you get four rows for each of the four tuples in the set: {(Married, Female), (Married, Male), (Single, Female), (Single, Male)}.

What are the pros and cons of using the Crossjoin() function then? Well, one thing it’s worth stating is that all of the different ways of doing crossjoins in MDX perform just as well as the others, so it’s purely a question of readability and maintainability. On those criteria its main advantage is that it’s very clear you’re doing a crossjoin – after all, that’s the name of the function! However, because it carries an overhead in terms of the numbers of brackets and commas and the name of the function itself, which isn’t so good for readability, and this is why I generally don’t use it. When you’re crossjoining a lot of sets together, for example:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
CROSSJOIN(
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
,
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
,
[Customer].[Education].[Education].MEMBERS
,
[Customer].[Total Children].[Total Children].MEMBERS
)
ON 1
FROM [Adventure Works]

…you might need to look a long way up to the top of the query to find out you’re doing a crossjoin.

The most popular alternative to the Crossjoin() function is the * operator. This allows you to crossjoin several sets by simply putting an asterisk between them, for example:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
*
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
ON 1
FROM [Adventure Works]

It’s more concise than the Crossjoin() function and I think easier to read; it also has the advantage of being the most frequently-used syntax. However there are rare cases when it can be ambiguous because an asterisk is of course also used for multiplication. Consider the following calculated measure in the following query:

WITH
MEMBER MEASURES.DEMO AS
([Measures].[Internet Sales Amount]) * ([Customer].[Gender].&[F])
SELECT {[Measures].DEMO} ON 0,
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
ON 1
FROM [Adventure Works]

Are we crossjoining or multiplying here? Actually, we’re multiplying the result of the two tuples together, rather than returning the result of the tuple ([Measures].[Internet Sales Amount], [Customer].[Gender].&[F]), but it’s not easy to tell.

The third way of doing a crossjoin is one I particularly dislike, and it’s the use of brackets and commas on their own as follows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
({[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
,
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]})
ON 1
FROM [Adventure Works]

This is for me the least readable and most ambiguous syntax: in my mind round brackets denote a tuple and here we’re getting a set of tuples. I’d therefore advise you not to use this syntax.

Last of all, for maximum MDX geek points, is the Nest() function. Almost completely undocumented and unused, a hangover from the very earliest days of MDX, it works in exactly the same way as the Crossjoin() function:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
NEST(
{[Customer].[Marital Status].&[M], [Customer].[Marital Status].&[S]}
,
{[Customer].[Gender].&[F],[Customer].[Gender].&[M]}
)
ON 1
FROM [Adventure Works]

Of course you’d never want to use it unless you were either showing off or wanted to confuse your colleagues as much as possible…

In summary, I’d recommend using the * operator since it’s probably the clearest syntax and also the most widely-accepted. Equally importantly, I’d advise you to be consistent: choose one syntax, stick with it and make sure everyone on the project does the same.

Written by Chris Webb

July 5, 2011 at 4:51 pm

Posted in MDX

Follow

Get every new post delivered to your Inbox.

Join 2,859 other followers