Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for June 2012

OR Selections in DAX

with 3 comments

Most of the time, selecting items in a PivotTable in a PowerPivot or SSAS 2012 Tabular model works in the way you want: if you select more than one item from a column it acts as an OR filter on that column; selecting items on another column acts as an AND filter on the selection. Consider a simple PowerPivot model built from the following Excel tables (where the only relationship is on the FruitKey columns):

image

With a measure that sums up the Sales column, such as:

Sum of Sales:=SUM(SalesFact[Sales])

You can build a PivotTable like the one below that shows the sales of Citrus fruit that are Orange or Yellow (Citrus=Yes AND (Colour=Orange OR Colour=Yellow)):

image
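
For reference, the filter context behind this selection is roughly equivalent to the following measure. This is just a sketch for illustration; the measure name is mine, and I'm assuming the fruit table is called Fruit1, as in the measures later in this post:

Sum of Sales Citrus Orange or Yellow:=
CALCULATE(
  SUM(SalesFact[Sales])
  , Fruit1[Citrus]="Yes"
  , FILTER(ALL(Fruit1[Colour]), Fruit1[Colour]="Orange" || Fruit1[Colour]="Yellow")
)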

So far so good. What happens if you want to see the sales of fruit that are either Citrus OR Yellow? If you know the selection in advance, or you have control over how your DAX is generated, you can build an expression like this measure that does the job:

Sum of Sales Citrus or Yellow:=
CALCULATE(
  SUM(SalesFact[Sales])
  , FILTER(ALL(Fruit1), Fruit1[Citrus]="Yes" || Fruit1[Colour]="Yellow")
)

image

But what if you don’t want to hard-code the filter you’re using, and want to let your users control what they are ORing? Here’s how…

First of all, you need to have two identical tables in your model that allow the user to make selections for the two conditions they are ORing. Here’s an example with the data from above:

image

Here I’ve got two tables, Fruit1 and Fruit2, for controlling the OR selection; only Fruit1 has a relationship with SalesFact though. Next, you need a measure that will apply the OR filter. The way this needs to work is as follows: take the whole of the Fruit1 table and, if the user has selected something on Fruit1, allow those rows through the filter; if the user has selected something on Fruit2, allow those rows through as well. Here’s the final DAX measure:

OR Sales:=
IF(
  OR(ISCROSSFILTERED(Fruit1[FruitKey]), ISCROSSFILTERED(Fruit2[FruitKey]))
  ,CALCULATE(
    SUM(SalesFact[Sales])
    , FILTER(
      ALL(Fruit1)
      , IF(ISCROSSFILTERED(Fruit1[FruitKey]), CONTAINS(Fruit1, Fruit1[FruitKey], [FruitKey]), FALSE())
      ||
      IF(ISCROSSFILTERED(Fruit2[FruitKey]), CONTAINS(Fruit2, Fruit2[FruitKey], [FruitKey]), FALSE())
    )
  )
  , SUM(SalesFact[Sales])
)

The key points here are (starting from the inside of the expression and working outwards):

  • I’m using ALL(Fruit1) to get all the rows from the Fruit1 table, regardless of what has been selected, and then passing that table to the FILTER() function
  • I’m then using the ISCROSSFILTERED() function on the FruitKey columns on both tables to see if the user has selected anything from any columns on those tables; if they have, then the FruitKey column will be filtered in some way
  • Then, if something has been selected on either table, in my filter of ALL(Fruit1) I’m allowing a row to pass through the filter if the value of FruitKey is present in the user selections on Fruit1 or Fruit2 (see the sketch after this list). This preserves the original selection on Fruit1 and adds the selection on Fruit2 to it. It would be much easier to do this if there were a way of unioning table expressions in DAX!
  • Finally, I’m using the resulting table in the CALCULATE() function to get the sum of Sales for that selection. If nothing has been selected, however, the outer IF() simply returns the sum of Sales with none of this logic applied.
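
To make the logic a bit more concrete, here’s a sketch (for illustration only, not a measure you need to create) of what the expression effectively reduces to when the user has sliced Fruit2 but not Fruit1: the first IF() returns FALSE(), only the Fruit2 test remains, and so a row of ALL(Fruit1) survives the filter if its FruitKey is in the current selection on Fruit2.

CALCULATE(
  SUM(SalesFact[Sales])
  , FILTER(
    ALL(Fruit1)
    , CONTAINS(Fruit2, Fruit2[FruitKey], Fruit1[FruitKey])
  )
)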

Here’s a PivotTable using this measure with Citrus from Fruit1 on rows and Colour from Fruit2 on columns:

image

And, if you don’t want this OR functionality, all you need to do is ignore the Fruit2 table. Here’s a PivotTable with Citrus and Colour from Fruit1 on rows and columns:

image

I’m not sure this is the most elegant solution to this problem… if I think of a better one I’ll blog about it, and if you can think of one then please leave a comment!

Written by Chris Webb

June 27, 2012 at 11:43 pm

Posted in DAX, PowerPivot, Tabular

Comparing Any Two Time Periods in DAX

with 7 comments

Simple time series calculations, such as comparing sales in the current year with the previous year, are easy to do in DAX and well-documented. What happens if, in PowerPivot or the SSAS 2012 Tabular model, you want to compare the sales in any two arbitrary time periods – for example you want to compare (and maybe find the growth from) sales in January last year and February this year?

There are several ways of solving this, but here’s one. First, you need two Date dimension tables, one for each of the two time periods you’re comparing. Using the Adventure Works DW database here’s an example of how this can be done:

image

In this case I loaded DimDate and FactInternetSales and created an active relationship between the two tables on OrderDateKey=DateKey. I then loaded DimDate again, called it DimDateCompare, and created another relationship between it and FactInternetSales on OrderDateKey=DateKey but set this second relationship as inactive. This means that any selection on DimDateCompare will be ignored unless the relationship between it and FactInternetSales is activated using the UseRelationship() function.

You can then create a measure to get the Sum of the SalesAmount column for the selection on DimDate, which is simply:

Sum of Sales Amount:=SUM([SalesAmount])

Then create another measure that uses the relationship between DimDateCompare and FactInternetSales and ignores any selections on DimDate (to do this, you just need to use the All() function):

Sum of Sales Amount for Comparison Period:=
CALCULATE(
    SUM([SalesAmount])
, ALL(DimDate)
, USERELATIONSHIP(FactInternetSales[OrderDateKey], DimDateCompare[DateKey]))

And finally create a measure to find the difference between these two measures:

Difference:=[Sum of Sales Amount] - [Sum of Sales Amount for Comparison Period]
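
If you also want to calculate the growth between the two periods (as mentioned at the start of this post), a measure along the following lines should work. This is just a sketch: the name and the divide-by-zero handling are my own, and it isn’t shown in the PivotTable below.

Growth:=
IF(
    [Sum of Sales Amount for Comparison Period] = 0
    , BLANK()
    , ([Sum of Sales Amount] - [Sum of Sales Amount for Comparison Period])
        / [Sum of Sales Amount for Comparison Period]
)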

To show how this works, look at the following PivotTable showing these three measures:

image

I have CalendarYear from DimDate on the rows of the PivotTable and CalendarYear from DimDateCompare on the Slicer. The Sum of Sales Amount measure shows, as you would expect, the sum of sales for each year on DimDate; the Sum of Sales Amount for Comparison Period measure ignores what’s on rows and returns the sum of sales for the two years selected on DimDateCompare, ie 2001 and 2002. This is £3,266,373.66 + £6,530,343.53 = £9,796,717.18; and Difference shows the second of these values subtracted from the first.

Not a very interesting calculation in itself perhaps, but I’m intrigued by the possibilities of having tables in the model that only have inactive relationships with other tables; up to now I’ve only used inactive relationships in scenarios where there has been another active relationship between the same two tables, for example as described in Kasper’s post here.

Written by Chris Webb

June 20, 2012 at 2:48 pm

Posted in DAX, PowerPivot, Tabular

PASS Global Growth Initiatives

with 3 comments

Earlier this year I got involved in one of the slanging matches that seem to blow up every so often about some aspect of PASS politics. I argued that PASS could only grow to become a truly international organisation if changes were made to its constitution, to ensure that all of the different SQL Server communities around the world have a say in the way it is run. I was therefore interested to read this announcement about what PASS has planned in this area:
http://www.sqlpass.org/Community/PASSBlog/entryid/442/PASS-Global-Growth-Initiatives.aspx
More details can be found here:
http://www.sqlpass.org/Community/GlobalGrowth.aspx
http://www.sqlpass.org/LinkClick.aspx?fileticket=-HbsvHNwH_U%3d&tabid=8515
…but if you’ve only got 30 seconds, I think these org charts showing the proposed structure of the PASS Board in the coming years (next year there would be a new “EMEA Director at Large” seat on the board with voting rights, and a similar APAC seat would be created the year after) are important:
http://www.sqlpass.org/LinkClick.aspx?fileticket=95puV_d_sxU%3d&tabid=8515

If you’re reading this outside the US, why should you care about what PASS is doing here? I know from my involvement with SQLBits that a lot of good things can be accomplished at the national level without the direct involvement of PASS; but I also know that at a certain point the benefits of international co-operation between different SQL Server communities become clear, and that the only body that can co-ordinate these activities is PASS. I don’t want to see PASS assume day-to-day control over every single aspect of the SQL Server community everywhere in the world, and I doubt that would ever happen, but I do want to see PASS active in providing help and support to foster the growth of national SQL Server communities when it is needed and requested. I believe that can only happen effectively if PASS has an international leadership.

So if you’ve ever been to a SQLBits, a SQLRally, a SQL Saturday, a user group or any other SQL Server community event and thought it was worthwhile then I’d urge you to get involved. Join PASS if you aren’t already a member (it’s easy and free), vote for the people you think will represent your views and needs, and make your voice heard. It will benefit PASS, it will benefit your local SQL Server community and ultimately it will benefit you.

Written by Chris Webb

June 17, 2012 at 12:23 am

Posted in PASS

OData Extensions for Data Aggregation

with 2 comments

I was just reading the following blog post on the OASIS OData Technical Committee Call for Participation:
http://www.odata.org/blog/2012/6/11/oasis-odata-technical-committee-call-for-participation

…when I saw this:

In addition to the core OData version 3.0 protocol found here, the Technical Committee will be defining some key extensions in the first version of the OASIS Standard:

OData Extensions for Data Aggregation - Business Intelligence provides the ability to get the right set of aggregated results from large data warehouses. OData Extensions for Analytics enable OData to support Business Intelligence by allowing services to model data analytic "cubes" (dimensions, hierarchies, measures) and consumers to query aggregated data

Follow the link in the quoted text – it’s very interesting reading! Here’s just one juicy quote:

OData services expose a data model that describes the schema of the service in terms of the Entity Data Model (EDM), an Entity-Relationship model that describes the data and then allows for querying that data. The responses returned by an OData provider are based on that exposed data model and retain the relationships between the entities in the model. Adding the notion of aggregation to OData, without changing any of the base principles in OData as is, has two sides to it:

1. Means for the server to describe an “analytical shape” of the data represented by the service
2. Means for the client to query an “analytical shape” on top of any given data model (for sufficiently capable servers/services)

It’s important to notice that, while each of these two sides might be valuable in its own right and can be used independently of the other, their combination provides additional value for clients. The descriptions provided by the server will help a consumer understand more of the data structure looking at the service’s exposed data model from an analytics perspective, whereas the query extensions allow the clients to express an “analytical shape” for a particular query. The query extensions will also allow clients to refer to the server-described “analytical shape” as shorthand.

I know one of the authors, Siva Harinath, from the SSAS dev team, and it’s easy to see how this extension to OData would be incredibly useful for Microsoft BI. Not only could you imagine cloud-based SSAS (when it finally arrives) being queryable via OData, but an OData provider could supply not only raw data to PowerPivot but also the design of the PowerPivot model itself.

Written by Chris Webb

June 12, 2012 at 9:28 pm

Posted in Cloud, DAX, OData

Tuning Queries with the WITH CACHE Statement

with 4 comments

One of the side-effects of the irritating limitations that SSRS places on the MDX you can use in your reports is the widespread use of calculated measures to get the columns you want. For example, a query like this (note, this query isn’t on the Adventure Works cube but on a simpler cube built on the Adventure Works DW database):

SELECT
{[Measures].[Sales Amount]}
*
[Date].[Day Number Of Week].[Day Number Of Week].MEMBERS
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]

image

…which wouldn’t be allowed in SSRS, could be rewritten like so:

WITH
MEMBER MEASURES.D1 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[1])
MEMBER MEASURES.D2 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[2])
MEMBER MEASURES.D3 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[3])
MEMBER MEASURES.D4 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[4])
MEMBER MEASURES.D5 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[5])
MEMBER MEASURES.D6 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[6])
MEMBER MEASURES.D7 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[7])
SELECT
{MEASURES.D1,MEASURES.D2,MEASURES.D3,MEASURES.D4,MEASURES.D5,MEASURES.D6,MEASURES.D7}
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]

…to get it in an SSRS-friendly format with only measures on columns.

For the last few days I’ve had the pleasure of working with Bob Duffy (a man so frighteningly intelligent he’s not only an SSAS Maestro but a SQL Server MCM as well) on tuning an SSRS report like this on a fairly large cube. As Bob found, the problem with this style of query is that it isn’t all that efficient: if you look in Profiler at what happens on a cold cache, you can see there are seven separate Query Subcube events and seven separate partition scans (indicated by the Progress Report Begin/End events), one for each calculated measure on columns.

image

The first thing Bob tried in order to tune this was to rewrite the query something like this:

SELECT
{[Measures].[Sales Amount]}
ON 0,
NON EMPTY
[Product].[Product].[Product].MEMBERS
*
[Date].[Day Number Of Week].[Day Number Of Week].MEMBERS
ON 1
FROM [Adventure Works DW]

…and pivot the data in the SSRS tablix to get the desired layout with the Day Numbers on columns. The interesting thing, though, is that for this particular report, while rewriting the query in this way made it run faster (there is only one Query Subcube event and one partition scan now), it actually made the SSRS report run slower overall, simply because SSRS was taking a long time to pivot the values.

Instead, together we came up with a way to tune the original query using the WITH CACHE statement like so:

WITH
CACHE AS
'([Measures].[Sales Amount]
, [Product].[Product].[Product].MEMBERS
, [Date].[Day Number Of Week].[Day Number Of Week].MEMBERS)'

MEMBER MEASURES.D1 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[1])
MEMBER MEASURES.D2 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[2])
MEMBER MEASURES.D3 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[3])
MEMBER MEASURES.D4 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[4])
MEMBER MEASURES.D5 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[5])
MEMBER MEASURES.D6 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[6])
MEMBER MEASURES.D7 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[7])
SELECT
{MEASURES.D1,MEASURES.D2,MEASURES.D3,MEASURES.D4,MEASURES.D5,MEASURES.D6,MEASURES.D7}
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]

What the WITH CACHE statement does here is load all the data needed for the query into the Storage Engine cache before anything else happens. So even though there are still seven separate Query Subcube events, one for each column, there’s now only one partition scan and each of the seven Query Subcube events now hits the cache:

image

There’s no guarantee that this approach will result in the best performance even when you have a query in this form, but it’s worth testing if you do. It’s certainly the first time in a long while that I’ve used the WITH CACHE statement in the real world, so it’s interesting from an MDX point of view too.

Written by Chris Webb

June 10, 2012 at 10:20 pm

Handling SSRS Multi-Valued Parameters in DAX Queries

with 23 comments

Last year I blogged about using DAX queries inside SSRS reports but one topic I didn’t touch on was how to handle parameters. Marco wrote a post on using parameters in DAX earlier this year but didn’t cover how to handle multi-valued parameters in SSRS, so I thought I’d write a post to fill that gap.

Let’s start by assuming you’ve built a simple SSAS 2012 Tabular model that contains the DimDate table from Adventure Works, that you want to build an SSRS report showing data from that table, and that you want to parameterise the report by the values in the EnglishDayNameOfWeek column, ie by day.

If you were writing the query from scratch, probably the best way of doing it would be to use the CalculateTable() function like so:

evaluate
calculatetable(
DimDate
, DimDate[EnglishDayNameOfWeek] = "Monday"
)

This query returns the whole of the DimDate table filtered where EnglishDayNameOfWeek is Monday.

image

To filter by Monday or Tuesday, you’d need to do something like this:

evaluate
calculatetable(
DimDate
, DimDate[EnglishDayNameOfWeek] = "Monday"
|| DimDate[EnglishDayNameOfWeek] = "Tuesday"
)

Since there is no way of doing an IN in DAX, you might be able to guess what the big problem is that needs to be solved when handling multi-valued parameters: with this style of query, for each parameter value that is selected you need to add a new OR condition, and the only way of doing that is to generate the query dynamically. While that’s perfectly possible in SSRS, it’s something you may want to avoid because it makes report development awkward.
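
For example, just to illustrate the point, a dynamically generated query for a selection of three days would have to look something like this:

evaluate
calculatetable(
DimDate
, DimDate[EnglishDayNameOfWeek] = "Monday"
|| DimDate[EnglishDayNameOfWeek] = "Tuesday"
|| DimDate[EnglishDayNameOfWeek] = "Wednesday"
)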

As an alternative to dynamic query generation you can make use of the DAX PathContains() function. Although it’s intended for use when flattening out parent/child hierarchies, it’s useful here because it allows you to see whether a particular string value appears in a pipe-delimited list. You can use it with the Filter() function to get all the Mondays and Tuesdays like so:

evaluate
filter(
DimDate
, pathcontains("Monday|Tuesday", DimDate[EnglishDayNameOfWeek])
)

There’s a problem with using it this way, unfortunately: it’s not very efficient. Looking in Profiler at the Vertipaq SE Query Begin/End events, you can see that the Storage Engine has to call back to the Formula Engine to use the PathContains() function when filtering:

image

…and as Jeffrey Wang describes here, this can be very expensive indeed. However, if you have a small number of potential parameter values, its impact can be lessened by doing this instead:

evaluate
calculatetable(
DimDate
, filter(
values(DimDate[EnglishDayNameOfWeek])
, pathcontains("Monday|Tuesday", DimDate[EnglishDayNameOfWeek])
)
)

Here, I’m only filtering the table of distinct values in the EnglishDayNameOfWeek column (which has only got 7 rows), and then taking that table and using it as a filter with the calculatetable() function on the main DimDate table. In this case there’s still a callback but only on a filter on a relatively small table, and the filter on the larger table, DimDate, is performed in the most efficient way possible.

How, then, can you take this query pattern and use it in your SSRS report? The first thing to do is to create a dataset that returns all the days of the week, using the following query:

evaluate values(DimDate[EnglishDayNameOfWeek])

image

This, then, can be used to provide the available values for an SSRS multi-valued parameter. This should all be straightforward for anyone who has used SSRS before.

image

image

You can then add a new query parameter to your main dataset, the one that queries the DimDate table:

image 

The next problem to solve is that, when you use the Analysis Services connection type in SSRS and have a multi-valued parameter, SSRS assumes you’re using MDX to query a cube. Therefore, if you have selected Monday and Tuesday in your parameter, SSRS will pass a string containing an MDX set expression like the one below through to the query parameter:

{ Monday,Tuesday }

You therefore need to do some string manipulation to turn this into a pipe-delimited string of the type that PathContains() can use. Here’s the complete DAX query that shows how this can be done:

evaluate
calculatetable(
DimDate
, filter(
values(DimDate[EnglishDayNameOfWeek])
, pathcontains(
substitute(
substitute(
substitute(
@DayNames
, "{ ", "")
, " }", "")
, ",", "|")
, DimDate[EnglishDayNameOfWeek])
)
)

It’s pretty simple: I’m using Substitute() to remove the opening and closing braces, and to turn the commas into pipes.  With that done, you now have an SSRS report using a DAX query that can handle multi-valued parameters!

image
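
As a final note, if you want to test the string handling outside SSRS (in a query window in SQL Server Management Studio, for example), you can replace the @DayNames parameter with a hard-coded sample of the string SSRS would send. Here’s a sketch:

evaluate
calculatetable(
DimDate
, filter(
values(DimDate[EnglishDayNameOfWeek])
, pathcontains(
substitute(
substitute(
substitute(
"{ Monday,Tuesday }"
, "{ ", "")
, " }", "")
, ",", "|")
, DimDate[EnglishDayNameOfWeek])
)
)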

Written by Chris Webb

June 1, 2012 at 6:23 pm

Posted in DAX, Tabular
