Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for March 2010

Different Approaches To Handling Tied Ranks

with 2 comments

Even if blogging about MDX feels, these days, a bit like blogging about COBOL (I’ll be returning to DAX soon, I promise), here’s an interesting MDX problem I came across the other day that I thought was worth writing about.

Calculating ranks is one of those things in MDX that is slightly trickier than it first appears. There’s a RANK function, of course, but in order to get good performance from it you need to know what you’re doing. It’s fairly widely known that with normal ranks what you need to do is to order the set you’re using before you find the rank of a tuple inside that set. Consider the following query on Adventure Works:

WITH
MEMBER MEASURES.REGULARRANK AS
RANK([Customer].[Customer].CURRENTMEMBER,
ORDER(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount]
, BDESC))

SELECT
{[Measures].[Internet Sales Amount], [Measures].REGULARRANK}
ON COLUMNS,
ORDER(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount]
, BDESC)
ON ROWS
FROM [Adventure Works]

It’s unbearably slow (in fact I killed the query rather than wait for it to complete) because, in the calculated member, what we’re doing is ordering the set of all customers every time we calculate a rank. Obviously we don’t need to do this, so the solution to this problem is of course to order the set just once, use a named set to store the result, and then reference the named set in the calculated member as follows:

WITH

SET
ORDEREDCUSTOMERS AS
ORDER(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount]
, BDESC)

MEMBER MEASURES.REGULARRANK AS
RANK([Customer].[Customer].CURRENTMEMBER,
ORDEREDCUSTOMERS)

SELECT
{[Measures].[Internet Sales Amount], [Measures].REGULARRANK}
ON COLUMNS,
ORDEREDCUSTOMERS
ON ROWS
FROM [Adventure Works]

This query now executes in 5 seconds on my laptop. You probably knew all this already though.

But what happens if you need to handle tied ranks? The approach above doesn’t give you tied ranks because the RANK function, in its two-parameter form, simply finds the position of a tuple in a set, and no two tuples can occupy the same position in a set. That’s why you get results like this:

image

Even though Courtney A. Edwards and Jackson L. Liu have the same value for Internet Sales Amount, their ranks are 799 and 800 respectively, because Courtney A. Edwards comes before Jackson L. Liu in the ORDEREDCUSTOMERS set.

BOL tells us about the three-parameter form of RANK, which does give us tied ranks. This is how you use it:

WITH

SET
ORDEREDCUSTOMERS AS
ORDER(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount]
, BDESC)

MEMBER [Measures].REGULARRANKTIED AS
RANK([Customer].[Customer].CURRENTMEMBER
,[Customer].[Customer].[Customer].MEMBERS,[Measures].[Internet Sales Amount])

SELECT
{[Measures].[Internet Sales Amount], [Measures].REGULARRANKTIED}
ON COLUMNS,
ORDEREDCUSTOMERS
ON ROWS
FROM [Adventure Works]

But, unfortunately, the performance is as bad as the original version of the non-tied rank calculation, i.e. incredibly bad, because once again we’re sorting the set every time we calculate a rank. So how can we get tied ranks and good performance?

The first approach I tried was to use a recursive calculation, which used the named set approach to calculate the non-tied rank and then checked to see if the CurrentMember on Customer had the same value for Internet Sales Amount as the member before it in the set of Ordered Customers; if it did, it displayed the rank of the previous Customer. Here it is:

WITH

SET
ORDEREDCUSTOMERS AS
ORDER(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount]
, BDESC)

MEMBER MEASURES.REGULARRANK AS
RANK([Customer].[Customer].CURRENTMEMBER, ORDEREDCUSTOMERS)

MEMBER MEASURES.TIEDRANK AS
IIF(
[Measures].[Internet Sales Amount] =
(ORDEREDCUSTOMERS.ITEM(MEASURES.REGULARRANK-2), [Measures].[Internet Sales Amount])
AND MEASURES.REGULARRANK>1
, (ORDEREDCUSTOMERS.ITEM(MEASURES.REGULARRANK-2), MEASURES.TIEDRANK)
,MEASURES.REGULARRANK)

SELECT
{[Measures].[Internet Sales Amount], MEASURES.TIEDRANK,[Measures].REGULARRANK}
ON COLUMNS,
ORDEREDCUSTOMERS
ON ROWS
FROM [Adventure Works]

Now this particular query performs pretty well – 6 seconds on my laptop, only marginally worse than the non-tied rank. And it gives the correct results; the middle column of values below shows the tied rank:

image

Unfortunately, the performance of this approach varies a lot depending on the number of tied ranks that are present in the set. If we slice the query by the year 2001, when there were a lot more customers with tied ranks, as follows:

WITH

SET
ORDEREDCUSTOMERS AS
ORDER(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount]
, BDESC)

MEMBER MEASURES.REGULARRANK AS
RANK([Customer].[Customer].CURRENTMEMBER, ORDEREDCUSTOMERS)

MEMBER MEASURES.TIEDRANK AS
IIF(
[Measures].[Internet Sales Amount] =
(ORDEREDCUSTOMERS.ITEM(MEASURES.REGULARRANK-2), [Measures].[Internet Sales Amount])
AND MEASURES.REGULARRANK>1
, (ORDEREDCUSTOMERS.ITEM(MEASURES.REGULARRANK-2), MEASURES.TIEDRANK)
,MEASURES.REGULARRANK)

SELECT
{[Measures].[Internet Sales Amount], MEASURES.TIEDRANK,[Measures].REGULARRANK}
ON COLUMNS,
ORDEREDCUSTOMERS
ON ROWS
FROM [Adventure Works]
WHERE([Date].[Calendar Year].&[2001])

…then performance gets really bad once again.

Then I came up with a new approach. After ordering the set of all Customers, I made a second pass over it and created a second set with exactly the same number of items in it: for every customer in the first set, in the second set I added the current Customer if that Customer did not have a tied rank; if the Customer did have a tied rank, I added the first Customer in the original set that shared its tied rank. So if there were four customers, A, B, C and D, and if A had sales of 1, B had sales of 2, C had sales of 2 and D had sales of 3, then this new set would contain the members A, B, B, D. I could then say, for Customer C, that it was the third Customer in the original set, but the third item in the new set was B, and that was the Customer whose rank I needed to display for Customer C. So each item in this second set gives us the member whose rank we need to display for the member in the same position in the set of ordered Customers.

Here’s the MDX:

WITH

SET
ORDEREDCUSTOMERS AS
ORDER(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount]
, BDESC)

MEMBER MEASURES.REGULARRANK AS
RANK([Customer].[Customer].CURRENTMEMBER, ORDEREDCUSTOMERS)

SET CUSTOMERSWITHTIES AS
GENERATE(
{INTERSECT({ORDEREDCUSTOMERS.ITEM(0)} AS FIRSTTIE,{}), ORDEREDCUSTOMERS} AS ORDEREDCUSTOMERS2
, IIF(
({ORDEREDCUSTOMERS2.CURRENT AS CURRCUST}.ITEM(0), [Measures].[Internet Sales Amount]) =
({ORDEREDCUSTOMERS2.ITEM(ORDEREDCUSTOMERS2.CURRENTORDINAL-2) AS PREVCUST}.ITEM(0), [Measures].[Internet Sales Amount])
, IIF(
(PREVCUST.ITEM(0), [Measures].[Internet Sales Amount])
=
(FIRSTTIE.ITEM(0), [Measures].[Internet Sales Amount])
, {FIRSTTIE}
, {PREVCUST AS FIRSTTIE})
, {CURRCUST})
, ALL)

MEMBER MEASURES.HASTIE AS
RANK([Customer].[Customer].CURRENTMEMBER, CUSTOMERSWITHTIES)

MEMBER MEASURES.TIEDRANK AS
(MEASURES.REGULARRANK, CUSTOMERSWITHTIES.ITEM(MEASURES.REGULARRANK-1))

SELECT
{[Measures].[Internet Sales Amount], MEASURES.TIEDRANK,[Measures].REGULARRANK}
ON COLUMNS,
ORDEREDCUSTOMERS
ON ROWS
FROM [Adventure Works]

The named set CUSTOMERSWITHTIES is where the interesting stuff happens. I’m iterating over the set ORDEREDCUSTOMERS using the GENERATE function, and using inline named sets to store the current Customer in the iteration, the previous Customer, and the first Customer in the current run of tied values (see here for a similar example of using inline named sets). It consistently executes in 12 seconds regardless of how you slice the query, so it’s not as good as the best performance of the recursive approach, but it’s much, much better than its worst performance. If anyone has any other ideas on how to solve this problem, I’d love to hear them. I’m still sure there’s a better way of doing this…

Of course, what I really want is for the Formula Engine to be able to optimise queries containing set functions like Order in scenarios like this – I’d want it to know that when a particular set operation returns the same result for a block of cells, it should only perform that set operation once. However, even this wouldn’t necessarily be good enough in all cases – there are plenty of situations where you need to perform the same expensive set operation, like a sort or a filter, in multiple similar calculations, and you’d like to share the result of that set operation between them. For example, you might have one calculated member that counted the number of Customers who bought something both this month and in the previous month, and a second calculated member that counted the number of Customers who not only bought in both months but also spent more than $1000. In both cases you end up finding the set of Customers who bought this month and last month, which may take a long time to do. This is why I think it would be useful to be able to have calculated members return set objects, which could then be cached, so you can share the set between multiple other calculated members; if you agree, please vote on this Connect suggestion.
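
To make the kind of duplication I mean concrete, here’s a rough sketch of two such calculated members (the measure names and the month member keys are mine, purely for illustration, and may need adjusting for your copy of Adventure Works). Both members have to build exactly the same set of repeat-buying Customers with nested NonEmpty calls before they can count or filter it:

WITH

// Customers who bought in both June and July 2003
MEMBER MEASURES.REPEATBUYERS AS
COUNT(
NONEMPTY(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
, ([Measures].[Internet Sales Amount], [Date].[Calendar].[Month].&[2003]&[6]))
, ([Measures].[Internet Sales Amount], [Date].[Calendar].[Month].&[2003]&[7])))

// The same expensive set has to be built all over again here,
// just so it can be filtered on spend in the second month
MEMBER MEASURES.REPEATBUYERSOVER1000 AS
COUNT(
FILTER(
NONEMPTY(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
, ([Measures].[Internet Sales Amount], [Date].[Calendar].[Month].&[2003]&[6]))
, ([Measures].[Internet Sales Amount], [Date].[Calendar].[Month].&[2003]&[7]))
, ([Measures].[Internet Sales Amount], [Date].[Calendar].[Month].&[2003]&[7]) > 1000))

SELECT
{MEASURES.REPEATBUYERS, MEASURES.REPEATBUYERSOVER1000}
ON COLUMNS
FROM [Adventure Works]

If the set in the first member could be calculated once, cached and then reused by the second member, the second calculation would come almost for free.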

Written by Chris Webb

March 29, 2010 at 10:47 pm

Posted in MDX

SQL Azure BI – now hiring

with 3 comments

Hmmm, once again Microsoft’s job ads give away a few details about upcoming products. The SQL Azure team mentioned on their blog that they were hiring, so I had a look at what jobs were on offer. And lo and behold, there were a few jobs there for developers and testers on an Azure BI team. Here are two examples:

https://careers.microsoft.com/JobDetails.aspx?ss=&pg=0&so=&rw=1&jid=14197&jlang=EN
https://careers.microsoft.com/JobDetails.aspx?ss=&pg=0&so=&rw=2&jid=14206&jlang=EN

I quote from the first:

The SQL Azure team leads the expansion of the existing SQL Server business to the Cloud. As part of Microsoft’s comprehensive Azure Services offering we entered commercial availability earlier this year with a relational database service. We are extending the platform to bring Microsoft’s market-leading Business Intelligence services as part of a self-service BI solution.

and

Prior experience with SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS) or other business intelligence technologies is desired

To be fair, we already knew that SSAS and SSRS in the cloud were coming at some point – it was on several publicly-available slide decks. Also, with PowerPivot being flavour-of-the-month and the cloud being the cloud, it’s hardly surprising that it’s a self-service BI solution. I dare say Dallas and OData fit in there somewhere too (incidentally, while we’re on the subject of OData, check out the Sesame Data Browser). The fact that all these ads have gone up recently and some of them talk about the ‘new’ BI team suggests development hasn’t started yet though, so I guess we won’t be seeing PowerPivot-in-the-cloud for another year or two; perhaps in time for Denali?

Written by Chris Webb

March 23, 2010 at 10:53 pm

Posted in Cloud

Query performance tuning chapter from “Expert Cube Development” available online

leave a comment »

As you probably know, last year I co-wrote a book called “Expert Cube Development with SQL Server Analysis Services 2008” with Marco Russo and Alberto Ferrari. Although I’m sure you all already own a copy, those of you that don’t might be interested to know that as well as the sample chapter that’s available on the book’s home page, the chapter on query performance tuning is available as a two-part article here:
http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part1
http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part2

In fact it’s been available for quite a while, but I thought I’d post a link up because I was searching for it the other day and couldn’t find it myself…

Written by Chris Webb

March 23, 2010 at 5:12 pm

Posted in Books

SQLBits VI Registration Now Open!

leave a comment »

Yes, registration is now open for SQLBits VI – go to http://www.sqlbits.com/ for all the details. Better hurry though as we think it’s going to fill up quickly. Also check out the all-star agenda here:
http://www.sqlbits.com/information/newagenda.aspx

Written by Chris Webb

March 17, 2010 at 4:26 pm

Posted in Events

BI.Quality

leave a comment »

Here’s something interesting that I’ve just seen on Hilmar Buchta’s blog: a new, open source testing suite for the Microsoft BI stack (including SSAS) called BI.Quality. It’s available for download from Codeplex here:
http://biquality.codeplex.com/

Unfortunately, at the time of writing, the documentation’s only in German although an English user manual is promised. From what I can make out, though, it looks like it could be really useful.

Written by Chris Webb

March 13, 2010 at 8:30 pm

Posted in Analysis Services

Google Public Data Explorer

with 2 comments

Google has made yet another tentative step into the world of web-based BI with the launch of Public Data Explorer, a tool for analysing at least some of those public data sets that have been appearing thick and fast over the last year or so. Although it’s fairly basic compared to other tools out there like (to pick two examples from many) Tableau Public or Timetric, it’s actually quite fun and much better than other Google efforts in this area like Fusion Tables. There’s a very limited number of data sets available at the moment and there aren’t many options for slicing and dicing, but the functionality that is there is quite slick – I especially like the way you can zoom in and out along the time dimension. Here’s a screenshot showing some European unemployment data:

image

Here’s a link to that analysis if you’d like to play with it yourself.

Of course, this will remain a toy until you can get more data into it: all the public data that’s available anywhere on the web, data that’s in Google spreadsheets, and maybe any data in more ‘difficult’ formats such as html tables in web pages (though that’s when you really need a complex tool like Kapow to extract it) or even any data that’s returned when you run a search (which Google has been doing to a limited extent for a year or so now; Public Data Explorer builds on this).

I’m surprised Wolfram Alpha hasn’t done something like this already; I wonder if Microsoft has something up its sleeve here too? After all it has a partnership with Wolfram Alpha to get data from there, and in Bing a search engine in search of differentiating features. Maybe Live Labs Pivot is part of the story? And given that it’s likely there’ll be some kind of cloud-based Analysis Services in the future, which I’m willing to bet would actually be more like a cloud-based PowerPivot and which will presumably work with Excel on the web, I can see Microsoft could have quite a strong story if it wanted. But this is all conjecture, of course. I wonder if there’s anyone on the Bing team who understands BI? No, I’m really making wild guesses now, so I think I’ll finish here…!

Written by Chris Webb

March 9, 2010 at 10:47 pm

Posted in Google

Thoughts on how PowerPivot and SSAS could work together

with 10 comments

After yesterday’s stream of consciousness on how PowerPivot could be used in SSRS, here’s a follow-up post on how PowerPivot and ‘traditional’ SSAS could be integrated. Hold on, you say, surely that’s a no-brainer? Surely all that would need to happen would be that Vertipaq would become a new storage mode inside SSAS, along with MOLAP, ROLAP and HOLAP, and everyone would be happy? Well, maybe. But here’s an alternative idea that I bounced off some friends a while back and got good feedback on, which I thought I’d air here.

Before I go on, let me state my position on some things:

  • I like PowerPivot, and the more I use PowerPivot the more I like it.
  • I really like the power of the Vertipaq engine, and I want to be able to use it in a corporate BI environment.
  • I really like DAX, and I want to be able to use it in a corporate BI environment.
  • BUT SSAS as we have it today is a very mature, rich tool that I don’t want to lose. PowerPivot models will always be a little rough-and-ready; a good SSAS cube is a lot more ‘finished’ and user-friendly (I always liken building a cube to building a UI). SSAS dimension security is, for example, an absolute killer feature in many corporate BI solutions; PowerPivot won’t have anything like this until at least the next version, whenever that will be.
  • I also love MDX and I don’t want to lose it. MDX Script assignments, calculated members on non-measures dimensions, all of the things that PowerPivot and DAX can’t do (and probably won’t ever do) are things that I use regularly and in my opinion are essential for many complex, enterprise BI implementations.
  • I don’t want the dev team to abandon corporate SSAS, and neither do I want the dev team to waste time re-implementing things in PowerPivot that we already have in corporate SSAS. Already people are asking when they can have security and partitioning in PowerPivot. I want new stuff though!

So, like all users I want absolutely everything possible. How could it be done? Here’s my basic idea: let us be able to build regular SSAS cubes using PowerPivot models as data sources, with SSAS working in something similar to ROLAP mode so every request for data from the cube is translated into an MDX (or SQL – remember SSAS, and presumably PowerPivot, supports a basic version of SQL) query against the PowerPivot model.

In more detail, let’s imagine we have an instance of SSAS running in Vertipaq mode and an instance of SSAS running in normal mode. You’d be able to do the following:

  • Fire up BIDS and create a new SSAS project.
  • Create a data source, which was a PowerPivot database on your Vertipaq instance of SSAS.
  • Create a new Data Source View, which showed all of the tables in your PowerPivot database already joined. Alternatively, here I can imagine connecting to other data sources like SQL Server, creating a Data Source View as normal and then taking the DSV and deploying it as a PowerPivot model onto the Vertipaq instance of SSAS. So in effect, the DSV designer becomes a development environment for PowerPivot models.
  • Create a regular SSAS cube in the usual way, only using the PowerPivot tables in the DSV.
  • Set the storage mode of your dimensions and partitions to the new ROLAP-like storage mode; each SSAS partition could then be based on a separate PowerPivot table. This would mean that when you queried the cube, the SSAS instance issued MDX or SQL queries against the Vertipaq instance of SSAS, just as it issues SQL queries in ROLAP mode today. I suppose though there would be an overhead to making an out-of-process call, so maybe it would be better if you only had one instance of SSAS that could host both Vertipaq and regular SSAS databases at the same time, so all these requests could stay in-process.

The first, obvious, point here is that with this approach we get the traditional, rich SSAS cubes that we know and love and the raw speed of Vertipaq. So one objective is achieved. But I think there would be a lot of other benefits:

  • You’d get two cubes for the price of one: the PowerPivot cube and the SSAS cube. You could choose which one to query depending on your needs.
  • The ability to turn DSVs into PowerPivot models also gives you a proper development environment for creating PowerPivot models, integrated with BIDS and Visual Studio (so you also get source control). The current Excel-based UI is all very well, but us developer types want a nice visual way of creating relationships between tables.
  • You’re able to use all of the new data sources that PowerPivot can work with in traditional SSAS. Imagine being able to create a planning and budgeting solution where users wrote values into an Excel Services spreadsheet, which then fed into PowerPivot via the new Excel Services REST API, which then in turn fed into an SSAS planning and budgeting cube complete with custom rollups and all the complex financial calculations you can only do in MDX.
  • If your users have already built an existing PowerPivot model that they like and want to turn into an ‘official’ BI solution, you can very easily take that model as the starting point for building your cube by importing it into a DSV.
  • It would also make it relatively easy to upgrade existing SSAS projects to use PowerPivot storage – you’d just convert your existing DSV into a PowerPivot model.
  • SSAS drillthrough would be much, much faster because you’d be drilling through to the PowerPivot model and not the underlying relational source.
  • You’d also have the possibility of working in something like HOLAP mode. Vertipaq may be fast, but with really large data volumes some pre-calculated aggregations are always going to be useful.
  • You could define calculated measures in DAX in the PowerPivot model, and then expose them as measures in the SSAS cube. Probably you’d need some special way of handling them so they didn’t get aggregated like regular measures, but in some cases you’d want to take a calculated measure and sum it up like a regular measure (kind of like SQL calculations defined in named calculations today); many more calculations, like year-to-dates, can be treated as semi-additive measures. Effectively this means you are performing some multidimensional calculations outside the Formula Engine, in the SSAS Storage Engine (which in this case is PowerPivot), in the same way I believe that measure expressions work at the moment.
  • For such additive and semi-additive calculations, it also opens up the possibility of parallelism since these calculations can be done in parallel in each partition and the result summed at the end. It also means you get the option to use either DAX or MDX, and can choose the right language for the job.
  • There’s no duplication of dev work needed. For users of PowerPivot who want features like security, partitioning or parent/child relationships, you tell them they have to upgrade to regular SSAS; PowerPivot becomes something like SSAS Express. For users of SSAS who want the speed of Vertipaq, you tell them they have to use a PowerPivot database as their data source. The two complement each other nicely, rather like twins… now where have I heard that analogy before?
  • You also have a convincing story for BI professionals who are sceptical/hostile to PowerPivot to win them over: traditional, corporate SSAS does not go away but is able to build on the new features of PowerPivot.

So there we have it, another fantasy on the future of the MS BI stack sketched out. You may be wondering why I’ve taken the time to write these two posts – after all, I don’t work for Microsoft and I’m sure plenty of people on the dev team have their own ideas on what features they want to implement for Denali. Well, as the saying goes, if you don’t ask you don’t get! And with Kilimanjaro almost out of the door now’s the time to ask. If you agree with what I’ve said here, or you disagree, or you have a better idea, please leave a comment…

Written by Chris Webb

March 4, 2010 at 10:54 pm

Posted in Random Thoughts

PowerPivot/Excel/Sharepoint and SSRS – should they merge?

with 12 comments

I’ve been doing a fair amount of work with SSRS over the last few days, and with PowerPivot also fresh in my mind it got me thinking about the amount of overlap between SSRS and the PowerPivot/Excel/Sharepoint stack. Of course anyone who’s had to try to sell an MS BI solution to a potential customer over the last few years will have had to deal with conversations like this:

Customer: So, what is Microsoft’s solution for building BI dashboards?
Consultant: Well there’s SSRS, or if you want to build an SSAS cube you can use PerformancePoint, or maybe Excel and Excel Services, or you can go with any of these 50 third-party tools…it depends…
Customer: I’m confused already!

But just about any large software company has a certain amount of overlap between its products – that’s just life. However, that doesn’t mean that some rationalisation of products isn’t sometimes a good idea.

Let’s take a look at some of the things you’d want to do when building a dashboard, and how you can achieve them with both stacks:

Get data from a number of different sources
  • SSRS: Create data sources and then datasets to return the data you want.
  • PowerPivot/Excel/Sharepoint: Import data into PowerPivot ‘tables’ from Excel, RDBMSes and OData feeds.
  • Comments: There’s a slight difference between the data sources here, but the most important case is always going to be getting data from an RDBMS, which both do well. The key difference, though, is that in general with SSRS you get data on demand through parameterised queries, whereas with PowerPivot you import all the data you’re ever likely to need up front.

Integrate that data
  • SSRS: No real solution here, though SSRS developers have wanted to be able to do joins on datasets for a while. The new R2 lookup functions partly address this.
  • PowerPivot/Excel/Sharepoint: Create joins between PowerPivot tables.
  • Comments: PowerPivot has the obvious advantage here, although for SSRS you can argue that in most cases any data integration should be happening upstream in your ETL.

Perform calculations on that data
  • SSRS: Use SSRS expressions.
  • PowerPivot/Excel/Sharepoint: Use DAX calculations.
  • Comments: I’d say that SSRS expressions, while not as powerful as DAX, are easier for most people to understand; however there are a lot of things that only DAX can do.

Create reports from that data
  • SSRS: Use BIDS if you’re a developer, or Report Builder if you’re a power user.
  • PowerPivot/Excel/Sharepoint: Use Excel or any client tool that speaks MDX (including SSRS).
  • Comments: For developers, BIDS is a great tool for creating reports. However SSRS has always struggled with Report Builder – in my experience users find it too difficult. And that’s where Excel comes into its own: it’s a powerful tool and most end-users are familiar with it.

Publish reports to a wider audience
  • SSRS: Deploy up to your SSRS server.
  • PowerPivot/Excel/Sharepoint: Publish to Excel Services/Sharepoint.
  • Comments: The advantage SSRS has here is that most companies have no problem with the IT department setting up an SSRS server. On the other hand, Sharepoint is a Big Deal. If your company has a Sharepoint strategy, and is planning on installing Sharepoint 2010 Enterprise Edition, you’ll be fine with PowerPivot. If not, and I guess many companies are in this position, you have a problem.

Export reports to a variety of formats
  • SSRS: SSRS handles exporting to a lot of different formats.
  • PowerPivot/Excel/Sharepoint: Export to Excel isn’t a problem, but other formats are a bit trickier (though doable).
  • Comments: SSRS has the clear advantage here.

Schedule report refresh
  • SSRS: SSRS has a lot of options for controlling when reports are refreshed.
  • PowerPivot/Excel/Sharepoint: PowerPivot’s functionality for scheduling when data is refreshed is a bit v1.0.
  • Comments: SSRS has the advantage again.

Anyway, you get the idea – there’s a fair amount of overlap and some things are done better by one tool, some things are done better by the other. Isn’t it, though, a bit of a waste of Microsoft’s time, money and energy to develop two parallel BI stacks? If they could merge in some way, it would mean less effort spent developing duplicate functionality and a more coherent message for customers to hear.

How could this be done, you ask? Well here are some vague ideas I had about what you’d need:

  • Inside SSRS – BIDS as well as Report Builder – in addition to the existing functionality for bringing data into datasets, and possibly in the long term as a replacement for it, you get the option of building a PowerPivot model to act as the main source of data for your reports. For Report Builder especially I think this would be a winner, given that the PowerPivot UI for building models is already aimed at the same power users that Report Builder is aimed at.
  • The fact that you need to load all of your data into a PowerPivot model upfront is both an advantage and a disadvantage, depending on your scenario. When you know the data’s not going to change much, or you’ve got relatively small amounts of data, it’s good because you get joins and fast query performance. But if the data changes a lot or you don’t want the overhead of loading it into PowerPivot then you’d need the option to pass requests straight through PowerPivot back to your sources – so maybe PowerPivot would need something like ROLAP mode, or proactive caching, or the ability to make its tables work like existing SSRS datasets and send parameters to them.
  • Include proper support for MDX queries in SSRS reports (my old hobby horse). This would involve developing a proper, fully-functional MDX query builder (not the rubbish one SSRS has right now – a standard MDX query generator across all MS products which was also available as a control for custom development would be ideal) and the ability to bind the results of an MDX query direct to a tablix (and no messing around with mapping field names to rowgroupthingies in the tablix control please). If power users didn’t have to worry about tablixes and could just build their queries as easily as they could in an Excel pivot table, Report Builder would be a much more popular tool. I think many developers would appreciate it too. Once all the data you need for your report is in a PowerPivot model, and you have full MDX support in SSRS, the business of report design is much easier. You also no longer need to join datasets because the data is already joined in PowerPivot, you have a powerful calculation language in DAX, and query performance is extremely fast. Oh, and with this functionality in place you could probably kill off PerformancePoint too and no-one would complain…
  • Blur the line between Excel and SSRS. There’s been talk about being able to author SSRS reports in Excel for a long time (whatever happened to the Softartisans technology MS licensed?), but nothing’s ever come of it. Why not also have the option within SSRS to take a range from Excel Services and make that the body of your report? So your report is essentially still a fragment of an Excel worksheet, but it’s just surfaced via SSRS which then handles the refreshing and rendering to different formats.
  • You’d also need SSRS to be able to schedule the refresh of your PowerPivot model, but that should be very doable; it would be great if it could refresh different parts of the model at different times. SSRS would also maintain control over report security, rendering, folders etc etc.

The end result would be that this PowerPivot/Excel/SSRS hybrid would give you the best of both worlds. I also have some ideas about how PowerPivot and SSAS should be integrated which I might get round to blogging about soon too, that would fit nicely with this vision of the future.

What are the chances of all this happening? Practically zero. It would involve the SSRS team, the SSAS team and the Excel team setting aside their differences, co-operating, and possibly sacrificing large chunks of different products. But it’s a nice thought to kick around…

Written by Chris Webb

March 3, 2010 at 7:45 pm

Posted in Random Thoughts

SSRS and SSAS-Sourced Parameter Dataset Performance Issues

with 14 comments

Ahhh, Reporting Services and Analysis Services integration – a never ending source of pain and frustration! Although I shouldn’t complain because it’s been quite lucrative for me in terms of consultancy work…

Anyway, recently I was tuning an SSRS report based on an SSAS cube for a customer. Looking at Profiler I could see a number of slow queries being executed, which was strange given that it was a fairly simple report. It turned out that during the design phase this report had had a number of parameters created in the SSAS query designer that had later been deleted; however, when you delete a parameter in the SSAS query designer, BIDS does not delete the hidden dataset that it is bound to. What’s worse is that when an SSRS report is executed all dataset queries are also executed, even if the datasets aren’t used anywhere in the report, which means you get the overhead of running extra queries. It’s an easy mistake to make, and in this case the execution of unused datasets was adding several seconds to the execution time of the report.

You can reproduce this problem very easily by creating a simple report based on the Adventure Works cube. Once you’ve created the data source, open the query designer, drag the Internet Sales Amount measure onto the grid, drag the Customer hierarchy from the Customer dimension onto the filter pane and check the Parameters box:

image

Now close the query designer and reopen it, then remove the Customer hierarchy from the filter pane, close the query designer again and delete the report parameter. When you Preview the report you’ll see the following in Profiler:

image

The highlighted row shows the hidden dataset is being executed. What you need to do to fix this is to right-click on your data source and check the Show Hidden Datasets option:

image

You’ll then see the offending, automatically-generated, hidden dataset and you can delete it:

image

Luckily, BIDS Helper has functionality to find unused datasets in your reports for you:
http://bidshelper.codeplex.com/wikipage?title=Dataset%20Usage%20Reports&referringTitle=Home

And there’s more! What I found really interesting about this parameter dataset query was how long it was taking to execute. In this example 2.5 seconds, even on a warm cache, seems like a very long time to me even though there are a lot of members on the Customer hierarchy. Once the report is deployed that goes down to a consistent 2.1 seconds, and when I run the same query through SQL Management Studio it goes down to 1.5 seconds. Why the difference in execution times? I’m not sure, but I suspect it’s a combination of the connection string properties used and the use of a flattened rowset. In any case, 1.5 seconds is still slow and it’s certainly not good if you actually do want to use a query like this in a dataset bound to a parameter.

Luckily, if our parameter datasets are causing performance problems, we can usually rewrite the queries involved to make them faster. Here’s the original query from the parameter in the example:

WITH
MEMBER [Measures].[ParameterCaption] AS
[Customer].[Customer].CURRENTMEMBER.MEMBER_CAPTION
MEMBER [Measures].[ParameterValue] AS
[Customer].[Customer].CURRENTMEMBER.UNIQUENAME
MEMBER [Measures].[ParameterLevel] AS
[Customer].[Customer].CURRENTMEMBER.LEVEL.ORDINAL
SELECT
{[Measures].[ParameterCaption],
[Measures].[ParameterValue],
[Measures].[ParameterLevel]}
ON COLUMNS ,
[Customer].[Customer].ALLMEMBERS
ON ROWS
FROM [Adventure Works]

If we decide that we can make do without the All Member and the level-based indenting that goes on in the parameter dataset (this is an attribute hierarchy, after all, so there’s just one level), we can use the following query in the dataset instead:

WITH MEMBER MEASURES.DUMMY AS NULL
SELECT
{MEASURES.DUMMY}
ON COLUMNS ,
[Customer].[Customer].[Customer].MEMBERS
DIMENSION PROPERTIES MEMBER_CAPTION, UNIQUE_NAME
ON ROWS
FROM [Adventure Works]

Once the first query above has been replaced with the second, and the report parameter has been hooked up to use the new fields, Profiler shows that the time taken to execute the parameter dataset has gone down to around 0.7 seconds:

image

That is, of course, almost 2 seconds faster than the original query in Preview mode and almost 1.5 seconds faster than the original query in the deployed report. Not a lot on its own but certainly noticeable, and if you have more than one large parameter the cumulative gain could be quite significant. If you create a separate OLEDB connection to the cube and use the second query in a dataset, the execution time is even faster, going down to around 0.45 seconds:

image

Incidentally, some of the more astute among you may be asking why I need to include MEASURES.DUMMY in the query above when I could use an empty set on the columns axis instead. Two reasons: one, if you use an empty set on columns with the OLEDB connection you get no rows returned; two, I noticed that when the query was executed in SSRS a Query Subcube event was raised, suggesting measure data was being requested from the cube – this didn’t happen when I ran the query in SQL Management Studio. I suspect both problems are something to do with SSRS using a flattened rowset, so I’ll investigate and post back here when I get an answer.
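
For reference, this is the empty-set variant of the query that I’m referring to – it looks like it should be equivalent to the DUMMY version, but over the OLEDB connection it returns no rows at all:

SELECT
{}
ON COLUMNS ,
[Customer].[Customer].[Customer].MEMBERS
DIMENSION PROPERTIES MEMBER_CAPTION, UNIQUE_NAME
ON ROWS
FROM [Adventure Works]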

Written by Chris Webb

March 2, 2010 at 12:26 am

Posted in Reporting Services
