Things I’d like to see in Analysis Services 2011
It’s that time in the release cycle. I know what the new features are going to be for Analysis Services (for a full list see here) and it’s rather like Boxing Day: I’ve just got my presents and I’m starting to wonder to what Santa will bring on his next visit. And to a certain extent I’m feeling as though I should have been a bit more specific when I was writing my Christmas list last time, given that I’m a teeny bit underwhelmed by the features I did get in AS2008. So here, for the benefit of the elves working away in Redmond, is a list of things I’d love to see in Analysis Services 2011 or whatever it will be called:
Building and maintaining partitions is a job that almost everyone needs to do and yet it’s something that involves way too much hard work at the moment. As such it’s an ideal candidate for some form of automation, and indeed a while back Jamie Thomson posted the following suggestion on Connect: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=127378. Other possibilities include the use of data mining to determine how you should set the slices for your partitions (which I blogged about a long time ago) and something like the MDX driven auto-partitioning that Greg Galloway implemented for the Analysis Services Stored Procedure Project.
- Do away with the arbitrary shaped sets error
This is a particular bugbear of mine. The more I work on cubes with complex calculations, the more I find that I want to scope calculations on arbitrary shaped sets, which of course I’m not allowed to do. That means I end up making several identical assignments to get the same result, which leads to way too much duplicate code and a maintenance nightmare. If I can break up an arbitrary shaped set into something acceptable to Analysis Services, why can’t Analysis Services do this itself? Here’s the Connect I logged on this: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=339861
- Fix parent-child hierarchies
Parent-child hierarchies are a pain. For a start they cause performance problems so what I would want to see is the ability for AS to create aggregations at intermediate levels in a parent-child hierarchy at the very least; I’m sure there are a lot of other potential engine improvements that could be made for them. Secondly, scoping calculations on them is a real nightmare and on a related note trying to use dimension security with them is an equally thankless task. Given that there are some situations where you have no option but to use them, I think some time needs to be invested in making them better.
- Build parallelism into the calculation engine
What with all the block computation improvements in AS2008, I’m finding that many calculations are performing a lot better than in AS2005. However I’m still finding scenarios where the calculation engine is the bottleneck and just one CPU is being used on the server when I’m running a query (something I discussed here). Surely it’s possible to get the calculation engine to do some calculations in parallel when a query is being run?
- Be able to rename attributes in role-playing dimensions
If there was one feature I wanted to see in AS2008 it was this. It would have been so easy to do, I know loads of other people wanted it, and I did log it in Connect ages ago: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=144500. And it wasn’t included. All I want to do is to be able to rename the individual attributes of a role-playing dimension instead of just being able to change the dimension name – for example if I have Order Date and Ship Date role-playing dimensions based on a single Date dimension, then I want my ‘Calendar Date’ hierarchy in Date to appear as ‘Calendar Order Date’ and ‘Calendar Ship Date’ to my end users. Not being able to do this has stopped me using role-playing dimensions on so many occasions simply because seeing multiple hierarchies with the same name is too confusing.
- Support for cross-dimensional user hierarchies
Sometimes I find myself in the situation where I’ve got a large dimension, such as a Customer dimension, where the lower-level attributes are queried much less often than the higher level attributes. Now AS2005 handles large dimensions pretty well, in my experience, but it would still be useful to be able to store less useful lower-level attributes in ROLAP mode and more useful higher-level attributes in MOLAP mode. One solution to this could be the ability to set the storage mode on individual attributes rather than the whole dimension but I’m not sure that’s practical; instead it would be great if you could split the attributes up into a ROLAP dimension and a MOLAP materialised reference dimension and be able to build a user hierarchy that spans both. So you’d be able to drill down from Country to City to Post Code to Customer, and when you were querying the first three levels you’d be hitting a MOLAP dimension and when you queried Customer you’d be hitting a different ROLAP dimension. In fact I’m sure the ability to set up cross-dimensional drill paths would be useful in other scenarios too and make cubes much easier to use.
- Support for partitioned dimensions
Dimensions are getting bigger and bigger, and I wonder if at some stage it would be useful to be able to partition dimensions as well as measure groups to get better performance and manageability? Just a thought.
- Be able to manage unary operators and custom member formulas from the MDX Script
Perhaps this is just me, but sometimes I find it’s a bit of a pain when you’re using unary operators and custom member formulas that their values have to be stored in the dimension table. Can’t we have the option to be able to set or override a unary operator or custom member formula from the MDX Script – after all, that’s meant to the one central repository for all my calculations. I wonder if this is possible anyway with an ALTER CUBE statement? Hmm, I don’t think so.
- New features for calculated members
As I’ve mentioned here before, quite often I see people creating real members in dimensions then overwriting their values with scoped assignments when I’m sure they would be better off with using calculated members. So to meet that and other challenges, here’s a list of things it would be cool to allow with calculated members:
- Allow calculated members to have other calculated members as children
- Be able to control where calculated members appear in a hierarchy
- Allow calculated members to have member properties, either returning static values or the results of MDX expressions
- Following on from that, if calculated members can have calculated member property values, real members should have that too. For example, on a Customer dimension I might know a date of birth but I might also want to be able to calculate a customer’s age and show that as a property
- Following on from that, it would be great to be able to create entire hierarchies based on calculated members or properties. So if I can calculate a customer’s age, and then have another calculated property that placed that customer into a bucket based on their age (eg 0-9, 10-19, 20-29 and so on), I’d also like to be able to determine the number of buckets based on some MDX as well (maybe using server parameters – see below) and create a user hierarchy where the top level would consist of calculated members representing these buckets and where I could drill down from these buckets to the individual ages and then down to the real customer members themselves.
The ability to parameterise an MDX query is cool, but it would also be great if you could also parameterise calculations on the server. For example you could set up a kind of server-wide variable which was the rate of Income Tax, do calculations using that value and then allow users to have the option of changing its value and explore ‘what-if’ scenarios. This would clearly have an impact on AS’s ability to cache calculations but so long as people knew about this then I’m sure it would be ok. These parameters would also be visible through the metadata so client tools would know to show users they could change them.
After my posting on how dynamic sets actually work in AS2005 a few months back, I got to thinking about why I was disappointed and what I really wanted to see. The problem is that however well you model your cube, and however easy-to-use your client tool is, there’s always a gap between what the tools are capable of and what the user is able to accomplish with them. For example, you might know that your end users want to be able to create reports showing the top 10 customers with an extra value underneath that represents the aggregated values of all other customers outside the top ten. This is certainly possible in MDX and perhaps there are client tools out there that support this, but wouldn’t it be cool if you could create this as a kind of pre-canned selection that was parameterisable (eg show the top 20 instead of the top 10, or show products not customers) and visible through the metadata so any client tool could display it to the end user and allow it to be used? This would be a selection that would be designed to work wherever the user placed it in his or her query; it’s not the same thing as a dynamic set, more of a parameterisable custom set function. The concept could be extended from pre-canned selections to entire queries too – and if I haven’t convinced you (or been entirely successful in explaining) the concept of pre-canned selections then I think the case for entire queries stored on the server is much easier to make. Again these would be parameterisable and discoverable through metadata, so that any client tool would (hopefully) be able to use them; for Reporting Services alone this would be extremely useful, so you could have one query displayed in many reports with a single point of maintenance. I’m sure there would also be an opportunity to introduce some performance benefits for these stored queries too, perhaps in terms of being able to cache entire cellsets rather than just the values of individual cells.
As I mentioned before here, the tools we’ve got at the moment are not up to the task of managing large numbers of security roles and some serious work needs to be done in the UI. It would also be great if the tools supported dynamic security and the creation of all the supporting objects, perhaps through a wizard.
This is an easy one to specify: I want all of the functionality in MDX Studio put into SQL Management Studio. And a hundred other little things, like in the metadata pane I want to be able to open a level and expand a member to drill down as well instead of always having to start at the All Member of a hierarchy; oh, and I’d like to have a Reconnect button for when I’ve got a query and I’ve killed my session because of cube changes, instead of having to disconnect and reconnect. And the ability to display…
A lot of people have been asking for this already; here’s the Connect: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=321161. This would be soooo useful in performance tuning.
Analysis Services is useful not only as a database engine in its own right, but as a query/calculation layer on top of other database engines. OK no-one uses ROLAP on SQL Server because performance is rubbish compared to MOLAP, but for years people have been experimenting with using AS in ROLAP mode on top of Teradata (see here) and I’m sure that it could be used in a similar way on top of other data warehouse appliances or the new generation of COP databases like Vertica. You’d be selling it as a way of OLAP-ifying these databases, getting the incredible scalability and performance they (allegedly) offer in combination with the power of MDX for calculations and access to the data via Excel. Writing new cartridges for these databases should be pretty easy; I suppose the problem comes with trying to generate SQL optimised for each of these platforms.
MOLAP writeback in AS2008 certainly does improve performance, but the need always to allocate your values down to the granularity of the fact table is always going to lead to a performance hit. Wouldn’t it be useful if you could write a value back to a cell without having to allocate it down always? You’d get instant writeback. I’m sure that would be useful in many cases, and in fact it would work in the same way as if you’d assigned a value to that cell in the MDX Script. I wrote some code in the Analysis Services Stored Procedure Project that did this, in fact, although it wasn’t really anything more than a proof-of-concept exercise.
Anyway, this post has gone on quite long enough now, although I’m sure if I thought about it I could go on for even longer. I need to do some work! But what would you like to see in Analysis Services 2011? Answers in a comment, please…