Archive for April 2008
It’s that time in the release cycle. I know what the new features are going to be for Analysis Services (for a full list see here) and it’s rather like Boxing Day: I’ve just got my presents and I’m starting to wonder to what Santa will bring on his next visit. And to a certain extent I’m feeling as though I should have been a bit more specific when I was writing my Christmas list last time, given that I’m a teeny bit underwhelmed by the features I did get in AS2008. So here, for the benefit of the elves working away in Redmond, is a list of things I’d love to see in Analysis Services 2011 or whatever it will be called:
Building and maintaining partitions is a job that almost everyone needs to do and yet it’s something that involves way too much hard work at the moment. As such it’s an ideal candidate for some form of automation, and indeed a while back Jamie Thomson posted the following suggestion on Connect: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=127378. Other possibilities include the use of data mining to determine how you should set the slices for your partitions (which I blogged about a long time ago) and something like the MDX driven auto-partitioning that Greg Galloway implemented for the Analysis Services Stored Procedure Project.
- Do away with the arbitrary shaped sets error
This is a particular bugbear of mine. The more I work on cubes with complex calculations, the more I find that I want to scope calculations on arbitrary shaped sets, which of course I’m not allowed to do. That means I end up making several identical assignments to get the same result, which leads to way too much duplicate code and a maintenance nightmare. If I can break up an arbitrary shaped set into something acceptable to Analysis Services, why can’t Analysis Services do this itself? Here’s the Connect I logged on this: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=339861
- Fix parent-child hierarchies
Parent-child hierarchies are a pain. For a start they cause performance problems so what I would want to see is the ability for AS to create aggregations at intermediate levels in a parent-child hierarchy at the very least; I’m sure there are a lot of other potential engine improvements that could be made for them. Secondly, scoping calculations on them is a real nightmare and on a related note trying to use dimension security with them is an equally thankless task. Given that there are some situations where you have no option but to use them, I think some time needs to be invested in making them better.
- Build parallelism into the calculation engine
What with all the block computation improvements in AS2008, I’m finding that many calculations are performing a lot better than in AS2005. However I’m still finding scenarios where the calculation engine is the bottleneck and just one CPU is being used on the server when I’m running a query (something I discussed here). Surely it’s possible to get the calculation engine to do some calculations in parallel when a query is being run?
- Be able to rename attributes in role-playing dimensions
If there was one feature I wanted to see in AS2008 it was this. It would have been so easy to do, I know loads of other people wanted it, and I did log it in Connect ages ago: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=144500. And it wasn’t included. All I want to do is to be able to rename the individual attributes of a role-playing dimension instead of just being able to change the dimension name – for example if I have Order Date and Ship Date role-playing dimensions based on a single Date dimension, then I want my ‘Calendar Date’ hierarchy in Date to appear as ‘Calendar Order Date’ and ‘Calendar Ship Date’ to my end users. Not being able to do this has stopped me using role-playing dimensions on so many occasions simply because seeing multiple hierarchies with the same name is too confusing.
- Support for cross-dimensional user hierarchies
Sometimes I find myself in the situation where I’ve got a large dimension, such as a Customer dimension, where the lower-level attributes are queried much less often than the higher level attributes. Now AS2005 handles large dimensions pretty well, in my experience, but it would still be useful to be able to store less useful lower-level attributes in ROLAP mode and more useful higher-level attributes in MOLAP mode. One solution to this could be the ability to set the storage mode on individual attributes rather than the whole dimension but I’m not sure that’s practical; instead it would be great if you could split the attributes up into a ROLAP dimension and a MOLAP materialised reference dimension and be able to build a user hierarchy that spans both. So you’d be able to drill down from Country to City to Post Code to Customer, and when you were querying the first three levels you’d be hitting a MOLAP dimension and when you queried Customer you’d be hitting a different ROLAP dimension. In fact I’m sure the ability to set up cross-dimensional drill paths would be useful in other scenarios too and make cubes much easier to use.
- Support for partitioned dimensions
Dimensions are getting bigger and bigger, and I wonder if at some stage it would be useful to be able to partition dimensions as well as measure groups to get better performance and manageability? Just a thought.
- Be able to manage unary operators and custom member formulas from the MDX Script
Perhaps this is just me, but sometimes I find it’s a bit of a pain when you’re using unary operators and custom member formulas that their values have to be stored in the dimension table. Can’t we have the option to be able to set or override a unary operator or custom member formula from the MDX Script – after all, that’s meant to the one central repository for all my calculations. I wonder if this is possible anyway with an ALTER CUBE statement? Hmm, I don’t think so.
- New features for calculated members
As I’ve mentioned here before, quite often I see people creating real members in dimensions then overwriting their values with scoped assignments when I’m sure they would be better off with using calculated members. So to meet that and other challenges, here’s a list of things it would be cool to allow with calculated members:
- Allow calculated members to have other calculated members as children
- Be able to control where calculated members appear in a hierarchy
- Allow calculated members to have member properties, either returning static values or the results of MDX expressions
- Following on from that, if calculated members can have calculated member property values, real members should have that too. For example, on a Customer dimension I might know a date of birth but I might also want to be able to calculate a customer’s age and show that as a property
- Following on from that, it would be great to be able to create entire hierarchies based on calculated members or properties. So if I can calculate a customer’s age, and then have another calculated property that placed that customer into a bucket based on their age (eg 0-9, 10-19, 20-29 and so on), I’d also like to be able to determine the number of buckets based on some MDX as well (maybe using server parameters – see below) and create a user hierarchy where the top level would consist of calculated members representing these buckets and where I could drill down from these buckets to the individual ages and then down to the real customer members themselves.
The ability to parameterise an MDX query is cool, but it would also be great if you could also parameterise calculations on the server. For example you could set up a kind of server-wide variable which was the rate of Income Tax, do calculations using that value and then allow users to have the option of changing its value and explore ‘what-if’ scenarios. This would clearly have an impact on AS’s ability to cache calculations but so long as people knew about this then I’m sure it would be ok. These parameters would also be visible through the metadata so client tools would know to show users they could change them.
After my posting on how dynamic sets actually work in AS2005 a few months back, I got to thinking about why I was disappointed and what I really wanted to see. The problem is that however well you model your cube, and however easy-to-use your client tool is, there’s always a gap between what the tools are capable of and what the user is able to accomplish with them. For example, you might know that your end users want to be able to create reports showing the top 10 customers with an extra value underneath that represents the aggregated values of all other customers outside the top ten. This is certainly possible in MDX and perhaps there are client tools out there that support this, but wouldn’t it be cool if you could create this as a kind of pre-canned selection that was parameterisable (eg show the top 20 instead of the top 10, or show products not customers) and visible through the metadata so any client tool could display it to the end user and allow it to be used? This would be a selection that would be designed to work wherever the user placed it in his or her query; it’s not the same thing as a dynamic set, more of a parameterisable custom set function. The concept could be extended from pre-canned selections to entire queries too – and if I haven’t convinced you (or been entirely successful in explaining) the concept of pre-canned selections then I think the case for entire queries stored on the server is much easier to make. Again these would be parameterisable and discoverable through metadata, so that any client tool would (hopefully) be able to use them; for Reporting Services alone this would be extremely useful, so you could have one query displayed in many reports with a single point of maintenance. I’m sure there would also be an opportunity to introduce some performance benefits for these stored queries too, perhaps in terms of being able to cache entire cellsets rather than just the values of individual cells.
As I mentioned before here, the tools we’ve got at the moment are not up to the task of managing large numbers of security roles and some serious work needs to be done in the UI. It would also be great if the tools supported dynamic security and the creation of all the supporting objects, perhaps through a wizard.
This is an easy one to specify: I want all of the functionality in MDX Studio put into SQL Management Studio. And a hundred other little things, like in the metadata pane I want to be able to open a level and expand a member to drill down as well instead of always having to start at the All Member of a hierarchy; oh, and I’d like to have a Reconnect button for when I’ve got a query and I’ve killed my session because of cube changes, instead of having to disconnect and reconnect. And the ability to display…
A lot of people have been asking for this already; here’s the Connect: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=321161. This would be soooo useful in performance tuning.
Analysis Services is useful not only as a database engine in its own right, but as a query/calculation layer on top of other database engines. OK no-one uses ROLAP on SQL Server because performance is rubbish compared to MOLAP, but for years people have been experimenting with using AS in ROLAP mode on top of Teradata (see here) and I’m sure that it could be used in a similar way on top of other data warehouse appliances or the new generation of COP databases like Vertica. You’d be selling it as a way of OLAP-ifying these databases, getting the incredible scalability and performance they (allegedly) offer in combination with the power of MDX for calculations and access to the data via Excel. Writing new cartridges for these databases should be pretty easy; I suppose the problem comes with trying to generate SQL optimised for each of these platforms.
MOLAP writeback in AS2008 certainly does improve performance, but the need always to allocate your values down to the granularity of the fact table is always going to lead to a performance hit. Wouldn’t it be useful if you could write a value back to a cell without having to allocate it down always? You’d get instant writeback. I’m sure that would be useful in many cases, and in fact it would work in the same way as if you’d assigned a value to that cell in the MDX Script. I wrote some code in the Analysis Services Stored Procedure Project that did this, in fact, although it wasn’t really anything more than a proof-of-concept exercise.
Anyway, this post has gone on quite long enough now, although I’m sure if I thought about it I could go on for even longer. I need to do some work! But what would you like to see in Analysis Services 2011? Answers in a comment, please…
One of the most commonly asked questions on the AS MSDN Forum is how to format measures that contain values in different currencies with the correct currency symbol. I’ve never blogged about this because a lot of people have already written up the solution in detail, for example Mosha:
However I was thinking about this recently and in my opinion there’s a big problem with using the Language property to do this. And that is that when you set the Language of a cell, you not only change the currency symbol but you also change other ways that the number is formatted, for example the symbols used as thousands separators and decimal separators. In the US and UK of course, we use full stops (I think they’re called ‘periods’ in the US?) as decimal separators and commas as thousands separators, but in continental Europe the roles are reversed. So the value:
would be interpreted as one hundred thousand and one in the UK, but one hundred point zero zero one in Germany, say, and the value:
would be interpreted in the opposite way. Borrowing one of the screenshots from Vidas’s post you can see how the Language property respects these conventions:
So you can see what the potential problem is – what happens if you have values in Euros, USDs and GBPs in your cube? However much you educate your users you can guarantee that someone at some time is going to get confused or worse not realise what’s going on and interpret the values incorrectly.
What’s the alternative then? I think using Format_String has to be the way to go. If you alter Vidas’s example so that instead of locale ids you put currency symbols inside the currency dimension named query, for example:
SELECT CurrencyKey, CurrencyAlternateKey, CurrencyName,
CASE CurrencyAlternateKey WHEN ‘GBP’ THEN ‘£’ WHEN ‘EUR’ THEN ‘€’ WHEN ‘JPY’ THEN ‘¥’ WHEN ‘USD ‘ THEN ‘$ ‘ END AS LocaleID
WHERE (CurrencyKey IN
(SELECT DISTINCT CurrencyKey
and then change his MDX assignment to be something like:
SCOPE ([Destination Currency].[Destination Currency].[Destination Currency].Members);
Format_String(This) = [Destination Currency].[Destination Currency].[Symbol].MemberValue + "#,#.00";
Then you get the desired result. However, one thing I did notice when I was experimenting with this is that if you try to use more than one character for your symbol (for example you might want to use CHF for Swiss Francs) you sometimes get the following error:
#Error The following system error occurred: Out of present range. .
Not good. Here’s the bug logged on Connect:
Following on from all this, it also makes sense that users connecting from different locales automatically see numbers (but not currency symbols) formatted in the convention of their locale. So a German person might connect to the cube and see Euros with a € and USDs with a $, but see commas used as decimal separators, whereas a user in the UK would still see €s and $s with the correct symbol but full stops used as decimal separators. Now AS2K I seem to remember used to be able to handle this perfectly well – it could display the appropriate decimal separator and thousand separator depending on the client locale. However AS2005 RTM had a problem in that it worked ok for calculated measures but not for real measures; this was ‘fixed’ in SP2 so both calculated measures and real measures always got displayed in the locale of the server. American software, eh? And to think that so many members of the dev team are from Europe too. Here’s the Connect:
and here’s a forums thread on the subject:
Hmmmmm…. I need to check if this has been fixed properly in Katmai.
“In preparation of SQL Server 2008 Analysis Services RTM’ing, I am starting a small beta program for the next version of ‘ascmd’. This version will have two important new features:
1) Built and verified against Analysis Services 2008 – but also directly compatible with AS2K5 – thus I need both AS2K8 and AS2K5 participates.
2) New multi-user capabilities – the Microsoft SQLCAT team uses ascmd as the multi-user load facility for its Analysis Services benchmarks. Ever want to load up a few hundred users against a SSAS server? We are going to be retrofitting those capabilities (plus a few others) back into the released version of ascmd.
I am looking for users to test out this new version: 1) to, at least, slipstream this version into their current use to see if we’ve broken anything (we don’t think so, but we want to fix anything if we did), and 2) test out the new features if you can.
We are also actively soliciting new features and improvements for ascmd at this time – if we can sneak them in, we will.
The “readme” for the current version of ascmd is located here: http://msdn2.microsoft.com/en-us/library/ms365187.aspx
If you are an avid user of ascmd and would like to be included in this beta, please contact me directly (firstname.lastname@example.org).”
Yet another excellent paper on optimising distinct count measures from the SQLCat team:
Actually I’m beginning to wonder whether I should be linking to the SQLCat team site – in the same way I never link to Mosha because I assume that everyone who reads my blog reads his too, then I would hope everyone subscribes to the SQLCat team blog as well.
One topic missing is a comparison of the performance of distinct count measures with the technique of using many-to-many dimensions to get the same result that Marco Russo describes in his famous m2m white paper:
Marco presented on this at PASS Europe and mentioned (which tallies with my experience) that this approach can perform as well as, and sometimes better than, a distinct count measure.
New developments on the Panorama/Google front: users of Panorama’s pivot table for Google docs will be able to connect direct to Analysis Services via http (if they want to pay) and also upload local cube files and query them (which will be free). Here’s the announcement:
This comes hot on the heels of Google’s Salesforce link-up:
They’re slowly joining the dots…
I was asked not to talk about this until the press release went out, but then I found that Nigel Pendse has already beaten me to it and done an excellent job of summing everything up:
I agree with Nigel that I don’t think there will be many people at the moment who will be willing to expose their SSAS server via http. However there are a small number of companies who provide hosted SSAS solutions and this I think will become a much more popular option over the coming years. I do think, though, that the local cube option has some very interesting possibilities especially because it’s free. I can imagine it would be a great way of distributing data to disconnected employees like salespeople, or to partners, for small companies on a tight budget. Tim Peterson must be pleased – this should cause a resurgence of interest in local cubes.
I just got back from Germany after attending the PASS Europe conference. It was good fun as always and the standard of presentations was extremely high. Thanks are due to André Kamman and everyone else who organised it for making it such a success. Hopefully we’ll have another one next year? Bill Graziano has a picture (the one on the right) of me, Marco Russo, Allan Mitchell and various others here:
Both Marco Russo and I were impressed by Webdashboard, who were exhibiting:
You’ll probably follow the link and think it’s nothing special. In technical terms it isn’t, it’s a web-based dashboard that displays data from Analysis Services; it’s simple to use, quick to set up and runs pretty quickly. How do they expect to compete with PerformancePoint then, you ask? Well, have you seen just how much PerformancePoint costs? It may be cheap compared to other non-MS BI solutions, but in the MS BI world it’s very expensive. Webdashboard is just €95 per end user and for companies on a tight budget who don’t need a complex dashboard tool I think it makes for a compelling alternative.