Archive for November 2008
Here’s a question which comes up all the time – it was asked at Mosha’s MDX seminar last week, and a friend of mine asked me about it recently too – what does the NonEmpty function do if you don’t specify the second parameter?
Let’s take a look at some example queries. I think everyone knows that you can use NON EMPTY before an axis definition to remove all the empty tuples on that axis, as with:
SELECT [Measures].[Internet Sales Amount] ON 0,
FROM [Adventure Works]
The problem comes when people assume that you can use the NonEmpty() function in the following way to get the same result:
SELECT [Measures].[Internet Sales Amount] ON 0,
FROM [Adventure Works]
In a lot of cases you might not see any obvious differences between what the two uses return, but if you run the query above you can see a lot of empty rows returned so they clearly aren’t the same. So what’s happening? If you clear the cache, rerun this second query and then run a Profiler trace you can get a hint:
Why are the Reseller Sales measure group partitions being hit? Because the Reseller Sales Amount measure is the default measure on the Adventure Works cube, and since we didn’t specify a measure in the second parameter for NonEmpty() it’s using the default measure to decide which dates have values or not. To fix this we can explicitly tell AS which measure to use:
SELECT [Measures].[Internet Sales Amount] ON 0,
,[Measures].[Internet Sales Amount])
FROM [Adventure Works]
The moral here is always, always, always specify a measure in the second parameter for NonEmpty() whenever you use it. If you don’t you may get unexpected results back and you may also get poor performance, for example if the default measure comes from a very large measure group.
Oh, and as a bonus tip, don’t ever use NonEmptyCrossjoin() with AS2005 or later. It’s difficult to use and frankly unpredictable in what it does sometimes; you can always do whatever you want with NonEmpty or Exists (when specifying a measure group in the third parameter) much more reliably and just as fast.
For more information on this topic, have a look at this old-but-good blog post from Mosha:
I’m back from my trip to Seattle at the PASS Summit and have just about recovered from the jet lag, so now’s a good time to blog about the last week. What did I get up to?
- Monday: Mosha’s MDX pre-conference seminar. This was the first time I’ve ever paid for any form of training out of my own pocket and I was not disappointed – it was everything I was expecting it to be, ie a great in-depth look at the inner workings of MDX. Over the subsequent days I was stopped by a few Microsoft people who asked me how it went, and I got the impression they thought it was going to be too detailed, but to be honest I actively wanted detailed information and so did the vast majority of other people present (the likes of George Spofford, Deepak Puri, Tomislav Piasevoli and so on were also there). Mosha may not be as slick as some of the more experienced speakers out there on the SQL conference circuit but he’s more than competent; the material was laid out in a logical order and his slides were clear. And the most important thing of all is that he is probably the only person in the world who has the knowledge to be able to give this kind of seminar. Some of the material was repeated from his blog but benefited greatly from being put into a wider context; some of the material was completely new to me, and given that I’m someone who’s lived and breathed MDX for the last ten years or so that’s quite something. I only wish he could have gone on for another day since he clearly had enough topics prepared to be able to do so.
- Tuesday: I had off, so I headed over to Redmond for a few meetings. Heard some interesting things (as I did all week) but all were NDA, unfortunately. Maybe one interesting point though: for some reason I had assumed that there were separate development teams for Gemini and Analysis Services, but that’s not true – Gemini is Analysis Services, it’s all the same team.
- Wednesday: Day 1 of the conference. The first session I attended was from a company called Meta Integration. At first I was a bit surprised since it was clearly a ‘vendor presentation’ – they were plugging their software, which is definitely not the done thing at a conference like PASS. I didn’t mind that much since it was an interesting product, and then as the presentation went on came to the realisation that this was software that wasn’t on sale to the general public anyway. What Meta Integration do is provide tools for metadata integration and lineage to the big BI software suppliers and they make a big play of being independent from any one vendor. So, for example, you can track metadata from an ERwin model to an Informatica-based ETL process and so on through to an Analysis Services cubes and Reporting Services reports; the cool thing is that if you make any changes upstream you can track which cubes and reports are going to be impacted even if you’re using a variety of BI tools from different vendors, as most companies are. This is a massive missing piece in the current MS BI stack and the fact that they already have it working today had many people in the audience salivating; what they want to do is license this software to Microsoft and their presence at the conference was, as far as I can see, a clever bit of PR to try to twist Microsoft’s arm into reaching a deal with them by showing Microsoft’s customers a glimpse of what might be possible in the next version of SQL Server. So I’ll come right out and say it: Microsoft, please license this software and don’t try to build it yourself! We need it ASAP!
- Thursday: the day of my presentation. Compared to other conferences I’d had to do way more preparation for it than usual because I’d decided to talk about a topic that I needed an awful lot of research rather than one I knew all about already. In fact I’d already decided that I should have submitted an abstract on cache warming for Analysis Services instead, which would have been much less work, but then on Monday Mosha made the typically gnomic remark that ‘cache warming was a bad thing’ so I’m glad I didn’t. Anyway the presentation itself went ok and despite being up against Kalen Delaney and Bob Ward I had a good crowd in the room. I was also very lucky that the SQLCat team’s presentation on monitoring Analysis Services, which covered a lot of the same topics, was scheduled for the next day so I wasn’t completely upstaged.
To be honest, the SQLCat team had the best content out of all the speakers. In my case, where Carl Rabeler et al covered the same material as I did, they did it better than me. The same goes for other presentations: there were rather a lot covering MDX query performance and cube tuning (possibly too many) and the SQLCat session on this subject with Richard Tkachuk and Thomas Kejser was by far the best I saw. I suppose they do have the advantage of inside information and the fact that they do presentations like this for a living.
I also stopped by some stands in the vendor exhibition space too. There was a disappointing lack of pure BI vendors, but I suppose they all blew their conference budget on the BI Conference last month. I did talk to the guys on the Hyperbac stand – they work in the backup and compression space, and were suggesting their latest product Hyperbac Online (which Simon Sabin blogged about recently) might come in handy for Analysis Services. Hmm, I don’t know given that AS does its own compression, but I’d be interested to try it out.
- Friday: spent a lot of time in private MVP sessions, which of course are all NDA. I also saw a good session from Donald Farmer on integrating data mining into other apps, which made me think that there would be much greater uptake on data mining if it was as easy to use AS data mining with AS cubes as it was to use the data mining Excel adding. Donald demoed something similar to what I blogged about here; why can’t this type of thing be built into BIDS for cubes? Who has ever used the MDX Predict function? What about this old idea (although the SQLCat team noted that the old rule of thumb for partitioning whereby you should have no more than 20 million rows per partition no longer applied – you can get good, if not better performance with partitions three times that size)?
The last session I saw was by Carolyn Chau and Sean Boon of the Reporting Services team, showing off all the new features of RS2008. As I’m sure you know, there’s a lot of cool new stuff in there but is it just me or is the tablix (which I discovered should be pronounced tay-blix rather than tab-lix, as I had been saying) a bit intimidating? Probably no more so than MDX to the uninitiated. Anyway, apart from all the things they’ve done for 2008 and the massive list of things they’ve got on their list to do for the next version, Carolyn mentioned the interesting point that SMDL (the language for building Report Builder 1.0 models) would not be coming back, but the idea of a semantic layer for RS would – and it would probably use the Entity Data Model in some way. Which makes me think: what would happen if Analysis Services went in that direction too? Replace the dsv with the Entity Framework and make Analysis Services an extra layer of metadata for aggregating entities? I don’t know enough about this subject to speculate further, but it’s an interesting area – who knows, it could lead onto the resurrection of the idea of the Unified Dimensional Model, with a Gemini-powered AS as the super-fast caching layer for all your reporting/querying needs.
All in all I enjoyed myself and I think I made the right decision going to PASS rather than the BI Conference – there was more than enough BI content for me and the conference itself was well run and good fun. I’d really like it if PASS and the BI Conference were merged into one mega-conference so I didn’t have to choose.
Miky Schreiber has just posted a cool utility on his blog that addresses what is, in my experience, a frequent need on many Analysis Services projects: the ability to apply a common set of calculations to a number of cubes. You can find out more and download it here:
Definitely something that should be considered by the SSAS dev team for the next version.
I’ll be there. In fact I’m sure just about everyone who is anyone in the world of MDX will, like me, be there on Monday at Mosha’s MDX seminar – assuming, of course, that Mosha actually makes it himself this time…!
Anyway, if you see me, say hello, and come along to my session first thing on Thursday morning. I’m going to be speaking about "How to build a SSIS, SSRS and SSAS monitoring solution using SSIS, SSRS and SSAS" – basically building a data warehouse to hold all the logging data that the three servers produce, building a cube on top and writing some queries/building reports so you can visualise activity. I submitted the abstract for this before I’d done any work on the subject and of course soon came to realise that it’s a massive topic, so I’ve been working like mad for the past few weeks to get my demos and slides together. But I will have some cool stuff to show, I promise, even though I’ll still be putting the finishing touches to it 5 minutes before the session starts.
I’ve just been told that "MDX Solutions" has been translated into Chinese:
It’s a shame that Wiley aren’t interested in doing a third edition at the moment, but luckily not much has changed in MDX between AS2005 and AS2008 so the content is still relevant. I see there’s some competition coming in the future from "Microsoft SQL Server 2008 MDX Step-By-Step" though…
If you didn’t make it to the last SQLBits (yeah, I know you had a good excuse: it was on a Saturday and you had to wash your hair, or your wife doesn’t let you out on weekends, or you live in Nebraska) then you’re in luck: a film crew from Microsoft filmed most of the sessions including the whole BI track and they’re now available for viewing on the SQLBits website:
I’ve already linked to Nigel Pendse’s initial comments on Gemini, which you can see here:
However those nice people at the OLAP Report have just given me access to see the subscriber-only content, which is probably the most detailed write-up available (I assume Nigel has been briefed by the Gemini dev team):
It doesn’t say anything much new, but I guess if you’re someone like Qlikview you probably want to get as many details on this as you can! Nigel is much more positive about it all than I was; he may well have a better idea of MS’s proposed solutions to the management problems everyone’s been highlighting: "Microsoft has some clear ideas about the role IT will be able to play in Gemini deployments, but the details were slightly fuzzy during the October announcements. It promises more details will emerge in the following months".
He also notes the problem of Excel users needing to define multidimensional calculations and says "Microsoft has not provided details of how this will be done, except to emphasize that Gemini users will not need to learn MDX. It’s proposal is to allow simple dimensional tests in Excel-like formulas in Gemini. These Excel-like calculation rules will be equivalent to MDX, but far easier for an Excel user to understand". Hmm, we’ve already seen MDX and PEL try and fail to make multi-dimensional calculations easy to use… I wonder if we’ll have a third stab at the problem?
Just had one of my occasional looks at Richard Tkachuk’s site,
and there’s some interesting new information on the home page. There’s an article on how to handle time intervals in AS (I’ve got some ideas on a different way of handling this, but I’d need to test them out to see how they perform), a note on how the Aggregate function works with calculated measures, and a draft of the AS 2008 Performance Guide that is a must-read: