Archive for April 2010
One of the few sessions I was able to see last week at the PASS European Conference (I had to leave very early, unfortunately) was Markus Raatz’s excellent “New Treasures for Microsoft BI Found on Codeplex”. Among other things, he showed off a new project on CodePlex that I hadn’t seen before: SQL Server Metadata Toolkit 2008. It’s an updated version of the SQL Server Metadata Toolkit that was released for 2005 a few years ago, and it allows you to trace metadata dependencies between the relational database, SSIS packages and SSAS objects. So if, for example, you change a column name in a dimension table you should be able to work out which SSIS packages and which SSAS dimensions will be broken. You can download it here:
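The toolkit’s real value is that it traces dependencies across SSIS and SSAS as well as the relational database; within the database alone, though, you can get part of the picture yourself with a catalog view. A minimal sketch (the table and column names here are hypothetical) of finding T-SQL modules that would break if you renamed a column:

```sql
-- Find views, procedures etc. that reference a column you plan to rename.
-- sys.sql_expression_dependencies is available from SQL Server 2008 onwards;
-- it only sees relational objects, not SSIS packages or SSAS dimensions.
SELECT
    OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
    OBJECT_NAME(d.referencing_id)        AS referencing_object
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_entity_name = N'DimProduct'           -- hypothetical table
  AND d.referenced_minor_name  = N'EnglishProductName';  -- hypothetical column
```

Anything outside the database engine (Data Source Views, SSIS data flows) is exactly the gap the toolkit is meant to fill.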
I’ve not had a chance to download it yet, but it looks like it will be very useful.
There’s a lot of discussion going on in the SQL Server relational world about solid state disks and their benefits, but for some reason very little has been said about how useful they could be for Analysis Services. And yet, with their impressive performance for random reads, SSAS and SSDs would seem to be the perfect combination. I think the reason for the silence is that there’s less general understanding of SSAS performance issues out there in the blogosphere; and also, because there are fewer really large installations of SSAS, not many people have had the chance to test SSDs with SSAS properly.
Anyway, the other week I was lucky enough to spend some time testing SSDs with SSAS for a customer of mine. For reasons of privacy I can’t share any significant details about what I found, and before I go on I need to point out that what I was doing was comparing the performance of a particular SSAS cube on a particular existing hard disk configuration with the same SSAS cube on a particular SSD configuration. So while it’s certainly possible to say that, in lab tests, SSDs can perform extremely well for certain tasks, in the real world you have to ask the following question: what benefit will an SSD give me for my cube, over my existing disk configuration? So in fact even if I could share the details of my tests they wouldn’t be all that useful.
That said, here are a few high-level findings:
- Backup on the SSD was approximately 50% faster than on their existing, very respectable, hard disk configuration; Restore was 65% faster. These were the clearest examples of the SSD showing massive performance improvements.
- For various reasons I couldn’t do full tests on processing performance, but Process Index operations were about 10-15% faster, as was Synchronisation.
- Performance of individual queries was not significantly faster, maybe 1-2%; there were some cases where, with large numbers of concurrent users, I saw a bigger improvement of around maybe 10%. This didn’t surprise me at all because I knew that any query performance issues the customer has are related to the formula engine and not the storage engine, so IO wasn’t much of a problem in the first place.
These tests also led to some interesting behind-the-scenes discussions about how you can actually measure the impact of IO on SSAS storage engine operations, which Teo has already blogged about here and which are well worth reading:
For this particular customer, my recommendation was that it wasn’t worth the time, money and effort to move to SSDs, at least from a query performance point of view. In fact, based on several years’ experience of tuning SSAS implementations, I find that in most cases IO is not one of the most important factors in determining query performance – it’s more often the case that the problem lies in the formula engine, and even when it doesn’t, an appropriate partitioning strategy and aggregation design can work wonders.
However, let me be clear: I am not making some general statement that SSDs are not useful for SSAS. What I am doing is giving the usual ‘your mileage may vary’ and ‘it depends’ answer. There are definitely going to be SSAS implementations where SSDs will make a massive difference to query performance, and where no other tuning technique can have the same impact. For example, I’ve heard that with distinct count measures (especially when there are a large number of concurrent users) SSDs can lead to very impressive improvements in query performance; I would assume that where there are large many-to-many relationships you would also benefit accordingly. What you need to do before you spend money on SSDs is to understand the causes of any performance issues you currently have and make sure you tune your cube to the best of your abilities.
Has anyone else had any experience with SSAS and SSDs they’d like to share?
UPDATE: the SQLCat team have published this technical note on the subject: http://sqlcat.com/technicalnotes/archive/2010/09/20/analysis-services-distinct-count-optimization-using-solid-state-devices.aspx
Wow, another SQLBits is over, and in my opinion it was the best yet. I know I always say that after we’ve done one, but this time we broke our attendance records (I think the official figure was 425 people, more than 50 above our next highest attendance) and everything went very smoothly indeed. We even escaped the volcanic ash problem relatively unscathed: amazingly, only one speaker couldn’t make it because of the flight cancellations, although several European attendees were grounded and a lot of people will be having an unplanned weekend break in London right now. After his flight got cancelled, Bob Duffy got his pregnant wife to drive him overnight from Dublin so he could speak – that’s what I call dedication to the cause!
A big thank you goes to all my fellow members of the SQLBits committee, to all the volunteers who worked so hard on the day and the night before, to all our sponsors and to everyone who turned out and helped make the day such a success. Now, where shall we go for the next one? And how am I going to get to Germany next week for PASS Europe?
I’m pleased to announce that I’ll be teaching two public courses in Dublin, organised by my friends at Prodata. First up, on May 27th/28th, I’ll be teaching a course on real-world cube design and performance tuning for Analysis Services:
Then, in September (the exact date is to be confirmed), I’ll be doing my two-day Introduction to MDX course:
How you should go about naming objects in Analysis Services is another hobby horse of mine – I had a small rant about it in “Expert Cube Development with SQL Server Analysis Services 2008” and I thought I’d mention it here because it’s sure to provoke some debate.
My guiding principle when naming objects (cubes, dimensions, attributes, measure groups or measures) in Analysis Services is to think like you’re designing a user interface and not a database. That’s because a cube really is a kind of user interface: it’s a friendly, easy-to-query window onto all of the data in your data warehouse for non-technical users. That means that any object that the user is likely to see in the metadata, and which will appear in any reports, pivot tables or other queries the user generates, should have a name they understand and want to see. That also means that any techy naming conventions you follow in your relational database design should be completely ignored because, while they might make sense to you as an IT professional, they are likely to be gobbledegook to your users.
The commonest example of bad practice that I see is having Analysis Services dimensions called “Dim Something” – “Dim Product”, “Dim Time”, “Dim Customer” and so on. Hands up who has a cube with dimensions named like this? OK, feel ashamed. Ask yourself, do your users want to go to the CEO with an Excel spreadsheet containing column headers like this? No, of course not, they want to see “Product”, “Time” and “Customer”. They know these things are dimensions already and the “Dim” bit is totally redundant. Similarly, they don’t want to see measures called “Sls_Pct_Chg” or attributes called “CustID” or any of that; and even if you come up with what seems to be a reasonable, human-readable name yourself, if it’s still not what the users want they’ll do their best to change it. By not giving them the names they want you’re generating extra work and hassle for them, putting them off using the cube, and making it more likely that different users will come up with different names for the same thing in reports.
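One common way to apply this principle without renaming anything in the data warehouse itself is to build your dimensions and measure groups on views that alias the techy names. A minimal sketch, with hypothetical table and column names:

```sql
-- Hypothetical view layer: the warehouse keeps its naming conventions,
-- while the cube (and therefore the users) only ever sees friendly names.
CREATE VIEW dbo.Product AS
SELECT
    ProductKey,                              -- surrogate key, hidden from users anyway
    EnglishProductName AS [Product],         -- "DimProduct.EnglishProductName" becomes "Product"
    ProductCategory    AS [Product Category]
FROM dbo.DimProduct;
```

The same renaming can also be done in the Data Source View, but a view layer keeps it visible to DBAs and reusable by other tools.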
Of course this means you have to go and talk to your users about what they want to have their objects called. Since changing an object’s name can end up breaking calculations and any reports that your users have already defined, you need to do this right at the beginning of your project, even before you’ve run any wizards for the first time. You still need to make sure the names make sense, are consistent, and are acceptable to the whole user community, but ultimately it’s them making the decisions and not you. And if it’s too late to change things now on your current project, remember this post the next time you set about building a cube!