Archive for March 2007
Yes folks, in conjunction with the UK SQL Server Community I’m proud to announce another BI Evening on the 26th of April and – by popular demand – it’s going to be in central London. We’ve got a star-studded line-up of speakers too: David Francis talking about PerformancePoint Monitoring and Analytics; the legendary Allan Mitchell on dimension and fact table ETL methods with SSIS; and all the way from the US of A (though not just for this event, he was coming here anyway) Reed Jacobson on PerformancePoint Planning and Budgeting. You’ll detect a bit of a PerformancePoint theme going on here…
Anyway, here’s the link to register:
Spaces are limited so you’d better hurry. There’s also a non-BI SQL Server event going on at the same time, and since both are being sponsored by those nice people at Red-Gate software I’m told there will be all sorts of goodies to give away.
Great post here from Erik Veerman on a way of optimising some many-to-many relationships:
I blogged about something similar, but applicable in slightly different scenarios, here:
In general the idea is that if you can reduce the size of any dimensions and measure groups involved in a m2m relationship then you’ll get better performance. I believe partitioning your intermediate measure group appropriately can also help.
Via Darren Gosbell, I see that the best practices included in the forthcoming SQL2005 Best Practices Analyzer have been turned into a white paper which you can download here:
Most of the recommendations are fairly obvious, some were new to me, some provoked a ‘they need to fix that’ response, and some I wasn’t sure I agreed with 100%. For example:
Avoid creating diamond-shaped attribute relationships
I quite often find myself designing diamond-shaped relationships in my attributes; the example they give in the text isn’t particularly good, but Day->Month->Year and Day->Week->Year is very common. OK most of the time I design user hierarchies along these paths anyway, but this need to design user hierarchies just for the sake of performance rather than because users actually want to see these hierarchies (and this isn’t the first time I’ve heard this recommended) is something that I don’t feel particularly happy about. It would be nice if we could make the choice between using attribute hierarchies and user hierarchies based on ease-of-use alone.
Avoid including unrelated measure groups in the same cube
This is the multiple measure groups and one cube vs multiple cubes with one measure group question that has been kicking around for at least a year. We were promised it was going to be addressed in the AS2005 Performance Guide but it wasn’t; when are we going to get some details published? In my experience I have seen a slight improvement in performance when you split cubes up in this way but nothing major, yet clearly there must be some reason for MS to keep recommending this so perhaps I’ve not come across the right scenario yet.
Avoid having very big intermediate measure groups or dimensions of many-to-many dimensions
Obviously having large intermediate measure groups and/or dimensions in m2m relationships is going to slow things down, but in my experience I’ve always been pleasantly surprised with the performance of m2m dimensions. I’ve seen intermediate measure groups with much more than 1 million members in perform really well…
Avoid having partitions with more than 20 million rows
As an aside, I recently noticed that in Eric Jacobsen’s recently blog entry on partitioning here he says you can also think about 2Gb (rather than a number of rows) as a rough guideline for the maximum size of a partition, and more interestingly that you should not have more than about 2000 partitions in your measure group. I asked him about this and it turns out that there are some internal limitations in AS at the moment, which hopefully will be fixed soon, that mean that slow down performance when you have large numbers of partitions. This does suggest that at the moment roughly speaking there’s a maximum size for a MOLAP cube to perform well of around 4Tb or 30 billion rows… not that I’ve ever seen a AS2005 MOLAP cube that big, but interesting to note.
Do set the Slice property on partitions that are ROLAP or partitions that use proactive caching
Based on Eric’s article mentioned in the last point, I think it’s also a good idea to set the Slice property on MOLAP partitions where possible too given that the automatic detection of which members appear in which partitions gets confused by overlapping slices. Greg Galloway recently did some experiments on this which I saw where he showed that if you set the slice on a MOLAP partition the auto-slice information isn’t used.
Avoid creating aggregations that are larger than one-third the size of the fact data
I quite frequently find myself designing aggregations which fail the one-third rule, although not by much, and for certain queries and calculations it can be a good idea. But I agree you should only do so as a last resort.
The SQL 2005 samples are now available for download from Codeplex:
Nothing you can’t get from the usual sources, of course, but at least now you can download the bits you want individually; the site will also act as a pointer to other SQL Server projects on Codeplex too.
One topic that has come up occasionally over the years is whether there is any merit in using a RAM Disk with Analysis Services to improve performance. Certainly in the SQL Server world they seem to have some benefits (see for example Tony Rogerson’s post here: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/08/24/958.aspx) but as far as I know no-one has ever done any testing for Analysis Services so when I found myself with a bit of free time and a customer who might potentially benefit I thought I’d do so myself.
The RAM Disk I used was RamDisk from Superspeed, which as far as I can tell is the same product that Tony used in his blog entry above. You can download a trial version here:
I then created a new RAM drive with 512Mb space on it, created a new directory and then made that new directory my local AS instance’s data directory. Incidentally it didn’t work if I tried to point AS to the root of the new drive – I don’t know whether there’s some kind of reason for this but I certainly wasted a long time finding this out. After that I was able to restore a (pretty small) backed up AS database so that the data was held on the RAM drive and compare performance with the same database on my local drive. And… I found that even on quite long-running queries(>20 seconds, where complex calculations forced multiple reads of the same partition) there was little or no difference between the two; perhaps the RAM drive was slightly faster but it wasn’t consistently so. Even though I was clearing the cache before each query I suspect that what I was really doing was comparing the performance of the RAM drive with the Windows file system cache, and in any case IO only represents a fraction of the amount of time taken by a query for AS on a properly tuned database.
That said, I could imagine that there would be scenarios where a RAM disk would have more of an impact: for example where you’re doing regular processing of a partition as with pro-active caching, or where you have a lot of memory available and you’re querying much larger partitions than I was. Some of Superspeed’s other products such as SuperCache and SuperVolume might be interesting to test out for these scenarios, and indeed they make some big claims for performance improvements on Cognos and SAS BI systems here:
…which makes me think further investigation could be worthwhile.
There’s a new white paper by Jordi Rambla (of Solid Quality Mentors, who I also a lot of work with) on “Creating, Editing, and Managing Report Models for Reporting Services” available here:
I think Report Builder’s support for Analysis Services is even worse than the rest of Reporting Services’ support for Analysis Services – which means it’s pretty bad – but at least I now have a list of all of its quirks.
From Richard Tkachuk’s site, a new paper on setting up simple dynamic security in AS2005:
…although it doesn’t mention any of the more complex ways of implementing dynamic security, such as calling a sproc to return the allowed member set or persisting the permissions inside a measure group. Incidentally there was an interesting thread on the MSDN Forum recently that had some interesting comparisons between how these two approaches performed:
To be honest I’ve gone off the idea of dynamic security because of the performance implications – it stops you making full use of caching. It would be interesting to see a comparison of query performance in complex security scenarios between dynamic security and creating individual roles.