Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for October 2006

NON_EMPTY_BEHAVIOR and sets

with 5 comments

Late last year, in the middle of an email correspondence with Mosha, I included the following piece of an MDX Script containing a calculated member definition generated by BIDS when working in form view:

CREATE MEMBER
CURRENTCUBE.[MEASURES].[Demo]
AS [Measures].[Sales]*2,
FORMAT_STRING = "#,#",
NON_EMPTY_BEHAVIOR = { [Sales] },
VISIBLE = 1  ;

Mosha commented that putting braces round the measure [Sales] in the NON_EMPTY_BEHAVIOR property in this case would ‘do more harm than good’ and, although he didn’t expand on why (a good subject for a blog entry, Mosha?), ever since then I’ve dutifully removed the braces that BIDS puts in – though I’d never noticed much impact. Until yesterday, when a query I was tuning went from running in 45 seconds to running in 8 seconds simply as a result of doing this. Hmmm…

While we’re here, it’s a personal hobby horse of mine to insist on using full unique names in all MDX calculations. So, in this case, I would use [Measures].[Sales] rather than [Sales]. Not only is it more readable but if you’re using dimension security you might run into problems if you don’t, as the following thread on the MSDN Forum demonstrates:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=753483&SiteID=1

So, just to be clear, if you’re using NON_EMPTY_BEHAVIOR and have created your calculated member in form view, always be sure to change it from the format above to be something like this:

… NON_EMPTY_BEHAVIOR = [Measures].[Sales] …
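
Putting the two suggestions together – no braces, and a full unique name – the calculated member definition at the top of this post would end up looking something like this:

// Same definition as before, just with the braces removed from NON_EMPTY_BEHAVIOR
// and the full unique name used throughout
CREATE MEMBER
CURRENTCUBE.[MEASURES].[Demo]
AS [Measures].[Sales]*2,
FORMAT_STRING = "#,#",
NON_EMPTY_BEHAVIOR = [Measures].[Sales],
VISIBLE = 1  ;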

Written by Chris Webb

October 26, 2006 at 11:06 pm

Posted in Uncategorized

UK BI User Group Evening – 29th November

leave a comment »

As I mentioned the other week there’s going to be another BI User Group evening event at TVP in Reading next month, held in association with Tony Rogerson and the UK SQL Server User Community. I’ve finally got the agenda together and you can register here:
http://sqlserverfaq.com/?eid=83

First up we’ve got David Parker, a Visio MVP, speaking about the new BI features in Visio 2007. I’ve been interested in this subject ever since Nick Barclay first brought it to my attention; David is actually writing a book on the subject (see http://www.visualizinginformation.com/) which I’m looking forward to reading when it comes out. Next we’ve got Sanjay Nayyar of IMGroup talking about the various ways you can use Microsoft BI tools with SAP. Finally, we have Andrew Sadler of Proclarity/Microsoft talking about PerformancePoint Server, probably the hot topic of the moment. I’m sure there’ll be some interesting discussions afterwards…

Written by Chris Webb

October 24, 2006 at 12:06 pm

Posted in Uncategorized

Designing Effective Aggregations in AS2005

with 22 comments

Aggregations are, as we all know, the key to getting the best performance out of your cube. In AS2K it was usually enough to run the Aggregation Design Wizard to get a reasonable aggregation design; in my experience with AS2005 it’s much harder to achieve good results – a lot of people run the wizard and find that performance is still poor. So what I thought I’d do is detail the process I go through when I’m designing aggregations…

  1. The first step in getting good performance is making sure your dimension design is as clean as possible. The commonest thing I see when I do performance tuning on a cube is that there are loads of useless attributes in the dimensions which have been created by the cube design wizard. Let’s take a time dimension as an example: imagine you had a table with four columns, timeid (containing the surrogate key), month, quarter and year, where month is the lowest level of granularity. In a lot of cases the wizard will translate this to four attributes, one based on each column, but of course you don’t need an attribute based on the timeid column – no user is ever going to want to analyse data by a surrogate key value. Since timeid is in fact the key value corresponding to the month name given in the month column, what you should do in this case is delete the month attribute, set the name column property of the timeid attribute to point to the month column and rename the timeid attribute to Month; its key column property will stay pointing to the timeid column. As a result you have one attribute where you used to have two, a lot less complexity in the ‘space’ of your cube which will make designing aggregations easier, and a more efficient design.
  2. Set the AttributeHierarchyEnabled property of any attributes which you don’t want users to analyse by to False. An example would be an address attribute, which you might want to display as an AS2K-style member property, but you would never expect a user to drag onto a grid or slice by on its own. Again, this reduces the ‘space’ of the cube and makes designing aggregations easier by reducing the number of attributes that need to be considered during design.
  3. Next, I always see if I can design in one-to-many relationships between attributes where possible. Going back to the time dimension example, it could be that in the quarter column of your dimension table you have values like ‘Q1’ and ‘Q2’ – these tell you what quarter any given month is in, but don’t tell you what year the quarter is in, so there is no one-to-many relationship between year and quarter or between quarter and month. If you add a column concatenating the quarter with the year, ie so you get values like ‘Q1 1999’ and ‘Q2 2000’, and build your quarter attribute from the new column, one-to-many relationships will exist; if your users still want to analyse by quarter independently of year then you can still build an attribute from the original quarter column too. Quite why this is a good thing will become clear later on.
  4. Model attribute relationships between attributes where possible. Attribute relationships are, as I’ve said on several occasions in the past, the single most important thing to consider when designing a dimension, and because their importance is not flagged anywhere in the UI or BOL I find they’re widely neglected. Setting them correctly is extremely important for aggregation design and also for allowing AS to use aggregations at lower levels of granularity when your query doesn’t hit an aggregation directly. For more information about attribute relationships and how to set them, I recommend taking a look at the TechEd presentation on AS performance (BIN 316) I linked to in my last post: http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!906.entry. You should also read my post on the dangers of modelling redundant attribute relationships: http://cwebbbi.spaces.live.com/Blog/cns!7B84B0F2C239489A!619.entry
  5. Model all the user hierarchies you think you’ll need. If you’ve set your attribute relationships you’ll (hopefully) find that a fair proportion of them are natural hierarchies, ie a one-to-many relationship exists between each level. This is probably a good place to link to Elizabeth Vitt’s excellent post on influencing aggregation candidates: http://www.sqlskills.com/blogs/liz/2006/07/03/InfluencingAggregationCandidates.aspx. Before you read this post any further I suggest you read, learn and inwardly digest her post. The only thing I disagree with her about is possibly a typo – at the end of her post she says "scenario 6 and scenario 2 provide the best solutions from an aggregation candidate point of view"; I think she meant to say scenario 6 and scenario 4, since scenario 4 has attribute relationships set whereas scenario 2 doesn’t.
  6. Set the AggregationUsage property on each attribute appropriately. Elizabeth’s post above is very good on what this does; the only thing I have to add is that in many cases I find that relying on natural user hierarchies alone to influence aggregation design isn’t enough. For example, I find that on a Time dimension users almost always analyse data at the month attribute level even when higher levels of granularity on that dimension exist – in these cases I set the AggregationUsage property to Full for the month attribute, and this ensures that every aggregation designed by a wizard includes the month attribute. If you don’t do this then you might find you run the wizard and no aggregations are built at month, and so when your users start querying they get no benefit from the aggregations you’ve built. Similarly, I find it’s a good idea to set AggregationUsage to Unrestricted on any attribute hierarchies you’re exposing to users which you expect to be queried extensively, and to set AggregationUsage to None on any attributes which are part of user hierarchies (especially at higher levels) but which you nevertheless don’t expect to be queried that much.
  7. Run the Aggregation Design Wizard. What I usually do before this is to deploy and fully process my cube with no aggregations; you can then run the wizard and design aggregations, then process the aggregations separately by doing a Process Index. This way you can see what the performance of your cube is like before aggregations have been built and get a better feel for how much of an impact they’re making on performance; you can then throw your aggregations away, design some new ones and do another Process Index (which is generally fairly quick) if you need to. When you run the wizard, on the ‘Specify Object Counts’ step, click Count where possible to make sure that the values in the Estimated Count column are as up-to-date as possible. More importantly, if you’re partitioning your cube, you need to make sure that the values in the Partition Count column reflect the number of members on each attribute that will exist in each partition: for example, if you’re partitioning by month, you will want to tell AS that there’s only one month in each partition. These counts are, I believe, used during the aggregation design process to estimate the size of an aggregation, and if you don’t specify correct values then the algorithm may incorrectly think a useful aggregation is much bigger than it actually would be and not build it as a result. On the ‘Set Aggregation Options’ step of the wizard the rule in AS2K was to choose the ‘Performance Gain Reaches’ option and set it to stop at 30%. This is still good advice but more and more in AS2005 I find myself using the ‘Estimated Storage Reaches’ option instead: in an AS2005 cube there are a lot of very small aggregations that can be built, and by stopping at 30% Performance Gain you may find that you have still only built a few Kb of aggregations. If you select the ‘I click Stop’ option and watch the design grow until the estimated size is ridiculously large (maybe over a couple of Gb) you can get a feeling for how many small aggregations can be built; you can then stop it, reset the aggregations and restart using either the ‘Performance Gain Reaches’ or ‘Estimated Storage Reaches’ option set to an appropriate level.
  8. Design aggregations manually if necessary. In AS2K you only needed to design aggregations manually in very rare cases, but I find myself doing it much more frequently in AS2005, typically when I have a particular query or RS report which needs tuning. There is apparently a tool in the pipeline which will help with this task (I’ve heard several people mention it on the newsgroup) but I’ve not seen it yet and I don’t know when it will be released; in the meantime designing aggregations manually means hacking the XMLA definition of a measure group. When you design aggregations using one of the wizards, the end result is an ‘aggregation design’ which gets saved on the measure group; each partition then has a property which refers to the id of the aggregation design on the measure group it’s associated with. In SQL Management Studio, to see an aggregation design, you need to connect to AS in the Object Explorer pane, expand your cube so you can see the measure groups in it, then right-click on a measure group and select ‘Script Measure Group as/ALTER to/New Query Editor Window’. In the resulting XMLA, if you collapse the Measures, Dimensions and Partitions nodes you should be able to see the AggregationDesign node; if you collapse the Dimensions node underneath it you’ll see an Aggregations collection containing the list of aggregations in the aggregation design (there’s a sketch of what one of these aggregations looks like at the end of this list). It’s pretty easy to work out what’s going on: each aggregation has a name and id, which by default is a number in hexadecimal, and it consists of a collection of dimensions. If there’s an aggregation designed which is at a lower granularity than the root of the dimension then there’ll be an attributes collection, inside which will be one or more attribute objects. An aggregation is just fact data summarised up to a certain level, and when you see an attribute mentioned in an aggregation definition you know that the aggregation contains data summarised up to that level of granularity. To design your own aggregations manually all you need to do is copy the last aggregation in the list, paste it into the definition at the end of the list, update its id and name properties and add or remove attributes to set its granularity; to determine what granularity you need to build your aggregation at you’ll find it useful to run a Profiler trace and look at the Query Subcube and Query Subcube Verbose events, which Mosha describes in more detail here: http://sqljunkies.com/WebLog/mosha/archive/2006/01/05/cache_prefetching.aspx. Assuming you’re editing an ALTER TO script you can then just run the command in SQLMS and run a Process Index to get all your aggregations, including the new one, built.
  9. Run the Usage-Based Optimisation Wizard. In theory, Usage-Based Optimisation should be the icing on the cake as far as performance tuning goes. All you should need to do is turn on the query log (right-click on the server name in SQLMS and look at the server properties to do this) so that the queries your users are actually running are captured; then you just run the wizard, select which queries you want tuned, and extra aggregations are built appropriately. This is how it worked in AS2K, but the other week I made the discovery that in AS2005, when you run the wizard, it overwrites any aggregations you have previously designed on the cube without giving you the option to add to the ones you’ve already got, so if you’re not careful you could end up tuning one set of queries and find that other ones start running slower. To stop this happening you’d need to script your measure groups before running the wizard to save your existing aggregations, then after the wizard has completed try to merge the two sets of aggregations together manually. This isn’t exactly ideal so I’ve opened an issue on Connect to try to get it changed: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=221844 Feel free to vote on this issue to get it fixed!
  10. Check to see whether aggregations are actually being used by your queries. You can do this very easily by running a Profiler trace that includes the ‘Get Data From Aggregation’ event (found under ‘Query Processing’ and only visible if you check ‘Show all events’). It’s worth remembering, though, that even when your query hits an aggregation directly it may still perform poorly – there are a lot of other factors in play, such as calculations, which can affect query performance.
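
As promised in step 8, here’s a rough sketch of what a single aggregation inside the Aggregations collection of an aggregation design looks like in the scripted XMLA. All of the IDs and names below – the Time and Product dimensions, the Month attribute, the aggregation name itself – are made-up placeholders for illustration, so substitute the IDs from your own cube:

<Aggregation>
  <ID>MyNewAgg</ID>
  <Name>MyNewAgg</Name>
  <Dimensions>
    <!-- Fact data is summarised up to the granularity of the Month attribute on the Time dimension -->
    <Dimension>
      <CubeDimensionID>Time</CubeDimensionID>
      <Attributes>
        <Attribute>
          <AttributeID>Month</AttributeID>
        </Attribute>
      </Attributes>
    </Dimension>
    <!-- No Attributes collection here, so data is summarised all the way up to the root of the Product dimension -->
    <Dimension>
      <CubeDimensionID>Product</CubeDimensionID>
    </Dimension>
  </Dimensions>
</Aggregation>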

So anyway, those are my thoughts on designing aggregations. If I’ve forgotten anything or got anything wrong, or you’ve got any of your own tips to add, please leave me a comment! Hopefully the AS2005 version of the "Analysis Services Performance Guide", which I understand is coming soon, will contain all of this information and more, but in the meantime I hope you find it useful.

UPDATE: James Snape has a useful blog post describing what all those 1s and 0s mean in Profiler – something I probably should have explained myself:
http://www.jamessnape.me.uk/blog/2006/11/09/SubcubeQueries.aspx

Written by Chris Webb

October 23, 2006 at 4:01 pm

Posted in Uncategorized

Analysis Services TechEd Presentations

leave a comment »

If, like me, you’re not going to a TechEd this year you might be interested to see some of the slide decks for the Analysis Services presentations which I’ve just discovered can be downloaded here:

http://www.microsoft.com/hk/technet/teched2006/agenda.aspx

It’s very slow to download, but in the case of the performance deck it’s worth it. There are four presentations from Thierry d’Hers of the dev team:
BIN 217 – Office data mining addins (almost no content)
BIN 313 – Analysis Services Deep Dive (not really that detailed)
BIN 316 – Analysis Services Performance and Scalability (pretty good)
BIN 302 – Functionality in SharePoint for BI (not really that detailed)

Written by Chris Webb

October 23, 2006 at 1:44 pm

Posted in Uncategorized

MDX Tips from SQLCat

with 2 comments

Just seen this interesting post on the SQLCat blog containing MDX tips, most of which I’d not heard about:
http://blogs.msdn.com/sqlcat/archive/2006/10/12/best-sql-server-2005-mdx-tips-and-tricks-part-1.aspx

I would be careful about taking them all as general rules though. I ran into the issue described about calculated members using ParallelPeriod vs hard-coded tuples earlier this year on a performance tuning job, but couldn’t repro it in AdventureWorks or any other cube, so I guessed it wasn’t as straightforward as it seemed and therefore didn’t blog about it; I suspect some of the others aren’t generally applicable either (I checked with higher authority and he agreed). That doesn’t mean they won’t be worth trying if you do have performance problems though.
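
For anyone who hasn’t read the post yet, the ParallelPeriod tip is about the difference between calculations along these lines – and these are just my own illustrative versions written against Adventure Works-style names, not the examples from the SQLCat post – where the claim is that the first form can perform noticeably worse than the second in some cubes:

// A previous-year comparison written with ParallelPeriod
CREATE MEMBER CURRENTCUBE.[Measures].[Sales Previous Year] AS
([Measures].[Internet Sales Amount],
 ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember));

// A similar comparison written with a hard-coded tuple
CREATE MEMBER CURRENTCUBE.[Measures].[Sales CY 2003] AS
([Measures].[Internet Sales Amount], [Date].[Calendar].[Calendar Year].&[2003]);

As I said, though, I couldn’t repro the difference myself, so test it on your own cube before rewriting anything.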

Written by Chris Webb

October 16, 2006 at 9:33 am

Posted in MDX

Another BI Evening – Call for Presenters

leave a comment »

Following on from the successful BI Evening at TVP earlier this year I’m in the early stages of planning another; Tony Rogerson has managed to wangle a room on the night of November 29th and I’m now looking for people who’d like to present. I’ve contacted a few people but I thought I’d post an open invitation here: if you’re free that night, can make it to TVP and have something interesting and BI-related to talk about then please get in touch! Contact details can be found at http://www.crossjoin.co.uk/contact.html.

Written by Chris Webb

October 15, 2006 at 10:25 pm

Posted in Events

Load Testing Reporting Services

leave a comment »

Just come across a new paper on MSDN on load testing RS2005 using the tools provided in Visual Studio:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/VS05PLTSQL.asp
Very useful to know for future reference, and I suppose you could take a similar approach to load testing AS2005 too.

Also noticed this paper on SoftArtisans OfficeWriter, which I’ve blogged about before and think is quite a cool tool:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/SQLSRSandOW.asp

Written by Chris Webb

October 11, 2006 at 3:41 pm

Posted in Uncategorized

Partially Processing a Partitioned Cube

with 4 comments

Perhaps everyone else knows this already, but this tip has saved me loads of time so I thought I’d post it anyway…
 
I’m always breaking cubes, or rather making changes to them that put them in an unprocessed state (usually intentionally, but quite often by accident). If cube processing takes a long time this can be quite annoying: it’s always good to have a cube to browse so you can see the effect of any changes you’ve made. You can of course use a view to limit the number of rows in your fact tables, which makes processing much faster, or switch the fact tables used in the DSV, but that’s not always practical – a lot of the time I’m working with cubes where the partitioning has been done using a query to slice the fact table and you don’t want to go changing all that SQL – and you can’t simply process a single partition in the cube. Or so I thought. I found out last week that if you do a Process Structure (which takes seconds) on an unprocessed cube you can browse it but no data is there; if you subsequently do a Process Full on one of the partitions then you get a cube containing only that partition’s data.
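
If you’d rather do this from an XMLA query window than by clicking through the Process dialog, the same trick looks something like this – note that all the object IDs here (MyDatabase, MyCube, MyMeasureGroup, MyPartition) are placeholders for whatever your real IDs are:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Process Structure on the cube: takes seconds, makes the cube browsable, loads no data -->
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <CubeID>MyCube</CubeID>
    </Object>
    <Type>ProcessStructure</Type>
  </Process>
  <!-- Then fully process a single partition, so only its data shows up in the cube -->
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <CubeID>MyCube</CubeID>
      <MeasureGroupID>MyMeasureGroup</MeasureGroupID>
      <PartitionID>MyPartition</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>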
 
While we’re on this topic, it’s also worth mentioning that if you redesign the aggregations on your measure group/partitions, you just need to do a Process Index to make them available which is much faster than doing a Process Full.

Written by Chris Webb

October 10, 2006 at 11:09 am

Posted in Analysis Services

PerformancePoint news

leave a comment »

Charlie Maitland has two good posts on PerformancePoint Server:
I know, I’m a bit late in linking here but I was away last week… which was also the reason why I couldn’t go to the bootcamp that Charlie was at. Hohum.

Written by Chris Webb

October 8, 2006 at 7:30 am

Posted in Client Tools

The last word on AS2005 local cubes

leave a comment »

After my recent posts on AS2005 local cubes I heard from local cube expert Tim Peterson again, who told me about the work he’d been doing in this area. As is always the case there’s a lot more complexity to this topic than can be described in a short blog posting but he’s written up his findings in a white paper which you can download here:
He also mentioned that you can register to download the beta of the latest version of CubeSlice here:
http://cubeslice.com/dnldfr1ocms.htm

UPDATE: Tim has just let me know that he’s posted up a demo of CubeSlice which uses Federal Election Commission data, which you can download here:
http://www.cubeslice.com/fecsharefile.htm
Clever marketing move on his part, I think – there’s nothing like relevant data (well, relevant for all of you in the US at least) to catch people’s interest.

Written by Chris Webb

October 8, 2006 at 7:24 am

Posted in Analysis Services
