Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for January 2005

Creating local cubes in AS2005


I spotted the following thread on the AS2005 beta newsgroup, concerning the new way of creating local cubes in AS2005:

http://communities.microsoft.com/newsgroups/previewFrame.asp?ICP=sqlserver2005&sLCID=us&sgroupURL=microsoft.private.sqlserver2005.analysisservices.olap&sMessageID=%253CO18gH%2523NBFHA.2368@cpmsftngsa05.privatenews.microsoft.com%253E

(Apologies for the link, but the web-based UI didn’t give me a better way of linking to the whole thread – hopefully you can read it backwards)

Anyway, in summary, it means you create local cubes with the same DDL as you use to create server-based cubes – a very welcome improvement in my opinion.
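For anyone who doesn't want to wade through the thread, the gist as I understand it (a sketch from memory rather than anything I've been able to test, so treat the details as illustrative) is that you point your connection string at a .cub file, something like:

Data Source=C:\Temp\LocalSales.cub;Provider=MSOLAP.3

…and then execute the same Create DDL on that connection as you would send to a server, with the objects being built inside the local cube file instead.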

Written by Chris Webb

January 31, 2005 at 2:45 pm

Posted in Analysis Services

Finding the currentmember on all dimensions, take#2


Mosha’s teasing me now, I think. Anyway, here is a solution to the problem that is still AS2K-compatible, doesn’t hard-code any dimension names and doesn’t rely on having levels with set numbers of members:

WITH
MEMBER MEASURES.TEST AS '")"'
CELL CALCULATION DEMO FOR '({MEASURES.[TEST]})' AS '
IIF(CALCULATIONCURRENTPASS()=129,"(", "") +
DIMENSIONS(129 - CALCULATIONCURRENTPASS()).CURRENTMEMBER.NAME +
IIF(CALCULATIONCURRENTPASS()=(130-DIMENSIONS.COUNT),"", ",")
+ CSTR(CALCULATIONPASSVALUE(MEASURES.TEST, -1, RELATIVE))
', CALCULATION_PASS_NUMBER=129, CALCULATION_PASS_DEPTH=128, SOLVE_ORDER=2, CONDITION='CALCULATIONCURRENTPASS()>(129-DIMENSIONS.COUNT)'

SELECT {MEASURES.TEST} ON 0 FROM SALES

Since you can only have 128 dimensions in a cube in AS2K, you're safe with the hard-coded pass depth, but it's still not a truly satisfying solution for me. Unfortunately, until my Yukon test server is working again (hurry up with the reinstallation, Colin!) I won't be able to test any solutions that use new MDX functionality.

Jon Axon and I have just been discussing what the Yukon solution might be. As Jon pointed out, because Yukon automatically resolves infinite recursion by taking the value from the previous calculation pass, with no need for CalculationPassValue(), the same approach as above would look a lot cleaner when expressed in Yukon MDX (see the sketch below). But Mosha did say that his solution used GENERATE() too, so I reckon he has a completely different way of doing it. I'm sure it will be blindingly obvious when he finally tells me what it is…
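To make Jon's point concrete: the interesting change would be confined to a single line of the cell calculation above. In AS2K you have to fetch the previous pass's value explicitly:

CSTR(CALCULATIONPASSVALUE(MEASURES.TEST, -1, RELATIVE))

whereas in Yukon, if the recursion really is resolved automatically, a plain self-reference ought to do the same job:

CSTR(MEASURES.TEST)

I should stress this is guesswork until I can actually run it – see the comment about my test server above.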

 

Written by Chris Webb

January 31, 2005 at 1:41 pm

Posted in MDX

Finding the currentmember on all dimensions


In yesterday’s entry I mentioned it would be nice if you could iterate over all the dimensions in your cube and find the currentmember on each of them. Mosha then hinted that this was possible, and there’s nothing I like more than a challenge. Here’s the solution in a Foodmart 2000 query – and it uses all-AS2K functionality too!

WITH MEMBER MEASURES.TEST AS 'GENERATE(
HEAD(
UNION(
[Customers].[City].MEMBERS
, {CUSTOMERS.CURRENTMEMBER} AS CURRENTCUST
), DIMENSIONS.COUNT)
, IIF(
DIMENSIONS(RANK(CUSTOMERS.CURRENTMEMBER, [Customers].[City].MEMBERS)-1)
IS CUSTOMERS
, CURRENTCUST.ITEM(0).ITEM(0).NAME
, DIMENSIONS(RANK(CUSTOMERS.CURRENTMEMBER, [Customers].[City].MEMBERS)-1).CURRENTMEMBER.NAME
), ", ")'
SELECT {MEASURES.TEST} ON 0, 
[Education Level].[Education Level].MEMBERS ON 1
FROM SALES
WHERE ([Customers].[All Customers].[Canada].[BC].[Burnaby], [Gender].[All Gender].[F])

All you need is a level with more members than you have dimensions in the cube. OK, it’s not elegant, but it works…
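Incidentally, the sneakiest part of the query is the inline set alias: UNION()ing {CUSTOMERS.CURRENTMEMBER} into the set gives the slicer's customer the name CURRENTCUST before GENERATE() starts moving the Customers currentmember around, so the original member can still be read back inside the iteration via CURRENTCUST.ITEM(0).ITEM(0). Here's a stripped-down illustration of just that capture trick (written from memory, so test it before relying on it) – it should return the slicer member's name three times, rather than the names of the first three cities:

WITH MEMBER MEASURES.CAPTUREDEMO AS 'GENERATE(
HEAD(
UNION(
[Customers].[City].MEMBERS
, {CUSTOMERS.CURRENTMEMBER} AS SLICERCUST
), 3)
, SLICERCUST.ITEM(0).ITEM(0).NAME
, ", ")'
SELECT {MEASURES.CAPTUREDEMO} ON 0 FROM SALES
WHERE ([Customers].[All Customers].[Canada].[BC].[Burnaby])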

Written by Chris Webb

January 28, 2005 at 12:19 pm

Posted in MDX

New AS2005 MDX white papers


I see that Richard Tkachuk has put up two more MDX Scripting white papers on his site, in addition to the ‘Loading Aggregate Data’ paper that Mosha pointed to in his blog at the end of last year. All three go into a lot of detail and show off some of the cool new features available in AS2005 MDX, such as the ability to return a set from an IIF call.

Looking at the gigantic statement used to retrieve a comment for a cell in the middle of the ‘Cell Annotations’ paper reminded me of a feature I’ve thought would be useful on several occasions: the ability to create sets containing dimensions instead of members. In Richard’s MDX he’s forced to hard-code the names of all the dimensions in his cube in order to get the keys of the currentmember of each of them; if you could simply get a set of all dimensions in the cube, you could then iterate over that set with a GENERATE statement and get the keys using something like a CURRENTDIMENSION function. This would also make the code somewhat more robust, as it would no longer break if you added or removed dimensions from the cube. I don’t know whether this kind of functionality is feasible or not, but I thought I’d mention it…
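To make the idea concrete, here's roughly what I imagine the MDX looking like – and to be absolutely clear, none of this syntax exists: the DIMENSIONS.ALLMEMBERS set and the CURRENTDIMENSION function below are pure invention on my part:

GENERATE(
DIMENSIONS.ALLMEMBERS /* hypothetical: a set containing every dimension in the cube */
, CURRENTDIMENSION.CURRENTMEMBER.PROPERTIES("KEY") /* hypothetical: refers to the dimension currently being iterated over */
, ", ")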

Written by Chris Webb

January 27, 2005 at 4:55 pm

Posted in MDX

Grouping in MDX – response to Mosha


I’m honoured by my mention in Mosha’s blog! But I think my entry on Grouping in MDX, and Mosha’s comments on it, need some further clarification and discussion.

First of all, CREATE SESSION CUBE. I agree it is probably the best way to implement grouping at the moment and that it works well in Excel, but it’s not ideal:

  • Surely it’s a client-only solution? You’d want the option of creating these groupings on the server for all users, just as you would any other calculated member, and CREATE SESSION CUBE doesn’t offer you that, does it?
  • I don’t know what the overhead of creating a session cube on the client is (or, in Yukon, on the server) – but is it as small as creating a calculated member? What is the impact on caching, especially given the fact that you have to drop and create the session cube when you change or add a new group?
  • A lot of the time, for example in Reporting Services, you are only working with one-off queries rather than sessions and you really want to be able to create a group with query rather than session scope.
  • The MDX needed to create the session cube and then manage it within the session seems overly complex for such a simple requirement.

Secondly, to address Mosha's point on why you would need the VisualTotals and Aggregate functions in the same query: it's because you'd want to use your new group member in any scenario where you could use a normal member, and that includes a query which uses VisualTotals. Imagine you had a measure which showed the distinct count of customers across all your stores, and you wanted a report which a) had a single group member containing your top 5 stores, b) showed several other individual stores, and c) showed the visual total of the distinct customers across both the group and the individual stores displayed. I think that would be a reasonable requirement, and one which wouldn't be possible unless AS ‘knew’ what members went into the group.
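To make that concrete, the kind of group member I have in mind is just a calculated member built with AGGREGATE() – a minimal Foodmart sketch, with the member names being illustrative:

WITH MEMBER [Store].[All Stores].[Top 5 Stores] AS
'AGGREGATE(TOPCOUNT([Store].[Store Name].MEMBERS, 5, [Measures].[Unit Sales]))'
SELECT {[Measures].[Unit Sales]} ON 0,
{[Store].[All Stores].[Top 5 Stores]
, [Store].[All Stores].[USA].[WA].[Seattle]
, [Store].[All Stores].[USA].[OR].[Portland]} ON 1
FROM SALES

As far as AS is concerned, [Top 5 Stores] here is a single opaque member – nothing records which five stores went into it – and that's exactly why a VisualTotals() distinct count across the group and the individual stores can't be calculated correctly.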

Thirdly, sets in the WHERE clause (and also subcubes in the FROM clause, which do the same thing) in Yukon. Unfortunately, this only works when you're slicing by the group and not when the group is on a visible axis, so it doesn't fit the scenario I was describing.

Overall, then, CREATE SESSION CUBE is almost the functionality that I want but it doesn’t allow groups to be defined on the server. So we’re close…!

Written by Chris Webb

January 24, 2005 at 2:47 pm

Posted in Random Thoughts

New version of the BI Portal


Duncan Lamb points to the new version of the BI Portal (some of you might remember the previous version), which is available here. I first saw this demoed about 18 months ago when I still worked at Microsoft, and I'm glad it's got an ‘official’ release at last. The only disappointment is that this is a sample app, which I guess means it's not going to be supported – not that that will stop people putting it into production, I'm sure.

Written by Chris Webb

January 21, 2005 at 5:17 pm

Posted in Client Tools

Building Aggregations Manually


I’ll start with two disclaimers:

  • This entry is based on my own understanding of how the AS engine works, based on my own experience and the occasional question I’ve asked the dev team. It might not be totally accurate on the details but hopefully the general principles are sound – I’ve used them successfully on a number of occasions.
  • 99% of the time the Storage Design Wizard and Usage Based Optimisation Wizard will design the right aggregations for you, and you won’t need to build aggregations manually.

 

So what about the 1% of times when you will need these techniques? They usually occur when you have several dimensions with large numbers of members at their lower levels, and queries which retrieve data at these levels. Cubes with large parent-child dimensions, and dimensions containing entities like customers, are the most common real-life manifestations of this scenario. The 1/3 rule, which dictates which aggregations can be built by the Storage Design Wizard and the Usage Based Optimisation Wizard (see the Analysis Services Performance Guide for more details), may have prevented any aggregations useful to your queries from being built, and query performance suffers as a result. The 1/3 rule is there for a good reason – to stop large aggregations being built which increase processing time and the size of the cube but don't have much of an impact on query performance – but it is not infallible; equally, when faced with query performance problems, very often you are glad of any improvement in response times, however small!

 

The first step to take when designing aggregations manually is to understand what data the queries you want to run are actually requesting. The best way to do this is to take a representative sample of these queries, clear the Query Log, set it to log every query and then run the queries one by one. You’ll notice that one MDX query doesn’t necessarily equate to one query in the log – in fact, one MDX query can generate multiple subqueries, even thousands, depending on its complexity; you’ll also notice that what is stored in the query log is somewhat cryptic. Rather than explain it here, I suggest you read the following white paper which explains the contents in some detail. The contents of the DataSet column represent the slice of the cube that each query in the log asked for data from; the same values can be viewed in Performance Monitor using the DSN Requested counter. The other useful value to monitor is the slice of the cube that the AS engine queried to get data to service each of these requests, and unfortunately this is not present in the Query Log – you can only see it in PerfMon using the DSN Used counter.

 

This is probably a good place to stop and give some examples. Imagine you have a cube with 4 dimensions apart from Measures, and these dimensions have 5, 6, 7 and 8 levels respectively. If you ran a query requesting values at the top level of each of these dimensions, DSN Requested would show it as 1111 (top, top, top, top); similarly, if you ran a query requesting data from the leaf levels of each dimension, DSN Requested would show it as 5678 (leaf, leaf, leaf, leaf). Now if you had no aggregations in your cube and you ran the first of these queries, 1111, in order to get the value returned AS would have to aggregate values from the leaf members of every single dimension and DSN Used would show 5678; the fact that all this aggregation had to happen at runtime would mean that the query might not run very quickly. If, however, you had an aggregation built at the third level of each dimension and you ran the same query, DSN Used would instead show 3333, and because AS only had to aggregate the members on the third level of every dimension the query would run much faster. Following on from this, imagine that you wanted to run a query that took values from the top levels of the first two dimensions and the leaf levels of the last two dimensions, so that DSN Requested would be 1178. Since it is unlikely that any aggregations would have been built at the bottom level of two such deep dimensions (especially if they had large numbers of members on them, for example if one was a Customer dimension), DSN Used would have to be 5678 and AS would still have to aggregate lots of values at runtime.

 

Back at your Query Log and PerfMon, the next step is to decide whether you need to build aggregations manually. If you've been through the Storage Design Wizard and the Usage Based Optimisation Wizard and you've set the Aggregation Usage property on all your dimensions appropriately (again, see the AS Performance Guide for more information on this), and you still see that your queries are not hitting aggregations (so that there are big differences between the DSN Requested and DSN Used values), then you probably need to. On the other hand, if you see DSN Requested and DSN Used values which are the same or almost the same, building more aggregations might not do any good and you might need to look at other ways of improving performance, such as partitioning.

 

The tool you’ll need in order to build aggregations manually is ‘Partition Manager’, which is available in the SQL 2K Resource Kit and also in an updated form in the BI Accelerator, a free download from the Microsoft website. The UI can be a bit fiddly when you have more than a few dimensions but it beats writing the code in DSO.

 

The final problem is knowing which aggregations you need to build. Going back to the example above, if you saw the pattern 1178 in DSN Requested you could build an aggregation which matched it exactly, i.e. at the top levels of the first two dimensions and the leaf levels of the last two. This would do the trick, but if your users wanted to drill down on either of the first two dimensions then AS could no longer use it. It's therefore a much better idea to build aggregations at the lowest level your users are likely to drill down to, perhaps 3378, and sacrifice a little performance on some higher-level queries in exchange for much better performance on queries which go slightly deeper. Overall, though, it's only through trial and error that you'll work out which aggregations you need.

Written by Chris Webb

January 21, 2005 at 12:50 pm

Posted in Analysis Services
