Breaking up large dimensions
One clever trick I learned on the newsgroup a few years ago was from someone called dxd, who wrote it up in this post and others in the same thread:
It describes how to break up a single, large dimension that you need in the cube but which users don’t themselves want to view most of the time (typically this would be a degenerate/fact dimension). In the AS2K world this was useful for getting multi-select to work with distinct count calculations; in AS2005, of course, distinct counts already work with multi-select but I recently found a new application for this technique which I thought I’d share.
I was doing a PoC in a scenario which was similar to the following: imagine a data warehouse which contains data recording purchases in a supermarket with two fact tables. The first fact table contains data on the whole transaction, with a transaction id as the primary key and other dimensions like Customer and Store and a measure recording the value of the whole transaction; the second contains each purchase in the transaction, has all the same dimensions as the first fact table but also includes a Product dimension. The users wanted to run queries like ‘Show me the total value of all transactions which contain Product X’, so it was clearly a distinct sum problem and needed a many-to-many relationship between the Product dimension and the first fact table with the second fact table as the intermediate measure group.
Unfortunately, the only way to be sure of this working properly was to link the two fact tables together using the transaction id – but there were hundreds of millions of transactions, so building a MOLAP dimension was out of the question and I wasn’t sure that a ROLAP dimension would perform well enough. Then I remembered the approach in the newsgroup post above and realised that I could break up the transaction dimension into three identical dimensions of 999 members each. It’s quite easy to visualise how this works. Imagine you have a transaction with the following id:
You could express this as three different dimensions with keys of 123, 456 and 789. And of course since each of these three dimensions was identical, I only needed to build it once and could use role-playing dimensions for the other two. I added them to the cube and made them invisible, added them to both the fact tables and bingo – I had the dimensions I needed to make the many-to-many relationship work.
Performance resolving the many-to-many relationship seemed very good when I looked at the queries I ran in Profiler. Unfortunately I ran into the problem that Mark Hill talks about here:
…and overall performance of the cube wasn’t great (I assumed I’d messed up my partition definitions), but if I had used a ROLAP transaction dimension instead I’m pretty sure that the cube would have been unusable.
Thinking some more about other applications, I wonder if this could be used to work around the problems that are becoming evident with drillthrough in AS2005? See
I think this deserves some further investigation…