One Cube vs Multiple Cubes
One of the questions discussed in the book that Marco, Alberto and I wrote last year, “Expert Cube Development with SSAS 2008” (available in all good bookshops, folks!) was whether, if you have multiple fact tables, you should create one big cube with multiple measure groups or multiple cubes each with a single measure group. While I still stand by what we wrote then, I recently took part in an interesting debate on this subject in the MSDN Forum with Akshai Mirchandani from the dev team about the pros and cons of each approach where some interesting new details came to light:
Here are the main points you need to consider when deciding whether to use the single cube approach or the multiple cube approach. In favour of the multiple cube approach:
- Having multiple, smaller cubes may result in faster query performance than one large cube in some cases, especially if your fact tables have very different dimensionality. This was true in SSAS 2005, and while it’s less obvious in 2008 it’s apparently still there. This was what I’d previously not been sure about: I’d heard rumours about this, and seen it happen in some cases myself with 2005 – although in other cases when I’d tested this out I’d seen no difference in performance – and I wasn’t sure what the situation was with 2008. You’d need to test the two approaches yourself on your cubes and queries to be sure. Here’s what Akshai says on the matter:
if you keep adding lots of dimensions to a cube, then the virtual space of the cube grows — it does not add to the storage cost, but it does hurt formula engine performance in some scenarios because the cell coordinates are based on the number of attributes in the cube space. Increasing the number of attributes in the cube space will start costing performance in lots of small ways and result in performance regressions. Adding lots of unrelated measure groups would result in you adding lots unrelated dimensions to the cube space and cause a performance slowdown — if you had 10 measure groups and they all shared lots of common dimensions, then one cube makes the most sense…
…As I already explained… it affects the sizes of data structures inside the formula engine that are based on the number of attributes in the cube space. When those data structures get larger, there is an incremental cost that can add up (depending on your calculations and query patterns).
For example, you see the Query Subcube Verbose events in Profiler — they show you the subcubes that are used for accessing measure groups. There are similar subcubes that are used for calculating formulas and cell coordinates — all those subcubes get wider and wider as you start adding more attributes into the cube. The cost of accessing and indexing those data structures is what we’re talking about here. If adding new measure groups doesn’t require adding new attributes/dimensions, then there is no problem…
We had measured the difference before 2005 shipped for some real customer cubes and found there there was a noticeable performance improvement to split up into multiple cubes…
- While it is possible to apply dimension security to the Measures dimension, it is much easier to allow or deny access to a cube with the multiple cube approach than it is to apply security to all the measures in a measure group using the single cube approach.
- Having multiple, simpler cubes can be much more user friendly than one monster cube with loads of dimensions and measure groups. If you have Enterprise Edition you can of course use Perspectives to counter this, but if you are using Standard Edition then Perspectives aren’t available.
- Maintenance can be easier and less disruptive with multiple cubes: if you need to make changes to a cube while users are querying it, you might end up invalidating users’ connections and dropping caches. With one cube the chances of this disruption affecting more users increases.
- It’s easier to scale out with multiple cubes: if you find your server is maxing out, you can simply buy another server and distribute your cubes equally between the two. With a single cube approach you end up having to look at (admittedly not that much) more complex scale-out scenarios like network load balancing.
On the other side, here are the arguments in favour of the single cube approach:
- If you ever need to work with data from two fact tables in the same query or calculation, or if you think you might ever need to in the future, you should go with the single cube approach. The two options for cross-cube querying, linked measure groups and the LookUpCube MDX function, should be avoided. Linked measure groups are a pain to manage, carry a slight query performance overhead, and can result in the same MDX calculations being duplicated across the original cube and the cube containing the linked measure group (which means maintenance becomes more difficult). The LookUpCube function is probably the worst MDX function to use in a calculation from a performance point of view and should be avoided at all costs. So a single cube is the only feasible option.
- Even if your users tell you they will not ever need to analyse data from two fact tables in the same query, be prepared for them to change their minds. In my experience, SSAS projects have a tendency to grow in complexity over time, and cubes that start out simple in a first release often grow lots of new functionality as time goes on – and the more successful the project, the quicker things get complicated. As soon as your users see what’s possible with SSAS they will start to have new, more ambitious ideas about the kind of analysis they want to do with their data, and it’s very likely that they will realise they do need to do cross-measure-group queries and calculations. If you started out on the multiple cube approach and then this happens you will have no choice but to use linked measure groups, and as I said this can make maintenance difficult; using the single-cube approach from the start means you won’t have this problem.
My personal preference is to use the single cube approach by default, and then move to multiple cubes if there are pressing reasons to do so, for example if query performance is a problem. This might seem a bit strange given the number of reasons I’ve given for the multiple cube approach, but frankly the need to support cross-measure-group querying and calculations trumps them all. As I said, if you need to do it (and 99% of the time you will), or you even half suspect you might need to do it sometime in the future, you have to go with the single cube approach. That said, I know other people are more inclined to the multiple cube approach than I am and to a certain extent it’s a matter of taste.

Nice article
For me, the decisive factor is the number of shared dimensions we have between the fact tables – which defines whether we could be building any cross-measure-group calculations. This is imho not related to current user need – as it is more of a measure of the data itself – if it is related enough to make sense in the same report/query.
Boyan
September 2, 2010 at 1:29 am
There could be cases where the measures share some of the dimensions, but those measures might not make sense for cross analysis. If the measures share all of the dimensions, I agree with Chris that they might be used for cross analysis, if not now, maybe in the future. But I would differ to say the same about measures which don\’t share all the dimensions. And it is in such cases where you would start to think about splitting your cubes for better performance.
jason
September 2, 2010 at 4:38 pm
Having said that, I would say that this issue is like a double edged sword, you might as well as harm the performance if you don\’t do this correctly, as Chris rightly said
jason
September 2, 2010 at 4:40 pm
Another reason for 2 cubes could be, to have measure groups on a different granularity level. Or you have similar dimensions with one dimensions that should be aggregatable for one measure group and not aggregatable for another measure group
Viktor
September 3, 2010 at 7:44 am
Chris, there is one other aspect (w/o being destructive): in companies like mine where at least more then 6 developers in 2 countries woking together it is hard nough to work with one solution in one cube catalog (i believe you rember all that discussions we had while you was with us), so i think work with big cube woul be impossible at the end, isn\’t it? But on the other hand – it still looks analysis services isn\’t a product to work with developer teams……
Dietmar
September 4, 2010 at 9:23 am
Good point, Dietmar – yes, having more than one developer on a cube is practically impossible. However to avoid any problems you\’d need to have multiple databases, not just multiple cubes…
Chris
September 4, 2010 at 4:15 pm
of course you are right Chris – and then we are back in trouble with having redundancy and the problem to hold all the confirmed dimensions similiar….
Dietmar
November 2, 2010 at 7:05 am
Viktor, I don\’t really understand/agree with your point about granularity – you know you can join a dimension to two measure groups in the same cube at different granularities?
Chris
September 4, 2010 at 8:07 pm
Nice article Chris.
Like you I tend to default to a single cube by default, but often I find it’s beneficial to create one cube per department or business function. This makes security a breeze, and each cube can contain the combination of dimensions and facts relevent to each department.
It often means that some measure groups are duplicated in multiple cubes, so it’s only a viable approach if space and processing time allow.
Alex Whittles
October 19, 2010 at 6:16 pm
Great post Chris
On the other hand my case is a bit different. I’m just starting to be involved in a project where the currently proposed design, because of security, is to have up to 200 (identical) cubes!
Am I correct to assume that the overhead of managing all these cubes would counterbalance any other type of advantage here?
Using dynamic security (for example as proposed in your book) should allow to have one single cube…
Enzo
October 16, 2011 at 7:05 pm
Having 200 identical cubes sounds like a maintenance nightmare – using dynamic security will probably be a much better idea.
Chris Webb
October 17, 2011 at 8:13 am