Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

One Cube vs Multiple Cubes

with 25 comments

One of the questions discussed in the book that Marco, Alberto and I wrote last year, “Expert Cube Development with SSAS 2008” (available in all good bookshops, folks!) was whether, if you have multiple fact tables, you should create one big cube with multiple measure groups or multiple cubes each with a single measure group. While I still stand by what we wrote then, I recently took part in an interesting debate on this subject in the MSDN Forum with Akshai Mirchandani from the dev team about the pros and cons of each approach where some interesting new details came to light:

http://social.technet.microsoft.com/Forums/en-US/sqlanalysisservices/thread/681e59bd-93ca-4a91-9f26-8ed96e825553

Here are the main points you need to consider when deciding whether to use the single cube approach or the multiple cube approach. In favour of the multiple cube approach:

  • Having multiple, smaller cubes may result in faster query performance than one large cube in some cases, especially if your fact tables have very different dimensionality. This was true in SSAS 2005, and while it’s less obvious in 2008 it’s apparently still there. This was what I’d previously not been sure about: I’d heard rumours about this, and seen it happen in some cases myself with 2005 – although in other cases when I’d tested this out I’d seen no difference in performance – and I wasn’t sure what the situation was with 2008. You’d need to test the two approaches yourself on your cubes and queries to be sure. Here’s what Akshai says on the matter:

    if you keep adding lots of dimensions to a cube, then the virtual space of the cube grows — it does not add to the storage cost, but it does hurt formula engine performance in some scenarios because the cell coordinates are based on the number of attributes in the cube space. Increasing the number of attributes in the cube space will start costing performance in lots of small ways and result in performance regressions. Adding lots of unrelated measure groups would result in you adding lots unrelated dimensions to the cube space and cause a performance slowdown — if you had 10 measure groups and they all shared lots of common dimensions, then one cube makes the most sense…

    …As I already explained… it affects the sizes of data structures inside the formula engine that are based on the number of attributes in the cube space. When those data structures get larger, there is an incremental cost that can add up (depending on your calculations and query patterns).

    For example, you see the Query Subcube Verbose events in Profiler — they show you the subcubes that are used for accessing measure groups. There are similar subcubes that are used for calculating formulas and cell coordinates — all those subcubes get wider and wider as you start adding more attributes into the cube. The cost of accessing and indexing those data structures is what we’re talking about here. If adding new measure groups doesn’t require adding new attributes/dimensions, then there is no problem…

    We had measured the difference before 2005 shipped for some real customer cubes and found there there was a noticeable performance improvement to split up into multiple cubes…

  • While it is possible to apply dimension security to the Measures dimension, it is much easier to allow or deny access to a cube with the multiple cube approach than it is to apply security to all the measures in a measure group using the single cube approach.
  • Having multiple, simpler cubes can be much more user friendly than one monster cube with loads of dimensions and measure groups. If you have Enterprise Edition you can of course use Perspectives to counter this, but if you are using Standard Edition then Perspectives aren’t available.
  • Maintenance can be easier and less disruptive with multiple cubes: if you need to make changes to a cube while users are querying it, you might end up invalidating users’ connections and dropping caches. With one cube the chances of this disruption affecting more users increases.
  • It’s easier to scale out with multiple cubes: if you find your server is maxing out, you can simply buy another server and distribute your cubes equally between the two. With a single cube approach you end up having to look at (admittedly not that much) more complex scale-out scenarios like network load balancing.

On the other side, here are the arguments in favour of the single cube approach:

  • If you ever need to work with data from two fact tables in the same query or calculation, or if you think you might ever need to in the future, you should go with the single cube approach. The two options for cross-cube querying, linked measure groups and the LookUpCube MDX function, should be avoided. Linked measure groups are a pain to manage, carry a slight query performance overhead, and can result in the same MDX calculations being duplicated across the original cube and the cube containing the linked measure group (which means maintenance becomes more difficult). The LookUpCube function is probably the worst MDX function to use in a calculation from a performance point of view and should be avoided at all costs. So a single cube is the only feasible option.
  • Even if your users tell you they will not ever need to analyse data from two fact tables in the same query, be prepared for them to change their minds. In my experience, SSAS projects have a tendency to grow in complexity over time, and cubes that start out simple in a first release often grow lots of new functionality as time goes on – and the more successful the project, the quicker things get complicated. As soon as your users see what’s possible with SSAS they will start to have new, more ambitious ideas about the kind of analysis they want to do with their data, and it’s very likely that they will realise they do need to do cross-measure-group queries and calculations. If you started out on the multiple cube approach and then this happens you will have no choice but to use linked measure groups, and as I said this can make maintenance difficult; using the single-cube approach from the start means you won’t have this problem.

My personal preference is to use the single cube approach by default, and then move to multiple cubes if there are pressing reasons to do so, for example if query performance is a problem. This might seem a bit strange given the number of reasons I’ve given for the multiple cube approach, but frankly the need to support cross-measure-group querying and calculations trumps them all. As I said, if you need to do it (and 99% of the time you will), or you even half suspect you might need to do it sometime in the future, you have to go with the single cube approach. That said, I know other people are more inclined to the multiple cube approach than I am and to a certain extent it’s a matter of taste.

Written by Chris Webb

September 2, 2010 at 12:15 am

Posted in Analysis Services

25 Responses

Subscribe to comments with RSS.

  1. Nice article :)For me, the decisive factor is the number of shared dimensions we have between the fact tables – which defines whether we could be building any cross-measure-group calculations. This is imho not related to current user need – as it is more of a measure of the data itself – if it is related enough to make sense in the same report/query.

    Boyan

    September 2, 2010 at 1:29 am

  2. There could be cases where the measures share some of the dimensions, but those measures might not make sense for cross analysis. If the measures share all of the dimensions, I agree with Chris that they might be used for cross analysis, if not now, maybe in the future. But I would differ to say the same about measures which don\’t share all the dimensions. And it is in such cases where you would start to think about splitting your cubes for better performance.

    jason

    September 2, 2010 at 4:38 pm

  3. Having said that, I would say that this issue is like a double edged sword, you might as well as harm the performance if you don\’t do this correctly, as Chris rightly said

    jason

    September 2, 2010 at 4:40 pm

  4. Another reason for 2 cubes could be, to have measure groups on a different granularity level. Or you have similar dimensions with one dimensions that should be aggregatable for one measure group and not aggregatable for another measure group

    Viktor

    September 3, 2010 at 7:44 am

  5. Chris, there is one other aspect (w/o being destructive): in companies like mine where at least more then 6 developers in 2 countries woking together it is hard nough to work with one solution in one cube catalog (i believe you rember all that discussions we had while you was with us), so i think work with big cube woul be impossible at the end, isn\’t it? But on the other hand – it still looks analysis services isn\’t a product to work with developer teams……

    Dietmar

    September 4, 2010 at 9:23 am

  6. Good point, Dietmar – yes, having more than one developer on a cube is practically impossible. However to avoid any problems you\’d need to have multiple databases, not just multiple cubes…

    Chris

    September 4, 2010 at 4:15 pm

    • of course you are right Chris – and then we are back in trouble with having redundancy and the problem to hold all the confirmed dimensions similiar….

      Dietmar

      November 2, 2010 at 7:05 am

  7. Viktor, I don\’t really understand/agree with your point about granularity – you know you can join a dimension to two measure groups in the same cube at different granularities?

    Chris

    September 4, 2010 at 8:07 pm

  8. Nice article Chris.
    Like you I tend to default to a single cube by default, but often I find it’s beneficial to create one cube per department or business function. This makes security a breeze, and each cube can contain the combination of dimensions and facts relevent to each department.

    It often means that some measure groups are duplicated in multiple cubes, so it’s only a viable approach if space and processing time allow.

    Alex Whittles

    October 19, 2010 at 6:16 pm

  9. Great post Chris

    On the other hand my case is a bit different. I’m just starting to be involved in a project where the currently proposed design, because of security, is to have up to 200 (identical) cubes!

    Am I correct to assume that the overhead of managing all these cubes would counterbalance any other type of advantage here?

    Using dynamic security (for example as proposed in your book) should allow to have one single cube…

    Enzo

    October 16, 2011 at 7:05 pm

    • Having 200 identical cubes sounds like a maintenance nightmare – using dynamic security will probably be a much better idea.

      Chris Webb

      October 17, 2011 at 8:13 am

  10. Reblogged this on David Laplante and commented:
    Recently, I was having a discution on whether a given cube implementation should have, or not, been implemented as multiple cubes instead of just one. This blog post from Chris Webb pretty much sums up my thinking. I tend to go the “1 cube route” by default for the ease of use. But I do like the arguments for both cases.

    David Laplante

    April 28, 2012 at 4:53 am

  11. What about using single cubes then creating a cube combining all the single cubes together using “linked” measure groups?

    1111mem

    May 8, 2012 at 11:15 pm

    • I would not recommend using linked measure groups – there is a slight performance hit using them, and more importantly you end up duplicating calculations between the original cube and the cube with the linked measure groups in, which can make maintenance difficult.

      Chris Webb

      May 9, 2012 at 8:03 am

      • Thanks for the response.

        What about only placing the calculations in the “linked measure group” cube?

        For example, we have 15 cubes, 14 regular and one cube linking together the 14 other cubes. We have the 14 cubes hidden from the end user and place all the calculations in that master cube.

        15 total dimensions, only four of them are common across the 14 measure groups. The end user is smart enough to know what to slice by in the master cube.

        Thanks again!

        1111mem

        May 9, 2012 at 5:58 pm

      • You could just put the calcs in the linked measure group cube, but then what would be the point of building separate cubes? If the calcs aren’t there in the individual cubes then users will tend to use the linked measure group cube and you might as well have just created one cube in the first place.

        Chris Webb

        May 10, 2012 at 9:10 am

  12. [...] One Cube vs Multiple Cubes [...]

  13. In Analysis Services, I am using sharing dimensions and have 2 cubes against one database. But today I could only see one cube, I have checked the security and everything looks ok to me. can anyone help me please? Recently I have done some changes to the cube but don’t know where to start from..

    M

    November 5, 2013 at 9:47 am

  14. […] One Cube vs Multiple Cubes […]

  15. Reblogged this on Einstein and commented:
    One Cube vs Multiple Cubes

    seenualnat

    February 26, 2014 at 2:05 pm

  16. I have two cubes with shared dimensions, when I process one cube (after processing its dimensions) the other cube is unavailable, any solutions, Please ?

    BI Man

    March 27, 2014 at 9:27 am

    • You’re probably doing a full process on the dimensions in your first cube – when you do a full process on a dimension, all cubes that use that dimension also need to be fully processed. Have you looked at other processing options, such a Process Update?

      Chris Webb

      March 27, 2014 at 9:29 am

      • Thanks for the response.
        Yes, It’s done using Process Update.

        Thanks again !

        BI Man

        March 27, 2014 at 3:41 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,309 other followers

%d bloggers like this: