Partitions and Aggregations
Something that’s easy to forget (at least I’m always forgetting it) when creating an aggregation design is the impact that partitioning can have on it. Aggregations are only created on a per-partition basis – you can’t aggregate across partitions – so that means, for example, when you’re partitioning by Month then there’s little point building aggregations at granularities higher than Month; after all, if you only have one Month of data in your partition and you aggregate that data to the Month, Quarter or Year levels then the figures for each aggregation will be identical! In fact building aggregations above the granularity of Month can be a bit of a waste: an aggregation at Month granularity can be used by queries at Quarter and Year, but an aggregation for the same partition at Year granularity that is in all other respects identical can only be used by queries at the Year granularity.
The best thing to do to ensure this doesn’t happen is to set the AggregationUsage property to None for all attributes above the granularity that you’re partitioning by, and also to make sure your partition row counts are set correctly, before you run either of the Aggregation Design wizards. This should ensure that your aggregation design properly reflects your partitioning strategy.
One of the questions I’ve always meant to research further on this topic is whether partitioning could hurt query performance, given that it effectively prevents certain aggregations being built. Taking the example we’ve been using so far, what happens if you’re partitioning by Month and you’ve got a slow query at Year granularity that you’d like to build an aggregation for – would partitioning by Year rather than Month, so you could build a true Year-level aggregation, make sense? Well, I’ve just had a chance to test this out on a fairly large cube (with approx 100 million rows in the main fact table, and Product and Customer dimensions with 1 million+ members on their key attributes) and interestingly partitioning seems to make no difference at all. I created two measure groups, one partitioned by Month and one partitioned by Year, and then created one aggregation on the former at Month/Product and one aggregation on the latter at Year/Product. I then ran a query that returned the top 10 Products at Year on each measure group and the performance was identical. Clearly this is not something to be worried about, thank goodness…
[Thanks are due to Eric Jacobsen of the SQLCat team for talking through some of these issues after the BI Conference earlier this year. He accurately predicted that partitioning wouldn’t hurt performance because the AS query engine is very efficient at merging the data caches of partitions.]