Building a Better Cache-Warmer, Part 2: The Formula Engine Cache
Sorry for the very long delay – at long last, here’s the companion piece to the post I wrote last September about cache-warming. That post dealt with the relatively straightforward topic of warming the Storage Engine cache; this time we’ll be looking at how to warm the Formula Engine cache. As I promised in the first post, the idea is that I’m going to take the ideas discussed here and use them to create a better cache warmer than the ones I’ve blogged about in the past.
The first thing to note when talking about caching the result of MDX calculations is that it does happen – a surprising number of people think that Analysis Services performs every MDX calculation needed by a query from scratch every time. This does indeed happen in worst-case scenarios – and it’s all too easy to do things which accidentally get you into these worst-case scenarios – but when SSAS can cache the result of calculations it can have a massive impact on query performance.
The second thing to note is that when I say that SSAS can cache the result of calculations, what I mean is that it can cache the values returned by cells in the cube which have been subject to some form of MDX calculation. Analysis Services can’t cache the result of MDX expressions as such, it can only do so when the result of those expressions is surfaced as a value returned by a cell somewhere in the cube. This means that if we create a calculated measure on the cube’s MDX Script, and then include this calculated measure in a query, we can expect that SSAS should be able to cache the values returned by this calculated measure. On the other hand if we have a complex MDX set expression on the Rows axis of our query, the set that expression returns cannot be cached – it will be re-evaluated every time the query is run.
The third, and possibly most important thing to remember is that SSAS can only cache the results of a calculation for as long as the calculation actually exists. As you may already know, there are three places you can create a calculated member in SSAS, and these equate to three different ‘lifetimes’. If you create a calculated measure on the MDX Script of the cube it will live until the next time the cube’s MDX Script is dropped and recreated, and this will happen if you explicitly clear the cache or you do any kind of processing. This is called ‘global’ scope. If, however, you create a calculated member in the WITH clause of a query then that calculated member will only exist for the lifetime of that query – so while you will benefit from caching while the query is executing, after the query has returned the calculated member will disappear and so will any related cache. This is called ‘query’ scope. The third scope, or lifetime, for creating a calculated member is ‘session’ scope: you can create calculated members that live for the lifetime of a session using the CREATE MEMBER statement, so values can only be cached for one user and one session; this is used only very rarely though.
The fourth to understand is that SSAS can’t cache the results of some calculations, or if it can cache them for one user it cannot share the contents of this cache with other users. For example this will happen if the MDX calculation might return a different result every time it runs (eg if it used the Now() function to return the system date and time – which is obviously going to be different each time it’s called) or if it might return different results for different users (eg when two different users run the same query through different security roles, because they might be able to see different parts of the cube the values returned might be different). The way in which a query is constructed can also force ‘query’ scoping for the formula engine cache too: the presence of a calculated member in the WITH clause can do this, as can the use of subselects. This also means that FE cache warming is pointless for certain popular client tools like SSRS or Excel because the MDX they generate prevents the use of the FE cache.
Finally, the Formula Engine has two different structures it uses to hold cached values. For calculations that execute in bulk mode then it can use the same structure that the Storage Engine cache uses, namely the data cache registry. For calculations that execute in cell-by-cell mode, however, it uses a different structure called the flat cache. Calculations that execute in bulk mode in most cases cannot make use of values stored in the flat cache, and calculations that execute in cell-by-cell mode cannot make use of values stored in the data cache registry. Furthermore, the size of the flat cache is restricted to 10% of the TotalMemoryLimit server property; if it grows bigger than that it will be completely emptied.
(Incidentally, if you’re looking for more detail on any of the above points, I suggest you watch the video of my session on “Cache Warming Strategies for SSAS 2008” here).
So clearly there are many potential pitfalls to watch out for when warming the FE cache, and indeed in many cases I’d say that it’s just easier to concentrate on warming the SE cache and tuning your calculations so they execute as fast as possible even on a cold FE cache!
If you do need to warm the FE cache though, you need to be very careful. Unfortunately the CREATE CACHE statement mentioned in the previous post only works for the SE cache, so the only way to warm the FE cache is to use MDX queries. These queries should be hand-written specifically for the purpose – if you simply record queries run in production using Profiler, it’s likely you’ll end up with queries that don’t warm the FE cache because they contain a WITH clause or a subselect – but it’s nonetheless a good idea to use production queries as a starting point and then modify them so they become FE-friendly. A smaller number of larger queries is going to be better than a large number of queries that return small amounts of data, to avoid cache fragmentation in the data cache registry, and it’s a good idea to keep them as simple (and as non-arbitrary-shaped) as possible.
Since it’s likely that the calculations you’re most interested in caching are going to be the ones that, despite your best efforts at tuning them, execute in cell-by-cell mode, then it’s a good idea to keep an eye on the overall size of the various flat caches in existence. The following DMV, I think, shows the size of all the flat caches across all the databases and for every session on an SSAS instance:
select * from
…but I need to do a bit more research to make sure it shows what I think it shows and to make sure I can observe this 10% limit resulting in the flat cache being emptied.
Lastly, as with the SE cache, you don’t want to overfill it and leave no memory for the natural growth in cache that will occur as a result of normal querying; and you certainly don’t want to go over the LowMemoryLimit or TotalMemoryLimit while you’re warming the cache and end up with cache evictions taking place.