Archive for the ‘Analysis Services’ Category
I stopped writing book reviews on my blog a long time ago because, frankly, I knew most of the authors of the books I featured so it was hard to be impartial. That doesn’t mean I can’t plug my friends’ books in a totally biased way, though, in the same way that I plug my own books/courses/consultancy etc!
I’ve known Paul te Braak for several years now and he’s one of the best SSAS guys out there. “Microsoft Tabular Modeling Cookbook” is a great introduction to building models in Power Pivot and SSAS Tabular models, and also covers client-side interaction with Excel and Power View. As the name suggests it follows the cookbook format rather than the more verbose, traditional tech book style of, for example, the SSAS Tabular book that Marco, Alberto and I wrote. I like the cookbook format a lot – it gives you information in a concise, easy-to-follow way and doesn’t force you to read the whole book cover-to-cover. Paul has done a superb job of covering all of the important points without getting bogged down with unnecessary detail. Highly recommended.
You learn all kinds of interesting things on the forums! Recently I saw this post from Nils Rottgardt about a problem with the CREATE SESSION CUBE statement and SSAS Multidimensional:
Here’s a related Connect:
Basically, when you run a CREATE SESSION CUBE statement SSAS creates a shadow copy of the cube on disk – and if you have a very large cube, with very large dimensions, this could use up a lot of storage space on the server. When you use the custom grouping functionality in a PivotTable Excel generates a CREATE SESSION CUBE statement behind the scenes, so it’s possible to cause a lot of mischief just by using custom grouping in Excel.
Incidentally, this isn’t a problem for SSAS Tabular because it doesn’t support the CREATE SESSION CUBE statement.
I always knew this particular functionality could be very slow and always warned people against using it, but I didn’t know about the disk space issue… if your users haven’t found out about custom grouping yet, I recommend you don’t tell them about it!
Within a matter of days, “Expert Cube Development with SSAS 2012 Multidimensional Models” will be published. It’s the second edition of the very successful (19 5* reviews on Amazon US as of now) book on SSAS cube development that Marco, Alberto and I wrote a few years ago, updated for SSAS 2012.
Before you rush off to order a copy, there are a three things I’d like to point out:
- This is basically the same book as the first edition with updated screenshots, a few bugs fixed, and several sections updated/expanded for SSAS 2012. There are no substantial changes. If you already have a copy of the first edition it’s probably not worth buying a copy of the second edition.
- The book only covers SSAS Multidimensional models, it does not cover SSAS Tabular models.
- This is not a basic introduction to building SSAS cubes – it’s aimed at intermediate-level SSAS developers who are already familiar with cubes, dimensions and MDX and who want to learn about best practices, design patterns, performance tuning and (most importantly) which features work well and which ones don’t. If you like the material I post here on my blog, you’ll probably like the book.
If you’re OK with that then by all means, go ahead and get your wallet out!
Most people know that the ability to partition a measure group is a feature of Analysis Services Multidimensional Enterprise Edition, but that doesn’t mean that in Standard Edition you are limited to just having one partition per measure group. In fact it is possible to use up to three partitions per measure group in SSAS Multidimensional SE, with some limitations. For a long time I wasn’t sure whether this was legal, as opposed to possible, according to the terms of the SQL Server licence but since this page in Books Online (thanks to Rob Kerr for the link) states that you can have up to three partitions in SE then I assume it is:
If you do decide to use more than one partition in SSAS SE then you do need to understand the risks involved – and the reason I wanted to write this post is because I see a lot of people using more than one partition per measure group in SE without understanding those risks. Strictly speaking, SE is only designed to work with one partition per measure group. It needs those extra two partitions for two pieces of functionality:
- To support writeback, because using this feature requires SSAS to create a separate partition to hold writeback values
- To support incremental processing, because when you do incremental processing on a partition in the background SSAS needs to create a new partition, process it and then merge it with your existing partition
Therefore if you create more than one partition per measure group in SE you may find that writeback and/or incremental processing will break.
New modules are being added to my MDX online training course all the time, and now there’s another free video available: a short introduction to the CurrentMember function. You can view it here (registration required):
The previous free video on MDX SELECT statements is also still available here, now with no registration required. If you’d like to subscribe to this course you can get a 10% discount by using the code TECHNITRAIN2014 when registering.
Don’t forget, if you are looking for classroom training in London for MDX, DAX, Analysis Services Multidimensional or Tabular, Power BI, Integration Services or the SQL Server engine check out http://www.technitrain.com/courses.php
One new feature of SQL Server PDW 2012 that hasn’t had the attention it deserves is the fact that it is now officially supported as a data source for Analysis Services, both Multidimensional (in ROLAP and MOLAP modes) and Tabular (in In-Memory and DirectQuery modes). If you are working with extremely large data volumes in SSAS then PDW might be something you want to do some research on. For SSAS Multidimensional in MOLAP mode or Tabular models in In-Memory mode, using PDW as a data source should make processing run extremely quickly. For SSAS Multidimensional in ROLAP mode or Tabular models in DirectQuery mode, it can give you interactive query access to data volumes that MOLAP/In-Memory simply couldn’t handle (remember though that DirectQuery only works with DAX queries, so Excel PivotTables don’t work with it, only Power View).
There are a few public sources of information on PDW/SSAS integration. One is the white paper on PDW that you can download from Henk van der Valk’s blog here:
Here’s the relevant section:
New in SQL Server 2012 PDW, you can use PDW as a high performance relational data source for building multidimensional OR tabular models with SQL Server Analysis Services (SSAS). For example, you can:
· Use DirectQuery to perform real-time queries against a SQL Server PDW data source from a tabular model.
· Reduce ROLAP query times by using clustered columnstore indexes on SQL Server PDW tables.
· Use new features of SSAS. For example, use EnableRolapDistinctCountOnDataSource to run distinct count operations on PDW tables.
· Use PDW as a data warehouse for aggregating Hadoop data for use in multidimensional or tabular models.
The DirectQuery page in Books Online says similar things:
In contrast, a tabular model in DirectQuery mode uses data that is stored in a SQL Server database, or in a SQL Server PDW data warehouse. At design time, you import all or a small sample of the data into the cache and build your model as usual. When you are ready to deploy the model, you change the operating mode to DirectQuery. After you change the operating mode, any queries against the model will use the specified relational data source (either SQL Server or SQL Server PDW), not the cached data.
DirectQuery can take advantage of provider-side query acceleration, such as that provided by xVelocity memory optimized column indexes. xVelocity columnstore indexes are provided in both SQL Server 2012 and SQL Server PDW, to support improved DirectQuery performance.
It’s a shame there isn’t more information out there though. This post from Michael Mukovskiy has some interesting findings on using PDW v1 as a data source for SSAS:
…presumably PDW 2012 would give even better results.
Last week I was doing some performance tuning on SSAS Multidimensional and saw something very strange: when the MDX Script of the cube was executed (as always happens after the cache has been cleared, or processing has taken place – you can monitor this via the Execute MDX Script Begin/End events in Profiler) the Calculate() statement was taking just over four seconds. I’d never seen this before so I asked the nice people on the SSAS dev team what was going on, and Akshai Mirchandani very kindly filled me in on the details.
There are two types of calculation on an SSAS cube: those explicitly defined in the MDX Script (ie those seen on the Calculations tab of the cube editor); and semi-additive measures, unary operators and custom rollups, which are defined in the model itself. This second type of calculation is added to the cube when the Calculate() statement fires, and the more of them there are the longer it takes SSAS to work out where they should be applied in the space of the cube. In my customer’s case there were several large (80000+ members) parent/child hierarchies with unary operators as well as a few semi-additive measures and so this was the reason why Calculate() was so slow. Up to now I had only known that Calculate() triggers the aggregation of data up through the cube, which is why if you delete it the cube seems to contain no data.
Recently I spent a few days with Rafal Lukawiecki recording videos of my MDX training course for Project Botticelli. I’m now pleased to announce that the first two videos (to add to the free video I recorded last year) from the course are now live with more to follow soon. You can find the main course page here; the two videos available as of today are:
This video covers writing simple MDX queries and looks at the SELECT clause, the FROM clause and the WHERE clause. The cool thing is that it’s free to view – although you will need to register on the site first.
This video covers the basics of creating calculated members in MDX in the WITH clause and using the CREATE MEMBER statement. It’s available to subscribers only.
Apart from my course there’s a lot of other great Microsoft BI video training available via Project Botticelli, including several DAX videos by my old friends Marco Russo and Alberto Ferrari. Subscriptions to the site are very reasonably priced, but if you register before the end of December 2013 you can get a 20% discount by using the following promotion code:
Of course, if you prefer your training in a classroom, you can always attend one of my Technitrain courses in London next year.
As in previous years, in return for promoting their survey the people at BARC have given me a free copy of the latest BI Survey – always an interesting read. I saw a story in the press last week about adoption of mobile BI slowing based on research from the BI Survey, but what does it have to say about Analysis Services? To be honest, much the same as in previous years: it’s a good, cheap, reliable, mature solution. A few points that caught my eye:
- 11% of SSAS users are still using Proclarity as their front-end. That’s down from 22% two years ago, but still… I guess that the likes of Pyramid Analytics, which specialises in Proclarity migration, will be pleased to hear that.
- 29% of SSAS users are using Power View. That’s a big surprise for me – I haven’t seen it at any of my customers yet. Presumably this must be the Sharepoint version of Power View going against SSAS Tabular.
- I found the ‘cost per seat’ (calculated as license fees plus external implementation cost divided by number of deployed seats) table particularly interesting: SSAS comes out with a score of $1111, about a quarter of the way from the bottom; Qliktech comes in at $1499, Tableau at $1039. In general self-service BI tools don’t appear to cost any less to implement overall than traditional corporate BI tools.
- SSAS has a very respectable win-rate in competitive evaluations of 64%, although this is declining over time (it has to be said that almost all other large vendors are showing a decline too). Again, compare this with a win-rate of 34% for Tableau and 37% for Qliktech.
- Sadly there’s no mention of PowerPivot anywhere. It would have been good to know how it’s doing…
Here’s another tip for those of you struggling with the performance of SSRS reports that run on top of an Analysis Services Multidimensional cube. Quite often, SSRS reports require quite complex set expressions to be used on the rows axis of an MDX query, and one of the weaknesses of SSAS is that while it can (usually) cache the values of cells returned by a query it can’t cache the structure of the cellset returned by the query. What does this mean exactly? Well, consider the following query:
Here I’m taking every month on the Calendar hierarchy of the Date dimension and finding the top two customers by Internet Sales Amount for each Month; notice also that I’m slicing the query by a Product Category. The results look like this:
On my laptop this query takes just over three seconds to run however many times you run it (and yes, I know there are other ways this query can be optimised, but let’s imagine this is a query that can’t be optimised). The reason it is consistently slow is because the vast majority of the time taken for the query is to evaluate the set used on rows – even when the Storage Engine has cached the values for Internet Sales Amount for all combinations of month and customer, it still takes the Formula Engine a long time to find the top two customers for each month. Unfortunately, once the set of rows has been found it is discarded, and the next time the query is run it has to be re-evaluated.
How can we improve this? SSAS can’t cache the results of a set used on an axis in a query, but SSAS can cache the result of a calculated measure and calculated measures can return strings, and these strings can contain representations of sets. Therefore, if you go into Visual Studio and add the following calculated measure onto the MDX Script of the cube on the Calculations tab of the Cube Editor:
You can then use this calculated measure in your query as follows:
Having done this, on my laptop the query is just as slow as before the first time it is run but on subsequent executions it returns almost instantly. This is because the first time the query is run the set expression used on rows is evaluated inside the calculated measure ReportRows and it is then turned into a string using the SetToStr() function; this string is then returned on the rows axis of the query and converted back to a set using the StrToSet() function. The second time the query is run the string returned by the ReportRows measure has already been cached by the Formula Engine, which explains why it is so fast.
Couldn’t I have used a static named set declared on the cube to do this instead? I could, if I knew that the Where clause of the query would never change, but if I wanted to change the slice and look at a different Product Category I would expect to see a different set of rows displayed. While in theory I could create one gigantic named set containing every set of rows that ever might need to be displayed and then display the appropriate subset based on what’s present in the Where clause, this set could take a very long time to evaluate and thus cause performance problems elsewhere. The beauty of the calculated measure approach is that if you change the Where clause the calculated measure will cache a new result for the new context.
There are some things to watch out for if you use this technique, however:
- It relies on Formula Engine caching to work. That’s why I declared the calculated measure on the cube – it won’t work if the calculated measure is declared in the WITH clause. There are a lot of other things that you can do that will prevent the Formula Engine cache from working too, such as declaring any other calculated members in the WITH clause, using subselects in your query (unless you have SSAS 2012 SP1 CU4), using non-deterministic functions and so on.
- Remember also that users who are members of different roles can’t share formula engine caches, so if you have a lot of roles then the effectiveness of this technique will be reduced.
- There is a limit to the size of strings that SSAS calculated measures can return, and you may hit that limit if your set is large. In my opinion an SSRS report should never return more than a few hundred rows at most for the sake of usability, but I know that in the real world customers do love to run gigantic reports…
- There is also a limit to the size of the Formula Engine flat cache (the cache that is being used here), which is 10% of the TotalMemoryLimit. I guess it is possible that if you run a lot of different queries you could hit this limit, and if you do then the flat cache is completely emptied.