Archive for the ‘Analysis Services’ Category
You learn all kinds of interesting things on the forums! Recently I saw this post from Nils Rottgardt about a problem with the CREATE SESSION CUBE statement and SSAS Multidimensional:
Here’s a related Connect item:
Basically, when you run a CREATE SESSION CUBE statement SSAS creates a shadow copy of the cube on disk – and if you have a very large cube, with very large dimensions, this could use up a lot of storage space on the server. When you use the custom grouping functionality in a PivotTable Excel generates a CREATE SESSION CUBE statement behind the scenes, so it’s possible to cause a lot of mischief just by using custom grouping in Excel.
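To give you an idea of what’s involved, here’s a very much simplified sketch of a CREATE SESSION CUBE statement; the object names come from the Adventure Works sample cube and are purely illustrative, and the statements Excel actually generates are much longer and include the definitions of the grouped members:

// Define a session-scoped copy of the Adventure Works cube, exposing
// just one measure and two hierarchies (illustrative names only)
CREATE SESSION CUBE [Adventure Works_XL_GROUPING0]
FROM [Adventure Works]
(
    MEASURE [Adventure Works].[Internet Sales Amount],
    DIMENSION [Adventure Works].[Date].[Calendar Year],
    DIMENSION [Adventure Works].[Product].[Category]
)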
Incidentally, this isn’t a problem for SSAS Tabular because it doesn’t support the CREATE SESSION CUBE statement.
I always knew this particular functionality could be very slow and always warned people against using it, but I didn’t know about the disk space issue… if your users haven’t found out about custom grouping yet, I recommend you don’t tell them about it!
Within a matter of days, “Expert Cube Development with SSAS 2012 Multidimensional Models” will be published. It’s the second edition of the very successful (nineteen 5-star reviews on Amazon US at the time of writing) book on SSAS cube development that Marco, Alberto and I wrote a few years ago, updated for SSAS 2012.
Before you rush off to order a copy, there are three things I’d like to point out:
- This is basically the same book as the first edition with updated screenshots, a few bugs fixed, and several sections updated/expanded for SSAS 2012. There are no substantial changes. If you already have a copy of the first edition it’s probably not worth buying a copy of the second edition.
- The book only covers SSAS Multidimensional models, it does not cover SSAS Tabular models.
- This is not a basic introduction to building SSAS cubes – it’s aimed at intermediate-level SSAS developers who are already familiar with cubes, dimensions and MDX and who want to learn about best practices, design patterns, performance tuning and (most importantly) which features work well and which ones don’t. If you like the material I post here on my blog, you’ll probably like the book.
If you’re OK with that then by all means, go ahead and get your wallet out!
Most people know that the ability to partition a measure group is a feature of Analysis Services Multidimensional Enterprise Edition, but that doesn’t mean that in Standard Edition you are limited to just one partition per measure group. In fact it is possible to use up to three partitions per measure group in SSAS Multidimensional SE, with some limitations. For a long time I wasn’t sure whether this was legal, as opposed to merely possible, under the terms of the SQL Server licence, but since this page in Books Online (thanks to Rob Kerr for the link) states that you can have up to three partitions in SE, I assume it is:
If you do decide to use more than one partition in SSAS SE then you do need to understand the risks involved – and the reason I wanted to write this post is because I see a lot of people using more than one partition per measure group in SE without understanding those risks. Strictly speaking, SE is only designed to work with one partition per measure group. It needs those extra two partitions for two pieces of functionality:
- To support writeback, because using this feature requires SSAS to create a separate partition to hold writeback values
- To support incremental processing, because when you do incremental processing on a partition, SSAS needs to create a new partition in the background, process it and then merge it with your existing partition
Therefore if you create more than one partition per measure group in SE you may find that writeback and/or incremental processing will break.
New modules are being added to my MDX online training course all the time, and now there’s another free video available: a short introduction to the CurrentMember function. You can view it here (registration required):
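If you’ve not come across it before, CurrentMember returns the member from a hierarchy that is being used to calculate the value of the current cell. Here’s a very simple illustrative query, a sketch written against the Adventure Works sample cube:

// Shows the Calendar Year member used to calculate each cell on rows
WITH MEMBER [Measures].[Current Year Name] AS
    [Date].[Calendar Year].CurrentMember.Name
SELECT
    {[Measures].[Internet Sales Amount], [Measures].[Current Year Name]} ON COLUMNS,
    [Date].[Calendar Year].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works]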
The previous free video on MDX SELECT statements is also still available here, now with no registration required. If you’d like to subscribe to this course you can get a 10% discount by using the code TECHNITRAIN2014 when registering.
Don’t forget, if you are looking for classroom training in London for MDX, DAX, Analysis Services Multidimensional or Tabular, Power BI, Integration Services or the SQL Server engine check out http://www.technitrain.com/courses.php
One new feature of SQL Server PDW 2012 that hasn’t had the attention it deserves is the fact that it is now officially supported as a data source for Analysis Services, both Multidimensional (in ROLAP and MOLAP modes) and Tabular (in In-Memory and DirectQuery modes). If you are working with extremely large data volumes in SSAS then PDW might be something you want to do some research on. For SSAS Multidimensional in MOLAP mode or Tabular models in In-Memory mode, using PDW as a data source should make processing run extremely quickly. For SSAS Multidimensional in ROLAP mode or Tabular models in DirectQuery mode, it can give you interactive query access to data volumes that MOLAP/In-Memory simply couldn’t handle (remember though that DirectQuery only works with DAX queries, so Excel PivotTables don’t work with it, only Power View).
There are a few public sources of information on PDW/SSAS integration. One is the white paper on PDW that you can download from Henk van der Valk’s blog here:
Here’s the relevant section:
New in SQL Server 2012 PDW, you can use PDW as a high performance relational data source for building multidimensional OR tabular models with SQL Server Analysis Services (SSAS). For example, you can:
· Use DirectQuery to perform real-time queries against a SQL Server PDW data source from a tabular model.
· Reduce ROLAP query times by using clustered columnstore indexes on SQL Server PDW tables.
· Use new features of SSAS. For example, use EnableRolapDistinctCountOnDataSource to run distinct count operations on PDW tables.
· Use PDW as a data warehouse for aggregating Hadoop data for use in multidimensional or tabular models.
The DirectQuery page in Books Online says similar things:
In contrast, a tabular model in DirectQuery mode uses data that is stored in a SQL Server database, or in a SQL Server PDW data warehouse. At design time, you import all or a small sample of the data into the cache and build your model as usual. When you are ready to deploy the model, you change the operating mode to DirectQuery. After you change the operating mode, any queries against the model will use the specified relational data source (either SQL Server or SQL Server PDW), not the cached data.
DirectQuery can take advantage of provider-side query acceleration, such as that provided by xVelocity memory optimized column indexes. xVelocity columnstore indexes are provided in both SQL Server 2012 and SQL Server PDW, to support improved DirectQuery performance.
It’s a shame there isn’t more information out there though. This post from Michael Mukovskiy has some interesting findings on using PDW v1 as a data source for SSAS:
…presumably PDW 2012 would give even better results.
Last week I was doing some performance tuning on SSAS Multidimensional and saw something very strange: when the MDX Script of the cube was executed (as always happens after the cache has been cleared, or processing has taken place – you can monitor this via the Execute MDX Script Begin/End events in Profiler) the Calculate() statement was taking just over four seconds. I’d never seen this before so I asked the nice people on the SSAS dev team what was going on, and Akshai Mirchandani very kindly filled me in on the details.
There are two types of calculation on an SSAS cube: those explicitly defined in the MDX Script (ie those seen on the Calculations tab of the cube editor); and semi-additive measures, unary operators and custom rollups, which are defined in the model itself. This second type of calculation is added to the cube when the Calculate() statement fires, and the more of them there are the longer it takes SSAS to work out where they should be applied in the space of the cube. In my customer’s case there were several large (80,000+ members) parent/child hierarchies with unary operators as well as a few semi-additive measures and so this was the reason why Calculate() was so slow. Up to now I had only known that Calculate() triggers the aggregation of data up through the cube, which is why if you delete it the cube seems to contain no data.
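For context, the MDX Script of a newly-created cube contains nothing but the Calculate statement; anything you define on the Calculations tab is added after it, along these lines (a hypothetical example using illustrative Adventure Works names):

/* The Calculate statement triggers the aggregation of data up through the cube,
   and is also the point at which semi-additive measures, unary operators and
   custom rollups are added to the cube */
CALCULATE;

// An example of a calculation defined explicitly in the MDX Script
CREATE MEMBER CURRENTCUBE.[Measures].[Internet Sales Previous Year] AS
    ([Measures].[Internet Sales Amount],
     [Date].[Calendar Year].CurrentMember.PrevMember),
    FORMAT_STRING = "Currency";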
Recently I spent a few days with Rafal Lukawiecki recording videos of my MDX training course for Project Botticelli. I’m now pleased to announce that the first two videos (to add to the free video I recorded last year) from the course are now live with more to follow soon. You can find the main course page here; the two videos available as of today are:
This video covers writing simple MDX queries and looks at the SELECT clause, the FROM clause and the WHERE clause. The cool thing is that it’s free to view – although you will need to register on the site first.
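To give a flavour of the level it’s pitched at, a typical query of the kind covered looks something like this (again, a sketch against the Adventure Works sample cube):

// Internet Sales Amount by Calendar Year and Product Category
SELECT
    [Date].[Calendar Year].[Calendar Year].MEMBERS ON COLUMNS,
    [Product].[Category].[Category].MEMBERS ON ROWS
FROM [Adventure Works]
WHERE ([Measures].[Internet Sales Amount])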
This video covers the basics of creating calculated members in MDX in the WITH clause and using the CREATE MEMBER statement. It’s available to subscribers only.
Apart from my course there’s a lot of other great Microsoft BI video training available via Project Botticelli, including several DAX videos by my old friends Marco Russo and Alberto Ferrari. Subscriptions to the site are very reasonably priced, but if you register before the end of December 2013 you can get a 20% discount by using the following promotion code:
Of course, if you prefer your training in a classroom, you can always attend one of my Technitrain courses in London next year.