Fun and Games with Schema Rowsets
One of the (few) new features in AS2008 are Data Management Views, which give you a table-like way of looking at all kinds of useful metadata and management style information. However, as Darren points out here:
…this isn’t really anything you can’t get from schema rowsets at the moment, and the great set of functions that Darren wrote for the Analysis Services Stored Procedure Project (http://www.codeplex.com/ASStoredProcedures/Wiki/View.aspx?title=XmlaDiscover&referringTitle=Home) make it very easy to get at this information in MDX.
I’ve been playing around with schema rowsets a bit over the last week or so, mostly because I’m working on an updated version of my SSIS cache-warmer (more of which soon), but it’s given me a number of interesting ideas for other things that could be possible. For instance, one of the big missing features from AS at the moment in my opinion in a query governor: it’s all too easy for one user to bring your server to its knees by running the query from hell. In the article above, Darren talks about the information in the DISCOVER_SESSIONS and DISCOVER_CONNECTIONS rowsets, and how you can use the ids they return to kill sessions and connections; both of them contain other interesting information (which I think probably need some time spent researching to make sure I know exactly what they mean) on the time since the last command, the CPU time taken by the last command, what the last command was, how long the last command took to run etc. With this information I guess it would be possible to build a rudimentary query governor in SSIS: basically a package that was scheduled to run every ten seconds or something, checked the overall CPU usage (or other perfmon counters) on the machine and if it was greater than a certain threshold looked at the sessions that were open, tried to work out which were the guilty parties and killed them.
There’s other useful stuff available from schema rowsets too that doesn’t seem to be documented anywhere. If you look at the DISCOVER_SCHEMA_ROWSETS rowset (try running call assp.Discover("DISCOVER_SCHEMA_ROWSETS") if you have ASSP installed) you can see what schema rowsets are available. One interesting one I found was DISCOVER_PARTITION_DIMENSION_STAT; for example run
call assp.Discover("DISCOVER_PARTITION_DIMENSION_STAT", "<DATABASE_NAME>Adventure Works DW</DATABASE_NAME><CUBE_NAME>Adventure Works</CUBE_NAME><MEASURE_GROUP_NAME>Internet Sales</MEASURE_GROUP_NAME><PARTITION_NAME>Internet_Sales_2004</PARTITION_NAME>")
…on Adventure Works. What it’s showing is the partitioning and indexing information for an individual partition, information discussed in this SQLCat blog entry from earlier this year:
This article sends you off to the info*.xml files created when you process a partition to get this information, but here it is ready to use. The last three columns the rowset returns are the important ones: they tell you whether each attribute is indexed in the partition and if so what the maximum and minimum members in the partition are. These maximums and minimums are expressed as DataIDs, AS’s own internal surrogate keys, and you can find out what a member’s DataID is by using the (wait for it) DataID function as follows:
with member measures.showdataid as
select measures.showdataid on 0,
[Date].[Date].members on 1
from [adventure works]
When I first found out about this I thought it would be really cool to be able to visualise this information somehow so you could check to see whether your partition slicing was in a healthy state or not. For instance, if you look at the 2001 partition from the Internet Sales measure group:
call assp.Discover("DISCOVER_PARTITION_DIMENSION_STAT", "<DATABASE_NAME>Adventure Works DW</DATABASE_NAME><CUBE_NAME>Adventure Works</CUBE_NAME><MEASURE_GROUP_NAME>Internet Sales</MEASURE_GROUP_NAME><PARTITION_NAME>Internet_Sales_2001</PARTITION_NAME>")
…you can see there are no indexes at all. This is because there is so little data in it, it falls below the Index Build Threshold server property, and this in turn means that every time you run a query then that partition is going to be scanned. Another thing you might need to watch out for, if you’re not using the Slice property on each partition (and you really should be, even on MOLAP partitions, despite what some people say) is the scenario the SQLCat team describe where your partitions end up with overlapping DataID ranges, meaning that AS ends up scanning partitions unnecessarily when you run queries. I’m sure I could put together some kind of SSRS report showing all the partitions in a measure group side by side, and perhaps using a stacked bar chart or something try to represent the DataID ranges for a given attribute.