Archive for November 2011
While the introduction of native support for sparklines and other microcharts in Excel 2010 was welcome, Excel is still lacking more advanced visualisation features. I came across Sparklines for Excel – a free Excel addin that gives you a lot of extra charting options, not just sparklines – a while ago but I’ve only just got round to playing with it and I have to say it’s a lot of fun. I’m not much of a data visualisation expert (I’ll leave that to the likes of Jen) but it’s a subject that every BI professional needs a passing knowledge of and in any case it’s a shiny new toy to play with, so it’s worth a blog post.
What I like most of all about Sparklines for Excel is that everything is driven from Excel formulas, and no VBA is required. That means you can make every aspect of the charts you create data-driven, and this holds a fundamental appeal for the data geek in me. Let’s take creating a treemap as an example, and start with an Excel 2010 worksheet hooked up to the Adventure Works cube using some Excel cube functions plus some thresholds telling us whether the values for Gross Profit Margin are good or bad:
We can then simply click on an empty cell and then click on the Treemap button in the ribbon, fill in some ranges, and we get the following formula:
And this treemap in the worksheet (I won’t even try to apologise for the colour scheme):
Cool, eh? And of course, as soon as you change the dropdown filter to select another year, or change any of the threshold values, the treemap updates too. Even the position, length and width of the treemap itself can be parameterised.
You can see the full list of chart types – including heat maps, cascade charts and Pareto charts – in the manual here. It’s definitely worth checking out if you’re an SSAS or PowerPivot user who’s into data visualisation and on a tight budget.
Over the last few years I’ve been doing more and more training – my MDX, SSAS cube design and performance tuning courses continue to be extremely popular – and I’ve also seen how successful preconference seminars at conferences like SQLBits have become. It’s my opinion that there’s significant demand for SQL Server training that is either at a more advanced level than the big training companies can offer, or that covers niche topics that the big training companies would never bother with such as MDX. Equally, I believe that more experienced developers would rather be taught by people like them, who have used a technology in the field, written books and blog posts, and have real-world knowledge, rather than professional trainers who (by definition) spend most of their time training.
That’s why I have decided to launch a new training company to offer expert-level SQL Server training in the UK: Technitrain. Not only will I be running all my public training courses through it, but I’ll also be offering training courses by other respected SQL Server MVPs, authors, bloggers and speakers. Here’s my initial course schedule:
I’m really excited to be working with the likes of Christian, Jeremy and Andy for this first group of courses – they really are the acknowledged experts in their particular areas. All the courses will be run in central London, so they will not only be convenient for anyone in the UK but also easily accessible for attendees from Europe or further away.
Finally, I’d like your help in making my new company a success. But don’t worry, I’m going to pay you for it! If you run a user group, a small consultancy, a training company or are a contractor or a blogger, you may be interested in my affiliate programme. You can find more details on the site, but basically I will pay 20% of the price of the course for each registration that an affiliate sends my way. For example, for Andy Leonard’s SSIS course that means I’ll pay £399 per registration in commission – which hopefully is enough motivation for you to mention these courses to your friends, colleagues, customers, blog readers, Twitter followers and so on.
As I mentioned a few months back, some new functionality snuck into SSAS with SQL 2008 R2 SP1, the most interesting of which is a new Profiler event called Resource Usage (Thomas Ivarsson recently blogged about some other new events too). I’ve been doing some investigations on it recently, and asking the SSAS dev team what the information it returns actually means (Akshai Mirchandani is the source for much of the content of this post and I’m very grateful for his help), so I thought I’d blog my findings.
When you’re defining a new trace, you can find the Resource Usage event in the Query Processing section as shown below:
It is raised immediately after a query has finished executing (in which case it follows the Query End event):
It is also raised after any XMLA command has finished executing, and this means you’re also able to use it to monitor the resource usage of a processing operation:
Essentially, it gives you information that is very similar to what’s already available in Perfmon but specific to a particular query or command. The problem with Perfmon is that it’s easy to spot strange things happening in the data it gives you, but there’s no sure-fire way of linking what you see in Perfmon back to individual events such as queries executing; the Resource Usage event solves this problem.
Here’s a breakdown of the data returned by the event:
- READS: The number of disk read operations tracked for this query
- READ_KB: The size of disk reads in KB
- WRITES: The number of disk write operations tracked for this query
- WRITE_KB: The size of disk writes in KB
- CPU_TIME_MS: The CPU time as measured in milliseconds for this query (although this seems to bear very little relation to the CPU time shown elsewhere in Profiler – perhaps it is only the CPU time for the Storage Engine?)
- ROWS_SCANNED: The number of rows scanned (decoded/filtered/aggregated) by the Storage Engine for this query
- ROWS_RETURNED: The number of rows resulting from the scans after decoding/filtering/aggregation by the Storage Engine for this query
The data returned relates purely to Storage Engine operations as far as I can see and does not relate to the Formula Engine – I get no values back for queries that hit the Storage Engine cache but are nonetheless slow because they are Formula Engine bound.
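To make the list of counters above a bit more concrete, here’s a minimal Python sketch of how you might hold the values from one Resource Usage event once you’ve got them out of a trace. The class name, the two helper methods and the numbers are all made up for illustration (they are not real trace output), and I’m assuming that a query answered entirely from the Storage Engine cache shows up as all zeros.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ResourceUsage:
    """Counters from one Resource Usage event (field names follow the list above)."""
    reads: int = 0
    read_kb: int = 0
    writes: int = 0
    write_kb: int = 0
    cpu_time_ms: int = 0
    rows_scanned: int = 0
    rows_returned: int = 0

    def touched_disk(self) -> bool:
        # Any read or write operation means the Storage Engine went to disk.
        return self.reads > 0 or self.writes > 0

    def is_all_zero(self) -> bool:
        # Assumption: all-zero counters indicate no Storage Engine work was
        # recorded, e.g. the query was answered from the Storage Engine cache.
        return not any(astuple(self))


# Hypothetical values, purely for illustration.
cold = ResourceUsage(reads=12, read_kb=640, cpu_time_ms=15,
                     rows_scanned=60000, rows_returned=32)
warm = ResourceUsage()
```

With something like this, comparing a cold-cache run against a warm-cache run of the same query becomes a matter of comparing two small objects rather than eyeballing Profiler output.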
To investigate things further, I took a look at three queries (slightly modified to run on my antique version of Adventure Works) from Jeffrey Wang’s recent post on prefetching, which illustrate scenarios where the Storage Engine does radically different amounts of work; they’re particularly interesting because Jeffrey describes in detail what goes on in the Storage Engine when each of them runs. First of all, the first test query from Jeffrey’s post, where prefetching does not take place, gives me the following values for Resource Usage on a cold cache:
On a warm cache (i.e. in a situation where the Storage Engine does not need to go to disk because it can get the values it needs from cache) I get the following values:
Here’s Jeffrey’s second query, where an acceptable amount of prefetching is taking place:
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&,[Date].[Calendar].[Date]), 32) on 1
from [Adventure Works]
On a cold cache this is what I get from Resource Usage, showing slightly more activity going on:
Now let’s look at Jeffrey’s third query, which shows a scenario where excessive prefetching is taking place:
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&,[Date].[Calendar].[Date]), 33) on 1
from [Adventure Works]
Here’s what I get on a cold cache from Resource Usage:
It’s clear from these numbers that a lot more work is going on in the Storage Engine compared to the previous two queries, although I’m not sure it’s worth trying to read too much into what the exact values themselves actually represent (unless of course you happen to be Jeffrey). I think it’s also going to be dangerous to make simplistic general recommendations about these values: while in some cases trying to keep the values returned as low as possible will be a good idea, I’m pretty sure there are going to be other situations where a more efficient query would involve more reads from disk, or scanning or returning more rows, than a less efficient version of the same query would. That said, this is useful and interesting information and another weapon in the arsenal of the SSAS consultant out in the field trying to diagnose why a query is slow and what can be done to tune it.
This is probably the 5th or 6th post I’ve written on this problem (most deal with MDX, but I did blog about solving it in DAX early last year) but what can I say – it’s an interesting problem! I came across it at work today while working with the 2012 CTP3 version of PowerPivot and found yet another solution to the problem that used some of the new DAX functionality, so I thought I’d crank out one more blog post.
The basic approach is similar to the one I describe here. Using the same Adventure Works data, I can load the DimDate and FactInternetSales tables into PowerPivot V2.0 and I’ll get the following model:
Note that we have three relationships between the two tables: one active one, which is the relationship from OrderDateKey to DateKey, and two inactive ones from DueDateKey and ShipDateKey. If we want to find the number of orders up to the current date using the Order Date we can simply use the following DAX in a measure definition:
, DATESBETWEEN(DimDate[FullDateAlternateKey], Blank(), LASTDATE(DimDate[FullDateAlternateKey])))
Now, if we want to find the number of orders that have shipped up until yesterday, we don’t need any special modelling: we can use the new USERELATIONSHIP function to force a calculation to follow the relationship going from ShipDateKey to DateKey. Therefore, if we want to find the number of orders that have been placed but not shipped, we just need to take the measure above and subtract the value returned by the same measure when we use this different relationship and change the filter context to be the day before the current day:
IF(ISBLANK(DATEADD(LASTDATE(DimDate[FullDateAlternateKey]), -1, DAY))
, USERELATIONSHIP(FactInternetSales[ShipDateKey], DimDate[DateKey])
, DATEADD(LASTDATE(DimDate[FullDateAlternateKey]), -1, DAY)))
Quite an elegant solution, I think.
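If the logic is easier to follow outside DAX, here’s a rough Python sketch of the same arithmetic: orders placed up to the current date, minus orders shipped up to the day before. It is not a translation of the measure itself; the sample rows and function names are hypothetical stand-ins for FactInternetSales, and there’s no equivalent of the ISBLANK check because plain date arithmetic doesn’t depend on the previous day existing in a date table.

```python
from datetime import date, timedelta

# Hypothetical (order_date, ship_date) pairs standing in for FactInternetSales rows.
ORDERS = [
    (date(2011, 11, 1), date(2011, 11, 3)),
    (date(2011, 11, 2), date(2011, 11, 5)),
    (date(2011, 11, 4), date(2011, 11, 8)),
]


def orders_placed_to_date(orders, current):
    # Count orders whose order date is on or before the current date.
    return sum(1 for order_date, _ in orders if order_date <= current)


def orders_shipped_to_date(orders, current):
    # Same count, but following the ship date instead of the order date.
    return sum(1 for _, ship_date in orders if ship_date <= current)


def orders_not_shipped(orders, current):
    # Placed up to the current date, minus shipped up to the day before.
    yesterday = current - timedelta(days=1)
    return (orders_placed_to_date(orders, current)
            - orders_shipped_to_date(orders, yesterday))
```

For the sample rows, `orders_not_shipped(ORDERS, date(2011, 11, 4))` gives 2: three orders have been placed by the 4th and only one had shipped by the 3rd.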
Although I’m pretty late to the news (almost a whole day!) I thought it would still be worth mentioning here that the details for SQL Server 2012 licensing have been announced:
Numerous people have already blogged about this in detail (Denny Cherry has a very good overview here); the big news from my point of view is the new BI Edition. Some people have been asking for a separate BI Edition for some time (although I was in two minds on the subject) and it will certainly have a lot of advantages: OK, it doesn’t support per-core licensing, only the server+CAL model, but in every other respect it’s the same as Enterprise Edition feature-wise, so it will be a cheaper option for many BI projects. I’m a bit surprised to see Tabular didn’t make it into Standard Edition, though, which is unchanged in terms of features from SE in 2008 R2 – I would have thought that if Tabular was meant to bring SSAS to a wider audience then it should be positioned as the starting point for those who are new to BI; as it is, only Multidimensional will be available in Standard Edition.