Archive for December 2009
So my blog birthday has come around again, and five years seems like some kind of milestone – I’m surprised anything in IT lasts five years. But here I am still going…
It’s been a pretty good year. Half of it was taken up with writing “Expert Cube Development with Analysis Services 2008” with Marco and Alberto, which was published in the summer; I was very happy with how it turned out, and we got some really good reviews on various blogs (see here for a list) as well as six five-star reviews on Amazon.com. Even the one bad review we got on Amazon, in my opinion, proves that we achieved our objective to write a book specifically for more advanced users rather than beginners. We heard the other day that we’ve already sold more than a thousand copies which I don’t think is bad at all for a book with a relatively small target readership.
The blog itself did quite well too: I was very proud when won an award for best BI blog post in the PASS Log Reader Awards for my post on SSRS Drilldown, and I was equally chuffed when Donald Farmer named me in his top ten SQL bloggers of 2009 the other day. It’s been harder to find SSAS and MDX-related issues to write about because, let’s face it, there hasn’t been much new SSAS and MDX functionality in the past few years, but I’ve got a lot of PowerPivot and DAX posts planned and there’s plenty of new stuff in the wider MS BI stack that will be worth a look. So far I’ve resisted the temptation to move with the times and start Twittering as well, not because I have anything against it but more because I’m sure I’d like it too much, and I spend far too much time tapping away at my laptop already.
Business-wise things have certainly been slower in 2009 than they were in 2008 or 2007, which was only to be expected. Luckily the quietest months for work were also the months when I was working hardest on the book, and it now looks like things have turned a corner (for the European market, if not the UK) and I’m extremely busy once again. I’m not sure whether PowerPivot and DAX will present many opportunities for me to make money from consultancy, but given that I’ve been doing a lot more training recently I’ll probably set aside some time to write a course on them too; I’ll be on the lookout for other opportunities to diversify away from my core expertise of Analysis Services and MDX, but I don’t think I’ve found a new technology that really grabs me yet.
Last of all, I couldn’t let Mosha’s ‘farewell to BI’ blog post from earlier today pass without comment. I started working with OLAP Services as it then was during the beta for SQL Server 7.0, more than ten years ago now, and right from the beginning Mosha made an incredible effort to engage with the user community, giving us advice, answering our questions and taking the time to understand what we were trying to do with the product. His posts on the microsoft.public.sqlserver.olap newsgroup, and latterly on his blog, were a goldmine of information and I firmly believe his efforts were one of the main reasons why Analysis Services has been as successful as it has been; his example is one other development teams at Microsoft would do well to follow. Of course it’s not like he’s died or anything and I don’t want to sound as though I’m writing his obituary, but he’ll be missed and Bing’s gain is our loss.
Here’s to 2010!
Vidas just pointed out the following blog entry to me, about the Microsoft BI Conference 2010 being ‘co-located’ with TechEd North America:
That means it’ll be taking place in June in New Orleans. I don’t think anyone was expecting it to reappear after its break last year so I’m pleasantly surprised, and the decision to run it parallel with TechEd is a good one in my opinion. Will I go though? I’d like to, but I’m already travelling way too much for work and I’m not sure I can justify the expense and yet another week away from my family…
I like using Excel tables, and one of the things that has mildly annoyed me in the past is that you can’t bind the results of an MDX query to an Excel table in the way you can do with a SQL query. I mean, pivot tables are all very well, but sometimes a plain old table is a better fit – for example, think of the cool stuff the data mining addin works can do with tables. Luckily, with a bit of hacking, you can do it… and here’s how.
The trick is to create a .odc file that contains the MDX query you want. What I did was create an odc file for a SQL query and then edit it in Notepad so the connection string pointed to SSAS and the SQL query was replaced with an MDX query. To do this, open Excel and go to the Data tab, click From Other Data Sources and then choose From Data Connection Wizard. Choose Other/Advanced on the first step, then create a connection to the OLEDB source of your choice so long as it isn’t an SSAS source. The resulting odc file will then be saved to the My Data Sources; go there and edit it in Notepad.
The hacking is fairly easy to do – a quick inspection of the odc file format reveals that there’s a lot of junk and the important stuff is contained in a bit of XML near the beginning. Here’s an example of what that XML needs to be for an MDX query:
Once you’ve done this, you can go back to Excel, go to the Data tab and click Existing Connections to open the file:
Choose to view this data in a table, and you get something that looks like this in your worksheet:
Incidentally, since you can query Analysis Services with a basic dialect of SQL, you can also bind an entire dimension or measure group (which are treated as ‘tables’ in SSAS SQL) or the results of a DMV to a table. Here’s an example of what the xml for the odc file looks like:
In this example I’m retrieving the entire contents of the Adventure Works Source Currency dimension. The output looks like this:
Since we’re all going to be doing a lot more reporting in Excel in the future, hopefully this tip will turn out useful to someone creating dashboards in Excel using SSAS data.
UPDATE: Greg Galloway just told me about another way of doing this: "If you have an ODC file you build a PivotTable, then if you drillthrough on a cell, it brings up a QueryTable with the drillthrough results. At that point, you can right click on the QueryTable and edit the query. It’s not elegant, but at least you don’t have to create an ODC file per query."
I recently had the pleasure of going to Belgium for the Belgian SQL Server User Group’s annual conference, and I’ve just heard that the videos for the sessions have now been posted online. You can see a complete list here:
I did two sessions: "Designing Effective Aggregations in Analysis Services 2008” and “Introduction to PowerPivot”. The aggregations session was the same one I did at PASS Europe and PASS USA and even though it was the third time I’d done it I still managed to mess up the timing and had to skip over the last few slides in double-quick time, hohum. Luckily they’re not all that important and I managed to cover the main points. Here’s that presentation:
The PowerPivot session was the closing keynote, and as well as do all the normal demos it gave me the opportunity to vent my positive and negative emotions about it (I’d like to stress that I was in general positive). I suspect I might have gone on a bit too long about that side of things initially so apologies for that… Here’s the link:
With a bit of luck the videos for SQLBits will be online soon too.
I recently had a customer show me an MDX calculation using a CASE statement, where they had achieved a big improvement in performance when they moved from using a searched CASE to using a simple CASE. For the record, here’s an example of a simple CASE statement:
CASE [Measures].[Internet Sales Amount]
WHEN 0 THEN 0
WHEN 1 THEN 1
WHEN 2 THEN 2
WHEN 3 THEN 3
And what the equivalent searched CASE looks like this:
WHEN [Measures].[Internet Sales Amount]=0 THEN 0
WHEN [Measures].[Internet Sales Amount]=1 THEN 1
WHEN [Measures].[Internet Sales Amount]=2 THEN 2
WHEN [Measures].[Internet Sales Amount]=3 THEN 3
In my customer’s code they were doing lots of complex conditions involving calculated measures, but nonetheless I couldn’t work out why their searched CASE performed so much worse because the logic in both the simple and the searched CASE was the same. Therefore, in turn I asked my friends in Redmond and Marius Dumitru gave me some very useful insights that I thought I’d pass on. Basically, a searched CASE requires all previous branch conditions to have been evaluated before a given branch is chosen, which in turn makes it harder for the query optimiser to evaluate a searched CASE in bulk mode. As a result it’s always better to use a simple CASE where possible; in the worst scenarios (and I guess this is what was happening to my customer) the performance of a searched CASE can be several times worse than a simple CASE even when they seem to be doing the same thing.
One of the mysteries of the MS BI third-party ecosystem is how slow it has been to make use of technologies like WPF and Silverlight. Marco Russo has a plausible explanation of why this is here; it’s really only in the last six months that things have begun to change. A few products I’ve seen or heard of include Clearway GeoAnalyzer, Radarsoft’s RIA Grid and Intelligencia for Silverlight; there’s also increased interest in building your own specialised BI apps in Silverlight – for example I’ve seen Sascha Lorenz do presentations on this subject at various conferences, and of course Bart Czernicki’s book “Next Generation Business Intelligence Software with Silverlight 3” was released a few months back. I still don’t think we’re anyway near reaching the potential of the technology though.
I think one way to increase uptake would be to provide some kind of toolkit or additional layer to help developers or even power users build BI applications. Maybe something like a Microsoft version of SAP’s XCelsius would be a good idea? I know there would be a lot of overlap with what PerformancePoint is meant to do, but I think there is sometimes a need for highly visual presentation of data rather than plain old dashboards, beyond what’s possible with PerformancePoint, Excel or Visio even in Office 2010. I’m not advocating the abandonment of Stephen-Few-ish design principles for serious business dashboards in favour of fancy gauges and animations – but sometimes, for example in presentations or newspaper articles, a bit of ‘wow’ in the way the data is presented can be as important for the overall purpose as the meaning of the data; the kind of visualisations you can find on http://www.informationisbeautiful.net/, for example, are what I’ve got in mind here.
Here’s two examples of what could be done. When I saw Microsoft Semblio I thought something like it for BI developers for creating dashboards or presentations would be cool: it’s an SDK for creating rich, multimedia content for educational purposes. In a similar vein, I recently met up with an ex-customer of mine, Steven Peters, who is now the owner of a startup called Munglefish that develops a platform for developing closed-loop sales and marketing presentation applications. Munglefish’s EpicX platform is something like an interactive PowerPoint, and among other things each ‘slide’ can display BI data as an aid to the sales process (eg if you’re selling Widgets to an IT consultant in his mid-30s in SE England, you’d be able to display just how much money other IT consultants in their mid-30s in SE England had saved buying your brand of widgets) as well as capturing information about the flow of the sales process and sending it back to a data warehouse to be analysed; I think it is one of the best examples of BI being integrated in what is not primarily a BI application that I’ve ever seen, and its success is completely due to the kind of high-quality graphics that are possible with WPF and Silverlight. These platforms don’t remove the need for a developer but they do reduce the overall amount of development work needed. They are also targeted at scenarios where slick visualisations are very important for engaging the audience – we know it’s just as important to hold the CEO’s attention in a dull meeting where you’re presenting your financial data as it is to hold a 12-year-old’s attention in a science lesson.
Finally, last week I also saw the announcement of Vedea, a new, experimental data visualisation language from Microsoft Research. You can find full details of it on Martin Calsyn’s blog here:
It’s basically a new .NET language for “creating interactive infographics, data visualizations and computational art” – pretty much what I’ve been asking for so far in this post, and although I still think it would be too technical for the average business user I can see it would have a lot of interesting uses for BI professionals. With a bit of luck, like F#, it will make the transition to being a full member of the .NET family one day and maybe then we’ll have a tool that will allow us to make the most of the power of Silverlight and WPF for BI with the minimum of effort.
Continuing my exploration of DAX, here’s a quick post on calculating running totals. I’ll be honest and say that I didn’t work out these expressions myself but got it from a newsgroup exchange between Colin Banfield and Karen Aleksanyan. In this case I want to calculate a running total from the beginning of time up to the current date (as opposed to from the beginning of the current year, as with a year-to-date). There are two possible approaches:
=’FactInternetSales’[Sum of SalesAmount](VALUES(‘DimDate’[FullDateAlternateKey])>=’DimDate’[FullDateAlternateKey], ALL(‘DimDate’[FullDateAlternateKey]))
=’FactInternetSales’[Sum of SalesAmount](DATESBETWEEN(‘DimDate’[FullDateAlternateKey], BLANK(), LASTDATE(‘DimDate’[FullDateAlternateKey])), ALL(‘DimDate’[FullDateAlternateKey]))
The first simply asks for a filter of all values from FullDateAlternateKey where the date is greater than or equal to the current row date. Don’t ask me why it needs a ‘greater than’ as opposed to ‘less than’, but that’s what it needs to work – possibly this is a bug. The second expression uses the DatesBetween, Blank and LastDate functions (interesting that the Blank function seems to work in the same way a null does with the colon operator in MDX in this case); it has one advantage over the first expression in that it doesn’t display errors when other columns from the DimDate table are used in the pivot table. Here’s what the output looks like with just dates on rows:
As with the original ytd calculation in my previous post on DAX, neither expression displays the expected values when looking at Years or Quarters, for example – you again just see the same value as Sum of SalesAmount. The really weird thing is that at the date level both calculations return different values when you have Years and Dates on rows to when you just have dates. The screenshot below shows the crossover between 2001 and 2002 with just dates on rows:
Now here’s what you see with Years present:
Notice how in the first screenshot the running total carried on across the year boundary, as you’d expect, but in the second case the running total is only calculated from the beginning of the current year. I’m not sure whether this is a bug (although I suspect it is) or expected behaviour; the same thing happens when you slice by Year too, which makes more sense:
As ever, I’ll find out what’s going on and post an update here…
UPDATE: Marius Dumitru has come to the rescue again. Here’s the working version that does everything I want:
=’FactInternetSales’[Sum of SalesAmount](DATESBETWEEN(‘DimDate’[FullDateAlternateKey], BLANK(),LASTDATE( ‘DimDate’[FullDateAlternateKey])), All(‘DimDate’))
Here it is working:
With the release of the latest CTP of PowerPivot, DAX has gained a lot of new functions (mostly to do with time calculations), so over the weekend I decided that it was time to sit down and start learning it properly. A lot of other people out there are doing just the same thing – Vidas, for example – and like them I thought I’d blog about my experiences while I was checking it out. So far I agree with Shaun Ryan that anyone with a background in MDX will have an advantage when it comes to understanding DAX, but there are as many differences as there are similarities and it’s still a bit tricky to understand what’s going on.
I would like to point out that the code here doesn’t necessarily represent best-practice for DAX (I’ve only been using it for a short time, remember!) and in some cases the code will have to change before RTM because DAX itself will change; hopefully, though, these examples will help anyone else out there who, like me, at the bottom of the learning curve. In all this blog post I’m using a very simple PowerPivot model based on the DimDate and FactInternetSales tables from the 2008 version of the Adventure Works DW database, with the two tables joining on the OrderDate key.
I thought I’d choose year-to-dates as the subject of this first blog post because it’s a common calculation and relatively easy to understand. There is, of course, a DAX function specifically for year-to-date calculations: TotalYTD. Here’s what my first attempt at a DAX year-to-date calculated measure looked like:
TotalYTD(Sum([SalesAmount]), ‘DimDate’[FullDateAlternateKey], All(‘DimDate’[FullDateAlternateKey]))
I would have put in some line breaks but of course DAX doesn’t allow them (yet? hopefully this will change before RTM). Step by step:
- TotalYTD( – the year-to-date function itself
- Sum([SalesAmount]) – the value we’re going to be summing up, ie the sum of the [SalesAmount] column
- , ‘DimDate’[FullDateAlternateKey] – the date we want to do the year-to-date sum up to, which can be found in the [FullDateAlternateKey] column of the DimDate table
- , All(‘DimDate’[FullDateAlternateKey]) – which is an extra filter condition that just needs to be there to make things work. If it ain’t there, the calculation won’t work, but it doesn’t really serve any purpose and it won’t need to be there at RTM. Howie Dickerman of Microsoft pointed this out in his session at PASS earlier this month and if I hadn’t picked up this tip I’m sure I would have got absolutely nowhere with DAX by now.
Here’s a screenshot of the calculated measure in action:
As you can see, it works well when we’re looking at dates. But when we start to introduce years and quarters you can see we don’t get the results we might expect:
Hmm, cue several hours of head-scratching on how to fix this. After a lot of trial and error, I found that by creating a second calculated measure like this:
‘FactInternetSales’[YTD Sales](LASTDATE(VALUES(‘DimDate’[FullDateAlternateKey])), all(‘DimDate’))
Worked. What I’m doing here is forcing the measure to display the value of the [YTD Sales] calculated measure for the last date in the current context. Here’s what it displays (note the value for Quarter 2):
Various other similar approaches didn’t work, though, and the obvious next step of using LASTDATE(VALUES(‘DimDate’[FullDateAlternateKey])) inside my original calculated measure didn’t work either.
Given that PowerPivot doesn’t allow you to hide calculated measures (another thing that, in my opinion, needs to be fixed before RTM – breaking up calculations into smaller steps is a useful thing to do for debugging and sharing code) I don’t particularly like this two step process. I’ll post an update here when/if I find a way to do this in a single calculation; if anyone else out there finds out how to do this, please leave a comment!
UPDATE: Marius Dumitru has given me the answer to my problem. Here’s a version of the YTD calc that works on all time selections:
=TotalYTD(Sum([SalesAmount]), ‘DimDate’[FullDateAlternateKey], All(‘DimDate’))
The difference here is the All function is taking the whole DimDate table rather than just the [FullDateAlternateKey] column. I’m currently trying to work out why this is important… In the meantime, here’s a screenshot of it working: