Archive for December 2013
This is the ninth anniversary of the first post on my blog, and every year at this time I take a moment to reflect on what’s happened in the last twelve months in my professional life and in the world of Microsoft BI.
Without a doubt 2013 has been the year of Power BI. It’s true we’ve had
PowerPivot Power Pivot for a while now, but in my opinion the combination of Excel 2013, Power Query, Power Map, Power BI Sites and Q&A is a much stronger proposition for customers interested in self-service BI; I’ve already blogged at great length about what I think are the strengths and weaknesses of Power BI (see here, here and here) so I won’t repeat myself here. As you would expect Microsoft marketing has gone into overdrive to promote it (have you entered the Power BI competition yet?) and it will be in 2014 that we see whether Power BI is a success or not. What will success look like though, if it comes? It won’t be lots of customers lining up to buy Power BI in they way they’ve bought traditional BI solutions, I think: instead it will be organisations that have already signed up for Office 365 being upsold to Power BI based on their existing commitment to the Office platform. This presents a number of challenges to someone like me who makes a living as an independent consultant and trainer.
At the moment more than 90% of my consultancy income comes from SSAS and MDX, but the overall percentage of my time that I spend doing consultancy has reduced over the last few years to about 60%. This is partly the result of SSAS and MDX skills becoming more widespread; partly due to the fact that I’ve been promoting my public and private training more aggressively; and possibly due to fewer new SSAS projects kicking off. In the future I expect this trend to continue. Just how much consultancy will be necessary in the world of self-service Power BI solutions remains to be seen, but it’s going to be less than is necessary for corporate SSAS solutions and the rates will probably be lower too.
For the same reason, though, the demand for all forms of training for Power BI will almost certainly be much greater. That’s why I’ve been scheduling more public training courses through Technitrain; why I’ve signed up to write a book on Power Query next year; and why I’ve started recording video training courses with Project Botticelli (there’s a new video available there, by the way, on set and member functions). If I’m honest I prefer doing consultancy to training and I don’t think you can be a really good trainer if you don’t have a substantial amount of practical experience gained from consultancy, so I’m going to have to make a special effort to maintain a balance between the two.
Speaking at conferences and user groups is an aspect of my work that I’ve always really enjoyed, and I’m pleased to say that I’ll be speaking at the PASS BA Conference next year for example. I’m also still involved with SQLBits but please, please don’t ask me when the next SQLBits will be – we don’t have anything to announce yet but I can assure you we are working on it and I promise there will be one in 2014. I won’t be speaking at quite so many events as I have done in the past however. I travel a lot for work and this makes it hard to justify spending even more time away from my family, especially at weekends, so I’ve made a conscious decision to cut down on my speaking engagements. The thing is that the number of SQL Server events has increased a lot in the last couple of years and this has led to an increased number of invitations to speak, and I’m one of those people who finds it hard to say no when someone asks me to do something. I’m just going to have to be a bit more choosy from now on, and concentrate on events close to home, events that coincide with business trips and online sessions.
All that remains is to wish you all a happy and prosperous 2014, and to thank you for reading my blog! This time next year I’ll have been blogging for ten years, and that’s a scary thought…
A few months ago I posted a review of Q&A, the natural language query functionality in Power BI, based on the sample data sets that were then available. Last week, finally, we got the news that we could enable Q&A on our own Power Pivot models, and having played with this new release I thought it was a good idea to post an update to my original thoughts.
The first thing to point out is that even if you’ve got a Power BI Preview tenant you will need the latest version of Power Pivot for Excel to be able to get the best out of Q&A. This latest release contains some new functionality to add ‘Synonyms’ to the model – what this means is that it allows you, as a model creator, to tell Power BI about other names that end users might use when querying your model. For example on a Geography dimension you might have a column called State but if you are a multinational company you may find that while your State column contains the names of states in the USA, it might contain the names of cantons in Switzerland, counties in the UK, departments in France and so on. As a result you will want Power BI to know that if a user asks for sales by county in the UK that it should actually look in the State column. Devin Knight has already written a good post showing how synonyms work with Q&A which you can see here.
Another complication is that, at the time of writing, the Synonym functionality is only available to users who have installed the streamed version of Office 2013 from Office 365. I have an Office 365 subscription but I had installed Office from an msi before that, so I had to uninstall Office and reinstall the streamed version to be able to see Synonyms – I assume that support for Synonyms in the non-streamed version of Excel will come at some point soon in the future, but in general I would expect that new Power BI functionality will appear first in the streamed version of Office first so if you’re serious about BI you should change over to it as soon as you can. Melissa Coates has a lot more detail on this issue here.
But enough about setup, what about Q&A? The data that I tested it on was a model I’ve been using for user group and conference demos for about six months now, which contains data from the UK’s Land Registry and details all residential property transactions in England and Wales in 2013. It’s fairly simple – two tables, a date table and a transactions table containing around 0.5 million rows – so probably a lot simpler than the average Power Pivot model, but nonetheless real data and one which had been polished for demo purposes. The Excel file holding it is around 25MB so I was well within the Power BI file size limits.
My initial impression after I had added my existing model (with no synonyms etc) to Q&A was that while it worked reasonably well, it worked nowhere near as well as the demo models I had seen. I then set about making changes to the model and re-uploading it, and these changes made all the difference. Some examples of the things I did are:
- Changed table and column names. In my model I had already taken the trouble to make them human readable, but this did not necessarily mean they were suitable for Q&A. For example, my main fact table was called ‘Land Registry’, so at first Q&A kept suggesting questions like “How many land registries were there in June…” which clearly makes no sense. Renaming the fact table to ‘Sales’ fixed this.
- Setting synonyms. Unsurprisingly, this had a big impact on usability in the same way that changing the table and column names did. I found that I had to go through several iterations of uploading the data, writing questions, seeing what worked and what didn’t, and adding more synonyms before I had a set that I was happy with; I can imagine that in the real world you’d need to round up several end users and lock them in a room to see how they phrased their questions so as to get a really good list of synonyms for them.
- Setting Power View-related properties. This included setting the Default Field Set on a table, so I only saw a few important fields in a meaningful order when Q&A returned a table result; and also Summarize By so that Q&A didn’t try to aggregate year values. All of this makes sense given how closely-related Q&A and Power View are, but even though I had a reasonably ‘finished’ model to start off with I still hadn’t set all of these properties because I knew I was never going to try to sum up a year column.
- Adding new columns. There were a number of cases where I realised that I, as a human user, was able to make assumptions about the data that Q&A could not. For example the source data records sales of four different types of residential property: terraced, detached, semi-detached and flat. The first three are types of house, but the source data doesn’t actually state that they are types houses anywhere so in order to see the total number of sales of houses I had to add another column to explicitly define which property types were houses.
- Disambiguation. Probably the most irritating thing about the Bing geocoding service that Power View and Q&A use is the way it always chooses a US location when you give it an ambiguous place name. Therefore when looking at sales by town I would see the town name “Bristol” show up on the map as Bristol, Tennessee (population 24,821) rather than Bristol in England (population 416,400). Creating a new column with town name and country concatenated stopped this happening.
The Microsoft blog post I referenced above announcing Q&A promises that a more detailed guide to configuring models for Q&A will be published soon, which is good news. The important point to take away from this, though, is that even the most polished Power Pivot models will need additional tweaks and improvements in order to get the best out of Q&A.
The big question remains, though, whether Q&A will be something that end users actually get some value from. As a not-very-scientific test of this I handed my laptop over to my wife (who has no experience of BI tools but who has a healthy interest in property prices) to see how easy it was for her to use, and straight away she was able to write queries and find the information she was looking for, more or less. There were a still few cases where Q&A and/or my model failed, such as when she searched for “average house price in Amersham” – the model has a measure for “average price”, it knows about the property type “house” and the town “Amersham”, but “average house price” confused it and the query had to be rewritten as “average price of a house in Amersham”. Overall, though, I was pleasantly surprised and as a result I’m rather less sceptical than I was about Q&A’s usefulness, even if I’m still not 100% convinced yet.
Last week I was doing some performance tuning on SSAS Multidimensional and saw something very strange: when the MDX Script of the cube was executed (as always happens after the cache has been cleared, or processing has taken place – you can monitor this via the Execute MDX Script Begin/End events in Profiler) the Calculate() statement was taking just over four seconds. I’d never seen this before so I asked the nice people on the SSAS dev team what was going on, and Akshai Mirchandani very kindly filled me in on the details.
There are two types of calculation on an SSAS cube: those explicitly defined in the MDX Script (ie those seen on the Calculations tab of the cube editor); and semi-additive measures, unary operators and custom rollups, which are defined in the model itself. This second type of calculation is added to the cube when the Calculate() statement fires, and the more of them there are the longer it takes SSAS to work out where they should be applied in the space of the cube. In my customer’s case there were several large (80000+ members) parent/child hierarchies with unary operators as well as a few semi-additive measures and so this was the reason why Calculate() was so slow. Up to now I had only known that Calculate() triggers the aggregation of data up through the cube, which is why if you delete it the cube seems to contain no data.
As of the November update of Power Query, it’s now possible to write your own SQL query when you’re moving data out of SQL Server into Excel using Power Query. So I got thinking… if you can write your own SQL query, can you execute any other SQL statement? Can you use Power Query to move data out of Excel and into SQL Server? Well, it turns out you can… with some limitations. This blog post details what I found out while researching this problem.
I started with a simple table in a SQL Server database with two columns, Fruit and Sales, and some data:
I then created a new function in Power Query with the following definition:
As you can see, it takes the name of a fruit and a sales value and updates the appropriate row in the SQL Server table. I then created a new table in Excel with some new fruit sales values:
Used this table as the source for another Power Query query, and for each row in this table called the function above:
I ran this query, and lo! My table in SQL Server was updated:
There are some interesting things to note here though. First, for each row in my Excel table, and each time an UPDATE statement was run, Power Query showed a prompt warning me that it was about to make a change to my database:
Probably the safe thing to do here, I think.
Furthermore, running a Profiler trace showed that each UPDATE statement was run at least twice. In fact, I originally started my tests with an INSERT INTO rather than an UPDATE, and found that since the INSERT INTO was run multiple times I ended up with duplicate rows in my table.
None of the code I’ve showed here should be used in a real application of course, but with some thought (and maybe a few changes to the way Power Query behaves), in the future it might be possible to use Power Query to move data out of Excel as well as in.
Recently I spent a few days with Rafal Lukawiecki recording videos of my MDX training course for Project Botticelli. I’m now pleased to announce that the first two videos (to add to the free video I recorded last year) from the course are now live with more to follow soon. You can find the main course page here; the two videos available as of today are:
This video covers writing simple MDX queries and looks at the SELECT clause, the FROM clause and the WHERE clause. The cool thing is that it’s free to view – although you will need to register on the site first.
This video covers the basics of creating calculated members in MDX in the WITH clause and using the CREATE MEMBER statement. It’s available to subscribers only.
Apart from my course there’s a lot of other great Microsoft BI video training available via Project Botticelli, including several DAX videos by my old friends Marco Russo and Alberto Ferrari. Subscriptions to the site are very reasonably priced, but if you register before the end of December 2013 you can get a 20% discount by using the following promotion code:
Of course, if you prefer your training in a classroom, you can always attend one of my Technitrain courses in London next year.
As in previous years, in return for promoting their survey the people at BARC have given me a free copy of the latest BI Survey – always an interesting read. I saw a story in the press last week about adoption of mobile BI slowing based on research from the BI Survey, but what does it have to say about Analysis Services? To be honest, much the same as in previous years: it’s a good, cheap, reliable, mature solution. A few points that caught my eye:
- 11% of SSAS users are still using Proclarity as their front-end. That’s down from 22% two years ago, but still… I guess that the likes of Pyramid Analytics, which specialises in Proclarity migration, will be pleased to hear that.
- 29% of SSAS users are using Power View. That’s a big surprise for me – I haven’t seen it at any of my customers yet. Presumably this must be the Sharepoint version of Power View going against SSAS Tabular.
- I found the ‘cost per seat’ (calculated as license fees plus external implementation cost divided by number of deployed seats) table particularly interesting: SSAS comes out with a score of $1111, about a quarter of the way from the bottom; Qliktech comes in at $1499, Tableau at $1039. In general self-service BI tools don’t appear to cost any less to implement overall than traditional corporate BI tools.
- SSAS has a very respectable win-rate in competitive evaluations of 64%, although this is declining over time (it has to be said that almost all other large vendors are showing a decline too). Again, compare this with a win-rate of 34% for Tableau and 37% for Qliktech.
- Sadly there’s no mention of PowerPivot anywhere. It would have been good to know how it’s doing…