Bringing It All Together In The Cloud and Excel
A few things got released in the past week or so that I would normally have devoted a short blog post to; now I’ve finally made it onto Twitter I tend to just tweet about them instead, but I still think a blogging is the best way for me to get my thoughts together about what they actually mean. Let’s consider this ‘stuff’ in isolation first:
- The first CTP of Hadoop on Azure went live. Denny Lee’s blog post is probably the best place to start to learn about it; the video in the post is a very useful quick overview of what it does too. I noticed that it supports importing data from the Windows Azure Marketplace (what used to be called the Azure Datamarket).
- The Azure Marketplace also got a speed upgrade, as Boyan Penev notes here. This is important because whenever I’d tried to use it in the past its appalling performance had stopped me doing anything interesting with it at all. This, plus the fact that you can now publish your own data there, turns what was something of a white elephant into what could be an integral part of the Microsoft cloud BI platform.
- Version 2 of Google BigQuery went live, although it’s still in beta.
- The CTP of Data Explorer got released, which of course I’ve blogged about already here and which Jamie blogged about here and here.
- Microsoft announced Office 365 compliance with leading EU and US standards for data protection and security, which means less of those legal worries about whether you’re allowed to put all that interesting data you want to use for BI into the cloud.
From this it’s clear that Microsoft’s cloud BI platform is beginning to take shape, as are competing cloud BI platforms (if we assume that Google actually has a cloud BI strategy, and I think it has), and I think it’s fair to say Microsoft is well placed. There’s also yet more evidence, as if it was not blindingly obvious already, that Excel is at the heart of Microsoft’s BI strategy. Look at how data from Hadoop on Azure can be imported directly into Excel, and how this is flagged up as an important selling point, in the video on Denny’s blog post. However I think Microsoft needs to make even more of an effort to make everything Excel-like: now that it’s building a new BI platform from scratch it has a very rare opportunity to do this and to produce a truly coherent set of tools rather than the traditional grab-bag of technologies that make up the current Microsoft BI stack and which the cloud platform could also end up as too. Actually I’d like to go further and say that rather than have a bunch of separate cloud BI tools MS should make everything BI a feature of Excel, whether it be Excel on the desktop or Excel in the cloud. This might seem a controversial thing to say, but if MS is committed to true mass-market, self-service BI then Excel has to be the platform and MS needs to base its cloud BI strategy on it 100%.
Here are a couple of the things I’d like to see happen to achieve this:
- Data Explorer’s expression language needs to be made consistent with Excel syntax, in the way that DAX is consistent with Excel syntax. While I like what Data Explorer can do at the moment I hate having to learn a new syntax and a new set of functions for each tool I use (think SSRS expressions, SSIS expressions, MDX and so on) and it makes me much less productive when I do. I want to use the same syntax to write formulas in my spreadsheet, calculations in DAX and calculations/transformations in Data Explorer too – after all, we want to do a lot of the same things in ETL and reporting like lookups, aggregation and pivoting.
- Hadoop on Azure is an important tool to have, not least because Hadoop is already so widely used in the wider world so it adds some credibility to the MS BI stack, but I’d like to be able to control massively parallel calculations from inside Excel and not just consume the results of them there. I’m thinking something like DataScope or the way you can scale out Excel calculations on Windows HPC, maybe driven though a PowerPivot-like interface with calculations expressed in DAX, or Data Explorer (which links back to my last point, because if Data Explorer expressions were DAX it needn’t be an either/or choice). It could of course still be Hadoop at the back-end with a layer on top to make it consistent with Excel.
- We desperately need a proper API for the Excel Web App (yes, here we go again…). An OData API for importing and exporting data from spreadsheets on the web is only the first step; in the long term I’d want the Excel Web App to have capabilities like Project Dirigible, so that Excel on the desktop could become a kind of client tool for a massively scalable Excel Server in the cloud (note that I didn’t say Excel Services, which is more like Excel-on-the-server than a real Excel Server in my opinion). I’d want to be able to store data in the cloud and automatically synchronise it with Excel on multiple desktops or elsewhere in the cloud. I’d also want to create Excel spreadsheets in the cloud that acted like functions (similar to Project Dirigible), where these functions could be called from other Excel spreadsheets which again could be on the desktop or be in the cloud.
- We need Analysis Services in the cloud. We’ve known it’s been coming for a long time, but not what form it will take. Again, I’d like to see tight integration with Excel similar to how PowerPivot works on the client or tighter, where Analysis Services would not be a separate service but just be the functionality within Excel for manipulating tables of data for reporting purposes. I’d want Data Explorer to be able to create and load these tables.
- We also need the full BI functionality of desktop Excel – pivot tables and cube formulas – in the Excel Web App. I would guess this is in the pipeline anyway since it’s a pretty obvious requirement if Microsoft’s cloud BI strategy is going to work. I also don’t see the point of Power View being a separate app – I’d like to see it become a data visualisation feature of Excel.
- Finally, I’d like to see some way of tying all the data held in this gigantic data store together. You’d want to be able to search it, certainly, but also understand where it’s come from and what it actually represents. It sounds like maybe this is what Project Barcelona is aiming for…
Coordinating the work of multiple teams at Microsoft in the way this would demand is a particularly thankless task, I know. But maybe, just maybe, someone senior (maybe a Technical Fellow like Amir…?) could pull off something this ambitious?