MDX on Cloudera Impala
You may have seen today’s announcement (and comment from Mary-Jo Foley) on the official SQL Server blog about HortonWorks Data Platform for Windows; I won’t repeat what it says, but it’s clear Microsoft is taking Hadoop and Big Data (yuk, I hate that term, but sometimes I can’t avoid using it…) very seriously. However, yesterday I saw another, very interesting announcement from the guys at Simba that presents an alternative vision for Microsoft BI and Big Data. You can see a demo of it here:
There are some more details here:
Basically, Simba have built an OLEDB for OLAP Provider that translates the MDX generated by Excel PivotTables into HiveQL queries against Cloudera Impala. You can read a good, short overview of what Impala is here:
and there’s more detailed information here:
In summary, what this gives you is fast, interactive analysis from within Excel going direct against large amounts of data stored via Impala, with no need for users to have to write Hive queries or stage the data in PowerPivot in the way the current Microsoft/Hortonworks solution does. Even more interesting is the fact that Simba support MDX calculated members and not just MDX queries, so given that Excel 2013 allows you to define your own calculated members you could do some very powerful analysis this way. Well, those of us who know some MDX, at least :-)
Impala is open source and isn’t shy about the debt it owes to Google’s Dremel, which of course is available publicly now as BigQuery. When I first saw BigQuery I thought putting an MDX interface over the top would make it appealing to a much wider audience – maybe not a good thing for Microsoft, but it would be a clever move on the part of Google certainly. Microsoft hasn’t announced that it’s working on anything comparable to BigQuery, alas; already, BI tools like Tableau and BIME can connect to and query BigQuery (Tableau also connects to Amazon’s new Redshift database too), and it’s these tools that are Excel’s big competitors in the BI client tool space. I guess SSAS 2012 Tabular in DirectQuery mode going against PDW would be the only vaguely similar Microsoft solution, but PDW is on-prem only and pretty expensive. Translating MDX to the SQL used by tools like Impala, in the way that the Simba MDX Provider does, puts Excel on a more equal footing with Tableau et al. I don’t know how Simba/Cloudera will be making this MDX Provider available but I would be surprised if they didn’t charge for it; Microsoft’s close relationship with Hortonworks, a competitor to Cloudera, makes me think that Microsoft might not want to promote this particular tool either, which is a shame. Maybe an acceptable solution for MS would be to build new cartridges for SSAS 2012 and enable DirectQuery for data sources other than SQL Server? It’s unlikely to happen, I think, but it would be an option.
UPDATE: While I was writing this post, Cathy Dumas of Simba (you were wondering what happened to her, weren’t you?) also blogged about this