Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Excel’ Category

Power BI Review, Part 2.5: Q&A (Now I’ve Used It With My Data)

with 6 comments

A few months ago I posted a review of Q&A, the natural language query functionality in Power BI, based on the sample data sets that were then available. Last week, finally, we got the news that we could enable Q&A on our own Power Pivot models, and having played with this new release I thought it was a good idea to post an update to my original thoughts.

The first thing to point out is that even if you’ve got a Power BI Preview tenant you will need the latest version of Power Pivot for Excel to be able to get the best out of Q&A. This latest release contains some new functionality to add ‘Synonyms’ to the model – what this means is that it allows you, as a model creator, to tell Power BI about other names that end users might use when querying your model. For example on a Geography dimension you might have a column called State but if you are a multinational company you may find that while your State column contains the names of states in the USA, it might contain the names of cantons in Switzerland, counties in the UK, departments in France and so on. As a result you will want Power BI to know that if a user asks for sales by county in the UK that it should actually look in the State column. Devin Knight has already written a good post showing how synonyms work with Q&A which you can see here.

Another complication is that, at the time of writing, the Synonym functionality is only available to users who have installed the streamed version of Office 2013 from Office 365. I have an Office 365 subscription but I had installed Office from an msi before that, so I had to uninstall Office and reinstall the streamed version to be able to see Synonyms – I assume that support for Synonyms in the non-streamed version of Excel will come at some point soon in the future, but in general I would expect that new Power BI functionality will appear first in the streamed version of Office first so if you’re serious about BI you should change over to it as soon as you can. Melissa Coates has a lot more detail on this issue here.

But enough about setup, what about Q&A? The data that I tested it on was a model I’ve been using for user group and conference demos for about six months now, which contains data from the UK’s Land Registry and details all residential property transactions in England and Wales in 2013. It’s fairly simple – two tables, a date table and a transactions table containing around 0.5 million rows – so probably a lot simpler than the average Power Pivot model, but nonetheless real data and one which had been polished for demo purposes. The Excel file holding it is around 25MB so I was well within the Power BI file size limits.

My initial impression after I had added my existing model (with no synonyms etc) to Q&A was that while it worked reasonably well, it worked nowhere near as well as the demo models I had seen. I then set about making changes to the model and re-uploading it, and these changes made all the difference. Some examples of the things I did are:

  • Changed table and column names. In my model I had already taken the trouble to make them human readable, but this did not necessarily mean they were suitable for Q&A. For example, my main fact table was called ‘Land Registry’, so at first Q&A kept suggesting questions like “How many land registries were there in June…” which clearly makes no sense. Renaming the fact table to ‘Sales’ fixed this.
  • Setting synonyms. Unsurprisingly, this had a big impact on usability in the same way that changing the table and column names did. I found that I had to go through several iterations of uploading the data, writing questions, seeing what worked and what didn’t, and adding more synonyms before I had a set that I was happy with; I can imagine that in the real world you’d need to round up several end users and lock them in a room to see how they phrased their questions so as to get a really good list of synonyms for them.
  • Setting Power View-related properties. This included setting the Default Field Set on a table, so I only saw a few important fields in a meaningful order when Q&A returned a table result; and also Summarize By so that Q&A didn’t try to aggregate year values. All of this makes sense given how closely-related Q&A and Power View are, but even though I had a reasonably ‘finished’ model to start off with I still hadn’t set all of these properties because I knew I was never going to try to sum up a year column.
  • Adding new columns. There were a number of cases where I realised that I, as a human user, was able to make assumptions about the data that Q&A could not. For example the source data records sales of four different types of residential property: terraced, detached, semi-detached and flat. The first three are types of house, but the source data doesn’t actually state that they are types houses anywhere so in order to see the total number of sales of houses I had to add another column to explicitly define which property types were houses.
  • Disambiguation. Probably the most irritating thing about the Bing geocoding service that Power View and Q&A use is the way it always chooses a US location when you give it an ambiguous place name. Therefore when looking at sales by town I would see the town name “Bristol” show up on the map as Bristol, Tennessee (population 24,821) rather than Bristol in England (population 416,400). Creating a new column with town name and country concatenated stopped this happening.

The Microsoft blog post I referenced above announcing Q&A promises that a more detailed guide to configuring models for Q&A will be published soon, which is good news. The important point to take away from this, though, is that even the most polished Power Pivot models will need additional tweaks and improvements in order to get the best out of Q&A.

The big question remains, though, whether Q&A will be something that end users actually get some value from. As a not-very-scientific test of this I handed my laptop over to my wife (who has no experience of BI tools but who has a healthy interest in property prices) to see how easy it was for her to use, and straight away she was able to write queries and find the information she was looking for, more or less. There were a still few cases where Q&A and/or my model failed, such as when she searched for “average house price in Amersham” – the model has a measure for “average price”, it knows about the property type “house” and the town “Amersham”, but “average house price” confused it and the query had to be rewritten as “average price of a house in Amersham”. Overall, though, I was pleasantly surprised and as a result I’m rather less sceptical than I was about Q&A’s usefulness, even if I’m still not 100% convinced yet.

Written by Chris Webb

December 23, 2013 at 12:45 am

Posted in Cloud, Excel, Power BI, Q&A

Ordering Of Named Sets In Excel

with 3 comments

A bit of an obscure one, this, but it’s come up twice this week so worth mentioning. When you define a named set on your SSAS Multidimensional cube, Excel doesn’t respect the order of items in that set by default when you use it in a PivotTable. Consider the following named set defined on the Adventure Works cube (on the Calculations tab of the cube, not in defined in Excel itself):

CREATE SET [MY COUNTRIES] AS
{[Customer].[Country].&[France], [Customer].[Country].&[Canada], [Customer].[Country].&[Australia]};

Note that the countries are in the order France, Canada, Australia. When you use this named set in Excel, this order is overridden and the countries come out in hierarchy order, that’s to say the order that they appear on the Country hierarchy: Australia, Canada, France.

image 

image

How can you stop this? After all, in a lot of cases the order of members in a named set is important. If you have Excel 2010 or Excel 2013 (I believe this option isn’t available in Excel 2007), you need to click on the name of the set in the PivotTable Field List pane and select Field Settings:

image

Then in the Field Settings dialog go to the Layout and Print tab and uncheck the “Automatically order and remove duplicates from the set” option:

image

When you do that, the order of your set is respected:

image

Written by Chris Webb

July 31, 2013 at 2:12 pm

Some Thoughts About Power BI

with 51 comments

By now you’ll probably have seen the Power BI announcement from Microsoft. It’s an important one, and if you haven’t seen it I suggest you take a look at the official announcements and website here:

http://blogs.office.com/b/office-news/archive/2013/07/08/announcing-power-bi-for-office-365.aspx
http://blogs.office.com/b/office365tech/archive/2013/07/07/what-powers-power-bi-in-office-365.aspx
http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/08/introducing-power-bi-for-office-365.aspx?WT.mc_id=Social_TW_OutgoingEvents_20130708_25941_SQLServer
http://office.microsoft.com/en-us/excel/power-bi-FX104080667.aspx
http://blogs.msdn.com/b/powerbi/archive/2013/07/07/getting-started-with-pq-and-pm.aspx

Andrew Brust also has a good summary of the news here:
http://www.zdnet.com/microsoft-announces-power-bi-for-office-365-7000017746/

There’s no point me repeating what’s already been said, so I thought I’d post my initial reaction to it:

  • Proper Mobile BI! HTML 5 Power View! It works on iPads too! Hurray! At-bloody-last! The mobile BI solution will allow you to find, interact with and share “Excel and Power View content”, so I guess that includes Excel worksheets (with PivotTables, slicers etc) as well as Power View reports (a close look at the screenshots in the blog posts above back this up).
  • Making Power BI available only via Office 365 is going to be a very controversial strategy in the MS BI partner community. To be clear: as far as I understand it, Data Explorer (now Power Query), GeoFlow (now Power Map), mobile BI and all of the cool stuff that’s just been announced is only going to be available through Power BI, and therefore Office 365. Unfortunately the biggest companies, with the biggest BI budgets, are often the ones who are slowest to upgrade to the latest versions of Office and a lot of cases this won’t change just because someone wants to see their reports on an iPad. Where IT department inertia, worries about data privacy and company politics mean that Office 365 is not an option, Microsoft will lose out to the pure-play BI vendors who offer standalone solutions.
  • If you look at this from a different point of view, though, some of the things that I (as a BI Pro) feel least comfortable about in Microsoft’s BI strategy are also its greatest strengths. The way I see it, MS is not treating self-service BI as a solution in its own right, but selling self-service BI as a feature of Office. This makes a lot of sense from Microsoft’s point of view – it’s building on the fact that Excel is the tool of choice for data analysis for 99% of users. What I think is going to happen as a result of this is that, rather than partners selling BI as a standalone solution as in the past, we’re going to be talking to people who have already got Office 2013 or Office 365 and are looking to make the most of the BI features that come as part of that. The MS BI partner community is going to have to adjust to this because I doubt MS is going to change this strategy soon.
  • The same can be said of the Office 365-only strategy. If Microsoft is going to be successful with its cloud-first strategy then it’s going to have to prioritise cloud functionality over on-prem functionality. I think MS is doing the right thing with its cloud-first strategy, so therefore, even though I’m going to find it painful when I have to deal with customers that won’t or can’t move to Office 365, I can understand why MS is making Power BI Office 365 only. Beyond the hype (MS says that 1 in 4 of its customers is already on Office 365), it does seem like the uptake of Office 365 is quite strong, especially in SMBs, so hopefully there will be a large potential customer base.
  • I’ve been presenting sessions on the Excel 2013 BI stack at various user groups and conferences over the last few months and it’s gone down very well indeed. A lot of people have come up to me after seeing my session and said that they had been looking at QlikView and Tableau for BI, and would now consider Office 2013 as another alternative. The BI functionality on its own is pretty good, and good enough for a lot of customers even if it isn’t as mature as some of the competing offerings; the fact that the BI functionality comes baked into Office is the killer. While it may be expensive to upgrade to Office 2013/Office 365 this is a cost that many businesses will be considering anyway regardless of their BI requirements; you also have to compare this with the cost of QlikView and Tableau licenses and remember that not every user will need the most expensive SKUs of Office 2013.
  • The ability to refresh data in Excel workbooks deployed to Sharepoint Online, even when the data sources are on-prem, is a key feature and one that I’ve been waiting for. I wonder what the performance will be like?
  • For anyone of a certain age, the first reaction to the news of the natural language querying capability is two words: English Query. I haven’t played with it so I can’t pass judgement, but it’s going to have to be pretty impressive if anyone is going to use it for more than just sales demos. We shall see…
  • I am quite curious about the enterprise data search capabilities. Leaving aside the ability to query them in natural language, the ability to search for data across the enterprise will be useful. I think this is what happened to Project Barcelona.
  • Similarly, it seems as though this search capability is going to be significantly expanded on the public internet. At the moment, in Data Explorer Power Query we’ve seen the ability to query Wikipedia for data. Being able to query many other public data in the same way will be very powerful. There are a number of sites like Quandl that already make public datasets very easy to find and download, and the new search and query capabilities could leapfrog them.
  • No announcement on pricing has been made as yet. Please, please be ridiculously cheap!
  • We don’t have a date for the preview yet, but if you sign up here you’ll be notified when it’s available. It’s meant to be coming “later this summer”.

UPDATE: Some more things to add…

  • Something I didn’t pick up on at the time, but which emerged on Twitter later, is that PowerPivot has had a name change: it’s now Power Pivot with a space, to make it consistent with PowerV View and so on. This might seem minor, but for those of us who write books and have to sweat these details, it’s quite important!
  • I sense I’ve hurt a few feelings at MS with my comments on the natural language query. Let me be clear about my position here: I’ve not played with it, so I can’t pass judgement yet. I can imagine that natural language search for data will work well, but I will be very, very impressed if natural language query works well enough to be used on real data by real users. Real data is dirty and complex and user expectations will be very high and easily dashed. My guess is that the main issue will be that users can’t distinguish between what is a query and what is a calculation, and while the product can probably do queries well (eg “Show me the sales of widgets this year”) it may struggle with calculations (eg “Show me the customer churn by month this year”). But as I said, we shall see.

Finally, and as always, I reread this post this morning and worried that I sound too negative when in fact I’m very positive overall. To summarize:

  • The fact this is is all in Office and specifically Excel = the killer feature.
  • Mobile BI = good, even if it’s very late.
  • Power Query = very, very, very good indeed. I love it already and I think it’s going to be as big, if not bigger than Power Pivot. Power BI is worth buying for this alone.
  • Office 365 requirement = a problem for some customers, maybe, but understandable from Microsoft’s point of view.
  • Cloud requirement = again, a problem for some, but understandable and a big advantage for SMEs who can’t afford the cost of hardware and time to configure Sharepoint on-prem. The ability to refresh in the cloud from on-prem data sources is the key feature here.
  • Power Map = OK. Useful for customers who need geographic analysis, but it’s main use is that it’s great for demos (and this should not be underestimated – all products need some wow).
  • Power View goes HTML5 = relief. The Silverlight dependency undermined its credibility no end.
  • Natural language search for data sources = potentially very useful.
  • Natural language query of those data sources = see above. I remain to be convinced.
  • Data stewardship features = I haven’t seen enough of these to be able to comment.

More blog posts worth reading on this subject:
http://www.jenunderwood.com/blog.htm#O365PowerBI
http://www.jenstirrup.com/2013/07/power-business-intelligence-for.html

UPDATE #2: even more things to add…!

You can see the video of Amir’s demo from WPC here: http://www.youtube.com/watch?v=Jsa-5LGx_IY&feature=youtu.be

Some more details (which should be accurate) from a friend of mine at the conference:

  • The Power View standalone app is still Silverlight based, it’s only the mobile app that uses HTML5
  • Power Query queries can be shared between users via BI Sites, but the execution of these queries always takes place in desktop Excel. Queries will appear in searches when they’re published to a BI Site.
  • Excel workbooks connected to on-prem SSAS (Tabular and Multidimensional) will also be refreshable from a BI Site
  • Natural Language queries (what I think is being called “Q&A”) will also work against SSAS Tabular models. The Categorization property that’s in the Advanced tab in the PowerPivot window, and also in SSDT, is partly used to help Q&A do this.
  • External users can be given access to reports in BI Sites.
  • There will be a “Data Steward Portal” which will help you monitor who is doing what on your BI Site.
  • No comment on when this will arrive in Sharepoint on-prem. My feeling is that MS have no plans to do this at all, or maybe will only do it if a lot of people complain…?

Written by Chris Webb

July 8, 2013 at 5:10 pm

Sharing Data From Excel as OData with FlatMerge

with 9 comments

At last, the Excel 2013 app I’ve really been waiting for! Those data visualization apps from last week have generated a lot of interest, but this is even cooler for a data geek like me.

A few months ago I came across FlatMerge, a startup that allows you to upload data and then make it available as an OData feed; I was going to blog about it but my fellow OData fan Jamie Thomson beat me to it. However at that point it was only a website where you could upload data… today, FlatMerge released its own, free (for the time being) Excel 2013 app which allows you to upload data direct from Excel. So you can take data from an Excel table:

image

Save it to FlatMerge:

image

And then import it into Excel 2013, PowerPivot, Data Explorer or any tool that supports OData feeds. Here’s the URL for the table I just uploaded (which, if I’ve read the docs correctly, should be publicly available):

https://fmsecure.azurewebsites.net/Data/oData/a5462e03-e0bc-44b4-a654-5dbbbd59cb59

It’s still a version 1.0 and there are a few features it’s missing that I’d like to see (like the ability to update a data source, and to control who has access to that data), but I think it’s very cool. I’ve seen tools that allow you to share data from Excel before but this is the first that uses OData, and this means you have a much greater degree of flexibility about how you consume your data. Arguably you could do the same thing by saving your Excel file to Sharepoint 2013 Excel Services and using the OData feed from an Excel Services table, but that’s a much more expensive and less user-friendly option.

I can imagine a whole bunch of uses for this, for example in a budgeting application where multiple Excel users need to submit their figures, which then need to be consolidated in a single Excel spreadsheet, maybe using Data Explorer.

Written by Chris Webb

May 30, 2013 at 6:42 am

Posted in Cloud, Excel

Tagged with

New Treemap, Histogram and Streamgraph Apps for Excel 2013

with 5 comments

I blogged about the new app model for Office 2013 and what it means for BI last year, but since then there hasn’t exactly been a massive flood of new data visualisation apps. However… yesterday, I was interested to see that some new apps had been published by the Visualization and Interaction for Business and Entertainment team at Microsoft Research. You can read all the details in this blog post:
http://blogs.technet.com/b/inside_microsoft_research/archive/2013/05/23/new-ways-to-visualize-your-data.aspx

The new apps (which are all free) are:

To test the Treemap out, I used Data Explorer to get the overall size on disk of the contents of the folders I use to store my presentation materials; I won’t go into detail about how I did it, but Erik Svenson has a great post on how to do this here. I ended up with a the following treemap:

image

It’s worth pointing out one cool thing about these apps: they still work when your worksheet is deployed to Sharepoint and viewed in a browser with the Excel Web App, even in Office 365!

Written by Chris Webb

May 24, 2013 at 3:14 pm

Posted in Data Explorer, Excel

Accumulating Data In An Excel Table Using Data Explorer (Power Query) and PowerPivot

with 3 comments

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source, you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!

Here’s a super-simple example of how to accumulate data in a table then. Let’s start with a csv file that contains the following data:

Product,Sales
Apples,1
Oranges,2

It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:

image

 

Here’s the code that Data Explorer generates:

let

    Source = Csv.Document(File.Contents("C:\InputData.csv")),

    FirstRowAsHeader = Table.PromoteHeaders(Source),

    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

                              {{"Product", type text}, {"Sales", type number}})

in

    ChangedType

 

Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

Table.AddColumn(ChangedType, “Load Date”, each DateTimeZone.UtcNow())

Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into the PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:

image

Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:

image

image

At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it contains a new step that appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps, first to get the data from the new Excel table:

Excel.CurrentWorkbook(){[Name="ExistingData"]}[Content]

Then to make sure the Load Date is treated as a DateTimeZone type:

Table.TransformColumnTypes(GetExistingData,{{“Load Date”, type datetimezone}})

Then finally to combine the two tables:

Table.Combine({ChangedType1,InsertedCustom})

Now, whenever you Refresh your Data Explorer query, you will see the data from the csv file appended to the data that has already been loaded:

image

image

Here’s the complete code:

let

    Source = Csv.Document(File.Contents("C:\InputData.csv")),

    FirstRowAsHeader = Table.PromoteHeaders(Source),

    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

                  {{"Product", type text}, {"Sales", type number}}),

    InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),

    Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],

    ChangedType1 = Table.TransformColumnTypes(Custom1,{{"Load Date", type datetimezone}}),

    Custom2 = Table.Combine({ChangedType1,InsertedCustom})

in

    Custom2

Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.

You can download the sample workbook here, and the csv file here.

Written by Chris Webb

May 13, 2013 at 12:40 pm

PowerPivot Workbook Size Optimizer

with 3 comments

Browsing through my RSS feeds this morning, I saw a new download on the Microsoft site: an Excel addin (Excel 2013 only, I think) called the PowerPivot Workbook Size Optimizer. You can get it here:
http://www.microsoft.com/en-us/download/details.aspx?id=38793

Here’s the blurb from the site:

The Workbook Size optimizer for Excel can better compress data inside workbooks that use PowerPivot or PowerView if this data comes from external data sources. The best size compression can be achieved for workbooks based on SQL Server databases and there are a few tricks we can do for other SQL datasources as well. The optimizer will install as an add in to excel and will provide you with a nice wizard to better compress the size of your workbook. Using the optimizer you can often get more than 1,000,000 rows datasets in a workbook under 10 MB, share it in SharePointOnline and interact withit using the Excel Web App in any browser.

Here’s a screenshot:

image

Despite a testing a few models with data from Adventure Works I couldn’t get it to suggest any changes (it didn’t spot that I had imported a column containing binary data, hmmm) but I guess it needs more testing on larger/more diverse data sources. Maybe there’s a blog post coming from the PowerPivot team coming soon explaining how to use this?

UPDATE: after playing around with it a bit more, I was able to get it to suggest some changes to tables. Marco has some more details:
http://sqlblog.com/blogs/marco_russo/archive/2013/04/30/powerpivot-workbook-size-optimizer-powerpivot-tabular.aspx

And there’s a white paper on the rules that it uses:
http://office.microsoft.com/en-gb/excel-help/create-a-memory-efficient-data-model-using-excel-2013-and-the-powerpivot-add-in-HA103981538.aspx

Written by Chris Webb

April 30, 2013 at 9:52 am

Posted in Excel, PowerPivot

GeoFlow Public Preview Available

with 8 comments

First big news from the PASS BA Conference: the public preview for GeoFlow is now available. You can download it here:
http://www.microsoft.com/en-us/download/details.aspx?id=38395

Here are the official announcements with all the details:
http://blogs.technet.com/b/dataplatforminsider/archive/2013/04/11/day-2-pass-business-analytics-conference-new-3d-mapping-analytics-tool-for-excel.aspx
http://blogs.office.com/b/microsoft-excel/archive/2013/04/11/dallas-utilities-electricity-seasonal-use-simulation-with-geoflow-preview-and-powerview.aspx

GeoFlow is an addin for Excel 2013 that allows you to visualise your data on a 3D map, to zoom in and explore that data, and record ‘tours’ of this data. It’s a lot of fun! As a taster, here’s a screenshot of a visualisation showing English secondary schools exam results data (average A-Level point score per pupil) broken down by school gender of entry:

image

UPDATE: one other thing I have to mention is that when this was announced in the keynote at the PASS BA Conference this morning, Amir Netz did an absolutely astounding demo showing GeoFlow’s touch-based capabilities running on a massive Perceptive Pixel screen (I think it was this one: http://www.perceptivepixel.com/products/82-lcd-multi-touch-display). It was possibly the most impressive demo I’ve seen of any Microsoft BI product. Anyway, I got to play on it myself later and it was as cool as it looked. If you’ve got $80000 burning a hole in your pocket then you could do worse than invest in one of these babies.

Written by Chris Webb

April 11, 2013 at 2:07 pm

Posted in Excel, GeoFlow

Bringing Location Information Into Excel 2013 With The WebService() Function

with 2 comments

Last summer I wrote a post about the new WebService() function in Excel 2013 and showed how it could be used to display data from the BBC Weather web service in a worksheet. I hadn’t thought about it much after that, but the other day I was working with a customer who wanted to work out how far people had to travel to visit a location and I wondered whether this was something that could be achieved with the WebService() function. It turns out that it can be – once you’ve worked around some of the quirks of WebService() – so I thought I’d post a couple of worked examples of how to use Bing Maps and Google Maps REST APIs to solve this kind of problem in Excel.

Let’s start with Bing, because after all I’m still a Microsoft loyalist and it does get some good reviews! Before you can use the Bing Maps REST API you need to create an account key (which is free), which you can do here:
https://www.bingmapsportal.com
You’ll need to use this in all your API calls. Now, what can we do with the Bing Maps REST API? Well, lots of things as you can see here:
http://msdn.microsoft.com/en-us/library/ff701713.aspx
Geocoding (finding a latitude and longitude for an address so you can plot it on a map) is a popular thing to do, but actually in Excel it’s not so important because all of the options for mapping in Excel 2013 such as Power View, GeoFlow (Jason Thomas did a good post on it recently) and the Bing Maps Excel 2013 app, all do it for you automatically. So I decided to show how to do reverse geocoding instead – looking up an address from a given latitude and longitude – which is also possible with the Bing Locations API. Here’s how:

image

In cell C3 I have pasted my Bing Maps API key. C5 and C6 contain the latitude and longitude I want to search for. B8 contains the formula with the WebService() function that calls the API and returns the result as XML:
=WEBSERVICE("http://dev.virtualearth.net/REST/v1/Locations/" & C5 & "," & C6 & "?o=xml&key=" & C3)

C10 uses the FilterXML() function to retrieve the address from the XML:
=FILTERXML(B8, "/Response/ResourceSets/ResourceSet/Resources/Location/Name")

Easy-peasy. Another common problem is that one I mentioned earlier about calculating the distance between two points. There are a few examples out there of how to calculate direct distances (like this one I saw halfway through writing this post), but direct distances aren’t all that useful – what you really need is the time needed to drive or walk somewhere along the available roads. Bing has a web service that provides route information and it works perfectly, but I found that the WebService() function returned errors in some cases – I suspect because the XML returned by Bing was too long, but I’m not sure. Anyway, Google has a similar API (with no key needed, at least for light usage) that returns only distances and not detailed routes, and WebService() seems to like it better. Here’s an example:

image

Here, I’m entering a start address and an end address in cells C2 and C3, and then calling the Google Distance Matrix API as follows:

=WEBSERVICE("http://maps.googleapis.com/maps/api/distancematrix/xml?origins=" & ENCODEURL(C2) & "&destinations=" & ENCODEURL(C3) & "&mode=driving&sensor=false")

Then, I’m using FilterXML() to get the text of the driving distance and duration in cells C7 and C8 from the XML returned:

=FILTERXML(B5, "/DistanceMatrixResponse/row/element/distance/text")

=FILTERXML(B5, "/DistanceMatrixResponse/row/element/duration/text")

You can download a sample workbook that contains both these examples, minus my Bing Maps API key, here.

One final thing to mention is that the WebService() function seems to be horrendously slow, so you might not be able to use it to make a large number of calls very quickly, although many of these services allow you to make multiple lookups in a single API call. Also, remember that there’s usually a limit on the number of calls you can make to these APIs without paying.

Written by Chris Webb

February 26, 2013 at 9:07 pm

Posted in Excel

Tagged with

MDX on Cloudera Impala

with 5 comments

You may have seen today’s announcement (and comment from Mary-Jo Foley) on the official SQL Server blog about HortonWorks Data Platform for Windows; I won’t repeat what it says, but it’s clear Microsoft is taking Hadoop and Big Data (yuk, I hate that term, but sometimes I can’t avoid using it…) very seriously. However, yesterday I saw another, very interesting announcement from the guys at Simba that presents an alternative vision for Microsoft BI and Big Data. You can see a demo of it here:

Simba MDX Provider for Impala

There are some more details here:
https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/impala-user/BniN1ELYNDs

Basically, Simba have built an OLEDB for OLAP Provider that translates the MDX generated by Excel PivotTables into HiveQL queries against Cloudera Impala. You can read a good, short overview of what Impala is here:
http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/
and there’s more detailed information here:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Impala+Frequently+Asked+Questions

In summary, what this gives you is fast, interactive analysis from within Excel going direct against large amounts of data stored via Impala, with no need for users to have to write Hive queries or stage the data in PowerPivot in the way the current Microsoft/Hortonworks solution does. Even more interesting is the fact that Simba support MDX calculated members and not just MDX queries, so given that Excel 2013 allows you to define your own calculated members you could do some very powerful analysis this way. Well, those of us who know some MDX, at least :-)

Impala is open source and isn’t shy about the debt it owes to Google’s Dremel, which of course is available publicly now as BigQuery. When I first saw BigQuery I thought putting an MDX interface over the top would make it appealing to a much wider audience – maybe not a good thing for Microsoft, but it would be a clever move on the part of Google certainly. Microsoft hasn’t announced that it’s working on anything comparable to BigQuery, alas; already, BI tools like Tableau and BIME can connect to and query BigQuery (Tableau also connects to Amazon’s new Redshift database too), and it’s these tools that are Excel’s big competitors in the BI client tool space. I guess SSAS 2012 Tabular in DirectQuery mode going against PDW would be the only vaguely similar Microsoft solution, but PDW is on-prem only and pretty expensive. Translating MDX to the SQL used by tools like Impala, in the way that the Simba MDX Provider does, puts Excel on a more equal footing with Tableau et al. I don’t know how Simba/Cloudera will be making this MDX Provider available but I would be surprised if they didn’t charge for it; Microsoft’s close relationship with Hortonworks, a competitor to Cloudera, makes me think that Microsoft might not want to promote this particular tool either, which is a shame. Maybe an acceptable solution for MS would be to build new cartridges for SSAS 2012 and enable DirectQuery for data sources other than SQL Server? It’s unlikely to happen, I think, but it would be an option.

UPDATE: While I was writing this post, Cathy Dumas of Simba (you were wondering what happened to her, weren’t you?) also blogged about this

http://blogs.simba.com/simba_technologies_ceo_co/2013/02/demo-microsoft-excel-pivottables-on-cloudera-impala-via-simba-mdx-provider.html

Written by Chris Webb

February 25, 2013 at 11:38 pm

Posted in Excel, MDX

Tagged with

Follow

Get every new post delivered to your Inbox.

Join 2,868 other followers