Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Excel’ Category

Sharing Data From Excel as OData with FlatMerge

At last, the Excel 2013 app I’ve really been waiting for! Those data visualization apps from last week have generated a lot of interest, but this is even cooler for a data geek like me.

A few months ago I came across FlatMerge, a startup that allows you to upload data and then make it available as an OData feed; I was going to blog about it but my fellow OData fan Jamie Thomson beat me to it. However, at that point it was only a website where you could upload data… today, FlatMerge released its own free (for the time being) Excel 2013 app, which allows you to upload data directly from Excel. So you can take data from an Excel table:

[screenshot]

Save it to FlatMerge:

[screenshot]

And then import it into Excel 2013, PowerPivot, Data Explorer or any tool that supports OData feeds. Here’s the URL for the table I just uploaded (which, if I’ve read the docs correctly, should be publicly available):

https://fmsecure.azurewebsites.net/Data/oData/a5462e03-e0bc-44b4-a654-5dbbbd59cb59

It’s still version 1.0 and there are a few features it’s missing that I’d like to see (like the ability to update a data source, and to control who has access to that data), but I think it’s very cool. I’ve seen tools that allow you to share data from Excel before but this is the first that uses OData, and this means you have a much greater degree of flexibility about how you consume your data. Arguably you could do the same thing by saving your Excel file to SharePoint 2013 Excel Services and using the OData feed from an Excel Services table, but that’s a much more expensive and less user-friendly option.
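To illustrate that flexibility, here’s a minimal Python sketch of pulling the rows out of an OData feed’s JSON payload. This is purely illustrative and not something from the post: the exact payload shape FlatMerge returns is an assumption, so the helper accepts the common OData v2/v3 envelope (`{"d": {"results": [...]}}`) as well as the v4 one (`{"value": [...]}`).

```python
import json

def odata_rows(payload_text):
    """Extract the result rows from an OData JSON payload.

    OData v2/v3 services wrap results as {"d": {"results": [...]}} (or
    just {"d": [...]}), while OData v4 uses {"value": [...]}. This
    helper accepts any of those shapes.
    """
    doc = json.loads(payload_text)
    if "value" in doc:                    # OData v4 envelope
        return doc["value"]
    d = doc.get("d", doc)                 # OData v2/v3 envelope
    return d.get("results", d) if isinstance(d, dict) else d

# In practice you would fetch the feed first, e.g. with urllib:
#   req = urllib.request.Request(feed_url,
#                                headers={"Accept": "application/json"})
#   rows = odata_rows(urllib.request.urlopen(req).read().decode("utf-8"))
```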

I can imagine a whole bunch of uses for this, for example in a budgeting application where multiple Excel users need to submit their figures, which then need to be consolidated in a single Excel spreadsheet, maybe using Data Explorer.

Written by Chris Webb

May 30, 2013 at 6:42 am

Posted in Cloud, Excel

New Treemap, Histogram and Streamgraph Apps for Excel 2013

I blogged about the new app model for Office 2013 and what it means for BI last year, but since then there hasn’t exactly been a massive flood of new data visualisation apps. However… yesterday, I was interested to see that some new apps had been published by the Visualization and Interaction for Business and Entertainment team at Microsoft Research. You can read all the details in this blog post:
http://blogs.technet.com/b/inside_microsoft_research/archive/2013/05/23/new-ways-to-visualize-your-data.aspx

The new apps (which are all free) are Treemap, Histogram and Streamgraph.

To test the Treemap out, I used Data Explorer to get the overall size on disk of the contents of the folders I use to store my presentation materials; I won’t go into detail about how I did it, but Erik Svenson has a great post on how to do this here. I ended up with the following treemap:

[screenshot]

It’s worth pointing out one cool thing about these apps: they still work when your workbook is deployed to SharePoint and viewed in a browser with the Excel Web App, even in Office 365!

Written by Chris Webb

May 24, 2013 at 3:14 pm

Posted in Data Explorer, Excel

Accumulating Data In An Excel Table Using Data Explorer (Power Query) and PowerPivot

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source, you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!

Here’s a super-simple example, then, of how to accumulate data in a table. Let’s start with a csv file that contains the following data:

Product,Sales
Apples,1
Oranges,2

It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:

[screenshot]

Here’s the code that Data Explorer generates:

let
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
        {{"Product", type text}, {"Sales", type number}})
in
    ChangedType

Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow())

Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:

[screenshot]

Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:

[screenshot]

[screenshot]

At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it contains a new step that appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps, first to get the data from the new Excel table:

Excel.CurrentWorkbook(){[Name="ExistingData"]}[Content]

Then to make sure the Load Date is treated as a DateTimeZone type:

Table.TransformColumnTypes(GetExistingData,{{"Load Date", type datetimezone}})

Then finally to combine the two tables:

Table.Combine({ChangedType1,InsertedCustom})

Now, whenever you Refresh your Data Explorer query, you will see the data from the csv file appended to the data that has already been loaded:

[screenshot]

[screenshot]

Here’s the complete code:

let
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
        {{"Product", type text}, {"Sales", type number}}),
    InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),
    Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],
    ChangedType1 = Table.TransformColumnTypes(Custom1,{{"Load Date", type datetimezone}}),
    Custom2 = Table.Combine({ChangedType1,InsertedCustom})
in
    Custom2

Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.

You can download the sample workbook here, and the csv file here.

Written by Chris Webb

May 13, 2013 at 12:40 pm

PowerPivot Workbook Size Optimizer

Browsing through my RSS feeds this morning, I saw a new download on the Microsoft site: an Excel addin (Excel 2013 only, I think) called the PowerPivot Workbook Size Optimizer. You can get it here:
http://www.microsoft.com/en-us/download/details.aspx?id=38793

Here’s the blurb from the site:

The Workbook Size Optimizer for Excel can better compress data inside workbooks that use PowerPivot or Power View if this data comes from external data sources. The best size compression can be achieved for workbooks based on SQL Server databases and there are a few tricks we can do for other SQL data sources as well. The optimizer will install as an add-in to Excel and will provide you with a nice wizard to better compress the size of your workbook. Using the optimizer you can often get datasets of more than 1,000,000 rows in a workbook under 10 MB, share it in SharePoint Online and interact with it using the Excel Web App in any browser.

Here’s a screenshot:

[screenshot]

Despite testing a few models with data from Adventure Works I couldn’t get it to suggest any changes (it didn’t spot that I had imported a column containing binary data, hmmm) but I guess it needs more testing on larger/more diverse data sources. Maybe there’s a blog post coming soon from the PowerPivot team explaining how to use this?

UPDATE: after playing around with it a bit more, I was able to get it to suggest some changes to tables. Marco has some more details:
http://sqlblog.com/blogs/marco_russo/archive/2013/04/30/powerpivot-workbook-size-optimizer-powerpivot-tabular.aspx

And there’s a white paper on the rules that it uses:
http://office.microsoft.com/en-gb/excel-help/create-a-memory-efficient-data-model-using-excel-2013-and-the-powerpivot-add-in-HA103981538.aspx

Written by Chris Webb

April 30, 2013 at 9:52 am

Posted in Excel, PowerPivot

GeoFlow Public Preview Available

First big news from the PASS BA Conference: the public preview for GeoFlow is now available. You can download it here:
http://www.microsoft.com/en-us/download/details.aspx?id=38395

Here are the official announcements with all the details:
http://blogs.technet.com/b/dataplatforminsider/archive/2013/04/11/day-2-pass-business-analytics-conference-new-3d-mapping-analytics-tool-for-excel.aspx
http://blogs.office.com/b/microsoft-excel/archive/2013/04/11/dallas-utilities-electricity-seasonal-use-simulation-with-geoflow-preview-and-powerview.aspx

GeoFlow is an addin for Excel 2013 that allows you to visualise your data on a 3D map, to zoom in and explore that data, and record ‘tours’ of this data. It’s a lot of fun! As a taster, here’s a screenshot of a visualisation showing English secondary schools exam results data (average A-Level point score per pupil) broken down by school gender of entry:

[screenshot]

UPDATE: one other thing I have to mention is that when this was announced in the keynote at the PASS BA Conference this morning, Amir Netz did an absolutely astounding demo showing GeoFlow’s touch-based capabilities running on a massive Perceptive Pixel screen (I think it was this one: http://www.perceptivepixel.com/products/82-lcd-multi-touch-display). It was possibly the most impressive demo I’ve seen of any Microsoft BI product. Anyway, I got to play on it myself later and it was as cool as it looked. If you’ve got $80,000 burning a hole in your pocket then you could do worse than invest in one of these babies.

Written by Chris Webb

April 11, 2013 at 2:07 pm

Posted in Excel, GeoFlow

Bringing Location Information Into Excel 2013 With The WebService() Function

Last summer I wrote a post about the new WebService() function in Excel 2013 and showed how it could be used to display data from the BBC Weather web service in a worksheet. I hadn’t thought about it much after that, but the other day I was working with a customer who wanted to work out how far people had to travel to visit a location and I wondered whether this was something that could be achieved with the WebService() function. It turns out that it can be – once you’ve worked around some of the quirks of WebService() – so I thought I’d post a couple of worked examples of how to use Bing Maps and Google Maps REST APIs to solve this kind of problem in Excel.

Let’s start with Bing, because after all I’m still a Microsoft loyalist and it does get some good reviews! Before you can use the Bing Maps REST API you need to create an account key (which is free), which you can do here:
https://www.bingmapsportal.com
You’ll need to use this in all your API calls. Now, what can we do with the Bing Maps REST API? Well, lots of things as you can see here:
http://msdn.microsoft.com/en-us/library/ff701713.aspx
Geocoding (finding a latitude and longitude for an address so you can plot it on a map) is a popular thing to do, but actually in Excel it’s not so important because the options for mapping in Excel 2013, such as Power View, GeoFlow (Jason Thomas did a good post on it recently) and the Bing Maps Excel 2013 app, all do it for you automatically. So I decided to show how to do reverse geocoding instead – looking up an address from a given latitude and longitude – which is also possible with the Bing Locations API. Here’s how:

[screenshot]

In cell C3 I have pasted my Bing Maps API key. C5 and C6 contain the latitude and longitude I want to search for. B8 contains the formula with the WebService() function that calls the API and returns the result as XML:
=WEBSERVICE("http://dev.virtualearth.net/REST/v1/Locations/" & C5 & "," & C6 & "?o=xml&key=" & C3)

C10 uses the FilterXML() function to retrieve the address from the XML:
=FILTERXML(B8, "/Response/ResourceSets/ResourceSet/Resources/Location/Name")
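If you wanted to do the same lookup outside Excel, the two formulas translate into two small steps: build the same REST URL and pull the Name element out of the returned XML. Here’s an illustrative Python sketch (the namespace handling is an assumption based on the default namespace Bing puts on its XML responses):

```python
import urllib.parse
import xml.etree.ElementTree as ET

def bing_reverse_geocode_url(lat, lon, key):
    # Same URL the WEBSERVICE() formula builds:
    # http://dev.virtualearth.net/REST/v1/Locations/{lat},{lon}?o=xml&key={key}
    return ("http://dev.virtualearth.net/REST/v1/Locations/"
            f"{lat},{lon}?o=xml&key={urllib.parse.quote(key)}")

def extract_address(xml_text):
    # Equivalent of FILTERXML(B8, "/Response/ResourceSets/ResourceSet
    # /Resources/Location/Name"); Bing's XML carries a default
    # namespace, so match on the local part of each tag name.
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.rsplit("}", 1)[-1] == "Name":
            return elem.text
    return None
```

You would then fetch `bing_reverse_geocode_url(...)` with your HTTP client of choice and pass the response body to `extract_address`.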

Easy-peasy. Another common problem is the one I mentioned earlier about calculating the distance between two points. There are a few examples out there of how to calculate direct distances (like this one I saw halfway through writing this post), but direct distances aren’t all that useful – what you really need is the time needed to drive or walk somewhere along the available roads. Bing has a web service that provides route information and it works perfectly, but I found that the WebService() function returned errors in some cases – I suspect because the XML returned by Bing was too long, but I’m not sure. Anyway, Google has a similar API (with no key needed, at least for light usage) that returns only distances and not detailed routes, and WebService() seems to like it better. Here’s an example:

[screenshot]

Here, I’m entering a start address and an end address in cells C2 and C3, and then calling the Google Distance Matrix API as follows:

=WEBSERVICE("http://maps.googleapis.com/maps/api/distancematrix/xml?origins=" & ENCODEURL(C2) & "&destinations=" & ENCODEURL(C3) & "&mode=driving&sensor=false")

Then, I’m using FilterXML() to get the text of the driving distance and duration in cells C7 and C8 from the XML returned:

=FILTERXML(B5, "/DistanceMatrixResponse/row/element/distance/text")

=FILTERXML(B5, "/DistanceMatrixResponse/row/element/duration/text")
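The two FILTERXML() paths translate directly outside Excel as well. Here’s an illustrative Python sketch that parses a Distance Matrix XML response using the same paths (the sample payload shape in the test is an assumption based on how the API responded at the time):

```python
import xml.etree.ElementTree as ET

def parse_distance_matrix(xml_text):
    """Return the (distance, duration) display strings from a Google
    Distance Matrix XML response, using the same paths as the two
    FILTERXML() formulas:
        /DistanceMatrixResponse/row/element/distance/text
        /DistanceMatrixResponse/row/element/duration/text
    """
    root = ET.fromstring(xml_text)          # root is DistanceMatrixResponse
    return (root.findtext("row/element/distance/text"),
            root.findtext("row/element/duration/text"))
```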

You can download a sample workbook that contains both these examples, minus my Bing Maps API key, here.

One final thing to mention is that the WebService() function seems to be horrendously slow, so you might not be able to use it to make a large number of calls very quickly, although many of these services allow you to make multiple lookups in a single API call. Also, remember that there’s usually a limit on the number of calls you can make to these APIs without paying.

Written by Chris Webb

February 26, 2013 at 9:07 pm

Posted in Excel

MDX on Cloudera Impala

You may have seen today’s announcement (and comment from Mary-Jo Foley) on the official SQL Server blog about Hortonworks Data Platform for Windows; I won’t repeat what it says, but it’s clear Microsoft is taking Hadoop and Big Data (yuk, I hate that term, but sometimes I can’t avoid using it…) very seriously. However, yesterday I saw another, very interesting announcement from the guys at Simba that presents an alternative vision for Microsoft BI and Big Data. You can see a demo of it here:

Simba MDX Provider for Impala

There are some more details here:
https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/impala-user/BniN1ELYNDs

Basically, Simba have built an OLEDB for OLAP Provider that translates the MDX generated by Excel PivotTables into HiveQL queries against Cloudera Impala. You can read a good, short overview of what Impala is here:
http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/
and there’s more detailed information here:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Impala+Frequently+Asked+Questions

In summary, what this gives you is fast, interactive analysis from within Excel going direct against large amounts of data stored via Impala, with no need for users to have to write Hive queries or stage the data in PowerPivot in the way the current Microsoft/Hortonworks solution does. Even more interesting is the fact that Simba support MDX calculated members and not just MDX queries, so given that Excel 2013 allows you to define your own calculated members you could do some very powerful analysis this way. Well, those of us who know some MDX, at least :-)

Impala is open source and isn’t shy about the debt it owes to Google’s Dremel, which of course is available publicly now as BigQuery. When I first saw BigQuery I thought putting an MDX interface over the top would make it appealing to a much wider audience – maybe not a good thing for Microsoft, but it would be a clever move on the part of Google certainly. Microsoft hasn’t announced that it’s working on anything comparable to BigQuery, alas; already, BI tools like Tableau and BIME can connect to and query BigQuery (Tableau also connects to Amazon’s new Redshift database), and it’s these tools that are Excel’s big competitors in the BI client tool space. I guess SSAS 2012 Tabular in DirectQuery mode going against PDW would be the only vaguely similar Microsoft solution, but PDW is on-prem only and pretty expensive. Translating MDX to the SQL used by tools like Impala, in the way that the Simba MDX Provider does, puts Excel on a more equal footing with Tableau et al. I don’t know how Simba/Cloudera will be making this MDX Provider available but I would be surprised if they didn’t charge for it; Microsoft’s close relationship with Hortonworks, a competitor to Cloudera, makes me think that Microsoft might not want to promote this particular tool either, which is a shame. Maybe an acceptable solution for MS would be to build new cartridges for SSAS 2012 and enable DirectQuery for data sources other than SQL Server? It’s unlikely to happen, I think, but it would be an option.

UPDATE: While I was writing this post, Cathy Dumas of Simba (you were wondering what happened to her, weren’t you?) also blogged about this:

http://blogs.simba.com/simba_technologies_ceo_co/2013/02/demo-microsoft-excel-pivottables-on-cloudera-impala-via-simba-mdx-provider.html

Written by Chris Webb

February 25, 2013 at 11:38 pm

Posted in Excel, MDX
