Archive for August 2013
Returning The Contents Of Any Table In Any Excel Workbook In SharePoint With Power Pivot And Power Query
Yet another Power Query post this week (with more to come) – today I’m going to show you how you can use Power Pivot and Power Query to build a dynamic solution that allows a user to browse through all the Excel workbooks stored in a SharePoint document library and see the contents of any table in any workbook they choose. Why would you want to do this? Well, I’m not really sure, but I visited a customer a few months ago where an analyst (a data steward?!) had created, in SharePoint, a repository of Excel workbooks containing various datasets used for BI. Maybe it could be useful to use Power Query to browse these datasets… and even if it isn’t, it’s still a learning experience for me :-)
The first step is to build a Power Query query that returns a list of all of the Excel workbooks in a SharePoint document library. There are a couple of ways of doing this; you could use the SharePoint.Contents() function, but I opted to use the OData REST API that SharePoint exposes because it was faster and easier to work with. The following Power Query query is all that’s needed to get a list of files in a library called MetadataTest:
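A minimal sketch of such a query, assuming a SharePoint 2010-style ListData.svc endpoint and a placeholder server name (the real URL will vary by site), might look like this:

```m
let
    // Connect to the SharePoint OData endpoint for list data;
    // "myserver" is a placeholder for your own server name
    Source = OData.Feed("http://myserver/_vti_bin/ListData.svc/MetadataTest")
in
    Source
```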
These all happen to be Excel workbooks, which makes things easier for me.
The next thing to do is to get a list of all the tables in any of these Excel workbooks. Again, you can use OData for this. Here’s an example query that requests the $metadata of the OData endpoint that an Excel workbook called FruitVegPeople.xlsx exposes, interprets the response as an XML document, and finds the list of tables in the workbook from that XML document (this last step accounts for 90% of the code below):
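A sketch of that query, with a placeholder site URL and the XML drill-down simplified (the exact navigation steps depend on the nested-table shape that Xml.Tables returns for the $metadata document, so treat the intermediate steps below as illustrative), might look something like this:

```m
let
    // Request the $metadata document from the Excel Services OData endpoint
    // (the server and library names are placeholders)
    Source = Xml.Tables(
        Web.Contents(
            "http://myserver/_vti_bin/ExcelRest.aspx/MetadataTest/FruitVegPeople.xlsx/OData/$metadata")),
    // Drill down through the EDMX document towards the EntityContainer;
    // these steps need adjusting to the actual document structure
    DataServices = Source{0}[Table],
    Schema = DataServices{0}[Table],
    EntityContainer = Schema{0}[Table],
    // The EntitySet elements inside the EntityContainer name the tables
    TableNames = Table.SelectColumns(EntityContainer, {"Attribute:Name"})
in
    TableNames
```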
In this case, there are three tables called Fruit, Vegetable and Person returned:
At this point you can find a list of files in a library and find a list of tables in an Excel workbook; what you really need is a single list of all the tables in all of the files. This can be accomplished by turning the second query above into a function, and calling that function for each file returned by the first query. Here’s the function:
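A sketch of the function, taking a workbook URL as its parameter and reusing the same (illustrative) drill-down as the previous query:

```m
let
    GetWorkbookTables = (WorkbookUrl as text) as table =>
    let
        // Request the $metadata document for one workbook's OData endpoint
        Source = Xml.Tables(Web.Contents(WorkbookUrl & "/OData/$metadata")),
        // Drill down to the EntitySet elements, as before
        // (illustrative steps; adjust to the actual document structure)
        DataServices = Source{0}[Table],
        Schema = DataServices{0}[Table],
        EntityContainer = Schema{0}[Table],
        TableNames = Table.SelectColumns(EntityContainer, {"Attribute:Name"})
    in
        TableNames
in
    GetWorkbookTables
```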
And here it is used:
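A sketch of the calling query: take the file list from the first query, add a column that invokes the function once per workbook, then expand the resulting nested tables. The Path and Name column names here are assumptions about what the ListData.svc endpoint returns:

```m
let
    // The file list from the first query; the server and library names
    // are placeholders
    Files = OData.Feed("http://myserver/_vti_bin/ListData.svc/MetadataTest"),
    // Call the function once per workbook
    AddedTables = Table.AddColumn(Files, "Tables",
        each GetWorkbookTables([Path] & [Name])),
    // Expand so there is one row per table per workbook
    Expanded = Table.ExpandTableColumn(AddedTables, "Tables",
        {"Attribute:Name"}, {"TableName"})
in
    Expanded
```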
This gives you the list of workbooks and tables you need, and this table itself can now be loaded into the Excel Data Model. You can now build a report using this data, something like this:
The slicers allow the user to select a folder, a workbook and a table in a workbook. In the top right hand corner is a PivotTable displaying a measure with the following definition:
This returns the URL containing the OData feed for the contents of the selected table. The PivotTable is a bit of a red herring, in fact – I only created it so the slicer highlighting would work properly, a trick I use quite frequently when I’m working with Excel cube functions (the PivotTable itself can be hidden from the end user). In the bottom right hand corner, in the cell selected in the screenshot above, is a CubeValue() function that returns the same measure value and I’ve put that cell into an Excel table – which means that we can now use this measure value as an input to yet another Power Query query.
Two last bits of code: first, a Power Query function that will return the contents of any OData feed passed to it.
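Something like the following one-liner would do – it simply wraps OData.Feed so it can be invoked with a URL supplied at refresh time:

```m
let
    // Takes the URL of an OData feed and returns its contents as a table
    GetAnyODataFeed = (FeedUrl as text) as table => OData.Feed(FeedUrl)
in
    GetAnyODataFeed
```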
Finally, a query that takes this function and calls it for the URL selected by the user above:
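A sketch of that final query, reading the URL out of the single-cell Excel table that holds the CubeValue() result (the table and column names here are assumptions):

```m
let
    // Read the single-cell Excel table containing the CubeValue() result
    Source = Excel.CurrentWorkbook(){[Name="SelectedUrl"]}[Content],
    // The first (and only) row contains the URL returned by the measure
    Url = Source{0}[SelectedUrl],
    // Pass it to the function defined above
    Output = GetAnyODataFeed(Url)
in
    Output
```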
And there you have it – a way of selecting any table in any Excel workbook in a SharePoint library and seeing its contents. You can download the sample workbook here, although of course you’ll have to modify all the URLs to make it work on your own SharePoint site.
PS Is it just me, or does “Power Query query” seem like a bit of a mouthful? Would it be ok just to talk about queries in future? That seems a bit misleading… maybe I should talk about “a Power Query”? That doesn’t sound right either. Hmmm.
This is just a quick post to summarise the Power BI-related information that has come to light over the last few weeks; this is all stuff that has been tweeted extensively (follow me!), but I know not everyone spends their life on Twitter so here you go:
- The Power BI mobile app has appeared in the Windows 8 app store. Check out these posts from Jen Underwood, A.J. Mee and Parikshit Savjani for more details:
There’s also a detailed video on YouTube showing a demo of it:
- The maximum size of a workbook you can upload to Power BI will be 250MB, as stated here:
- There’s a new option to use the HTML 5 viewer for Power View reports in the browser:
- The full Power BI online help has been published. Links to all the help docs and more can be found on this wiki page here:
There’s a lot of interesting stuff in there. For example, here are the details on how you can publish an OData feed from SSIS and therefore expose on-prem data sources to Power BI in the cloud:
And here’s a discussion of the role of the data steward in self-service BI scenarios, something Jamie Thomson also blogged about recently:
And here’s how to share Power Query queries with other people:
Now, all we need is for the full preview to be released…
The more I use Power Query and the M language that underpins it, the more I see how useful it is – quite apart from its abilities to import data from other data sources – as a third option (after regular Excel formulas and Power Pivot/DAX) to solve problems in Excel itself. For example, last week I read this blog post by David Hager about finding the number of unique values in a delimited string:
It’s an interesting question and the Excel formulas in this post are way beyond what I’m capable of writing. The point is, though, that Power Query can do this very easily indeed just through the UI. Starting with the following table in a worksheet:
You just need to import it into a Power Query query:
Use the Split Column/By Delimiter/By Comma option on the Input column:
This creates as many columns as there are values in the longest delimited list:
You can then use the Unpivot option on these new columns:
Then remove the Attribute column:
Next, select Remove Duplicates on the entire table:
Finally, do a Group By on the RowID column and Count the number of rows:
And bingo, you have the number of distinct values in each delimited list:
Here’s the complete code:
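All of the steps above were generated through the UI; the resulting script, assuming a source table named Input with RowID and Input columns and a maximum of four values per list, would look roughly like this:

```m
let
    Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],
    // Split the comma-delimited Input column; four columns assumes no
    // list has more than four values
    SplitCols = Table.SplitColumn(Source, "Input",
        Splitter.SplitTextByDelimiter(","),
        {"Input.1", "Input.2", "Input.3", "Input.4"}),
    // Unpivot the new columns into Attribute/Value pairs
    Unpivoted = Table.Unpivot(SplitCols,
        {"Input.1", "Input.2", "Input.3", "Input.4"}, "Attribute", "Value"),
    // The Attribute column (Input.1, Input.2...) is no longer needed
    RemovedAttribute = Table.RemoveColumns(Unpivoted, {"Attribute"}),
    // Remove duplicate RowID/Value combinations
    Deduped = Table.Distinct(RemovedAttribute),
    // Count the remaining rows per RowID to get the distinct value count
    Grouped = Table.Group(Deduped, {"RowID"},
        {{"DistinctCount", each Table.RowCount(_), type number}})
in
    Grouped
```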
Emboldened by this, I turned to another Excel challenge – this time from Chandoo’s blog:
This time the objective is to split a string containing blocks of text and numbers so that you get everything after the first block of numbers. The whole point of the challenge that Chandoo lays down is to do this in a single Excel formula with no VBA; I was just curious to see how easy it would be to solve the problem in Power Query, however many steps it took. Unfortunately this is not something you can do just through the UI and you need to write some M, but the code isn’t too bad:
Starting with the following table:
You then need to create a new column containing a list of the positions of every occurrence of a numeric character in each string:
Then create another new column containing a list of continuous numbers starting from the first number in the previous list and incrementing by 1:
Then create another new column containing the minimum value from a list of all the values that are in the previous list but not in the first list you created:
This gives the position of the first character in the second block of text, which can then be used to find a substring of the original text:
Here’s the complete code:
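Putting the four steps together, and again assuming a source table named Input with a single Input column, the script might look like this:

```m
let
    Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],
    // Step 1: positions of every numeric character in each string
    NumPositions = Table.AddColumn(Source, "NumPositions",
        each Text.PositionOfAny([Input], {"0".."9"}, Occurrence.All)),
    // Step 2: a contiguous run of numbers starting at the first position
    // found above, long enough to cover the whole string
    Sequence = Table.AddColumn(NumPositions, "Sequence",
        each List.Numbers(List.First([NumPositions]), Text.Length([Input]))),
    // Step 3: the smallest number in the run that is not itself a
    // numeric-character position, i.e. the position of the first
    // character after the first block of numbers
    FirstCharPos = Table.AddColumn(Sequence, "FirstCharPos",
        each List.Min(List.Difference([Sequence], [NumPositions]))),
    // Step 4: everything from that position to the end of the string
    Output = Table.AddColumn(FirstCharPos, "Output",
        each Text.Range([Input], [FirstCharPos]))
in
    Output
```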
My first reaction when I saw M was that I liked it, but I didn’t think Excel users were prepared to learn yet another language. I still think this (it would have been much better if Power Query, like Power Pivot/DAX, used a language that was consistent with the Excel formula language) but as you can see from these examples, sometimes having a choice of tools means it’s more likely that you can find an easy solution to a problem.
One of the many interesting things that caught my eye in the new SSAS Tabular Performance Tuning white paper is actually about new functionality in SSAS 2012 Multidimensional and has nothing to do with Tabular! It turns out that in the DAXMD release of SSAS 2012, i.e. SQL Server 2012 SP1 CU4, some work was done to enable the Formula Engine to cache the results of MDX calculations for longer than the lifetime of a query when a query includes a subselect. Here’s an excerpt from the paper:
Certain types of queries cannot benefit from MDX caching. For example, multi-select filters (represented in the MDX query with a subselect) prevent use of the global MDX formula engine cache. However, this limitation was lifted for most subselect scenarios as part of SQL Server 2012 SP1 CU4 (http://support.microsoft.com/kb/2833645/en-us). However, subselect global scope caching is still prevented for arbitrary shapes (http://blog.kejser.org/2006/11/16/arbitrary-shapes-in-as-2005/), transient calculations like NOW(), and queries without a dimension on rows or columns. If the business requirements for the report do not require visual totals and caching is not occurring for your query, consider changing the query to use a set in the WHERE clause instead of a subselect as this enables better caching in some scenarios.
This is a subject I’ve blogged about in the past, both for SSRS reports (which almost always use subselects in the MDX created by the query editor) and Excel PivotTables (which sometimes, but not always, use subselects), and you may want to read these posts to get some background. In my experience, if you have a lot of complex MDX calculations on your cube (financial applications are a great example), this issue can have a major impact on your overall query performance, even if it isn’t immediately obvious that this is the case. On builds of SSAS before 2012 SP1 CU4, even if Storage Engine caching is working properly, if a query references a lot of MDX calculations and includes a subselect it will be consistently slow however many times you run it, because the calculations will need to be re-evaluated every time the query is run.
I’ve heard of a few problems with CU4 regarding SSRS so I don’t recommend upgrading your production SSAS servers just yet, but when these problems have been ironed out in the next full service pack I think this could be a compelling reason for many people to move to SSAS 2012. There’s also still a limitation whereby queries that return a single cell value and use a subselect may still not be able to use the global Formula Engine cache, but hopefully this will be dealt with in a future release too. Overall, though, I’m extremely pleased to see yet another improvement to the Multidimensional engine.
Thanks to Jeffrey Wang for answering my questions about this functionality.
In case you haven’t already heard via Twitter, a new white paper on performance tuning Analysis Services 2012 Tabular models was released at the end of last week. You can download it here:
It’s good stuff, required reading for anyone working with SSAS Tabular. Kudos to the authors, John Sirmon, Greg Galloway, Cindy Gross and Karan Gulati!