Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for August 2013

Returning The Contents Of Any Table In Any Excel Workbook In SharePoint With Power Pivot And Power Query


Yet another Power Query post this week (with more to come) – today I’m going to show you how you can use Power Pivot and Power Query to build a dynamic solution that allows a user to browse through all the Excel workbooks stored in a SharePoint document library and see the contents of any table in any workbook they choose. Why would you want to do this? Well, I’m not really sure, but I visited a customer a few months ago where an analyst (a data steward?!) had created a repository in SharePoint of Excel workbooks containing various datasets used for BI. Maybe it could be useful to use Power Query to browse these datasets… and even if it isn’t, it’s still a learning experience for me :-)

The first step is to build a Power Query query that returns a list of all of the Excel workbooks in a SharePoint document library. There are a couple of ways of doing this: you could use the SharePoint.Contents() function, but I opted to use the OData REST API that SharePoint exposes because it was faster and easier to work with. The following Power Query query is all that’s needed to get a list of files in a library called MetadataTest:

let

    Source = OData.Feed("https://zzzz.sharepoint.com/_vti_bin/listdata.svc/MetadataTest"),

    RemovedOtherColumns = Table.SelectColumns(Source,{"Id", "ContentType", "Path", "Name"})

in

    RemovedOtherColumns

 

[screenshot]

These all happen to be Excel workbooks, which makes things easier for me.
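If the library contained a mixture of file types, you could filter it down to just the workbooks first; here’s a minimal sketch, reusing the query above (the site URL and library name are the placeholders used earlier):

```
let
    Source = OData.Feed("https://zzzz.sharepoint.com/_vti_bin/listdata.svc/MetadataTest"),
    RemovedOtherColumns = Table.SelectColumns(Source, {"Id", "ContentType", "Path", "Name"}),
    // keep only the files whose names end in .xlsx, case-insensitively
    ExcelWorkbooksOnly = Table.SelectRows(RemovedOtherColumns,
        each Text.EndsWith(Text.Lower([Name]), ".xlsx"))
in
    ExcelWorkbooksOnly
```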

The next thing to do is to get a list of all the tables in any of these Excel workbooks. Again, you can use OData for this. Here’s an example query that requests the $metadata of the OData endpoint that an Excel workbook called FruitVegPeople.xlsx exposes, interprets the response as an XML document, and finds the list of tables in the workbook from that XML document (this last step accounts for 90% of the code below):

let

    Source = Xml.Document(
        OData.Feed(
            "https://zzzz.sharepoint.com/_vti_bin/ExcelRest.aspx/MetadataTest/FruitVegPeople.xlsx/OData/$metadata"
        )),

    Value = Source{0}[Value],

    Value1 = Value{0}[Value],

    Value2 = Value1{0}[Value],

    Value3 = Value2{3}[Value],

    #"Expand Value" = Table.ExpandTableColumn(Value3, "Value"

        , {"Name", "Namespace", "Value", "Attributes"}

        , {"Value.Name", "Value.Namespace", "Value.Value", "Value.Attributes"}),

    #"Expand Attributes" = Table.ExpandTableColumn(#"Expand Value", "Attributes"

        , {"Name", "Namespace", "Value"}

        , {"Attributes.Name", "Attributes.Namespace", "Attributes.Value"}),

    FilteredRows = Table.SelectRows(#"Expand Attributes", each ([Attributes.Name] = "Name")),

    RemovedOtherColumns = Table.SelectColumns(FilteredRows,{"Attributes.Value"}),

    RenamedColumns = Table.RenameColumns(RemovedOtherColumns,{{"Attributes.Value", "TableName"}})

in

    RenamedColumns

 

In this case, there are three tables called Fruit, Vegetable and Person returned:

[screenshot]

At this point you can find a list of files in a library and find a list of tables in an Excel workbook; what you really need is a single list of all the tables in all of the files. This can be accomplished by turning the second query above into a function, and calling that function for each file returned by the first query. Here’s the function:

let

    Source = (SharepointRootURL, ExcelSourceDoc) => 

 

let

    Source = 

    Xml.Document(

    OData.Feed(SharepointRootURL & "/_vti_bin/ExcelRest.aspx" & ExcelSourceDoc & "/OData/$metadata"

    )

    ),

    Value = Source{0}[Value],

    Value1 = Value{0}[Value],

    Value2 = Value1{0}[Value],

    Value3 = Value2{3}[Value],

    #"Expand Value" = Table.ExpandTableColumn(Value3, "Value",
        {"Name", "Namespace", "Value", "Attributes"},
        {"Value.Name", "Value.Namespace", "Value.Value", "Value.Attributes"}),

    #"Expand Attributes" = Table.ExpandTableColumn(#"Expand Value", "Attributes",
        {"Name", "Namespace", "Value"},
        {"Attributes.Name", "Attributes.Namespace", "Attributes.Value"}),

    FilteredRows = Table.SelectRows(#"Expand Attributes", each ([Attributes.Name] = "Name")),

    RemovedOtherColumns = Table.SelectColumns(FilteredRows,{"Attributes.Value"}),

    RenamedColumns = Table.RenameColumns(RemovedOtherColumns,{{"Attributes.Value", "TableName"}})

in

    RenamedColumns

 

in

    Source

 

And here it is used:

let

    SharepointRootURL = "https://zzzz.sharepoint.com",

    Source = OData.Feed(SharepointRootURL & "/_vti_bin/listdata.svc/MetadataTest"),

    RemovedOtherColumns = Table.SelectColumns(Source,{"Id", "ContentType", "Path", "Name"}),

    InsertedCustom = Table.AddColumn(RemovedOtherColumns, "Custom",
        each GetTablesFunction(SharepointRootURL, [Path] & "/" & [Name])),

    #"Expand Custom" = Table.ExpandTableColumn(InsertedCustom, "Custom", {"TableName"}, {"TableName"}),

    InsertedCustom1 = Table.AddColumn(#"Expand Custom", "TableODataURL",
        each SharepointRootURL & "/_vti_bin/ExcelRest.aspx" & [Path] & "/" & [Name] & "/OData/" & [TableName]),

    RemovedColumns = Table.RemoveColumns(InsertedCustom1,{"ContentType", "Id"})

in

    RemovedColumns

 

[screenshot]

This gives you the list of workbooks and tables you need, and this table can now be loaded into the Excel Data Model. You can then build a report using this data, something like this:

[screenshot]

The slicers allow the user to select a folder, a workbook and a table in a workbook. In the top right-hand corner is a PivotTable displaying a measure with the following definition:

TableODataURLMeasure:=

IF(HASONEVALUE(GetDocumentsTables[TableODataURL]), VALUES(GetDocumentsTables[TableODataURL]))

This returns the URL of the OData feed for the contents of the selected table. The PivotTable is a bit of a red herring, in fact – I only created it so the slicer highlighting would work properly, a trick I use quite frequently when I’m working with Excel cube functions (the PivotTable itself can be hidden from the end user). In the bottom right-hand corner, in the cell selected in the screenshot above, is a CUBEVALUE() function that returns the same measure value, and I’ve put that cell into an Excel table – which means we can now use this measure value as an input to yet another Power Query query.
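For reference, a cell formula along these lines will return the measure outside the PivotTable; the connection name is the Excel 2013 default for the Data Model, but the slicer names here are just examples from my workbook, so treat them as placeholders:

```
=CUBEVALUE("ThisWorkbookDataModel",
    "[Measures].[TableODataURLMeasure]",
    Slicer_Path, Slicer_Name, Slicer_TableName)
```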

Two last bits of code: first, a Power Query function that will return the contents of any OData feed passed to it.

let

    Source = (ODataTableURL) => 

 

let

    Source = OData.Feed(ODataTableURL),

    RemovedColumns = Table.RemoveColumns(Source,{"excelUpdated", "excelRowID"})

in

    RemovedColumns

 

in

    Source

 

Finally, a query that takes this function and calls it for the URL selected by the user above:

let

    Source = GetTableContentsFunction(Excel.CurrentWorkbook(){[Name="TableODataURL"]}[Content]{0}[URL])

in

    Source

 

And there you have it – a way of selecting any table in any Excel workbook in a SharePoint library and seeing its contents. You can download the sample workbook here, although of course you’ll have to modify all the URLs to make it work.

PS Is it just me, or does “Power Query query” seem like a bit of a mouthful? Would it be ok just to talk about queries in future? That seems a bit misleading… maybe I should talk about “a Power Query”? That doesn’t sound right either. Hmmm.

Written by Chris Webb

August 29, 2013 at 11:28 pm

Posted in Power Query

Power BI News Roundup


This is just a quick post to summarise the Power BI-related information that has come to light over the last few weeks; this is all stuff that has been tweeted extensively (follow me!), but I know not everyone spends their life on Twitter so here you go:

Now, all we need is for the full preview to be released…

Written by Chris Webb

August 22, 2013 at 10:48 pm

Posted in Power BI

Some Power Query/M Examples


The more I use Power Query and the M language that underpins it, the more I see how useful it is – quite apart from its ability to import data from other data sources – as a third option (after regular Excel formulas and Power Pivot/DAX) for solving problems in Excel itself. For example, last week I read this blog post by David Hager about finding the number of unique values in a delimited string:

http://dailydoseofexcel.com/archives/2013/08/07/calculating-the-number-of-unique-items-in-a-delimited-string/

It’s an interesting question, and the Excel formulas in that post are way beyond what I’m capable of writing. The point is, though, that Power Query can do this very easily indeed, just through the UI. Starting with the following table in a worksheet:

[screenshot]

You just need to import it into a Power Query query:

[screenshot]

Use the Split Column/By Delimiter/By Comma option on the Input column:

[screenshot]

This creates as many columns as there are values in the longest delimited list:

[screenshot]

You can then use the Unpivot option on these new columns:

[screenshot]

Then remove the Attribute column:

[screenshot]

Next, select Remove Duplicates on the entire table:

[screenshot]

Finally, do a Group By on the RowID column and Count the number of rows:

[screenshot]

And bingo, you have the number of distinct values in each delimited list:

[screenshot]

Here’s the complete code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    SplitColumnDelimiter = Table.SplitColumn(Source,"Input",Splitter.SplitTextByDelimiter(","),13),

    Unpivot = Table.Unpivot(SplitColumnDelimiter,
        {"Input.1", "Input.2", "Input.3", "Input.4", "Input.5", "Input.6", "Input.7",
         "Input.8", "Input.9", "Input.10", "Input.11", "Input.12", "Input.13"},
        "Attribute", "Value"),

    RemovedColumns = Table.RemoveColumns(Unpivot,{"Attribute"}),

    DuplicatesRemoved = Table.Distinct(RemovedColumns),

    GroupedRows = Table.Group(DuplicatesRemoved, {"RowID"},
        {{"Count of Distinct Values", each Table.RowCount(_), type number}})

in

    GroupedRows
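One caveat: the 13 passed to Table.SplitColumn() is hard-coded because that happens to be the length of the longest list in my sample data. If you don’t want to rely on that, you can work out the number of columns needed first; here’s a sketch, assuming the same Table1 input and a build of Power Query that supports Text.Split():

```
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // find the largest number of comma-delimited values in any row
    MaxValues = List.Max(
        List.Transform(Source[Input], each List.Count(Text.Split(_, ",")))),
    // generate matching column names Input.1 .. Input.n
    NewColumnNames = List.Transform({1..MaxValues}, each "Input." & Text.From(_)),
    SplitColumnDelimiter = Table.SplitColumn(Source, "Input",
        Splitter.SplitTextByDelimiter(","), NewColumnNames)
in
    SplitColumnDelimiter
```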

 

Emboldened by this, I turned to another Excel challenge – this time from Chandoo’s blog:

http://chandoo.org/wp/2013/07/16/formula-challenge-001-1/

This time the objective is to split a string containing blocks of text and numbers so that you get everything after the first block of numbers. The whole point of the challenge that Chandoo lays down is to do this in a single Excel formula with no VBA; I was just curious to see how easy it would be to solve the problem in Power Query, however many steps it took. Unfortunately this isn’t something you can do just in the UI – you need to write some M – but the code isn’t too bad.

Starting with the following table:

[screenshot]

You then need to create a new column containing a list of the positions of every occurrence of a numeric character in each string:

[screenshot]

Then create another new column containing a list of continuous numbers starting from the first number in the previous list and incrementing by 1:

[screenshot]

Then create another new column containing the minimum value from a list of all the values that are in the previous list but not in the first list you created:

[screenshot]

This gives the position of the first character in the second block of text, which can then be used to find a substring of the original text:

[screenshot]

Here’s the complete code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    InsertedCustom = Table.AddColumn(Source, "NumericPositions", 

    each Text.PositionOfAny([Input], {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}, Occurrence.All)),

    InsertedCustom1 = Table.AddColumn(InsertedCustom, "NumberOfPositions", 

    each List.Numbers(List.Min([NumericPositions]), List.Count([NumericPositions]))),

    InsertedCustom2 = Table.AddColumn(InsertedCustom1, "StartChar", 

    each List.Min(List.Difference([NumberOfPositions], [NumericPositions]))),

    InsertedCustom3 = Table.AddColumn(InsertedCustom2, "Output", each Text.Range([Input], [StartChar]))

in

    InsertedCustom3

 

My first reaction when I saw M was that I liked it, but I didn’t think Excel users would be prepared to learn yet another language. I still think this (it would have been much better if Power Query, like Power Pivot/DAX, used a language consistent with the Excel formula language), but as you can see from these examples, sometimes having a choice of tools makes it more likely that you can find an easy solution to a problem.

Written by Chris Webb

August 16, 2013 at 11:31 am

Posted in Power BI, Power Query

Subselects And Formula Engine Caching In SSAS 2012


One of the many interesting things that caught my eye in the new SSAS Tabular Performance Tuning white paper is actually about new functionality in SSAS 2012 Multidimensional, and nothing to do with Tabular! It turns out that in the DAXMD release of SSAS 2012, i.e. SQL Server 2012 SP1 CU4, some work was done to enable the Formula Engine to cache the results of MDX calculations for longer than the lifetime of a query when that query includes a subselect. Here’s an excerpt from the paper:

Certain types of queries cannot benefit from MDX caching. For example, multi-select filters (represented in the MDX query with a subselect) prevent use of the global MDX formula engine cache. However, this limitation was lifted for most subselect scenarios as part of SQL Server 2012 SP1 CU4 (http://support.microsoft.com/kb/2833645/en-us). However, subselect global scope caching is still prevented for arbitrary shapes (http://blog.kejser.org/2006/11/16/arbitrary-shapes-in-as-2005/), transient calculations like NOW(), and queries without a dimension on rows or columns. If the business requirements for the report do not require visual totals and caching is not occurring for your query, consider changing the query to use a set in the WHERE clause instead of a subselect as this enables better caching in some scenarios.

This is a subject I’ve blogged about in the past, both for SSRS reports (which almost always use subselects in the MDX created by the query editor) and Excel PivotTables (which sometimes, but not always, use subselects), and you may want to read those posts to get some background. In my experience, if you have a lot of complex MDX calculations on your cube (financial applications are a great example), this issue can have a major impact on your overall query performance, even if it isn’t immediately obvious that this is the case. On builds of SSAS before 2012 SP1 CU4, even if Storage Engine caching is working properly, a query that references a lot of MDX calculations and includes a subselect will be consistently slow however many times you run it, because the calculations have to be re-evaluated every time the query runs.
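To make the workaround mentioned at the end of the excerpt concrete, here’s a pair of illustrative queries against the Adventure Works sample cube (the member keys are just examples): the first uses a subselect and so can’t benefit from the global Formula Engine cache; the second moves the same set into the WHERE clause, which can.

```
-- subselect form: blocks global Formula Engine caching
SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS ON 1
FROM (SELECT {[Product].[Category].&[1], [Product].[Category].&[3]} ON 0
      FROM [Adventure Works])

-- set in the WHERE clause: global caching is possible
SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]
WHERE ({[Product].[Category].&[1], [Product].[Category].&[3]})
```

Remember, as the excerpt says, the two forms are only interchangeable when the report doesn’t require visual totals.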

I’ve heard of a few problems with CU4 regarding SSRS so I don’t recommend upgrading your production SSAS servers just yet, but when these problems have been ironed out in the next full service pack I think this could be a compelling reason for many people to move to SSAS 2012. There’s also still a limitation whereby queries that return a single cell value and use a subselect may still not be able to use the global Formula Engine cache, but hopefully this will be dealt with in a future release too. Overall, though, I’m extremely pleased to see yet another improvement to the Multidimensional engine.

Thanks to Jeffrey Wang for answering my questions about this functionality.

Written by Chris Webb

August 7, 2013 at 12:10 am

SSAS Tabular Performance Tuning White Paper


In case you haven’t already heard via Twitter, a new white paper on performance tuning Analysis Services 2012 Tabular models was released at the end of last week. You can download it here:

http://msdn.microsoft.com/en-us/library/dn393915.aspx

It’s good stuff, required reading for anyone working with SSAS Tabular. Kudos to the authors, John Sirmon, Greg Galloway, Cindy Gross and Karan Gulati!

Written by Chris Webb

August 4, 2013 at 11:31 pm
