Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Power Query’ Category

Working With Excel Named Ranges In Power Query

One of the more recent additions to Power Query is the ability to access data from named ranges in the Excel worksheet rather than an Excel table. I’ve got used to formatting data as tables in Excel because that’s what Power Pivot needs to import data directly from the worksheet, but if you are working with Power Query and a pre-existing workbook then trying to reformat data as tables can be a pain. Also, if you just want to import a single value, for example as a parameter to a query, a table seems like overkill.

(Incidentally, if you’re wondering what a named range is in Excel, there are tons of good introductions to the subject on the internet like this one. You can do loads of cool stuff with them.)

Consider the following Excel worksheet:

image

There are three named ranges here: FirstRange, SecondRange and ThirdRange, and the values in the cells show which range the cells are in. FirstRange consists of two cells in two columns; SecondRange consists of three cells in a single row; and ThirdRange consists of three non-contiguous cells. (You can also use this trick to display the names of all contiguous ranges in an Excel workbook, but alas it does not seem to work for non-contiguous ranges.)

At the moment, the Power Query ribbon doesn’t make it obvious that you can use named ranges as data sources. However, you can see all the tables and named ranges in a workbook, and even return that list from a query, by creating a blank query and using the expression

= Excel.CurrentWorkbook()

image

Once you’ve done this you can see all the ranges (and also any tables) in the workbook, and click on the table link next to the name to see the data. For example, clicking on FirstRange shows the following table in a new step in the query editor:

image

The expression to get at this table in a single step is:

= Excel.CurrentWorkbook(){[Name="FirstRange"]}[Content]

The range SecondRange in my example is equally straightforward to reference, and you can see its contents by using the expression

= Excel.CurrentWorkbook(){[Name="SecondRange"]}[Content]

image

Unfortunately ThirdRange, which is not contiguous, is a problem: I can only get the first cell in the range. So the expression

= Excel.CurrentWorkbook(){[Name="ThirdRange"]}[Content]

returns just this table:

image

It would be nice if we could get a list containing the cell values, rather than a table, for ranges like this…
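For contiguous ranges, at least, the table can be flattened into a list in code. Here is a minimal sketch, assuming the SecondRange named range from the worksheet above. It does nothing to solve the ThirdRange problem, because only the first cell of a non-contiguous range is returned in the first place:

```powerquery
let
    //Get the table for the named range
    RangeTable = Excel.CurrentWorkbook(){[Name="SecondRange"]}[Content],
    //Table.ToRows returns a list of rows, each row being a list of cell values
    Rows = Table.ToRows(RangeTable),
    //List.Combine flattens the list of lists into a single list
    CellValues = List.Combine(Rows)
in
    CellValues
```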

Last thing to mention is that if you do want the value in a cell, rather than a table, you just need to right-click inside the cell in the Query Editor and select Drill Down:

image

This returns the value (in this case the text “Third Range Cell 1”) in the cell you clicked on:

image

This is a much more useful value to return than a table containing a single row/column, if you intend to use a value from a single cell in a named range as a parameter to another query.
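The same drill-down can be written by hand in a single query. This sketch assumes that the table returned for the named range uses the default column name Column1; check the column header in the Query Editor, because that name is my assumption and is not shown in this post. {0} picks the first row of the table and [Column1] picks the field from that row:

```powerquery
let
    //Get the table for the named range
    Source = Excel.CurrentWorkbook(){[Name="ThirdRange"]}[Content],
    //First row, then the value of the (assumed) Column1 field in that row
    CellValue = Source{0}[Column1]
in
    CellValue
```

The result is a single value rather than a table, so it can be referenced directly as a parameter in another query.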

You can download the sample workbook for this post here.

Written by Chris Webb

July 22, 2014 at 9:30 am

Posted in Excel, Power Query

Power Query Book Published!

Looking for some summer holiday (or winter holiday, depending on which hemisphere you live in) reading? If so, may I suggest my new Power Query book? “Power Query for Power BI and Excel” is available now from the Apress site, Amazon.com, Amazon.co.uk and all good bookstores.

Power Query for Power BI and Excel Cover Image

It’s an introductory level book. It covers all of the stuff you can do in the UI, it has a chapter on M, and it goes into a reasonable amount of detail on more advanced topics; it is not a 500-page exhaustive guide to the product. I’ve focused on readability and teaching the fundamentals of Power Query rather than looking at every obscure M function, but at the same time if you’ve already used Power Query I think there’ll be plenty of material in there you’ll find interesting.

Now for the bad news: the book is out-of-date already, although not by much. One of the best things about Power Query is the monthly release cycle; unfortunately that makes writing a book on it a bit of a nightmare. I started off writing in January and had to deal with lots of added functionality and changes to the UI over the next few months; I had to retake pretty much all of the screenshots as a result. The published version of the book is based on the version of Power Query that was released in early June rather than the current version. Hopefully you can forgive this – the differences are minor – but it’s a good reason to buy the book as soon as you can! I want to do a second edition in a year’s time once (if?) the release cycle slows down.

I’ve been teased a bit for blogging and teaching so much about Power Query recently, so the final thing I want to say here is why an old corporate BI/SSAS guy like me is getting so excited about a self-service ETL tool. Well, the main reason is that Power Query is a great piece of software. It does what it does very well; it does useful things rather than what the marketing guys/analysts/journalists think is hot in BI; it is easy to use but at the same time is flexible enough for the advanced user to do really complex stuff; it is updated regularly based on feedback from its users. I only wish all Microsoft software was this good… Honestly, I wouldn’t be able to motivate myself to blog and write about Power Query if I didn’t think it was cool, and even though it hasn’t been hyped in the same way as other parts of the Power BI stack it is nonetheless the part that people get excited about when I show them Power BI. It’s not just me either – every day I see positive comments like Greg Low’s here. I think it is as important as, if not more important than, Power Pivot and I think it will be a massive success.

Oh, and did I mention that I’m also teaching a Power Query course in London later this year….?

Written by Chris Webb

July 12, 2014 at 3:09 pm

Posted in Books, Power BI, Power Query

Using List.Generate() To Make Multiple Replacements Of Words In Text In Power Query

Recently I had a request for help from someone who wanted to do the following in Power Query: take a piece of text and then, using a table, search for all of the occurrences of the words in one column of the table in the text and replace those words with those in the other column. So, for example, given these two tables in Excel:

image

You want to take the table on the left and for each piece of text replace the words in the ‘Word To Replace’ column of the right-hand table with those in the ‘Replace With’ column of the right-hand table. The output would therefore be:

image

An interesting challenge in itself, and one I solved first of all using a recursive function. Here’s some code showing how I did it:

let
    //Get table of word replacements
    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],
    //Get table containing text to change
    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],
    //Get a list of all words to replace
    WordsToReplace = Table.Column(Replacements, "Word To Replace"),
    //Get a list of all words to replace with
    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),
    //Recursive function to do the replacement
    ReplacementFunction = (InputText, Position)=>
    let
     //Use Text.Replace to do each replace
     ReplaceText = Text.Replace(
            InputText,
            WordsToReplace{Position},
            WordsToReplaceWith{Position})
    in
     //If we have reached the end of the list of replacements
     if Position=List.Count(WordsToReplace)-1
      then
      //return the output of the query
      ReplaceText
      else
      //call the function again
      @ReplacementFunction(ReplaceText, Position+1),
    //Add a calculated column to call the function on every row in the table
    //containing text to change
    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text], 0))
in
    Output

 

It does the job, but… after thinking about this some more, I wondered if there was a better way. A lot of my recent Power Query blog posts have used recursive functions, but are they a Good Thing? So I asked on the forum, and as usual the nice people on the Power Query dev team answered very promptly (that’s one of the things I like about the Power Query dev team – they engage with their users). Recursive functions are indeed something that should be avoided if there is an alternative, and in this case List.Generate() can be used instead. Here’s how:

let
    //Get table of word replacements
    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],
    //Get table containing text to change
    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],
    //Get list of words to replace
    WordsToReplace = Table.Column(Replacements, "Word To Replace"),
    //Get list of words to replace them with
    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),
    //A non-recursive function to do the replacements
    ReplacementFunction = (InputText)=>
     let
       //Use List.Generate() to do the replacements
       DoReplacement = List.Generate(
                          ()=> [Counter=0, MyText=InputText],
                          each [Counter]<=List.Count(WordsToReplaceWith),
                          each [Counter=[Counter]+1,
                                MyText=Text.Replace(
                                         [MyText],
                                         WordsToReplace{[Counter]},
                                         WordsToReplaceWith{[Counter]})],
                          each [MyText]),
       //Return the last item in the list that
       //List.Generate() returns
       GetLastValue = List.Last(DoReplacement)
     in
      GetLastValue,
    //Add a calculated column to call the function on every row in the table
    //containing the text to change
    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text]))
in
    Output

 

List.Generate() is a very powerful function indeed, albeit one that took me a while to understand properly. It’s a bit like a FOR loop even if it’s a function that returns a list. Here’s what each of the parameters I’m passing to the function in the example above does:

  • ()=> [Counter=0, MyText=InputText] returns a function that itself returns a record (a record is a bit like a table with just one row in it). The record contains two fields: Counter, which has the value 0, and MyText which is given the value of the text where the values are to be replaced. This record is the initial value that List.Generate() will modify at each iteration.
  • each [Counter]<=List.Count(WordsToReplaceWith) returns a function too. An each expression is a quick way of declaring a function that takes one, unnamed parameter, and in this case the value that will be passed to this parameter is a record of the same structure as the one declared in the previous bullet. The expression [Counter] gets the value of the Counter field from that record. The function returns a boolean value, true when the value in the [Counter] field of the record is less than or equal to the number of items in the list of words to replace. List.Generate() returns a list, and while this function returns true it will keep on iterating and adding new items to the list it returns.
  • each [Counter=[Counter]+1, MyText=Text.Replace([MyText], WordsToReplace{[Counter]}, WordsToReplaceWith{[Counter]})] returns yet another function, once again declared using an each expression. The function here takes the record from the current iteration and returns the record to be used at the next iteration: a record where the value of the Counter field is increased by one, and where the value of the MyText field has one word replaced. The word that gets replaced in MyText is the word in the (zero-based) row number given by Counter in the ‘Word To Replace’ column; this word is replaced by the word in the row number given by Counter in the ‘Replace With’ column.
  • each [MyText] returns a very simple function, one that returns the value from the MyText field of the record from the current iteration. It’s the value that this function returns that is added to the list returned by List.Generate() at every iteration.
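Stripped of the record and the text replacement, the same pattern can be seen in a query that just counts. This is a minimal sketch of my own rather than anything from the sample workbook: the state is a plain number instead of a record, and the optional fourth parameter is left out, so each state value is itself added to the list.

```powerquery
let
    Numbers = List.Generate(
                ()=> 1,          //initial value
                each _ <= 5,     //keep iterating while this returns true
                each _ + 1)      //work out the value for the next iteration
in
    Numbers
```

This returns the list {1, 2, 3, 4, 5}.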

To illustrate this, here’s a simplified example showing how List.Generate() works in this case:

let
    WordsToReplace = {"cat", "dog", "mat"},
    WordsToReplaceWith = {"fish", "snake", "ground"},
    Demo = List.Generate(
                          ()=> [Counter=0, MyText="the cat and the dog sat on the mat"],
                          each [Counter]<=List.Count(WordsToReplaceWith),
                          each [Counter=[Counter]+1,
                                MyText=Text.Replace(
                                         [MyText],
                                         WordsToReplace{[Counter]},
                                         WordsToReplaceWith{[Counter]})],
                          each [MyText])
in
    Demo

 

The output of this query is the list:

image

This list can be written as (with the words changed at each iteration highlighted):

{“the cat and the dog sat on the mat”, “the fish and the dog sat on the mat”,  “the fish and the snake sat on the mat”, “the fish and the snake sat on the ground”}
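If only the final text is needed and not the intermediate values, List.Accumulate() is another option. This is a different technique from the one described in this post, sketched here on the assumption that the two lists are the same length; List.Zip() pairs each word with its replacement and List.Accumulate() applies one replacement per pair.

```powerquery
let
    WordsToReplace = {"cat", "dog", "mat"},
    WordsToReplaceWith = {"fish", "snake", "ground"},
    //Pair up the lists: {{"cat", "fish"}, {"dog", "snake"}, {"mat", "ground"}}
    Pairs = List.Zip({WordsToReplace, WordsToReplaceWith}),
    //Start with the original text and apply one replacement per pair
    Output = List.Accumulate(
                Pairs,
                "the cat and the dog sat on the mat",
                (CurrentText, Pair)=> Text.Replace(CurrentText, Pair{0}, Pair{1}))
in
    Output
```

This returns the single text value "the fish and the snake sat on the ground", which is the same as the last item in the list above.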

So, another useful function to know about. I’m slowly getting to grips with all this functional programming!

You can download the sample workbook here.

Written by Chris Webb

June 25, 2014 at 11:33 pm

Posted in Power Query

Join Conditions in Power Query, Part 2: Events-In-Progress, Performance and Query Folding

In my last post you saw how to join two tables together using conditions other than the built-in inner, outer and anti join conditions. However, as I mentioned, you need to be aware of some of the performance implications of using this technique – and that is the subject I’ll be looking at in this post.

Let’s take the events-in-progress problem, one that I have blogged about many times in the past (see here, here, here and here for example) and see how we can solve it in Power Query. It’s a very common problem that you encounter when you have a fact table where each row represents an event, there are columns containing the start and end dates of each event, and you want to count the number of events that were in progress on any given date.

Using the Adventure Works DW database, my normal way of illustrating the problem is this: taking the DimDate and FactInternetSales tables, for each date show the number of purchases that had been ordered but not shipped on that date. One way of solving this problem (but not the only way, as my previous blog posts show) is to do a cross join between the DimDate and FactInternetSales tables, then filter the result so that you get the rows where the value in the DateKey column from the DimDate table is between the values in the OrderDateKey and the ShipDateKey columns from the FactInternetSales table.
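To make the logic easy to check by hand, here is the same idea on two tiny in-memory tables. The table contents are invented purely for illustration and are not Adventure Works data; only the column names match.

```powerquery
let
    //Three dates and three orders, made up for this sketch
    Dates = #table({"DateKey"}, {{20140101}, {20140102}, {20140103}}),
    Orders = #table({"OrderDateKey", "ShipDateKey"},
                {{20140101, 20140102}, {20140102, 20140103}, {20140103, 20140103}}),
    //Cross join: put the whole Orders table in every row of Dates, then expand it
    CrossJoin = Table.ExpandTableColumn(
                    Table.AddColumn(Dates, "Orders", each Orders),
                    "Orders", {"OrderDateKey", "ShipDateKey"}),
    //Keep only the rows where the date falls between the order and ship dates
    InProgress = Table.SelectRows(CrossJoin,
                    each [DateKey] >= [OrderDateKey] and [DateKey] <= [ShipDateKey]),
    //Count the orders in progress on each date
    Counts = Table.Group(InProgress, {"DateKey"},
                    {{"Count", each Table.RowCount(_), type number}})
in
    Counts
```

With this data there is one order in progress on 20140101, two on 20140102 and two on 20140103.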

As a first attempt you might come up with a query like this one, which uses a custom column containing a table value that is subsequently expanded to do a cross join (the technique shown in my previous post):

let
    //Connect to SQL Server
    Source = Sql.Database("localhost", "Adventure Works DW"),
    //Get data from the DimDate table
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    //Remove all columns except the DateKey column
    RemovedOtherColumns = Table.SelectColumns(dbo_DimDate,{"DateKey"}),
    //Insert a custom column that contains the whole of FactInternetSales as a table in each row
    InsertedCustom = Table.AddColumn(RemovedOtherColumns, "FactInternetSalesTable",
                        each Source{[Schema="dbo",Item="FactInternetSales"]}[Data]),
    //Expand this new column to show the OrderDateKey and ShipDateKey columns
    #"Expand FactInternetSalesTable" = Table.ExpandTableColumn(InsertedCustom,
                        "FactInternetSalesTable",
                        {"OrderDateKey", "ShipDateKey"},
                        {"FactInternetSalesTable.OrderDateKey",
                        "FactInternetSalesTable.ShipDateKey"}),
    //Filter where DateKey is greater than or equal to OrderDateKey and
    //DateKey is less than or equal to ShipDateKey
    FilteredRows = Table.SelectRows(#"Expand FactInternetSalesTable",
                        each [DateKey] >= [FactInternetSalesTable.OrderDateKey] and
                        [DateKey] <= [FactInternetSalesTable.ShipDateKey]),
    //Find the count of the number of rows grouped by DateKey
    GroupedRows = Table.Group(FilteredRows, {"DateKey"},
                        {{"Count", each Table.RowCount(_), type number}})
in
    GroupedRows

There is, however, a big problem with this query: on my laptop it runs and runs forever – well, maybe not forever but I cancelled it after several minutes. Some tuning is necessary.

I don’t think anyone outside the Power Query dev team has much experience of performance tuning Power Query yet. However there is one golden rule that I do know: where possible, allow Power Query to push as much of the work as possible back to the data source. This behaviour is known as “query folding” and it’s something that I’ve blogged about, as have Matt Masson and Darren Gosbell. Looking in SQL Server Profiler for the query above it is clear that no query folding is taking place: the only activity visible is Power Query reading the data from the DimDate and FactInternetSales tables separately.

After a bit of trial and error I came up with the following alternative:

let
    //Connect to SQL Server
    Source = Sql.Database("localhost", "adventure works dw"),
    //Get data from the DimDate table
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    //Remove all columns except DateKey
    RemovedOtherColumns = Table.SelectColumns(dbo_DimDate,{"DateKey"}),
    //Add a custom column to DimDate containing the value 1
    InsertedCustom = Table.AddColumn(RemovedOtherColumns, "Dummy", each 1),
    //Get data from the FactInternetSales table
    dbo_FactInternetSales = Source{[Schema="dbo",Item="FactInternetSales"]}[Data],
    //Remove all columns except OrderDateKey and ShipDateKey
    RemovedOtherColumns1 = Table.SelectColumns(dbo_FactInternetSales,
                        {"OrderDateKey", "ShipDateKey"}),
    //Add a custom column to FactInternetSales containing the value 1
    InsertedCustom1 = Table.AddColumn(RemovedOtherColumns1, "Dummy", each 1),
    //Join DimDate and FactInternetSales on the two columns that contain 1
    Custom1 = Table.Join(InsertedCustom1, "Dummy", InsertedCustom, "Dummy"),
    //Filter rows where DateKey is between OrderDateKey and ShipDateKey
    FilteredRows = Table.SelectRows(Custom1,
                    each [DateKey] >= [OrderDateKey] and
                    [DateKey] <= [ShipDateKey]),
    //Group by DateKey and find the number of rows for each date
    GroupedRows = Table.Group(FilteredRows, {"DateKey"},
                    {{"Count", each Table.RowCount(_), type number}}),
    //Sort dates in ascending order
    SortedRows = Table.Sort(GroupedRows,{{"DateKey", Order.Ascending}})
in
    SortedRows

This returns the correct result more or less instantly:

image

The main difference between this query and the previous one is how I’m getting a cross join between the two tables. This time I’m creating custom columns on DimDate and FactInternetSales that both contain the value 1, and then doing an inner join between these two tables on the new columns – which of course results in the equivalent of a cross join.
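The dummy-column trick can be tried without a database. In this sketch the two small tables are invented for illustration, and the join runs inside the Power Query engine, so it says nothing about query folding; it only shows that an inner join on a constant column pairs every row with every row.

```powerquery
let
    //Two dates and three orders, made up for this sketch
    Dates = Table.AddColumn(
                #table({"DateKey"}, {{20140101}, {20140102}}),
                "Dummy", each 1),
    Orders = Table.AddColumn(
                #table({"OrderDateKey"}, {{20140101}, {20140102}, {20140103}}),
                "Dummy", each 1),
    //Inner join on the constant column: every order matches every date
    CrossJoin = Table.Join(Orders, "Dummy", Dates, "Dummy"),
    RowCount = Table.RowCount(CrossJoin)
in
    RowCount
```

Each of the three orders is paired with both dates, so the row count is 3 x 2 = 6.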

In Profiler I can see the following SQL query being generated by Power Query:

select [_].[DateKey],
    [_].[Count]
from
(
    select [rows].[DateKey] as [DateKey],
        count(1) as [Count]
    from
    (
        select [_].[OrderDateKey],
            [_].[ShipDateKey],
            [_].[Dummy],
            [_].[DateKey]
        from
        (
            select [$Outer].[OrderDateKey],
                [$Outer].[ShipDateKey],
                [$Inner].[Dummy],
                [$Inner].[DateKey]
            from
            (
                select [_].[OrderDateKey] as [OrderDateKey],
                    [_].[ShipDateKey] as [ShipDateKey],
                    1 as [Dummy]
                from
                (
                    select [OrderDateKey],
                        [ShipDateKey]
                    from [dbo].[FactInternetSales] as [$Table]
                ) as [_]
            ) as [$Outer]
            inner join
            (
                select [_].[DateKey] as [DateKey],
                    1 as [Dummy]
                from
                (
                    select [DateKey]
                    from [dbo].[DimDate] as [$Table]
                ) as [_]
            ) as [$Inner] on ([$Outer].[Dummy] = [$Inner].[Dummy])
        ) as [_]
        where [_].[DateKey] >= [_].[OrderDateKey] and [_].[DateKey] <= [_].[ShipDateKey]
    ) as [rows]
    group by [DateKey]
) as [_]
order by [_].[DateKey]

 

Query folding is definitely taking place now!

The last question to ask here is whether the first query was slow because query folding was not taking place, or slow because of the way the query was written. You can test this quite easily by rewriting the second query to prevent query folding taking place using the Table.Buffer() function. For example, in the second query the step to get the data from the DimDate table is:

dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],

To prevent query folding it needs to be altered to:

dbo_DimDate = Table.Buffer(Source{[Schema="dbo",Item="DimDate"]}[Data]),

(The step to get data from the FactInternetSales table needs to be altered in the same way.)

With this change made, the query executes in around a minute. So clearly the second query is more efficient than the first even when it is executed inside the Power Query engine itself, without query folding; but the Power Query engine is still nowhere near as fast as SQL Server, and query folding gives the best possible performance.

You can download the sample workbook here.

Written by Chris Webb

June 4, 2014 at 9:24 pm

Posted in Power Query

Join Conditions In Power Query, Part 1

In last week’s post I showed how to create a simple LIKE function in Power Query which could be used in a calculated column. This week I’m going to show you how you can use this function in a condition when joining two tables together.

Consider the following two tables in an Excel worksheet:

image

Let’s say you want to join the Patterns table to the Phrases table, but only return the rows for each pattern where the Like() function returns true. Power Query has good support for different types of joins in the Table.Join() and Table.NestedJoin() functions but it isn’t immediately obvious how to handle join conditions such as this one.

Here’s the M code for a query that shows you how to do this:

let
    //Load Patterns table
    PatternsTable = Excel.CurrentWorkbook(){[Name="Patterns"]}[Content],
    //Load Phrases table
    PhrasesTable = Excel.CurrentWorkbook(){[Name="Phrases"]}[Content],
    //Add custom column to Patterns that returns the Phrases table for each row
    InsertedCustom = Table.AddColumn(PatternsTable, "AllPhrases", each PhrasesTable),
    //Expand the new column
    ExpandedTable = Table.ExpandTableColumn(InsertedCustom, "AllPhrases", {"Phrases"}, {"Phrases"}),
    //Filter the expanded table using the Like() function
    Custom1 = Table.SelectRows(ExpandedTable, each Like([Phrases],[Patterns]))
in
    Custom1

 

What this query does is the following:

  • Gets the data from both the Patterns table and the Phrases table. I’ve done this as two steps in the same query for simplicity; you might want to create two separate queries to do this.
  • On the Patterns table, adds a new custom column that returns the entire Phrases table as a value. The Insert Custom Column dialog and its output will look like this:

image

  • Expands the new custom column so that every row of the Phrases table is displayed against every row of the Patterns table – in effect, it does a cross join between the two tables. Interestingly there is no option to do a cross join using the Table.Join() and Table.NestedJoin() functions.

image

  • Finally, it uses the Like() function in Table.SelectRows() as follows:

    Table.SelectRows(ExpandedTable, each Like([Phrases],[Patterns]))

    …to filter the table from the previous step so that you only get the rows back where the Like() function returns true and the pattern matches the phrase:

image

So… we have a solution to our problem (and you can download the example workbook here). However, be warned: for large tables this approach may not perform well! In part two I’ll show you another example of a complex join condition and show you how different approaches can yield very different performance.

Written by Chris Webb

June 2, 2014 at 9:17 pm

Posted in Power Query

Implementing A Basic LIKE/Wildcard Search Function In Power Query

Last week someone asked me whether it was possible to do the equivalent of a SQL LIKE filter in Power Query. Unfortunately there isn’t a function to do this in the standard library but, as always, it is possible to write some M code to do this. Here’s what I came up with while I was waiting around at the stables during my daughter’s horse-riding lesson. At the moment it only supports the % wildcard character; also I can’t guarantee that it’s the most efficient implementation or indeed 100% bug-free, but it seems to work fine as far as I can see…

let
    Like = (Phrase as text, Pattern as text) =>
let
    //Split pattern up into a list using % as a delimiter
    PatternList = Text.Split(Pattern, "%"),
    //if the first character in the pattern is %
    //then the first item in the list is an empty string
    StartsWithWc = (List.First(PatternList)=""),
    //if the last character in the pattern is %
    //then the last item in the list is an empty string
    EndsWithWc = (List.Last(PatternList)=""),
    //if the first character is not %
    //then we have to match the first string in the pattern
    //with the opening characters of the phrase
    StartsTest = if (StartsWithWc=false)
       then Text.StartsWith(Phrase, List.First(PatternList))
       else true,
    //if the last character is not %
    //then we have to match the final string in the pattern
    //with the final characters of the phrase
    EndsText = if (EndsWithWc=false)
       then Text.EndsWith(Phrase, List.Last(PatternList))
       else true,
    //now we also need to check that each string in the pattern appears
    //in the correct order in the phrase
    //and to do this we need to declare a function PhraseFind
    PhraseFind = (Phrase as text, SearchString as list) =>
    let
     //does the first string in the pattern appear in the phrase?
     StringPos = Text.PositionOf(Phrase, SearchString{0}, Occurrence.First),
     PhraseFindOutput =
                 if
                 //if string not found then return false
                 (StringPos=-1)
                 then false
                 else if
                 //we have found the string in the pattern, and
                 //if this is the last string in the pattern, return true
                 List.Count(SearchString)=1
                 then true
                 else
                 //if it isn't the last string in the pattern
                 //test the next string in the pattern by removing
                 //the first string from the pattern list
                 //and all text up to and including the string we have found in the phrase
                 (true and
                 @PhraseFind(
                 Text.RemoveRange(Phrase, 0, StringPos + Text.Length(SearchString{0})),
                 List.RemoveRange(SearchString, 0, 1)))
     in
      PhraseFindOutput,
    //return true if we have passed all tests
    Output = StartsTest and EndsText and PhraseFind(Phrase, PatternList)
in
    Output
in
    Like

 

Using the following test data:

image

I can run the following query:

let
    Source = Excel.CurrentWorkbook(){[Name="Phrases"]}[Content],
    ChangedType = Table.TransformColumnTypes(Source,{{"Phrases", type text}}),
    InsertedCustom = Table.AddColumn(ChangedType, "Test", each Like([Phrases],"%cat%sat%mat%"))
in
    InsertedCustom

 

And get this output:

image

You can download the sample workbook here.

I know the Power Query team have been asked for this several times already, but it would be really useful if we could package up functions like this and make it easy to share them publicly with other Power Query users…

Written by Chris Webb

May 27, 2014 at 2:19 pm

Posted in Power BI, Power Query

Expanding All Columns In A Table In Power Query

When I’m working with XML files, or web pages, or any data with columns containing nested tables in Power Query, I often end up having to expand every expandable column in the table and then expand any new columns that are revealed after that to find the data that I’m looking for. This is such a pain I thought I’d write a function to do it for me – which is the subject of this post.

For example, consider the following XML:

image

If you load this into Power Query you will see the following table created for the first step:

image

To get to a table where all of the data is visible requires clicking on the expand icons in the address and the employees columns (highlighted), and then three more clicks after that. Sigh.

Here’s my function, called ExpandAll, to expand all the columns in a table that can be expanded:

let
    //Define function taking two parameters - a table and an optional column number
    Source = (TableToExpand as table, optional ColumnNumber as number) =>
    let
     //If the column number is missing, make it 0
     ActualColumnNumber = if (ColumnNumber=null) then 0 else ColumnNumber,
     //Find the column name relating to the column number
     ColumnName = Table.ColumnNames(TableToExpand){ActualColumnNumber},
     //Get a list containing all of the values in the column
     ColumnContents = Table.Column(TableToExpand, ColumnName),
     //Iterate over each value in the column and then
     //If the value is of type table get a list of all of the columns in the table
     //Then get a distinct list of all of these column names
     ColumnsToExpand = List.Distinct(List.Combine(List.Transform(ColumnContents,
                        each if _ is table then Table.ColumnNames(_) else {}))),
     //Append the original column name to the front of each of these column names
     NewColumnNames = List.Transform(ColumnsToExpand, each ColumnName & "." & _),
     //Is there anything to expand in this column?
     CanExpandCurrentColumn = List.Count(ColumnsToExpand)>0,
     //If this column can be expanded, then expand it
     ExpandedTable = if CanExpandCurrentColumn
                         then
                         Table.ExpandTableColumn(TableToExpand, ColumnName,
                                ColumnsToExpand, NewColumnNames)
                         else
                         TableToExpand,
     //If the column has been expanded then keep the column number the same, otherwise add one to it
     NextColumnNumber = if CanExpandCurrentColumn then ActualColumnNumber else ActualColumnNumber+1,
     //If the column number is now greater than the number of columns in the table
     //Then return the table as it is
     //Else call the ExpandAll function recursively with the expanded table
     OutputTable = if NextColumnNumber>(Table.ColumnCount(ExpandedTable)-1)
                        then
                        ExpandedTable
                        else
                        ExpandAll(ExpandedTable, NextColumnNumber)
    in
     OutputTable
in
    Source

 

You can then use this function on the XML file shown above as follows:

let
    //Load XML file
    Source = Xml.Tables(File.Contents("C:\Users\Chris\Documents\PQ XML Expand All Demo.xml")),
    ChangedType = Table.TransformColumnTypes(Source,{{"companyname", type text}}),
    //Call the ExpandAll function to expand all columns
    Output = ExpandAll(ChangedType)
in
    Output

 

And bingo, in one step, you get everything:

image

You can download the sample workbook here.
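
If you want to try the function without an XML file, a small in-memory table is enough. This is only a sketch: the table and its values are made up for illustration, and it assumes the function above has been saved as a query called ExpandAll so that it can be called by name.

```powerquery
let
    //Hypothetical nested table standing in for an expandable XML element
    Inner = #table({"city", "country"}, {{"London", "UK"}}),
    //Hypothetical outer table: one plain text column and one column holding a table
    Outer = #table({"companyname", "address"}, {{"Contoso", Inner}}),
    //Call ExpandAll without the optional column number, so it starts at column 0
    Output = ExpandAll(Outer)
in
    Output
```

Walking through the function by hand, the companyname column is left alone and the address column is replaced by two columns named address.city and address.country.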

Written by Chris Webb

May 21, 2014 at 8:29 pm

Posted in Power Query

Displaying Help For A Power Query Function

with one comment

Matt Masson and Theresa Palmer gave an excellent presentation on Power Query and M for the PASS DW/BI Virtual Chapter a few days ago (hopefully it will be on their YouTube channel soon). One thing that they showed which I hadn’t seen before was that you can display help for a function in M’s built-in library simply by typing its name in the formula bar.

Take, for example, the Text.Replace() function. With a new blank query, if you create a step with the following definition:

= Text.Replace

You will see help and examples as shown below:

image

By doing this you are creating a step that returns the function itself – note that this is not the same as invoking the function, although it does mean you can invoke the function in a subsequent step. If you do decide to use the function you just need to click Invoke and a dialog will appear to prompt you for the values to pass to the function:

image

Clicking OK will show the output of the function:

image

Here’s the full M code for the query for those of you who are curious:

let
    Source = Text.Replace,
    InvokedSource = Source("the cat sat on the mat", "cat", "dog")
in
    InvokedSource

A very useful tip! Unfortunately you can’t specify help text for your own functions yet, although Matt did say it was something they wanted to do.

Written by Chris Webb

May 1, 2014 at 9:30 am

Posted in Power Query

Web Services And POST Requests In Power Query

with 9 comments

In my recent post on web services in Power Query I mentioned that while the Power Query Web.Contents() function generates a GET request by default, you can make it generate a POST request by specifying the Content option. Since this is a useful thing to be able to do I thought I’d put together a detailed example of how this works.

For my example I’m going to use the Bing Maps Elevations API, which allows you to get the elevation in metres for a set of geographic locations. You can read the documentation here:

http://msdn.microsoft.com/en-us/library/jj158961.aspx

As it says at the bottom of the page, if you have a large number of locations to pass to the web service you can do so by passing them as a comma delimited list of latitudes and longitudes using a POST request. Here’s the code for a Power Query query that generates a list of latitudes and longitudes that stretches across the UK from North Wales in the west to the Wash in the east and finds the elevation for each point:

let
    //Insert your Bing Maps API key here
    BingMapsKey = "xxxx",
    //Latitude
    Latitude = 53,
    //Generate a list of longitudes
    Longitudes = List.Numbers(-4.5,66,0.1),
    //Generate a list of latitudes and longitudes
    PointList = List.Transform(Longitudes,
        each  Number.ToText(Latitude) & "," & Number.ToText(_)),
    //Turn this list to comma delimited text
    PointListText = Text.Combine(PointList,","),
    //Add the text "points=" to the beginning
    PostContents = "points=" & PointListText,
    //Call the Elevations web service
    GetElevations = Web.Contents(
                    "http://dev.virtualearth.net/REST/v1/Elevation/List?key=" & BingMapsKey,
                    [Content=Text.ToBinary(PostContents)]),
    //Treat the result as a JSON document
    ImportedJSON = Json.Document(GetElevations),
    //Navigate to the elevations data
    resourceSets = ImportedJSON[resourceSets],
    resourceSets1 = resourceSets{0},
    resources = resourceSets1[resources],
    resources1 = resources{0},
    elevations = resources1[elevations],
    //Turn the elevations data into a table
    TableFromList = Table.FromList(elevations, Splitter.SplitByNothing(),
                                null, null, ExtraValues.Error),
    //Rename the column containing the elevations
    RenamedColumns = Table.RenameColumns(TableFromList,{{"Column1", "Elevations"}}),
    //Add a column containing the latitude
    InsertedCustom = Table.AddColumn(RenamedColumns, "Latitude", each Latitude),
    //Add an index column
    InsertedIndex = Table.AddIndexColumn(InsertedCustom,"Index"),
    //Use the index column to find the longitude for the current row
    InsertedCustom1 = Table.AddColumn(InsertedIndex, "Longitude", each Longitudes{[Index]}),
    //Remove the index column
    RemovedColumns = Table.RemoveColumns(InsertedCustom1,{"Index"}),
    //Set all columns to data type number
    ChangedType = Table.TransformColumnTypes(RemovedColumns,
                    {{"Elevations", type number}, {"Latitude", type number},
                    {"Longitude", type number}})
in
    ChangedType

 

It’s quite easy to edit the code so that it generates a list of latitudes and longitudes across the country of your choice…

Two things to point out:

  • To get this to work you need to insert a Bing Maps API key in the first step where indicated. If you don’t have one, you can get your own at https://www.bingmapsportal.com
  • The important step is GetElevations. The code is:

    Web.Contents(
    "http://dev.virtualearth.net/REST/v1/Elevation/List?key=" & BingMapsKey,
    [Content=Text.ToBinary(PostContents)]),

    You can see here how the list of locations is passed to the Web.Contents() function (documentation here) via the Content field; notice also that I’ve had to use Text.ToBinary() on the text that I’m passing in.
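
To see exactly what text ends up in the body of the POST request, the steps that build it can be run on their own. The sketch below is a cut-down version of those steps with only three longitudes, and it makes no web request. It also adds one thing that is not in the query above: as far as I know Number.ToText() formats numbers using the current culture by default, so on a machine whose locale uses a decimal comma the list of points could be corrupted. Passing "en-US" as the culture is my assumption about how to guard against that, and I haven't tested it against the Bing Maps service.

```powerquery
let
    //Hypothetical shortened input: three longitudes rather than 66
    Latitude = 53,
    Longitudes = List.Numbers(-4.5, 3, 0.1),
    //Helper (not in the original query) that formats a number with an explicit culture
    //so that the decimal separator is always "."
    ToInvariantText = (n as number) as text => Number.ToText(n, null, "en-US"),
    //Build the latitude,longitude pairs
    PointList = List.Transform(Longitudes,
        each ToInvariantText(Latitude) & "," & ToInvariantText(_)),
    //Join the pairs and add the text "points=" to the beginning
    PostContents = "points=" & Text.Combine(PointList, ","),
    //This binary value is what would be passed in the Content field
    PostBody = Text.ToBinary(PostContents),
    //Decode the binary again to inspect the text being sent
    Output = Text.FromBinary(PostBody)
in
    Output
```

The output is a single text value starting with points= followed by the comma delimited latitudes and longitudes, which is the format the documentation asks for.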

Here’s the output in Power Map:

image

You can download the sample workbook here.

Written by Chris Webb

April 19, 2014 at 9:00 am

Working with Web Services in Power Query

with 21 comments

One of many cool things about Power Query is the way that it allows you to retrieve data from web services and load it into Excel. While this is a subject that lots of people (including me) have already blogged about, the Web.Contents() function – which is the M function that you’ll need to use to call a web service in your query – has gained some new, useful functionality over the last few releases that isn’t fully documented anywhere and which is important to know about.

For this post I’ll use a real-life example of a web service that I’ve been working with recently. As you may know, I’m one of the organisers of SQLBits and a few weeks ago I was given the task of building a Power BI solution to monitor registrations. SQLBits uses RegOnline to handle registrations and they have an API that allows developers to access registration data for events. I’ve been using this API in Power Query.

To take a simple example that illustrates the new functionality in Web.Contents(), take a look at the documentation for RegOnline’s GetEvents method:
http://developer.regonline.com/getevents/

When calling this method in Power Query using Web.Contents() there are two things I need to do:

  1. Pass two query parameters to it, called filter and orderBy – although I only need to pass empty values to these parameters
  2. Handle authentication by passing an API token through the HTTP header

Here’s one way of using Web.Contents() to do this:

Web.Contents(
  "https://www.regonline.com/api/default.asmx/GetEvents",
  [
   Query=[ #"filter"="", #"orderBy"=""],
   Headers=[#"APIToken" = "insertAPITokenHere"]
  ])

 

In the example above, the second parameter to Web.Contents() is a record (so it has to be enclosed in square brackets) containing two options fields, Query and Headers. The Query field is itself a record, with one field for each of the two query parameters I’m passing. The Headers field is another record, with one field for each custom HTTP header I need (see this post on the Power Query forum for more details). The RegOnline API specifies that for the purposes of authentication a header is needed called APIToken which contains the API token, so the Headers option here contains one field for that header.
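
If you are wondering what the Query field does to the URL, the sketch below builds the equivalent query string by hand using Uri.BuildQueryString(). It doesn't call the web service, and the assumption that Web.Contents() appends the query parameters to the URL in the same way is mine rather than something the documentation spells out.

```powerquery
let
    BaseUrl = "https://www.regonline.com/api/default.asmx/GetEvents",
    //The same record that is passed in the Query field above
    QueryRecord = [ #"filter"="", #"orderBy"=""],
    //Encode the record as query string text
    QueryText = Uri.BuildQueryString(QueryRecord),
    //Roughly the URL that ends up being requested (no call is made here)
    FullUrl = BaseUrl & "?" & QueryText
in
    FullUrl
```

The advantage of using the Query field rather than concatenating text like this yourself is that you don't have to worry about escaping the parameter values.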

The problem with handling authentication like this is that I need to hard-code the RegOnline API token in the code for my Power Query query, which is not exactly secure. Therefore instead of doing this I use the ApiKeyName option to tell Power Query the name of the header that should contain the RegOnline API token:

Web.Contents(
 "https://www.regonline.com/api/default.asmx/GetEvents",
 [
  Query=[ #"filter"="", #"orderBy"=""],
  ApiKeyName="APIToken"
 ])

 

This way, when the Power Query query is executed for the first time, you are prompted to tell Power Query how to authenticate the GetEvents method. In the dialog you have to choose Web API and enter the API token in the Key box, as shown below:

image

After this the API token gets stored separately from the query in Power Query’s own secure credentials store and you don’t need to enter it again; this also means that the API token will not travel with the workbook if you email it to someone else, for example.

One last thing to mention (and this is something that deserves a separate blog post really) is that while Web.Contents() generates a GET request by default, you can make it generate a POST request by specifying the Content option. You can find details on how to do this on this thread started by Lee Hawthorn on the old Data Explorer forum.

Written by Chris Webb

March 26, 2014 at 2:35 pm

Posted in Power Query
