Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Expanding All Columns In A Table In Power Query

with 5 comments

When I’m working with XML files, or web pages, or any data with columns containing nested tables in Power Query, I often end up having to expand every expandable column in the table and then expanding any new columns that are revealed after that to find the data that I’m looking for. This is a such a pain I thought I’d write a function to do it for me – which is the subject of this post.

For example, consider the following XML:

image

If you load this into Power Query you will see the following table created for the first step:

image

To get to a table where all of the data is visible requires clicking on the expand icons in the address and the employees columns (highlighted), and then three more clicks after that. Sigh.

Here’s my function, called ExpandAll, to expand all the columns in a table that can be expanded:

let

    //Define function taking two parameters - a table and an optional column number 

    Source = (TableToExpand as table, optional ColumnNumber as number) =>

    let

     //If the column number is missing, make it 0

     ActualColumnNumber = if (ColumnNumber=null) then 0 else ColumnNumber,

     //Find the column name relating to the column number

     ColumnName = Table.ColumnNames(TableToExpand){ActualColumnNumber},

     //Get a list containing all of the values in the column

     ColumnContents = Table.Column(TableToExpand, ColumnName),

     //Iterate over each value in the column and then

     //If the value is of type table get a list of all of the columns in the table

     //Then get a distinct list of all of these column names

     ColumnsToExpand = List.Distinct(List.Combine(List.Transform(ColumnContents, 

                        each if _ is table then Table.ColumnNames(_) else {}))),

     //Append the original column name to the front of each of these column names

     NewColumnNames = List.Transform(ColumnsToExpand, each ColumnName & "." & _),

     //Is there anything to expand in this column?

     CanExpandCurrentColumn = List.Count(ColumnsToExpand)>0,

     //If this column can be expanded, then expand it

     ExpandedTable = if CanExpandCurrentColumn 

                         then 

                         Table.ExpandTableColumn(TableToExpand, ColumnName, 

                                ColumnsToExpand, NewColumnNames) 

                         else 

                         TableToExpand,

     //If the column has been expanded then keep the column number the same, otherwise add one to it

     NextColumnNumber = if CanExpandCurrentColumn then ActualColumnNumber else ActualColumnNumber+1,

     //If the column number is now greater than the number of columns in the table

     //Then return the table as it is

     //Else call the ExpandAll function recursively with the expanded table

     OutputTable = if NextColumnNumber>(Table.ColumnCount(ExpandedTable)-1) 

                        then 

                        ExpandedTable 

                        else 

                        ExpandAll(ExpandedTable, NextColumnNumber)

    in

     OutputTable

in

    Source

 

You can then use this function on the XML file shown above as follows:

let

    //Load XML file

    Source = Xml.Tables(File.Contents("C:\Users\Chris\Documents\PQ XML Expand All Demo.xml")),

    ChangedType = Table.TransformColumnTypes(Source,{{"companyname", type text}}),

    //Call the ExpandAll function to expand all columns

    Output = ExpandAll(ChangedType)

in

    Output

 

And bingo, in one step, you get everything:

image

You can download the sample workbook here.

Written by Chris Webb

May 21, 2014 at 8:29 pm

Posted in Power Query

5 Responses

Subscribe to comments with RSS.

  1. Very usefull script!! thnx for sharing! saves a lot of clicks

    Niels

    May 22, 2014 at 6:43 am

  2. […] Expanding All Columns In A Table In Power Query (Chris Webb) […]

  3. Having read Curt’s reply to your question regarding recursion vs. List.Generate() loops — even more relevant for my dynamically loaded functions due to loading overhead within every iteration — I decided to try and see if I could rewrite all functions I had in order to avoid recursion.
    That turned out more of a challenge for this function. I plugged most of the original code into List.Generate()’s ‘next’ block, at which point Power Query started complaining each of the var names used in the block would now need to be initialized in its ‘start’ block as well (not to mention adding square brackets everywhere).
    I eventually figured out a workaround by embedding a let-expression into the next block, allowing the flexibility of declaring regular variables without requiring the initialization and square brackets of the iteration Record.
    Resulting code here. I hadn’t left the original comments intact, nor tested for performance gains, but it feels like I learned a lot. I also added a parameter to optionally leave out the appended parent column names whenever possible.

    tycho01

    June 26, 2014 at 10:42 am

  4. […] 1000 data points, I ended up having a total of 1000 X 1000 = 1 million rows. I referred to this blog post to try to expand the tables within the columns, but to no avail in my […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,202 other followers

%d bloggers like this: