Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Pivoting Data In Power Query

with 20 comments

One of the features I’ve loved right from the start in Power Query is the ability to UnPivot data right from the UI (I’m not alone – Ken Puls has written a good blog post summing up this functionality and is equally impressed). However earlier this year when I was trying to implement Life in Power Query, I found that I needed to do the opposite: I needed to pivot data. I was able to do this with some pretty complex M, but since then the nice people on the Power query team have added a new Table.Pivot() function to allow this to be done easily. Unfortunately pivoting is not something that can be done using the UI alone (at least not at the time of writing), but it’s pretty straightforward to write your own M expression to use this new function.

Let’s take the following table as a starting point:

image

You can see, in the Measure column, we have values Sales Units and Sales Value that tell us what the numbers in each row in the Value column represent. For most purposes, it’s better to pivot this data so that we have two columns, one for Sales Units and one for Sales Value. You can do this pivot operation in a single step using Table.Pivot(). Here’s an example script that loads the data from the table above and pivots it:

let

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    Pivot = Table.Pivot(Source, {"Sales Units", "Sales Value"}, "Measure", "Value")

in

    Pivot

Here’s the output:

image

It’s worth pointing out that if you don’t include each distinct value from your chosen column in the second parameter, those rows will be lost after the pivot takes place. So

= Table.Pivot(Source, {"Sales Units"}, "Measure", "Value")

Returns just:

 
image

 

Listing out all the column names in that second parameter is a bit of a pain, so we can get a list with all the distinct values in that column and improve our expression as follows:

= Table.Pivot(Source, List.Distinct(Table.Column(Source, "Measure")), "Measure", "Value")

This returns the same result as our first query – it uses Table.Column() to get a list of all the values in the Measure column, then List.Distinct() to return only the distinct values from that list.

Finally, there is an optional fifth parameter that can be used to aggregate data. The following source table has two rows for Sales Value for March:

image

If I use my original expression the pivot will work but I’ll get an error value in the cell for Sales Value for March:

image

Specifying a function to aggregate the data in these scenarios, where two cells need to be pivoted to one, solves the problem:

= Table.Pivot(Source, {"Sales Units", "Sales Value"}, "Measure", "Value", List.Sum)

image

In this case I’m passing the function List.Sum() to get the value 10+15=25 in the cell for Sales Value for March.

Written by Chris Webb

November 25, 2013 at 9:30 am

Posted in Power Query

20 Responses

Subscribe to comments with RSS.

  1. Andrea Uggetti

    November 26, 2013 at 11:03 am

  2. Chris, this is fantastic. Just had to do this myself, and this was a HUGE help!

    Ken Puls

    January 17, 2014 at 8:10 pm

  3. Great blog! Do you know if it is possible to pivot on more than one column?

    Justin

    June 12, 2014 at 5:14 pm

    • No, it isn’t possible using just Table.Pivot() – it can only pivot on column at a time. However it might be possible with a more complex expression, depending on what you want to do exactly.

      Chris Webb

      June 13, 2014 at 1:34 pm

  4. The fact that you can choose what function to use for aggregation is super powerful. Is there a specific list somewhere of functions that can be used?

    zacksg1

    September 30, 2014 at 10:03 pm

  5. Thank you – you just saved my day

    • BTW in the latest builds of Power Query there’s now a pivot button in the Query Editor.

      Chris Webb

      October 9, 2014 at 12:42 pm

  6. I large data file of raw samsung data. There are many columns. The columns names are in a seperate file in a list (so the column names load as rows in a separate query)

    I want to data drive what goes in the {} is that possible? i.e. my column names for a raw dataset in a separate datafile. So the column names come in as rows, I want to pivot (transpose in old money) it round and union it with another data set so it forms the column headers. Maybe pivot isn’t the best way.

    Am experimenting by comparing power query with an R script I have to see how doing the same task compares. In R I assign the column names to a vector and pass it to the column names property of the dataset.

    shaun

    February 24, 2015 at 9:56 pm

    • Hi Shaun, yes this sounds very doable. Here’s what I think you should do:
      1) Create a query that loads the data from your name file
      2) Use the Transpose button in Power Query (it’s easier than Pivot, you’re right) so you get a table with one column for each column name
      3) Create a second query that loads your raw data
      4) Use the Append button to append the contents of the second query onto the end of the first
      5) Click the Use First Rows As Headers button on the new query created by step (4) and you’ll have your raw data with the right column headers.

      Chris Webb

      February 24, 2015 at 10:15 pm

      • Yeah I found it. My version was out of date. It was recognising the pivot function but didn’t give what I expected. I updated the version to the latest preview and it works fine. Cheers

        shaun

        February 24, 2015 at 10:24 pm

  7. Hi Chris, I’m stuck on a pivot problem and can’t find solutions anywhere, but you seem to be the closest to the issue. The groundwork is that I have an online form that collects dates and hours (just your basic timesheet) plus other details…when the data connection pulls the form results into PQ, I end up with columns like this: NAME, DATE, HOURS, DATE2, HOURS2, DATE3, HOURS3, etc where date corresponds with the hours that have the same number to them.

    Ideally, I’d like to have a consolidated list of NAME/DATE/HOURS, but that’s where I’m struggling. I’ve tried to unpivot all of the date/hours columns, then remove all the “numbers” from the resulting Attributes column, then re-pivot the Attribute/Value column, but I keep getting errors (enumeration errors, primarily) when I do that.

    Any ideas if I should use a transpose or some intermediary step in order to get this to flow smoother?

    Jimmy

    March 18, 2015 at 2:28 pm

    • Hi Jimmy,

      I think you are getting errors since once you unpivot and clean up all of the entries in the attributes column to remove the numbers, then there are potentially lots of actual values associated with each (cleaned) attribute. You’ll either end up aggregating these values (which I don’t think you want to do) or getting an error due to multiple values essentially being placed in a single cell after the pivot.

      One option might be to grab each set of same-numbered fields, change the names to eliminate the numbers, and append each set of these fields to the original Name/Date/Hours columns.

      Zack Goldman

      March 18, 2015 at 4:13 pm

      • Thanks, Zack. I’ve thought about that as an option (and it appears that may be my only solution), but my concern is the amount of “additional” data columns that i will have to append and what that will do to the speed of the query itself. If I have columns labeled through 10, that means I’ll have to download (via the data connection) 10 different queries before appending – just seems it might get a little slow as data continues to come in.

        Jimmy

        March 18, 2015 at 5:31 pm

      • Hi Jimmy,

        You can actually do that all within PowerQuery, with only a single query to your source. Pull in all of the data from your source as you normally would. Then, delete all but the first set of columns. You can then open the advanced editor and see the syntax it uses for this step. It will essentially create a variable to represent the table you have just created. You can then copy and paste this line and edit it to delete all but the next set of columns you want to grab and assign this table to a new variable. You can then change the column names to eliminate the numbers. You can keep repeating this for all of the sets of columns you want to grab. Once each of these is stored in its own table variable, you can append all of those tables together.

        Zack Goldman

        March 18, 2015 at 5:57 pm

      • Zack, I understand the theory behind what you’re saying, but I’m having issues with the nomenclature in M for this…Is it possible you could show me an example? I’m just not getting the translation from the Table.RemoveColumns line to storing the table as a variable and then referencing it in the append…any recommendations?

        Jimmy

        March 19, 2015 at 2:56 am

  8. Hi Jimmy,

    Here’s an example. Let me know if you have questions!

    let

    //Import the full table from wherever it is being imported from (I just pulled in a table from excel as an example)
    Source = Excel.CurrentWorkbook(){[Name=”InputTable”]}[Content],

    //Grab the first set of columns
    ThingToAppend1 = Table.SelectColumns(Source,{“Name”, “Date”, “Hours”}),

    //Grab the second set of columns (and rename the columns so they match the first set)
    ColumnsGroup2 = Table.SelectColumns(Source,{“Name2″, “Date2″, “Hours2″}),
    ThingToAppend2 = Table.RenameColumns(ColumnsGroup2,{{“Name2″, “Name”}, {“Date2″, “Date”}, {“Hours2″, “Hours”}}),

    //Grab the third set of columns (and rename the columns so they match the first set)
    ColumnsGroup3 = Table.SelectColumns(Source,{“Name3″, “Date3″, “Hours3″}),
    ThingToAppend3 = Table.RenameColumns(ColumnsGroup3,{{“Name3″, “Name”}, {“Date3″, “Date”}, {“Hours3″, “Hours”}}),

    //Append each set of columns to a single table
    FinalResult = Table.Combine({ThingToAppend1,ThingToAppend2, ThingToAppend3})

    in
    FinalResult

    Zack Goldman

    March 19, 2015 at 2:14 pm

    • Thanks, Zack! Worked like a charm and the data connection isn’t dealing with any performance issues from the original multiple query idea I had.

      Jimmy

      March 19, 2015 at 3:15 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,865 other followers

%d bloggers like this: