## Implementing Common Calculations In Power Query

When I first saw Data Explorer (as Power Query used to be called), I immediately started thinking of it as a self-service ETL tool; indeed, it’s how Microsoft itself is positioning it. However I suspect that a lot of Power Query users might also want to use it for building reports too: it could be that they find DAX too difficult, and if all they need is a simple table-based report they might not want to bother with using Power Pivot at all.

If that happens then it’s going to be important to be able implement common business calculations in Power Query… hence this blog post. I’ll also admit that I wanted to know how easy it is to write these calculations – because if it isn’t easy, or if it isn’t easier than DAX or the equivalent Excel formulas, then no-one will bother doing it. So let’s look at some examples.

**Simple Percentage Share**

Take the following table in Excel as a starting point:

Now let’s imagine that we want to find the percentage that each fruit makes up of the grand total (ie 10+15+24=49). The way to calculate the percentage is quite easy: you need to find the grand total, then divide the sales for each fruit by the grand total. Finding the grand total requires some code though – you might think you could do it easily in the UI using the Group By functionality, but that does not allow you to sum up all the values in a column and not group by anything! (I hope this gets fixed soon). Luckily the code is quite easy: you use Table.Column() to convert the Sales column into a List, then use List.Sum() to sum up all the values in that List. Finally, you insert a new custom column to the original table and calculate the percentage. Here’s the code:

let

Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

Sales = Table.Column(Source, "Sales"),

` GrandTotal = List.Sum(Sales),`

TableWithShare = Table.AddColumn(Source, "Share", each [Sales]/GrandTotal)

`in`

TableWithShare

It would also be nice if I could get the table that Power Query outputs to remember the number formatting I apply to it after it refreshes… again, I hope this gets fixed.

**Nested Share**

A more complex example of a Share calculation might be where we need a share of a subtotal. Consider the following table:

In this case, let’s say we need to show each Fruit’s sales as a percentage of the total sales of its Category. This time we can use the Group By functionality to get a table containing the subtotals very easily. Next, we can join our subtotal table with the original table using Table.Join() so we have the subtotals on each row. Finally, we can do the share calculation and remove the subtotal column again. Here’s the code:

let

Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],

Subtotals = Table.Group(Source, {"Category"}, {{"Category Sales",

each List.Sum([Sales]), type number}}),

JoinTables = Table.Join(Source, "Category", Subtotals, "Category"),

InsertedCustom = Table.AddColumn(JoinTables, "Share", each [Sales]/[Category Sales]),

` RemovedColumns = Table.RemoveColumns(InsertedCustom,{"Category Sales"})`

`in`

RemovedColumns

**Previous Period Growth**

Now consider a table with a date column (Americans: the dates are in dd/mm/yyyy format) and a sales column:

Let’s say we want to find the absolute change in Sales from the previous date to the current date on each row. Here we need to copy the table, shift the dates forward one day (by adding a value of type Duration onto the date) and then join the original table onto this new table using Table.Join() so we can see the current day sales and the previous day sales in two columns next to each other; this will then mean we can subtract one from the other to find the growth. Here’s the code:

let

Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],

ChangedType = Table.TransformColumnTypes(Source,{{"Date", type date}}),

InsertedCustom = Table.AddColumn(ChangedType, "Next Date", each [Date]+#duration(1,0,0,0)),

RemovedColumns = Table.RemoveColumns(InsertedCustom,{"Date"}),

` RenamedColumns = Table.RenameColumns(RemovedColumns,{{"Sales", "Previous Sales"}}),`

JoinTables = Table.Join(ChangedType, "Date", RenamedColumns, "Next Date",JoinKind.LeftOuter),

RemovedColumns1 = Table.RemoveColumns(JoinTables,{"Next Date"}),

SortedRows = Table.Sort(RemovedColumns1,{{"Date", Order.Ascending}}),

Growth = Table.AddColumn(SortedRows, "Growth", each [Sales]-[Previous Sales])

`in`

Growth

**Year-to-Date Sales**

For the final example, take the same starting table and before and now let’s calculate a year-to-date running sum. In this case, I decided to use a function (see here for some background on this, and here for a more complex example of a function) to do the calculation. Here’s the code of the function, called YTD(), that I created:

let

YTD = (SalesTable, EndDate) =>

let

` FilteredTable=Table.SelectRows(SalesTable, `

each [Date]<=EndDate and Date.Year([Date])=Date.Year(EndDate) ),

SalesColumn = Table.Column(FilteredTable, "Sales"),

` YTDSales = List.Sum(SalesColumn)`

` in`

YTDSales

`in`

YTD

It takes a table structured in the same way as the source table (ie it has to have a [Date] column and a [Sales] column), filters the data using Table.SelectRows() so only rows with a date less than or equal to the EndDate parameter, but in the same year as it, are returned, and then sums up the values in the [Sales] column of the filtered table.

It’s then very easy to use this function in a new calculated column in a query:

let

Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],

ChangedType = Table.TransformColumnTypes(Source,{{"Date", type date}}),

InsertedCustom = Table.AddColumn(ChangedType, "YTD Sales", each YTD(ChangedType, [Date]))

`in`

InsertedCustom

And voila, we have a column containing the year-to-date sum:

**Conclusion**

All of these examples here required me to write M code, and to be honest this is not something an ordinary user will ever be able to do. Furthermore, I suspect that end users will always prefer to write their calculations as Excel formulas, although more complex calculations such as nested shares might prove beyond them. However, I believe that a good data steward will certainly be able to write M code, and the last example above makes me think that if a data steward can write a function that performs a calculation and shares it with the end users, the end users should be able to use it in their own queries quite easily. I don’t think the code I’ve written here is any worse or better than the DAX equivalent but the ability to share functions could tip the balance in favour of Power Query for some simple reporting scenarios.

You can download a workbook with all the examples above here.

Thanks for sharing, Chris. Your latest example made me start to wonder where the Query is stored, and if an Excel file can be identified programmatically as having a Query.

David Hager (@dhExcel)October 18, 2013 at 5:56 pm

The query is stored inside the Excel workbook; I’m not sure how exactly though.

Chris WebbOctober 18, 2013 at 6:05 pm

I see how now. The provider for PQ queries is:

Microsoft.Mashup.OleDb.1

So, I can open files programmatically and loop through the connections to see if that provider exists.

David Hager (@dhExcel)October 18, 2013 at 6:12 pm

Reblogged this on waltika.

waltikaOctober 20, 2013 at 11:17 am

Chris

I too have been comparing powerquery (M) to powerpivot (DAX) as a tool for analysis. There is one area already that I believe powerquery has the advantage and that is when you need to generate a list of members based on the results of an initial query – obvious example is the common task of creating a histogram. Dynamically converting measures into members via dax is not easy. I have also been comparing sql with m as a language for analysts – definitely some plus points for powerquery here.

Steven PetersOctober 24, 2013 at 12:50 pm

Thanks Chris – some great stuff there, so thanks for sharing. I’m enjoying Power Query also – I find the coding/debugging experience is already much better than Excel Formulas or DAX. It’s certainly 100 times easier than SSIS or similar ETL tools.

I got the idea from your Year-to-Date Sales function to write a function that returns a Running Count / Index within a group – I just pass an extra function parameter for the current group, then add that test to the Table.SelectRows criteria. Final change was to change List.Sum to List.Count.

Mike HoneyAugust 19, 2014 at 2:00 am

@Chris

Here is one more way of doing Running Totals.

Assuming you have a 2 Column Table of Month (Jan,Feb…) and Sales (10,30…) and the Table is called S

The below M Code generates a Cumulative Sales Column called cSales

let

Source = Excel.CurrentWorkbook(){[Name=”S”]}[Content],

mTbl= Table.AddIndexColumn(Source,”Index”),

mTotal = Table.AddColumn(mTbl, “Total”, each Table.Range(mTbl,0,[Index]+1)),

cTotal = Table.AggregateTableColumn(mTotal, “Total”, {{“Sales”, List.Sum, “cSales”}}),

rColumn = Table.RemoveColumns(cTotal,{“Index”})

in

rColumn

I have no Idea how the performance compares with your method…but I was able to build this via the UI – so for someone not very proficient with M yet this looks easier.

samOctober 3, 2014 at 6:39 am

Thanks!

Chris WebbOctober 3, 2014 at 10:39 am

[…] 157 6 157 77 157 45 157 29 95 35 95 40 95 2 95 18 95 35 104 34 I believe the answer is here but I just can't make it work. I'm not an IT guy and I'm new to M language. Thank […]

Sum in Power QueryOctober 28, 2014 at 7:22 pm

Hi Chris, I need to do a nested share (just like the one in your example above) but I must end up with the cumulative % for each category. How would you do that?

moncherwatsonMarch 18, 2015 at 11:51 am

Hi Chris and team,

I’ve got an fairly complex variation of your example above, but i’m running into some roadblocks and hoping you could provide some guidance.

Basically, I’m generating a dynamic list of employee hours – these are categorized by hours per day for multiple employees. My goal is to automatically “code” the hours as overtime or regular time, based on the US regulations for those hours (for instance, 40 hours per week would be considered regular time, anything over that are considered overtime).

The difficulty I’m running into is creating a “running” tally of the hours by employee by the Date.WeekOfYear.

It’s laid out sort of like these three columns:

John Doe – 3/13/15 – 9 hours

Bob Employee – 3/14/15 – 12 hours

John Doe – 3/14/15 – 12 hours

Bob Employee – 3/16/15 – 12 hours

I’m trying to dynamically calculate a fourth column which would show the running total for the week:

John Doe – 3/13/15 – 9 hours – 9 hours

Bob Employee – 3/14/15 – 12 hours – 12 hours

John Doe – 3/14/15 – 12 hours – 21 hours

Bob Employee – 3/16/15 – 12 hours – 24 hours

Then reset each week and start the calculation again. I’ve used a variation of your function above, and it outputs a running tally, but it’s not a) separated by week and b) separated by individual employee.

Any recommendations?

JDH

JimmyMarch 27, 2015 at 5:25 am

I can think of a way to do this, but as always the real challenge is to come up with an *easy* way to do this. Maybe you could try something like this:

1) Create a Power Query function that adds a running total column to a table that only contains data for a single employee for a single week

2) Use Table.Partition() to split your current table up into a list of tables that each contain data for just one employee and one week

3) Apply the function created in (1) to every table in the list in step (2)

4) Combine the list of tables back into a single table

Chris WebbMarch 30, 2015 at 5:01 pm