Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Implementing Common Calculations In Power Query

with 6 comments

When I first saw Data Explorer (as Power Query used to be called), I immediately started thinking of it as a self-service ETL tool; indeed, it’s how Microsoft itself is positioning it. However I suspect that a lot of Power Query users might also want to use it for building reports too: it could be that they find DAX too difficult, and if all they need is a simple table-based report they might not want to bother with using Power Pivot at all.

If that happens then it’s going to be important to be able implement common business calculations in Power Query… hence this blog post. I’ll also admit that I wanted to know how easy it is to write these calculations – because if it isn’t easy, or if it isn’t easier than DAX or the equivalent Excel formulas, then no-one will bother doing it. So let’s look at some examples.

Simple Percentage Share

Take the following table in Excel as a starting point:

image

Now let’s imagine that we want to find the percentage that each fruit makes up of the grand total (ie 10+15+24=49). The way to calculate the percentage is quite easy: you need to find the grand total, then divide the sales for each fruit by the grand total. Finding the grand total requires some code though – you might think you could do it easily in the UI using the Group By functionality, but that does not allow you to sum up all the values in a column and not group by anything! (I hope this gets fixed soon). Luckily the code is quite easy: you use Table.Column() to convert the Sales column into a List, then use List.Sum() to sum up all the values in that List. Finally, you insert a new custom column to the original table and calculate the percentage. Here’s the code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    Sales = Table.Column(Source, "Sales"),

    GrandTotal = List.Sum(Sales),

    TableWithShare = Table.AddColumn(Source, "Share", each [Sales]/GrandTotal)

in

    TableWithShare

image

It would also be nice if I could get the table that Power Query outputs to remember the number formatting I apply to it after it refreshes… again, I hope this gets fixed.

Nested Share

A more complex example of a Share calculation might be where we need a share of a subtotal. Consider the following table:

image

In this case, let’s say we need to show each Fruit’s sales as a percentage of the total sales of its Category. This time we can use the Group By functionality to get a table containing the subtotals very easily. Next, we can join our subtotal table with the original table using Table.Join() so we have the subtotals on each row. Finally, we can do the share calculation and remove the subtotal column again. Here’s the code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],

    Subtotals = Table.Group(Source, {"Category"}, {{"Category Sales", 

        each List.Sum([Sales]), type number}}),

    JoinTables = Table.Join(Source, "Category", Subtotals, "Category"),

    InsertedCustom = Table.AddColumn(JoinTables, "Share", each [Sales]/[Category Sales]),

    RemovedColumns = Table.RemoveColumns(InsertedCustom,{"Category Sales"})

in

    RemovedColumns

image

Previous Period Growth

Now consider a table with a date column (Americans: the dates are in dd/mm/yyyy format) and a sales column:

image

Let’s say we want to find the absolute change in Sales from the previous date to the current date on each row. Here we need to copy the table, shift the dates forward one day (by adding a value of type Duration onto the date) and then join the original table onto this new table using Table.Join() so we can see the current day sales and the previous day sales in two columns next to each other; this will then mean we can subtract one from the other to find the growth. Here’s the code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],

    ChangedType = Table.TransformColumnTypes(Source,{{"Date", type date}}),

    InsertedCustom = Table.AddColumn(ChangedType, "Next Date", each [Date]+#duration(1,0,0,0)),

    RemovedColumns = Table.RemoveColumns(InsertedCustom,{"Date"}),

    RenamedColumns = Table.RenameColumns(RemovedColumns,{{"Sales", "Previous Sales"}}),

    JoinTables = Table.Join(ChangedType, "Date", RenamedColumns, "Next Date",JoinKind.LeftOuter),

    RemovedColumns1 = Table.RemoveColumns(JoinTables,{"Next Date"}),

    SortedRows = Table.Sort(RemovedColumns1,{{"Date", Order.Ascending}}),

    Growth = Table.AddColumn(SortedRows, "Growth", each [Sales]-[Previous Sales])

in

    Growth

image

Year-to-Date Sales

For the final example, take the same starting table and before and now let’s calculate a year-to-date running sum. In this case, I decided to use a function (see here for some background on this, and here for a more complex example of a function) to do the calculation. Here’s the code of the function, called YTD(), that I created:

let

    YTD = (SalesTable, EndDate) =>

    let

        FilteredTable=Table.SelectRows(SalesTable, 

            each [Date]<=EndDate and Date.Year([Date])=Date.Year(EndDate) ),

        SalesColumn = Table.Column(FilteredTable, "Sales"),

        YTDSales = List.Sum(SalesColumn)

    in

        YTDSales

in

    YTD

It takes a table structured in the same way as the source table (ie it has to have a [Date] column and a [Sales] column), filters the data using Table.SelectRows() so only rows with a date less than or equal to the EndDate parameter, but in the same year as it, are returned, and then sums up the values in the [Sales] column of the filtered table.

It’s then very easy to use this function in a new calculated column in a query:

let

    Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],

    ChangedType = Table.TransformColumnTypes(Source,{{"Date", type date}}),

    InsertedCustom = Table.AddColumn(ChangedType, "YTD Sales", each YTD(ChangedType, [Date]))

in

    InsertedCustom

And voila, we have a column containing the year-to-date sum:

image

Conclusion

All of these examples here required me to write M code, and to be honest this is not something an ordinary user will ever be able to do. Furthermore, I suspect that end users will always prefer to write their calculations as Excel formulas, although more complex calculations such as nested shares might prove beyond them. However, I believe that a good data steward will certainly be able to write M code, and the last example above makes me think that if a data steward can write a function that performs a calculation and shares it with the end users, the end users should be able to use it in their own queries quite easily. I don’t think the code I’ve written here is any worse or better than the DAX equivalent but the ability to share functions could tip the balance in favour of Power Query for some simple reporting scenarios.

You can download a workbook with all the examples above here.

Written by Chris Webb

October 18, 2013 at 5:46 pm

Posted in Power Query

6 Responses

Subscribe to comments with RSS.

  1. Thanks for sharing, Chris. Your latest example made me start to wonder where the Query is stored, and if an Excel file can be identified programmatically as having a Query.

    David Hager (@dhExcel)

    October 18, 2013 at 5:56 pm

    • The query is stored inside the Excel workbook; I’m not sure how exactly though.

      Chris Webb

      October 18, 2013 at 6:05 pm

      • I see how now. The provider for PQ queries is:

        Microsoft.Mashup.OleDb.1

        So, I can open files programmatically and loop through the connections to see if that provider exists.

        David Hager (@dhExcel)

        October 18, 2013 at 6:12 pm

  2. Reblogged this on waltika.

    waltika

    October 20, 2013 at 11:17 am

  3. Chris

    I too have been comparing powerquery (M) to powerpivot (DAX) as a tool for analysis. There is one area already that I believe powerquery has the advantage and that is when you need to generate a list of members based on the results of an initial query – obvious example is the common task of creating a histogram. Dynamically converting measures into members via dax is not easy. I have also been comparing sql with m as a language for analysts – definitely some plus points for powerquery here.

    Steven Peters

    October 24, 2013 at 12:50 pm

  4. Thanks Chris – some great stuff there, so thanks for sharing. I’m enjoying Power Query also – I find the coding/debugging experience is already much better than Excel Formulas or DAX. It’s certainly 100 times easier than SSIS or similar ETL tools.

    I got the idea from your Year-to-Date Sales function to write a function that returns a Running Count / Index within a group – I just pass an extra function parameter for the current group, then add that test to the Table.SelectRows criteria. Final change was to change List.Sum to List.Count.

    Mike Honey

    August 19, 2014 at 2:00 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,146 other followers

%d bloggers like this: