Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Some Power Query/M Examples

with 23 comments

The more I use Power Query and the M language that underpins it, the more I see how useful it is – quite apart from its abilities to import data from other data sources – as a third option (after regular Excel formulas and Power Pivot/DAX) to solve problems in Excel itself. For example, last week I read this blog post by David Hager about finding the number of unique values in a delimited string:

http://dailydoseofexcel.com/archives/2013/08/07/calculating-the-number-of-unique-items-in-a-delimited-string/

It’s an interesting question and the Excel formulas in this post are way beyond what I’m capable of writing. The point is, though, that Power Query can do this very easily indeed just through the UI. Starting with the following table in a worksheet:

image

You just need to import it into a Power Query query:

image

Use the Split Column/By Delimiter/By Comma option on the Input column:

image

This creates as many columns as you’ve got values in the delimited list with the largest number of values:

image

You can then use the Unpivot option on these new columns:

image

Then remove the Attribute column:

image

Next, select Remove Duplicates on the entire table:

image

Finally, do a Group By on the RowID column and Count the number of rows:

image

And bingo, you have the number of distinct values in each delimited list:

image

Here’s the complete code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    SplitColumnDelimiter = Table.SplitColumn(Source,"Input",Splitter.SplitTextByDelimiter(","),13),

    Unpivot = Table.Unpivot(SplitColumnDelimiter,{"Input.1", "Input.2", "Input.3", "Input.4",

    "Input.5", "Input.6",    "Input.7", "Input.8", "Input.9", "Input.10", "Input.11", "Input.12"

    ,  "Input.13"},"Attribute","Value"),

    RemovedColumns = Table.RemoveColumns(Unpivot,{"Attribute"}),

    DuplicatesRemoved = Table.Distinct(RemovedColumns),

    GroupedRows = Table.Group(DuplicatesRemoved, {"RowID"}, {{"Count of Distinct Values"

    , each Table.RowCount(_), type number}})

in

    GroupedRows

 

Emboldened by this, I turned to another Excel challenge – this time from Chandoo’s blog:

http://chandoo.org/wp/2013/07/16/formula-challenge-001-1/

This time the objective is to split a string containing blocks of text and numbers so that you get everything after the first block of numbers. The whole point of the challenge that Chandoo lays down is to do this in a single Excel formula with no VBA; I was just curious to see how easy it would be to solve the problem in Power Query however many steps it took. Unfortunately this is not a something that you can do just in the UI and you need to write some M, but the code isn’t too bad:

Starting with the following table:

image

You then need to create a new column containing a list of the positions of every occurrence of a numeric character in each string:

image

Then create another new column containing a list of continuous numbers starting from the first number in the previous list and incrementing by 1:

image

Then create another new column containing the minimum value from a list all of the values that are in the previous list and not in the first list you created :

image

This gives the position of the first character in the second block of text, which can then be used to find a substring of the original text:

image

Here’s the complete code:

let

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    InsertedCustom = Table.AddColumn(Source, "NumericPositions", 

    each Text.PositionOfAny([Input], {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}, Occurrence.All)),

    InsertedCustom1 = Table.AddColumn(InsertedCustom, "NumberOfPositions", 

    each List.Numbers(List.Min([NumericPositions]), List.Count([NumericPositions]))),

    InsertedCustom2 = Table.AddColumn(InsertedCustom1, "StartChar", 

    each List.Min(List.Difference([NumberOfPositions], [NumericPositions]))),

    InsertedCustom3 = Table.AddColumn(InsertedCustom2, "Output", each Text.Range([Input], [StartChar]))

in

    InsertedCustom3

 

My first reaction when I saw M was that I liked it, but I didn’t think Excel users were prepared to learn yet another language. I still think this (it would have been much better if Power Query, like Power Pivot/DAX, used a language that was consistent with Excel formula language) but as you can see from these examples, sometimes having a choice of tools means it’s more likely that you can find an easy solution to a problem.

Written by Chris Webb

August 16, 2013 at 11:31 am

Posted in Power BI, Power Query

23 Responses

Subscribe to comments with RSS.

  1. It’s really difficult to understand, the reason why any company choose power query to try doing some ETL stufff inside an EXCEL File.
    Applying lines of code for ETL, DAX formulas, VBA…. are not SELF SERVICE BI! SELF SERVICE is having everything in place to consume! When I go to McDonalds, I’ll not go to prepare my hamburger… I want it prepared to just eat it! I just need to choose what I want. This is Self Service for me.

    I understand that microsoft want to improve Office sales and customer fidelity, but all decision makers should carefully think on the potential of business data and rules anarchy.

    I love Excel just to explore data. Data should be good, trusted and centralized for everyone see the same version of the truth.

    Just my opinion! ;-)
    Thank you Chris for all you support and blogging tips!

    PedroCGD

    August 16, 2013 at 11:50 am

    • I agree with you. Why not a simpler SSIS instead of Power Query? Why just Excel?

      analyzerhakan

      August 18, 2013 at 10:26 am

    • Building mini ETL pipelines in excel is something some power data analysts already do. This just provides a tool that makes it easier.

      For the more typical excel user, you’re right, they’ll just consume queries _someone else_ has built, certified, and published to a Power BI site where they can govern access and monitor usage.

      Kris Horrocks

      January 16, 2014 at 5:51 pm

  2. We should able to wrap M with VBA.

    David Hager (@dhExcel)

    August 16, 2013 at 12:25 pm

  3. The history of M is interesting. My reaction is I wished it was Clojure, my current functional language of choice! I love having a ‘let’ statement for variables. It would be nice if Excel formula language or DAX had a ‘let’ to allow local variables.

    Donald Parish

    August 16, 2013 at 2:21 pm

    • I fully agree Donald Qliview scripting has this let ability and it works perfect for simple budgeting and what-if scenarios. Well why? Then they would not be able to sell Excel probably!

      analyzerhakan

      August 18, 2013 at 10:29 am

  4. Reblogged this on analyzerhakan and commented:
    yet another language (“M” for Power Query)! I wonder where this is going and why they not putting these capabilities on the server side (SSIS)?

    analyzerhakan

    August 18, 2013 at 10:23 am

  5. Chris, this was a good example you used to understand the power of Power Query. I just recently implemented many to many dimension ETL loading using SSIS. Your query to unpivot a comma separated list of distinct values is similar to what needs to be done when implementing many to many dimensions. This was a pain in SSIS and I also attempted a simpler solution using SQL and XML which was simple but suffered from bad performance. Power Query seems like a simple and perhaps performing way to do it. I do agree with everyone that if Power Query was well integrated into SSIS I would use it over data flow tasks in a heartbeat. Being a hopeless optimist, I’m hoping that will eventually happen and we should take this time to learn the language now. What do you think Chris?

    suhail ali

    August 19, 2013 at 5:37 pm

    • It’s certainly something a lot of people have been asking for, so I wouldn’t be surprised if we did get M support in SSIS soon.

      Chris Webb

      August 20, 2013 at 11:21 pm

      • Thanks Chrisd for this examples, they’re really good. When exactly will your Power Query book be published?

        Enemona

        January 12, 2014 at 5:26 pm

      • Hopefully in May, but I’ve only just started work on it!

        Chris Webb

        January 12, 2014 at 5:28 pm

  6. Chris, given your extensive command of M functions, I believe that you overlooked a simpler solution to the count duplicate problem e.g.
    let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    InsertedCustom = Table.AddColumn(Source, “Custom”, each List.Count(List.Distinct(Text.Split([Input],”,”))))
    in
    InsertedCustom

    And if you wanted to return the distinct values in each row, you can add the line
    InsertedCustom1 = Table.AddColumn(InsertedCustom,”Custom1″, each Text.Combine(List.Distinct(Text.Split([Input],”,”)),”,”))
    (replacing “in InsertedCustom” with “in InsertedCustom1″)

    Clever solution to the second problem! Note that the list of numbers can be written more compactly as {“0″..”9″}

    Colin Banfield

    August 20, 2013 at 4:11 pm

    • Thanks Colin – the point I was trying to make with the count distinct problem was that it could be solved in Power Query without writing any M code manually at all, and just using the UI;I don’t think many Power Query users will want to learn M. But nice solution!

      BTW I’m on holiday at the moment and will reply to your recent email in a few days when I get home…

      Chris Webb

      August 20, 2013 at 5:24 pm

      • Hi Chris,

        Similar to your second solution, this is how I did mine, just sharing so other learners like myself can go through it:

        let

        Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],

        InsertedCustom = Table.AddColumn(Source, “FirstNumberPosition”, each List.PositionOfAny(Text.ToList([Input]),{“0″..”9″},Occurrence.First)),

        InsertedCustom1 = Table.AddColumn(InsertedCustom, “InitialNumberText”, each Text.Range([Input],[FirstNumberPosition])),

        InsertedCustom2 = Table.AddColumn(InsertedCustom1, “PositionFirstAlphabet”, each List.PositionOfAny(Text.ToList([InitialNumberText]),{“a”..”z”,”A”..”Z”})),

        InsertedCustom3 = Table.AddColumn(InsertedCustom2, “Output”, each Text.Range([InitialNumberText], [PositionFirstAlphabet]))

        in

        InsertedCustom3

        This is how I might ‘incrementally’ implement it in Excel, this should be familiar to Excel Power users.

        NB> Please note that we eargerly await your power query book, I hope commenters on this blog would get a discount :)

        Enemona

        January 23, 2014 at 12:24 am

  7. […] work. If you’re interested on some more things that can be done with M, I suggest you check out these examples on Chris Webb’s BI […]

  8. Wow – lots of angry comments here. As a vanilla excel / VBA expert (who is actually an accountant by trade), I would like to say that M and power query blow VBA and Vanilla Excel formulas out of the water. There is so much utility in having a functional programming language available (VBA is not a FUNCTIONAL programming language – no anonymous function capabilities)

    I used code very similar to this example to solve a very time consuming formerly manual task at work. We match debits and credits after summarizing transaction by some criteria – simple scenario. I have simplified things even more here to demonstrate. The following code gets right to the jist of my problem in a few lines of code that vanilla excel cannot handle (because i cannot feed a list of excel results back through excel and developing this in VBA would have taken 10 times as long)

    ImportedExcel = Excel.Workbook(File),
    Table1_Table = ImportedExcel{[Item="Table1",Kind="Table"]}[Data],
    AmountsColumn = Table1_Table[Amount],

    DistinctAmounts = List.Distinct(AmountsColumn),

    DoubledUpAmounts = List.Difference(AmountsColumn,DistinctAmounts),
    DuplicatesRemoved = List.Distinct(DoubledUpAmounts)

    There in 6 short easy to read steps I have shredded a formerly very difficult task that would have been a nightmare to code with VBA. Eat your hearts out PowerQuery haters.

    Also see further examples at https://www.youtube.com/channel/UCxoZHrue1ZOFdDr7CGX43kw/videos

    Derik

    May 14, 2014 at 8:47 pm

  9. Just chiming in from a DBA perspective:
    1) Who came up with the name M? There’s a database-style language called MUMPS, often referred to by M. This makes finding details on this freaking horrific. And that doesn’t even include that “I’m” comes up when I search google for “M”.
    2) Saw some demos on it this weekend (Brian Smith with Microsoft) – really awesome for end users… but I want a way to use that on the server side. Let the Analysts clean the data file, then give the script to me so that I can automatically do it going forward. But he was _really_ non-committal about that feature. Alas.

    mbourgon

    August 28, 2014 at 5:22 pm

    • Some answers…
      1) M is only the unofficial name. It was the code name, when M was a standalone language (well before Power Query was thought of – it was developed for a completely different project), and the official name now is “Power Query Formula Language” but there are still lots of references to M around in the docs.
      2) A lot of people have been asking for this. Right now, you can refresh Excel workbooks that contain Power Query queries when you upload them to a Power BI site in Office 365; I would love to see some kind of standalone server for Power Query.

      Chris Webb

      August 28, 2014 at 6:56 pm

    • M was originally publicly disclosed as a codename for the language part of the now defunct “Oslo” modeling platform. In that context “M” stood for “Model”. The potential conflict with MUMPS was evaluated at that time but because “M” was considered to be just a codename it was kept assuming that some other name would be chosen upon product release. I left MSFT in 2010 so can’t comment on the decision to keep it with the release of Power Query. It does appear that the documentation also refers to it as the “Power Query for Excel
      Formula Language” and the shorter “Power Query Formula Language”. “M” (and it’s supporting infrastructure) was developed to be a very broadly applicable language and the original designers very much envisioned a world with “M” being adopted into various layers of MSFT’s servers and tools. How much of that vision still exists I don’t know but it is notoriously difficult to drive new plumbing into other product groups that you don’t control. So while the Office group has found utility for “M” in Power Query a completely different organization controls SSIS. I imagine that why you didn’t get any commitment from Brian as he is in the Office group.

      And to make things more confusing, notice that MSFT research has a project called M# that apparently has nothing to do with M. There is also a MSFT partner with a product for more rapidly building ASP.NET applications through DSLs. This really smells like it could be upon M as this was one of the primary initial use cases scoped out for M early on. AND, MSFT has a new effort also called Oslo for surfacing data in the Office Graph…

      M# – DSLs over ASP.NET

      http://www.msharp.co.uk/Home.html

      M# – MSFT Research extensions to C# for Midori

      http://www.zdnet.com/microsofts-midori-the-m-connection-7000024664/

      Oslo – New tool for surfacing data in the Office Graph

      http://www.pcworld.com/article/2153784/microsoft-oslo-will-help-you-stay-on-top-of-what-s-relevant.html

      Kris Horrocks

      August 28, 2014 at 6:59 pm

  10. Hi Chris, do you know if there is a way to define a custom aggregate function to use with Table.Group? I am looking for something that would accumulate the values in a comma separated way (for example instead of averaging 1, 2 and 3 it would accumulate them and output “1,2,3”. thanks.

    ben raibaud

    October 20, 2014 at 10:50 pm

    • It will be possible, yes – I do something similar here: http://cwebbbi.wordpress.com/2014/01/27/comparing-columns-in-power-query/

      Chris Webb

      October 21, 2014 at 6:59 am

      • Thanks. I think I manager to do it, but there might be a quicker option.

        My source table would look like this:

        System Domain
        ————————-
        App2 HTR
        App1 SC
        App2 RTR

        and in return I want something like this:

        System Domains
        ————————-
        App1 SC
        App2 HTR, RTR

        My steps are:

        let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        CombineWithComma = Combiner.CombineTextByDelimiter(“, “),
        GroupedRows = Table.Group(Source, {“System”}, {{“rows”, each _, type table}}),
        Results = Table.AddColumn(GroupedRows, “Domains”, each CombineWithComma(Table.ToList(Table.SelectColumns([rows], “Domain”)))),
        #”Removed Columns” = Table.RemoveColumns(Results,{“rows”}),
        #”Sorted Rows” = Table.Sort(#”Removed Columns”,{{“System”, Order.Ascending}})
        in
        #”Sorted Rows”

        Do you think this is the right way to do it?
        Thanks for your help, Ben

        ben raibaud

        October 21, 2014 at 10:13 am

      • That looks good to me. There may be other ways to do it, but that’s the way I’d solve the problem.

        Chris Webb

        October 21, 2014 at 8:52 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,302 other followers

%d bloggers like this: