Join Conditions In Power Query, Part 1
In last week’s post I showed how to create a simple LIKE function in Power Query which could be used in a calculated column. This week I’m going to show you how you can use this function in a condition when joining two tables together.
Consider the following two tables in an Excel worksheet:
Let’s say you want to join the Patterns table to the Phrases table, but only return the rows for each pattern where the Like() function returns true. Power Query has good support for different types of joins in the Table.Join() and Table.NestedJoin() functions but it isn’t immediately obvious how to handle join conditions such as this one.
Here’s the M code for a query that shows you how to do this:
What this query does is the following:
- Gets the data from both the Patterns table and the Phrases table. I’ve done this as two steps in the same query for simplicity; you might want to create two separate queries to do this.
- On the Patterns table, add a new custom column that returns the entire Phrases table as a value. The Insert Custom Column dialog and its output will look like this:
- Expands the new custom column so that every row of the Phrases table is displayed against every row of the Patterns table – in effect, it does a cross join between the two tables. Interestingly there is no option to do a cross join using the Table.Join() and Table.NestedJoin() functions.
- Finally, it uses the Like() function in Table.SelectRows() as follows:
Table.SelectRows(ExpandedTable, each Like([Phrases],[Patterns]))
…to filter the table from the previous step so that you only get the rows back where the Like() function returns true and the pattern matches the phrase:
So… we have a solution to our problem (and you can download the example workbook here). However, be warned: for large tables this approach may not perform well! In part two I’ll show you another example of a complex join condition and show you how different approaches can yield very different performance.