Comparing Columns In Power Query
A few weeks ago I wrote a post about comparing the contents of entire tables in Power Query, and a question came up in the comments about how you might go about comparing values in columns rather than whole tables. Of course this prompted me to investigate how different types of comparison might be done – and here’s the blog post with the results of the investigation.
Consider the following two single-column tables in an Excel worksheet:
Which items are present in one column and not in the other? Which are present in both? The easiest way to answer these questions is to take each table and turn it into a List object (using Table.ToList() ); once you’ve done that you’ll find there are loads of really useful functions for this type of thing. Here’s a query that compares the values in each column:
Here’s the output:
Fairly self-explanatory, I think. List.Difference() finds the items that are in one list and not another: List.Intersect() finds items that are in both. In fact it’s probably more interesting to look at how I’ve generated the output. Table.FromRows() returns a manually constructed table. The CombineWithComma step uses Combine.CombineTextByDelimiter() to return a function that turns all of the items in a list into a single, comma-delimited piece of text, and I then use that function inside each row of the table I’m returning to get a readable version of what List.Difference() and List.Intersect() return.
Rather than looking at the distinct values in each column, though, you might want to do a row-by-row comparison. Here’s another query that does that:
Here’s the output:
Again I’m turning each table into a list, and then I’m using List.Positions() to generate a list of integer values from 0 to 9 representing the index of each item in the source list, then using List.Transform() to iterate over each item in this list and compare the values at the given index in the source and target list.
Frankly, an even easier way of doing this might have been to import both tables in separate queries, add an index column to both of them using the Insert Index Column button, then join the two tables together on the index column using the Merge button and then finally create some custom columns to do the comparison. This is certainly how any end-user would do it, but the resulting code is a bit less elegant I didn’t learn anything interesting about M from doing it that way. I’ve left the example in the demo workbook, which you can download here.
PS Even if you have voted for me already in the Power BI competition, please vote for me again (you can vote once every 24 hours)! Here’s the link: