Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Using List.Generate() To Make Multiple Replacements Of Words In Text In Power Query


Recently I had a request for help from someone who wanted to do the following in Power Query: take a piece of text and, using a lookup table, find every occurrence in the text of the words in one column of that table and replace them with the words in the other column. So, for example, given these two tables in Excel:

image

You want to take the table on the left and for each piece of text replace the words in the ‘Word To Replace’ column of the right-hand table with those in the ‘Replace With’ column of the right-hand table. The output would therefore be:

image

An interesting challenge in itself, and one I solved first of all using a recursive function. Here’s some code showing how I did it:

let
    //Get table of word replacements
    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],
    //Get table containing text to change
    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],
    //Get a list of all words to replace
    WordsToReplace = Table.Column(Replacements, "Word To Replace"),
    //Get a list of all words to replace with
    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),
    //Recursive function to do the replacement
    ReplacementFunction = (InputText, Position)=>
    let
     //Use Text.Replace to do each replace
     ReplaceText = Text.Replace(
            InputText,
            WordsToReplace{Position},
            WordsToReplaceWith{Position})
    in
     //If we have reached the end of the list of replacements
     if Position=List.Count(WordsToReplace)-1
      then
      //return the output of the query
      ReplaceText
      else
      //call the function again
      @ReplacementFunction(ReplaceText, Position+1),
    //Add a calculated column to call the function on every row in the table
    //containing text to change
    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text], 0))
in
    Output
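
If you want to try the recursive pattern without setting up the two Excel tables, here is a minimal standalone sketch. The hard-coded lists and the sample sentence are stand-ins I'm assuming in place of the ‘Replacements’ and ‘Text’ tables, and the type annotations are optional extras; the recursion itself follows the same shape as the query above.

```powerquery
let
    // Assumed sample data standing in for the two Excel tables
    WordsToReplace = {"cat", "dog", "mat"},
    WordsToReplaceWith = {"fish", "snake", "ground"},
    // Replace the word at Position, then recurse until the last position
    ReplacementFunction = (InputText as text, Position as number) as text =>
        let
            ReplaceText = Text.Replace(
                InputText,
                WordsToReplace{Position},
                WordsToReplaceWith{Position})
        in
            if Position = List.Count(WordsToReplace) - 1
            then ReplaceText
            // The @ operator lets the function refer to itself by name
            else @ReplacementFunction(ReplaceText, Position + 1),
    // Start at position 0, just as the calculated column does
    Result = ReplacementFunction("the cat and the dog sat on the mat", 0)
in
    Result
```

With this sample data the query should return the text "the fish and the snake sat on the ground".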

 

It does the job, but… after thinking about this some more, I wondered if there was a better way. A lot of my recent Power Query blog posts have used recursive functions, but are they a Good Thing? So I asked on the forum, and as usual the nice people on the Power Query dev team answered very promptly (that’s one of the things I like about the Power Query dev team – they engage with their users). Recursive functions are indeed something that should be avoided if there is an alternative, and in this case List.Generate() can be used instead. Here’s how:

let
    //Get table of word replacements
    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],
    //Get table containing text to change
    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],
    //Get list of words to replace
    WordsToReplace = Table.Column(Replacements, "Word To Replace"),
    //Get list of words to replace them with
    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),
    //A non-recursive function to do the replacements
    ReplacementFunction = (InputText)=>
     let
       //Use List.Generate() to do the replacements
       DoReplacement = List.Generate(
                          ()=> [Counter=0, MyText=InputText],
                          each [Counter]<=List.Count(WordsToReplaceWith),
                          each [Counter=[Counter]+1,
                                MyText=Text.Replace(
                                         [MyText],
                                         WordsToReplace{[Counter]},
                                         WordsToReplaceWith{[Counter]})],
                          each [MyText]),
       //Return the last item in the list that
       //List.Generate() returns
       GetLastValue = List.Last(DoReplacement)
     in
      GetLastValue,
    //Add a calculated column to call the function on every row in the table
    //containing the text to change
    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text]))
in
    Output

 

List.Generate() is a very powerful function indeed, albeit one that took me a while to understand properly. It’s a bit like a FOR loop, except that it’s a function that returns a list. Here’s what each of the parameters I’m passing to the function in the example above does:

  • ()=> [Counter=0, MyText=InputText] is a function that takes no parameters and returns a record (a record is a bit like a table with just one row in it). The record contains two fields: Counter, which has the value 0, and MyText, which is given the value of the text where the values are to be replaced. This record is the initial value that List.Generate() will modify at each iteration.
  • each [Counter]<=List.Count(WordsToReplaceWith) is a function too. An each expression is a quick way of declaring a function that takes one, unnamed parameter, and in this case the value that will be passed to this parameter is a record of the same structure as the one declared in the previous bullet. The expression [Counter] gets the value of the Counter field from that record. The function returns a boolean value: true when the value in the Counter field of the record is less than or equal to the number of items in the list of replacement words. List.Generate() returns a list, and for as long as this function returns true it will keep on iterating and adding new items to the list it returns.
  • each [Counter=[Counter]+1, MyText=Text.Replace([MyText], WordsToReplace{[Counter]}, WordsToReplaceWith{[Counter]})] is yet another function, once again declared using an each expression. The function here takes the record from the current iteration and returns the record to be used at the next iteration: a record where the value of the Counter field is increased by one, and where the value of the MyText field has one word replaced. The word that gets replaced in MyText is the word in the (zero-based) row number given by Counter in the ‘Word To Replace’ column; this word is replaced by the word in the row number given by Counter in the ‘Replace With’ column.
  • each [MyText] is a very simple function, one that returns the value from the MyText field of the record from the current iteration. It’s the value that this function returns that is added to the list returned by List.Generate() at every iteration.
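
To see the four parameters in isolation, here is a minimal sketch that has nothing to do with text replacement: it uses the same record-with-a-counter pattern to build a list of square numbers. The step name Squares and the field name Square are made up purely for this illustration.

```powerquery
let
    Squares = List.Generate(
        // 1) Initial record: start counting at 1
        () => [Counter = 1, Square = 1],
        // 2) Condition: keep iterating while this returns true
        each [Counter] <= 5,
        // 3) Next: build the record for the following iteration
        //    ([Counter] here reads the field from the current record)
        each [Counter = [Counter] + 1,
              Square = ([Counter] + 1) * ([Counter] + 1)],
        // 4) Selector: the value added to the output list each time
        each [Square])
in
    Squares
```

This should return the list {1, 4, 9, 16, 25}: the initial record contributes the first item, and the loop stops as soon as the condition sees a Counter of 6.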

To illustrate this, here’s a simplified example showing how List.Generate() works in this case:

let
    WordsToReplace = {"cat", "dog", "mat"},
    WordsToReplaceWith = {"fish", "snake", "ground"},
    Demo = List.Generate(
                          ()=> [Counter=0, MyText="the cat and the dog sat on the mat"],
                          each [Counter]<=List.Count(WordsToReplaceWith),
                          each [Counter=[Counter]+1,
                                MyText=Text.Replace(
                                         [MyText],
                                         WordsToReplace{[Counter]},
                                         WordsToReplaceWith{[Counter]})],
                          each [MyText])
in
    Demo

 

The output of this query is the list:

image

This list can be written as follows, with one more word replaced at each iteration:

{"the cat and the dog sat on the mat", "the fish and the dog sat on the mat", "the fish and the snake sat on the mat", "the fish and the snake sat on the ground"}
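
The full query only needs the final item of this list, which is why it wraps the call in List.Last(). As a rough sketch of that last step using the same sample data, the query below returns both the final text and the number of items generated; the step names AllSteps, FinalText and StepCount are just names I've picked for this illustration.

```powerquery
let
    WordsToReplace = {"cat", "dog", "mat"},
    WordsToReplaceWith = {"fish", "snake", "ground"},
    // Same List.Generate() call as in the simplified example
    AllSteps = List.Generate(
        () => [Counter = 0, MyText = "the cat and the dog sat on the mat"],
        each [Counter] <= List.Count(WordsToReplaceWith),
        each [Counter = [Counter] + 1,
              MyText = Text.Replace(
                  [MyText],
                  WordsToReplace{[Counter]},
                  WordsToReplaceWith{[Counter]})],
        each [MyText]),
    // The last item has had every replacement applied
    FinalText = List.Last(AllSteps),
    // The original text plus one item per replacement
    StepCount = List.Count(AllSteps)
in
    [FinalText = FinalText, StepCount = StepCount]
```

Here FinalText should be "the fish and the snake sat on the ground" and StepCount should be 4: the untouched original text plus one item for each of the three replacements.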

So, another useful function to know about. I’m slowly getting to grips with all this functional programming!

You can download the sample workbook here.

Written by Chris Webb

June 25, 2014 at 11:33 pm

Posted in Power Query

6 Responses


  1. Reminds me a bit of the lapply function in R. (which applies a function over a vector and returns a list)

  2. […] Using List.Generate() To Make Multiple Replacements Of Words In Text In Power Query (Chris Webb) […]

  3. Another brilliant way to do these replacements. However, so far I’ve used the recursive method in many workbooks and seen no issues. Is there a specific problem with it? And what about doing the replacement only when the entire cell matches, not just a substring, without using Table.Transform() to replace the existing column entirely?

    chandan chauhan

    June 26, 2014 at 4:24 pm

    • Recursion can be a beast, although this looks like recursion in a neatly controlled wrapper. I was usually doing basically the same thing with recursion, except escaping at the end state rather than returning the whole tree that got me there, as this function does. Great option though; it definitely seems better controlled than some of the recursive implementations I’ve seen and created.

      Derik

      July 10, 2014 at 10:38 pm

  4. Hi Chris, Many thanks for another awesome blog (& your book too). I was just wondering if there is a limit to the number of records that can be loaded into Excel when using List.Generate? Following the download example I created a new workbook but with the source (text) table containing multiple columns and about 50,000 rows. I created a replacements table with about 10 replacement rows. Each of the 50,000 “text” rows has no more than 2 replacements. In the Power Query window it all works perfectly and makes the replacements for all 50,000 rows but when I try and load to worksheet (Excel 2010, 32 bit) it displays “100 rows loaded” in the “Workbook Queries” pane with the processing icon looping around but it never finishes. Only 100 rows appear in the excel worksheet, the 101st row displays “…” in each of the columns. Maximum cache is 4,096 MB & currently used cache is only 23.7 MB. Any ideas what might be preventing it from loading all rows?

    Dale Cameron

    August 15, 2014 at 2:49 am

