
R: replace multiple occurrences of regex-matched strings in dataframe fields by looking them up in another dataframe

I have two dataframes:

df lookup:

oldId <- c(123, 456, 567, 789)
newId <- c(1, 2, 3, 4)
lookup <- data.frame(oldId, newId)

df data:

descr <- c("description with no match",
           "description with one 123 match",
           "description with again no match",
           "description 456 with two 789 matches")
data <- data.frame(descr)

Goal:

I want a new dataframe:

  • same structure as the data df
  • same field values, except that all instances of numbers (i.e. 123, 456, 789) are looked up in the other dataframe, and replaced by lookup$newId.

The resulting dataframe will thus look like this:

  1. "description with no match"
  2. "description with one 1 match"
  3. "description with again no match"
  4. "description 2 with two 4 matches"

So each text in the descr column may contain many numbers that need to be replaced. Of course, this is a stripped-down example; my real-life dataframes are much bigger.

I do have the regex part working:

fx <- function(x) {gsub("([[:digit:]]{3})", "TESTTEST", x)}
data$descr <- fx(data$descr)  # gsub is vectorized, so lapply (which returns a list) is not needed

But I have no idea how to make the function loop over all matches in a row, look up each number, and replace it.

Rinke asked Oct 22 '25 15:10


1 Answer

A base R approach can use Reduce:

Reduce(
  \(x, i) gsub(lookup$oldId[i], lookup$newId[i], x),
  seq_along(lookup$oldId),
  init = descr
)

Output:

[1] "description with no match"        "description with one 1 match"    
[3] "description with again no match"  "description 2 with two 4 matches"
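One thing to watch with the Reduce call above: it makes one gsub pass per lookup row, so an old ID also matches inside a longer number (123 inside 1234), and the output of an earlier replacement could in principle be matched by a later row. If that matters for your data, a single-pass alternative is to extract every run of digits and look each one up with match(). This is only a sketch under the assumption that IDs are whole runs of digits; replace_ids is a hypothetical helper name, not part of base R.

```r
# Toy data from the question.
lookup <- data.frame(oldId = c(123, 456, 567, 789), newId = c(1, 2, 3, 4))
descr <- c("description with no match",
           "description with one 123 match",
           "description with again no match",
           "description 456 with two 789 matches")

# Hypothetical helper: replace every complete run of digits in x
# by its newId, leaving numbers that are not in the lookup untouched.
replace_ids <- function(x, lookup) {
  old <- as.character(lookup$oldId)
  new <- as.character(lookup$newId)
  m <- gregexpr("[0-9]+", x)            # positions of all digit runs per string
  regmatches(x, m) <- lapply(regmatches(x, m), function(found) {
    out <- new[match(found, old)]       # NA where the number is not in the lookup
    out[is.na(out)] <- found[is.na(out)]
    out
  })
  x
}

result <- replace_ids(descr, lookup)
result
```

Because each number is matched once and replaced once, the order of rows in lookup does not affect the result, and the work does not grow with one full regex pass per lookup row.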
jpsmith answered Oct 25 '25 05:10


