The str_replace() (and preg_replace()) function in PHP replaces all occurrences of the search string with the replacement string. What interests me most here is that if the search and replace arguments are arrays (vectors, in R terms), str_replace() takes a value from each array and uses that pair to search and replace on the subject.
In other words, does R (or some R package) have a function to perform the following:
string <- "The quick brown fox jumped over the lazy dog."
patterns <- c("quick", "brown", "fox")
replacements <- c("slow", "black", "bear")
xxx_replace_xxx(string, patterns, replacements) ## ???
## [1] "The slow black bear jumped over the lazy dog."
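For reference, base R's chartr() already performs this kind of pairwise substitution, but only for single characters: each character of old is replaced by the character at the same position in new. The input string below is just an illustration.

```r
# chartr() maps each character of `old` to the character at the same
# position in `new`; it cannot map multi-character substrings.
chartr("abc", "xyz", "a cab")
## [1] "x zxy"
```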
So I am looking for something like chartr(), but for search patterns and replacement strings of arbitrary length. This cannot be done with a single call to gsub(), because its replacement argument can only be a single string (see ?gsub). My current implementation looks like this:
xxx_replace_xxx <- function(string, patterns, replacements) {
  # Apply each pattern/replacement pair in turn, matching patterns literally.
  for (i in seq_along(patterns))
    string <- gsub(patterns[i], replacements[i], string, fixed = TRUE)
  string
}
However, I am looking for something much faster when length(patterns) is large: I have a lot of data to process and the current performance is not good enough.
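One property of this loop is worth keeping in mind when looking for a faster replacement: the pairs are applied one after another, so a later pattern can also match text produced by an earlier replacement. PHP's str_replace() with array arguments behaves the same way, since it also replaces left to right. The two-letter input below is a hypothetical example chosen to make this visible.

```r
xxx_replace_xxx <- function(string, patterns, replacements) {
  for (i in seq_along(patterns))
    string <- gsub(patterns[i], replacements[i], string, fixed = TRUE)
  string
}

# "a" -> "b" first turns "ab" into "bb"; then "b" -> "c" rewrites both b's.
xxx_replace_xxx("ab", c("a", "b"), c("b", "c"))
## [1] "cc"
```

A faster method that replaces all patterns simultaneously would return "bc" here instead, so it is worth deciding which behaviour the data actually needs.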
Example toy data for benchmarking:
string <- readLines("http://www.gutenberg.org/files/31536/31536-0.txt", encoding="UTF-8")
patterns <- c("jak", "to", "do", "z", "na", "i", "w", "za", "tu", "gdy",
              "po", "jest", "Tadeusz", "lub", "razem", "nas", "przy", "oczy", "czy",
              "sam", "u", "tylko", "bez", "ich", "Telimena", "Wojski", "jeszcze")
replacements <- paste0(patterns, rev(patterns))
Alternatively, str_replace_all() from the stringr package accepts a named vector of the form c(pattern1 = replacement1, ...), so several patterns can be replaced with different strings in a single call.
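A minimal sketch of that stringr call on the sentence from the question, assuming stringr is installed. The names of the vector are the patterns and the values are the replacements; setNames() builds that vector from the separate patterns and replacements vectors. Note that stringr treats these patterns as regular expressions by default, which is harmless for plain words like these. No claim is made here about its speed relative to the loop.

```r
library(stringr)

string <- "The quick brown fox jumped over the lazy dog."
patterns <- c("quick", "brown", "fox")
replacements <- c("slow", "black", "bear")

# Named vector: names are the (regex) patterns, values the replacements.
str_replace_all(string, setNames(replacements, patterns))
## [1] "The slow black bear jumped over the lazy dog."
```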
The gsub() function in R replaces all occurrences of a pattern within each element of a character vector.
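By default gsub() interprets the pattern as a regular expression, and fixed = TRUE makes it match literally. This matters when the loop above is switched from fixed = TRUE to perl = TRUE, because patterns containing metacharacters such as "." then mean something different. The string "a.b" is only an illustration.

```r
x <- "a.b"

# Default (TRE regex): "." matches any character.
gsub(".", "!", x)
## [1] "!!!"

# fixed = TRUE: "." is matched literally.
gsub(".", "!", x, fixed = TRUE)
## [1] "a!b"

# perl = TRUE: still a regular expression, so same as the default here.
gsub(".", "!", x, perl = TRUE)
## [1] "!!!"
```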
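Using PCRE (perl = TRUE) instead of fixed matching takes about a third of the time on my machine for your example. The patterns are then treated as regular expressions, which is safe here because none of them contain metacharacters.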
xxx_replace_xxx_pcre <- function(string, patterns, replacements) {
  # Same loop, but each pattern is treated as a PCRE regular expression.
  for (i in seq_along(patterns))
    string <- gsub(patterns[i], replacements[i], string, perl = TRUE)
  string
}
system.time(x <- xxx_replace_xxx(string, patterns, replacements))
# user system elapsed
# 0.491 0.000 0.491
system.time(p <- xxx_replace_xxx_pcre(string, patterns, replacements))
# user system elapsed
# 0.162 0.000 0.162
identical(x,p)
# [1] TRUE
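Another direction, sketched here as an untested assumption rather than a measured result: build one alternation regex from all patterns and substitute every match in a single pass with `regmatches<-`, so each string is scanned once instead of once per pattern. The function name replace_single_pass() is hypothetical. Its semantics differ from the loop: replacements are simultaneous, so inserted text is never re-matched ("ab" gives "bc", not "cc"), and because patterns are sorted by length the longest one wins where several match at the same position. Its output on the Gutenberg data therefore need not be identical to the loop's, and whether it is faster there should be checked with system.time() as above.

```r
# Hypothetical single-pass replacement: all patterns are matched literally
# in one PCRE alternation, and each match is swapped for its replacement.
replace_single_pass <- function(string, patterns, replacements) {
  stopifnot(length(patterns) == length(replacements), all(nzchar(patterns)))
  # Try longer patterns first so that e.g. "za" wins over "z" at one position.
  o <- order(nchar(patterns), decreasing = TRUE)
  patterns <- patterns[o]
  replacements <- replacements[o]
  # Escape regex metacharacters so every pattern is matched literally.
  escaped <- gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", patterns, perl = TRUE)
  rx <- paste(escaped, collapse = "|")
  m <- gregexpr(rx, string, perl = TRUE)
  # Look up each matched substring in `patterns` and use its replacement.
  regmatches(string, m) <- lapply(regmatches(string, m), function(hits)
    replacements[match(hits, patterns)])
  string
}

s <- "The quick brown fox jumped over the lazy dog."
replace_single_pass(s, c("quick", "brown", "fox"), c("slow", "black", "bear"))
## [1] "The slow black bear jumped over the lazy dog."
replace_single_pass("ab", c("a", "b"), c("b", "c"))
## [1] "bc"
```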