I am trying to translate some R dplyr code to R Polars. The code replaces an entire string with another string if there is a partial match.
library(polars)
library(dplyr)
df <- data.frame(category = c('Cats A','Cats B','kittens','street cats','dogs A','dogs B'))
# Replace any string that contains 'cats' or 'kittens' (ignoring case) with 'CATS'
df %>%
  mutate(replaced = replace(category,
                            grepl(paste0(c('cats', 'kittens'), collapse = '|'), category, ignore.case = TRUE),
                            'CATS')
  )
# Output
category     replaced
Cats A       CATS
Cats B       CATS
kittens      CATS
street cats  CATS
dogs A       dogs A
dogs B       dogs B
I want to replicate this in Polars and attempted something like:
p_df <- pl$DataFrame(df) # convert to a Polars DataFrame
p_df$with_columns(replaced = pl$col('category')$str$replace_many(c("Cats","kittens"),"CATS"))
and
...$str$replace(r"{cats}", 'Cats'))
Both attempts replace only the matched part, not the entire string, and I'm not sure how to make it work. A Python implementation would also help.
# Output of the replace_many() attempt
┌─────────────┬─────────────┐
│ category ┆ replaced │
│ --- ┆ --- │
│ str ┆ str │
╞═════════════╪═════════════╡
│ Cats A ┆ CATS A │
│ Cats B ┆ CATS B │
│ kittens ┆ CATS │
│ street cats ┆ street cats │
│ dogs A ┆ dogs A │
│ dogs B ┆ dogs B │
└─────────────┴─────────────┘
I don't know R, but it looks like the first example builds a regex with the alternatives delimited by |?
You can do it at the "regex level" in a similar way:
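- |-delimited partials inside a non-capturing group (?:...)
- (?i) to ignore case
- .* "wildcard" on either side of the partials to match "anything", so the match covers the entire string

import polars as pl
df = pl.DataFrame({"category": ["Cats A", "Cats B", "kittens", "street cats", "dogs A", "dogs B"]})
df.with_columns(
    replaced=pl.col("category").str.replace(r"(?i).*(?:cats|kittens).*", "CATS")
)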
shape: (6, 2)
┌─────────────┬──────────┐
│ category ┆ replaced │
│ --- ┆ --- │
│ str ┆ str │
╞═════════════╪══════════╡
│ Cats A ┆ CATS │
│ Cats B ┆ CATS │
│ kittens ┆ CATS │
│ street cats ┆ CATS │
│ dogs A ┆ dogs A │
│ dogs B ┆ dogs B │
└─────────────┴──────────┘
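If it helps to see why the wildcards matter without Polars installed, here is a minimal sketch using only Python's standard re module (an assumption for illustration; it is not the Polars API). It builds the alternation from a list, similar to paste0(..., collapse = '|') in the R code, then compares substituting with and without the .* wildcards. The names keywords, alternation, partial and whole are hypothetical.

```python
import re

# Keyword list joined with "|", similar to paste0(..., collapse = '|') in the R code.
# re.escape guards against keywords that contain regex metacharacters.
keywords = ["cats", "kittens"]
alternation = "|".join(map(re.escape, keywords))

categories = ["Cats A", "Cats B", "kittens", "street cats", "dogs A", "dogs B"]

# Without the ".*" wildcards, only the matched substring is replaced
# (this reproduces the behaviour from the question).
partial = [re.sub(rf"(?i)(?:{alternation})", "CATS", c) for c in categories]

# With ".*" on both sides, the match spans the whole string,
# so the entire value becomes "CATS".
whole = [re.sub(rf"(?i).*(?:{alternation}).*", "CATS", c) for c in categories]

print(partial)  # ['CATS A', 'CATS B', 'CATS', 'street CATS', 'dogs A', 'dogs B']
print(whole)    # ['CATS', 'CATS', 'CATS', 'CATS', 'dogs A', 'dogs B']
```

Note that re.sub replaces every match, while Polars str.replace replaces only the first match by default. For this pattern the result is the same, because the first match already covers the whole string.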