String manipulation in mutate with stringr

Tags: r, dplyr, stringr

Let's say I want to locate a pattern in a string and, if the pattern exists, keep only the part of the string up to and including the pattern (e.g. "set" from "setosa"). My problem is that when the pattern does not exist, str_locate() returns NA, so str_sub() also returns NA and the final result is NA. I want it to return the original string when the pattern does not exist.

library(stringr)
library(dplyr)
unique(iris$Species)
#> [1] setosa     versicolor virginica 
#> Levels: setosa versicolor virginica

test <- iris %>%
  mutate(Species = str_sub(Species, 1, str_locate(Species, "t")[,1] ))

head(test)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2     set
#> 2          4.9         3.0          1.4         0.2     set
#> 3          4.7         3.2          1.3         0.2     set
#> 4          4.6         3.1          1.5         0.2     set
#> 5          5.0         3.6          1.4         0.2     set
#> 6          5.4         3.9          1.7         0.4     set
tail(test)
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 145          6.7         3.3          5.7         2.5    <NA>
#> 146          6.7         3.0          5.2         2.3    <NA>
#> 147          6.3         2.5          5.0         1.9    <NA>
#> 148          6.5         3.0          5.2         2.0    <NA>
#> 149          6.2         3.4          5.4         2.3    <NA>
#> 150          5.9         3.0          5.1         1.8    <NA>

Created on 2019-07-14 by the reprex package (v0.3.0)
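To see where the NA comes from, here is a minimal sketch on a plain character vector (the names species and t_pos are illustrative only): str_locate() reports NA as the start position when there is no match, and str_sub() returns NA when its end position is NA.

```r
library(stringr)

species <- c("setosa", "versicolor", "virginica")

# Start position of the first "t"; NA where the string contains no "t"
t_pos <- str_locate(species, "t")[, 1]

# str_sub() with an NA end position yields NA for that element,
# which is why versicolor and virginica become NA in the mutate() above
kept <- str_sub(species, 1, t_pos)
kept
```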

asked Sep 05 '25 16:09 by xhr489


1 Answer

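We can use a regex lookbehind with str_remove(). If the pattern is not found, str_remove() returns the original string unchanged. Here, (?<=t) is a zero-width lookbehind that matches the position right after the first 't', and .* matches every character from there to the end of the string. If such a match exists, those characters are removed, so the 't' itself is kept.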
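A minimal sketch of the same pattern on a plain character vector (species is an illustrative name), plus a base R equivalent that is an alternative rather than part of the original answer: sub() needs perl = TRUE here, because base R's default regex engine does not support lookbehinds.

```r
library(stringr)

species <- c("setosa", "versicolor", "virginica")

# (?<=t) matches the position just after the first "t"; .* then runs to the
# end of the string. Strings without a "t" have no match and are left as-is.
via_stringr <- str_remove(species, "(?<=t).*")
# returns c("set", "versicolor", "virginica")

# Base R equivalent; perl = TRUE selects the PCRE engine for the lookbehind
via_base <- sub("(?<=t).*", "", species, perl = TRUE)
# returns c("set", "versicolor", "virginica")
```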

library(dplyr)
library(stringr)
test <- iris %>%
  mutate(Species = str_remove(Species, "(?<=t).*"))
head(test)
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2     set
#2          4.9         3.0          1.4         0.2     set
#3          4.7         3.2          1.3         0.2     set
#4          4.6         3.1          1.5         0.2     set
#5          5.0         3.6          1.4         0.2     set
#6          5.4         3.9          1.7         0.4     set
tail(test)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#145          6.7         3.3          5.7         2.5 virginica
#146          6.7         3.0          5.2         2.3 virginica
#147          6.3         2.5          5.0         1.9 virginica
#148          6.5         3.0          5.2         2.0 virginica
#149          6.2         3.4          5.4         2.3 virginica
#150          5.9         3.0          5.1         1.8 virginica
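If you would rather keep the str_locate()/str_sub() approach from the question, one alternative (a sketch, not taken from the original thread) is to fall back to the original string with dplyr::coalesce() wherever str_sub() returns NA. Species is converted to character first so both arguments to coalesce() share a type; as with str_remove(), the resulting column is character rather than factor.

```r
library(dplyr)
library(stringr)

test <- iris %>%
  # Work on a character column so coalesce() combines like with like
  mutate(Species = as.character(Species)) %>%
  mutate(Species = coalesce(
    str_sub(Species, 1, str_locate(Species, "t")[, 1]),  # NA when no "t"
    Species                                              # fallback: original string
  ))

unique(test$Species)
```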
answered Sep 08 '25 05:09 by akrun