Say I want to locate a pattern in a string and, if the pattern exists, keep only the part of the string before the pattern. My problem is that if the pattern does not exist, str_locate returns NA, so the final result is NA as well. I want to get the original string back when the pattern does not exist.
library(stringr)
library(dplyr)
unique(iris$Species)
#> [1] setosa versicolor virginica
#> Levels: setosa versicolor virginica
test <- iris %>%
mutate(Species = str_sub(Species, 1, str_locate(Species, "t")[,1] ))
head(test)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 set
#> 2 4.9 3.0 1.4 0.2 set
#> 3 4.7 3.2 1.3 0.2 set
#> 4 4.6 3.1 1.5 0.2 set
#> 5 5.0 3.6 1.4 0.2 set
#> 6 5.4 3.9 1.7 0.4 set
tail(test)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 145 6.7 3.3 5.7 2.5 <NA>
#> 146 6.7 3.0 5.2 2.3 <NA>
#> 147 6.3 2.5 5.0 1.9 <NA>
#> 148 6.5 3.0 5.2 2.0 <NA>
#> 149 6.2 3.4 5.4 2.3 <NA>
#> 150 5.9 3.0 5.1 1.8 <NA>
Created on 2019-07-14 by the reprex package (v0.3.0)
We can use a regex lookaround with str_remove. If the pattern is not found, it returns the original string unchanged. Here the lookbehind (?<=t) anchors the match just after the first 't', and .* matches every character after it; if such a match is found, those characters are removed.
library(dplyr)
library(stringr)
test <- iris %>%
mutate(Species = str_remove(Species, "(?<=t).*"))
head(test)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 set
#2 4.9 3.0 1.4 0.2 set
#3 4.7 3.2 1.3 0.2 set
#4 4.6 3.1 1.5 0.2 set
#5 5.0 3.6 1.4 0.2 set
#6 5.4 3.9 1.7 0.4 set
tail(test)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#145 6.7 3.3 5.7 2.5 virginica
#146 6.7 3.0 5.2 2.3 virginica
#147 6.3 2.5 5.0 1.9 virginica
#148 6.5 3.0 5.2 2.0 virginica
#149 6.2 3.4 5.4 2.3 virginica
#150 5.9 3.0 5.1 1.8 virginica
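If you would rather keep the str_locate approach from the question, a minimal sketch of one way to do it is to fall back to the input wherever the position is NA. The vector `species` and the names `via_regex`, `via_locate`, and `pos` below are made up for illustration; only stringr and base R functions are used.

```r
library(stringr)

# Illustrative input: one string containing "t", two without it.
species <- c("setosa", "versicolor", "virginica")

# Lookbehind approach from the answer: drop everything after the first "t".
via_regex <- str_remove(species, "(?<=t).*")

# Alternative: keep str_locate(), but return the input where no match exists.
# str_locate() gives NA for the start position when the pattern is absent,
# and str_sub() with an NA end position returns NA, hence the fallback.
pos <- str_locate(species, "t")[, "start"]
via_locate <- ifelse(is.na(pos), species, str_sub(species, 1, pos))

via_regex
via_locate
```

Both vectors should come out as "set", "versicolor", "virginica". Inside mutate the same ifelse expression can be used on the column, wrapping Species in as.character() first since it is a factor in iris.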