Hello stackoverflowers,
I wonder whether I can apply the %like% operator row-wise between two columns of the same data.table.
The following reproducible example should make this clearer.
First, prepare the data:
library(data.table)
iris <- as.data.table(iris)
iris <- iris[seq.int(from = 1, to = 150, length.out = 5)]
iris[, Species2 := c("set", "set|vers", "setosa", "nothing", "virginica")]
Hence the dataset looks as follows.
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  Species2
1:          5.1         3.5          1.4         0.2     setosa       set
2:          4.9         3.6          1.4         0.1     setosa  set|vers
3:          6.4         2.9          4.3         1.3 versicolor    setosa
4:          6.4         2.7          5.3         1.9  virginica   nothing
5:          5.9         3.0          5.1         1.8  virginica virginica
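I would like to apply something like the following command row-wise, so that each value of Species is matched against the pattern in the same row of Species2.
iris[Species %like% Species2]
However, %like% does not treat this as a row-wise comparison. Is that possible? The expected result is rows 1, 2, and 5.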
One way would be to group by row:
iris[, .SD[Species %like% Species2], by = 1:5]
#    : Sepal.Length Sepal.Width Petal.Length Petal.Width   Species  Species2
#1: 1          5.1         3.5          1.4         0.2    setosa       set
#2: 2          4.9         3.6          1.4         0.1    setosa  set|vers
#3: 5          5.9         3.0          5.1         1.8 virginica virginica
Or, as per @docendodiscimus's comment, if there are duplicate entries, you can group by the unique (Species, Species2) pairs instead, so the pattern is evaluated once per pair:
iris[, .SD[Species[1L] %like% Species2[1L]], by = .(Species, Species2)]
%like% is just a wrapper around grepl, so the pattern (right-hand side) can only be of length 1; when it is longer, only its first element is used. You should be seeing a warning about this.
The stringi package lets you vectorize the pattern argument.
library(stringi)
iris[stri_detect_regex(Species, Species2)]
If you like the operator style instead of the function, you can make your own:
`%vlike%` <- function(x, y) {
  # x: strings to search, y: regex patterns, compared element by element
  stri_detect_regex(x, y)
}
iris[Species %vlike% Species2]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species  Species2
# 1:          5.1         3.5          1.4         0.2    setosa       set
# 2:          4.9         3.6          1.4         0.1    setosa  set|vers
# 3:          5.9         3.0          5.1         1.8 virginica virginica
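If adding stringi is not an option, the same row-wise pairing can probably be done in base R with mapply, which calls grepl once per row with that row's own pattern. This is only a minimal sketch: the names dt and row_match are made up for illustration, the five rows are picked by index to mirror the example data from the question, and Species is converted from factor to character explicitly.

```r
library(data.table)

# Rebuild the five example rows from the question (dt is an illustrative name)
dt <- as.data.table(iris)[c(1, 38, 75, 112, 150)]
dt[, Species2 := c("set", "set|vers", "setosa", "nothing", "virginica")]

# mapply() pairs the i-th pattern with the i-th string, so grepl() always
# receives a length-1 pattern; USE.NAMES = FALSE drops the pattern names
row_match <- mapply(grepl, dt$Species2, as.character(dt$Species),
                    USE.NAMES = FALSE)

# Keep only the rows where Species matches the pattern in Species2
dt[row_match]
```

This runs one grepl call per row, so on a large table it is likely to be slower than the vectorized stri_detect_regex approach.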