I would like to filter a SQL database based whether a regular expression appears within any column. I would like to specify the regex as a variable; however it is read as a literal string. I am having trouble getting the regex in as a variable. Thank you for your help!
Resources I've consulted:
Note: I had trouble making a reprex using the mtcars dataset, following https://www.tidyverse.org/blog/2018/01/dbplyr-1-2/. I get the error: "Error: str_detect() is not available in this SQL variant". I cannot share a reprex using my actual SQL database. As such, below is a pseudo-reprex.
library(dplyr)
library(stringr)
# Variable with regex (either lower- or uppercase "m")
my_string <- "(?i)m"
# WITHOUT SQL DATABASE ----------------------------------------------------
# This runs
mtcars %>%
tibble::rownames_to_column() %>%
filter(str_length(rowname) > 5)
# This runs with STRING
mtcars %>%
tibble::rownames_to_column() %>%
filter(str_detect(rowname, "(?i)m"))
# This runs with VARIABLE
mtcars %>%
tibble::rownames_to_column() %>%
filter(str_detect(rowname, my_string))
# WITH SQL DATABASE -------------------------------------------------------
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mtcars_db <- copy_to(con, tibble::rownames_to_column(mtcars), "mtcars")
# This runs
tbl(con, "mtcars") %>%
filter(str_length(rowname) > 5)
# This *should* run with STRING -- pretend it does ;)
tbl(con, "mtcars") %>%
filter(str_detect(rowname, "M"))
# This does NOT run with VARIABLE
tbl(con, "mtcars") %>%
filter(str_detect(rowname, my_string))
This might depend a lot on the flavour of SQL you are using. This issue mentions a translation for str_detect and also provides an alternative.
Testing for SQL Server:
library(dbplyr)
library(dplyr)
library(stringr)
data(mtcars)
df_sim = tbl_lazy(mtcars, con = simulate_mssql())
my_string <- "(?i)m"
df_sim %>%
filter(str_detect(my_string, gear)) %>%
show_query()
# Error: str_detect() is not available in this SQL variant
df_sim %>%
filter(gear %like% my_string) %>%
show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE (`gear` like '(?i)m')
So it appears str_detect can not translate to SQL Server. But you can use %like% as a work around.
Testing for MySQL:
library(dbplyr)
library(dplyr)
library(stringr)
data(mtcars)
df_sim = tbl_lazy(mtcars, con = simulate_mysql()) # changed to mysql
my_string <- "(?i)m"
df_sim %>%
filter(str_detect(my_string, gear)) %>%
show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE ('(?i)m' REGEXP `gear`)
df_sim %>%
filter(gear %like% my_string) %>%
show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE (`gear` like '(?i)m')
So it appears str_detect can be translated correctly for MySQL.
In every case my_string is translated into the query.
A couple of other things to check:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With