
R: find if number is within range in a character string

I have a string s whose "substrings" are separated by pipes. A substring may or may not contain numbers. I also have a test character string n that contains a number and may or may not contain letters. See the example below. Note that the spacing can be arbitrary.

I'm trying to drop every substring where n is neither within the range nor an exact match. I understand that I need to split on -, convert the pieces to numbers, and compare the low/high values to n converted to numeric. Here's my starting point, but I got stuck on getting the final string out of unl_new.

s = "liquid & bar soap 1.0 - 2.0oz | bar 2- 5.0 oz | liquid soap 1-2oz | dish 1.5oz"
n = "1.5oz"

unl = unlist(strsplit(s,"\\|"))

unl_new = (strsplit(unl,"-"))
unl_new = unlist(gsub("[a-zA-Z]","",unl_new))

Desired output:

"liquid & bar soap 1.0 - 2.0oz | liquid soap 1-2oz | dish 1.5oz"

Am I completely on the wrong path? Thanks!

Alexey Ferapontov asked Nov 19 '25 05:11



1 Answer

Here is an option using base R:

## extract the n numeric
nn <- as.numeric(gsub("[^0-9|. ]", "", n))
## keep only digits, dots, spaces, pipes and - (the interval separator)
## then split on |
## for each piece, test the condition to build a logical vector
contains_n <- sapply(strsplit(gsub("[^0-9|. |-]", "", s),'[|]')[[1]],
       function(x){
         yy <- strsplit(x, "-")[[1]]
         yy <- as.numeric(yy[nzchar(yy)])
         ## the condition
         (length(yy)==1 && yy==nn) || (length(yy)==2 && nn >= yy[1] && nn <= yy[2])
       })

## split again and use the logical vector to drop the parts
## that don't satisfy the condition
## paste the result with collapse to get a single string again,
## restoring the pipe separators
paste(strsplit(s,'[|]')[[1]][contains_n],collapse='|')

## [1] "liquid & bar soap 1.0 - 2.0oz | liquid soap 1-2oz | dish 1.5oz"
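
As a variation on the same idea, here is a minimal sketch that wraps the filter in a function. The name keep_in_range and the number pattern are assumptions for illustration, not part of the question. It also assumes that n contains a non-negative number and that each substring holds either a single number or a low/high pair. Instead of stripping non-numeric characters it extracts the numbers with regmatches(), and it normalises the spacing around the pipes.

```r
# Hypothetical helper (name and pattern are assumptions): keep the
# pipe-separated parts of `s` whose number or numeric range contains
# the number found in `n`.
keep_in_range <- function(s, n) {
  num_pattern <- "[0-9]+(\\.[0-9]+)?"
  # first number in n, e.g. "1.5oz" -> 1.5 (assumes n contains a number)
  nn <- as.numeric(regmatches(n, regexpr(num_pattern, n)))
  parts <- strsplit(s, "|", fixed = TRUE)[[1]]
  keep <- vapply(parts, function(p) {
    # all numbers in this part, in order of appearance
    nums <- as.numeric(regmatches(p, gregexpr(num_pattern, p))[[1]])
    if (length(nums) == 1) {
      nums == nn                       # exact match
    } else if (length(nums) == 2) {
      nn >= nums[1] && nn <= nums[2]   # inclusive range
    } else {
      FALSE                            # no number, or more than two
    }
  }, logical(1), USE.NAMES = FALSE)
  # trim each kept part and rejoin with a uniform separator
  paste(trimws(parts[keep]), collapse = " | ")
}

s <- "liquid & bar soap 1.0 - 2.0oz | bar 2- 5.0 oz | liquid soap 1-2oz | dish 1.5oz"
n <- "1.5oz"
keep_in_range(s, n)
## [1] "liquid & bar soap 1.0 - 2.0oz | liquid soap 1-2oz | dish 1.5oz"
```

Parts with no number or with more than two numbers are dropped here; that is a design choice of this sketch and may need adjusting for real data.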
agstudy answered Nov 21 '25 20:11
