I need a regular expression that matches a specific letter and the one or two digits that follow it, up to the next letter. For example, I would like to use regular expressions in R to extract how many carbons (C) are in each of these formulas:
strings <- c("C16H4ClNO2", "CH8O", "F2Ni")
I need an expression that returns the number of C atoms, which can be one or two digits, and that does not pick up the C of chlorine (Cl). This is what I tried:
substr(strings,regexpr("C[0-9]+",strings) + 1, regexpr("[ABDEFGHIJKLMNOPQRSTUVWXYZ]+",strings) -1)
[1] "16" "C"  ""  
but the result I want is:
"16","1","0"
Moreover, I would like the regular expression to locate the next letter automatically and stop before it, instead of my hard-coding the end position as any capital letter other than C.
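Since the question asks for a regular expression specifically, here is a minimal base-R sketch that needs no extra package. The helper name count_element is hypothetical. It assumes simple formulas and counts only the first occurrence of the element, so a formula such as CH3COOH, where carbon appears twice, would need the counts summed instead. The negative lookahead (?![a-z]) is what keeps "C" from matching the start of "Cl", and (\\d*) captures the digits up to the next non-digit, so no end position has to be specified.

```r
strings <- c("C16H4ClNO2", "CH8O", "F2Ni")

# Hypothetical helper: number of atoms of `element` in each formula.
# Pattern = element symbol, then a lookahead that rejects a following
# lowercase letter (so "C" skips "Cl", "Ca", "Co", ...), then a capture
# group for the digits that follow, which stops at the next non-digit.
count_element <- function(formulas, element) {
  pattern <- paste0(element, "(?![a-z])(\\d*)")
  m <- regmatches(formulas, regexec(pattern, formulas, perl = TRUE))
  vapply(m, function(x) {
    if (length(x) == 0) return(0L)   # element absent from the formula
    if (x[2] == "") return(1L)       # bare symbol with no number means 1
    as.integer(x[2])
  }, integer(1))
}

count_element(strings, "C")
## [1] 16  1  0
```

The result is an integer vector; wrap it in as.character() if the character form "16", "1", "0" is needed.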
The makeup function in the CHNOSZ package parses a chemical formula. Here are some alternatives that use it:
1) Create a list L of fully parsed formulas, then for each one check whether it has a "C" component and return its value, or 0 if there is none:
library(CHNOSZ)
L <- Map(makeup, strings)
sapply(L, function(x) if ("C" %in% names(x)) x[["C"]] else 0)
## C16H4ClNO2       CH8O       F2Ni 
##         16          1          0 
Note that L holds the fully parsed formulas, which is useful if you have other requirements:
> L
$C16H4ClNO2
 C  H Cl  N  O 
16  4  1  1  2 
$CH8O
C H O 
1 8 1 
$F2Ni
 F Ni 
 2  1 
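Because each component of L is a named numeric vector, the sapply line in (1) generalizes to any element symbol. The sketch below wraps it in a hypothetical helper, count_of. So that it runs without CHNOSZ, L is typed in by hand from the printout above. That makes it a stand-in, and makeup's actual storage type may differ.

```r
# Stand-in for L, typed in from the printout above (no CHNOSZ needed)
L <- list(
  C16H4ClNO2 = c(C = 16, H = 4, Cl = 1, N = 1, O = 2),
  CH8O       = c(C = 1, H = 8, O = 1),
  F2Ni       = c(F = 2, Ni = 1)
)

# Hypothetical helper: count of `element` in each parsed formula, 0 if absent
count_of <- function(L, element) {
  sapply(L, function(x) if (element %in% names(x)) x[[element]] else 0)
}

count_of(L, "H")

# Several elements at once: one row per formula, one column per element
sapply(c("C", "H", "Cl"), count_of, L = L)
```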
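1a) By appending c(C = 0) to each list component we avoid having to test for the existence of carbon. When a formula already contains carbon, [[ returns the first element named "C", which is the parsed count. This gives the following shorter version of the sapply line in (1):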
sapply(lapply(L, c, c(C = 0)), "[[", "C")
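The trick in (1a) depends on how [[ treats duplicate names: with a character subscript it returns the first matching element. A small illustration with named vectors typed in by hand (no CHNOSZ needed):

```r
with_c  <- c(c(C = 16, H = 4, Cl = 1, N = 1, O = 2), c(C = 0))
without <- c(c(F = 2, Ni = 1), c(C = 0))

names(with_c)   # C H Cl N O C: carbon now appears twice
with_c[["C"]]   # 16: [[ picks the first element named "C"
without[["C"]]  # 0: the appended default is the only "C"
```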
2) This one-line variation of (1) gives the same answer as (1), except without names. It appends "C0" to each formula so that there is no need to test for the existence of carbon:
sapply(lapply(paste0(strings, "C0"), makeup), "[[", "C")
## [1] 16  1  0
2a) Here is a variation of (2) that eliminates the lapply by using the fact that makeup will accept a matrix:
sapply(makeup(as.matrix(paste0(strings, "C0"))), "[[", "C")
## [1] 16  1  0
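If CHNOSZ is not available, the "C0" trick in (2) can be tried with a rough base-R stand-in for makeup. The function parse_formula below is hypothetical and is not part of CHNOSZ. It handles only simple formulas made of element symbols with optional integer counts, with no parentheses, charges or hydrates. It sums repeated symbols, which is what makes the appended "C0" harmless here.

```r
# Hypothetical base-R stand-in for CHNOSZ::makeup, simple formulas only:
# a capital letter, an optional lowercase letter, then an optional count.
parse_formula <- function(formula) {
  tokens  <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  symbols <- sub("[0-9]+$", "", tokens)      # strip trailing digits
  digits  <- sub("^[A-Z][a-z]?", "", tokens) # strip the element symbol
  counts  <- rep(1, length(tokens))          # a bare symbol means 1
  has_num <- nzchar(digits)
  counts[has_num] <- as.numeric(digits[has_num])
  # Sum repeated symbols (e.g. the appended "C0"), keeping first-seen order
  vapply(split(counts, factor(symbols, levels = unique(symbols))), sum, numeric(1))
}

strings <- c("C16H4ClNO2", "CH8O", "F2Ni")
sapply(lapply(paste0(strings, "C0"), parse_formula), "[[", "C")
## [1] 16  1  0
```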