I have the following vector and I want to replace the subscript numbers (e.g. ₆, ₂) with 'normal' numbers.
vec = c("C₆H₄ClNO₂", "C₆H₆N₂O₂", "C₆H₅NO₃", "C₉H₁₀O₂", "C₈H₈O₃")
I could look up all the subscript values and replace them individually:
gsub('₆', '6', vec)
But isn't there a pattern in regex for it?
There's a similar question for JavaScript, but I couldn't translate it into R.
Use chartr, which its help page describes as:
"Translate characters in character vectors"
Solution:
chartr("₀₁₂₃₄₅₆₇₈₉", "0123456789", vec)
See the online R demo
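As a quick check, here is that call applied to the vector from the question. This is only a sketch: it assumes a UTF-8 session so that the subscript literals parse correctly, and the name `res` is just for illustration.

```r
vec <- c("C₆H₄ClNO₂", "C₆H₆N₂O₂", "C₆H₅NO₃", "C₉H₁₀O₂", "C₈H₈O₃")

# chartr maps the i-th character of `old` to the i-th character of `new`,
# so both strings must list the digits in the same order
res <- chartr("₀₁₂₃₄₅₆₇₈₉", "0123456789", vec)

identical(res, c("C6H4ClNO2", "C6H6N2O2", "C6H5NO3", "C9H10O2", "C8H8O3"))
## => [1] TRUE
```

Characters that do not appear in the `old` string are left unchanged, which is why the letters in the formulas survive as they are.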
BONUS
To normalize superscript digits use
chartr("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
## => [1] "0123456789"
We can use str_replace_all from stringr to match every subscript digit, convert it to its integer code point, subtract 8272 (the difference between the code points of ₆ and 6, and likewise for every other subscript/digit pair), and convert the result back to a character.
stringr::str_replace_all(vec, "\\p{No}", function(m) intToUtf8(utf8ToInt(m) - 8272))
#[1] "C6H4ClNO2" "C6H6N2O2" "C6H5NO3" "C9H10O2" "C8H8O3"
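The 8272 offset can be checked directly in base R. The sketch below assumes a UTF-8 session; the names `sub_cp`, `ascii_cp` and `offsets` are only for illustration.

```r
# Code points of the ten subscript digits and of the ten ASCII digits
sub_cp   <- utf8ToInt("₀₁₂₃₄₅₆₇₈₉")
ascii_cp <- utf8ToInt("0123456789")

# The difference is the same for every pair
offsets <- sub_cp - ascii_cp
unique(offsets)
## => [1] 8272
```

This works because the subscript digits occupy one contiguous block, U+2080 to U+2089. The superscript digits do not (¹, ² and ³ live at U+00B9, U+00B2 and U+00B3, apart from the rest), so a single constant offset would not carry over to them, whereas a character-by-character mapping such as chartr still would.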
As pointed out by @Wiktor Stribiżew, "\\p{No}" matches more than subscript digits. To match only the subscripts 0-9, we can use (thanks to @thothal):
str_replace_all(vec, "[\U2080-\U2089]", function(m) intToUtf8(utf8ToInt(m) - 8272))
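If stringr is not available, the same range-restricted shift could be sketched in base R along these lines. `subscript_to_ascii` is a hypothetical helper name, not part of any package, and the sketch assumes UTF-8 input without NA values.

```r
# Hypothetical helper: shift only U+2080..U+2089 down by 8272,
# leaving every other character untouched
subscript_to_ascii <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(s)                       # code points of one string
    is_sub <- cp >= 0x2080L & cp <= 0x2089L  # TRUE for subscript digits only
    cp[is_sub] <- cp[is_sub] - 8272L         # map them onto '0'..'9'
    intToUtf8(cp)                            # back to a single string
  }, character(1), USE.NAMES = FALSE)
}

vec <- c("C₆H₄ClNO₂", "C₆H₆N₂O₂", "C₆H₅NO₃", "C₉H₁₀O₂", "C₈H₈O₃")
out <- subscript_to_ascii(vec)

identical(out, c("C6H4ClNO2", "C6H6N2O2", "C6H5NO3", "C9H10O2", "C8H8O3"))
## => [1] TRUE
```

Restricting the shift to the U+2080 to U+2089 range is what keeps other characters, such as a superscript ², from being altered by accident.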