 

Replace subscript number in string

I have the following vector and I want to replace the subscript digits (e.g. ₆, ₂) with 'normal' digits.

vec = c("C₆H₄ClNO₂", "C₆H₆N₂O₂", "C₆H₅NO₃", "C₉H₁₀O₂", "C₈H₈O₃")

I could lookup all subscript values and replace them individually:

gsub('₆', '6', vec)

But isn't there a pattern in regex for it?

There's a similar question for JavaScript, but I couldn't translate it into R.

andschar asked Nov 15 '25 06:11



2 Answers

Use chartr, which the R documentation describes as "Translate characters in character vectors".

Solution:

chartr("₀₁₂₃₄₅₆₇₈₉", "0123456789", vec)

See the online R demo

BONUS

To normalize superscript digits use

chartr("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
## => [1] "0123456789"
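If subscript and superscript digits can both occur in the same input, the two chartr calls can be folded into one. A minimal sketch, assuming a helper name normalize_script_digits that is not part of the answer; the digits are written as \u escapes so the snippet does not depend on the source file's encoding:

```r
# Hypothetical helper (not from the answer): maps subscript and superscript
# digits to ASCII digits in a single chartr() call.
normalize_script_digits <- function(x) {
  subs <- "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
  sups <- "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
  # chartr() maps the i-th character of `old` to the i-th character of `new`,
  # so the ASCII digits are simply repeated once per set.
  chartr(paste0(subs, sups), "01234567890123456789", x)
}

normalize_script_digits(c("C\u2086H\u2084ClNO\u2082", "x\u00b2 + y\u00b3"))
## => [1] "C6H4ClNO2" "x2 + y3"
```

Because the mapping is positional, it does not matter that the superscripts ¹, ² and ³ sit in a different Unicode block from the other superscript digits.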
Wiktor Stribiżew answered Nov 17 '25 21:11



We can use str_replace_all from stringr to match every subscript digit, convert it to its integer code point, subtract 8272 (the difference between the code points of ₆ and 6, which is the same for every subscript digit and its ordinary equivalent), and convert the result back to a character.

stringr::str_replace_all(vec, "\\p{No}", function(m) intToUtf8(utf8ToInt(m) - 8272))
#[1] "C6H4ClNO2" "C6H6N2O2"  "C6H5NO3"   "C9H10O2"   "C8H8O3" 
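The constant 8272 can be checked directly. A quick sketch using only base R, with the subscript digits written as \u escapes (the variable names are just for illustration):

```r
# Code points of the ten subscript digits and of the ASCII digits.
sub_cp   <- utf8ToInt("\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089")
ascii_cp <- utf8ToInt("0123456789")

# The offset is the same for every digit pair.
offset <- unique(sub_cp - ascii_cp)
offset
#[1] 8272
```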

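As pointed out by @Wiktor Stribiżew, "\\p{No}" matches more than subscript digits: it covers the whole Unicode "other number" category, which also includes superscripts and fractions. To match only the subscripts 0-9, we can use a character range instead (thanks to @thothal):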

stringr::str_replace_all(vec, "[\u2080-\u2089]", function(m) intToUtf8(utf8ToInt(m) - 8272))
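The fixed offset is only meaningful for the ten code points U+2080 to U+2089, which is why the narrower pattern is the safer choice. The same idea can also be sketched without stringr; shift_subscripts is a hypothetical name, not something from the answers, and it shifts only code points inside that range:

```r
# Hypothetical base R helper: subtract 8272 from subscript-digit code points
# only, leaving every other character untouched.
shift_subscripts <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(s)                        # code points of one string
    is_sub <- cp >= 0x2080L & cp <= 0x2089L   # subscript digits only
    cp[is_sub] <- cp[is_sub] - 8272L
    intToUtf8(cp)                             # back to a single string
  }, character(1), USE.NAMES = FALSE)
}

# The superscript in the second element is outside the range and stays as-is.
shift_subscripts(c("C\u2086H\u2085NO\u2083", "x\u00b2"))
```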
Ronak Shah answered Nov 17 '25 21:11
