
Find occurrences with regex and then only remove first character in matched expression

Tags:

regex

r

Surprisingly I haven't found a satisfactory answer to this regex problem. I have the following vector:

row1
[1] "AA.8.BB.CCCC" "2017"            "3.166.5"         "3.080.2"         "68"              "162.6"          
[7] "185.223.632.4"           "500.332.1" 

My end result should look like this:

row1
[1] "AA.8.BB.CCCC" "2017"     "3,166.5"         "3,080.2"         "68"              "162.6"          
[7] "185,223,632.4"      "500,332.1" 

The last period in each of the numeric values is the decimal point and the other periods should be converted to commas. I want this done without affecting the value with letters ([1]). I tried the following:

gsub("[.]\\d{3}[.]", ",", row1)

This regex comes close, but gsub replaces the entire match, so the three digits between the periods are removed as well (for example, "3.166.5" becomes "3,5"). Is there a way to match the pattern but replace only its first character rather than the whole match? If there is a better way of approaching this, I welcome those responses as well.

asked Dec 14 '25 04:12 by otteheng


2 Answers

You can use the following:

See code in use here

gsub("\\G\\d+\\K\\.(?=\\d+(?!$))",",",x,perl=T)

See regex in use here

Note: In the linked regex demo, \G is replaced by (?:\G|^) for display purposes, because \G matches at the start of the string (like \A) but not at the start of each line.

\G\d+\K\.(?=\d+(?!$))

How it works:

  • \G asserts the position at the end of the previous match or, for the first match, at the start of the string
  • \d+\K\. matches one or more digits, then resets the start of the match (the digits consumed so far are excluded from the final match), then matches a literal dot .
  • (?=\d+(?!$)) is a positive lookahead requiring one or more digits after the dot, with those digits not ending at the end of the string. Because \d+ can backtrack inside the lookahead, this only rejects the last dot when exactly one digit follows it, as in the sample data
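
As a minimal sketch of the steps above, the pattern can be applied to the vector from the question. This assumes the data is stored in row1 (the question's name; the answer calls it x). The name fixed is introduced here only for illustration.

```r
# Sample vector from the question.
row1 <- c("AA.8.BB.CCCC", "2017", "3.166.5", "3.080.2", "68", "162.6",
          "185.223.632.4", "500.332.1")

# perl = TRUE is needed because \G, \K and lookaheads are PCRE features.
# Only the dot itself is replaced: \K drops the preceding digits from the
# match, and the lookahead does not consume the digits that follow.
fixed <- gsub("\\G\\d+\\K\\.(?=\\d+(?!$))", ",", row1, perl = TRUE)

print(fixed)
```

With the sample data this should give the vector requested in the question. The first element is left alone because it does not start with a digit, so \G\d+ never matches.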

answered Dec 15 '25 17:12 by ctwheels


One option is to use a combination of a lookbehind and a lookahead to match only a dot when what is on the left is a digit and on the right are 3 digits followed by a dot.

In R, pass perl = TRUE to gsub, because R's default regex engine does not support lookarounds.

In the replacement use a comma.

(?<=\d)[.](?=\d{3}[.])


With the backslashes doubled for use in an R string literal, as noted by @r2evans:

(?<=\\d)[.](?=\\d{3}[.])
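
A short sketch of how the double-escaped pattern might be used on the question's data, assuming the vector is stored in row1 as in the question. The name fixed is hypothetical and used only for this example.

```r
# Sample vector from the question.
row1 <- c("AA.8.BB.CCCC", "2017", "3.166.5", "3.080.2", "68", "162.6",
          "185.223.632.4", "500.332.1")

# Replace a dot only when a digit precedes it and three digits plus
# another dot follow it. The lookarounds assert context without
# consuming it, so only the dot itself is replaced.
fixed <- gsub("(?<=\\d)[.](?=\\d{3}[.])", ",", row1, perl = TRUE)

print(fixed)
```

Because the lookahead requires a further dot after the three digits, the final (decimal) dot of each number is never matched, and the dots in "AA.8.BB.CCCC" lack the required digits on one side or the other.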

answered Dec 15 '25 19:12 by The fourth bird


