How can I use R's regex to remove space(s) before a period unless the period is followed by a digit?
Here's what I have and what I've tried:
x <- c("I have .32 dollars AKA 32 cents . ", 
    "I have .32 dollars AKA 32 cents .  Hello World .")
gsub("(\\s+)(?=\\.+)", "", x, perl=TRUE)
gsub("(\\s+)(?=\\.+)(?<=[^\\d])", "", x, perl=TRUE)
This gives (no space before .32):
## [1] "I have.32 dollars AKA 32 cents. "             
## [2] "I have.32 dollars AKA 32 cents.  Hello World."
I'd like to get:
## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."
I'm stuck with gsub here, but other solutions are welcome to make the question more useful to future searchers.
You don't need a complex expression; a positive lookahead is enough here.
> gsub(' +(?=\\.(?:\\D|$))', '', x, perl=TRUE)
## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."
Explanation:
 +        # ' ' (1 or more times)
(?=       # look ahead to see if there is:
  \.      #   '.'
  (?:     #   group, but do not capture:
    \D    #      non-digits (all but 0-9)
   |      #     OR
    $     #      before an optional \n, and the end of the string
  )       #   end of grouping
)         # end of look-ahead
Note: If the space characters could be any kind of whitespace, replace ' +' with \s+ (written \\s+ inside an R string).
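The note above can be sketched as follows. The input vector y is made up for illustration (it is not from the question): it contains a tab before ".5", a tab before the final period, and a run of three spaces before a period.

```r
# Hypothetical sample input, not taken from the question.
y <- c("tabs\t.5 here\t.", "many   . spaces")

# Same lookahead as in the answer, with ' +' swapped for '\\s+' so that
# tabs and other whitespace are removed as well as plain spaces.
res_ws <- gsub("\\s+(?=\\.(?:\\D|$))", "", y, perl = TRUE)
print(res_ws)
```

The tab before ".5" should be left alone, because the period is followed by a digit, while the whitespace before the other two periods should be removed.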
If you are content with using the (*SKIP)(*F) backtracking verbs, here is one way to write it:
> gsub(' \\.\\d(*SKIP)(*F)| +(?=\\.)', '', x, perl=TRUE)
## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."
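One edge case that may be worth checking: as far as I can tell, the (*SKIP)(*F) pattern above only protects a single space before a period followed by a digit, since its first alternative starts with exactly one space. The sketch below compares it with a variant that uses ' +' in the skipped alternative, and with the lookahead pattern. The input z is made up for illustration, and the variant is my own suggestion rather than part of the original answer.

```r
# Hypothetical input: two spaces before ".5", one space before the final period.
z <- "a  .5 b ."

# Pattern as written above: the first alternative fails on the run of two
# spaces, so the second alternative gets to match (and remove) the run.
skip_original <- gsub(" \\.\\d(*SKIP)(*F)| +(?=\\.)", "", z, perl = TRUE)

# Suggested variant: ' +' in the skipped alternative consumes the whole run.
skip_variant <- gsub(" +\\.\\d(*SKIP)(*F)| +(?=\\.)", "", z, perl = TRUE)

# The lookahead pattern from the first solution, for comparison.
lookahead <- gsub(" +(?=\\.(?:\\D|$))", "", z, perl = TRUE)

print(c(skip_original = skip_original,
        skip_variant  = skip_variant,
        lookahead     = lookahead))
```

If the input never has more than one space before a decimal such as .32, the patterns should behave the same, as they do on the x from the question.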