I'm trying to find a way to split a character column with an ellipsis in the middle into two columns, everything before the ellipsis and everything after.
For example, if I have:
a <- "60.4 (b)(33) and (e)(1) revised....................................46111"
How do I split that into "60.4 (b)(33) and (e)(1) revised" and "46111"?
I have tried:
str_extract(a, ".*\\.{2,}")
for the first part, and for the second part:
str_extract(a, "\\.{2,}.*")
but that keeps the ellipsis in both, which I'd like to drop.
It seems you want to split, not to extract, with a pattern that matches two or more consecutive dots:
a <- "60.4 (b)(33) and (e)(1) revised....................................46111"
unlist(stringr::str_split(a, "\\.{2,}"))
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
## Base R strsplit:
unlist(strsplit(a, "\\.{2,}"))
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
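Since the question is about a whole column rather than a single string, here is a minimal sketch of the same split applied to a data frame. The data frame `df`, its column names (`entry`, `text`, `page`), and the second sample row are made up for illustration. `stringr::str_split_fixed` returns a character matrix with a fixed number of columns, which is convenient to assign back:

```r
library(stringr)

# Hypothetical data: the second row is invented for illustration
df <- data.frame(
  entry = c(
    "60.4 (b)(33) and (e)(1) revised....................................46111",
    "60.5 added..........46112"
  ),
  stringsAsFactors = FALSE
)

# Split each entry at the first run of two or more dots, into exactly 2 pieces
parts <- str_split_fixed(df$entry, "\\.{2,}", n = 2)

df$text <- parts[, 1]  # everything before the dot leader
df$page <- parts[, 2]  # everything after it
```

The single dot in "60.4" is left alone because the pattern requires at least two consecutive dots.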
There is another possible splitting regex here: you can match one or more dots that are followed by one or more digits at the end of the string:
unlist(stringr::str_split(a, "\\.+(?=\\d+$)"))
unlist(strsplit(a, "\\.+(?=\\d+$)", perl=TRUE))
Both yield the same [1] "60.4 (b)(33) and (e)(1) revised" "46111" output. Here, \.+ matches one or more dots and (?=\d+$) is a positive lookahead that matches a location immediately followed by one or more digits (\d+) and then the end of string ($). Note that base R strsplit needs perl=TRUE for this pattern, because the default regex engine does not support lookaheads.
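To see where the two splitting patterns differ, here is a small base R sketch with a made-up string `b` that has a short "..." inside the text in addition to the dot leader. The string is an assumption for illustration, not taken from the question:

```r
# Hypothetical entry: a short "..." inside the text plus the dot leader
b <- "60.4 (b) etc... revised.....46111"

# Any run of two or more dots is a split point, so this cuts twice
loose <- unlist(strsplit(b, "\\.{2,}"))

# Only dots directly followed by the trailing digits are a split point,
# so this cuts once, at the dot leader
strict <- unlist(strsplit(b, "\\.+(?=\\d+$)", perl = TRUE))

print(loose)
print(strict)
```

With this input, `loose` has three pieces while `strict` has two, so the lookahead version is the safer choice if the text part itself may contain runs of dots.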
Another approach is to match rather than split, using str_match to capture the parts you need:
res <- stringr::str_match(a, "^(.*?)\\.+(\\d+)$")
res[,-1]
# => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
Here,
^ - matches the start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\.+ - one or more dots
(\d+) - Group 2: one or more digits
$ - end of string.
The res[,-1] is necessary to remove the first column with the full matches.
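One practical difference of the matching approach is how it behaves on a vector where some entries do not fit the pattern. A minimal sketch, assuming a made-up second entry with no dot leader:

```r
library(stringr)

# Hypothetical input: the second entry has no dot leader or trailing number
x <- c(
  "60.4 (b)(33) and (e)(1) revised....................................46111",
  "no page number here"
)

res <- str_match(x, "^(.*?)\\.+(\\d+)$")

# drop = FALSE keeps the result a matrix even when x has a single element
out <- res[, -1, drop = FALSE]
print(out)
```

Entries that do not match give a row of NA values, which makes them easy to find afterwards with is.na(), whereas a split would silently return the whole string as a single piece.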