R str_extract everything before and after ellipsis

Question

I'm trying to split a character column that has an ellipsis (a run of dots) in the middle into two columns: everything before the ellipsis and everything after it.

For example, if I have:

a <- "60.4 (b)(33) and (e)(1) revised....................................46111"

How do I split that into "60.4 (b)(33) and (e)(1) revised" and "46111"?

I have tried:

str_extract(a, ".*\\.{2,}")

for the first part, and for the second part:

str_extract(a, "\\.{2,}.*")

but that keeps the ellipsis in both, which I'd like to drop.

Wiktor Stribiżew · Accepted Answer

It seems you want to split, not to extract, with a pattern that matches two or more consecutive dots:

a <- "60.4 (b)(33) and (e)(1) revised....................................46111"
unlist(stringr::str_split(a, "\\.{2,}"))
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"

## Base R strsplit:
unlist(strsplit(a, "\\.{2,}"))
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
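Since the question is about a column rather than a single string, here is a minimal sketch (not part of the original answer) using stringr::str_split_fixed, which returns a character matrix with exactly n columns, so each piece maps onto a new column. The data frame `df` and the column names `entry`, `text` and `page` are hypothetical.

```r
library(stringr)

# Hypothetical data frame: column `entry` holds "text....page" strings
df <- data.frame(
  entry = c("60.4 (b)(33) and (e)(1) revised....................................46111",
            "60.5 added......12345"),
  stringsAsFactors = FALSE
)

# Split on the first run of 2+ dots; n = 2 guarantees a two-column matrix
parts <- str_split_fixed(df$entry, "\\.{2,}", n = 2)

# Column 1 is the text before the dots, column 2 the text after them
df$text <- parts[, 1]
df$page <- parts[, 2]
df[, c("text", "page")]
```

Single dots such as the one in "60.4" are left alone because the pattern needs at least two consecutive dots.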

There is another possible splitting regex here: you can match one or more dots that are followed by one or more digits at the end of the string:

unlist(stringr::str_split(a, "\\.+(?=\\d+$)"))
unlist(strsplit(a, "\\.+(?=\\d+$)", perl=TRUE))

Both yield the same [1] "60.4 (b)(33) and (e)(1) revised" "46111" output. Here, \.+ matches one or more dots and (?=\d+$) is a positive lookahead that matches a location immediately followed by one or more digits (\d+) and then the end of the string ($). Base R's strsplit needs perl=TRUE here because its default (TRE) regex engine does not support lookaheads.
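The same lookaround idea also lets str_extract, the function the question tried first, return each part without the dots. A minimal sketch (not from the original answer), assuming the part after the dots never contains a dot itself:

```r
library(stringr)

a <- "60.4 (b)(33) and (e)(1) revised....................................46111"

# Before: the lazy .*? stops at the first position followed by 2+ dots.
# A greedy .* would instead keep all but the last two dots.
before <- str_extract(a, ".*?(?=\\.{2,})")

# After: the lookbehind requires two dots right before the match, and
# [^.]+$ takes the dot-free tail (assumes the tail contains no dots)
after <- str_extract(a, "(?<=\\.{2})[^.]+$")

c(before, after)
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
```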

Another approach is to match the whole string with str_match and capture the parts you need:

res <- stringr::str_match(a, "^(.*?)\\.+(\\d+)$")
res[,-1]
# => [1] "60.4 (b)(33) and (e)(1) revised" "46111"

Here,

^ - matches the start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\.+ - one or more dots
(\d+) - Group 2: one or more digits
$ - end of string.

str_match returns a matrix whose first column holds the full match, so res[,-1] is needed to keep only the two capture groups.
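Alternatively (an addition, not part of the accepted answer), the two patterns from the question already match exactly what should be discarded, so stringr::str_remove can drop them instead of extracting them. A minimal sketch:

```r
library(stringr)

a <- "60.4 (b)(33) and (e)(1) revised....................................46111"

# "\\.{2,}.*" matches from the first run of 2+ dots to the end of the string,
# so removing it leaves the text before the dots
before <- str_remove(a, "\\.{2,}.*")

# ".*\\.{2,}" is greedy: it matches up to and including the last run of dots,
# so removing it leaves the text after the dots
after <- str_remove(a, ".*\\.{2,}")

c(before, after)
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
```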

Tags: regex, r, stringr

byronious

1 Answers

Wiktor Stribiżew

Recent Activity

Donate For Us

R str_extract everything before and after ellipsis

Tags:

regex

r

stringr

byronious

1 Answers

Wiktor Stribiżew

Related questions

Recent Activity

Donate For Us