I would like to split strings like the following:
x <- "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"
by dash (-) on the condition that those dashes must not be contained inside a pair of []. The expected result would be
c("abc", "1230", "xyz", "[def-ghu-jkl---]", "[adsasa7asda12]", "s",
"[klas-bst-asdas foo]")
Notes:
- whenever it's not inside [].There's a similar question for python (How to split a string by commas positioned outside of parenthesis?) but I haven't yet been able to accurately adjust that to my scenario.
You could use look ahead to verify that there is no ] following sooner than a [:
-(?![^[]*\])
So in R:
strsplit(x, "-(?![^[]*\\])", perl=TRUE)
-: match the hyphen(?! ): negative look ahead: if that part is found after the previously matched hyphen, it invalidates the match of the hyphen.
[^[]: match any character that is not a [
*: match any number of the previous\]: match a literal ]. If this matches, it means we found a ] before finding a [. As all this happens in a negative look ahead, a match here means the hyphen is not a match. Note that a ] is a special character in regular expressions, so it must be escaped with a backslash (although it does work without escape, as the engine knows there is no matching [ preceding it -- but I prefer to be clear about it being a literal). And as backslashes have a special meaning in string literals (they also denote an escape), that backslash itself must be escaped again in this string, so it appears as \\].Instead of splitting, extract the parts:
library(stringr)
str_extract_all(x, "(\\[[^\\[]*\\]|[^-])+")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With