I need to split the following string
the quick brown fox jumps over the lazy dog
into the following tokens:
the
quick brown fox jumps over the
lazy dog
To explain: I want to split on "the", but include the delimiter in the preceding array element (not as its own, separate element).
Can anyone shed any light on this or perhaps give me the correct regex?
I am using C#.
You need to use a look-behind, (?<=...). As the name suggests, it looks at the preceding characters to see whether they match a given pattern.
This should work:
"(?<=\\bthe) "
So, at each space, it checks whether the preceding characters are "the"; if so, that space matches.
Note: we also need to include the word boundary \\b (an escaped \b), otherwise the space after a word like "bathe" would also match.
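As a minimal sketch of how this pattern could be used in C#, assuming Regex.Split is the intended way to do the splitting (the question does not say which method is used), and with the Program class and variable names being purely illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "the quick brown fox jumps over the lazy dog";

        // Split at each space that is immediately preceded by the whole word "the".
        // The look-behind is zero-width, so only the space itself is consumed
        // and "the" stays at the end of the preceding element.
        string[] tokens = Regex.Split(input, "(?<=\\bthe) ");

        foreach (string token in tokens)
        {
            Console.WriteLine("[" + token + "]");
        }
        // Prints:
        // [the]
        // [quick brown fox jumps over the]
        // [lazy dog]
    }
}
```

Note that the space used as the split point is dropped from the result. If the trailing space should be kept as well, the space would have to move inside the look-behind, which is a different pattern from the one above.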
Without the look-behind, every space would match:
   v     v     v   v     v    v   v    v
the quick brown fox jumps over the lazy dog
With the look-behind, we only match the spaces that have "the" before them (ignoring the \\b for now):
"the " - just found a space, and last characters are "the", so match.
"quick " - just found another space, but last characters are "...k", so no match.
etc.
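The walkthrough above can be checked with a short sketch that lists the index of every space, and then only the indices of the spaces the look-behind pattern accepts. The use of Regex.Matches with LINQ here is just one way to inspect the matches, and the variable names are illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "the quick brown fox jumps over the lazy dog";

        // Index of every space in the input.
        var allSpaces = Regex.Matches(input, " ")
            .Cast<Match>()
            .Select(m => m.Index);

        // Index of only those spaces preceded by the whole word "the".
        var afterThe = Regex.Matches(input, "(?<=\\bthe) ")
            .Cast<Match>()
            .Select(m => m.Index);

        Console.WriteLine(string.Join(", ", allSpaces)); // 3, 9, 15, 19, 25, 30, 34, 39
        Console.WriteLine(string.Join(", ", afterThe));  // 3, 34
    }
}
```

Only the spaces at index 3 and index 34 survive, which are the two split points.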