
Regex to split a string but keep delimiters, but not as separate elements

Tags:

c#

regex

I need to split the following string

the quick brown fox jumps over the lazy dog

into the following tokens:

  1. the
  2. quick brown fox jumps over the
  3. lazy dog

To explain: I want to split on "the", but keep the delimiter at the end of the preceding array element (not as its own, separate element).

Can anyone shed any light on this or perhaps give me the correct regex?

I am using C#.

Dan Cook asked Dec 05 '25 15:12


1 Answer

You need to use a look-behind, (?<=...). As the name suggests, it looks at the characters before the current position to see whether they match a given pattern, without making them part of the match.

This should work:

"(?<=\\bthe) "

So, at each space, the regex checks whether the preceding characters are "the"; if so, that space matches.

Note: we also need the word boundary \\b (that is, \b escaped inside a string literal); otherwise the space after a word such as "bathe" would also match.
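Since the question is about C#, here is a minimal sketch of the pattern used with Regex.Split. The class and method names (TheSplitter, SplitAfterThe, SplitWithoutBoundary) are made up for illustration, and the pattern is written as a verbatim string (@"...") so that \b needs only a single backslash.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class TheSplitter
{
    // Splits on a space only when the whole word "the" comes right before it.
    // The look-behind is zero-width, so "the" stays in the preceding token
    // and only the space itself is removed.
    public static string[] SplitAfterThe(string input)
    {
        return Regex.Split(input, @"(?<=\bthe) ");
    }

    // Same pattern without the word boundary, to show why \b matters.
    public static string[] SplitWithoutBoundary(string input)
    {
        return Regex.Split(input, @"(?<=the) ");
    }

    // Prints each token in brackets so the boundaries are visible.
    public static void Demo()
    {
        foreach (string token in SplitAfterThe("the quick brown fox jumps over the lazy dog"))
        {
            Console.WriteLine("[" + token + "]");
        }
    }
}
```

For the sentence in the question, SplitAfterThe should return "the", "quick brown fox jumps over the" and "lazy dog". For an input such as "bathe in the river", SplitWithoutBoundary also splits after "bathe", while SplitAfterThe does not. The match is case-sensitive as written; passing RegexOptions.IgnoreCase to Regex.Split would be one way to also handle "The".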

Without the look-behind, we'll check all the spaces:

   v     v     v   v     v    v   v    v
the quick brown fox jumps over the lazy dog

With the look-behind, we'll only match those spaces that have "the" before them (ignoring the \\b for now):

"the " - just found a space, and last characters are "the", so match.
"quick " - just found another space, but last characters are "...k", so no match.
etc.
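The walk-through above can be checked by listing the index of each matching space, with and without the look-behind. This is only a sketch; SpaceMatches and MatchIndexes are hypothetical names, not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SpaceMatches
{
    // Returns the zero-based start index of every match of the pattern in the input.
    public static int[] MatchIndexes(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }

    // Prints the matching positions for a plain space and for the look-behind pattern.
    public static void Demo()
    {
        string input = "the quick brown fox jumps over the lazy dog";

        // A plain space matches at every space in the sentence.
        Console.WriteLine(string.Join(", ", MatchIndexes(input, " ")));

        // With the look-behind, only spaces right after the word "the" match.
        Console.WriteLine(string.Join(", ", MatchIndexes(input, @"(?<=\bthe) ")));
    }
}
```

For this sentence the plain space matches at all eight spaces, while the look-behind version matches only at indexes 3 and 34, the two spaces that follow "the".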


Bernhard Barker answered Dec 07 '25 04:12