
Removing consecutive duplicate words in a string

I am trying to write a function that removes consecutive duplicate words within a string. It's vital that one copy of any match found by the regular expression remains. In other words...

A very very very dirty dog

should become...

A very dirty dog

I have a regular expression that seems to work well (based on this post)

(\b\S+\b)(($|\s+)\1)+

However, I'm not sure how to use preg_replace (or whether there's a better function) to implement this. Right now it deletes all of the matching repeated words without leaving one copy of the word intact. Can I pass a variable or special instruction to it to keep one match?

I have this currently...

$string=preg_replace('/(\b\S+\b)(($|\s+)\1)+/', '', $string);

AdamJones asked Dec 20 '25 22:12


1 Answer

You may use a regex like \b(\S+)(?:\s+\1\b)+ and replace with $1:

$string=preg_replace('/\b(\S+)(?:\s+\1\b)+/i', '$1', $string);

See the regex demo

Details:

  • \b(\S+) - Group 1, capturing one or more non-whitespace characters that are preceded by a word boundary (maybe \b(\w+) would suit better here)
  • (?:\s+\1\b)+ - 1 or more sequences of:
    • \s+ - 1 or more whitespaces
    • \1\b - a backreference to the value stored in Group 1 buffer (the value must be a whole word)
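
To make the breakdown above concrete, here is a minimal sketch that runs the pattern through preg_match and prints the whole match and the contents of Group 1. The variable names are only illustrative, and the input is the sample sentence from the question.

```php
<?php
// Illustrative variable names; the pattern is the one from this answer (without /i).
$pattern = '/\b(\S+)(?:\s+\1\b)+/';
$input   = 'A very very very dirty dog';

if (preg_match($pattern, $input, $m)) {
    // $m[0] is the entire run of repeated words; $m[1] is the single captured word.
    echo $m[0], PHP_EOL; // very very very
    echo $m[1], PHP_EOL; // very
}
```

Because Group 1 holds exactly one copy of the word, replacing the whole match with $1 leaves that one copy behind.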

The replacement pattern is $1, the replacement backreference that refers to the value stored in Group 1 buffer.

Note that the /i case-insensitive modifier also makes the \1 backreference case insensitive, so I have a dog Dog DOG will result in I have a dog.
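
As a rough sketch, the replacement can be wrapped in a small helper. The function name collapseRepeatedWords and its $ignoreCase flag are hypothetical, introduced here only to show the behaviour with and without the /i modifier.

```php
<?php
// Hypothetical helper: the name and the $ignoreCase flag are assumptions for illustration.
function collapseRepeatedWords(string $text, bool $ignoreCase = true): string
{
    // Same pattern as above; append the i modifier only when requested.
    $pattern = '/\b(\S+)(?:\s+\1\b)+/' . ($ignoreCase ? 'i' : '');

    // '$1' keeps the first copy of each repeated word.
    $result = preg_replace($pattern, '$1', $text);

    // preg_replace returns null on error; fall back to the original text.
    return $result ?? $text;
}

echo collapseRepeatedWords('A very very very dirty dog'), PHP_EOL;  // A very dirty dog
echo collapseRepeatedWords('I have a dog Dog DOG'), PHP_EOL;        // I have a dog
echo collapseRepeatedWords('I have a dog Dog DOG', false), PHP_EOL; // I have a dog Dog DOG
```

With the flag off, dog, Dog and DOG count as different words, so the last call returns its input unchanged.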

Wiktor Stribiżew answered Dec 22 '25 10:12


