
Regex to replace a repeating string pattern

I need to replace a repeated pattern within a word with its basic repeating unit. For example, I have the string "TATATATA" and I want to replace it with "TA". I would probably only replace patterns that repeat more than twice, to avoid altering normal words.

I am trying to do this in Java with the replaceAll method.

Michael asked Sep 02 '25 04:09


2 Answers

I think you want this (works for any length of the repeated string):

String result = source.replaceAll("(.+)\\1+", "$1");

Or alternatively, to prefer the shortest repeating unit (a lazy group):

String result = source.replaceAll("(.+?)\\1+", "$1");

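It first matches a group of one or more characters (any characters, not only letters), and then requires the same text to appear again one or more times, using a back-reference (\1) within the pattern itself. The whole run is then replaced by a single copy of the group ($1). I tried it and it seems to do the trick.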
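To make the difference between the two variants concrete, here is a small sketch. The class name RepeatCollapseDemo and its two helper methods are made up for illustration. With the greedy group the engine tries the longest candidate first, so the asker's "TATATATA" only collapses to "TATA"; the lazy group finds the shortest repeating unit.

```java
public class RepeatCollapseDemo {
    // Greedy group: the longest text that is immediately repeated wins.
    static String collapseGreedy(String s) {
        return s.replaceAll("(.+)\\1+", "$1");
    }

    // Lazy group: the shortest text that is immediately repeated wins.
    static String collapseLazy(String s) {
        return s.replaceAll("(.+?)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseGreedy("TATATATA")); // TATA
        System.out.println(collapseLazy("TATATATA"));   // TA
    }
}
```

For the use case in the question, the lazy form is probably the one to start from.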


Example

String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";

System.out.println(source.replaceAll("(.+?)\\1+", "$1"));

// HEY dude what's up? Trolo ye .0
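
The question also mentions only replacing patterns that repeat several times, so that normal words are left alone. One possible way to do that, sketched below, is to put a bounded quantifier on the back-reference: \1{2,} requires at least two more copies after the group, i.e. three or more occurrences in total. The class name MinRepeatDemo and the method collapseTriples are hypothetical, and the threshold is an assumption about what the asker wants.

```java
public class MinRepeatDemo {
    // The unit must be followed by at least two more copies of itself,
    // so only runs of three or more occurrences are collapsed.
    static String collapseTriples(String s) {
        return s.replaceAll("(.+?)\\1{2,}", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseTriples("TATATATA")); // TA
        System.out.println(collapseTriples("banana"));   // banana (unchanged)
        // For comparison, the unrestricted pattern also rewrites "banana":
        System.out.println("banana".replaceAll("(.+?)\\1+", "$1")); // bana
    }
}
```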
MightyPork answered Sep 04 '25 22:09


It is better to precompile a Pattern here than to call String.replaceAll(), which recompiles the regex on every call. For instance:

private static final Pattern PATTERN 
    = Pattern.compile("\\b([A-Z]{2,}?)\\1+\\b");

//...

final Matcher m = PATTERN.matcher(input);
final String ret = m.replaceAll("$1");

Edit: an example:

public static void main(final String... args)
{
    System.out.println("TATATA GHRGHRGHRGHR"
        .replaceAll("\\b([A-Za-z]{2,}?)\\1+\\b", "$1"));
}

This prints:

TA GHR
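
As a rough illustration of what the \b anchors add, the sketch below reuses the regex from the example with a precompiled Pattern. The class name WordRepeatDemo and the method collapseWords are invented for this sketch. A word is only rewritten when it consists entirely of a unit of two or more letters repeated, so words that merely contain a repetition are left alone.

```java
import java.util.regex.Pattern;

public class WordRepeatDemo {
    // Compiled once and reused; same regex as in the example above.
    private static final Pattern PATTERN =
        Pattern.compile("\\b([A-Za-z]{2,}?)\\1+\\b");

    static String collapseWords(String input) {
        return PATTERN.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseWords("TATATA GHRGHRGHRGHR")); // TA GHR
        // Neither word is a pure repetition, so nothing changes:
        System.out.println(collapseWords("TATATAX banana"));      // TATATAX banana
    }
}
```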
fge answered Sep 04 '25 21:09