 

Extract values from a string with a regular expression

Tags:

java

regex

I have this Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String msg = "*1*20*11*30*IGNORE*53*40##";
String regex = "\\*1\\*(.*?)\\*11\\*(.*?)\\*(.*?)\\*53\\*(.*?)##";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(msg);
if (matcher.find()) {
    for (int i = 0; i < matcher.groupCount(); i++) {
        System.out.println(matcher.group((i+1)));
    }
}

The output is:

20
30
IGNORE
40

How do I have to change the regex so that the string IGNORE is ignored? I want whatever is written in that position not to be returned by the matcher. The values in the positions of 20, 30 and 40 are the ones I need to extract; IGNORE in my case is a protocol-specific counter that I have no use for.

Asked by user2071938 on May 10 '26 at 20:05


2 Answers

Always ignore the 3rd parameter:

Simply don't create a capture (don't use parentheses).

\\*1\\*(.*?)\\*11\\*(.*?)\\*.*?\\*53\\*(.*?)##
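A minimal runnable sketch of this approach, reusing the sample string from the question (the class and method names are only for illustration). Because the third field is matched by a bare `.*?` without parentheses, the matcher reports just three groups:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipThirdField {
    // The third field is matched by a bare .*? (no parentheses), so it is not captured.
    static final Pattern PATTERN =
            Pattern.compile("\\*1\\*(.*?)\\*11\\*(.*?)\\*.*?\\*53\\*(.*?)##");

    // Returns the captured values, or an empty list if the message does not match.
    static List<String> extract(String msg) {
        List<String> values = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(msg);
        if (matcher.find()) {
            // Group numbering starts at 1; group 0 is the whole match.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                values.add(matcher.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("*1*20*11*30*IGNORE*53*40##")); // [20, 30, 40]
    }
}
```

Whatever value sits in the third position (a counter, `IGNORE`, anything else) is still required by the pattern but simply never captured.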

Ignore independently of position:

You need to capture the IGNORE part just like you're doing, and check in your loop if it needs to be ignored:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String msg = "*1*20*11*30*IGNORE*53*40##";
String regex = "\\*1\\*(.*?)\\*11\\*(.*?)\\*(.*?)\\*53\\*(.*?)##";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(msg);
if (matcher.find()) {
    for (int i = 0; i < matcher.groupCount(); i++) {
        if (!matcher.group(i+1).equals("IGNORE")) {
            System.out.println(matcher.group(i+1));
        }
    }
}


Answered by Mariano on May 13 '26 at 11:05


You can use a tempered greedy token to make sure you do not get a match when IGNORE appears in between the 2nd capture group and `*53*`:

\\*1\\*(.*?)\\*11\\*(.*?)\\*(?:(?!IGNORE).)*\\*53\\*(.*?)##

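In this case, the third field (the non-captured part between the 2nd group and `*53*`) cannot contain IGNORE, so the sample string from the question does not match at all.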

A tempered greedy token is useful when you need to match the closest window between two subpatterns that must not contain a given substring.
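A small sketch of how the tempered greedy token behaves on two inputs: the question's string and a hypothetical message with `7` in the third position. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemperedTokenDemo {
    // (?:(?!IGNORE).)* consumes one character at a time, but only at positions where
    // IGNORE does not start, so the field before *53* can never contain IGNORE.
    static final Pattern PATTERN =
            Pattern.compile("\\*1\\*(.*?)\\*11\\*(.*?)\\*(?:(?!IGNORE).)*\\*53\\*(.*?)##");

    static boolean hasMatch(String msg) {
        return PATTERN.matcher(msg).find();
    }

    public static void main(String[] args) {
        System.out.println(hasMatch("*1*20*11*30*IGNORE*53*40##")); // false
        Matcher m = PATTERN.matcher("*1*20*11*30*7*53*40##");
        if (m.find()) {
            // Only three capture groups exist; the tempered token is non-capturing.
            System.out.println(m.group(1) + " " + m.group(2) + " " + m.group(3)); // 20 30 40
        }
    }
}
```

Note that this rejects the whole message rather than skipping the field, so it only fits if messages containing IGNORE should be discarded.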

If you only want to prevent the 3rd group from being exactly IGNORE, use a negative look-ahead instead:

\\*1\\*(.*?)\\*11\\*(.*?)\\*(?!IGNORE\\*)(.*?)\\*53\\*(.*?)##
                            ^^^^^^^^^^^^^
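A minimal sketch contrasting an exact `IGNORE` field with a hypothetical `IGNORED` field (class and method names are illustrative only). Because the look-ahead includes the trailing `\\*`, only a field that is exactly `IGNORE` is rejected:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // (?!IGNORE\\*) is a zero-width check: it fails when the third field is exactly IGNORE
    // (IGNORE immediately followed by the * delimiter), without consuming any characters.
    static final Pattern PATTERN =
            Pattern.compile("\\*1\\*(.*?)\\*11\\*(.*?)\\*(?!IGNORE\\*)(.*?)\\*53\\*(.*?)##");

    static String describe(String msg) {
        Matcher m = PATTERN.matcher(msg);
        if (!m.find()) {
            return "no match";
        }
        return m.group(1) + "," + m.group(2) + "," + m.group(3) + "," + m.group(4);
    }

    public static void main(String[] args) {
        System.out.println(describe("*1*20*11*30*IGNORE*53*40##"));  // no match
        System.out.println(describe("*1*20*11*30*IGNORED*53*40##")); // 20,30,IGNORED,40
    }
}
```

As with the tempered token, a message whose third field is IGNORE fails to match entirely instead of having that field skipped.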


Answered by Wiktor Stribiżew on May 13 '26 at 09:05


