
Java RegEx and linebreaks - bug or expected behavior?

Tags: java, regex

I'm trying to interpret a multiline string with RegEx and just found that matching fails if the string contains newline characters. I'm NOT using MULTILINE mode, because I'm not using anchors. According to the API docs:

In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.

In short: it clearly says that this flag only changes how anchors work and says nothing like "when your string contains a newline you should definitely use this".

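import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLinebreakTest {
    public static void main(String[] args) {
        // Compiled with no flags (neither MULTILINE nor DOTALL)
        Pattern p = Pattern.compile(".*");

        Matcher m1 = p.matcher("Hello");
        System.out.println("m1: " + m1.matches());    // true

        Matcher m2 = p.matcher("Hello\r\n");
        System.out.println("m2: " + m2.matches());    // false
    }
}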

So is this really a bug, or did I just miss something in the docs? Or does Java use a dialect of RegEx in which my pattern fails? I'm using jdk1.6.0_21.

asked Dec 07 '25 02:12 by vbence


1 Answer

From the Pattern docs:

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

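So this is expected behavior, not a bug. You need to compile the pattern with the DOTALL flag, e.g. Pattern.compile(".*", Pattern.DOTALL), if you want m2.matches() to be true.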
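
To make the difference concrete, here is a minimal sketch that runs the pattern from the question against the same input under different flags. The class name DotAllDemo and the helper matchesWhole are made up for illustration; only Pattern.compile, the MULTILINE and DOTALL constants, and the embedded (?s) flag come from java.util.regex.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: true if the ENTIRE input matches the regex
    // when the pattern is compiled with the given flags.
    static boolean matchesWhole(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "Hello\r\n";

        // No flags: "." stops at the line terminator, so the whole input does not match.
        System.out.println("default:   " + matchesWhole(".*", 0, input));                 // false

        // MULTILINE only changes how ^ and $ behave; "." is unaffected.
        System.out.println("MULTILINE: " + matchesWhole(".*", Pattern.MULTILINE, input)); // false

        // DOTALL lets "." match line terminators as well.
        System.out.println("DOTALL:    " + matchesWhole(".*", Pattern.DOTALL, input));    // true

        // The embedded flag (?s) is equivalent to compiling with DOTALL.
        System.out.println("(?s):      " + matchesWhole("(?s).*", 0, input));             // true
    }
}
```

Which form to use is mostly a matter of taste: the embedded (?s) flag is handy when the regex comes from a config file or is passed to a method such as String.matches that takes no flags argument.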

answered Dec 08 '25 14:12 by Philip


