
Java split string with '\r\n', '\r' or '\n' and keep it with preceding substring

Tags: java, regex

My input string contains a mix of line separators: '\r\n', '\r' and '\n'. I want to split the string and keep each line separator attached to the substring that precedes it. I followed the two posts below

How to split a string, but also keep the delimiters?

Split Java String by New Line

and came up with something like:

String input = "1 dog \r\n 2 cat";
String[] output = input.split("(?<=((\\r\\n)|\\r|\\n))");

The output is ["1 dog \r", "\n", " 2 cat"], but the desired output is ["1 dog \r\n", " 2 cat"].

If I change the input to either String input = "1 dog \r 2 cat"; or String input = "1 dog \n 2 cat";, my code produces the desired output. Please advise. Thanks in advance.

ascetic652 asked Sep 15 '25 00:09



1 Answer

You get ["1 dog \r", "\n", " 2 cat"] because the lookbehind contains an alternation that can match \r\n, \r or \n, and the lookbehind is evaluated separately at every position in the string.

When \r\n is encountered in the example string, the lookbehind assertion is already true at the position right after \r (the \r alternative matches), so the string is split there for the first time.

The assertion is true again at the position right after \n, so the string is split a second time.
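The two split points can be made visible with a small sketch that runs the pattern from the question and escapes the control characters in each piece. The class name SplitTrace and the helper escape are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SplitTrace {
    // Hypothetical helper: turn \r and \n into visible two-character escapes.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Split with the alternation-based lookbehind from the question
    // and return the pieces with their control characters escaped.
    static List<String> splitWithAlternation(String input) {
        List<String> parts = new ArrayList<>();
        for (String part : input.split("(?<=((\\r\\n)|\\r|\\n))")) {
            parts.add(escape(part));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Three pieces are printed: the lookbehind holds after \r and again after \n.
        System.out.println(splitWithAlternation("1 dog \r\n 2 cat"));
    }
}
```

The lone \n shows up as its own element because nothing in the pattern stops the lookbehind from succeeding between \r and \n.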

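What you might do is use \R (supported since Java 8) in the positive lookbehind to assert that what is on the left is a Unicode linebreak sequence. The way \R handles \r\n was reportedly revised in Java 9, so verify the result on the JDK you target: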

String input = "1 dog \r\n 2 cat";
String[] output = input.split("(?<=\\R)");


Another option to fix your regex is to wrap the alternation in an atomic group:

(?<=(?>\\r\\n|\\r|\\n))


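As explained in the linked post, an atomic group does not backtrack once it has matched. At the position between \r and \n, the group, tried from the \r, matches \r\n as a whole and cannot fall back to \r alone, so the lookbehind fails there and no split occurs.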
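A minimal sketch of the atomic-group variant, applied to a string that mixes all three separators. The class and method names (KeepLineSeparators, splitKeepingSeparators) and the sample input are assumptions made for illustration, not part of the original code.

```java
import java.util.regex.Pattern;

class KeepLineSeparators {
    // Lookbehind with an atomic group: \r\n is tried first and, once matched,
    // cannot be given back in order to match \r alone.
    private static final Pattern AFTER_LINE_BREAK =
            Pattern.compile("(?<=(?>\\r\\n|\\r|\\n))");

    // Split after every line separator, keeping the separator
    // attached to the substring that precedes it.
    static String[] splitKeepingSeparators(String input) {
        return AFTER_LINE_BREAK.split(input);
    }

    public static void main(String[] args) {
        String mixed = "1 dog \r\n 2 cat \r 3 bird \n 4 fish";
        for (String part : splitKeepingSeparators(mixed)) {
            // Escape control characters so each piece prints on one line.
            System.out.println("[" + part.replace("\r", "\\r").replace("\n", "\\n") + "]");
        }
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which String.split would otherwise do.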

The fourth bird answered Sep 17 '25 15:09
