I have a text and I'm using this simple regex to split it into words: [ \n]. It splits the text into words on spaces and line breaks.
I want to know if there is a way to keep the space or the line break in the split word, because I will use this for simple sentence detection after some processing.
I'm using the String#split method.
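For reference, here is a minimal sketch of the plain split the question describes (the class and method names are made up for illustration). Because the character class actually matches the space or line break, split consumes it, and it does not appear in any of the pieces:

```java
import java.util.Arrays;

// Sketch of the plain approach: splitting on the character class [ \n]
// consumes each matched space or line break, so it is not in any piece.
class PlainSplitExample {
    static String[] splitOnWhitespace(String text) {
        // "[ \\n]" in a Java string literal is the regex [ \n]
        return text.split("[ \\n]");
    }

    public static void main(String[] args) {
        String s = "first second\nlast";
        // Prints [first, second, last]: the delimiters are gone.
        System.out.println(Arrays.toString(splitOnWhitespace(s)));
    }
}
```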
You can use a lookbehind, as @Piotr Findeisen suggested (+1):
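public class RegexExample {
    public static void main(String[] args) {
        String s = "firstWordWithSpaceAfter secondWordWithSpaceAfter wordWithLineBreakAfter\nlastWord";
        // Split at every position preceded by a space or a line break.
        // The lookbehind is zero-width, so each delimiter stays at the end of the piece before the split.
        String[] sa = s.split("(?<=[ \\n])");
        for (String saa : sa) {
            System.out.println("[" + saa + "]");
        }
    }
}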
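If you would rather have the whitespace at the start of the next piece instead of at the end of the previous one, a lookahead works the same way. This is a sketch assuming that variant suits your sentence detection; the class and method names are hypothetical:

```java
class LookaheadSplitExample {
    // Splits before every space or line break. The lookahead is zero-width,
    // so each delimiter starts the following piece instead of being dropped.
    static String[] splitBeforeWhitespace(String text) {
        return text.split("(?=[ \\n])");
    }

    public static void main(String[] args) {
        for (String piece : splitBeforeWhitespace("first second\nlast")) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

For "first second\nlast" this gives "first", " second" and "\nlast". Either way the split point is zero-width, so no characters are lost.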
Output:
[firstWordWithSpaceAfter ]
[secondWordWithSpaceAfter ]
[wordWithLineBreakAfter
]
[lastWord]
Short explanation:
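(?<=...) is a lookbehind: it matches at a position if the text immediately before that position matches the regex inside it (in this case [ \\n]).
[ \\n] is a character class that matches any one of the characters between the brackets, i.e. a space or a line break (the \\n in the Java string literal becomes \n in the regex).
So the whole regex says: split at every position where the preceding character is a space or \n.
Because the lookbehind only checks what comes before the split position and doesn't consume any characters, the spaces and \n are not removed; each one stays at the end of the piece before it.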
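To see that the lookbehind really consumes nothing, you can run the same pattern through a Matcher and look at where it matches. This is an illustrative sketch; LookbehindDemo, SPLIT_AFTER_WS and splitPositions are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // The same regex as in the answer: match positions preceded by a space or \n.
    static final Pattern SPLIT_AFTER_WS = Pattern.compile("(?<=[ \\n])");

    // Collects the positions where the lookbehind matches. Each match is
    // zero-width (start == end), so no characters are consumed.
    static List<Integer> splitPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = SPLIT_AFTER_WS.matcher(text);
        while (m.find()) {
            if (m.start() != m.end()) {
                throw new IllegalStateException("expected a zero-width match");
            }
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String s = "first second\nlast";
        System.out.println(splitPositions(s)); // [6, 13]
        // Because nothing was consumed, joining the pieces restores the input.
        String[] pieces = SPLIT_AFTER_WS.split(s);
        System.out.println(String.join("", pieces).equals(s)); // true
    }
}
```

One thing to watch for: with consecutive delimiters such as two spaces, the lookbehind matches after each one, so "a  b" splits into "a ", " " and "b". You may want to skip whitespace-only pieces in your sentence detection.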