According to the documentation, the default definition of the ws method in a grammar matches zero or more whitespace characters, as long as the current position is not within a word:
regex ws { <!ww> \s* }
What is the difference between this definition and the following:
regex ws { \s+ }
I wonder why the zero-width assertion <!ww> is used instead of the simpler \s+. I also note that the default definition allows matching zero whitespace characters, but when would that actually happen? Wouldn't it be clearer to use \s+ instead of \s*?
The ww assertion means that there are characters matching \w on both sides of the current position. The ! inverts it, so <!ww> matches wherever there is:
- no \w character before the current position (such as between "+" and "a"), or
- no \w character after the current position (such as between "a" and "+").
Effectively, then, whitespace can never be considered to occur between two word characters. However, between two non-word characters, or between a word character and a non-word character, whitespace can occur.
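To see where <!ww> holds, the following Raku sketch tests every position in the example string ab+cd, where position i is the gap before character i (so there are positions 0 to 5). The variable names are illustrative; the sketch relies on the :p (:pos) adverb of Str.match, which anchors the match at exactly the given position.

```raku
my $s = 'ab+cd';
# Try the zero-width assertion <!ww> anchored at each gap 0..5.
# It fails only where a word character sits on both sides (a|b and c|d).
my @ok = (0..$s.chars).grep: -> $i { so $s.match(/<!ww>/, :p($i)) };
say @ok;    # [0 2 3 5]
```

Positions 1 and 4 are excluded because they lie inside the words ab and cd.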
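This matches what many languages we might wish to parse need. For example, consider ab+cd. The default ws will match on either side of the +, but would not, for example, match within an identifier. Since ab+cd contains no whitespace at all, ws matches zero characters at those positions; this is exactly the case where \s* rather than \s+ is needed, because \s+ would make a space around the + mandatory.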
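A small Raku sketch makes the comparison from the question concrete. The grammar names (DefaultWS, PlusWS, StarWS) and the toy 'let' rule are hypothetical. It assumes, as the documentation describes, that a rule turns whitespace after each atom into a call to <.ws>, so overriding ws in a subclass changes how every rule in that grammar treats whitespace.

```raku
# No ws defined here, so the built-in one (<!ww> \s*) is used.
grammar DefaultWS {
    # Whitespace after each atom in a rule becomes <.ws>.
    rule TOP { 'let' <ident> '+' <ident> }
}

# Hypothetical override: whitespace is mandatory.
grammar PlusWS is DefaultWS {
    token ws { \s+ }
}

# Hypothetical override: \s* without the <!ww> guard.
grammar StarWS is DefaultWS {
    token ws { \s* }
}

for 'let ab+cd', 'letab+cd' -> $input {
    for DefaultWS, PlusWS, StarWS -> $grammar {
        printf "%-10s %-9s %s\n", $input, $grammar.^name,
            $grammar.parse($input) ?? 'parsed' !! 'failed';
    }
}
```

Under these assumptions, DefaultWS accepts let ab+cd and rejects letab+cd. PlusWS rejects both, because \s+ cannot match the empty gaps around the + (or at the end of the input). StarWS accepts both, so without <!ww> the keyword let and the identifier ab are allowed to run together.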
For languages where that isn't suitable, it's simply a matter of overriding the default ws for whatever that language needs.
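As one illustration of such an override, here is a sketch for a hypothetical line-oriented language in which newlines are significant, so ws should consume only horizontal whitespace (\h). The grammar names and the TOP/assign rules are invented for the example.

```raku
grammar LinesDefault {
    # One assignment per line; a newline separates (and may end) lines.
    token TOP { <assign>+ %% \n }
    # Equivalent to: <ident> <.ws> '=' <.ws> <ident> <.ws>
    rule assign { <ident> '=' <ident> }
}

# Hypothetical override: only horizontal whitespace is insignificant.
grammar Lines is LinesDefault {
    token ws { <!ww> \h* }
}

my $src = "a = b\nc=d\n";
say so LinesDefault.parse($src);   # False
say so Lines.parse($src);          # True
```

With the inherited default, the trailing <.ws> at the end of assign swallows the newline after b that TOP expects as a separator, so the parse fails; restricting ws to \h* leaves the newline for TOP to match.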