I want to extract value amounts from documents that are present in textual form, using regular expressions. Very often the text starts with some blanks, commas or semicolons, and then a value, e.g.
-. .- 12.345
or
123.45
or (and here comes the problem) a figure starting with just a dot (.), meaning 0.45:
.45
I use the following regular expression to match the whitespace and separator characters:
(?<seperator>(?:[^\S]|[.:,-\/—;_])*)
and the following to match the value amounts:
(?<value>((([+|-|.]*(\$|\%|(C\$)|(\€))*(?:\d+[.,']*\d*[.,']{0,1})+[?:mkC\%\€\$\£]*([?:mkC\%\€\$\£]|C\$)*))?))
By combining both, I can extract the separator and the value separately. How can I write the regular expression so that the separator does not end with a . or , and the . or , is instead added to the value group by allowing an optional ([.,]?) at its beginning?
I posted the regular expression for evaluation here: https://regex101.com/r/eF5bW3/3. I am using Java regular expressions, and this works just fine.
I had a look at lookbehind, but it didn't seem to work for me. For the last example the value should be .45, not 45.
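To make the problem concrete, here is a minimal Java sketch that reproduces it. It uses simplified stand-ins for the two patterns above (the separator class [\s.:,;_-] and the value pattern [.,]?\d+(?:\.\d+)? are assumptions, not the full regexes from the question). Because the separator group is greedy, it swallows the leading dot of .45, and the optional [.,]? in the value group then matches nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedySeparatorDemo {
    // Hypothetical, simplified versions of the question's patterns: the real
    // value pattern also handles currency symbols and suffixes.
    // The '-' is placed last in the class so it is a literal hyphen.
    static final Pattern GREEDY = Pattern.compile(
            "(?<separator>[\\s.:,;_-]*)(?<value>[.,]?\\d+(?:\\.\\d+)?)");

    // Returns the "value" group of the first match, or null if nothing matches.
    static String value(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group("value") : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-. .- 12.345", "123.45", ".45"}) {
            System.out.println(s + " -> " + value(GREEDY, s));
        }
    }
}
```

Running it prints 12.345 and 123.45 for the first two examples but 45 for .45: the greedy separator already succeeded with the dot, so the engine never needs to backtrack and hand it to the value group.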

Try the following regex; it does what you want:
[\s,.-]*(?<!\.)((?:\d+(?:\.\d*)?)|(?:\.\d+))
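A minimal Java sketch applying this pattern with Pattern and Matcher.find() to the examples from the question, plus ..45; the class and method names are only for illustration. The number is read from capture group 1, because the leading [\s,.-]* run is part of the overall match but not of the group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // The answer's regex: a run of separator characters, then a number that
    // must not be directly preceded by a dot (negative lookbehind).
    static final Pattern NUMBER = Pattern.compile(
            "[\\s,.-]*(?<!\\.)((?:\\d+(?:\\.\\d*)?)|(?:\\.\\d+))");

    // Returns capture group 1 of the first match, or null if there is none.
    static String firstNumber(String input) {
        Matcher m = NUMBER.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-. .- 12.345", "123.45", ".45", "..45"}) {
            System.out.println(s + " -> " + firstNumber(s));
        }
    }
}
```

This yields 12.345, 123.45 and .45 for the question's examples. For ..45, however, no starting position satisfies both the lookbehind and the number group until just before the 5, so the group is only 5; that is the case EDIT 1 addresses.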
By the way, you mentioned a comma in your question; it isn't in the examples you showed, but I included it in the character class anyway. The idea here is to use the negative lookbehind (?<!\.) to make sure that the * does not consume the . in front of the number you want to match: if it does, the lookbehind fails and the regex engine backtracks until the dot is left for the number group. If your goal is just to match the only number on each line, a simpler regex like the following also matches what you want:
(\d*\.?\d+)
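A Java sketch of the simpler approach with Matcher.find(), assuming you only need the first number on each line. It contrasts (\d*\.?\d+) with a variant that leaves the dot unescaped and makes the leading \d* lazy. An unescaped . matches any character (here, the space before 12), and with a lazy \d*? the engine first tries an empty prefix, so \d+ takes only the integer digits and the match already succeeds before the fractional part is reached.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleNumberDemo {
    // Greedy digits, an optional escaped dot, then at least one digit.
    static final Pattern SIMPLE = Pattern.compile("(\\d*\\.?\\d+)");
    // Unescaped dot and lazy \d*?, kept only for comparison.
    static final Pattern LOOSE = Pattern.compile("(\\d*?.?\\d+)");

    // Returns capture group 1 of the first match, or null if there is none.
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-. .- 12.345", "123.45", ".45", "..45"}) {
            System.out.println("[" + s + "] simple: [" + first(SIMPLE, s)
                    + "] loose: [" + first(LOOSE, s) + "]");
        }
    }
}
```

With the escaped dot and greedy \d*, all four inputs give the full value (12.345, 123.45, .45, .45), whereas the loose variant returns " 12" and "123" for the first two.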
EDIT 1
To handle cases like ..45, as you mentioned, you can use the following regex:
[\s,.-]*(\d*(?=\.)(?:(?:\d+(?:\.\d*)?)|(?:\.\d+)))
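A Java sketch checking the EDIT 1 pattern on the question's inputs, on ..45 and on a plain integer. Because \d*(?=\.) must be followed by a dot, the \d+(?:\.\d*)? alternative can never start at that position, so the number group effectively reduces to \d*\.\d+ and nothing is found in 123. For comparison, the sketch also includes a lazy-separator variant, [\s,.-]*?(\d+(?:\.\d*)?|\.\d+), which is an assumption of this sketch and not part of the original answer: the reluctant *? tries the shortest separator first, so a dot right before the digits is offered to the number group before the separator can absorb it, while integers still match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotLookaheadDemo {
    // EDIT 1 pattern from the answer.
    static final Pattern EDIT1 = Pattern.compile(
            "[\\s,.-]*(\\d*(?=\\.)(?:(?:\\d+(?:\\.\\d*)?)|(?:\\.\\d+)))");
    // Hypothetical alternative (not from the answer): a reluctant separator
    // run, so the number group gets the first chance at a leading dot.
    static final Pattern LAZY = Pattern.compile(
            "[\\s,.-]*?(\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Returns capture group 1 of the first match, or null if there is none.
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-. .- 12.345", ".45", "..45", "123"}) {
            System.out.println(s + " -> EDIT1: " + first(EDIT1, s)
                    + ", lazy: " + first(LAZY, s));
        }
    }
}
```

Both patterns return 12.345, .45 and .45 for the first three inputs; for 123 the EDIT 1 pattern finds no match (null), while the lazy variant returns 123.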