
Part of a regular expression should not match . or , at the end

Tags: regex, text, php

I want to extract value amounts from documents that are available in textual form, using regular expressions. Very often the text starts with some blanks, commas, or semicolons, and then a value, e.g.

   -.          .-       12.345

or

                        123.45

or (and here comes the problem) a figure starting with just a ., meaning 0.45:

                        .45

I use the following regular expression to match the leading whitespace and separator characters:

(?<seperator>(?:[^\S]|[.:,-\/—;_])*)

and the following to match the value amounts:

(?<value>((([+|-|.]*(\$|\%|(C\$)|(\€))*(?:\d+[.,']*\d*[.,']{0,1})+[?:mkC\%\€\$\£]*([?:mkC\%\€\$\£]|C\$)*))?))

By combining both, I can extract the separator and the value separately. How can I create a regular expression so that the seperator group does not accept a . or , at the end, and so that this . or , is instead added to the value group by accepting an optional ([.,]?) at its beginning?

I posted the regular expression for evaluation at https://regex101.com/r/eF5bW3/3 (I am using the regular expressions from Java, and this works just fine).

I had a look at lookbehind, but it didn't seem to work for me. The value should be .45 and not 45.

Marc Giombetti asked Jan 19 '26 02:01


1 Answer

Try the following regex. It does what you want to do:

[\s,.-]*(?<!\.)((?:\d+(?:\.\d*)?)|(?:\.\d+))


By the way, you mentioned commas in your question; they don't appear in the examples shown, but I included them in the answer anyway. The idea is to use a negative lookbehind to make sure that the * doesn't consume the . directly before the number you want to match. If your goal is to match only the number in every line, you can use a simpler regex, which also matches what you want:

(\d*\.?\d+)

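Since the question mentions Java, here is a rough sketch of how the lookbehind regex could be tried out there. The class name LookbehindDemo and the helper extractValue are made up for illustration, and wrapping the two parts in the question's seperator and value group names is my own assumption, not something the regex above requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // The lookbehind regex from this answer; the named groups "seperator" and
    // "value" are borrowed from the question (an assumption for illustration).
    static final Pattern AMOUNT = Pattern.compile(
            "(?<seperator>[\\s,.-]*)(?<!\\.)(?<value>\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Hypothetical helper: returns the first value found in the line, or null.
    static String extractValue(String line) {
        Matcher m = AMOUNT.matcher(line);
        return m.find() ? m.group("value") : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "   -.          .-       12.345",
            "                        123.45",
            "                        .45"
        };
        for (String s : samples) {
            // Prints 12.345, then 123.45, then .45
            System.out.println(extractValue(s));
        }
    }
}
```

For the third sample the greedy separator first swallows the ., the lookbehind then rejects that position, and the engine backtracks by one character so the . ends up in the value group.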

EDIT 1

To handle cases like ..45, as you mentioned, you can use the following regex:

[\s,.-]*(\d*(?=\.)(?:(?:\d+(?:\.\d*)?)|(?:\.\d+)))

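A small Java sketch of the EDIT 1 regex, in case it helps to check it outside regex101. The class name DotLookaheadDemo and the helper extractValue are hypothetical names; the pattern string is the regex above, only escaped for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotLookaheadDemo {
    // The EDIT 1 regex; capture group 1 holds the value.
    static final Pattern AMOUNT = Pattern.compile(
            "[\\s,.-]*(\\d*(?=\\.)(?:(?:\\d+(?:\\.\\d*)?)|(?:\\.\\d+)))");

    // Hypothetical helper: returns the first value found in the line, or null.
    static String extractValue(String line) {
        Matcher m = AMOUNT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractValue("..45"));    // prints .45
        System.out.println(extractValue("12.345"));  // prints 12.345
        System.out.println(extractValue("123"));     // prints null
    }
}
```

One thing to be aware of: because of the (?=\.) lookahead, this version only matches numbers that contain a ., so a plain integer such as 123 is not matched as written.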

Pruthvi Raj answered Jan 21 '26 17:01


