For analyzing a log file, I need to extract exception types with Python and regex.
The exception types always contain the substring "Exception".
The problem is that the substring "Exception" is not always at the end of their names.
Moreover, the exception types contain an unknown number of dots.
Expected behaviour:
Input
"08-01-2021: There is a System.InvalidCalculationException - System reboots"
"09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next"
"10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!"
Output
"System.InvalidCalculationException"
"System.IO.WritingException"
"InternalException.NullReference.NonCritical.User"
What does the regex need to look like?
I have tried "\w+[.]\w+[.]*Exception" for the exception types that end with "Exception".
But what if exception types contain even more dots and "Exception" is not at the end?
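To see where that attempt falls short, here is a quick check of it against the three sample lines. This is only a sketch; the variable names are illustrative. The pattern requires a literal "Exception" right after the second word (plus optional dots), so it drops the leading "System." on the second line and finds nothing on the third.

```python
import re

# The pattern from the question: word, dot, word, optional dot(s), then "Exception"
attempt = r"\w+[.]\w+[.]*Exception"

lines = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

for line in lines:
    print(re.findall(attempt, line))
# ['System.InvalidCalculationException']
# ['IO.WritingException']   <- the leading "System." is lost
# []                        <- "Exception" is not at the end, so nothing matches
```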
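You can use either of these patterns (the first allows only ASCII letters in each dot-separated part, the second also allows digits and underscores):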
\b(?:[A-Za-z]+\.)*[A-Za-z]*Exception(?:\.[A-Za-z]+)*\b
\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b
Details of the first pattern:
- \b - a word boundary
- (?:[A-Za-z]+\.)* - zero or more occurrences of one or more letters followed by a dot
- [A-Za-z]* - zero or more letters
- Exception - the literal string Exception
- (?:\.[A-Za-z]+)* - zero or more repetitions of a dot followed by one or more letters
- \b - a word boundary

The second pattern is identical except that \w replaces [A-Za-z]; \w matches any letter, digit or underscore.
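The two variants only behave differently when a type name contains digits or underscores. A small check, using a made-up log line (the name `Http2Exception` is hypothetical and only there to put a digit in the name):

```python
import re

letters_only = r"\b(?:[A-Za-z]+\.)*[A-Za-z]*Exception(?:\.[A-Za-z]+)*\b"
word_chars = r"\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b"

# Hypothetical log line with a digit inside the exception type name
sample = "Caught Http2Exception while streaming"

# [A-Za-z]* cannot cross the "2", and there is no word boundary between
# "2" and "E", so the letters-only pattern finds nothing here
print(re.findall(letters_only, sample))  # []
print(re.findall(word_chars, sample))    # ['Http2Exception']
```

If your exception names are known to be purely alphabetic, the stricter first pattern is a reasonable choice; otherwise the \w version is more forgiving.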
Python usage:
import re
re.findall(r'\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b', text)
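A self-contained version that runs the \w pattern over the sample lines from the question. The helper name `extract_exception_types` is hypothetical, not part of any library. Because the pattern contains only non-capturing groups, `findall` returns the whole matched type names.

```python
import re

# The \w variant of the pattern: dotted parts before and/or after "Exception"
EXCEPTION_TYPE = re.compile(r"\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b")

def extract_exception_types(lines):
    """Yield every exception type found in an iterable of log lines."""
    for line in lines:
        yield from EXCEPTION_TYPE.findall(line)

log_lines = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

for exc_type in extract_exception_types(log_lines):
    print(exc_type)
# System.InvalidCalculationException
# System.IO.WritingException
# InternalException.NullReference.NonCritical.User
```

Since iterating over an open text file yields its lines, the same helper should also accept a file object opened with `open(...)` in place of the list.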
How about:
[^\s]*Exception[^\s]*
This matches any run of non-whitespace characters that contains "Exception", so everything directly before or after "Exception" is included up to the nearest whitespace character.
[^\s]* matches any character that is not (^) a whitespace character (\s), zero or more times (*). It is equivalent to \S*.
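A quick sketch of this whitespace-based pattern on the question's sample lines. The extra log line with a trailing comma is made up to illustrate a trade-off: since only whitespace ends a match, punctuation attached to the type name is captured as well.

```python
import re

# Any run of non-whitespace characters containing "Exception" (same as \S*Exception\S*)
non_space = r"[^\s]*Exception[^\s]*"

lines = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

for line in lines:
    print(re.findall(non_space, line))
# ['System.InvalidCalculationException']
# ['System.IO.WritingException']
# ['InternalException.NullReference.NonCritical.User']

# Hypothetical line: the trailing comma becomes part of the match
print(re.findall(non_space, "Caught System.IO.WritingException, retrying"))
# ['System.IO.WritingException,']
```

For well-formed lines like the samples this is simpler than the word-boundary patterns; for logs where names may be followed by punctuation, the \b-based patterns avoid the stray characters.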