text = "This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE."
pattern = '[A-Z]+[A-Z]+[A-Z]*[\s]+'
re.findall(pattern, text) gives an output -->
['TEXT ', 'CONTAINING ', 'UPPER ', 'CASE ', 'WORDS ', 'SECOND ', 'SENTENCE ']
However, I want an output something like this -->
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
You may use this regex:
\b[A-Z]+(?:\s+[A-Z]+)*\b
RegEx Details:
\b: Word boundary
[A-Z]+: Match a word comprising only uppercase letters
(?:\s+[A-Z]+)*: Match 1+ whitespace followed by another word with uppercase letters. Match this group 0 or more times
\b: Word boundary
Code:
>>> import re
>>> s = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE'
>>> print(re.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b', s))
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
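Note that [A-Z]+ also accepts one-letter words such as "I" or "A", whereas the pattern in the question required at least two capitals per word. If that matters, the quantifier can be tightened. Below is a minimal sketch of the same regex written with re.VERBOSE so each part carries its own comment. The function name find_upper_runs and the min_len parameter are hypothetical, introduced only for illustration.

```python
import re

def find_upper_runs(text, min_len=1):
    """Return runs of consecutive all-uppercase words found in text.

    min_len is a hypothetical knob: the minimum number of letters a word
    needs in order to count as an uppercase word.
    """
    # One all-uppercase word, e.g. [A-Z]{1,} (same as [A-Z]+) or [A-Z]{2,}
    word = r"[A-Z]{%d,}" % min_len
    pattern = re.compile(
        rf"""
        \b{word}          # first all-uppercase word, starting at a word boundary
        (?:\s+{word})*    # zero or more further uppercase words, each after whitespace
        \b                # the run must end at a word boundary
        """,
        re.VERBOSE,
    )
    return pattern.findall(text)

sample = "I saw A BIG RED dog and an OLD CAT."
print(find_upper_runs(sample))             # ['I', 'A BIG RED', 'OLD CAT']
print(find_upper_runs(sample, min_len=2))  # ['BIG RED', 'OLD CAT']
```

With the default min_len=1 this behaves like the regex above. With min_len=2 the single-letter words are skipped, which is closer to what the original [A-Z]+[A-Z]+ attempt was aiming for.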