I am learning regex but have not been able to find the right regex in Python for selecting words that do not start with a particular letter.
Example below:
text='this is a test'
match=re.findall('(?!t)\w*',text)
# match returns
['his', '', 'is', '', 'a', '', 'est', '']
match=re.findall('[^t]\w+',text)
# match
['his', ' is', ' a', ' test']
Expected: ['is', 'a']
First, to negate a character class, you put the ^ inside the brackets, not before them. ^[0-9] means "any digit, at the start of the string"; [^0-9] means "anything except a digit". Second, [^0-9] will match anything that isn't a digit, not just letters and underscores.
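A small sketch of that difference, using a made-up sample string (the string and variable names are only for illustration):

```python
import re

sample = '42 apples'  # hypothetical sample string

# ^[0-9] : a digit, but only at the very start of the string
anchored = re.findall(r'^[0-9]', sample)

# [^0-9] : any single character that is not a digit,
# which includes the space as well as the letters
negated = re.findall(r'[^0-9]', sample)

print(anchored)  # ['4']
print(negated)   # [' ', 'a', 'p', 'p', 'l', 'e', 's']
```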
The caret ^ and dollar $ characters have special meaning in a regexp. They are called “anchors”. The caret ^ matches at the beginning of the text, and the dollar $ – at the end.
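As a rough illustration on the question's own text, the anchors can be checked with re.search (the variable names here are just for the example):

```python
import re

text = 'this is a test'

# ^ pins the match to the start of the text, $ to the end
starts_with_this = re.search(r'^this', text) is not None
ends_with_test = re.search(r'test$', text) is not None
# 'test' occurs in the text, but not at the start
starts_with_test = re.search(r'^test', text) is not None

print(starts_with_this, ends_with_test, starts_with_test)  # True True False
```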
To match any character except a list of excluded characters, put the excluded characters between [^ and ]. The caret ^ must immediately follow the [ or else it stands for just itself.
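The position of the caret is easy to check on a short, made-up string (chosen here only because it contains a literal ^):

```python
import re

sample = 'tot^'  # hypothetical sample string

# Caret directly after [ negates the set: every character except t
not_t = re.findall(r'[^t]', sample)

# Caret anywhere else in the set is a literal: matches t or ^
t_or_caret = re.findall(r'[t^]', sample)

print(not_t)       # ['o', '^']
print(t_or_caret)  # ['t', 't', '^']
```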
How do you match letters in regex? To match a character that has special meaning in regex, you need to escape it by prefixing it with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" .
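A minimal sketch of escaping in Python, on an invented sample string. re.escape is the standard-library helper that adds the backslashes for you; using it here is a suggestion, not something the text above requires:

```python
import re

sample = 'total (1+1) = 2.0'  # hypothetical sample string

# Escaped metacharacters match themselves literally
dots = re.findall(r'\.', sample)
pluses = re.findall(r'\+', sample)
parens = re.findall(r'\(', sample)

# An unescaped dot matches any character (except a newline)
any_char = re.findall(r'.', '2.0')

# re.escape builds a pattern that matches the given string literally
literal = re.findall(re.escape('(1+1)'), sample)

print(dots, pluses, parens)  # ['.'] ['+'] ['(']
print(any_char)              # ['2', '.', '0']
print(literal)               # ['(1+1)']
```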
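Use the negated set [^\Wt] to match any word character (letter, digit, or underscore) that is not t: \W inside the negated set excludes every non-word character, and t is excluded explicitly. To avoid matching the tails of words, add the word boundary metacharacter, \b, at the beginning of your pattern.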
Also, do not forget that you should use raw strings for regex patterns.
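To see why the raw string matters for this particular pattern: in an ordinary Python string, '\b' is the backspace character rather than a regex word boundary. A short sketch:

```python
import re

text = 'this is a test'

# Ordinary string: '\b' is one backspace character (ASCII 8),
# so the pattern looks for a literal backspace and finds nothing
plain = re.findall('\b[^\\Wt]\\w*', text)

# Raw string: the backslash reaches the regex engine,
# which reads \b as a word boundary
raw = re.findall(r'\b[^\Wt]\w*', text)

print(len('\b'), len(r'\b'))  # 1 2
print(plain)                  # []
print(raw)                    # ['is', 'a']
```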
import re
text = 'this is a test'
match = re.findall(r'\b[^\Wt]\w*', text)
print(match) # prints: ['is', 'a']
Note that this is also achievable without regex.
text = 'this is a test'
match = [word for word in text.split() if not word.startswith('t')]
print(match) # prints: ['is', 'a']
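The negative lookahead from the question can also be made to work. This is an alternative sketch, not part of the answer above: anchor it with \b so it is only tested at the start of a word, and use \w+ instead of \w* so empty matches are not returned.

```python
import re

text = 'this is a test'

# \b     : only try at a word boundary
# (?!t)  : the next character must not be t
# \w+    : then take one or more word characters
match = re.findall(r'\b(?!t)\w+', text)

print(match)  # ['is', 'a']
```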