p = re.compile("[AG].{2}[ATG|ATA|AAG].{1}G")
regex_result = p.search('ZZZAXXATGXGZZZ')
regex_result.group()
'AXXATG'
I was expecting AXXATGXG instead.
Use a grouping construct (...) rather than a character class [...] around the alternatives:
p = re.compile("[AG].{2}(?:ATG|ATA|AAG).G")
                        ^^^^^^^^^^^^^^^
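The non-capturing group (?:ATG|ATA|AAG) matches one of three 3-character sequences: ATG, ATA, or AAG. The character class [ATG|ATA|AAG] matches only a single character: A, T, G, or a literal |. Inside a character class, | is not an alternation operator and repeated characters are ignored, so the class is equivalent to [ATG|].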
Note the {1} is redundant and can be removed.
Python:
import re
p = re.compile("[AG].{2}(?:ATG|ATA|AAG).G")
regex_result = p.search('ZZZAXXATGXGZZZ')
print(regex_result.group())
# => AXXATGXG
See IDEONE demo
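To see the difference between the character class and the group in isolation, here is a minimal sketch. The test string "ATG|X" and the variable names are made up for illustration; the two full patterns are the ones from the question and the answer above.

```python
import re

# Character class from the question: [...] matches exactly ONE character.
char_class = re.compile(r"[ATG|ATA|AAG]")
# Non-capturing group from the answer: matches one of three whole sequences.
group = re.compile(r"(?:ATG|ATA|AAG)")

# Illustrative input (an assumption, not from the question).
class_hits = char_class.findall("ATG|X")
group_hits = group.findall("ATG|X")
print(class_hits)  # ['A', 'T', 'G', '|']  (| is a literal inside a class)
print(group_hits)  # ['ATG']

# The same difference on the question's input.
sample = "ZZZAXXATGXGZZZ"
buggy = re.search(r"[AG].{2}[ATG|ATA|AAG].{1}G", sample).group()
fixed = re.search(r"[AG].{2}(?:ATG|ATA|AAG).G", sample).group()
print(buggy)  # AXXATG
print(fixed)  # AXXATGXG
```

In the buggy pattern the class consumes only the A of ATG, .{1} consumes the T, and the final G matches the G of ATG, which is why the match stops at AXXATG.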