I am trying to find a regular expression that tells me whether a dinucleotide (two letters) appears twice in a row in my sequence. Here is an example:
Suppose I have this sequence (the ; characters are only there to make the dinucleotide boundaries clear):
"AT;GC;TA;CC;AG;AG;CC;CA;TA;TA"
I expect it to match AGAG and TATA.
I have already tried the following, but it fails because it matches any two dinucleotides in a row, not the same dinucleotide twice:
([ATGC]{2}){2}
You will need to use backreferences.
Start with matching one pair:
[ATGC]{2}
This matches any two consecutive letters drawn from A, T, G and C.
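Put that in capturing parentheses and refer to what the parentheses captured with \1. The backreference matches exactly the text the group matched, so the second pair has to be identical to the first, like so: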
([ATGC]{2});\1
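As a quick check, here is a minimal Python sketch that runs the pattern above against the example sequence from the question. The function name find_repeated_dinucleotides is made up for illustration, and Python is only an assumption, since the question does not say which regex engine is in use.

```python
import re

# The pattern from the answer: capture one dinucleotide, require the ';'
# separator used in the example, then require the same two letters via \1.
REPEAT = re.compile(r"([ATGC]{2});\1")

def find_repeated_dinucleotides(seq):
    """Return the matched text for each immediately repeated dinucleotide.

    Hypothetical helper name, used only for this illustration.
    """
    # group(0) is the whole match; findall() would return only the captured group.
    return [m.group(0) for m in REPEAT.finditer(seq)]

sequence = "AT;GC;TA;CC;AG;AG;CC;CA;TA;TA"
print(find_repeated_dinucleotides(sequence))  # ['AG;AG', 'TA;TA']
```

If the real sequence has no ; separators, the pattern becomes ([ATGC]{2})\1. Note that nothing then ties a match to dinucleotide boundaries, so it can also match a repeat that starts at an odd position in the sequence.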