
Regular Expression Nucleotide Search

Tags:

regex

I am trying to find a regular expression that tells me whether a dinucleotide (two letters) appears twice in a row in my sequence. Here is an example.

Suppose I have this sequence (the ; characters are only there to mark the boundaries between dinucleotides):

"AT;GC;TA;CC;AG;AG;CC;CA;TA;TA"

I expect it to match AGAG and TATA.

I have already tried the following, but it fails because it matches any two dinucleotides in a row, not the same dinucleotide repeated:

([ATGC]{2}){2}
Javier Biotech asked Sep 06 '25 03:09


1 Answer

You will need to use backreferences.

Start with matching one pair:

[ATGC]{2}

will match any pair of two of the four letters.

Then put that in capturing parentheses and refer back to the text they captured with \1. The ; matches the separator shown in your example:

([ATGC]{2});\1
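As a minimal sketch of how this might be used, here is the pattern applied in Python to the sequence from the question. The function names are hypothetical and chosen only for illustration. The second helper is not a regex: it assumes the real sequence has no ; separators and is read as pairs starting at position 0. It is included because simply dropping the ; from the pattern can match across pair boundaries (on the unseparated example it finds ATAT rather than TATA).

```python
import re

def repeated_dinucleotides(sequence):
    """Return each dinucleotide that is immediately repeated in a ';'-separated sequence."""
    # ([ATGC]{2}) captures one pair; \1 requires the very same text after the ';'.
    pattern = re.compile(r"([ATGC]{2});\1")
    return [m.group(1) for m in pattern.finditer(sequence)]

def repeated_dinucleotides_unseparated(sequence):
    """Non-regex alternative, assuming no separators and a reading frame starting at index 0."""
    # Cut the string into consecutive pairs, then compare each pair with the next one.
    pairs = [sequence[i:i + 2] for i in range(0, len(sequence) - 1, 2)]
    return [a for a, b in zip(pairs, pairs[1:]) if a == b]

if __name__ == "__main__":
    print(repeated_dinucleotides("AT;GC;TA;CC;AG;AG;CC;CA;TA;TA"))
    print(repeated_dinucleotides_unseparated("ATGCTACCAGAGCCCATATA"))
```

Both calls should report AG and TA for the example sequence.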
Andy Lester answered Sep 07 '25 19:09