
String "cross correlation" in MATLAB

Assume that I have 2 strings of characters:

AACCCGGAAATTTGGAATTTTCCCCAAATACG

CGATGATCGATGAATTTTAGCGGATACGATTC

I want to find how far I should shift the second string so that it matches the first one best.

There are two cases: in the first, the strings are assumed to wrap around (a circular shift); in the second, they are not.

Is there a MATLAB function that returns either an N-element array or a (2N+1)-element array of values describing how well the shifted string 2 correlates with string 1?

If not, is there a faster/simpler method than something like this:

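n = length(str1);                 % both strings are assumed to have the same length
result = zeros(n, 1);
for i = 0:n-1
    % number of positions where str1 matches str2 circularly shifted by i
    result(i+1) = sum(str1 == circshift(str2, i));
end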
asked Dec 08 '25 23:12 by user e to the power of 2pi


1 Answer

You can convert each character into a binary column vector of length 4:

A -> [1;0;0;0]
C -> [0;1;0;0]
G -> [0;0;1;0]
T -> [0;0;0;1]

As a result, a string of length n becomes a binary matrix of size 4-by-n, with exactly one 1 in each column.
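A minimal sketch of this encoding, assuming the strings contain only the letters A, C, G and T. The variable names (`alphabet`, `onehot`) and the short sample string are illustrative and not from the original answer; `bsxfun` is used so the comparison also works on releases without implicit expansion.

```matlab
% One-hot encode a DNA string into a 4-by-n matrix.
% Row order A, C, G, T follows the mapping above.
str      = 'AACCG';                 % illustrative input
alphabet = 'ACGT';

% Compare the 4-by-1 alphabet column against the 1-by-n string:
% entry (r, j) is true when str(j) is the r-th letter of the alphabet.
onehot = bsxfun(@eq, alphabet.', str);   % 4-by-n logical matrix
```

Any character outside the alphabet would produce an all-zero column, so it would never count as a match.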

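You can now cross-correlate the two matrices (4-by-n and 4-by-m) along the horizontal dimension only, with no vertical shift. The value at each shift is then the number of positions at which the two strings match.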
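The following is one possible sketch of that cross-correlation for both cases from the question, using only base MATLAB functions (`conv`, `fft`, `ifft`). It assumes both strings have the same length n and contain only A, C, G and T; the variable names (`linCounts`, `circCounts`, `lags`, `bestCircShift`) are illustrative. It is not necessarily how the answer's author would implement it.

```matlab
% Match counts between two DNA strings at every shift, via one-hot encoding.
s1 = 'AACCCGGAAATTTGGAATTTTCCCCAAATACG';
s2 = 'CGATGATCGATGAATTTTAGCGGATACGATTC';
n  = numel(s1);                          % assumes numel(s1) == numel(s2)

alphabet = 'ACGT';
A = double(bsxfun(@eq, alphabet.', s1)); % 4-by-n one-hot matrix of s1
B = double(bsxfun(@eq, alphabet.', s2)); % 4-by-n one-hot matrix of s2

% Case 1: no wrap-around (2n-1 values).
% Correlate row by row (convolution with a flipped row) and add up the 4 letters.
linCounts = zeros(1, 2*n - 1);
for r = 1:4
    linCounts = linCounts + conv(A(r, :), fliplr(B(r, :)));
end
lags = -(n-1):(n-1);   % linCounts(i) is the match count with s2 moved right by lags(i)

% Case 2: wrap-around (n values), using the FFT along the string dimension.
F = fft(A, [], 2) .* conj(fft(B, [], 2));
circCounts = round(real(sum(ifft(F, [], 2), 1)));
% circCounts(k+1) equals sum(s1 == circshift(s2, [0 k])) for k = 0..n-1.

[~, iBest]    = max(circCounts);
bestCircShift = iBest - 1;               % first circular shift with the most matches
```

For strings this short the loop from the question is just as practical; the FFT form mainly pays off for long sequences, where it replaces the O(n^2) comparison with O(n log n) work. If the Signal Processing Toolbox is available, `xcorr` applied per row is an alternative for the non-wrapping case.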

answered Dec 10 '25 17:12 by Shai


