Assume that I have 2 strings of characters:
AACCCGGAAATTTGGAATTTTCCCCAAATACG
CGATGATCGATGAATTTTAGCGGATACGATTC
I want to find how far I should shift the second string so that it matches the first one as closely as possible.
There are 2 cases: in the first, we assume that the strings wrap around; in the second, we don't.
Is there a MATLAB function that returns either an N-element array or a 2N+1-element array of values for how well the shifted string 2 correlates with string 1?
If not, is there a faster/simpler method than something like:
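n = length(str1);                 % both strings are assumed to have the same length
result = zeros(n, 1);
for i = 0:n-1
    % count matching positions when str2 is circularly shifted right by i
    % ([0, i] shifts along the columns, since the strings are row vectors)
    result(i+1) = sum(str1 == circshift(str2, [0, i]));
end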
You can convert each char into a binary column of size 4:
A -> [1;0;0;0]
C -> [0;1;0;0]
G -> [0;0;1;0]
T -> [0;0;0;1]
As a result a string of length n becomes a binary matrix of size 4-by-n.
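You can now cross-correlate (along the X axis only, i.e. along the columns) the two 4-by-n and 4-by-m matrices to get your result. Summing the four row-wise correlations gives the number of matching characters at each shift.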
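As a rough sketch of the idea above, assuming the two example strings from the question and using only base MATLAB functions (bsxfun, fft, ifft, conv), something like the following should work. The variable names (alphabet, A, B, circ, lin, lags) are my own choices for illustration. The wrap-around case uses an FFT-based circular correlation and needs both strings to have the same length; the non-wrapping case uses conv with the second string flipped, which returns n+m-1 values (one per shift with at least one overlapping character).

```matlab
str1 = 'AACCCGGAAATTTGGAATTTTCCCCAAATACG';
str2 = 'CGATGATCGATGAATTTTAGCGGATACGATTC';
alphabet = 'ACGT';

% One-hot encode: row k is 1 wherever the string equals alphabet(k).
A = double(bsxfun(@eq, alphabet.', str1));   % 4-by-n
B = double(bsxfun(@eq, alphabet.', str2));   % 4-by-m
n = size(A, 2);
m = size(B, 2);

% Case 1: wrap-around shifts (assumes n == m).
% circ(s+1) = number of matches between str1 and circshift(str2, [0, s]).
FA = fft(A, [], 2);                          % FFT along each row
FB = fft(B, [], 2);
circ = round(sum(real(ifft(FA .* conj(FB), [], 2)), 1));

% Case 2: no wrap-around.
% lin(t) = number of matches when str2 is shifted right by lags(t) positions
% (negative lags mean a shift to the left).
lags = (1 - m):(n - 1);
lin = zeros(1, n + m - 1);
for k = 1:4
    lin = lin + conv(A(k, :), fliplr(B(k, :)));
end

% Best shift in each case.
[bestCircCount, idx] = max(circ);
bestCircShift = idx - 1;
[bestLinCount, idx] = max(lin);
bestLinShift = lags(idx);
```

For strings this short the loop from the question is perfectly fine; the FFT version mainly pays off for long sequences, since it replaces the O(n^2) comparison with O(n log n) work.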