Printing specific characters at the same position in two strings

I am trying to write code that prints the pair of characters at the same position in each string whenever the character in either string is "-" or "?".

ref_seq = "AGGTCATCAGGGAAA??TCTAGAACCC"
map_seq = "AGGTCTTCAAAAAAAGG---G"

#defining which sequence is longer/shorter
longest_seq = map_seq
shortest_seq = ref_seq
if len(ref_seq) > len(map_seq):
    longest_seq = ref_seq
    shortest_seq = map_seq
#adding on characters to shortest sequence to make sequences same length
x = len(longest_seq) - len(shortest_seq)
shortest_seq += ("$" * x)

#printing out sites with gaps or unknown bases
print("sites with gaps or unknown bases")
for i in range(len(longest_seq)):
    if longest_seq[i] == "-" or "?":
        print(i + 1, longest_seq[i], shortest_seq[i])
    elif shortest_seq[i] == "-" or "?":
        print(i + 1, longest_seq[i], shortest_seq[i])

My code prints every site, not just the sites where either character is "?" or "-". Can someone explain how I can edit my code so that only the sites with "?" or "-" are printed?

The first block of code works fine; I am including it only to explain the variables I am using. The problems start after #printing out sites with gaps or unknown bases.

I'm a beginner, so an explanation would really help me improve. I think it may have something to do with the if/elif in my for loop, but I am not sure.
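As an aside on the first block (choosing the longer sequence and padding the shorter one), here is a more compact sketch, assuming Python 3. It uses only built-ins: sorted with key=len orders the two strings by length, and str.ljust pads on the right with a fill character. The "$" filler is taken from the question; everything else is an illustrative rewrite, not the asker's code.

```python
ref_seq = "AGGTCATCAGGGAAA??TCTAGAACCC"
map_seq = "AGGTCTTCAAAAAAAGG---G"

# sorted() with key=len orders the two strings by length, shortest first.
# It is stable, so for equal-length inputs shortest_seq is ref_seq and
# longest_seq is map_seq.
shortest_seq, longest_seq = sorted((ref_seq, map_seq), key=len)

# str.ljust pads on the right with "$" up to the length of the longer string.
shortest_seq = shortest_seq.ljust(len(longest_seq), "$")

print(longest_seq)
print(shortest_seq)
```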

wilberox asked Nov 23 '25 03:11


2 Answers

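Your problem is a misconception of the or operator. Because == binds more tightly than or, longest_seq[i] == "-" or "?" is evaluated as (longest_seq[i] == "-") or "?", and since the non-empty string "?" is truthy, the condition is always true no matter what's on the left-hand side.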

To be clear, each side of or needs its own comparison, so your condition should look like

if (longest_seq[i] == "-") or (longest_seq[i] == "?"):
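A small demonstration of the precedence issue described above, assuming Python 3. The variable c is a hypothetical single character standing in for longest_seq[i]; the point is only to show what each form of the condition evaluates to.

```python
c = "A"  # hypothetical stand-in for longest_seq[i]

# == binds more tightly than or, so this is (c == "-") or "?".
# (c == "-") is False, so the whole expression evaluates to "?".
buggy = c == "-" or "?"
print(repr(buggy))  # prints '?' (a non-empty, therefore truthy, string)

# Each side of or needs its own full comparison...
fixed = c == "-" or c == "?"
print(fixed)  # prints False

# ...or test membership in a string of the flagged characters.
also_fixed = c in "-?"
print(also_fixed)  # prints False
```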

However, I think you can still make some improvements.

One of the most obvious things is: don't write for x in range(len(whatever)) in Python. You have enumerate for this. Try it out; you'll love it.
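A minimal sketch of enumerate, assuming Python 3 and a made-up sample string seq. enumerate(iterable, start=1) yields (position, item) pairs with 1-based positions, which removes the need for the (i+1) adjustment used in the question.

```python
seq = "AG?T-"  # hypothetical sample sequence

# enumerate yields (index, item) pairs; start=1 makes the index 1-based.
flagged = []
for pos, base in enumerate(seq, start=1):
    if base in "-?":
        flagged.append((pos, base))

print(flagged)  # prints [(3, '?'), (5, '-')]
```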
Other useful tools are zip and the in operator, so in my opinion your code would be better written like this:

ref_seq = "AGGTCATCAGGGAAA??TCTAGAACCC"
map_seq = "AGGTCTTCAAAAAAAGG---G"

for i, (r, m) in enumerate(zip(ref_seq, map_seq)):   
    if (r in "-?") or (m in "-?"):
        print(i + 1, r, m)
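One caveat with the zip version: zip stops at the end of the shorter string, so a "-" or "?" in the tail of the longer string would be silently skipped. A sketch using itertools.zip_longest (Python 3) keeps going and fills the missing characters; the "$" fill value mirrors the question's padding, and the helper name flagged_sites is hypothetical.

```python
from itertools import zip_longest

ref_seq = "AGGTCATCAGGGAAA??TCTAGAACCC"
map_seq = "AGGTCTTCAAAAAAAGG---G"


def flagged_sites(a, b, fill="$"):
    """Return (position, char_a, char_b) for every 1-based position where
    either string has a gap ("-") or an unknown base ("?").

    zip_longest continues past the end of the shorter string and uses
    `fill` for the missing characters, so flags in the longer string's
    tail are kept (plain zip would stop early).
    """
    sites = []
    for pos, (x, y) in enumerate(zip_longest(a, b, fillvalue=fill), start=1):
        if x in "-?" or y in "-?":
            sites.append((pos, x, y))
    return sites


for pos, r, m in flagged_sites(ref_seq, map_seq):
    print(pos, r, m)
```

For this particular pair of sequences the result matches plain zip (sites 16 to 20), because the extra tail of ref_seq contains no flags. Keep fill a single non-flag character: "" in "-?" is True in Python, so an empty fill would mark every padded position.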
SpghttCd answered Nov 24 '25 19:11


Change the if and elif conditions so that each side of or is a complete comparison:

for i in range(len(longest_seq)):
    if longest_seq[i] == "-" or longest_seq[i] == "?":
        print(i + 1, longest_seq[i], shortest_seq[i])
    elif shortest_seq[i] == "-" or shortest_seq[i] == "?":
        print(i + 1, longest_seq[i], shortest_seq[i])
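Since the if and elif branches print exactly the same thing, they can be merged into one condition. A sketch under that assumption, in Python 3: the set FLAGGED and the helper has_gap_or_unknown are hypothetical names, and the padding with "$" follows the question. Using a set of single characters (rather than the string "-?") also avoids the quirk that the empty string counts as "in" any string.

```python
ref_seq = "AGGTCATCAGGGAAA??TCTAGAACCC"
map_seq = "AGGTCTTCAAAAAAAGG---G"

FLAGGED = {"-", "?"}  # gap and unknown-base characters


def has_gap_or_unknown(a, b):
    """Return True if either character is a gap ("-") or unknown ("?")."""
    return a in FLAGGED or b in FLAGGED


# Pad both sequences with "$" to the same length so they can be indexed together.
width = max(len(ref_seq), len(map_seq))
ref_padded = ref_seq.ljust(width, "$")
map_padded = map_seq.ljust(width, "$")

# One combined condition replaces the duplicated if/elif branches.
positions = [i + 1 for i in range(width)
             if has_gap_or_unknown(ref_padded[i], map_padded[i])]
print(positions)  # prints [16, 17, 18, 19, 20]
```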
Aniket Bote answered Nov 24 '25 19:11


