I am trying to write code that prints pairs of letters, one from the same position in each string, whenever the character in either string is "-" or "?".
ref_seq = "AGGTCATCAGGGAAA??TCTAGAACCC"
map_seq = "AGGTCTTCAAAAAAAGG---G"
#defining which sequence is longer/shorter
longest_seq = map_seq
shortest_seq = ref_seq
if len(ref_seq) > len(map_seq):
    longest_seq = ref_seq
    shortest_seq = map_seq
#adding on characters to shortest sequence to make sequences same length
x = len(longest_seq) - len(shortest_seq)
shortest_seq += ("$" * x)
#printing out sites with gaps or unknown bases
print "sites with gaps or unknown bases"
for i in range(len(longest_seq)):
    if longest_seq[i] == "-" or "?":
        print (i+1), longest_seq[i], shortest_seq[i]
    elif shortest_seq[i] == "-" or "?":
        print (i+1), longest_seq[i], shortest_seq[i]
My code prints every site, not just the sites where either character is "?" or "-". Can someone explain how I can edit my code so that only the sites with "?" or "-" are printed?
The first part of the code works fine; I am including it only to explain the variables I am using. The problems start after #printing out sites with gaps or unknown bases.
I'm a beginner, so an explanation would really help me improve. I think it may have something to do with the if/elif in my for loop, but I am not sure.
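Your problem is a misconception about the or operator. Python reads longest_seq[i] == "-" or "?" as (longest_seq[i] == "-") or "?", and "?" on its own is a non-empty string, which is always true, no matter what's on the left-hand side. So every site passes the test. You have to repeat the full comparison on both sides of the or.
To be clear, your condition should look like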
if (longest_seq[i] == "-") or (longest_seq[i] == "?"):
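A quick way to see this in an interpreter is to evaluate both conditions for a base that is neither marker. This is a small illustration for Python 3; the variable names base, buggy and fixed are made up for it.

```python
# A position that is neither a gap nor an unknown base.
base = "A"

# The original condition, parsed as (base == "-") or "?".
# (base == "-") is False, so `or` returns its right operand: the string "?".
buggy = base == "-" or "?"

# The corrected condition compares base against each marker explicitly.
fixed = (base == "-") or (base == "?")

print(repr(buggy))  # '?' : a non-empty string, which an if treats as true
print(bool(buggy))  # True
print(fixed)        # False
```

Because or returns one of its operands rather than a strict True/False, the buggy version hands the if statement the string "?", and any non-empty string counts as true.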
However, I think you can still make some improvements.
The most obvious one: don't write for x in range(len(whatever)) in Python. You have enumerate for this. Try it out; you'll love it.
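As a rough side-by-side comparison (seq is a made-up example string), here is the same "find the marker positions" loop written both ways:

```python
seq = "AGG?T-A"

# Index-based loop, as in the question: look each character up by its index.
sites_by_index = []
for i in range(len(seq)):
    if seq[i] in "-?":
        sites_by_index.append(i + 1)

# enumerate yields (index, character) pairs directly;
# start=1 makes the positions 1-based, so the manual "+ 1" disappears.
sites_by_enumerate = []
for pos, base in enumerate(seq, start=1):
    if base in "-?":
        sites_by_enumerate.append(pos)

print(sites_by_enumerate)  # [4, 6]
```

Both loops find the same positions; the enumerate version just avoids the indexing and the off-by-one bookkeeping.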
Other useful tools are zip and the in operator, so in my opinion your code would be better written as
ref_seq = "AGGTCATCAGGGAAA??TCTAGAACCC"
map_seq = "AGGTCTTCAAAAAAAGG---G"
for i, (r, m) in enumerate(zip(ref_seq, map_seq)):
    if (r in "-?") or (m in "-?"):
        print (i+1), r, m
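One caveat with zip: it stops at the end of the shorter string, so a "-" or "?" past the end of map_seq would be skipped silently. (With these two sequences nothing is lost, because the extra characters at the end of ref_seq contain no markers.) A sketch for Python 3 that checks every position uses itertools.zip_longest, which pads the shorter sequence the same way the question's "$" padding does; in Python 2 the function is called itertools.izip_longest. The function name gap_sites and its markers/fill parameters are hypothetical, chosen for this example.

```python
from itertools import zip_longest  # Python 2: from itertools import izip_longest

ref_seq = "AGGTCATCAGGGAAA??TCTAGAACCC"
map_seq = "AGGTCTTCAAAAAAAGG---G"

def gap_sites(seq_a, seq_b, markers="-?", fill="$"):
    """Return (position, char_a, char_b) for every 1-based position where
    either sequence has one of `markers`. The shorter sequence is padded
    with `fill` (which must not be a marker), so positions past its end
    are still checked."""
    sites = []
    pairs = zip_longest(seq_a, seq_b, fillvalue=fill)
    for pos, (a, b) in enumerate(pairs, start=1):
        if a in markers or b in markers:
            sites.append((pos, a, b))
    return sites

for pos, r, m in gap_sites(ref_seq, map_seq):
    print(pos, r, m)
```

Returning a list instead of printing inside the loop also makes the result easy to reuse, for example to count the gap sites with len(...).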
Change the if statement so that each side of the or is a complete comparison:
for i in range(len(longest_seq)):
    if longest_seq[i] == "-" or longest_seq[i] == "?":
        print ((i+1), longest_seq[i], shortest_seq[i])
    elif shortest_seq[i] == "-" or shortest_seq[i] == "?":
        print ((i+1), longest_seq[i], shortest_seq[i])