I am trying to create genetic signatures. I have a textfile full of DNA sequences. I want to read in each line from the text file. Then add 4mers which are 4 bases into a dictionary. For example: Sample sequence
ATGATATATCTATCAT
What I want to add is ATGA, TGAT, GATA, etc.. into a dictionary with ID's that just increment by 1 while adding the 4mers.
So the dictionary will hold...
Genetic signatures, ID
ATGA,1
TGAT, 2
GATA,3
Here is what I have so far...
import sys
def main ():
readingFile = open("signatures.txt", "r")
my_DNA=""
DNAseq = {} #creates dictionary
for char in readingFile:
my_DNA = my_DNA+char
for char in my_DNA:
index = 0
DnaID=1
seq = my_DNA[index:index+4]
if (DNAseq.has_key(seq)): #checks if the key is in the dictionary
index= index +1
else :
DNAseq[seq] = DnaID
index = index+1
DnaID= DnaID+1
readingFile.close()
if __name__ == '__main__':
main()
Here is my output:
ACTC
ACTC
ACTC
ACTC
ACTC
ACTC
This output suggests that it is not iterating through each character in string... please help!
You need to move your index
and DnaID
declarations before the loop, otherwise they will be reset every loop iteration:
index = 0
DnaID=1
for char in my_DNA:
#... rest of loop here
Once you make that change you will have this output:
ATGA 1
TGAT 2
GATA 3
ATAT 4
TATA 5
ATAT 6
TATC 6
ATCT 7
TCTA 8
CTAT 9
TATC 10
ATCA 10
TCAT 11
CAT 12
AT 13
T 14
In order to avoid the last 3 items which are not the correct length you can modify your loop:
for i in range(len(my_DNA)-3):
#... rest of loop here
This doesn't loop through the last 3 characters, making the output:
ATGA 1
TGAT 2
GATA 3
ATAT 4
TATA 5
ATAT 6
TATC 6
ATCT 7
TCTA 8
CTAT 9
TATC 10
ATCA 10
TCAT 11
This should give you the desired effect.
from collections import defaultdict
readingFile = open("signatures.txt", "r").read()
DNAseq = defaultdict(int)
window = 4
for i in xrange(len(readingFile)):
current_4mer = readingFile[i:i+window]
if len(current_4mer) == window:
DNAseq[current_4mer] += 1
print DNAseq
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With