Directly calling SeqIO.parse() in for loop works, but using it separately beforehand doesn't? Why?

In Python, this code, where I call SeqIO.parse() directly in the for loop, runs fine:

from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)

for asq in SeqIO.parse("a.fasta", "fasta"):
    print("Q")

But this version, where I first store the output of SeqIO.parse() in a variable(?) called a and then try to use it in my loop, doesn't run:

from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)

for asq in a:
    print("Q")

Is this because the output of SeqIO.parse("a.fasta", "fasta") is stored in 'a' differently from when I call the function directly? What exactly is 'a' here? Is it a variable? Is it an object? What does the function actually return?

Abraham Ahmad asked Dec 14 '25 04:12

1 Answer

SeqIO.parse() returns a normal Python generator. This part of the Biopython module is written in pure Python:

>>> from Bio import SeqIO
>>> a = SeqIO.parse("a.fasta", "fasta")
>>> type(a)
<class 'generator'>

Once a generator has been iterated over, it is exhausted, as you discovered. In your second snippet, list(a) already consumes every record, so the for loop that follows has nothing left to iterate over. You can't rewind a generator, but you can store its contents in a list or dict if you don't mind holding everything in memory (useful if you need random access). SeqIO.to_dict() builds a dictionary with the record ids as keys and the SeqRecord objects as values; pass it a fresh SeqIO.parse() result rather than an already-consumed one. Alternatively, rebuilding the generator by calling SeqIO.parse() again avoids loading the whole file into memory.
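To see the exhaustion without needing a FASTA file, here is a minimal sketch. Note that parse_fasta() and FASTA_TEXT are hypothetical stand-ins invented for this illustration, not Biopython code; the sketch only assumes that SeqIO.parse() behaves like an ordinary generator, as described above.

```python
import io


def parse_fasta(handle):
    """Hypothetical stand-in for SeqIO.parse(): lazily yields (id, sequence) pairs."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    # Emit the final record once the input ends.
    if name is not None:
        yield name, "".join(chunks)


# Made-up sample data standing in for the contents of a.fasta.
FASTA_TEXT = ">seq1\nACGT\n>seq2\nGGCC\nTTAA\n"

a = parse_fasta(io.StringIO(FASTA_TEXT))
records = list(a)   # consumes every item the generator will ever produce
leftover = list(a)  # the generator is exhausted, so this list is empty

# Option 1: loop over the stored list as often as needed.
ids_from_list = [name for name, _ in records]

# Option 2: build a fresh generator instead of reusing the exhausted one.
ids_from_new_generator = [name for name, _ in parse_fasta(io.StringIO(FASTA_TEXT))]

print(len(records), len(leftover))  # prints: 2 0
```

Applied to the question, this suggests either looping over records (the list that was already built) or calling SeqIO.parse() again in the for statement, which is what the first snippet does.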

Chris_Rands answered Dec 15 '25 18:12
