
Using Biopython to Retrieve Isoform Sequences of a Swissprot Entry?

If I have a protein with isoforms, and I'd like to retrieve the sequence of each one, how might I go about doing this?

from Bio import ExPASy
from Bio import SwissProt

accessions = ["Q16620"]

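# get_sprot_raw() expects a single accession, so fetch the entries one at a time
for accession in accessions:
    handle = ExPASy.get_sprot_raw(accession)
    record = SwissProt.read(handle)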

This example, adapted from the Biopython tutorial, retrieves only the sequence of the first isoform, available as record.sequence.

I've found that simply iterating over a list of the isoform accessions as they are listed on UniProt, ["Q16620-1", "Q16620-2", "Q16620-3", ...], does not work.

Estif asked Sep 07 '25 15:09



1 Answer

You could use the Proteins API of EMBL-EBI and a few lines of Python code.

This will give you only the sequence as a string, not as a fully fledged Biopython object.

import requests
import xml.etree.ElementTree as ET

accession = "Q16620"

# dictionary storing the sequences of the isoforms; key: accession number, value: sequence
isoforms = dict()

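# make a call to the EBI Proteins API, explicitly requesting XML
r = requests.get('https://www.ebi.ac.uk/proteins/api/proteins/{}/isoforms'.format(accession),
                 headers={'Accept': 'application/xml'})
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page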

# parse the returned XML
uniprot = ET.fromstring(r.text)

for isoform in uniprot:
    # get the sequence
    seq = isoform.find('{http://uniprot.org/uniprot}sequence')

    # get the accession number
    iso_accession = isoform.find('{http://uniprot.org/uniprot}accession')

    # add the values to the dictionary
    if seq is not None and iso_accession is not None and seq.text and iso_accession.text:
        isoforms[iso_accession.text] = seq.text
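
If you want to reuse the parsing step or save the result, the loop above could be wrapped in a small function. The following is only a sketch using the standard library: `parse_isoforms` and `to_fasta` are hypothetical helper names, and `SAMPLE_XML` is a hand-written stand-in with made-up accessions and sequences, assuming the same layout the loop above relies on (entries as children of the root, each with an `accession` and a `sequence` child in the UniProt namespace).

```python
import xml.etree.ElementTree as ET

NS = "{http://uniprot.org/uniprot}"  # namespace used in the find() calls above


def parse_isoforms(xml_text):
    """Return {isoform accession: sequence} from a UniProt-namespaced XML string."""
    isoforms = {}
    for entry in ET.fromstring(xml_text):
        accession = entry.findtext(NS + "accession")
        sequence = entry.findtext(NS + "sequence")
        # skip entries where either element is missing or empty
        if accession and sequence:
            # remove any whitespace or line breaks inside the sequence text
            isoforms[accession.strip()] = "".join(sequence.split())
    return isoforms


def to_fasta(isoforms, width=60):
    """Format the dictionary as FASTA text, wrapping sequences at `width` columns."""
    lines = []
    for accession, sequence in isoforms.items():
        lines.append(">" + accession)
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "".join(line + "\n" for line in lines)


# Hand-written stand-in for the API response (made-up data, NOT real UniProt output)
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry><accession>X00000-1</accession><sequence>MKTAYIAK</sequence></entry>
  <entry><accession>X00000-2</accession><sequence>MKTA
YI</sequence></entry>
  <entry><accession>X00000-3</accession></entry>
</uniprot>"""

isoforms = parse_isoforms(SAMPLE_XML)
print(to_fasta(isoforms), end="")
```

With a real response you would call `parse_isoforms(r.text)` instead of passing the sample string. Entries without a sequence (like the third one in the sample) are skipped rather than raising an error.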
Maximilian Peters answered Sep 09 '25 13:09
