Increase the counter as a dictionary value within a loop

Question

I have a list of a few hundred amino acid sequences called aa_seq, it looks like this: ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI', 'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK',...]. Each sequence is 27 letters long. I have to determine the most used amino acid for every position (1-27) and at what frequency that is.

So far I have:

   count_dict = {} 
   counter = count_dict.values()
   aa_list = ['A', 'C', 'D', 'E' ,'F' ,'G' ,'H' ,'I' ,'K' ,'L' ,    #one-letter code for amino acids
       'M' ,'N' ,'P' ,'Q' ,'R' ,'S' ,'T' ,'V' ,'W' ,'Y']
   for p in range(0,26):                       #first round:looks at the first position in each sequence
        for s in range(0,len(aa_seq)):          #goes through all sequences of the list 
             for item in aa_list:                #and checks for the occurrence of each amino acid letter (=item)
                  if item in aa_seq[s][p]:
                      count_dict[item]            #if that letter occurs at the respective position, make it a key in the dictionary
                      counter += 1                #and increase its counter (the value, as definded above) by one 
    print count_dict

It says KeyError: 'A', and it's pointing to the line count_dict[item]. So the item of the aa_list apparently can't be added as a key this way..? How do I do that? And it also gave an error "'int' object is not iterable" concerning the counter. How else can the counter be increased?

greole · Accepted Answer

You could use the Counter class

>>> from collections import Counter

>>> l = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI', 'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']
>>> s = [Counter([l[j][i] for j in range(len(l))]).most_common()[0] for i in range(27)]
>>> s
[('A', 1),
 ('A', 1),
 ('Y', 1),
 ('I', 1),
 ('N', 1),
 ('Y', 1),
 ('P', 2),
 ('M', 4),
 ('S', 2),
 ('Q', 1),
 ('E', 2),
 ('Q', 1),
 ('I', 1),
 ('I', 1),
 ('A', 1),
 ('Q', 1),
 ('A', 1),
 ('I', 1),
 ('I', 1),
 ('Q', 1),
 ('E', 2),
 ('C', 1),
 ('Q', 1),
 ('A', 1),
 ('Q', 1),
 ('I', 1),
 ('I', 1)]

However i might be way to inefficient if you have large data sets.

Eric Duminil · Answer

Modified code

Here's a modified, working version of your code. It's not efficient but it should output the correct result.

A few notes :

You need one counter for each index. So you should initialize your dict inside the first loop.
range(0,26) only has 26 elements : from 0 to 25 (inclusive).
defaultdict helps you define 0 for each start value.
you need to increment the counter with count_dict[item] += 1
At the end of each loop, you need to find the key (amino acid) with the highest value (occurences).

from collections import defaultdict

aa_seq = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI',
          'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']
aa_list = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L',  # one-letter code for amino acids
           'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y']

for p in range(27):                  # first round:looks at the first position in each sequence
    count_dict = defaultdict(int)    # initialize counter with 0 as default value
    for s in range(0, len(aa_seq)):  # goes through all sequences of the list
        # and checks for the occurrence of each amino acid letter (=item)
        for item in aa_list:
            if item in aa_seq[s][p]:
                # if that letter occurs at the respective position, make it a
                # key in the dictionary
                count_dict[item] += 1
    print(max(count_dict.items(), key=lambda x: x[1]))

It outputs :

('R', 1)
('S', 1)
('Y', 1)
('S', 1)
('E', 1)
('P', 1)
('P', 2)
('M', 4)
...

Alternative with Counter

You don't need that many loops, you just need to iterate once over every character of every sequence.

Also, there's no need to reinvent the wheel: Counter and most_common are better alternatives than defaultdict and max.

from collections import Counter

aa_seqs = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI', 'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']

counters = [Counter() for i in range(27)]

for aa_seq in aa_seqs:
    for (i, aa) in enumerate(aa_seq):
        counters[i][aa] += 1

most_commons = [counter.most_common()[0] for counter in counters]
print(most_commons)

It outputs :

[('K', 1), ('A', 1), ('Y', 1), ('N', 1), ('N', 1), ('Y', 1), ('P', 2), ('M', 4), ('S', 2), ('Q', 1), ('E', 2), ('G', 1), ('H', 1), ('N', 1), ('L', 1), ('N', 1), ('N', 1), ('I', 1), ('G', 1), ('H', 1), ('E', 2), ('G', 1), ('N', 1), ('K', 1), ('Y', 1), ('K', 1), ('G', 1)]

Increase the counter as a dictionary value within a loop

Tags:

python

loops

ccaarroo

2 Answers

greole

Modified code

Alternative with Counter

Eric Duminil

Recent Activity

Donate For Us

Increase the counter as a dictionary value within a loop

Tags:

python

loops

ccaarroo

2 Answers

greole

Modified code

Alternative with Counter

Eric Duminil

Related questions

Recent Activity

Donate For Us