
Find the nth most common word and its count in Python

I am an undergraduate student who is new here and loves programming. I ran into a problem while practicing and would like to ask for help.

Given a string and an integer n, return the nth most common word and its count, ignoring capitalization.

For the word, make sure all the letters are lowercase when you return it!

Hint: The split() function and dictionaries may be useful.

Example:

Input: "apple apple apple blue BlUe call", 2

Output: The list ["blue", 2]

My code is as follows:

from collections import Counter
def nth_most(str_in, n):
    split_it = str_in.split(" ")
    array = []
    for word, count in Counter(split_it).most_common(n):
        list = [word, count]
        array.append(count)
        array.sort()
        if len(array) - n <= len(array) - 1:
            c = array[len(array) - n]
            return [word, c]

The test results are as follows:

Traceback (most recent call last):
  File "/grade/run/test.py", line 10, in test_one
    self.assertEqual(nth_most('apple apple apple blue blue call', 3), ['call', 1])
  File "/grade/run/bin/nth_most.py", line 10, in nth_most
    c = array[len(array) - n]
IndexError: list index out of range

And also:

Traceback (most recent call last):
  File "/grade/run/test.py", line 20, in test_negative
    self.assertEqual(nth_most('awe Awe AWE BLUE BLUE call', 1), ['awe', 3])
AssertionError: Lists differ: ['BLUE', 2] != ['awe', 3]

First differing element 0:
'BLUE'
'awe'

I don't know what's wrong with my code.

Thank you very much for your help!

Larry Chen asked Dec 30 '25 04:12



2 Answers

Since you're using Counter, just use it wisely:

import collections

def nth_most(str_in, n):
    # count lowercased words, then sort the (word, count) pairs by ascending count
    c = sorted(collections.Counter(w.lower() for w in str_in.split()).items(), key=lambda x: x[1])
    return list(c[-n])  # convert to list as it seems to be the expected output

print(nth_most("apple apple apple blue BlUe call",2)) 

Build the word frequency dictionary, sort its items by value (the second element of each tuple), and pick the nth element from the end.

This prints ['blue', 2].
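As a quick check, the same approach can be run against the two failing test cases from the question. This is only a minimal sketch: the function body restates the idea above with more descriptive variable names (by_count is my own name), and the inputs are copied from the tracebacks in the question.

```python
import collections

def nth_most(str_in, n):
    # Count lowercased words so 'BlUe' and 'blue' are the same word.
    counts = collections.Counter(w.lower() for w in str_in.split())
    # Sort (word, count) pairs by ascending count.
    by_count = sorted(counts.items(), key=lambda item: item[1])
    # The nth most common word is the nth pair from the end.
    return list(by_count[-n])

print(nth_most('apple apple apple blue blue call', 3))  # ['call', 1]
print(nth_most('awe Awe AWE BLUE BLUE call', 1))        # ['awe', 3]
```

Both results match the values the tests expect, assuming no two words are tied for the requested position.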

What if two words have the same frequency (a tie) in first or second position? Then the solution above doesn't work, because it returns only one of the tied words. Instead, sort the occurrence counts, take the nth largest count, and run through the counter dict again to extract every word with that count.

def nth_most(str_in, n):
    c = collections.Counter(w.lower() for w in str_in.split())
    nth_occs = sorted(c.values())[-n]
    return [[k,v] for k,v in c.items() if v==nth_occs]

print(nth_most("apple apple apple call blue BlUe call woot",2))

This time it prints:

[['call', 2], ['blue', 2]]
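One thing to watch: sorted(c.values())[-n] indexes into a list that still contains duplicate counts, so with the input above both n=2 and n=3 select the count 2. If tied words are meant to share a single rank, one possible variant is to rank the distinct counts instead. This is a sketch under that assumption, not necessarily what the exercise expects; nth_most_distinct is a hypothetical name, and n is assumed to be at least 1.

```python
import collections

def nth_most_distinct(str_in, n):
    # Hypothetical variant: tied words share one rank.
    c = collections.Counter(w.lower() for w in str_in.split())
    # Distinct counts, largest first, e.g. [3, 2, 1].
    distinct_counts = sorted(set(c.values()), reverse=True)
    # Assumes n >= 1; raises IndexError if there are fewer than n distinct counts.
    nth_count = distinct_counts[n - 1]
    # Collect every word whose count equals the nth distinct count.
    return [[k, v] for k, v in c.items() if v == nth_count]

text = "apple apple apple call blue BlUe call woot"
print(nth_most_distinct(text, 2))  # [['call', 2], ['blue', 2]]
print(nth_most_distinct(text, 3))  # [['woot', 1]]
```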
Jean-François Fabre answered Dec 31 '25 19:12



Counter.most_common returns the elements ordered from most to least common, so you can do:

list(Counter(str_in.lower().split()).most_common(n)[-1]) # n is nth most common word
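Wrapped in a function, the one-liner might look like the sketch below. Note that most_common(n) returns fewer than n pairs when the text has fewer than n distinct words, so the bare one-liner would then quietly return the least common word. The range check here is my own assumption about the desired behavior, not part of the original exercise.

```python
from collections import Counter

def nth_most(str_in, n):
    # Assumption: an out-of-range n should be an error rather than
    # silently falling back to the least common word.
    if n < 1:
        raise ValueError("n must be at least 1")
    # most_common(n) returns up to n (word, count) pairs, most frequent first.
    top_n = Counter(str_in.lower().split()).most_common(n)
    if len(top_n) < n:
        raise ValueError("fewer than n distinct words in the input")
    # The last of the top n pairs is the nth most common word.
    return list(top_n[-1])

print(nth_most("apple apple apple blue BlUe call", 2))  # ['blue', 2]
```

When several words are tied, this returns only one of them; most_common keeps words with equal counts in the order they were first encountered.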
kederrac answered Dec 31 '25 17:12
