Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Integer comparison in python slows everything down to a crawl

The following code is causing me enormous headaches:

def extract_by_letters(letters, dictionary):

    for word in dictionary:
       for letter in letters:
           if word.count(letter) != letters.count(letter):
               if word in dictionary: #I cant leave this line out
                   dictionary.remove(word)

    return dictionary

First of all: the 'if word in dictionary' line. Why can't I leave this out? I get an error saying ValueError: list.remove(x): x not in list

But it is, obviously.

Second: Dictionary is a file of about 50,000 words separated by linebreaks. The above code takes about 2 minutes to run... Wayyy too long. I played with the code for a bit, and I found that the line:

if word.count(letter) != letters.count(letter):

seems to be causing all my problems. If I take that line out (which totally screws up the output), the function takes about 2 seconds to go through the dictionary.

I thought it was the count functions, but it's not.

if I change the if statement to something like:

print word.count(letter) 
print letters.count(letter)

the function takes about 3 seconds to run.

I'm convinced it's the comparison. Any other suggestions? Is there a better way to do this?

Thanks in advance!

like image 894
Dane Larsen Avatar asked Dec 01 '25 01:12

Dane Larsen


2 Answers

The reason you get the exception is that if the letter count matches for more than one letter, you are trying to remove the word more than once

The reason it is so slow is that you have loops inside loops inside loops.

If you would write a sentence or two about what the function is supposed to do, it would be much easier to refactor it. In the mean time, this would stop you checking whether a word needs to be removed once you have already removed it.

def extract_by_letters(letters, dictionary):
    for word in dictionary[:]:  # bad idea to change this while you iterate over it
        for letter in letters:
            if word.count(letter) != letters.count(letter):
                dictionary.remove(word)
                break
    return dictionary

If dictionary is a set you should get some speed up. If dictionary is a list this should give a huge speedup

like image 138
John La Rooy Avatar answered Dec 03 '25 13:12

John La Rooy


Try building the output instead of deleting from it:

def extract_by_letters(letters, dictionary):
    d = []
    for word in dictionary:
       for letter in letters:
           if word.count(letter)>0:
               d.append(word)
               break
    return d

Or, you could use regular expressions:

import re
def extract_by_letters(letters, dictionary):
    regex = re.compile('['+letters+']')
    d=[]
    for word in dictionary:
       if regex.search(word):
           d.append(word)
    return d

Or, perhaps the simplest way:

import re
def extract_by_letters(letters, dictionary):
    regex = re.compile('['+letters+']')
    return [word for word in dictionary if regex.search(word)]

This last one takes no noticeable time to scan /usr/share/dict/words on my Mac, which is a list of 234936 words.

like image 41
Andrew McGregor Avatar answered Dec 03 '25 15:12

Andrew McGregor



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!