I have a df of words and I want to know if they are in the English dictionary

I have a data frame of words, and I want to test whether each word is in the English dictionary: return True if it is, and False if it is not. I wrote a function for this.

The data frame looks like this:

words
cat
dog
mark
lillly

When I run it through the function, I want to get a new column:

words   is_english
cat     True
dog     True
mark    True
lillly  False

My function looks like this:

from nltk.corpus import words as nltk_words
def is_english_word(word):
    # creation of this dictionary would be done outside of 
    #     the function because you only need to do it once.
    dictionary = dict.fromkeys(nltk_words.words(), None)
    try:
        x = dictionary[word]
        return True
    except KeyError:
        return False

Then I call the function:

df.apply(lambda x: is_english_word(df['words']), axis=1)

Can anyone tell me where I went wrong? It doesn't return the result I want.

asked Jan 26 '26 20:01 by LearningCode


1 Answer

There is no need for a custom function or for apply (which is slow, because it calls a Python function once per row). You can use pandas' Series.isin with a set of the words to speed up the process:

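import pandas as pd
import nltk
from nltk.corpus import words as nltk_words

nltk.download('words')  # one-time download of the NLTK word list corpus
# build the set once; membership checks on a set are fast
words = set(nltk_words.words())

df = pd.DataFrame({'words': ['cat', 'dog', 'mark', 'lillly']})

# vectorized membership test: True where the word is in the set
df['is_english'] = df['words'].isin(words)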

Output:

    words  is_english
0     cat        True
1     dog        True
2    mark        True
3  lillly       False
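For comparison, the apply approach from the question can also be made to work. With df.apply(..., axis=1) the lambda receives each row as x, but the original call ignores x and passes df['words'], the whole column, to is_english_word. A pandas Series is not hashable, so the dictionary lookup raises a TypeError, which the except KeyError clause does not catch. The sketch below passes the row's value instead and builds the lookup set once, outside the function. It assumes a small hypothetical ENGLISH_WORDS set standing in for set(nltk_words.words()), so it runs without the NLTK corpus:

```python
import pandas as pd

# Hypothetical stand-in vocabulary for illustration only; in practice this
# would be set(nltk_words.words()) from nltk.corpus.
ENGLISH_WORDS = {"cat", "dog", "mark"}


def is_english_word(word, vocabulary=ENGLISH_WORDS):
    # The set is built once at module level, not on every call,
    # and "in" replaces the try/except KeyError lookup.
    return word in vocabulary


df = pd.DataFrame({"words": ["cat", "dog", "mark", "lillly"]})

# Fix for the original call: use the row handed to the lambda,
# not df['words'] (the entire column).
df["is_english"] = df.apply(lambda row: is_english_word(row["words"]), axis=1)

# Simpler equivalent: apply the function element-wise to the column.
df["is_english_alt"] = df["words"].apply(is_english_word)

print(df)
```

Both columns give the same result, but isin is still the better choice, since apply calls the Python function once per row.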
answered Jan 29 '26 09:01 by mozway