I have a DataFrame of words, and I want to test whether each word is in the English dictionary: return True if it is, False if it is not.
The DataFrame looks like this:

```
words
cat
dog
mark
lillly
```
After running the function, I want a new column:

```
words   is_english
cat     True
dog     True
mark    True
lillly  False
```
My function looks like this:

```python
from nltk.corpus import words as nltk_words

def is_english_word(word):
    # creation of this dictionary would be done outside of
    # the function because you only need to do it once.
    dictionary = dict.fromkeys(nltk_words.words(), None)
    try:
        x = dictionary[word]
        return True
    except KeyError:
        return False
```
I call the function like this:

```python
df.apply(lambda x: is_english_word(df['words']), axis=1)
```
Can anyone tell me where I went wrong? It doesn't return the result I want.
There is no need for a custom function or `apply` (which is slow here). You can use `pandas.Series.isin` with a set of the words to speed up the process:
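```python
import pandas as pd
from nltk.corpus import words as nltk_words  # requires nltk.download('words') once

# Build the vocabulary once; a set gives fast membership tests.
words = set(nltk_words.words())

df = pd.DataFrame({'words': ['cat', 'dog', 'mark', 'lillly']})

# Vectorized membership test: True where the word is in the set.
df['is_english'] = df['words'].isin(words)
```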
Output:

```
    words  is_english
0     cat        True
1     dog        True
2    mark        True
3  lillly       False
```
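For reference, here is what went wrong in the original call and how the per-row approach could be fixed. The lambda ignores its row argument `x` and passes the entire `df['words']` column to `is_english_word` on every row. A pandas Series cannot be hashed, so the dictionary lookup raises `TypeError`, which `except KeyError` does not catch. The sketch below is a minimal fix under one stated assumption: `ENGLISH_WORDS` is a hypothetical, tiny stand-in set used in place of `set(nltk_words.words())` so the example runs without downloading the NLTK corpus.

```python
import pandas as pd

# Hypothetical stand-in vocabulary for illustration only; in practice this
# would be set(nltk_words.words()), built once outside the function.
ENGLISH_WORDS = {"cat", "dog", "mark"}


def is_english_word(word, vocabulary=ENGLISH_WORDS):
    """Return True if `word` is in `vocabulary`, else False."""
    # Set membership is a hash lookup; no try/except is needed.
    return word in vocabulary


df = pd.DataFrame({"words": ["cat", "dog", "mark", "lillly"]})

# Original (buggy) call: every row receives the whole df['words'] Series,
# which is unhashable, so the lookup fails with TypeError.
# df.apply(lambda x: is_english_word(df['words']), axis=1)

# Fixed per-element version: Series.apply calls the function once per word.
df["is_english"] = df["words"].apply(is_english_word)
print(df)
```

This produces the same column as `isin`, but `isin` is still preferable: it is vectorized, whereas `apply` calls a Python function once per row.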