I have a dataframe that contains a text column and a result column:
Text Result
0 some text... True
1 another one... False
I also have a function that extracts features from a text. It returns a dict with about 1000 keys, one per word, whose values are True or False depending on whether the word occurs in the text.
words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    result = dict()
    for w in words:
        result[w] = (w in text)
    return result
The result I am expecting is:
Text some text another one other words Result
0 some text... True True False False False False True
1 another one... False False True True False False False
But I don't know how to apply this to a dataframe. What I have done so far is create the columns with a default value of False, but I have no clue how to populate them with True values.
for feature in words:
    df[feature] = False
I guess there is a better way to do this in pandas?
Use pd.Series.str.get_dummies with pd.DataFrame.reindex
exp = (
    df.Text.str.get_dummies(' ')
    .reindex(columns=words, fill_value=0)
    .astype(bool)
)
# Recent pandas versions no longer accept the axis as a positional argument to drop
df.drop(columns='Result').join(exp).join(df.Result)
Text some text another one other words Result
0 some text True True False False False False True
1 another one False False True True False False False
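For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is an assumption based on the question, with the trailing "..." left out of the text so that each word is a clean space-separated token; the name `out` is only there for illustration.

```python
import pandas as pd

words = ["some", "text", "another", "one", "other", "words"]

# Assumed sample data, modelled on the question (no trailing "...")
df = pd.DataFrame({
    "Text": ["some text", "another one"],
    "Result": [True, False],
})

exp = (
    df["Text"].str.get_dummies(" ")         # one 0/1 column per space-separated token
    .reindex(columns=words, fill_value=0)   # keep exactly the words we care about, in order
    .astype(bool)                           # 0/1 -> False/True
)

# Put the feature columns between Text and Result
out = df.drop(columns="Result").join(exp).join(df["Result"])
print(out)
```

Note that get_dummies(' ') matches whole tokens, so a text such as "some text..." would yield the token "text..." rather than "text".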
Explanation
get_dummies gives a dummy column for each word found in the text, simple enough. However, I use reindex so that the result has a column for every word we care about, including words that never occur. The fill_value and astype(bool) are there to match the OP's output. I use drop and join(df.Result) as a pithy way to get Result to the end of the dataframe.
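If you would rather keep the extract function from the question, one possible alternative (not what the answer above does) is to map it over the column and build a frame from the resulting dicts. This is only a sketch: the sample data and the names `features` and `out` are assumptions. Keep in mind that `w in text` is a substring test, so it will not always agree with the token-based get_dummies approach.

```python
import pandas as pd

words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    # Substring test, as in the question
    return {w: (w in text) for w in words}

# Assumed sample data, modelled on the question
df = pd.DataFrame({
    "Text": ["some text...", "another one..."],
    "Result": [True, False],
})

# One dict per row -> one feature row per dict, aligned on the original index
features = pd.DataFrame(df["Text"].map(extract).tolist(), index=df.index)

out = df.drop(columns="Result").join(features).join(df["Result"])
print(out)
```

With this data the "other" column is True for the second row, because "other" is a substring of "another". The token-based approach reports False there, which is what the expected output in the question shows.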