
Executing a function that adds columns and populates them depending on other columns in Pandas

Tags:

python

pandas

I have a dataframe that contains a text column and a result column:

             Text    Result
0  some text...      True
1  another one...    False

I also have a function that extracts features from the text. It returns a dict with about 1000 keys, where each key is a word and each value is True or False depending on whether that word appears in the text.

words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    # Map each known word to True if it occurs anywhere in the text.
    result = dict()
    for w in words:
        result[w] = (w in text)
    return result

The result I am expecting is:

             Text   some   text  another    one  other  words  Result
0  some text...     True   True    False  False  False  False    True
1  another one...  False  False     True   True  False  False   False

But I don't know how to apply this to a dataframe. What I have done so far is create the columns with a default value of False, but I have no clue how to populate them with True values.

for feature in words:
    df[feature] = False

I guess there is a better way to do this in pandas?

Ala Głowacka asked Nov 20 '25 20:11


1 Answer

Use pd.Series.str.get_dummies with pd.DataFrame.reindex

exp = (
    df.Text.str.get_dummies(' ')
      .reindex(columns=words, fill_value=0)
      .astype(bool)
)

df.drop(columns='Result').join(exp).join(df.Result)

          Text   some   text  another    one  other  words  Result
0    some text   True   True    False  False  False  False    True
1  another one  False  False     True   True  False  False   False

Explanation

get_dummies gives a dummy column for each word found in the text, simple enough. However, I use reindex so that the result has exactly the words we care about, in that order, including words that never occur. fill_value=0 fills those added columns, and astype(bool) converts the 0/1 values to match the OP's output. I use drop and join(df.Result) as a pithy way to move Result to the end of the dataframe.

piRSquared answered Nov 23 '25 10:11