
Retrieving matching keywords and their count from a DataFrame column using pandas in Python

I have a DataFrame df:

Name    Description
Ram     Ram is one of the good cricketer
Sri     Sri is one of the member
Kumar   Kumar is a keeper

and a list: my_list = ["one", "good", "ravi", "ball"]

I am trying to get the rows whose Description contains at least one keyword from my_list.

I tried,

  mask=df["Description"].str.contains("|".join(my_list),na=False)

I get the following output_df:

Name    Description
Ram     Ram is one of ONe crickete
Sri     Sri is one of the member
Ravi    Ravi is a player, ravi is playing
Kumar   there is a BALL

I also want to add the keywords found in "Description" and their count as separate columns.

My desired output is,

Name    Description                      pre-keys          keys     count
Ram     Ram is one of ONe crickete         one,good,ONe   one,good    2
Sri     Sri is one of the member           one            one         1
Ravi    Ravi is a player, ravi is playing  Ravi,ravi      ravi        1
Kumar   there is a BALL                    ball           ball        1
Vicky asked Jan 27 '26 05:01


1 Answer

Use str.findall + str.join + str.len:

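# Collect every keyword match in each row as a list
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')')
# Join the matches into one comma-separated string per row
df['keys'] = extracted.str.join(',')
# Number of matches found in each row
df['count'] = extracted.str.len()
print(df)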
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
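
One caveat with joining the keywords by "|": the pattern matches substrings as well (for example "one" inside "money"), and any regex metacharacters in a keyword are interpreted as regex syntax. The following is only a minimal sketch, assuming whole-word matching is what is wanted and that the keywords are plain words; build_pattern is a hypothetical helper name, not part of pandas.

```python
import re

def build_pattern(keywords):
    """Return a regex that matches any keyword as a whole word."""
    # re.escape keeps characters such as '+' or '.' literal
    escaped = [re.escape(k) for k in keywords]
    # \b anchors the alternation at word boundaries; the single
    # capture group mirrors the pattern used in the answer
    return r"\b(" + "|".join(escaped) + r")\b"

my_list = ["one", "good", "ravi", "ball"]
pattern = build_pattern(my_list)

print(re.findall(pattern, "Ram is one of the good cricketer"))  # ['one', 'good']
print(re.findall(pattern, "someone took the money"))            # []
```

The resulting pattern string can be passed to df['Description'].str.findall(pattern) in place of the plain joined pattern.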

EDIT (case-insensitive matching):

import re
my_list = ["ONE", "good"]

# re.IGNORECASE lets "ONE" match "one"; findall returns the text as it appears in Description
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
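
The question also asks for a "pre-keys" column next to "keys" and "count". One possible reading, sketched below in plain Python, is that "pre-keys" holds the matches exactly as they appear in the text, "keys" holds the distinct matches in lowercase, and "count" is the number of distinct keys. That reading is an assumption and may not reproduce the desired output shown in the question exactly; summarize_matches is a hypothetical helper name.

```python
import re

def summarize_matches(text, keywords):
    """Return (pre_keys, keys, count) for one description string."""
    pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"
    # All matches, spelled as they occur in the text
    found = re.findall(pattern, text, flags=re.IGNORECASE)
    # dict.fromkeys drops duplicates while keeping first-seen order
    unique = list(dict.fromkeys(m.lower() for m in found))
    return ",".join(found), ",".join(unique), len(unique)

my_list = ["one", "good", "ravi", "ball"]

print(summarize_matches("Ravi is a player, ravi is playing", my_list))
# ('Ravi,ravi', 'ravi', 1)
print(summarize_matches("there is a BALL", my_list))
# ('BALL', 'ball', 1)
```

With pandas, the helper could be applied row by row, for example with df['Description'].apply(summarize_matches, keywords=my_list), which returns one tuple per row; rows with missing descriptions would need to be filtered out or filled first.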
jezrael answered Jan 29 '26 18:01