String Containment in Pandas

Question

I am trying to select all rows of df where company1 is contained in company2. I am doing it as follows:

df1 = df[['company1', 'company2']][df.apply(lambda x: x['company1'] in x['company2'], axis=1)]

When I run this, it also matches "South" with "Southern" and "South" with "Route South". I want to exclude both kinds of match. I have two requirements: company1 should only match at the beginning of company2, and company1 should only match a whole word in company2, so "South" (company1) must not match "Southern" (company2). How should I modify my code to meet both requirements?
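A minimal sketch (not from the original thread) of how the apply-based line above could be changed to meet both requirements, using Python's standard re module: re.match anchors the test at the start of company2, re.escape makes company1 match literally, and the lookahead (?!\w) rejects a match that continues into a longer word. The helper name prefix_word_match and the fourth sample row are hypothetical.

```python
import re
import pandas as pd

# Sample data; the last row (exact match) is a hypothetical addition.
df = pd.DataFrame({'company1': ['South', 'South', 'South', 'South'],
                   'company2': ['Southern', 'Route South', 'South Route', 'South']})

def prefix_word_match(row):
    # re.match only matches at the start of company2; (?!\w) requires that
    # company1 is not followed by another word character ("Southern" fails).
    pattern = re.escape(row['company1']) + r'(?!\w)'
    return re.match(pattern, row['company2']) is not None

# apply(axis=1) yields one boolean per row, used directly as a row mask.
df1 = df.loc[df.apply(prefix_word_match, axis=1), ['company1', 'company2']]
print(df1)
```

With this data, rows 2 ("South Route") and 3 ("South") are kept, while "Southern" and "Route South" are dropped.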

jezrael · Accepted Answer

I think you need:

import pandas as pd

df = pd.DataFrame({'company1': {0: 'South', 1: 'South', 2: 'South'},
                   'company2': {0: 'Southern', 1: 'Route South', 2: 'South Route'}})

print(df)
  company1     company2
0    South     Southern
1    South  Route South
2    South  South Route

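# One regex such as '^South |^South |^South ': company1 at the start of
# company2, followed by a space
df1 = df[df['company2'].str.contains('|'.join('^' + df['company1'] + ' '))]
print(df1)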
  company1     company2
2    South  South Route
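Two caveats are worth checking with the joined pattern above: it tests each company2 against every company1 value in the frame, not only the one in its own row, and it requires a space after company1, so an exact match such as "South" vs "South" is dropped (unescaped regex metacharacters in names, such as "." or "(", can also change the pattern's meaning or raise an error). The sketch below is not from the original answer; it uses hypothetical data and a hypothetical helper is_leading_word to show the cross-row effect and a row-wise alternative based on str.startswith instead of a regex.

```python
import pandas as pd

# Hypothetical data: "North" is not a prefix of its own row's company2,
# but "South" from row 0 is.
df = pd.DataFrame({'company1': ['South', 'North'],
                   'company2': ['South Route', 'South Bank']})

# The answer's joined pattern tests each company2 against every company1.
joined = '|'.join('^' + df['company1'] + ' ')
kept_joined = df[df['company2'].str.contains(joined)]
print(kept_joined)   # both rows are kept

def is_leading_word(prefix, text):
    # True when text starts with prefix and the prefix ends a word:
    # either the string ends there or the next character is not a word character.
    if not text.startswith(prefix):
        return False
    rest = text[len(prefix):]
    return rest == '' or not (rest[0].isalnum() or rest[0] == '_')

# Row-wise: each company2 is compared only with its own company1.
mask = [is_leading_word(c1, c2) for c1, c2 in zip(df['company1'], df['company2'])]
kept_rowwise = df[mask]
print(kept_rowwise)  # only row 0 is kept
```

The list comprehension over zip avoids building one large regex and keeps each comparison tied to its own row; it also accepts an exact match such as "South" vs "South".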

Tags: python, string, pandas