I am trying to produce all the rows where company1 in df is contained in company2. I am doing it as follows:
df1 = df[['company1', 'company2']][df.apply(lambda x: x['company1'] in x['company2'], axis=1)]
When I run this line, it also matches "South" with "Southern" and with "Route South". I want to exclude such cases: company1 should only match at the beginning of company2, and it should not match as part of a longer word, as when "South" (company1) matches "Southern" (company2). How should I modify my code to meet these two requirements?
I think you need:
import pandas as pd

df = pd.DataFrame({'company1': {0: 'South', 1: 'South', 2: 'South'},
                   'company2': {0: 'Southern', 1: 'Route South', 2: 'South Route'}})
print(df)
  company1     company2
0    South     Southern
1    South  Route South
2    South  South Route
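# Anchor each company1 value at the start (^) and require a trailing space,
# so partial words such as 'Southern' do not match.
# Note: the combined pattern is tested against every row, not row by row.
df1 = df[df['company2'].str.contains("|".join('^' + df['company1'] + ' '))]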
print(df1)
  company1     company2
2    South  South Route
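One caveat: the joined pattern checks every company2 against all company1 values at once, so it is not a strict row-by-row comparison, and names containing regex metacharacters (for example a dot or parentheses) are not escaped. If that matters for your data, a row-wise variant could look like the sketch below. The extra 'North' row and the helper name `starts_with_whole_word` are my own additions for illustration, and I am assuming that an exact match (company2 equal to company1) should also count.

```python
import re
import pandas as pd

# Sample data; the fourth row is a hypothetical addition to show that
# each company1 is compared only with the company2 in its own row.
df = pd.DataFrame({
    'company1': ['South', 'South', 'South', 'North'],
    'company2': ['Southern', 'Route South', 'South Route', 'South Route'],
})

def starts_with_whole_word(row):
    # re.escape treats the company name literally, even if it contains
    # regex metacharacters; (?!\w) rejects a match that is immediately
    # followed by another word character (e.g. 'South' in 'Southern').
    pattern = re.escape(row['company1']) + r'(?!\w)'
    # re.match only matches at the beginning of the string.
    return re.match(pattern, row['company2']) is not None

mask = df.apply(starts_with_whole_word, axis=1)
df1 = df[mask]
print(df1)  # keeps only the row with index 2
```

This is slower than the vectorised `str.contains` call on large frames because `apply` with `axis=1` runs Python code per row, so the original one-liner may still be the better fit when company1 is the same in every row.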