I have a dataframe looking like this:
id k1 k2 same
1 re_setup oo_setup true
2 oo_setup oo_setup true
3 alerting bounce false
4 bounce re_oversetup false
5 re_oversetup alerting false
6 alerting_s re_setup false
7 re_oversetup oo_setup true
8 alerting bounce false
So, I need to classify rows by whether the string 'setup' is contained in both key columns or not.
And the expected output would be:
id k1 k2 same
1 re_setup oo_setup true
2 oo_setup oo_setup true
3 alerting bounce false
4 bounce re_oversetup false
5 re_oversetup alerting false
6 alerting_s re_setup false
7 re_oversetup oo_setup true
8 alerting bounce false
I've tried something like this, but as I expected, I get an error when selecting multiple columns:
data['same'] = data[data['k1', 'k2'].str.contains('setup')==True]
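As an aside (not from the original post): `data['k1', 'k2']` looks up a single column labelled with the tuple `('k1', 'k2')`, which raises a KeyError, and even selecting both columns with a list yields a DataFrame, which has no `.str` accessor. A minimal sketch of both failure modes, using made-up sample values:

```python
import pandas as pd

data = pd.DataFrame({
    "id": [1, 2],
    "k1": ["re_setup", "alerting"],
    "k2": ["oo_setup", "bounce"],
})

# data['k1', 'k2'] looks up ONE column labelled with the tuple
# ('k1', 'k2'), which does not exist -> KeyError
try:
    data["k1", "k2"]
except KeyError as exc:
    err = exc

# A list selects both columns, but the result is a DataFrame,
# and DataFrames have no .str accessor (only Series do):
has_str = hasattr(data[["k1", "k2"]], "str")
print(type(err), has_str)
```

This is why the answers below either apply `str.contains` per column or check element-wise.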
I think you need apply with str.contains, because str.contains works only with a Series (one column):
print (data[['k1', 'k2']].apply(lambda x: x.str.contains('setup')))
k1 k2
0 True True
1 True True
2 False False
3 False True
4 True False
5 False True
6 True True
7 False False
Then add DataFrame.all to check whether all values per row are True:
data['same'] = data[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).all(axis=1)
print (data)
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup False
4 5 re_oversetup alerting False
5 6 alerting_s re_setup False
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
Or DataFrame.any to check for at least one True per row:
data['same'] = data[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).any(axis=1)
print (data)
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup True
4 5 re_oversetup alerting True
5 6 alerting_s re_setup True
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
Another solution with applymap for an element-wise check:
data['same'] = data[['k1', 'k2']].applymap(lambda x: 'setup' in x).all(axis=1)
print (data)
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup False
4 5 re_oversetup alerting False
5 6 alerting_s re_setup False
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
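Note that on recent pandas (2.1+), applymap is deprecated and renamed to DataFrame.map; a sketch of the same element-wise check that works on both old and new versions:

```python
import pandas as pd

data = pd.DataFrame({
    "k1": ["re_setup", "oo_setup", "alerting"],
    "k2": ["oo_setup", "oo_setup", "bounce"],
})

cols = data[["k1", "k2"]]
# DataFrame.map (pandas >= 2.1) replaces the deprecated applymap;
# fall back to applymap on older pandas versions.
elementwise = cols.map if hasattr(cols, "map") else cols.applymap
data["same"] = elementwise(lambda x: "setup" in x).all(axis=1)
print(data["same"].tolist())  # [True, True, False]
```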
If there are only 2 columns, simply chain conditions with & (like all) or | (like any):
data['same'] = data['k1'].str.contains('setup') & data['k2'].str.contains('setup')
print (data)
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup False
4 5 re_oversetup alerting False
5 6 alerting_s re_setup False
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
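One caveat not covered above: if k1 or k2 can contain missing values, str.contains returns NaN for them and the result is no longer a clean boolean column; the `na` parameter of str.contains controls this. A sketch, assuming a hypothetical frame with one missing value:

```python
import pandas as pd

data = pd.DataFrame({"k1": ["re_setup", None], "k2": ["oo_setup", "bounce"]})

# Without na=, str.contains propagates the missing value as NaN,
# so the combined result is not a clean boolean column.
# na=False treats missing values as "does not contain 'setup'".
data["same"] = (
    data["k1"].str.contains("setup", na=False)
    & data["k2"].str.contains("setup", na=False)
)
print(data["same"].tolist())  # [True, False]
```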
Here's another generic reduce operation, without needing apply:
In [114]: np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
Out[114]: array([ True, True, False, True, True, True, True, False], dtype=bool)
Details:
In [115]: df['same'] = np.logical_or.reduce(
[df[c].str.contains('setup') for c in ['k1', 'k2']])
In [116]: df
Out[116]:
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup True
4 5 re_oversetup alerting True
5 6 alerting_s re_setup True
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
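Since the question asks for the all-columns case, the same reduce pattern presumably works with np.logical_and (an adaptation of the answer above, not part of it):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "k1": ["re_setup", "oo_setup", "alerting", "bounce"],
    "k2": ["oo_setup", "oo_setup", "bounce", "re_oversetup"],
})

# logical_and.reduce: True only where every listed column matches
df["same"] = np.logical_and.reduce(
    [df[c].str.contains("setup") for c in ["k1", "k2"]]
)
print(df["same"].tolist())  # [True, True, False, False]
```

This scales to any number of columns by extending the list in the comprehension.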
Timings
Small
In [111]: df.shape
Out[111]: (8, 4)
In [108]: %timeit np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
1000 loops, best of 3: 421 µs per loop
In [109]: %timeit df[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).any(1)
1000 loops, best of 3: 2.01 ms per loop
Large
In [110]: df.shape
Out[110]: (40000, 4)
In [112]: %timeit np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
10 loops, best of 3: 59.5 ms per loop
In [113]: %timeit df[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).any(1)
10 loops, best of 3: 88.4 ms per loop
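The large frame is presumably the 8-row sample tiled; outside IPython the comparison can be sketched with timeit (absolute numbers will differ by machine and pandas version):

```python
import timeit

import numpy as np
import pandas as pd

# Tile a small sample frame up to 40000 rows, mimicking the benchmark above
df = pd.DataFrame({
    "k1": ["re_setup", "alerting"] * 4,
    "k2": ["oo_setup", "bounce"] * 4,
})
big = pd.concat([df] * 5000, ignore_index=True)

reduce_time = timeit.timeit(
    lambda: np.logical_or.reduce(
        [big[c].str.contains("setup") for c in ["k1", "k2"]]
    ),
    number=5,
)
apply_time = timeit.timeit(
    lambda: big[["k1", "k2"]].apply(lambda x: x.str.contains("setup")).any(axis=1),
    number=5,
)
print(f"reduce: {reduce_time:.3f}s  apply: {apply_time:.3f}s")
```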