I have a dataframe like this:
df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'), 'ignore_me': list('bbb'), 'c3': list('baa')})
  c1 c2 ignore_me c3
0  a  a         b  b
1  b  a         b  a
2  a  a         b  a
and a dictionary that looks like this:
d = {'a': "foo", 'b': 'bar'}
I now want to map the values of d to columns that match the regex ^c\d+$.
I can do
df.filter(regex='^c\d+$').apply(lambda x: x.map(d))
    c1   c2   c3
0  foo  foo  bar
1  bar  foo  foo
2  foo  foo  foo
However, the columns that don't match the regex are then missing.
So, I can therefore do:
tempdf = df.filter(regex='^c\d+$')
df.loc[:, tempdf.columns] = tempdf.apply(lambda x: x.map(d))
which gives the desired output
    c1   c2 ignore_me   c3
0  foo  foo         b  bar
1  bar  foo         b  foo
2  foo  foo         b  foo
Is there a smarter solution that avoids the temporary dataframe?
There absolutely is: use str.contains to build a boolean mask over the column names.
df.columns.str.contains(r'^c\d+$') # use raw strings, it's good hygiene
# array([ True,  True, False,  True])
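As a quick sanity check, indexing df.columns with the mask should list exactly the columns that filter would keep. A minimal sketch that rebuilds the question's frame (the names mask and selected are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'),
                   'ignore_me': list('bbb'), 'c3': list('baa')})

# Boolean mask over the column names; True where the name matches the regex.
mask = df.columns.str.contains(r'^c\d+$')

# Indexing the column Index with the mask gives the matching names.
selected = df.columns[mask].tolist()
print(selected)  # ['c1', 'c2', 'c3']
```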
Pass the mask to loc:
df.loc[:, df.columns.str.contains(r'^c\d+$')] = df.apply(lambda x: x.map(d))
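That maps every column, including the ones you don't need. If you want to be as efficient as possible, compute the mask once and map only the matching columns: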
m = df.columns.str.contains(r'^c\d+$')
df.loc[:, m] = df.loc[:, m].apply(lambda x: x.map(d))
df
    c1   c2 ignore_me   c3
0  foo  foo         b  bar
1  bar  foo         b  foo
2  foo  foo         b  foo
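If this comes up more than once, the mask approach can be wrapped in a small helper. This is only a sketch: map_matching_columns is a hypothetical name, not a pandas function, and it works on a copy so the input frame is left untouched. Keep in mind that Series.map with a dict produces NaN for any value that has no key in the dict.

```python
import pandas as pd


def map_matching_columns(df, mapping, pattern):
    """Return a copy of df with `mapping` applied to every column
    whose name matches the regex `pattern` (hypothetical helper)."""
    out = df.copy()
    # Column labels selected by the boolean mask.
    cols = out.columns[out.columns.str.contains(pattern)]
    # Map only the selected columns; all other columns stay as they are.
    out[cols] = out[cols].apply(lambda s: s.map(mapping))
    return out


df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'),
                   'ignore_me': list('bbb'), 'c3': list('baa')})
d = {'a': 'foo', 'b': 'bar'}

result = map_matching_columns(df, d, r'^c\d+$')
print(result)
```

Returning a copy is a design choice made for this sketch; for large frames, assigning in place as in the answer above avoids the extra copy.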