I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'), 'ignore_me': list('bbb'), 'c3': list('baa')})
c1 c2 ignore_me c3
0 a a b b
1 b a b a
2 a a b a
and a dictionary that looks like this:
d = {'a': "foo", 'b': 'bar'}
I now want to map the values through d, but only in the columns whose names match the regex ^c\d+$.
I can do
df.filter(regex='^c\d+$').apply(lambda x: x.map(d))
c1 c2 c3
0 foo foo bar
1 bar foo foo
2 foo foo foo
However, the result is then missing all the columns that don't match the regex.
So instead I can do:
tempdf = df.filter(regex='^c\d+$')
df.loc[:, tempdf.columns] = tempdf.apply(lambda x: x.map(d))
which gives the desired output:
c1 c2 ignore_me c3
0 foo foo b bar
1 bar foo b foo
2 foo foo b foo
Is there a smarter solution that avoids the temporary dataframe?
There absolutely is: build a boolean mask over the column labels with str.contains.
df.columns.str.contains(r'^c\d+$') # use raw strings, it's good hygiene
# array([ True, True, False, True])
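Here is a self-contained check of the mask on the example frame. For an Index, str.contains returns a plain boolean NumPy array, and that same array can index df.columns to list the matching labels:

```python
import pandas as pd

df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'),
                   'ignore_me': list('bbb'), 'c3': list('baa')})

# Boolean mask: True for each column label matching c<digits> from start to end
m = df.columns.str.contains(r'^c\d+$')
print(m.tolist())           # [True, True, False, True]

# The same mask picks out the matching labels themselves
print(list(df.columns[m]))  # ['c1', 'c2', 'c3']
```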
Pass the mask to loc:
df.loc[:, df.columns.str.contains(r'^c\d+$')] = df.apply(lambda x: x.map(d))
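The one-liner maps every column, including ignore_me, yet ignore_me is left alone. That is because a DataFrame assigned through .loc is aligned on its column labels, so only the labels selected by the mask are written. A small sketch to confirm this on the example data:

```python
import pandas as pd

df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'),
                   'ignore_me': list('bbb'), 'c3': list('baa')})
d = {'a': 'foo', 'b': 'bar'}
m = df.columns.str.contains(r'^c\d+$')

# The right-hand side maps *every* column, ignore_me included ...
mapped = df.apply(lambda x: x.map(d))
print(mapped['ignore_me'].tolist())  # ['bar', 'bar', 'bar']

# ... but .loc aligns it on column labels, so only the masked columns change
df.loc[:, m] = mapped
print(df['ignore_me'].tolist())      # ['b', 'b', 'b']
print(df['c3'].tolist())             # ['bar', 'foo', 'foo']
```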
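The line above also maps ignore_me only to throw the result away. To be as efficient as possible, compute the mask once and map only the selected columns: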
m = df.columns.str.contains(r'^c\d+$')
df.loc[:, m] = df.loc[:, m].apply(lambda x: x.map(d))
df
c1 c2 ignore_me c3
0 foo foo b bar
1 bar foo b foo
2 foo foo b foo
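One caveat with Series.map: any value that is not a key of d becomes NaN. The sketch below uses a hypothetical variant of the example where c3 contains 'z'. DataFrame.replace is one alternative that swaps only the values found in d and keeps everything else:

```python
import pandas as pd

# Hypothetical variant of the example: c3 contains 'z', which is not a key of d
df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'),
                   'ignore_me': list('bbb'), 'c3': list('baz')})
d = {'a': 'foo', 'b': 'bar'}
m = df.columns.str.contains(r'^c\d+$')

# map() turns values missing from d into NaN
preview = df.loc[:, m].apply(lambda x: x.map(d))
print(preview['c3'].tolist())  # ['bar', 'foo', nan]

# replace() only swaps values that are keys of d and leaves the rest as-is
df.loc[:, m] = df.loc[:, m].replace(d)
print(df['c3'].tolist())       # ['bar', 'foo', 'z']
```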