I have the following column in a dataframe:
Q2
1 4
1 3
3 4 11
1 4 6 15 16
I want to replace multiple values in a cell, if present: 1 gets replaced by Facebook, 2 by Instagram, and so on.
I split the values as follows:
# use a list here; iterating over the plain string 'Q2' would yield 'Q' and '2'
columns_to_split = ['Q2']
for c in columns_to_split:
    df[c] = df[c].str.split(' ')
which outputs
DSOKF31 [1, 4]
DSOVH39 [1, 3]
DSOVH05 [3, 4, 16]
DSOVH23 [1, 4, 6, 15, 16]
Name: Q2, dtype: object
But when I try to replace the multiple values with a dictionary as follows:
social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}
df['Q2'] = df['Q2'].replace(social_media_2)
I get the same output:
DSOKF31 [1, 4]
DSOVH39 [1, 3]
DSOVH05 [3, 4, 16]
DSOVH23 [1, 4, 6, 15, 16]
Name: Q2, dtype: object
How do I replace multiple values in one cell in this case?
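As background for the answers that follow, here is a minimal sketch of why a plain replace leaves such cells untouched: without regex=True, Series.replace compares each dictionary key against the whole cell value. The series `s` and the two-entry `mapping` are made up for illustration and are not part of the question's data. A cell holding a list such as ['1', '4'] is likewise never equal to the key '1', which is consistent with the unchanged output above.

```python
import pandas as pd

# Hypothetical two-row series: one cell is exactly a key, the other holds two codes.
s = pd.Series(['1', '1 4'])
mapping = {'1': 'Facebook', '4': 'Messenger'}

# Without regex=True, only cells that equal a key as a whole are replaced.
whole_cell = s.replace(mapping)
print(whole_cell.tolist())  # ['Facebook', '1 4']
```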
If you don't need lists as output, just add regex=True to replace:
import pandas as pd
df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 11']})
print (df)
Q2
0 1 4
1 1 3
2 3 4 11
social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)
Q2
0 Facebook Messenger (Google hangout, Tagg, What...
1 Facebook Twitter
2 Twitter Messenger (Google hangout, Tagg, Whats...
If you need lists, use one of the other solutions.
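One thing to watch with regex=True is that each key becomes a pattern, so a key such as '1' can also match inside '11'. A possible way around this, sketched here under the assumption that the codes are whitespace-separated runs of digits, is to match whole numbers with Series.str.replace and look each one up through a callable. The names `mapping`, `pattern` and `lookup` are hypothetical, and the dictionary is shortened for readability.

```python
import re
import pandas as pd

df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 11']})

# Shortened, illustrative mapping (not the full dictionary from the question).
mapping = {'1': 'Facebook', '3': 'Twitter', '4': 'Messenger', '11': 'Flickr'}

# \d+ is greedy, so '11' is matched as one number rather than as two '1's.
pattern = re.compile(r'\d+')

def lookup(match):
    """Return the name for a matched code, or the code itself if it is unknown."""
    code = match.group(0)
    return mapping.get(code, code)

df['Q2_named'] = df['Q2'].str.replace(pattern, lookup, regex=True)
print(df['Q2_named'].tolist())
```

Because every number is looked up exactly once, the order of the dictionary keys no longer matters, and codes missing from the mapping are left as they are.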
EDIT by comment:
You can replace the whitespace with ; and then it works nicely:
df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 11']})
print (df)
Q2
0 1 4
1 1 3
2 3 4 11
df['Q2'] = df['Q2'].str.replace(' ',';')
print (df)
Q2
0 1;4
1 1;3
2 3;4;11
social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)
Q2
0 Facebook;Messenger (Google hangout, Tagg, What...
1 Facebook;Twitter
2 Twitter;Messenger (Google hangout, Tagg, Whats...
EDIT1:
You can also tweak the dict by appending ; to each key, after first replacing each space with ;; and adding a trailing ;:
df = pd.DataFrame({'Q2': ['1 2', '1 3', '3 2 11']})
print (df)
Q2
0 1 2
1 1 3
2 3 2 11
df['Q2'] = df['Q2'].str.replace(' ',';;') + ';'
print (df)
Q2
0 1;;2;
1 1;;3;
2 3;;2;;11;
social_media_2 = {'1':'Fa', '2':'I', '3':'T', '11':'KL'}
# append ; to each key in the dict
social_media_2 = {key + ';': value for key, value in social_media_2.items()}
print (social_media_2)
{'1;': 'Fa', '2;': 'I', '3;': 'T', '11;': 'KL'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)
Q2
0 Fa;I
1 Fa;T
2 T;I;1Fa
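Note that in the last row of the output above, 11; came out as 1Fa rather than KL, because the pattern 1; also matches the tail of 11;. A possible variation, offered as a sketch rather than as the original answer's method, is to wrap each key in \b word boundaries instead of relying on a delimiter. The names `codes` and `bounded` are hypothetical.

```python
import re
import pandas as pd

df = pd.DataFrame({'Q2': ['1 2', '1 3', '3 2 11']})
codes = {'1': 'Fa', '2': 'I', '3': 'T', '11': 'KL'}

# Wrap every key in word boundaries so that '1' cannot match inside '11'.
bounded = {r'\b' + re.escape(key) + r'\b': value for key, value in codes.items()}

df['Q2'] = df['Q2'].replace(bounded, regex=True)
print(df['Q2'].tolist())
```

This assumes the replacement values contain no standalone numbers of their own; if they did, a later pattern could rewrite text that an earlier one had just inserted.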
Since the number of items varies from row to row, the data has little structure. Still, after you split the string, you can apply a function that maps each list element to its dictionary value:
In [36]: df = pd.DataFrame({'Q2': ['1 4', '1 3', '1 2 3']})
In [37]: df.Q2.str.split(' ').apply(lambda l: [social_media_2[e] for e in l])
Out[37]:
0 [Facebook, Messenger (Google hangout, Tagg, Wh...
1 [Facebook, Twitter]
2 [Facebook, Instagram, Twitter]
Name: Q2, dtype: object
Edit: Following Jezrael's excellent comment, here's a version that accounts for missing values as well:
In [58]: df = pd.DataFrame({'Q2': ['1 4', '1 3', '1 2 3', None]})
In [59]: df.Q2.str.split(' ').apply(lambda l: [] if not isinstance(l, list) else [social_media_2[e] for e in l])
Out[59]:
0 [Facebook, Messenger (Google hangout, Tagg, Wh...
1 [Facebook, Twitter]
2 [Facebook, Instagram, Twitter]
3 []
Name: Q2, dtype: object
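The list comprehension above raises a KeyError if a cell contains a code that is not in the dictionary. A small sketch of a more forgiving variant follows, assuming unknown codes should simply be kept as they are. The helper `decode`, the shortened `mapping` and the sample code 99 are all hypothetical.

```python
import pandas as pd

# Shortened, illustrative mapping (not the full dictionary from the question).
mapping = {'1': 'Facebook', '2': 'Instagram', '3': 'Twitter'}

def decode(cell):
    """Turn a space-separated string of codes into a list of names."""
    if not isinstance(cell, str):
        # Missing values (None / NaN) become an empty list.
        return []
    # dict.get keeps codes that are absent from the mapping instead of raising.
    return [mapping.get(code, code) for code in cell.split()]

df = pd.DataFrame({'Q2': ['1 3', '2 99', None]})
df['Q2_list'] = df['Q2'].apply(decode)
print(df['Q2_list'].tolist())
```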