I have data like this
ID INFO
1 A=2;B=2;C=5
2 A=3;B=4;C=1
3 A=1;B=3;C=2
I want to split the Info columns into
ID A B C
1 2 2 5
2 3 4 1
3 1 3 2
I can split columns with one delimiter by using
df['A'], df['B'], df['C'] = df['INFO'].str.split(';').str
then split again by = but this seems to not so efficient in case I have many rows and especially when there are so many field that cannot be hard-coded beforehand.
Any suggestion would be greatly welcome.
You could use named groups together with Series.str.extract. In the end concat back the 'ID'. This assumes you always have A=;B=;and C= in a line.
pd.concat([df['ID'],
df['INFO'].str.extract('A=(?P<A>\d);B=(?P<B>\d);C=(?P<C>\d)')], axis=1)
# ID A B C
#0 1 2 2 5
#1 2 3 4 1
#2 3 1 3 2
If you want a more flexible solution that can deal with cases where a single line might be 'A=1;C=2' then we can split on ';' and partition on '='. pivot in the end to get to your desired output.
### Starting Data
#ID INFO
#1 A=2;B=2;C=5
#2 A=3;B=4;C=1
#3 A=1;B=3;C=2
#4 A=1;C=2
(df.set_index('ID')['INFO']
.str.split(';', expand=True)
.stack()
.str.partition('=')
.reset_index(-1, drop=True)
.pivot(columns=0, values=2)
)
# A B C
#ID
#1 2 2 5
#2 3 4 1
#3 1 3 2
#4 1 NaN 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With