Pandas row value string parsing (mixed string and float) [duplicate]

Question

I have data like this:

ID   INFO
1    A=2;B=2;C=5
2    A=3;B=4;C=1
3    A=1;B=3;C=2

I want to split the INFO column into:

ID   A    B    C
1    2    2    5
2    3    4    1
3    1    3    2

I can split on one delimiter using:

# split INFO on ';' into three columns (values still look like 'A=2')
df[['A', 'B', 'C']] = df['INFO'].str.split(';', expand=True)

and then split each of those again on '=', but this does not seem very efficient when there are many rows, and especially when there are so many fields that their names cannot be hard-coded beforehand.

Any suggestions would be welcome.
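For reference, the two-step split described above (first on ';', then on '=') can be done without hard-coding any field names by turning each INFO string into a dict and letting pandas build one column per key. This is a minimal sketch, not taken from the thread: `parse_info` is a hypothetical helper, and it assumes every field has the form key=value. The fourth row is borrowed from the answer's "Starting Data" to show a missing key.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "INFO": ["A=2;B=2;C=5", "A=3;B=4;C=1", "A=1;B=3;C=2", "A=1;C=2"],
})

def parse_info(s: str) -> dict:
    # Hypothetical helper: "A=2;B=2;C=5" -> {"A": "2", "B": "2", "C": "5"}
    # split("=", 1) keeps any further '=' inside the value; empty pieces are skipped
    return dict(item.split("=", 1) for item in s.split(";") if item)

# One dict per row; the columns are the union of all keys, NaN where a key is absent
parsed = pd.DataFrame(df["INFO"].map(parse_info).tolist(), index=df.index)
out = pd.concat([df["ID"], parsed], axis=1)
print(out)
```

The values stay as strings; `pd.to_numeric` can convert them afterwards if numbers are needed.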

ALollz · Accepted Answer

You could use named groups together with Series.str.extract, then concat the 'ID' column back on at the end. This assumes every line contains A=, B= and C=, in that order.

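import pandas as pd
# raw string avoids escape warnings; [^;]+ captures the whole value, not just one digit
pd.concat([df['ID'],
           df['INFO'].str.extract(r'A=(?P<A>[^;]+);B=(?P<B>[^;]+);C=(?P<C>[^;]+)')], axis=1)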

#   ID  A  B  C
#0   1  2  2  5
#1   2  3  4  1
#2   3  1  3  2
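Note that str.extract always returns strings. A minimal sketch of converting the extracted columns to numbers, assuming hypothetical data with multi-digit and float values (which a single `\d` would not match):

```python
import pandas as pd

# Hypothetical data: multi-digit and float values
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "INFO": ["A=2;B=2.5;C=15", "A=3;B=4;C=1", "A=10;B=3.25;C=2"],
})

pattern = r"A=(?P<A>[^;]+);B=(?P<B>[^;]+);C=(?P<C>[^;]+)"
extracted = df["INFO"].str.extract(pattern)  # named groups become columns A, B, C (strings)
extracted = extracted.apply(pd.to_numeric)   # column-wise: "10" -> 10, "2.5" -> 2.5
out = pd.concat([df["ID"], extracted], axis=1)
print(out)
print(out.dtypes)
```

pd.to_numeric raises on values that are not numeric; if some fields hold real text, convert only the columns known to be numeric.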

If you want a more flexible solution that handles lines with missing fields, such as 'A=1;C=2', split on ';', partition each piece on '=', and pivot at the end to get the desired output.

### Starting Data
#ID   INFO
#1    A=2;B=2;C=5
#2    A=3;B=4;C=1
#3    A=1;B=3;C=2
#4    A=1;C=2

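(df.set_index('ID')['INFO']
   .str.split(';', expand=True)   # one column per field; short rows padded with None
   .stack()
   .dropna()                      # drop any padding; harmless if stack already dropped it
   .str.partition('=')            # columns: 0 = key, 1 = '=', 2 = value
   .reset_index(-1, drop=True)    # drop the field-position level, keep ID
   .pivot(columns=0, values=2)    # keys become columns, values fill the cells
)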

#    A    B  C
#ID           
#1   2    2  5
#2   3    4  1
#3   1    3  2
#4   1  NaN  2
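A variant of the same split/partition/pivot idea uses Series.explode instead of expand=True plus stack, so no padding is ever created. This is a sketch of an alternative, not the answerer's code; it assumes each field is key=value and that no key repeats within one row (pivot raises on duplicate ID/key pairs).

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "INFO": ["A=2;B=2;C=5", "A=3;B=4;C=1", "A=1;B=3;C=2", "A=1;C=2"],
})

pairs = (
    df.set_index("ID")["INFO"]
      .str.split(";")       # each cell becomes a list like ["A=2", "B=2", "C=5"]
      .explode()            # one row per "key=value"; the ID index repeats
      .str.partition("=")   # columns 0 (key), 1 ("="), 2 (value)
)
out = pairs.pivot(columns=0, values=2)  # index defaults to the existing ID index
out.columns.name = None                 # drop the leftover "0" column-axis name
print(out)
```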


Tags: python, pandas

Thanh Nguyen
