Multiple dataframe transformations based on a single column

I looked for a similar question but did not find a solution for what I want to do. Any help is welcome.

Here is the code to build an example of my DataFrame:

import pandas as pd
L = [[0.1998,'IN TIME,IN TIME','19708,19708','MR SD#5 W/Z SD#6 X/Y',20.5],
     [0.3983,'LATE,IN TIME','11206,18054','MR SD#4 A/B SD#1 C/D',19.97]]

df = pd.DataFrame(L,columns=['Time','status','F_nom','info','Delta'])

Output:

[image: the example DataFrame]

I would like to create two new rows for each row of my main DataFrame, based on the 'info' column.

As the 'info' column shows, each row contains two different SD# entries; I would like to have only one SD# per row.

I would also like to keep the corresponding values of the columns Time, status, F_nom and Delta.

Finally, I want to create a new column 'type info' that contains the specific string for each SD# (W/Z, A/B, etc.), all while keeping the index of my main DataFrame.

Here is the desired result:

[image: the desired output DataFrame]

I hope I was clear enough. Thank you in advance for your replies.

Youcef Benyettou asked Feb 25 '26 19:02


1 Answer

Use:

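# split the comma-separated columns into lists
df['status'] = df['status'].str.split(',')
df['F_nom'] = df['F_nom'].str.split(',')
# split 'info' on whitespace, e.g. ['MR', 'SD#5', 'W/Z', 'SD#6', 'X/Y']
info = df.pop('info').str.split()
# positions 1, 3, ... hold the SD# labels; positions 2, 4, ... hold their types
df['info'] = info.str[1::2]
df['type_info'] = info.str[2::2]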

#reshape to Series
s = df.set_index(['Time','Delta']).stack()
#create new DataFrame and reshape to expected output
df1 = (pd.DataFrame(s.values.tolist(), index=s.index)
        .stack()
        .unstack(2)
        .reset_index(level=2, drop=True)
        .reset_index())
print (df1)
     Time  Delta   status  F_nom  info type_info
0  0.1998  20.50  IN TIME  19708  SD#5       W/Z
1  0.1998  20.50  IN TIME  19708  SD#6       X/Y
2  0.3983  19.97     LATE  11206  SD#4       A/B
3  0.3983  19.97  IN TIME  18054  SD#1       C/D
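
To see why the `[1::2]` and `[2::2]` slices pick out the right pieces, here is a small sketch on a single 'info' string in plain Python. It assumes, as in the sample data, that every string starts with one prefix token (`MR`) followed by alternating SD#/type pairs; the variable names are only for illustration.

```python
# One 'info' value from the sample data
text = 'MR SD#5 W/Z SD#6 X/Y'

# Whitespace split, the same thing .str.split() does for each row
tokens = text.split()

# Skip the leading 'MR', then take every second token
sd_labels = tokens[1::2]   # the SD# labels
sd_types = tokens[2::2]    # the matching type strings

print(sd_labels)  # ['SD#5', 'SD#6']
print(sd_types)   # ['W/Z', 'X/Y']
```

If a row had a different layout (no prefix, or an odd number of trailing tokens), the two slices would no longer line up, so it may be worth checking that both lists have the same length.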

Another solution, starting again from the original df:

df['status'] = df['status'].str.split(',')
df['F_nom'] = df['F_nom'].str.split(',')
info = df.pop('info').str.split()
df['info'] = info.str[1::2]
df['type_info'] = info.str[2::2]

from itertools import chain

lens = df['status'].str.len()
df = pd.DataFrame({
    'Time' : df['Time'].values.repeat(lens), 
    'status' : list(chain.from_iterable(df['status'].tolist())), 
    'F_nom' : list(chain.from_iterable(df['F_nom'].tolist())), 
    'info' : list(chain.from_iterable(df['info'].tolist())), 
    'Delta' : df['Delta'].values.repeat(lens),
    'type_info' : list(chain.from_iterable(df['type_info'].tolist())), 
})
print (df)
     Time   status  F_nom  info  Delta type_info
0  0.1998  IN TIME  19708  SD#5  20.50       W/Z
1  0.1998  IN TIME  19708  SD#6  20.50       X/Y
2  0.3983     LATE  11206  SD#4  19.97       A/B
3  0.3983  IN TIME  18054  SD#1  19.97       C/D
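
Neither version above keeps the original index, which the question asked for. A possible alternative, assuming pandas 1.3 or newer (where `DataFrame.explode` accepts a list of columns), is sketched below. `out` and `tokens` are just illustrative names, and the lists in each row must all have the same length for this to work.

```python
import pandas as pd

L = [[0.1998, 'IN TIME,IN TIME', '19708,19708', 'MR SD#5 W/Z SD#6 X/Y', 20.5],
     [0.3983, 'LATE,IN TIME', '11206,18054', 'MR SD#4 A/B SD#1 C/D', 19.97]]
df = pd.DataFrame(L, columns=['Time', 'status', 'F_nom', 'info', 'Delta'])

# Same preparation as above: turn each multi-valued cell into a list
df['status'] = df['status'].str.split(',')
df['F_nom'] = df['F_nom'].str.split(',')
tokens = df.pop('info').str.split()
df['info'] = tokens.str[1::2]
df['type_info'] = tokens.str[2::2]

# Assumption: pandas >= 1.3, which allows exploding several columns at once.
# Each list element becomes its own row; scalar columns (Time, Delta) are
# repeated, and the original index labels are kept (0, 0, 1, 1).
out = df.explode(['status', 'F_nom', 'info', 'type_info'])
print(out)
```

If a fresh 0..n-1 index is preferred instead, `out.reset_index(drop=True)` gives it. Note that 'F_nom' stays a string column here, as in the solutions above.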

jezrael answered Feb 27 '26 07:02