Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to explode the column value without duplicating the other columns values in panda dataframe?

Tags:

python

pandas

I have df like this:

id       ColumnA           ColumnB           ColumnC
1        Audi_BMW_VW       BMW_Audi             VW
2          VW                 Audi            Audi_BMW_VW

I want to explode the columns based on explode when _ appear. For example for "Column A" like this

df['Column A'].str.split('_')).explode('Column A')

but when i use similar query for column B then it repeats the values of column A, but i really want that only ID should duplicate. The desired output would be something like this:

id       ColumnA           ColumnB           ColumnC
1        Audi                BMW             VW
1         BMW                Audi
1         VW                  
2         VW                 Audi            Audi
2                                            BMW
2                                            VW
like image 664
s_khan92 Avatar asked Nov 22 '25 14:11

s_khan92


1 Answers

Lots of reshaping. The key point is to stack then call Series.str.split on a single Series with the id as the Index.

(df.set_index('id')             # keep 'id' bound to cells in the row 
   .stack()                     # to a single Series
   .str.split('_', expand=True) # split into separate cells on '_'
   .unstack(-1).stack(0)        # original column labels back to columns
   .reset_index(-1, drop=True)  # remove split number label
)

   ColumnA ColumnB ColumnC
id                        
1     Audi     BMW      VW
1      BMW    Audi    None
1       VW    None    None
2       VW    Audi    Audi
2     None    None     BMW
2     None    None      VW
like image 64
ALollz Avatar answered Nov 25 '25 02:11

ALollz