
Python pandas groupby conditional concatenate strings into multiple columns

I am trying to group a dataframe by one column, keeping several columns from one row in each group and concatenating strings from the other rows into multiple columns based on the value of another column. Here is an example:

import pandas as pd

df = pd.DataFrame({'test' : ['a','a','a','a','a','a','b','b','b','b'],
     'name' : ['aa','ab','ac','ad','ae','ba','bb','bc','bd','be'],
     'amount' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
     'role' : ['x','y','y','x','x','z','y','y','z','y']})

df

      amount    name    role    test
0        1.0    aa      x       a
1        2.0    ab      y       a
2        3.0    ac      y       a
3        4.0    ad      x       a
4        5.0    ae      x       a
5        6.0    ba      z       a
6        7.0    bb      y       b
7        8.0    bc      y       b
8        9.0    bd      z       b
9        9.5    be      y       b

I would like to group by test, keep name and amount from the row where role = 'z', and create two new columns: X, which concatenates the values of name where role = 'x', and Y, which concatenates the values of name where role = 'y'. Concatenated values are separated by '; '. Each value of test has exactly one row with role = 'z' and zero to many rows each with role = 'x' and role = 'y'. X and Y can be null when a test has no rows for that role. The amount value is dropped for all rows with role = 'x' or 'y'. The desired output would be something like:

     test   name     amount        X              Y
0    a      ba          6.0        aa; ad; ae     ab; ac
1    b      bd          9.0        None           bb; bc; be

For the concatenating part, I found x.ix[x.role == 'x', X] = "{%s}" % '; '.join(x['name']), which I might be able to repeat for y. I tried a few things along the lines of name = x[x.role == 'z'].name.first() for name and amount. I also tried going down both paths of a defined function and a lambda function without success. Appreciate any thoughts.

stlouismv asked Nov 21 '25 09:11



1 Answer

You can build the custom columns inside apply after the groupby. Here g can be thought of as the sub-dataframe for a single value of the test column. Because you want several columns returned, create a Series for each group whose index labels are the corresponding column headers in the result:

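# g is the sub-frame for one value of 'test'; the Series keys become the output columns
df.groupby('test').apply(lambda g: pd.Series({
    'name': g['name'][g.role == 'z'].iloc[0],
    'amount': g['amount'][g.role == 'z'].iloc[0],
    # join returns '' when no rows match; "or None" turns that into a null
    'X': '; '.join(g['name'][g.role == 'x']) or None,
    'Y': '; '.join(g['name'][g.role == 'y']) or None,
})).reset_index()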
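
If you would rather avoid apply, the same idea can be sketched with plain filtering plus one groupby per role. This is only an illustration on the sample data from the question, not the approach above: joined_names is a hypothetical helper name, and it assumes, as the question states, exactly one role = 'z' row per value of test.

```python
import pandas as pd

df = pd.DataFrame({
    'test': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'name': ['aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be'],
    'amount': [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
    'role': ['x', 'y', 'y', 'x', 'x', 'z', 'y', 'y', 'z', 'y'],
})


def joined_names(frame, role):
    """Hypothetical helper: '; '-joined names per test for one role."""
    subset = frame[frame['role'] == role]
    return subset.groupby('test')['name'].agg('; '.join)


# One row per test: name and amount taken from the role == 'z' row
# (assumes exactly one such row per test, as in the question).
result = df.loc[df['role'] == 'z', ['test', 'name', 'amount']].reset_index(drop=True)

# Look up the joined strings by test; a test with no rows for a role gets NaN.
result['X'] = result['test'].map(joined_names(df, 'x'))
result['Y'] = result['test'].map(joined_names(df, 'y'))

print(result)
```

Here a missing role shows up as NaN rather than None, because map leaves unmatched keys as missing values; both count as null for isna().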


Psidom answered Nov 24 '25 01:11
