Pandas: group columns of duplicate rows into column of lists

I have a Pandas dataframe that looks something like this:

>>> df
       m  event
0      3      1
1      1      1
2      1      2
3      1      2
4      2      1
5      2      0
6      3      1
7      2      2
8      3      2
9      3      1

I want to group the values of the event column into lists based on the m column so that I would get this:

>>> df
       m            events
0      3      [1, 1, 2, 1]
1      1      [1, 2, 2]
2      2      [1, 0, 2]

There should be one row per unique value of m, with a corresponding list of all events that belong to that m.

I tried this:

>>> list(df.groupby('m').event)
[(3, m_id
0    1
6    1
8    2
9    1
Name: event, dtype: int64), (1, m_id
1    1
2    2
3    2
Name: event, dtype: int64), (2, m_id
4    1
5    0
7    2
Name: event, dtype: int64)]

It sort of does what I want in that it groups the events by m. I could massage this back into the dataframe I want with some loops, but I feel that I have started down an ugly and unnecessarily complex path. It would also be slow if there are thousands of unique values for m.

Can I perform the conversion I wanted in an elegant manner using Pandas methods?

Bonus if the events column can contain (numpy) arrays so that I can do math directly on the events rows, like df[df.m==1].events + 100, but regular lists are also ok.

PaulMag asked Oct 21 '25 14:10

1 Answer

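Group by m, collect each group's event values into a NumPy array, and reset the index (this assumes numpy has been imported as np). Note that groupby sorts the group keys by default, so the rows come out ordered by m:

In [320]: r = df.groupby('m')['event'].apply(np.array).reset_index(name='event')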

In [321]: r
Out[321]:
   m         event
0  1     [1, 2, 2]
1  2     [1, 0, 2]
2  3  [1, 1, 2, 1]

Bonus:

In [322]: r.loc[r.m==1, 'event'] + 1
Out[322]:
0    [2, 3, 3]
Name: event, dtype: object
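
For reference, here is a self-contained sketch of the same approach that can be run as a script. It rebuilds the example frame from the question and makes two changes that are not in the session above: it passes sort=False so the groups keep the order in which m first appears (3, 1, 2, as in the desired output), and it names the result column events. The variable names as_arrays, as_lists and shifted are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "m":     [3, 1, 1, 1, 2, 2, 3, 2, 3, 3],
    "event": [1, 1, 2, 2, 1, 0, 1, 2, 2, 1],
})

# One row per unique m, with the events collected into a NumPy array.
# sort=False keeps groups in order of first appearance instead of sorting by m.
as_arrays = (
    df.groupby("m", sort=False)["event"]
      .apply(np.array)
      .reset_index(name="events")
)

# Same grouping, but with plain Python lists instead of arrays.
as_lists = (
    df.groupby("m", sort=False)["event"]
      .agg(list)
      .reset_index(name="events")
)

# Arithmetic on the array-valued column is applied to each array in turn.
shifted = as_arrays.loc[as_arrays["m"] == 1, "events"] + 100

print(as_arrays)
print(shifted)
```

The events column has dtype object in both variants, so operations on it are applied array by array (or list by list) rather than as a single vectorized step. With plain lists, adding a number would not work, which is why the array variant is the one that supports the arithmetic asked about in the question.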
MaxU - stop WAR against UA answered Oct 23 '25 05:10
