I have a Pandas dataframe that looks something like this:
>>> df
   m  event
0  3      1
1  1      1
2  1      2
3  1      2
4  2      1
5  2      0
6  3      1
7  2      2
8  3      2
9  3      1
I want to group the values of the event column into lists based on the m column so that I would get this:
>>> df
   m        events
0  3  [1, 1, 2, 1]
1  1     [1, 2, 2]
2  2     [1, 0, 2]
There should be one row per unique value of m, with a corresponding list of all the events that belong to that m.
I tried this:
>>> list(df.groupby('m').event)
[(3, m_id
0 1
6 1
8 2
9 1
Name: event, dtype: int64), (1, m_id
1 1
2 2
3 2
Name: event, dtype: int64), (2, m_id
4 1
5 0
7 2
Name: event, dtype: int64)]
It sort of does what I want, in that it groups the events by m. I could massage this back into the DataFrame I want with some loops, but I feel that I have started down an ugly and unnecessarily complex path. It would also be slow if there are thousands of unique values of m.
Can I perform the conversion I wanted in an elegant manner using Pandas methods?
Bonus points if the events column can hold NumPy arrays, so that I can do math directly on the event rows, like df[df.m==1].events + 100, but regular lists are also OK.
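One way is to group by m, apply np.array to each group's events, and turn the resulting Series back into a DataFrame (this assumes NumPy has been imported as np):
In [320]: r = df.groupby('m')['event'].apply(np.array).reset_index(name='event')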
In [321]: r
Out[321]:
   m         event
0  1     [1, 2, 2]
1  2     [1, 0, 2]
2  3  [1, 1, 2, 1]
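For anyone who wants to run this end to end, here is a minimal self-contained sketch of the session above. The sample data is rebuilt by hand from the question, and `r_unsorted` is just an illustrative name. Note that groupby sorts the keys by default, which is why the rows come out as m = 1, 2, 3 rather than the 3, 1, 2 order shown in the question; passing sort=False (shown as a variant) keeps the groups in order of first appearance.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "m":     [3, 1, 1, 1, 2, 2, 3, 2, 3, 3],
    "event": [1, 1, 2, 2, 1, 0, 1, 2, 2, 1],
})

# Collapse each group's "event" values into a single NumPy array,
# then move the group keys (m) back into a regular column.
# groupby sorts the keys by default, so rows come out as m = 1, 2, 3.
r = df.groupby("m")["event"].apply(np.array).reset_index(name="event")
print(r)

# Variant: sort=False keeps the groups in order of first appearance
# (m = 3, 1, 2), matching the layout of the desired output in the question.
r_unsorted = (
    df.groupby("m", sort=False)["event"]
      .apply(np.array)
      .reset_index(name="events")
)
print(r_unsorted)
```

The column is named "events" in the variant only to mirror the question's desired output; use whichever name suits you.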
Bonus:
In [322]: r.loc[r.m==1, 'event'] + 1
Out[322]:
0 [2, 3, 3]
Name: event, dtype: object
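A small sketch of why the bonus works, rebuilding r the same way. Adding a scalar to an object-dtype Series applies the addition to each cell, and because each cell is a NumPy array, the scalar is added element-wise. If you want the array itself rather than a one-row Series, .iloc[0] pulls out the single cell. The + 100 mirrors the question's example; `shifted` and `events_m1` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "m":     [3, 1, 1, 1, 2, 2, 3, 2, 3, 3],
    "event": [1, 1, 2, 2, 1, 0, 1, 2, 2, 1],
})
r = df.groupby("m")["event"].apply(np.array).reset_index(name="event")

# Object-dtype Series + scalar: "+ 100" is applied to each cell,
# and each cell (an ndarray) adds 100 to every element.
shifted = r.loc[r["m"] == 1, "event"] + 100
print(shifted)

# To get the array itself instead of a one-row Series, take the single cell.
events_m1 = r.loc[r["m"] == 1, "event"].iloc[0]
print(events_m1 + 100)  # -> [101 102 102]

# With plain Python lists in the cells this would not work:
# [1, 2, 2] + 100 raises TypeError, since lists only concatenate with lists.
```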