enumerate items in each group

Question

I have a DataFrame like

    id   chi  prop   ord 
0   100   L    67     0 
1   100   L    68     1 
2   100   L    68     2 
3   100   L    68     3 
4   100   L    70     0 
5   100   L    71     0 
6   100   R    67     0 
7   100   R    68     1 
8   100   R    68     2 
9   100   R    68     3 
10  110   R    70     0 
11  110   R    71     0 
12  101   L    67     0 
13  101   L    68     0 
14  101   L    69     0 
15  101   L    71     0 
16  101   L    72     0 
17  201   R    67     0 
18  201   R    68     0 
19  201   R    69     0

ord essentially gives the ordering of the entries when (prop, chi and id) all have the same value. This isn't quite what I'd like though. Instead, I'd like to be able to enumerate the entries of each group g in {(id, chi)} from 0 to n_g where n_g is the size of group g. So I'd like to obtain something that looks like

    id   chi  prop   count 
0   100   L    67     0 
1   100   L    68     1 
2   100   L    68     2 
3   100   L    68     3 
4   100   L    70     4 
5   100   L    71     5 
6   100   R    67     0 
7   100   R    68     1 
8   100   R    68     2 
9   100   R    68     3 
10  110   R    70     0 
11  110   R    71     1 
12  101   L    67     0 
13  101   L    68     1 
14  101   L    69     2 
15  101   L    71     3 
16  101   L    72     4 
17  201   R    67     0 
18  201   R    68     1 
19  201   R    69     2

I'd like to know if there's a simple way of doing this with pandas. The following comes very close, but it feels way too complicated, and it for some reason won't let me join the resulting dataframe with the original one.

(df.groupby(['id', 'chi'])
   .apply(lambda g: np.arange(g.shape[0]))
   .apply(pd.Series, 1)
   .stack()
   .rename('counter')
   .reset_index()         
   .drop(columns=['level_2']))

EDIT: A second way of course is the for loop way, but I'm looking for something more "Pythonic" than:

for gname, idx in df.groupby(['id','chi']).groups.items():
    tmp = df.loc[idx]
    df.loc[idx, 'counter'] = np.arange(tmp.shape[0])

R has a very simple way of achieving this behaviour using the tidyverse packages, but I haven't quite found the well-oiled way to achieve the same thing with pandas. Any help provided is greatly appreciated!

piRSquared · Accepted Answer

`.groupby().cumcount()`

df.assign(ord=df.groupby(['id', 'chi']).cumcount())

     id chi  prop  ord
0   100   L    67    0
1   100   L    68    1
2   100   L    68    2
3   100   L    68    3
4   100   L    70    4
5   100   L    71    5
6   100   R    67    0
7   100   R    68    1
8   100   R    68    2
9   100   R    68    3
10  110   R    70    0
11  110   R    71    1
12  101   L    67    0
13  101   L    68    1
14  101   L    69    2
15  101   L    71    3
16  101   L    72    4
17  201   R    67    0
18  201   R    68    1
19  201   R    69    2

`defaultdict` and `itertools.count`

from itertools import count
from collections import defaultdict

d = defaultdict(count)

df.assign(ord=[next(d[t]) for t in zip(df.id, df.chi)])

     id chi  prop  ord
0   100   L    67    0
1   100   L    68    1
2   100   L    68    2
3   100   L    68    3
4   100   L    70    4
5   100   L    71    5
6   100   R    67    0
7   100   R    68    1
8   100   R    68    2
9   100   R    68    3
10  110   R    70    0
11  110   R    71    1
12  101   L    67    0
13  101   L    68    1
14  101   L    69    2
15  101   L    71    3
16  101   L    72    4
17  201   R    67    0
18  201   R    68    1
19  201   R    69    2

enumerate items in each group

Tags:

python

pandas

group-by

bashfuloctopus

1 Answers

`.groupby().cumcount()`

`defaultdict` and `itertools.count`

piRSquared

Recent Activity

Donate For Us

enumerate items in each group

Tags:

python

pandas

group-by

bashfuloctopus

1 Answers

.groupby().cumcount()

defaultdict and itertools.count

piRSquared

Related questions

Recent Activity

Donate For Us

`.groupby().cumcount()`

`defaultdict` and `itertools.count`