Aggregation on sub DataFrames defined by sets of indices without loop

Question

Suppose I have a pandas DataFrame. Here is a simple example:

import pandas as pd
df = pd.DataFrame(columns=["A", "B"], data = [(1, 2), (4, 5), (7, 8), (10, 11)])

I also have a list of index sets. To keep it simple, here are some arbitrary ones:

inds = [(0, 1, 3), (0, 1, 2), (1, 2, 3)]

I want to aggregate the data according to those indices. For instance, if the aggregation operation is the mean, I would obtain:

A	B
`df.loc[inds[0], "A"].mean()`	`df.loc[inds[0], "B"].mean()`
`df.loc[inds[1], "A"].mean()`	`df.loc[inds[1], "B"].mean()`
`df.loc[inds[2], "A"].mean()`	`df.loc[inds[2], "B"].mean()`

Is there a way to perform this in pure pandas without writing a loop?

This is very similar to a df.groupby and then .agg type of operation, but I did not find a way to create a GroupBy object from a custom set of indices.

ouroboros1 · Accepted Answer

Edit: this answer shows how to achieve this with groupby, but it is surely "significantly simpler to think of this as a selection by index problem"; see the answer by @HenryEcker.
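The referenced answer is not reproduced here. As a rough sketch of what a selection-by-index approach could look like (an assumption, not necessarily what that answer does), NumPy fancy indexing can pull all index sets at once. This only works under two assumptions: every index set has the same length, and df has the default 0..n-1 index, so that row labels double as positions. The names labels and out_np are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=["A", "B"], data=[(1, 2), (4, 5), (7, 8), (10, 11)])
inds = [(0, 1, 3), (0, 1, 2), (1, 2, 3)]

# Assumption: all index sets have equal length, so they form a
# rectangular (n_groups, set_size) integer array.
labels = np.array(inds)

# Assumption: df has the default RangeIndex, so labels are also positions.
# Fancy indexing yields shape (n_groups, set_size, n_columns).
selected = df.to_numpy()[labels]

# Averaging over axis 1 collapses each index set to a single row.
out_np = pd.DataFrame(selected.mean(axis=1), columns=df.columns)
print(out_np)
```

For index sets of different lengths this array cannot be built, and the groupby options below are the safer choice.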

Option 1 (reindex + groupby)

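# one entry per (group, row label) pair; s.index holds the group id
s = pd.Series(inds).explode()

# select the rows for every group, then average within each group
out = df.reindex(s).groupby(s.index).mean()

out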

     A    B
0  5.0  6.0 # i.e. A: (1+4+10)/3, B: (2+5+11)/3, etc.
1  4.0  5.0
2  7.0  8.0

Explanation

Use inds to create a pd.Series (here: s), and apply Series.explode so that each row label gets its own entry. The index values function as group identifiers:

# intermediate series ('group 0, 1, 2')

0    0
0    1
0    3
1    0
1    1
1    2
2    1
2    2
2    3
dtype: object

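Apply df.reindex with the values from s to pull the rows for each group (rows that belong to several groups are repeated), then use df.groupby with s.index as the group key, and get groupby.mean.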
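To check the steps above, here is a minimal self-contained sketch that runs the reindex + groupby pattern and compares it with the explicit loop the question wants to avoid. The int64 cast and the names rows, group_ids, via_groupby and via_loop are additions for illustration; the cast just avoids carrying the object dtype that explode leaves behind.

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B"], data=[(1, 2), (4, 5), (7, 8), (10, 11)])
inds = [(0, 1, 3), (0, 1, 2), (1, 2, 3)]

s = pd.Series(inds).explode()

# explode() leaves object dtype; cast so the labels match df's integer index
rows = s.astype("int64").to_numpy()
group_ids = s.index.to_numpy()

# reindex repeats rows as needed; group_ids says which set each row belongs to
via_groupby = df.reindex(rows).groupby(group_ids).mean()

# reference result: the loop version
via_loop = pd.DataFrame([df.loc[list(ix)].mean() for ix in inds])

print(via_groupby)
print((via_groupby.to_numpy() == via_loop.to_numpy()).all())
```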

Option 2 (merge + groupby)

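out = (
    df.merge(
        # exploded series: index = group id, values ('g') = row labels of df
        pd.Series(inds, name='g').explode(),
        left_index=True,
        right_on='g',
        how='right',  # keep one row per (group, row label) pair
    )
    .drop(columns=['g'])
    .groupby(level=0)  # the index now holds the group ids
    .mean()
)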

# same result

Explanation

As with option 1, we create a pd.Series and explode it, but this time we add a name ('g'), which we need for the merge in the next step.
Now, use df.merge with how='right' to add the values from df, using the 'g' values from our series and the index of df as the keys.
Finally, drop column 'g' (df.drop), apply df.groupby on the index (level=0), and get groupby.mean.
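Because the grouping depends only on the exploded series, the same idea should carry over to index sets of different lengths and to other aggregations. The sketch below is a variant of Option 2 that swaps in DataFrame.join (with on='g') for df.merge, since join keeps the calling frame's index, which here holds the group ids. The ragged list ragged_inds and the per-column aggregation dict are made-up inputs for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B"], data=[(1, 2), (4, 5), (7, 8), (10, 11)])

# hypothetical index sets of unequal length
ragged_inds = [(0, 1, 3), (2,), (0, 1, 2, 3)]

# index = group id, values = row labels of df (cast from object to int64)
keys = pd.Series(ragged_inds, name="g").explode().astype("int64")

out_ragged = (
    keys.to_frame()
    .join(df, on="g")        # look up each row label in df's index
    .drop(columns="g")
    .groupby(level=0)        # group ids live in the index
    .agg({"A": "mean", "B": "max"})
)
print(out_ragged)
```

Any aggregation that groupby.agg accepts can be used in place of the dict.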

Tags:

python

pandas

DimB
