select multiple nth values in grouping with conditional aggregate - pandas

Question

I've got a pd.DataFrame with four columns:

    df = pd.DataFrame({'id':[1,1,1,1,1,2,2,2,2] 
                      , 'A':['H','H','E','E','H','E','E','H','H']
                      , 'B':[4,5,2,7,6,1,3,1,0]
                      , 'C':['M','D','M','D','M','M','M','D','D']})

   id  A  B  C
0   1  H  4  M
1   1  H  5  D
2   1  E  2  M
3   1  E  7  D
4   1  H  6  M
5   2  E  1  M
6   2  E  3  M
7   2  H  1  D
8   2  H  0  D

I'd like to group by id and get, for each id, the value of B at the nth (say, second) occurrence of A == 'H' as agg_B1, and the value of B at the nth (say, first) occurrence of C == 'M' as agg_B2.

Desired output:
   id  agg_B1  agg_B2
0   1      5      4  
1   2      0      1

desired_output = df.groupby('id').agg(
      agg_B1= ('B',lambda x:x[df.loc[x.index].loc[df.A== 'H'][1]])
   ,  agg_B2= ('B',lambda x:x[df.loc[x.index].loc[df.C== 'M'][0]])

    ).reset_index()

TypeError: Indexing a Series with DataFrame is not supported, use the appropriate DataFrame column

Obviously, I'm doing something wrong with the indexing.

Edit: if possible, I'd like to use aggregate with a lambda function, because there are several other aggregate outputs that I'd like to extract at the same time.

jezrael · Accepted Answer

If you need GroupBy.agg, your solution can be changed like this:

desired_output = df.groupby('id').agg(
      agg_B1= ('B',lambda x:x[df.loc[x.index, 'A']== 'H'].iat[1]),
      agg_B2= ('B',lambda x:x[df.loc[x.index, 'C']== 'M'].iat[0])
    ).reset_index()


print (desired_output)
   id  agg_B1  agg_B2
0   1       5       4
1   2       0       1
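To see why the corrected lambda works, here is a minimal sketch that replays by hand what happens for the group id == 1. It assumes that agg passes each group's B values as a Series that keeps the original row labels, which is what the solution above relies on. The names x, mask, matches and second are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'A': ['H', 'H', 'E', 'E', 'H', 'E', 'E', 'H', 'H'],
                   'B': [4, 5, 2, 7, 6, 1, 3, 1, 0],
                   'C': ['M', 'D', 'M', 'D', 'M', 'M', 'M', 'D', 'D']})

# The B values of group id == 1, with their original row labels (0..4)
x = df.loc[df['id'] == 1, 'B']

# Boolean Series on the same row labels: True where A == 'H'
mask = df.loc[x.index, 'A'] == 'H'

# Boolean indexing keeps only the matching B values of this group
matches = x[mask]

# .iat is positional, so 1 means the second match
second = matches.iat[1]
print(matches.tolist(), second)
```

The original attempt failed because df.loc[x.index].loc[df.A == 'H'] is a whole DataFrame, not a boolean Series, so it cannot be used to index x.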

But if performance is important, or if you are not sure that a second value matching H always exists for the first condition, I suggest processing each condition separately and then joining the results to the other aggregated values:

#some sample aggregations
df0 = df.groupby('id').agg({'B':'sum', 'C':'last'})

df1 = df[df['A'].eq('H')].groupby("id")['B'].nth(1).rename('agg_B1')
df2 = df[df['C'].eq('M')].groupby("id")['B'].first().rename('agg_B2')

desired_output = pd.concat([df0, df1, df2], axis=1)
print (desired_output)
     B  C  agg_B1  agg_B2
id                       
1   24  M       5       4
2    5  D       0       1
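One caveat, as far as I know: starting with pandas 2.0, GroupBy.nth behaves as a filter and keeps the original row labels instead of the group keys, so the concat above may not line up on id in newer versions. A sketch that should not depend on that behavior numbers the matching rows per group with cumcount. The helper name nth_match and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'A': ['H', 'H', 'E', 'E', 'H', 'E', 'E', 'H', 'H'],
                   'B': [4, 5, 2, 7, 6, 1, 3, 1, 0],
                   'C': ['M', 'D', 'M', 'D', 'M', 'M', 'M', 'D', 'D']})

def nth_match(frame, mask, n, value_col='B', key='id'):
    """Hypothetical helper: value_col of the n-th (zero-based) row per key where mask is True."""
    hits = frame[mask]
    # cumcount numbers the rows of each group 0, 1, 2, ... in order of appearance
    picked = hits[hits.groupby(key).cumcount() == n]
    # Index by the group key so the result aligns with other per-group aggregates
    return picked.set_index(key)[value_col]

# some sample aggregations, as above
df0 = df.groupby('id').agg({'B': 'sum', 'C': 'last'})

agg_B1 = nth_match(df, df['A'].eq('H'), 1).rename('agg_B1')
agg_B2 = nth_match(df, df['C'].eq('M'), 0).rename('agg_B2')

result = pd.concat([df0, agg_B1, agg_B2], axis=1)
print(result)
```

Groups without an n-th match are simply absent from the helper's output, so the concat fills them with NaN.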

EDIT1: If you need GroupBy.agg, it is possible to catch the failed indexing and return a missing value instead:

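import numpy as np

#a second value exists in every group of the sample, so this works as before
def f1(x):
    try:
        return x[df.loc[x.index, 'A']== 'H'].iat[1]
    #.iat raises IndexError when the group has fewer than 2 matches
    except IndexError:
        return np.nan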

desired_output = df.groupby('id').agg(
      agg_B1= ('B',f1),
      agg_B2= ('B',lambda x:x[df.loc[x.index, 'C']== 'M'].iat[0])
    ).reset_index()

print (desired_output)
   id  agg_B1  agg_B2
0   1       5       4
1   2       0       1

#a third value does not exist for id 2, so NaN is returned for that group
def f1(x):
    try:
        return x[df.loc[x.index, 'A']== 'H'].iat[2]
    #.iat raises IndexError when the group has fewer than 3 matches
    except IndexError:
        return np.nan

desired_output = df.groupby('id').agg(
      agg_B1= ('B',f1),
      agg_B2= ('B',lambda x:x[df.loc[x.index, 'C']== 'M'].iat[0])
    ).reset_index()

print (desired_output)
   id  agg_B1  agg_B2
0   1     6.0       4
1   2     NaN       1

This works the same as:

df1 = df[df['A'].eq('H')].groupby("id")['B'].nth(2).rename('agg_B1')
df2 = df[df['C'].eq('M')].groupby("id")['B'].first().rename('agg_B2')

desired_output = pd.concat([df1, df2], axis=1)
print (desired_output)
    agg_B1  agg_B2
id                
1      6.0       4
2      NaN       1
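If several such conditional aggregates are needed inside one GroupBy.agg call, the try/except functions can be generalized. This is only a sketch: nth_where is a hypothetical factory (not a pandas function) that builds the aggregating function from a column, a value and a non-negative position n, and it checks the number of matches instead of catching the exception.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'A': ['H', 'H', 'E', 'E', 'H', 'E', 'E', 'H', 'H'],
                   'B': [4, 5, 2, 7, 6, 1, 3, 1, 0],
                   'C': ['M', 'D', 'M', 'D', 'M', 'M', 'M', 'D', 'D']})

def nth_where(frame, col, value, n):
    """Hypothetical factory: aggregator returning the n-th (zero-based) value where frame[col] == value."""
    def agg(x):
        # x holds one group's values and keeps the original row labels
        matches = x[frame.loc[x.index, col] == value]
        # Fall back to NaN when the group has fewer than n + 1 matches
        return matches.iat[n] if len(matches) > n else np.nan
    return agg

out = df.groupby('id').agg(
    agg_B1=('B', nth_where(df, 'A', 'H', 2)),
    agg_B2=('B', nth_where(df, 'C', 'M', 0)),
    B_sum=('B', 'sum'),   # any other aggregate can sit next to them
).reset_index()
print(out)
```

This keeps everything in a single agg call, at the cost of calling a Python function per group, so the filtered groupby approach is likely still faster on large data.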

sammywemmy · Answer

Filter for rows where A equals H, then grab the second row of each group with the nth function:

df.query("A=='H'").groupby("id").nth(1)

    A   B
id      
1   H   5
2   H   0

Python uses zero-based indexing, so the second row is nth(1).
