Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Group by and calculation of other column. pandas

Tags:

python

pandas

I have summarized the data frame of col1 col2 col3 count, Add different weightage on that count

Data set is like


# Current result 
    col1 col2  col3   Count   Weightage_count
--------------------------------------------- 
 1:  A    S1   X110     2          2
 2:  A    S1   X150     2          0.5
 3:  A    S2   X212     2          1
 4:  A    S2   X200     1          0.5
 5:  A    S2   X211     1          0.25
 6:  B    S3   X311     4          4
 7:  C    S4   X222     3          1.5


data = {'Col1':['A','A','A','A','A','B','C'],
        'Col2':['S1','S1','S2','S2','S2','S3','S4'],
         'Col3':['X110','X150','X212','X200','X211','X311','X222'],
          'Count': [2,2,2,1,1,4,3],  
           'Weightage_count':[2, 0.5, 1, 0.5, 0.25, 4, 1.5]}

df = pd.DataFrame(data)

Want to calculate results based on col1 and col2.

  • Result = (Total Weightage_count By Col1 & Col2) / (Total Count by Col1 & Col2)

Expected result.

    Col1  Col2  Result
-------------------
1   A     S1     0.625
2   A     S2     0.5
3   B     S3     1 
4   C     S4     0.5
like image 383
Nirav Prajapati Avatar asked Jan 24 '26 05:01

Nirav Prajapati


1 Answers

First aggregate sum and then multiple columns in DataFrame.eval:

df = (df.groupby(['Col1','Col2'])
        .sum()
        .eval('Weightage_count / Count')
        .reset_index(name='Result'))
print (df)
  Col1 Col2  Result
0    A   S1  0.6250
1    A   S2  0.4375
2    B   S3  1.0000
3    C   S4  0.5000

Or divide by Series.div with DataFrame.pop for remove columns after processing:

df = df.groupby(['Col1','Col2'], as_index=False)[['Count','Weightage_count']].sum()
df['new'] = df.pop('Weightage_count').div(df.pop('Count'))
print (df)
  Col1 Col2     new
0    A   S1  0.6250
1    A   S2  0.4375
2    B   S3  1.0000
3    C   S4  0.5000

If need also columns:

df = df.groupby(['Col1','Col2'])[['Count','Weightage_count']].sum()
df['new'] = df['Weightage_count'].div(df['Count'])
print (df)
           Count  Weightage_count     new
Col1 Col2                                
A    S1        4             2.50  0.6250
     S2        4             1.75  0.4375
B    S3        4             4.00  1.0000
C    S4        3             1.50  0.5000
like image 145
jezrael Avatar answered Jan 25 '26 19:01

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!