This is my dataFrame
df = pd.DataFrame([['@1','A',40],['@2','A',60],['@3','A',47],['@4','B',33],['@5','B',69],['@6','B',22],['@7','B',90],['@8
','C',31],['@9','C',78],['@10','C',12],['@11','C',89],['@12','C',88],['@13','C',99]],columns=['id','channel','score'])
     id channel  score
0    @1       A     40
1    @2       A     60
2    @3       A     47
3    @4       B     33
4    @5       B     69
5    @6       B     22
6    @7       B     90
7    @8       C     31
8    @9       C     78
9   @10       C     12
10  @11       C     89
11  @12       C     88
12  @13       C     99
Each channel has its own total number , I set a percent number = 80%
and I want to take int(channel'num * 0.8) nlargest , so it's will be
A channel take int(3*0.8) = 2
B channel take int(4*0.8) = 3
C channel take int(6*0.8) = 4
     id channel  score
1    @2       A     60
2    @3       A     47
3    @4       B     33
4    @5       B     69
6    @7       B     90
8    @9       C     78
10  @11       C     89
11  @12       C     88
12  @13       C     99
How can I do that , thanks.
Use groupby with nlargest:
a = 0.8
df1 = (df.groupby('channel',group_keys=False)
        .apply(lambda x: x.nlargest(int(len(x) * a), 'score')))
print (df1)     
     id channel  score
1    @2       A     60
2    @3       A     47
6    @7       B     90
4    @5       B     69
3    @4       B     33
12  @13       C     99
10  @11       C     89
11  @12       C     88
8    @9       C     78
Another solution with sort_values + groupby + head:
df1 = (df.sort_values('score', ascending=False)
        .groupby('channel',group_keys=False)
        .apply(lambda x: x.head(int(len(x) * a)))
        .reset_index(drop=True))
print (df1)
    id channel  score
0   @2       A     60
1   @3       A     47
2   @7       B     90
3   @5       B     69
4   @4       B     33
5  @13       C     99
6  @11       C     89
7  @12       C     88
8   @9       C     78
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With