I have a large Dataframe that looks similar to this:
ID_Code Status1 Status2
0 A Done Not
1 A Done Done
2 B Not Not
3 B Not Done
4 C Not Not
5 C Not Not
6 C Done Done
What I want to do is calculate is for each of the set of duplicate ID codes, find out the percentage of Not-Not entries are present. (i.e. [# of Not-Not/# of total entries] * 100)
I'm struggling to do so using groupby and can't seem to get the right syntax to perform this.
I may have misunderstood the question, but you appear to be referring to when values of Status1 and Status2 are both Not, correct? If that's the case, you can do something like:
df.groupby('ID_Code').apply(lambda x: (x[['Status1','Status2']] == 'Not').all(1).sum()/len(x)*100)
ID_Code
A 0.000000
B 50.000000
C 66.666667
dtype: float64
IIUC using crosstab
pd.crosstab(df['ID_Code'],(df['Status1'].eq('Not'))&(df['Status2'].eq('Not')),normalize ='index')
Out[713]:
col_0 False True
ID_Code
A 1.000000 0.000000
B 0.500000 0.500000
C 0.333333 0.666667
#pd.crosstab(df['ID_Code'],(df['Status1'].eq('Not'))&(df['Status2'].eq('Not')),normalize ='index')[True]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With