Pandas comparing dataframe with student results against historic quantiles

Question

I have two dataframes. The first shows student results on two tests, by class:

import pandas as pd

results = pd.DataFrame({
    'id': [1, 2, 3],
    'class': [1, 1, 2],
    'test_1': [0.67, 0.88, 0.33],
    'test_2': [0.76, 0.63, 0.78]})
results

   id  class  test_1  test_2
0   1      1    0.67    0.76
1   2      1    0.88    0.63
2   3      2    0.33    0.78

The other shows quantiles by class and test, based on previous semesters:

quantiles = pd.DataFrame({'class':[1,2],
'test_1_0.25':[0.23,0.31],
'test_1_0.5':[0.54,0.67],
'test_1_0.75':[0.8,0.9],
'test_2_0.25':[0.23,0.31],
'test_2_0.5':[0.54,0.67],
'test_2_0.75':[0.8,0.9]})

  class  test_1_0.25  test_1_0.5  test_1_0.75  test_2_0.25  test_2_0.5  \
0      1         0.23        0.54          0.8         0.23        0.54   
1      2         0.31        0.67          0.9         0.31        0.67

   test_2_0.75  
0          0.8  
1          0.9

I would like to return a dataframe that tells me which quantile each student places in: 0 if they are below the 25th percentile, 1 if below the 50th, 2 if below the 75th, and 3 if above the 75th. So the output would look like this:

   id  test_1_quantile  test_2_quantile  
0   1                2                2   
1   2                3                1   
2   3                1                2

Any help is much appreciated. Thanks

jezrael · Accepted Answer

First combine both DataFrames with DataFrame.merge, then loop over the test columns. For each test, select its columns with DataFrame.filter, insert a column of zeros to act as the lower bound below the 0.25 quantile, rename the threshold columns to the output labels 0 to 3, and compare them with the score using DataFrame.lt. Finally, reverse the column order with iloc and use idxmax to get the label of the first True value, which replaces the test column:

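# attach each student's class quantiles to their results
df = pd.merge(results, quantiles, on='class')

for t in results.columns.difference(['id','class']):
    # score column plus its quantile columns, e.g. test_1, test_1_0.25, ...
    df1 = df.filter(like=t)
    # zero column acts as the lower bound for quantile 0
    df1.insert(1, t + '_0', 0)
    # thresholds are relabelled 0..3, the score keeps the test name
    df1.columns = [t] + list(range(4))
    # highest threshold that is strictly below the score
    a = df1.iloc[:, 1:].lt(df1[t], axis=0).iloc[:, ::-1].idxmax(axis=1)
    df[t] = a

print(df[results.columns])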
   id  class  test_1  test_2
0   1      1       2       2
1   2      1       3       2
2   3      2       1       2
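
As a possible alternative to the idxmax approach above, the quantile label can also be read as the number of thresholds that lie strictly below the score. The sketch below is not part of the accepted answer; it assumes the three quantile columns for each test are named exactly as in the question and increase from left to right. The names `out`, `tests` and `cuts` are made up for illustration.

```python
import pandas as pd

results = pd.DataFrame({
    'id': [1, 2, 3],
    'class': [1, 1, 2],
    'test_1': [0.67, 0.88, 0.33],
    'test_2': [0.76, 0.63, 0.78]})

quantiles = pd.DataFrame({
    'class': [1, 2],
    'test_1_0.25': [0.23, 0.31],
    'test_1_0.5': [0.54, 0.67],
    'test_1_0.75': [0.8, 0.9],
    'test_2_0.25': [0.23, 0.31],
    'test_2_0.5': [0.54, 0.67],
    'test_2_0.75': [0.8, 0.9]})

# attach each student's class thresholds to their scores
df = results.merge(quantiles, on='class')

tests = ['test_1', 'test_2']      # assumed list of test columns
out = df[['id']].copy()

for t in tests:
    # the three threshold columns for this test (names assumed from the question)
    cuts = df[[t + '_0.25', t + '_0.5', t + '_0.75']]
    # count how many thresholds are strictly below the score: 0, 1, 2 or 3
    out[t + '_quantile'] = cuts.lt(df[t], axis=0).sum(axis=1)

print(out)
```

Like the loop in the answer, this uses a strict comparison, so a score exactly equal to a threshold falls into the lower group. It keeps the test scores untouched and writes the labels to new `*_quantile` columns, which matches the column names asked for in the question.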

Tags: python, pandas

L Xandor

1 Answers

jezrael
