I have trained a binary classifier, but I think that my ROC curve is incorrect.
This is the vector that contains labels:
y_true= [0, 1, 1, 1, 0, 1, 0, 1, 0]
and the second vector is the score vector
y_score= [
0.43031937, 0.09115553, 0.00650781, 0.02242869, 0.38608587,
0.09407699, 0.40521139, 0.08062053, 0.37445426
]
When I plot my ROC curve, I get the following:
I think the code is correct, but I don't understand why I'm getting this curve, why the fpr, tpr, and threshold arrays all have length 4, and why my AUC is equal to zero.
fpr [0. 0.25 1. 1. ]
tpr [0. 0. 0. 1.]
threshold [1.43031937 0.43031937 0.37445426 0.00650781]
My Code:
import sklearn.metrics as metrics
fpr, tpr, threshold = metrics.roc_curve(y_true, y_score)
roc_auc = metrics.auc(fpr, tpr)
# method I: plt
import matplotlib.pyplot as plt
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
One thing to keep in mind about AUC is that what's really important is distance from 0.5. If you have a really low AUC, that just means that your "positive" and "negative" labels are switched.
Looking at your scores, it's clear that a low score (anything less than ~0.095) means a 1 and anything above that threshold is a 0. So you actually have a great binary classifier!
The problem is that by default, higher scores are associated with the label 1. So you're labeling points with high scores as 1's instead of 0's. Thus you're wrong 100% of the time. In that case, just switch your predictions and you'll be correct 100% of the time.
The simple fix is to use the pos_label argument to sklearn.metrics.roc_curve. In this case, you want your positive label to be 0.
fpr, tpr, threshold = metrics.roc_curve(y_true, y_score, pos_label=0)
roc_auc = metrics.auc(fpr, tpr)
print(roc_auc)
#1.0
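An equivalent fix, sketched here using the same y_true and y_score vectors from the question, is to flip the scores themselves so that high scores go with label 1. Negating each score (or using 1 - s) reverses the ranking, which has the same effect as changing the positive label:

```python
import sklearn.metrics as metrics

y_true = [0, 1, 1, 1, 0, 1, 0, 1, 0]
y_score = [0.43031937, 0.09115553, 0.00650781, 0.02242869, 0.38608587,
           0.09407699, 0.40521139, 0.08062053, 0.37445426]

# Flip the scores: low original scores (the 1s) become high scores
flipped = [1 - s for s in y_score]
fpr, tpr, threshold = metrics.roc_curve(y_true, flipped)
roc_auc = metrics.auc(fpr, tpr)
print(roc_auc)
# 1.0
```

Either approach works; pos_label is cleaner when only the evaluation needs to change, while flipping the scores also fixes any downstream code that thresholds them.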
What @pault stated is misleading:

If you have a really low AUC, that just means that your "positive" and "negative" labels are switched.

A low AUC by itself does not guarantee that. AUC=0 implies that every positive example is scored below every negative one, i.e. the ranking is perfectly inverted, so swapping the labels yields a perfect classifier. AUC=1 implies that there is a threshold that can perfectly separate the data.
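To make this concrete, here is a minimal sketch reusing the y_true and y_score vectors from the question. In that data every positive is scored below every negative, so the AUC is exactly 0, and negating the scores (a perfect inversion of the ranking) gives exactly 1:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 1, 1, 0, 1, 0, 1, 0]
y_score = [0.43031937, 0.09115553, 0.00650781, 0.02242869, 0.38608587,
           0.09407699, 0.40521139, 0.08062053, 0.37445426]

# Every positive is ranked below every negative: perfectly inverted ranking
auc = roc_auc_score(y_true, y_score)
print(auc)
# 0.0

# Negating the scores reverses the ranking, giving a perfect separation
auc_flipped = roc_auc_score(y_true, [-s for s in y_score])
print(auc_flipped)
# 1.0
```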