I have a binary classification task and use the xgboost package to solve it. Basically, I just use boosted trees to do so. But I am being evaluated on the Brier score, so I thought I would optimize the Brier loss directly (defined as the Brier score applied on top of the logistic transformation), which led me to define the gradient and the Hessian of the Brier loss like so:
def brier(preds, dtrain):
    labels = dtrain.get_label()
    # preds are raw scores; map them to probabilities with the logistic function
    preds = 1.0 / (1.0 + np.exp(-preds))
    # gradient and hessian of (sigmoid(score) - label)**2 with respect to the raw score
    grad = 2*(preds-labels)*preds*(1-preds)
    hess = 2*(2*(labels+1)*preds-labels-3*preds*preds)*preds*(1-preds)
    return grad, hess

def evalerror(preds, dtrain):
    # custom metric: mean squared error between labels and predicted probabilities
    preds = 1.0 / (1.0 + np.exp(-preds))
    labels = dtrain.get_label()
    errors = (labels - preds)**2
    return 'brier-error', float(np.sum(errors)) / len(labels)
param = {'eta': 0.01,
         'max_depth': 6,  # the maximum depth of each tree
         # 'objective': 'binary:logistic',
         'booster': 'gbtree',
         'eval_metric': ['rmse', 'auc']}
bst = xgb.train(param, dtrain, num_boost_round=999, early_stopping_rounds=10,
                obj=brier, feval=evalerror,
                evals=[(dtrain, 'train'), (dtest, 'test')])
The only problem is that by doing so, I get negative values for my predictions on the test set, which suggests that the output of the xgboost model is not the logistic probability I expected. Does anyone know what I am missing here, or whether there is a better way to optimize the Brier score?
Any help would be really appreciated!!
Thanks,
I came across the same issue and investigated it a little bit. I think OP's calculations are correct, and the issue here is not about using the diagonal approximation instead of the exact Hessian as suggested by @Damodar8, since that remark refers to the multi-class classification problem.
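As a quick sanity check (this snippet is my own addition, not part of the original post), the analytic gradient and Hessian above can be compared against finite differences of the Brier loss applied to the sigmoid of the margin:

import numpy as np

def brier_loss(margin, label):
    # Brier loss as a function of the raw margin score
    p = 1.0 / (1.0 + np.exp(-margin))
    return (p - label) ** 2

rng = np.random.RandomState(0)
margin = rng.randn(5)
label = rng.randint(0, 2, size=5).astype(float)

# analytic gradient and hessian, same formulas as in the question
p = 1.0 / (1.0 + np.exp(-margin))
grad = 2 * (p - label) * p * (1 - p)
hess = 2 * (2 * (label + 1) * p - label - 3 * p * p) * p * (1 - p)

# central finite differences of the loss with respect to the margin
eps = 1e-4
grad_fd = (brier_loss(margin + eps, label) - brier_loss(margin - eps, label)) / (2 * eps)
hess_fd = (brier_loss(margin + eps, label) - 2 * brier_loss(margin, label)
           + brier_loss(margin - eps, label)) / eps ** 2

print(np.allclose(grad, grad_fd, atol=1e-6), np.allclose(hess, hess_fd, atol=1e-4))
# both comparisons should print True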
As pointed out here:
NOTE: when you do customized loss function, the default prediction value is margin. this may make builtin evaluation metric not function properly for example, we are doing logistic loss, the prediction is score before logistic transformation the builtin evaluation error assumes input is after logistic transformation Take this in mind when you use the customization, and maybe you need write customized evaluation function
Although the comment itself is quite hard to unravel, the sentence "the prediction is score before logistic transformation" explains OP's issue. The solution is simply to apply the logistic transformation to the bst.predict results. Full example below:
import numpy as np
import xgboost as xgb
dtrain = xgb.DMatrix('/home/kuba/Desktop/agaricus.txt.train')
dtest = xgb.DMatrix('/home/kuba/Desktop/agaricus.txt.test')
def brier(preds, dtrain):
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))
    grad = 2*(preds-labels)*preds*(1-preds)
    hess = 2*(2*(labels+1)*preds-labels-3*preds*preds)*preds*(1-preds)
    return grad, hess

def evalerror(preds, dtrain):
    preds = 1.0 / (1.0 + np.exp(-preds))
    labels = dtrain.get_label()
    errors = (labels - preds)**2
    return 'brier-error', float(np.sum(errors)) / len(labels)
param = {'max_depth': 2, 'eta': 1, 'silent': 1}
watchlist = [(dtest, 'eval'), (dtrain, 'train')]
num_round = 2
bst = xgb.train(param, dtrain, num_round, watchlist, obj=brier, feval=evalerror)
pred = bst.predict(dtest)
pred.min(), pred.max()
# (-5.809054, 2.2280416)
prob = 1 / (1 + np.exp(-pred))
prob.min(), prob.max()
# (0.0029912924, 0.9027395)
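If you want the Brier score itself on the test set, you can compute it from these transformed probabilities, e.g. with scikit-learn (an extra dependency, used here only for illustration):

from sklearn.metrics import brier_score_loss

# Brier score of the sigmoid-transformed predictions against the test labels
print(brier_score_loss(dtest.get_label(), prob))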
I think you might want to look at the following paper: https://arxiv.org/pdf/1610.02757.pdf. To quote the authors: "Notice that XGBoost does not work with the exact hessian but with its diagonal approximation."