I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script.
I've simplified the function to take numpy arrays, and generated y_hat and y_true which are a sample of the values used in the script.
Here is a simplified example:
import numpy as np

def loglikelihoodloss(y_hat, y_true):
    # sigmoid of the raw prediction
    prob = 1.0 / (1.0 + np.exp(-y_hat))
    # gradient and hessian as computed in the xgboost sample script
    grad = prob - y_true
    hess = prob * (1.0 - prob)
    return grad, hess

y_hat = np.array([1.80087972, -1.82414818, -1.82414818, 1.80087972, -2.08465433,
                  -1.82414818, -1.82414818, 1.80087972, -1.82414818, -1.82414818])
y_true = np.array([1., 0., 0., 1., 0., 0., 0., 1., 0., 0.])

loglikelihoodloss(y_hat, y_true)
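The log loss function is the sum of $-\left[y \ln p + (1 - y) \ln(1 - p)\right]$ where $p = \frac{1}{1 + e^{-x}}$.
The gradient (with respect to p) is then $\frac{p - y}{p(1 - p)}$, however in the code it is $p - y$.
Likewise the second derivative (with respect to p) is $\frac{y}{p^2} + \frac{1 - y}{(1 - p)^2}$, however in the code it is $p(1 - p)$.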
How are the equations equal?
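The log loss function, written here up to sign as the log-likelihood of a single example, is given as:

$$\ell = y \ln p + (1 - y) \ln(1 - p)$$

where

$$p = \frac{1}{1 + e^{-x}}$$

and x is the raw score (y_hat in the code), not the probability. Taking the partial derivative with respect to x rather than p, and using $\frac{dp}{dx} = p(1 - p)$, we get the gradient as

$$\frac{\partial \ell}{\partial x} = \left(\frac{y}{p} - \frac{1 - y}{1 - p}\right) p(1 - p) = y - p$$

Thus the negative of the gradient, which is the gradient of the log loss itself, is p - y.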
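The p - y result can also be sanity-checked numerically. The sketch below assumes that x is the raw score (y_hat in the question's code) and that the loss is the per-example log loss. The helper names sigmoid and logloss are illustrative and not part of xgboost. It multiplies the derivative with respect to p by dp/dx, then compares the product against p - y and against a central finite difference in x.

```python
import numpy as np

def sigmoid(x):
    # Map raw scores (log-odds) to probabilities.
    return 1.0 / (1.0 + np.exp(-x))

def logloss(x, y):
    # Per-example log loss, written as a function of the raw score x.
    p = sigmoid(x)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Three (y_hat, y_true) pairs taken from the question's sample arrays.
x = np.array([1.80087972, -1.82414818, -2.08465433])
y = np.array([1.0, 0.0, 0.0])

p = sigmoid(x)

# Derivative of the log loss with respect to p.
dL_dp = (p - y) / (p * (1.0 - p))
# Derivative of the sigmoid with respect to x.
dp_dx = p * (1.0 - p)
# Chain rule: dL/dx = dL/dp * dp/dx.
grad_chain = dL_dp * dp_dx

# Central finite difference of the loss with respect to x.
eps = 1e-6
grad_fd = (logloss(x + eps, y) - logloss(x - eps, y)) / (2.0 * eps)

print(np.allclose(grad_chain, p - y))
print(np.allclose(grad_fd, p - y, atol=1e-6))
```

Both comparisons should agree, which suggests the code differentiates with respect to the raw score rather than the probability.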
Similar calculations can be done to obtain the hessian.
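Spelled out, the Hessian calculation might look like the following. It assumes, as above, that L denotes the per-example log loss, x the raw score (y_hat in the code), and p the sigmoid of x:

```latex
\begin{aligned}
\frac{\partial L}{\partial x} &= p - y, \qquad \frac{dp}{dx} = p(1 - p) \\
\frac{\partial^2 L}{\partial x^2} &= \frac{\partial}{\partial x}\,(p - y) = \frac{dp}{dx} = p(1 - p)
\end{aligned}
```

Since y does not depend on x, only the p term contributes, which matches hess = prob * (1.0 - prob) in the code.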