I'm running XGBoost's XGBRegressor in Python on a dataset that looks like this:
click_id | manufacturer | category | delivery_time | price | revenue
1        | 10           | 100      | 24            | 100   | 0
2        | 10           | 100      | 24            | 100   | 0
3        | 10           | 100      | 24            | 100   | 0
4        | 10           | 100      | 24            | 100   | 120
5        | 20           | 200      | 48            | 200   | 0
Revenue is the dependent variable and the remaining variables are features.
When I run XGBRegressor and set eval_metric to "mae" (mean absolute error), the training and validation errors increase steadily. How can the training error increase? Is there any case (any combination of model parameters or weird data points) that might cause the XGBoost training error to increase?
This is the code:
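from xgboost import XGBRegressor

# No objective is passed, so XGBRegressor uses its default objective.
model = XGBRegressor(
    learning_rate=0.1,
    n_estimators=200,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.9,
    colsample_bytree=0.9,
    reg_alpha=10,
    nthread=4)
# eval_metric only controls what is reported for the eval sets, not what is optimized.
model.fit(X_train, y_train,
          eval_set=[(X_train, y_train), (X_test, y_test)], eval_metric='mae')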
When eval_metric is set to "rmse", the training error decreases as expected.
You have to distinguish between minimizing the objective and the error on the evaluation sets (calculated by the eval_metric). These two can differ, and that is the reason for the increasing error on your evaluation sets.
XGBoost in your setting is trying to minimize the squared error (and therefore the RMSE), because objective="reg:linear" is the default objective of XGBRegressor (later releases call it "reg:squarederror"). In fact, at the time of writing XGBoost does not even support mean absolute error (MAE) as an objective function, although later releases added "reg:absoluteerror". Have a look at the XGBoost objective parameter for details. One reason MAE is not implemented as an objective might be that the algorithm needs a non-zero second-order derivative, and the second derivative of MAE is zero wherever it is defined.
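To make the difference concrete, here is a small sketch in plain Python (no XGBoost involved). It assumes a hypothetical revenue column that is mostly zero with one large value, and it assumes the model starts from a constant prediction of 0.5 and moves toward the mean, which is where squared error is minimized. The helper names `rmse` and `mae` are only for illustration.

```python
import math


def rmse(y_true, pred):
    """Root mean squared error of a constant prediction."""
    return math.sqrt(sum((y - pred) ** 2 for y in y_true) / len(y_true))


def mae(y_true, pred):
    """Mean absolute error of a constant prediction."""
    return sum(abs(y - pred) for y in y_true) / len(y_true)


# Hypothetical target: mostly zeros with a single large value.
revenue = [0, 0, 0, 120, 0]

# Constant predictions moving from an assumed start of 0.5 toward the
# mean of the target (24), the minimizer of squared error.
for pred in [0.5, 6, 12, 18, 24]:
    print(f"pred={pred:5.1f}  rmse={rmse(revenue, pred):6.2f}  "
          f"mae={mae(revenue, pred):6.2f}")
```

On this toy data the RMSE falls from about 53.44 to 48.00 while the MAE rises from 24.30 to 38.40. MAE is minimized at the median (0 here) rather than the mean, so every step that improves the squared-error objective makes the reported MAE worse.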
Using XGBoost's train function (see here), you can define your own objective by supplying a function that calculates the gradient and Hessian (first- and second-order derivatives) of the error function. Have a look at this example for details.
I tried to implement MAE myself by setting the Hessian to a constant but small value. Unfortunately, it converged very slowly. It may still work with your data.
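A minimal sketch of the constant-Hessian idea described above, not the exact code I used. The names `make_mae_objective`, `train_with_mae` and `hess_const` are hypothetical, and the parameter values are simply copied from the question. It relies on `xgboost.train` accepting a custom objective through its `obj` argument, called as `obj(preds, dtrain)` and returning the gradient and Hessian.

```python
import numpy as np


def make_mae_objective(hess_const=1.0):
    """Build an MAE-like objective; hess_const is an assumed tuning knob."""
    def mae_objective(preds, dtrain):
        labels = dtrain.get_label()
        # Gradient of |pred - y| with respect to pred is sign(pred - y).
        grad = np.sign(preds - labels)
        # The true second derivative is zero, so substitute a constant.
        hess = np.full_like(preds, hess_const, dtype=float)
        return grad, hess
    return mae_objective


def train_with_mae(X_train, y_train, X_test, y_test, num_boost_round=200):
    """Train with the custom objective while still reporting MAE."""
    import xgboost as xgb  # imported here so the objective is usable alone

    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    params = {"eta": 0.1, "max_depth": 5, "eval_metric": "mae"}
    history = {}  # filled with the per-round MAE for each eval set
    booster = xgb.train(
        params,
        dtrain,
        num_boost_round=num_boost_round,
        evals=[(dtrain, "train"), (dtest, "test")],
        obj=make_mae_objective(),
        evals_result=history,
    )
    return booster, history
```

The constant matters more than it looks: leaf values are computed from the ratio of summed gradients to summed Hessians, and min_child_weight is a threshold on the summed Hessian in a leaf, so a very small constant can both enlarge the steps and block splits. It is worth trying a few values.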