I'm following a tutorial on DataCamp, and one thing it mentions is using your GPU for faster training. It even goes so far as to say it is "blazing fast".
I, however, am seeing the opposite. For the code block below, with 10k boosting rounds, I see ~30 seconds with "hist" passed in my params versus just over a minute with "gpu_hist".
GPU utilization caps out at 40% when using "gpu_hist", while all 24 logical cores sit at 100% when using "hist".
params = {"objective": "reg:squarederror", "tree_method": "gpu_hist", "subsample": 0.8,
"colsample_bytree": 0.8}
evals = [(dtrain_reg, "train"),(dtest_reg, "validation")]
n = 10000
model = xgb.train(
params=params,
dtrain=dtrain_reg,
num_boost_round=n,
evals=evals,
verbose_eval=50,
)
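For reference, here is a minimal sketch of how I compare the two configurations (time.perf_counter is just one way to measure; dtrain_reg, dtest_reg, and evals are the objects from the tutorial code above):

import time

for method in ("hist", "gpu_hist"):
    params["tree_method"] = method
    start = time.perf_counter()
    xgb.train(
        params=params,
        dtrain=dtrain_reg,
        num_boost_round=n,
        evals=evals,
        verbose_eval=False,
    )
    print(f"{method}: {time.perf_counter() - start:.1f} s")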
I'm running this in a Jupyter notebook in VS Code.
I've asked ChatGPT, searched online, and even asked a mentor in the course I'm taking, but I have not been able to diagnose why it takes so much longer with "gpu_hist" than with plain "hist".
There is a similar question here on Stack Overflow from 4 months ago that has zero responses.
It's been a while, but a reply is probably still warranted since I spent some time figuring this out. Try this code:
from sklearn.datasets import fetch_california_housing
import xgboost as xgb

# Fetch dataset using sklearn
data = fetch_california_housing()
X = data.data
y = data.target

num_round = 1000
param = {
    "eta": 0.05,
    "max_depth": 10,
    "tree_method": "hist",
    "device": "cpu",  # switch between "cpu" and "cuda"
    "nthread": 24,    # number of CPU threads to use
    "seed": 42,
}

# GPU-accelerated or CPU multicore training
dtrain = xgb.DMatrix(X, label=y, feature_names=data.feature_names)
model = xgb.train(param, dtrain, num_round)
On 24 threads I am getting:
CPU times: user 1min 9s, sys: 43.7 ms, total: 1min 9s; Wall time: 2.95 s
On 32 threads I am getting:
CPU times: user 1min 40s, sys: 33.8 ms, total: 1min 40s; Wall time: 3.19 s
With a GPU I am getting:
CPU times: user 6.47 s, sys: 9.98 ms, total: 6.48 s; Wall time: 5.96 s
XGBoost training does not scale particularly well with increasing parallelism, and sending it to a GPU can actually increase wall-clock training time, as the numbers above show.
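If you want to reproduce the comparison, a minimal sketch along these lines works, reusing param, dtrain, and num_round from above (timed_run is a hypothetical helper; "cuda" assumes XGBoost >= 2.0 with a CUDA-enabled build):

import time

# Hypothetical helper: time one training run for a given device / thread count
def timed_run(device, nthread):
    run_param = dict(param, device=device, nthread=nthread)
    start = time.perf_counter()
    xgb.train(run_param, dtrain, num_round)
    return time.perf_counter() - start

for device, nthread in [("cpu", 24), ("cpu", 32), ("cuda", 24)]:
    print(f"{device}, {nthread} threads: {timed_run(device, nthread):.2f} s")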
On the other hand, if you need to compute feature importance with SHAP:

import shap  # optional here: pred_contribs computes the SHAP values natively
model.set_param({"device": "gpu"})  # "cpu" or "gpu"
shap_values = model.predict(dtrain, pred_contribs=True)
On 32 threads I am getting:
CPU times: user 43min 43s, sys: 54.2 ms, total: 43min 43s; Wall time: 1min 23s
With a GPU I am getting:
CPU times: user 3.06 s, sys: 28 ms, total: 3.09 s; Wall time: 3.09 s
So SHAP computation is highly accelerated on a GPU.
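If it helps, the contribution matrix can then be turned into a simple global importance ranking; a minimal sketch (the last column returned by pred_contribs is the bias term, so it is dropped here):

import numpy as np

# shap_values has shape (n_samples, n_features + 1); the last column is the bias term
mean_abs_shap = np.abs(shap_values[:, :-1]).mean(axis=0)
for name, score in sorted(zip(data.feature_names, mean_abs_shap), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")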
There are more examples of XGBoost on CPU vs. GPU here.
The GPUTreeShap paper is here.
All timings are on an NVIDIA GeForce RTX 3090 and an AMD Ryzen 9 5950X (16 cores / 32 threads).