The Data Department recently performed some model benchmarking (ccao-data/report-model-benchmark) comparing the run times of XGBoost and LightGBM. We found that the current iteration of XGBoost runs much faster than LightGBM on most machines, while achieving similar performance.
We should test replacing LightGBM as the primary modeling engine in both models.
LightGBM
Pros
Native categorical support (easier feature engineering + clean SHAP values)
Better maintained R package
Already have bindings for advanced features (via Lightsnip)
Slightly better performance for our data
Cons
Slightly slower for general training (as of XGBoost 2.0.0)
Massively slower for calculating SHAP values (a full order of magnitude)
Backend code seems much buggier
GPU support is lacking (+ hard to build for the R package)
Approximately 50,000 hyperparameters
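A minimal sketch of why native categorical support keeps SHAP values clean. LightGBM consumes a categorical as a single integer-coded column, so all of a feature's attribution lands on one column instead of being smeared across dummy indicators. The example below only illustrates the integer coding itself (using pandas; the `town` column and its values are hypothetical, not from our model):

```python
import pandas as pd

# A toy categorical feature. pandas assigns each level an integer code
# (alphabetical by default); engines with native categorical support
# train on this single coded column rather than on one-hot expansions.
town = pd.Categorical(["Evanston", "Cicero", "Evanston", "Skokie"])

print(list(town.categories))  # ['Cicero', 'Evanston', 'Skokie']
print(list(town.codes))       # [1, 0, 1, 2]
```

Because the feature stays a single column, a SHAP contribution for `town` is one number per row, rather than a sum over `town_Cicero`, `town_Evanston`, etc.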
XGBoost
Pros
Well-maintained codebase, will definitely exist in perpetuity
Excellent GPU and multi-core training support. Calculates SHAPs very quickly
More widely used than LightGBM
Cons
No native categorical support in the R package, even though the underlying XGBoost C++ supports it. Unlikely to change by the time we need to ship the 2024 model
R package support seems lacking
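Without native categorical support, the practical workaround is to one-hot encode categoricals before building the feature matrix, which inflates dimensionality and splits SHAP attributions across dummy columns. A minimal sketch of that preprocessing step (shown in Python with pandas for illustration; the column names are hypothetical):

```python
import pandas as pd

# Hypothetical toy frame with one categorical and one numeric feature.
df = pd.DataFrame({
    "town": pd.Categorical(["Evanston", "Cicero", "Evanston"]),
    "sqft": [1200, 900, 1500],
})

# The one-hot expansion an engine without categorical support requires:
# each level of `town` becomes its own indicator column.
encoded = pd.get_dummies(df, columns=["town"])

print(sorted(encoded.columns))  # ['sqft', 'town_Cicero', 'town_Evanston']
```

Every added level means another column, which is part of why native categorical handling simplifies feature engineering.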
Definitely not going to happen this year. The XGBoost R package is in heavy development right now (and still doesn't have native categorical support like the Python package does). May be worth picking up in the spring. Performance between the two engines was extremely similar in untuned benchmarking.
@ssaurbier As much as we'd love a Python rewrite of this model (see #230), it would probably be more helpful to get your eyes on the condo model. Obviously your time is your own; just trying to point you toward the areas of highest need.