Test xgboost modeling engine #31

Open
dfsnow opened this issue Nov 2, 2023 · 4 comments

@dfsnow
Member

dfsnow commented Nov 2, 2023

The Data Department recently performed some model benchmarking (ccao-data/report-model-benchmark) comparing the run times of XGBoost and LightGBM. We found that the current iteration of XGBoost runs much faster than LightGBM on most machines, while achieving similar performance.

We should test replacing LightGBM as the primary modeling engine in both models.

LightGBM

Pros

  • Native categorical support (easier feature engineering + clean SHAP values)
  • Better-maintained R package
  • Already have bindings for advanced features (via Lightsnip)
  • Slightly better performance for our data

Cons

  • Slightly slower for general training (as of XGBoost 2.0.0)
  • Massively slower for calculating SHAP values (a full order of magnitude)
  • Backend code seems much buggier
  • GPU support is lacking (+ hard to build for the R package)
  • Approximately 50,000 hyperparameters

XGBoost

Pros

  • Well-maintained codebase, will definitely exist in perpetuity
  • Excellent GPU and multi-core training support; calculates SHAP values very quickly
  • More widely used than LightGBM

Cons

  • No native categorical support in the R package, even though the underlying XGBoost C++ supports it. Unlikely to change by the time we need to ship the 2024 model
  • R package support seems lacking
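
To make the categorical trade-off above concrete: without native categorical support, each categorical column has to be expanded before training (one-hot encoding is one common option), and per-feature attributions such as SHAP values then come back split across the dummy columns. Below is a minimal sketch in plain Python of that extra bookkeeping. The column names (`sqft`, `roof`), the helper names, and the attribution numbers are all hypothetical and made up for illustration; none of them come from the actual model or from either library's API.

```python
from collections import defaultdict


def one_hot(rows, column):
    """Expand one categorical column into 0/1 dummy columns named '<column>=<level>'."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged, then add one dummy per level.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}={level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


def collapse_attributions(attributions, sep="="):
    """Sum per-dummy attributions back onto the original feature name."""
    totals = defaultdict(float)
    for name, value in attributions.items():
        totals[name.split(sep, 1)[0]] += value
    return dict(totals)


# Hypothetical toy data, not real assessment features.
rows = [
    {"sqft": 1200, "roof": "shingle"},
    {"sqft": 1800, "roof": "slate"},
    {"sqft": 950, "roof": "tile"},
]
encoded = one_hot(rows, "roof")

# Made-up per-column attributions for a single prediction (illustration only).
dummy_attributions = {
    "sqft": 0.5,
    "roof=shingle": -0.25,
    "roof=slate": 0.125,
    "roof=tile": 0.0,
}
collapsed = collapse_attributions(dummy_attributions)
```

With native categorical handling, neither helper is needed: the model sees `roof` as a single feature and reports a single attribution for it, which is the "clean SHAP values" point in the LightGBM pros list.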
@dfsnow dfsnow added this to the 2024 model changes milestone Dec 5, 2023
@dfsnow dfsnow added the method ML technique or method change label Dec 5, 2023
@dfsnow
Member Author

dfsnow commented Jan 24, 2024

Definitely not going to happen this year. The XGBoost R package is in heavy development right now (and still doesn't have native categorical support like the Python package does). May be worth picking up in the spring. Performance between the two engines was extremely similar in untuned benchmarking.

@dfsnow dfsnow removed this from the 2024 model changes milestone Jan 24, 2024
@dfsnow dfsnow self-assigned this Dec 6, 2024
@dfsnow
Member Author

dfsnow commented Dec 6, 2024

There's been a lot of work on this in an issue in the XGBoost repo. It might be worth testing the new interface to see whether switching makes sense.

@ssaurbier

I am rebuilding the repo in Python now - stay tuned!

@dfsnow
Member Author

dfsnow commented Dec 22, 2024

@ssaurbier As much as we'd love a Python rewrite of this model (see #230), it would probably be more helpful to get your eyes on the condo model. Obviously your time is your own; just trying to point you toward the areas of highest need.
