Could you please provide some clarification on the differences and/or how to choose between using xgboost_ray.train + xgboost_ray.RayDMatrix or ray.train.xgboost.XGBoostTrainer + ray.data.Dataset?
My use case is running Ray Tune on Azure Databricks, which operates on Spark. According to the Databricks docs, one creates a Ray Cluster using the Ray on Spark API, and creates a Ray Dataset from Parquet files.
Below are the questions I would like clarification on. Any help you could provide would be greatly appreciated.
Data
According to the README.md (xgboost_ray/README.md, lines 450 to 465 at commit e904925), one can create a RayDMatrix from either Parquet files or a Ray Dataset. So if using xgboost_ray, should I:
Create a Ray Dataset from Parquet files, then create a RayDMatrix from that Dataset
or
Create the RayDMatrix directly from Parquet files?
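To make the two options concrete, here is a minimal sketch of what I have in mind. It assumes that `ray` and `xgboost_ray` are installed and a Ray cluster is already running. The function names, the Parquet paths and the label column are hypothetical placeholders, not anything taken from the README.

```python
from typing import List


def dmatrix_via_ray_dataset(parquet_paths: List[str], label: str):
    """Option 1: Parquet files -> Ray Dataset -> RayDMatrix."""
    # Imports are deferred so this file still loads on a machine without Ray.
    import ray
    from xgboost_ray import RayDMatrix

    dataset = ray.data.read_parquet(parquet_paths)
    return RayDMatrix(dataset, label=label)


def dmatrix_direct_from_parquet(parquet_paths: List[str], label: str):
    """Option 2: Parquet files -> RayDMatrix, skipping the Ray Dataset."""
    from xgboost_ray import RayDMatrix, RayFileType

    return RayDMatrix(parquet_paths, label=label, filetype=RayFileType.PARQUET)
```

Either function would be called with something like `dmatrix_via_ray_dataset(["/path/to/part-0.parquet"], label="target")`, where both arguments are made up for illustration.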
Training
Should I use Ray Tune with XGBoostTrainer or with xgboost_ray.train, running on this Ray on Spark Cluster?
I also intend to implement CV with early stopping. Since tune-sklearn is now deprecated, I understand that I'll need to implement this myself. As explained in ray-project/ray#21848 (comment), this can be done with ray.tune.stopper.TrialPlateauStopper. But according to #301 we can also use XGBoost's native xgb.callback.EarlyStopping. Which approach would you recommend? Can TrialPlateauStopper be used with xgboost_ray?
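As far as I understand, the two mechanisms apply different stopping criteria, which is part of why I am unsure which one to pick. The pure-Python sketch below is only my simplified reading of each rule, not the actual implementation of either library: a patience rule (stop after `rounds` consecutive evaluations without improvement, in the spirit of `xgb.callback.EarlyStopping`) and a plateau rule (stop once the standard deviation of the last `num_results` values falls below `std`, in the spirit of `TrialPlateauStopper`). The function names and the example scores are hypothetical.

```python
from statistics import pstdev
from typing import List, Optional


def patience_stop_round(scores: List[float], rounds: int) -> Optional[int]:
    """Index at which a patience rule stops (lower score is better), or None."""
    best = None
    rounds_without_improvement = 0
    for i, score in enumerate(scores):
        if best is None or score < best:
            best = score
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= rounds:
                return i
    return None


def plateau_stop_round(
    scores: List[float], std: float, num_results: int, grace_period: int
) -> Optional[int]:
    """Index at which a plateau rule stops, or None if it never triggers."""
    for i in range(len(scores)):
        seen = i + 1
        if seen < grace_period or seen < num_results:
            continue
        # Look only at the most recent `num_results` values.
        window = scores[seen - num_results:seen]
        if pstdev(window) < std:
            return i
    return None


# A validation error that keeps improving, but only slowly (made-up values).
slow_but_improving = [0.500, 0.490, 0.485, 0.482, 0.480, 0.479]
patience_result = patience_stop_round(slow_but_improving, rounds=2)
plateau_result = plateau_stop_round(
    slow_but_improving, std=0.01, num_results=3, grace_period=3
)
```

On this series the patience rule never triggers because every round improves, while the plateau rule stops early because the recent values barely vary. If that reading is right, the two approaches are not interchangeable.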
Thank you very much for any help you can offer.