
fix: add metric example to xgboost docs #4917

Merged
merged 3 commits into main from add_metric_example_to_xgboost_docs on Oct 13, 2023

Conversation

djsauble
Contributor

Fixes: iterative/dvclive#721

The XGBoost docs have a dangling param variable that isn't defined anywhere. Since this is where you define your metric, and this is an important part of experimentation, I added an example of how to use it to set a metric via the eval_metric param.

@github-actions
Contributor

github-actions bot commented Oct 12, 2023

Link Check Report

There were no links to check!

@dberenbaum
Collaborator

Thanks @djsauble! As long as we are making changes, can we fix it up a bit more?

For example, can it follow a workflow that looks like the xgboost get started docs? It seems like that's what they see as their primary interface, and subjectively it looks easier to understand to me.

It would also be nice to flesh out the top example even more to show where other variables like dtrain and dval come from.

@djsauble
Contributor Author

djsauble commented Oct 13, 2023

> For example, can it follow a workflow that looks like the xgboost get started docs? It seems like that's what they see as their primary interface, and subjectively it looks easier to understand to me.

@dberenbaum Just to clarify, you're saying we should use something like this:

model = xgb.XGBClassifier()
model.fit(X_train, y_train)

…instead of xgb.train()?

> It would also be nice to flesh out the top example even more to show where other variables like dtrain and dval come from.

This is the working XGBoost code I ended up with after going through our existing docs + the hint about metrics from @mattseddon:

from dvclive import Live
from dvclive.xgb import DVCLiveCallback
import xgboost as xgb
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the Iris dataset and split it 80/20 into train and test sets
iris = datasets.load_iris()

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, train_size=0.8)

# Wrap the splits in XGBoost's DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

with Live("results") as live:
    # DVCLiveCallback logs the eval_metric for the "eval_data" eval set each round
    xgb.train(
        {
            "eval_metric": "rmsle"
        },
        dtrain,
        100,  # num_boost_round
        early_stopping_rounds=5,
        callbacks=[DVCLiveCallback("eval_data")],
        evals=[(dtest, "eval_data")]
    )

    live.log_metric("summary_metric", 1.0, plot=False)

How much of this do you want to see in our docs? I don't really think we should go all the way back to loading a specific dataset. The focus here should be on the call to train() (or fit()) and the associated base parameters.

@dberenbaum
Collaborator

Fair points, @djsauble. I don't want to block the improvements you made, so I just left one minor comment to clarify and otherwise it looks good, thanks!

@djsauble
Contributor Author

@dberenbaum I actually agree with you that using .fit() makes sense. It's more familiar to me anyway. 🙂

I'll fix this up a bit more and ping you again for a quick review.

djsauble marked this pull request as draft on October 13, 2023, 16:22
@dberenbaum
Collaborator

> @dberenbaum I actually agree with you that using .fit() makes sense. It's more familiar to me anyway. 🙂

If we only show the sklearn-like .fit() interface, do you think it leaves the user confused about how to do it with xgb.train()?

@djsauble
Contributor Author

Let's choose one or the other. The sklearn interface is probably more familiar to people, but I don't feel strongly about this. It looks like both interfaces support callbacks, which is the only hard requirement for DVCLive.

Which interface do you want to go with?

@dberenbaum
Collaborator

Sounds good. Let's go with the sklearn interface then.

shcheklein temporarily deployed to dvc-org-add-metric-exam-3wx0bv on October 13, 2023, 17:53
djsauble marked this pull request as ready for review on October 13, 2023, 17:55
@djsauble
Contributor Author

@dberenbaum Switched to the sklearn interface and tested all code snippets locally. Think this is good now.
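
For reference, a rough sketch of what the sklearn-interface version might look like (assuming a recent xgboost where eval_metric and callbacks are constructor arguments and eval sets default to the name validation_0; the actual merged snippet may differ):

from dvclive import Live
from dvclive.xgb import DVCLiveCallback
import xgboost as xgb
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, train_size=0.8)

with Live("results") as live:
    # In recent xgboost versions, eval_metric and callbacks are passed to the constructor
    model = xgb.XGBClassifier(
        n_estimators=100,
        eval_metric="mlogloss",
        callbacks=[DVCLiveCallback("validation_0")],
    )
    # eval_set entries are named validation_0, validation_1, ... by default
    model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

    live.log_metric("summary_metric", 1.0, plot=False)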

shcheklein temporarily deployed to dvc-org-add-metric-exam-3wx0bv on October 13, 2023, 19:03
dberenbaum merged commit 3d6b6c3 into main on Oct 13, 2023
5 checks passed
dberenbaum deleted the add_metric_example_to_xgboost_docs branch on October 13, 2023, 19:22
Successfully merging this pull request may close these issues.

xgboost: clarify docs: make it easier to add metrics to an existing project