Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update R parameter constructor. #11083

Closed
trivialfis opened this issue Dec 9, 2024 · 5 comments
Closed

Update R parameter constructor. #11083

trivialfis opened this issue Dec 9, 2024 · 5 comments

Comments

@trivialfis
Copy link
Member

          > *Since this PR concerns the user interface, would be great if we could get some help for review from an R user's perspective @mayer79 @jameslamb*

Thanks for the @, sorry I did not get to it before this was merged.

The idea behind this constructor function is to offer an IDE-friendly way of creating and discovering XGBoost parameters, with full in-package documentation as is typical in R packages

I think it's a great idea, and would definitely be helpful for interactive development.

Some suggestions (to consider in follow-ups, given that this was already merged):

  • it would be good to have explicit unit tests on this new function, not just rely on implicit coverage from other tests. For example:
    • confirming that only non-NULL items show up in the return value
    • confirming that calling this with 0 arguments returns an empty list
  • I understand that existing names with . in them need to be preserved for backwards compatibility, but I'd stop adding new ones (e.g. I'd prefer xgb_params to xgb.params)
  • I'd have asked to change the line wrapping of the roxygen comments instead of ignoring the project's preferred line length with # nolint comments
    • or at least, asked that the linter be allowed to cover the function signature, which it currently cannot because of this placement:

# nolint end

I also would have asked for a little research supporting it being safe to change xgb.parameters to xgb.model.parameters without a deprecation cycle. For example, it looks to me like nothing on CRAN is currently using xgb.parameters (https://github.com/search?q=org%3Acran%20%22xgb.parameters%22&type=code), so I guess it at least won't break anything on CRAN. (I don't recall where the discussion about xgboost2 vs. releasing this as a new version of xgboost ended, disregard this last comment if the plan is to release a new package).

Originally posted by @jameslamb in #11072 (comment)

@trivialfis
Copy link
Member Author

cc @david-cortes

Converted from #11072 (comment) .

@david-cortes
Copy link
Contributor

Added a PR with tests: #11084

To address the other points:

I understand that existing names with . in them need to be preserved for backwards compatibility, but I'd stop adding new ones (e.g. I'd prefer xgb_params to xgb.params)

We've been adding all new non-method functions with the xgb. prefix. These haven't been a problem as there's no "xgb" class (nearest are "xgb.Booster" and "xgb.DMatrix", which do not clash with "xgb." as prefix). I wouldn't want to change that as it makes the code calls similar to the python examples where xgboost is imported as xgb, and changing it for some functions but not others would be confusing.

I'd have asked to change the line wrapping of the roxygen comments instead of ignoring the project's preferred line length with # nolint comments

I think the closer the R docs stay to the original .rst, the easier it'd be to update them later on. The .rst file doesn't manually break lines, relying on sphinx doing the line wrapping instead. Current logic (nolint) makes it easy to copy-paste from them, which is how the file was generated.

or at least, asked that the linter be allowed to cover the function signature, which it currently cannot because of this placement:

That's already checked by R CMD check.

I also would have asked for a little research supporting it being safe to change xgb.parameters to xgb.model.parameters without a deprecation cycle.

We haven't decided on that aspect yet, but we've already established that we're going to do big breaking changes with the next major release. There will be no deprecation cycle, other than for a few renames.

@jameslamb
Copy link
Contributor

Thanks very much for the tests and explanation @david-cortes . If the decisions about continuing the use of names with . in them and making big breaking changes have already been decided, I won't re-litigate them.

The point about copying directly from other non-R docs is a good justification to ignore line-length preferences there, I hadn't considered that it was literally copying and pasting.

There's just one thing I want to clarify:

That's already checked by R CMD check.

R CMD check will check for things like "argument in signature but not in docs" (and vice-versa), but it won't check for the types of preferences that might be enforced with {lintr}, like indentation, comma-first vs. comma-last, or spacing around the =. Today this new function has all = NULL in the signature but if that ever changed and some arguments defaulted to other R objects, then R CMD check would also not enforce things like this project's preference for using the form "single-string" instead of c("single-string"):

unnecessary_concatenation = lintr::unnecessary_concatenation_linter(),

It's for those reasons that I think the # nolint should have been placed before and not after the signature.

@david-cortes
Copy link
Contributor

It's for those reasons that I think the # nolint should have been placed before and not after the signature.

That could potentially bring problems later on when #' and # are interleaved, with both roxygen and git.

Today this new function has all = NULL in the signature but if that ever changed and some arguments defaulted to other R objects, then R CMD check would also not enforce things like this project's preference for using the form "single-string" instead of c("single-string"):

I don't think that'd matter much since the signature that's written into the .Rd file gets post-processed by roxygen, so those details would be lost.

but it won't check for the types of preferences that might be enforced with {lintr}, like indentation, comma-first vs. comma-last, or spacing around the =.

Should be fixed here: #11090

@jameslamb
Copy link
Contributor

#11090 addresses the last concern I had, thank you for that! This can be closed when that's merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants