
Support for custom resampling #24

Open
tecosaur opened this issue Nov 25, 2022 · 3 comments

@tecosaur

For a problem I'm currently working on, it would be tremendously helpful if I could use a custom resampling method (in my case, a modified stratified bootstrap) to form the training sets used for each "atom" model in the ensemble.

At the moment bagging_fraction is supported, which essentially special-cases the bootstrap sampling approach. Perhaps it would be possible to generalize this to support any ResamplingStrategy?
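
For concreteness, MLJBase already lets you define a custom strategy by subtyping `ResamplingStrategy` and implementing `train_test_pairs`, so the kind of thing I'd like to be able to pass in might look roughly like this (the struct name and fields are just illustrative, not the exact strategy I'm using):

```julia
using MLJBase, Random

# Toy "stratified bootstrap": sample with replacement within each class,
# so every training set preserves the class proportions of `y`.
# (Name and fields are illustrative only.)
struct StratifiedBootstrap <: MLJBase.ResamplingStrategy
    fraction::Float64
    rng::AbstractRNG
end
StratifiedBootstrap(; fraction=1.0, rng=Random.GLOBAL_RNG) =
    StratifiedBootstrap(fraction, rng)

function MLJBase.train_test_pairs(s::StratifiedBootstrap, rows, y)
    train = Int[]
    for c in unique(view(y, rows))
        class_rows = rows[findall(==(c), view(y, rows))]
        n = max(1, round(Int, s.fraction*length(class_rows)))
        append!(train, rand(s.rng, class_rows, n))  # with replacement
    end
    test = setdiff(rows, train)                     # "out-of-bag" rows
    return [(shuffle(s.rng, train), test)]
end
```

The idea would then be for EnsembleModel to call `train_test_pairs` on whatever strategy it is given, rather than hard-coding the holdout behind `bagging_fraction`.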

@ablaom (Member) commented Nov 27, 2022

@tecosaur Thanks for the suggestion. Let me see if I understand it.

Each ResamplingStrategy from MLJBase generates a vector of 2-tuples of the form (train, test). I guess your observation is that the current resampling used in EnsembleModel (bagging without replacement) amounts to generating each atomic sample by taking the first (and only) element (train, test) of the vector returned by MLJBase.train_test_pairs(Holdout(fraction_train=bagging_fraction, rng=rng), rows), and using the indices in train (ignoring test), right? And you are suggesting the ability to do the same with any ResamplingStrategy?
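
Spelled out as code, the equivalence I have in mind is roughly this (a minimal sketch using MLJBase's public `train_test_pairs`; the variable names are illustrative):

```julia
using MLJBase, Random

rng = MersenneTwister(123)
rows = 1:150                 # row indices of the training data
bagging_fraction = 0.8

# One atomic sample, drawn the way I'm describing the current behaviour:
pairs = MLJBase.train_test_pairs(
    Holdout(fraction_train=bagging_fraction, shuffle=true, rng=rng),
    rows,
)
train, test = first(pairs)   # `pairs` is a one-element vector of 2-tuples
# `train` -> rows used to fit one atom; `test` is currently ignored
```

The proposal would then be to allow an arbitrary ResamplingStrategy in place of the `Holdout` above.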

@tecosaur (Author)

That is indeed what I'm proposing. This also ties in somewhat with #25, in that the test indices could optionally be used for out-of-bag predictions, should the user want them.

@ablaom (Member) commented Dec 4, 2022

Makes sense. It seems to me that we could implement this proposal, incorporating the out-of-bag predictions, and I'd support that.

On the other hand, if a more substantial improvement or re-design is being entertained, then we'd want to incorporate those changes concurrently. I'd support that too, but I doubt the core MLJ team has the resources to divert to such a project just now.

@tecosaur Is that something you'd be interested in?
