
Support for ensemble out-of-bag prediction #25

Open · tecosaur opened this issue Nov 25, 2022 · 3 comments

@tecosaur

In addition to out_of_bag_measure, I think it could be rather helpful to be able to obtain the ensemble's overall out-of-bag prediction.

In my own code (which I'd like to convert to use MLJEnsembles when possible), I'm currently building a Matrix with one column of predictions per model, with missing entries where a model made no out-of-bag prediction, and then aggregating the result by taking row-wise means.
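A minimal sketch of that aggregation, assuming a toy predictions matrix (the data and the oob name below are illustrative only):

```julia
using Statistics

# Toy predictions matrix: one column per atomic model, with `missing`
# marking observations that were in-bag for that model (so no
# out-of-bag prediction exists).
preds = [1.0      missing  1.2
         missing  0.9      1.1
         0.8      1.0      missing]

# Row-wise mean over the non-missing entries gives the ensemble's
# overall out-of-bag prediction for each observation.
oob = [mean(skipmissing(row)) for row in eachrow(preds)]
```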

@ablaom
Member

ablaom commented Nov 27, 2022

Sounds like a good suggestion to me. The usual way to expose something like this would be to return the out-of-bag predictions as part of the model report (the last item returned by MLJModelInterface.fit). For example, outlier detection models return training scores that way, and MLJFlux models return training losses that way. Happy to support a PR.
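A minimal sketch of that mechanism, assuming a hypothetical toy bagged regressor (the ToyBaggedMean name and the oob_predictions report key are illustrative, not part of MLJEnsembles):

```julia
import MLJModelInterface as MMI
using Statistics

# Hypothetical toy regressor: each "atomic model" is just the mean of a
# bootstrap sample of y. Only the reporting mechanism is the point here.
mutable struct ToyBaggedMean <: MMI.Deterministic
    n::Int   # number of bags
end

function MMI.fit(model::ToyBaggedMean, verbosity, X, y)
    nobs = length(y)
    preds = Matrix{Union{Float64,Missing}}(missing, nobs, model.n)
    for j in 1:model.n
        bag = rand(1:nobs, nobs)                        # bootstrap resample
        preds[setdiff(1:nobs, bag), j] .= mean(y[bag])  # predict on OOB rows
    end
    # Aggregate row-wise; rows that were in-bag for every model stay missing.
    oob = map(eachrow(preds)) do row
        vals = collect(skipmissing(row))
        isempty(vals) ? missing : mean(vals)
    end
    fitresult = mean(y)
    cache = nothing
    report = (oob_predictions = oob,)   # exposed via the model report
    return fitresult, cache, report
end

MMI.predict(::ToyBaggedMean, fitresult, Xnew) = fill(fitresult, MMI.nrows(Xnew))
```

After fitting a machine wrapping such a model, the out-of-bag predictions would then presumably be accessible as report(mach).oob_predictions.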

I should say, MLJEnsembles is some of the oldest MLJ code and it may be that a rethink is worthwhile, if someone had the resources. See JuliaAI/MLJ.jl#363 for some old related discussion.

@tecosaur
Author

Thanks for that link. A generalized blend of MLJEnsembles and SampleFitCombine seems like it would be quite good, but I'd think some breaking changes would be required to do this nicely.

@ablaom
Member

ablaom commented Nov 29, 2022

If we get a better design, that would be fine by me, as long as we don't need breaking changes to the basic MLJ model interface. I see SampleFitCombine.jl looks abandoned and was never registered, so we may want to be cautious about what we take from there.

I did meet with the author at that time, and I think his main use case was mixture models: creating an ensemble of probability distributions, which in MLJ we treat as supervised learners with empty input X; see here. We haven't actually implemented any of those, although I don't see any immediate problem. So, for example, one could wrap distributions from Distributions.jl that way.
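For concreteness, a sketch of what such a wrapper might look like, assuming a hypothetical DistributionFitter type (the name and all interface details below are illustrative; nothing here is confirmed MLJ functionality):

```julia
import MLJModelInterface as MMI
using Distributions

# Hypothetical wrapper: a Distributions.jl family as a probabilistic MLJ
# model whose input X carries no information (the "empty X" convention).
mutable struct DistributionFitter{D<:Distribution} <: MMI.Probabilistic
    family::Type{D}
end

function MMI.fit(model::DistributionFitter, verbosity, X, y)
    fitresult = Distributions.fit(model.family, y)  # e.g. MLE for Normal
    return fitresult, nothing, NamedTuple()
end

# Every new observation receives the same fitted distribution:
MMI.predict(::DistributionFitter, fitresult, Xnew) = fill(fitresult, MMI.nrows(Xnew))
```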

But this may not be relevant, or may be out of scope. I'm just trying to recall what I remember from our conversations, and I haven't yet reviewed the discussion linked above myself.
