Feature importance / model inspection #403
Comments
Thanks for this. For bagging ensembles it's reasonably straightforward. Some models we interface with also have it (e.g. XGBoost), so it can just be part of the interface. Support for permutation/drop FI seems reasonably easy; there's just a question as to where the implementation would go, maybe a comparable "model(s) inspection" module or package in MLJ or something of the sort. The rest of your suggestions are a bit trickier. LIME is nice but is basically an entire package, like shap, which is cited in the article at your last point.
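For the permutation/drop variant, the generic algorithm is simple enough to sketch independently of where it eventually lives in MLJ. The following is a minimal illustration only; `predict`, `loss`, and the function name are placeholders of my own, not an existing MLJ API:

```julia
# Minimal sketch of permutation feature importance over a fitted model, assuming
# only a `predict` closure and a loss such as RMSE; none of these names belong
# to an MLJ API, this is just the generic algorithm.
using Random, Statistics

function permutation_importance(predict, X::AbstractMatrix, y::AbstractVector;
                                loss = (ŷ, y) -> sqrt(mean((ŷ .- y) .^ 2)),
                                nrepeats::Int = 10, rng = Random.default_rng())
    baseline = loss(predict(X), y)
    importances = zeros(size(X, 2))
    for j in 1:size(X, 2)
        drops = map(1:nrepeats) do _
            Xp = copy(X)
            Xp[:, j] = shuffle(rng, Xp[:, j])   # break the column's link to the target
            loss(predict(Xp), y) - baseline     # importance = increase in loss
        end
        importances[j] = mean(drops)
    end
    return importances
end
```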
Feature importance is an interesting one because most of the measures out there are rather ad hoc and model dependent. That is, the very definition of feature importance depends on the model (e.g., the absolute value of a coefficient in a linear model makes no sense for a decision tree). And for certain models, e.g. trees and random forests, there are several inequivalent methods in common use. The paper cited above on shap describes an approach that is genuinely model independent; unless someone is aware of another such approach, I suggest any generic MLJ tool follow that one. There is already some implementation of SHAP in Python, if I remember correctly.
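For concreteness, here is a rough Monte Carlo sketch of a model-agnostic Shapley estimate along the lines of that sampling approach. All names are hypothetical; this is not the API of SHAP, ShapML.jl, or MLJ:

```julia
# Rough Monte Carlo sketch of a model-agnostic Shapley estimate for a single
# observation x, in the spirit of the sampling approach behind SHAP/ShapML.
# `predict` maps a matrix of rows to predictions; `X` provides reference rows.
# Everything here is illustrative, not the API of any existing package.
using Random, Statistics

function sampled_shapley(predict, x::AbstractVector, X::AbstractMatrix;
                         nsamples::Int = 200, rng = Random.default_rng())
    p = length(x)
    ϕ = zeros(p)
    for j in 1:p
        contribs = map(1:nsamples) do _
            perm = randperm(rng, p)                    # random feature ordering
            z = X[rand(rng, 1:size(X, 1)), :]          # random reference row
            pos = findfirst(==(j), perm)
            before = perm[1:pos-1]                     # features preceding j in the ordering
            with_j    = float.(z); with_j[before] .= x[before]; with_j[j] = x[j]
            without_j = float.(z); without_j[before] .= x[before]
            only(predict(reshape(with_j, 1, :))) - only(predict(reshape(without_j, 1, :)))
        end
        ϕ[j] = mean(contribs)                          # Shapley estimate for feature j
    end
    return ϕ
end
```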
The recently created https://github.com/nredell/ShapML.jl may also be a very nice addition (already compatible with MLJ as far as I can see) cc @nredell
My plans for ShapML can be found on the Discourse site (https://discourse.julialang.org/t/ml-feature-importance-in-julia/17196/12), but I'm posting here for posterity's sake. Just sitting down for the first refactor/feature additions today. I'll code with these guidelines in mind (https://github.com/invenia/BlueStyle) as well as take a trip through the MLJ code base. And if a general feature importance package pops up in the future, I wouldn't be opposed to helping fold ShapML in if it's up to par and hasn't expanded too much by then.
cc @sjvollmer (for summer FAIRness student, if not already aware) |
My current inclination is to see whether this can be satisfactorily addressed with third-party packages, such as the Shapley one. A POC would make a great MLJTutorial. If something more integrated makes sense, though, I'm interested to hear about it.
Any update on feature importance integration?
@vishalhedgevantage There are some GSoC students working on better integration of interpretable machine learning (LIME/Shapley). And there is this issue, which I opened to support recursive feature elimination. However, the volunteer who had expressed an interest in the latter must have gotten busy with other things...
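For reference, recursive feature elimination itself reduces to a short loop over any importance measure. A hedged sketch, with `fit_predict` and `importance` as user-supplied placeholders rather than MLJ functions:

```julia
# Hedged sketch of recursive feature elimination built on any importance measure
# (e.g. the permutation importance sketched above). `fit_predict` and `importance`
# are placeholder callables supplied by the user, not MLJ functions.
function recursive_feature_elimination(fit_predict, importance,
                                       X::AbstractMatrix, y::AbstractVector;
                                       n_features_to_keep::Int = 5)
    remaining = collect(1:size(X, 2))
    while length(remaining) > n_features_to_keep
        predict = fit_predict(X[:, remaining], y)         # refit on surviving columns
        scores  = importance(predict, X[:, remaining], y) # score each survivor
        deleteat!(remaining, argmin(scores))              # drop the least important
    end
    return remaining                                      # indices of retained features
end
```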
Any movement on this? |
@Moelf Feel free to open a request to expose feature importance at XGBoost.jl. To be honest, current priorities for MLJ favour pure-Julia solutions. I'm pretty sure EvoTrees.jl (which has an MLJ interface) exposes feature importances in the report. Perhaps you want to check out that well-maintained tree-boosting package.
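A minimal sketch of what that might look like through MLJ's machine interface follows. The assumption that importances appear in the report under some particular name is mine, so check the report keys on your installed version:

```julia
# Illustrative MLJ + EvoTrees usage. Assuming the fitted machine's report
# carries per-feature importances; the exact field name may differ between
# versions, so inspect keys(report(mach)) on your installation.
using MLJ
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees

X, y = make_regression(200, 5)            # toy regression data from MLJ
mach = machine(EvoTreeRegressor(), X, y)
fit!(mach)
report(mach)                              # look here for the feature importances
```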
I think XGBoost is SOTA for many things (especially in my line of work, and, amazingly enough, it turns out XGBoost was born in this field). Of course a Julia-native XGBoost would be ideal and very cool, but I don't think it's on anyone's priority list.
EvoTrees.jl is a pure-Julia gradient tree boosting algorithm which already has a lot of the functionality found in XGBoost and, as far as I can tell, implements basically the same algorithm. It does not have all the bells and whistles, but it is being actively developed.
It would be nice to have some integrated tools for model inspection and feature importance (FI). Below are some links to resources and what's available in scikit-learn.
Scikit-learn exposes a number of tools for understanding the relative importance of features in a dataset. These tools are general in the sense that they can be made to work with many different kinds of models. They are organized in a module called "Inspection", which I find fitting, since they all allow the user to understand or inspect the result of fitting a model in ways other than simply measuring error/accuracy. Some of them are linked below.