
Feature selection/preprocessing before EBM fitting to speed up fitting? #566

Open
thaotnguyen opened this issue Aug 4, 2024 · 1 comment


@thaotnguyen

I’m using tsfresh to generate tabular data from my time series. I have 3 channels per time series, and it generates 775 features each, so I have 2325 features total.

Fitting an EBM on my dataset (300 samples and 2325 features) takes almost an hour, which makes hyperparameter optimization nearly infeasible. Default EBM performance on my data is poor (70% accuracy), so I feel like hyperparameter optimization is probably necessary.

My impression is that feature reduction isn't necessary, and may even be discouraged, with EBMs, but I'm considering it purely for speed. Explainability is important to me, so I don't want to use PCA. I'm considering SelectKBest as a preprocessing step, but I'm not sure whether there are better ways to cut the fitting time.
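For concreteness, the preprocessing I have in mind looks roughly like this (a sketch with random stand-in data; `k=100` and `f_classif` are arbitrary placeholders I'd still need to tune):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2325))  # stand-in for my tsfresh feature matrix
y = rng.integers(0, 2, size=300)

# Univariate filter: keep the 100 features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=100)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (300, 100)

# get_support maps back to the original column indices, so the
# reduced feature set stays interpretable.
kept = selector.get_support(indices=True)
```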

What can I do?

@paulbkoch
Collaborator

Hi @thaotnguyen -- Feature reduction isn't discouraged for EBMs. In fact, we'd usually recommend it from an interpretability standpoint, since a model with too many features is harder to read. SelectKBest is one option. There's also a paper demonstrating an interesting LASSO-based approach (https://arxiv.org/abs/2311.07452). EBMs can also drop features after fitting via the remove_terms function, and the term_importances function can help you decide which terms to drop.

Have you tried other packages like XGBoost/LightGBM on this data, and if so, what was the accuracy there? Hyperparameter optimization for EBMs can improve the model, but generally the improvements are modest. We do have recommended hyperparameters to try in https://interpret.ml/docs/hyperparameters.html.
