Guide train_test_split - improvement B - indepth analysis #498

Open
MarieS-WiMLDS opened this issue Oct 15, 2024 · 0 comments
Labels
enhancement New feature or request needs-triage This has been recently submitted and needs attention

Comments

@MarieS-WiMLDS
Contributor

Is your feature request related to a problem? Please describe.

As a data scientist, I want guidance on choosing the arguments of the scikit-learn train_test_split function, without receiving so many warnings that they exceed my cognitive budget (say, 2 warnings at most).
This is a follow-up to issue #492.
For warning 7, on drift, we want to be able to dig deeper.

Describe the solution you'd like

As input, we have a p-value indicating that there is drift, along with the feature importances. We know that high-cardinality features tend to receive inflated feature importance in random forests (RF). To be rigorous, we should compute feature importance on a test set (the built-in random forest importances in scikit-learn are not computed that way today, but the maintainers know they should be).
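
A minimal sketch of what computing importance on held-out data could look like. The issue does not say how drift is detected, so this assumes a domain classifier (a model trained to tell the two splits apart), and the helper name `heldout_drift_importance` is hypothetical. It relies on scikit-learn's `permutation_importance`, which scores a fitted model on whatever data it is given, so passing held-out rows avoids the training-set bias mentioned above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def heldout_drift_importance(X_a, X_b, random_state=0):
    """Hypothetical helper: how much each feature helps tell X_a from X_b."""
    # Label rows by the split they come from (assumed domain-classifier setup).
    X = np.vstack([X_a, X_b])
    y = np.r_[np.zeros(len(X_a), dtype=int), np.ones(len(X_b), dtype=int)]
    X_fit, X_held, y_fit, y_held = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    clf = RandomForestClassifier(n_estimators=50, random_state=random_state)
    clf.fit(X_fit, y_fit)
    # Importance is measured on rows the forest has never seen.
    result = permutation_importance(
        clf, X_held, y_held, scoring="roc_auc", n_repeats=5,
        random_state=random_state,
    )
    return result.importances_mean


# Synthetic illustration: only feature 1 is shifted between the two splits.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(300, 3))
X_b = rng.normal(size=(300, 3))
X_b[:, 1] += 2.0
importances = heldout_drift_importance(X_a, X_b)
```

On this synthetic data the shifted feature should dominate `importances`; on real data the ranking would still need to be read with care when features are correlated.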
We would like to remove, step by step, the features for which drift is already known or has already been analyzed, and check after each removal whether drift is still detected on the remaining features.
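
The removal loop could be sketched as follows. This is only an illustration under assumptions: the drift p-value is taken from a label-permutation test on a domain classifier (via scikit-learn's `permutation_test_score`), and the helper names `drift_pvalue` and `residual_drift`, the threshold `alpha`, and the forest settings are all hypothetical, not part of the proposal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import permutation_test_score


def drift_pvalue(X_a, X_b, columns, random_state=0):
    """Hypothetical helper: permutation-test p-value for drift on `columns`."""
    X = np.vstack([X_a[:, columns], X_b[:, columns]])
    y = np.r_[np.zeros(len(X_a), dtype=int), np.ones(len(X_b), dtype=int)]
    clf = RandomForestClassifier(
        n_estimators=20, max_depth=3, random_state=random_state
    )
    # Compares the cross-validated AUC with AUCs obtained on shuffled labels.
    _, _, pvalue = permutation_test_score(
        clf, X, y, scoring="roc_auc", cv=3, n_permutations=30,
        random_state=random_state,
    )
    return pvalue


def residual_drift(X_a, X_b, known_drifting, alpha=0.05):
    """Drop already-analyzed features one by one and re-test the rest."""
    remaining = list(range(X_a.shape[1]))
    history = []
    for col in known_drifting:
        remaining.remove(col)
        if not remaining:
            break  # nothing left to test
        pvalue = drift_pvalue(X_a, X_b, remaining)
        history.append((col, pvalue, bool(pvalue < alpha)))
    return history


# Synthetic illustration: features 0 and 2 drift, feature 1 does not.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 3))
X_b = rng.normal(size=(200, 3))
X_b[:, 0] += 2.0
X_b[:, 2] += 2.0
history = residual_drift(X_a, X_b, known_drifting=[0, 2])
```

Each entry of `history` records the removed feature, the p-value on what remains, and whether drift is still flagged. With 30 permutations the smallest attainable p-value is 1/31, so a real implementation would likely want more permutations and some correction for repeated testing.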

The design of the interaction is still to be defined.

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@MarieS-WiMLDS MarieS-WiMLDS added enhancement New feature or request needs-triage This has been recently submitted and needs attention labels Oct 15, 2024