Guide train_test_split - improvement B - indepth analysis #498
Labels
enhancement
New feature or request
needs-triage
This has been recently submitted and needs attention
Is your feature request related to a problem? Please describe.
As a data scientist, I want to be guided in the choice of arguments to scikit-learn's train_test_split function, without receiving so many warnings that they exceed my cognitive budget (say, two warnings at most).
This is a follow-up to issue #492.
Regarding warning 7 (drift), we want a way to dig further.
Describe the solution you'd like
As input, we have a p-value indicating drift, together with the feature importances. We know that high-cardinality features are biased toward inflated feature importance in random forests. To be really clean, we should compute feature importance on a test set (this is not what scikit-learn's default importances do today, and the limitation is a known one).
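One way to get importances that are not inflated by high-cardinality features is permutation importance evaluated on the held-out test set. This is a sketch using scikit-learn's existing `permutation_importance` helper on toy data; the dataset and parameters are illustrative, not part of the proposed design:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data; shapes and hyperparameters are illustrative.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance computed on the held-out test set, which avoids
# the high-cardinality bias of impurity-based feature_importances_.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)
```

Unlike `rf.feature_importances_`, this measures how much test-set score drops when each column is shuffled, so it reflects generalization rather than training-set splits.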
We would like to iteratively remove the features that are known to drift (or that have already been analyzed), and check whether drift persists among the remaining features.
The design of the interaction still has to be drawn.
Describe alternatives you've considered, if relevant
No response
Additional context
No response