add missing_only=True to all imputers to use in combination with variables=None #388
Comments
As part of this change, ensure that missing_only takes effect only when variables is None in the AddMissingIndicator and DropMissingData classes. At the moment, the transformers also apply the check to the user's variables. |
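A minimal sketch of the behaviour requested in the comment above, assuming plain pandas. The helper name `resolve_variables` and its signature are hypothetical and are not part of feature_engine; the sketch only shows missing_only being honoured when variables is None and ignored when the user passes variables.

```python
import numpy as np
import pandas as pd


def resolve_variables(X, variables=None, missing_only=True):
    """Hypothetical helper (not feature_engine API).

    Returns the variables a transformer would act on.
    """
    if variables is not None:
        # User-specified variables are returned unchanged:
        # missing_only is deliberately ignored in this branch.
        return [variables] if isinstance(variables, str) else list(variables)
    if missing_only:
        # variables is None: keep only columns with at least one missing value.
        return [col for col in X.columns if X[col].isna().any()]
    # variables is None and missing_only is False: keep every column.
    return list(X.columns)


X_train = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

auto_selected = resolve_variables(X_train)
user_selected = resolve_variables(X_train, variables=["a", "b"])
all_selected = resolve_variables(X_train, missing_only=False)
```

In this sketch the selection is computed once from the training data, which mirrors the idea of deciding the variables during fit.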
Will be having a look at it later. |
Has anyone worked on this issue? I don't see a PR. Should we introduce |
It's all yours if you want it @Morgan-Sell :) The BaseImputer() looks like the right place. I look forward to the PR! |
Sounds good, @solegalli. To confirm the intention, this functionality will review all the values for each variable. It will then return only the variables that have missing values. For example (imagine the dictionaries are dataframes):
If Option A:
Option B:
Or, is it a different option? I'm assuming Option A, but I want to confirm ;) |
Option A. But note that all What needs to happen in There are a few things to consider:
|
Yeah, I see what you're saying about bullet #2. Since However, at a minimum, can we develop the It feels good to be back ;) |
Maybe we create the function you mention in the variable handling module: https://github.com/feature-engine/feature_engine/tree/main/feature_engine/variable_handling Because I think the categorical imputer also uses the base imputer, and then it has a method that is irrelevant. |
Why doesn't this apply to categorical variables? |
the function you mention is to select numerical variables. You need a different function for categoricals |
Yeah, I don't understand why the function cannot look for missing values in all variables. What am I missing? |
Does this also mean that we cannot introduce |
you gave the function this name: I trust your judgement. Tag me when you want me to have a look :) |
Add missing_only functionality to all imputers to use in combination with variables=None
When variables is None, the imputers select all numerical, all categorical, or all variables by default. With missing_only=True, they would select only those variables from each subgroup that show missing data during fit.
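The selection described above could look roughly like the following sketch. It uses only pandas; the function name `find_variables_with_missing_data` and the `kind` argument are assumptions made for illustration and do not exist in feature_engine. As a simplification, "categorical" here means any non-numeric column.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def find_variables_with_missing_data(X, kind="all"):
    """Hypothetical helper (not feature_engine API).

    kind: "numerical", "categorical" or "all".
    """
    if kind == "numerical":
        candidates = [c for c in X.columns if is_numeric_dtype(X[c])]
    elif kind == "categorical":
        # Simplifying assumption: every non-numeric column is categorical.
        candidates = [c for c in X.columns if not is_numeric_dtype(X[c])]
    elif kind == "all":
        candidates = list(X.columns)
    else:
        raise ValueError("kind must be 'numerical', 'categorical' or 'all'")
    # Keep only the candidates that show missing data in X.
    return [c for c in candidates if X[c].isna().any()]


X_train = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0],           # numerical, has missing data
        "income": [50_000, 62_000, 58_000],    # numerical, complete
        "city": ["Paris", None, "Lima"],       # categorical, has missing data
        "colour": ["red", "blue", "green"],    # categorical, complete
    }
)

numerical_with_na = find_variables_with_missing_data(X_train, "numerical")
categorical_with_na = find_variables_with_missing_data(X_train, "categorical")
all_with_na = find_variables_with_missing_data(X_train, "all")
```

An imputer would presumably call something like this inside fit and store the result, so that transform acts on the same variables even if new data has missing values elsewhere.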