
Applying sampling method to sensitive features for fairness models #1085

Open

haytham918 opened this issue Jun 11, 2024 · 3 comments

@haytham918
I am currently trying to combine imblearn's sampling methods, such as SMOTE() and NearMiss(), with ThresholdOptimizer and AdversarialFairnessClassifier from fairlearn. When I run them all in an imblearn.pipeline.Pipeline (sampling, then classifier), the sampling step fails; my guess is that it does not know what to do with the sensitive features passed as metadata. Right now I am twisting the workflow to work around this, but I would like to know whether there is a configuration or feature that can solve it cleanly.
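For context, the workaround amounts to resampling before the pipeline while keeping the sensitive column inside X, so it stays row-aligned with the resampled data. Below is a minimal sketch of that idea; `oversample_with_sensitive` is a hypothetical helper (not imblearn API), and a toy random oversampler stands in for SMOTE:

```python
import random

def oversample_with_sensitive(X_rows, y, sensitive_col):
    """Toy random oversampler (a stand-in for SMOTE): duplicates
    minority-class rows until classes are balanced. Each row of X_rows
    still contains the sensitive column, so the resampled rows stay
    aligned with it."""
    random.seed(0)
    by_class = {}
    for row, label in zip(X_rows, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        resampled = rows + [random.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(resampled)
        y_out.extend([label] * len(resampled))
    # Split the sensitive feature back out AFTER resampling, so it is
    # row-aligned with X_out and can be passed to the fairness model.
    Z_out = [row[sensitive_col] for row in X_out]
    return X_out, y_out, Z_out

X = [{"race": r, "indicator": i} for r, i in [(1, 1), (2, 2), (3, 3), (1, 1)]]
y = [0, 0, 0, 1]
X_res, y_res, Z_res = oversample_with_sensitive(X, y, "race")
```

The resampled sensitive column can then be handed to the fairness estimator's fit, outside the pipeline.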

@glemaitre
Member

Could you provide a minimal example with toy data, along with the versions of the different models?

@glemaitre
Member

It is highly possible that we need to modify our Pipeline implementation to be compatible with scikit-learn's metadata routing.
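To illustrate what routing compatibility would mean here, this is a minimal sketch of the idea (hypothetical `ToyStep`/`ToyPipeline` classes, not the real scikit-learn or imblearn implementation): each step opts in to fit-time metadata, and the pipeline forwards a kwarg like `sensitive_features` only to steps that requested it, so a sampler outside the routing mechanism never sees it.

```python
class ToyStep:
    """Hypothetical step that can opt in to fit-time metadata,
    mimicking the spirit of scikit-learn's set_fit_request(...)."""
    def __init__(self, name):
        self.name = name
        self._requested = set()
        self.received = {}

    def set_fit_request(self, **flags):
        self._requested = {k for k, v in flags.items() if v}
        return self

    def fit(self, X, y, **metadata):
        self.received = metadata
        return self


class ToyPipeline:
    """Hypothetical pipeline that routes metadata only to steps
    that requested it; other steps silently receive nothing."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y, **metadata):
        for step in self.steps:
            routed = {k: v for k, v in metadata.items() if k in step._requested}
            step.fit(X, y, **routed)
        return self


sampler = ToyStep("sampling")  # never requests metadata
clf = ToyStep("classifier").set_fit_request(sensitive_features=True)
ToyPipeline([sampler, clf]).fit([[0]], [0], sensitive_features=[1])
```

In this sketch the sampler's fit receives no metadata at all, which mirrors how sensitive features get lost at the sampling step in the report above.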

@haytham918
Author

import pandas as pd
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from fairlearn.adversarial import AdversarialFairnessClassifier
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.model_selection import GridSearchCV
import sklearn
sklearn.set_config(enable_metadata_routing=True)

data = {
    'race': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    'indicator': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
}


X = pd.DataFrame(data)


Y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Sensitive Features
Z = X['race']



mitigator = AdversarialFairnessClassifier(
    backend="torch",
    predictor_model=[50, "relu"],
    adversary_model=[3, "relu"],
    batch_size=2**8,
    progress_updates=0.5,
    random_state=123,
).set_fit_request(sensitive_features=True)

pipe = ImbPipeline([
  ("scaling", Normalizer()), ("sampling", SMOTE()), ("classifier", mitigator)])

param_grid = {
    "classifier__batch_size": [2**6]
}

grid_s = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid_s.fit(X, Y, sensitive_features=Z)

Here is a piece of code that demonstrates the issue. I also think fairlearn has some incompatibility issues of its own at the moment.
