
Applying sampling method to sensitive features for fairness models #1085

Open

haytham918 opened this issue Jun 11, 2024 · 3 comments

@haytham918
I am currently trying to combine imblearn's sampling methods, such as SMOTE() and NearMiss(), with ThresholdOptimizer and AdversarialFairnessClassifier from fairlearn. When I run them all in an imblearn.pipeline.Pipeline (sampling, then classifier), the sampling step fails; my guess is that it does not know what to do with the sensitive features passed as metadata. Right now I am twisting the workflow to work around this, but I would like to know whether there is a configuration or feature that can solve it cleanly.
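For context, the workaround amounts to resampling before the pipeline while keeping the sensitive column inside X, so it stays row-aligned with the resampled data. Below is a minimal sketch of that idea; `oversample_with_sensitive` is a hypothetical helper (not imblearn API), and a toy random oversampler stands in for SMOTE:

```python
import random

def oversample_with_sensitive(X_rows, y, sensitive_col):
    """Toy random oversampler (a stand-in for SMOTE): duplicates
    minority-class rows until classes are balanced. Each row of X_rows
    still contains the sensitive column, so the resampled rows stay
    aligned with it."""
    random.seed(0)
    by_class = {}
    for row, label in zip(X_rows, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        resampled = rows + [random.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(resampled)
        y_out.extend([label] * len(resampled))
    # Split the sensitive feature back out AFTER resampling, so it is
    # row-aligned with X_out and can be passed to the fairness model.
    Z_out = [row[sensitive_col] for row in X_out]
    return X_out, y_out, Z_out

X = [{"race": r, "indicator": i} for r, i in [(1, 1), (2, 2), (3, 3), (1, 1)]]
y = [0, 0, 0, 1]
X_res, y_res, Z_res = oversample_with_sensitive(X, y, "race")
```

The resampled sensitive column can then be handed to the fairness estimator's fit, outside the pipeline.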

@glemaitre
Member

Could you provide a minimal example with toy data, along with the versions of the different models?

@glemaitre
Member

It is highly possible that we need to modify our Pipeline implementation to be compatible with scikit-learn's metadata routing.
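To illustrate what routing compatibility would mean here, this is a minimal sketch of the idea (hypothetical `ToyStep`/`ToyPipeline` classes, not the real scikit-learn or imblearn implementation): each step opts in to fit-time metadata, and the pipeline forwards a kwarg like `sensitive_features` only to steps that requested it, so a sampler outside the routing mechanism never sees it.

```python
class ToyStep:
    """Hypothetical step that can opt in to fit-time metadata,
    mimicking the spirit of scikit-learn's set_fit_request(...)."""
    def __init__(self, name):
        self.name = name
        self._requested = set()
        self.received = {}

    def set_fit_request(self, **flags):
        self._requested = {k for k, v in flags.items() if v}
        return self

    def fit(self, X, y, **metadata):
        self.received = metadata
        return self


class ToyPipeline:
    """Hypothetical pipeline that routes metadata only to steps
    that requested it; other steps silently receive nothing."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y, **metadata):
        for step in self.steps:
            routed = {k: v for k, v in metadata.items() if k in step._requested}
            step.fit(X, y, **routed)
        return self


sampler = ToyStep("sampling")  # never requests metadata
clf = ToyStep("classifier").set_fit_request(sensitive_features=True)
ToyPipeline([sampler, clf]).fit([[0]], [0], sensitive_features=[1])
```

In this sketch the sampler's fit receives no metadata at all, which mirrors how sensitive features get lost at the sampling step in the report above.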

@haytham918
Author

import pandas as pd
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from fairlearn.adversarial import AdversarialFairnessClassifier
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.model_selection import GridSearchCV
import sklearn
sklearn.set_config(enable_metadata_routing=True)

data = {
    'race': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    'indicator': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
}


X = pd.DataFrame(data)


Y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Sensitive Features
Z = X['race']



mitigator = AdversarialFairnessClassifier(
    backend="torch",
    predictor_model=[50, "relu"],
    adversary_model=[3, "relu"],
    batch_size=2**8,
    progress_updates=0.5,
    random_state=123,
).set_fit_request(sensitive_features=True)

pipe = ImbPipeline([
  ("scaling", Normalizer()), ("sampling", SMOTE()), ("classifier", mitigator)])

param_grid = {
    "classifier__batch_size": [2**6]
}

grid_s = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid_s.fit(X, Y, sensitive_features=Z)

Here is a piece of code that demonstrates the issue. I also think fairlearn has some incompatibility issues of its own at the moment.
