RFC: split of kwargs between __init__ and fit in the FedECA object #27
Thanks @jeandut for sharing your thoughts on this tricky topic. What you propose makes sense, but it adds boilerplate code (basically getters and setters everywhere). I probably lack context, but why couldn't we go for a simple separation, with everything in the `__init__` and only data passed to `fit`?
@mandreux-owkin it's because of hydra: the way it wants you to operate is on a single object on which you repeatedly call fit with different kwargs, i.e. we need to be able to pass DP parameters, for instance, which are not technically data, as well as any parameter we might want to study.
We need to clean this up ASAP; it's hard to overstate how much time we have wasted because of such issues.
@honghaoli42 do you have an opinion on this topic?
For simple models this is easily doable, but not for complex models like fedeca. Based on what we have seen so far, I agree with what @mandreux-owkin proposed: initialise everything and fit only with data, basically making each model "one-time use only", which is the most straightforward design.
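The "one-time use only" design proposed above can be sketched as follows (class and attribute names here are illustrative, not the actual fedeca API): everything is configured at init, `fit` only takes data, and a new configuration means building a new object.

```python
class OneShotEstimator:
    """Sketch of the 'one-time use only' design: all config at init."""

    def __init__(self, n_clients=2, use_dp=False):
        self.n_clients = n_clients
        self.use_dp = use_dp
        self.fitted = False

    def fit(self, data):
        if self.fitted:
            raise RuntimeError("Build a new estimator for a new configuration.")
        # ... fitting on `data` would happen here ...
        self.fitted = True
        return self


# A hyper-parameter sweep then rebuilds the object per configuration
# instead of re-fitting the same instance:
results = [OneShotEstimator(n_clients=k).fit(data=[]) for k in (2, 5, 10)]
print([r.n_clients for r in results])  # [2, 5, 10]
```

The trade-off is that the sweep driver, rather than the estimator, owns the reconfiguration logic, which conflicts with a manager that reuses one object.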
Actually it's not hydra but I who chose the implementation; hydra is only a config loader, so it does not impose anything on what we could do later.
The FedECA API was created to follow the famous sklearn `fit` API, with the constraints given by Hydra, the experiment manager: namely, the same fedeca object needs to be sequentially fitted with different fit/config args, which should lead to different behaviors according to the experimental plan described in the paper. For now these vary `n_samples` and other data-related hyper-parameters such as the number of ties, the covariate shifts, the number of federated clients, or the repartition of data between them (and substra handles whether the dataset exists).

So, given such constraints coming from hydra, `fit_kwargs` has to contain at least roughly `{"data_kwargs": {"n_samples": 1000, "shifts": 0.4, "cate": 0.2, "propensity": ...}, "n_clients": 10, "substra_kwargs": {"dataset_hash": ...}}`.
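To make the constraint concrete, here is a minimal sketch of what the experiment manager effectively does (values and keys are illustrative): one object, fitted repeatedly with a different `fit_kwargs` dict each time.

```python
# Sketch (hypothetical names): the same object is fitted sequentially,
# each call carrying one experimental configuration from the sweep.

class SweepTarget:
    def __init__(self):
        self.runs = []

    def fit(self, **fit_kwargs):
        # Record each configuration; a real estimator would refit here.
        self.runs.append(fit_kwargs)
        return self


target = SweepTarget()
for fit_kwargs in (
    {"data_kwargs": {"n_samples": 1000, "shifts": 0.4}, "n_clients": 10},
    {"data_kwargs": {"n_samples": 2000, "shifts": 0.0}, "n_clients": 2},
):
    target.fit(**fit_kwargs)

print(len(target.runs))  # 2 configurations applied to the same object
```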
The problem is about the init kwargs (`init_kwargs`) and their relation to the fit kwargs (`fit_kwargs`). The init can have either its own kwargs (therefore a complete separation of `fit_kwargs` and `init_kwargs`), OR its own kwargs plus some of the fit kwargs (all or a subset), OR no kwargs at all, OR all the same kwargs.

The complete separation is cleaner; however, it means not much happens in the init, and indeed, what should be happening in the init? Maybe the init should have no kwargs and do nothing or almost nothing => careful, there is inheritance from BaseEstimator and BootstrapMixin. TODO: add this constraint.
Having overlap means handling the case where you have an init with certain kwargs and then a fit with other kwargs, which may or may not change the values of the previous kwargs. This is what is happening right now in the code. The current logic is awful because it tries to detect which params have changed, in a very brittle way: if, for instance, the user instantiated FedECA without DP and now wants to fit it with DP, it has to detect this and re-instantiate the strategies attribute (the reverse case can also happen). Bugs have previously been found in this logic by @honghaoli42 and by me, and I would not be surprised if there were more.
Handling this type of logic more gracefully would necessitate a dedicated `set_attributes_with_kwargs` method, either with all default kwargs at None, which would detect the non-None hyper-parameters and reinitialize only those, OR with exactly the same defaults as the init, setting all kwargs no matter their values. This method could (or could not) use dedicated objects such as Enums. This is not done today.

Considering sklearn, surprisingly it seems there is no clear separation, and some of the same overlapping logic is implemented; see e.g. the `fit` method of the BaseSGD class: https://github.com/scikit-learn/scikit-learn/blob/f07e0138b/sklearn/linear_model/_stochastic_gradient.py#L930 (see e.g. `alpha`, `loss`, `learning_rate`, ...).
Delving into the fit logic you can see the same, (very subjectively) non-intuitive patterns: if you did not pass a `coef_init`, it is thus `None`, and so the object should use its attribute `self.coef_`.
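That sklearn pattern can be illustrated with a simplified sketch (this is an illustration of the overlap idiom, not sklearn's actual code):

```python
class ToySGD:
    """Simplified illustration of the overlap pattern in BaseSGD.fit."""

    def __init__(self, n_features=3):
        # State initialised at construction time...
        self.coef_ = [0.0] * n_features

    def fit(self, X, y, coef_init=None):
        # ...and overridden at fit time only if explicitly provided:
        # coef_init=None means "reuse self.coef_", mirroring the
        # warm-start behavior referred to above.
        if coef_init is not None:
            self.coef_ = list(coef_init)
        # ... the optimisation would start from self.coef_ here ...
        return self


m = ToySGD()
m.fit(X=[[1.0]], y=[0.0])                             # keeps the zeros
m.fit(X=[[1.0]], y=[0.0], coef_init=[1.0, 2.0, 3.0])  # override wins
print(m.coef_)  # [1.0, 2.0, 3.0]
```

The non-intuitive part is exactly this dual meaning of `None`: "not passed" and "keep previous state" are conflated.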
What should we do? First, decide on kwargs separation (or not); then, if there is overlap, what would be the pseudocode for a clean logic?
In my opinion, seeing that sklearn did go with some overlap, having all kwargs overlap for now seems like a good first step. Regarding the associated logic, I would go for a `set_attributes_with_kwargs` method, called by both the init AND fit methods with all of its kwargs defaulting to None, which would iterate over its kwargs, detect the non-None ones, and set them immediately.

Once we reach a decision, whatever it is, it should be relatively quick to implement and would prevent many bugs in the future.