Can't reproduce #1

THU-syh · 2022-07-13T08:09:32Z

Sorry, I'm having some problems reproducing your work. I can't get the same results as in your paper (https://doi.org/10.1016/j.patter.2022.100521) by following the README and the code here.

zwvews · 2022-07-13T20:17:41Z

Hi, thanks for your interest. Could you please share your experimental results? Just FYI, you need to tune the hyperparameters following the experimental setup described in our paper.

THU-syh · 2022-07-19T18:39:26Z

Thanks for your reply. I just ran the sample command given in the README file:
python main.py -dataset esol -fedmid avg -part_alpha 0.1
but got the following result:

I was surprised that this result is better than the one you reported in the article (even better than the FLIT(+) results reported there, which are marked as the best federated-learning results).

However, when we tried FLIT/FLIT+,
python main.py -dataset esol -fedmid oursvatFLITPLUS -tmpFed 0.5 -lambdavat 0.01 -part_alpha 0.1
we got worse results than with FedAvg.

zwvews · 2022-07-19T19:58:12Z

As I mentioned, you need to tune the hyperparameters for FLIT(+) following our paper. We did not find a single set of hyperparameters that fits all datasets. However, it is weird to see FedAvg perform so well. I will check our experiments and get back to you soon.
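A minimal sketch of such a sweep, reusing only the flags that appear in the FLIT+ command above (-dataset, -fedmid, -tmpFed, -lambdavat, -part_alpha). The candidate values and the helper name flit_plus_commands are hypothetical placeholders, not the grid from the paper; the script only prints one command per combination so each run can be launched and compared.

```python
import itertools
import shlex

# Hypothetical candidate values: placeholders, NOT the grid used in the paper.
TMP_FED_VALUES = [0.1, 0.5, 1.0]
LAMBDA_VAT_VALUES = [0.001, 0.01, 0.1]


def flit_plus_commands(dataset, part_alpha, tmp_values, lambda_values):
    """Build one FLIT+ command line per (tmpFed, lambdavat) pair.

    Only flags already shown in this thread are used; the helper itself
    is a hypothetical convenience, not part of the repository.
    """
    commands = []
    for tmp_fed, lambda_vat in itertools.product(tmp_values, lambda_values):
        args = [
            "python", "main.py",
            "-dataset", dataset,
            "-fedmid", "oursvatFLITPLUS",
            "-tmpFed", str(tmp_fed),
            "-lambdavat", str(lambda_vat),
            "-part_alpha", str(part_alpha),
        ]
        # shlex.join quotes arguments only when the shell would need it.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for cmd in flit_plus_commands("esol", 0.1, TMP_FED_VALUES, LAMBDA_VAT_VALUES):
        print(cmd)
```

With the placeholder grid this prints nine commands, one of which is exactly the FLIT+ command quoted earlier in the thread.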

zwvews · 2022-07-21T14:04:27Z

Hi, I have checked our previous experimental results and also re-run the experiments.
First, I did obtain the reported results for FedAvg on the ESOL dataset, as shown below.

I also admit that I cannot reproduce the results with our current code for this dataset. However, I should note that ESOL is extremely small, and training/testing performance on it is quite unstable. I suggest trying our code on larger datasets, e.g. Lipo. Anyway, thanks very much for pointing out the problem, and let me know if you have any other questions.
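Given how unstable ESOL is, one way to compare methods more fairly is to repeat each configuration several times (e.g. with different random seeds) and report the mean and spread instead of a single run. A small sketch, assuming the per-run test metrics have already been collected into lists; the helper names are hypothetical, and the one-standard-deviation overlap check is only a rough heuristic, not a significance test.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric over repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.mean(scores), statistics.stdev(scores)


def overlaps(runs_a, runs_b):
    """Rough heuristic: do the mean +/- 1 std intervals of two methods overlap?

    If they do, a single-run difference between the methods says little.
    """
    mean_a, std_a = summarize_runs(runs_a)
    mean_b, std_b = summarize_runs(runs_b)
    return abs(mean_a - mean_b) <= std_a + std_b
```

For example, summarize_runs([1.0, 1.2, 0.8]) returns a mean of 1.0 and a standard deviation of about 0.2 (placeholder numbers, not results from this thread).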

THU-syh · 2022-07-22T07:21:27Z

Thanks for your prompt response. Following your suggestion, I also ran the FedAvg code on other datasets, but again got surprising results on some of them:

Freesolv: As the degree of data heterogeneity increases, the test metric on the Freesolv dataset also increases; note that for this dataset, lower is better.

Note that this problem also occurs on the Lipo dataset (lower is better) and the SIDER dataset (higher is better).

ClinTox: As with the ESOL problem, the FedAvg results on this dataset significantly outperform the state-of-the-art results reported in the paper for all methods.

zwvews · 2022-07-22T07:59:08Z

As for problem 1, we state in our paper that our current heterogeneity simulation method is not perfect and may not actually produce heterogeneous datasets. We discuss this in the main results section and in the conclusions section. More research is needed in this direction.
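The thread does not say how -part_alpha builds the client splits. A common convention in federated-learning benchmarks, and only an assumption here (main.py has not been checked), is to draw client proportions from a symmetric Dirichlet distribution whose concentration parameter is alpha, so smaller alpha gives more skewed clients. The sketch below simulates only quantity skew with the standard library and measures it with the coefficient of variation of client sizes, one simple way to check whether a given alpha actually yields a heterogeneous split. All function names are hypothetical.

```python
import random
import statistics


def dirichlet(alpha, k, rng):
    """Sample a k-dimensional symmetric Dirichlet(alpha) vector via Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_sizes(n_samples, n_clients, alpha, rng):
    """Split n_samples across clients with Dirichlet(alpha) proportions (quantity skew only)."""
    props = dirichlet(alpha, n_clients, rng)
    sizes = [int(p * n_samples) for p in props]
    # Hand out the samples lost to rounding down, one per client in turn.
    for i in range(n_samples - sum(sizes)):
        sizes[i % n_clients] += 1
    return sizes


def size_cv(sizes):
    """Coefficient of variation of client sizes; 0 means perfectly balanced."""
    return statistics.pstdev(sizes) / statistics.mean(sizes)


if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (0.1, 1.0, 100.0):
        cvs = [size_cv(partition_sizes(1000, 4, alpha, rng)) for _ in range(200)]
        print(f"alpha={alpha}: mean CV of client sizes = {statistics.mean(cvs):.3f}")
```

Real splits may also skew labels or scaffolds, which this quantity-only sketch does not model.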

As for problem 2, I believe there may be some small differences between our current code and the version we used when running the experiments. I am really sorry about this. Our results on these two datasets were obtained consistently across all methods, so I believe they can still serve as a reference for comparison. I am also pasting the FedAvg experimental records for ClinTox here.

THU-syh · 2022-07-22T08:16:52Z

Thanks for your reply. Given the current problems, I suggest that you carefully check the open-source code for errors and update it with the correct version. If the current FedAvg code has no errors, it is clear that you did not find the optimal FedAvg baseline. I also recommend re-running the relevant FLIT(+) experiments (especially on the ESOL and ClinTox datasets) to make sure the conclusions in the paper hold.
