Hello, I'm attempting to reproduce the results of Cream-S, but my achieved accuracy of 77.04% falls short of the accuracy reported in the paper (77.6%). I used the configuration file provided at https://github.com/microsoft/Cream/blob/main/Cream/experiments/configs/retrain/287.yaml, setting Net.SELECTION to 287 and training on 16 GPUs with a batch size of 128 per GPU, per the paper's specifications. However, I noticed that this configuration file employs RandAugment rather than AutoAugment (which the paper mentions), and also enables random erasing, which is not discussed in the paper. This discrepancy is causing confusion. Could you please clarify the precise training strategy for Cream-S? Additionally, was the same training strategy applied to all architectures presented in the paper?
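For reference, here is a minimal sketch of what the two candidate augmentation pipelines could look like when built with timm's `create_transform` (the Cream retraining code is built on timm). The `auto_augment` policy strings, `re_prob` value, and interpolation mode below are assumptions for illustration, not values read from 287.yaml, so they may need adjusting to match the actual config:

```python
from timm.data import create_transform

# Pipeline as implied by 287.yaml: RandAugment plus random erasing.
# The policy magnitude and re_prob are assumed values, not taken from the config.
randaug_transform = create_transform(
    input_size=224,
    is_training=True,
    auto_augment='rand-m9-mstd0.5',  # RandAugment policy string (assumed magnitude)
    re_prob=0.2,                     # random erasing probability (assumed)
    interpolation='bicubic',
)

# Pipeline as described in the paper: AutoAugment, no random erasing.
autoaug_transform = create_transform(
    input_size=224,
    is_training=True,
    auto_augment='original',         # timm's AutoAugment (ImageNet policy)
    re_prob=0.0,
    interpolation='bicubic',
)

print(randaug_transform)
print(autoaug_transform)
```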
I also tried retraining with only 8 GPUs and a batch size of 128 per GPU, which matches the setting in your config file exactly; the result was 77.07%, similar to the 16-GPU run.
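One detail worth noting: with 8 GPUs the effective batch size is half that of the 16-GPU run (8 × 128 = 1024 vs. 16 × 128 = 2048), so the two runs are only equivalent if the learning rate is kept fixed rather than scaled. The sketch below just illustrates that arithmetic under the commonly used linear scaling rule, which the thread does not confirm Cream uses; the base learning rate is a placeholder, not the value from 287.yaml:

```python
# Effective batch size and (hypothetical) linearly scaled learning rate.
base_lr = 0.064                      # placeholder LR for the 16-GPU reference setup
ref_gpus, per_gpu_bs = 16, 128
ref_batch = ref_gpus * per_gpu_bs    # 2048

for gpus in (16, 8):
    eff_batch = gpus * per_gpu_bs                 # 2048 or 1024
    scaled_lr = base_lr * eff_batch / ref_batch   # linear scaling rule (assumed)
    print(f"{gpus} GPUs: effective batch {eff_batch}, scaled LR {scaled_lr:.4f}")
```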