This repository has been archived by the owner on Jan 20, 2024. It is now read-only.
Hello Imirzadeh,
I had trouble with nni at first, but I solved it by defining config = { "lambda_student": 0.5, "T_student": 5, "seed": 20 } rather than config = nni.get_next_parameter(), as you suggested in issue #15.
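The workaround above can be sketched as a small helper that uses the fixed dictionary unless an NNI experiment is explicitly requested. This is only an illustration: `load_config`, `DEFAULT_CONFIG`, and the `use_nni` flag are hypothetical names, not part of the repository.

```python
# Hypothetical defaults, copied from the values quoted in this issue.
DEFAULT_CONFIG = {"lambda_student": 0.5, "T_student": 5, "seed": 20}


def load_config(use_nni=False):
    """Return hyperparameters from NNI if requested, else the fixed defaults."""
    if use_nni:
        # Imported lazily so the script also runs where nni is not installed.
        import nni
        return nni.get_next_parameter()
    # Return a copy so callers cannot mutate the module-level defaults.
    return dict(DEFAULT_CONFIG)


config = load_config()
```

Calling `load_config()` with no arguments never touches nni, so the training script can run outside an NNI experiment.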
I have reproduced the result on CIFAR-10. With TAKD, resnet8 distilled from resnet20 reached 80.05% accuracy. With baseline KD, resnet8 distilled from resnet110 reached only 77.19%. I used the following config to get this improvement of approximately 3%: {seed = 20, T_student = 10, lambda_student = 1.0}.
However, when I tried to reproduce the paper's results on CIFAR-100, I could not obtain a significant improvement.
Could you share the TAKD parameters for CIFAR-100 and ImageNet that reach the improvements stated in the paper?
Sorry for the late reply. I don't have the parameters for ImageNet. That experiment was implemented by my co-authors at DeepMind.
For CIFAR-100, can you please try [lambda_student = 0.04, T_student = 10] or [lambda_student = 0.04, T_student = 5]?
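To see how these two hyperparameters interact, here is a minimal pure-Python sketch of one common knowledge-distillation loss (the Hinton-style formulation). It is an assumption that the repository's loss has this form, and its exact weighting may differ. `softmax` and `kd_loss` are illustrative names.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, lambda_student, T_student):
    """Assumed form: (1 - lambda) * CE + lambda * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy on the student's unscaled prediction.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the temperature-softened distributions.
    q_teacher = softmax(teacher_logits, T_student)
    q_student = softmax(student_logits, T_student)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student))
    return (1 - lambda_student) * ce + lambda_student * T_student ** 2 * kl


loss = kd_loss([1.0, 2.0, 3.0], [0.5, 2.5, 3.5], label=2,
               lambda_student=0.04, T_student=10)
```

Under this assumed formulation, lambda_student = 0.04 weights the hard-label term by 0.96 and scales the distillation term by 0.04 * T_student², so changing T_student from 10 to 5 also changes the effective weight of the distillation term.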
imirzadeh changed the title from "TA training parameters" to "TA training parameters for CIFAR-100 experiment" on Aug 14, 2020.