Results for Freeze Modules are higher than the ones reported (CIFAR-100 20 tasks) #1
After running this command:

I would say the only differences from the original setup are that I set the dropout to 0.2 and the batch size to 64, but these should not influence the outcome to that extent. The reported result for this baseline on CIFAR-100 is 48%. However, when I run the command above to reproduce it, I get an average accuracy of 71%.

I also obtain similar results with my own code while trying to reproduce the experiments.

Did I write the wrong command to reproduce these results?

Comments
Hi, thank you for your interest in our work and for raising this issue. The command as written looks okay to me. I implemented the dropout change and ran the experiment with the command you provided, and found the average accuracy on a single random seed to be 49.78%. One note on interpreting results: the average accuracy is computed from the log file of the last task trained (task 19), by averaging the performance of all tasks (tasks 0 through 19) in the last epoch (epoch 100). The reason I bring this up is that when I ran the evaluation, the final accuracy on the last task alone was 71.8%, but the performance we care about is the average across tasks after all tasks have been trained. I hope this helps.
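For concreteness, here is a minimal sketch of that averaging step. The log format it parses is purely hypothetical (the repository's real log layout will differ), but the computation it performs, taking each task's accuracy at the final epoch from the task-19 log and averaging across tasks 0 through 19, is the one described above.

```python
# Sketch only: assumes task-19 log lines of the hypothetical form
# "task <id> epoch <n> acc <value>"; adapt the parsing to the real format.

def average_accuracy(log_path: str, last_epoch: int = 100, num_tasks: int = 20) -> float:
    """Average the last-epoch accuracy of tasks 0..num_tasks-1."""
    last_acc = {}
    with open(log_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 6 and parts[0] == "task" and parts[2] == "epoch":
                task_id, epoch, acc = int(parts[1]), int(parts[3]), float(parts[5])
                if epoch == last_epoch:
                    last_acc[task_id] = acc
    assert len(last_acc) == num_tasks, "some tasks are missing from the log"
    return sum(last_acc.values()) / num_tasks
```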
Hello, thank you for your answer. The measure I took is the mean of the accuracies over all the individual tasks after training on all of them (the values present in the files of the task19/ folder, at the last epoch), so I believe my computation is correct on that point. Another change I made, which I only now remember, is that I changed the load_data method of the SplitCIFAR100 dataset so that it loads the data through the torchvision CIFAR100 dataset instead of reading it from directories that I did not have. Could this have an effect on the results?
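For reference, a change along those lines might look like the sketch below. The function name and the splitting scheme (20 disjoint blocks of 5 consecutive classes) are assumptions for illustration; the repository's actual SplitCIFAR100 class and its load_data method will differ.

```python
import numpy as np
from torchvision.datasets import CIFAR100

def load_split_cifar100(root: str = "./data", num_tasks: int = 20):
    """Load CIFAR-100 via torchvision and split it into class-disjoint tasks."""
    train = CIFAR100(root=root, train=True, download=True)
    test = CIFAR100(root=root, train=False, download=True)

    classes_per_task = 100 // num_tasks  # 5 classes per task for 20 tasks
    tasks = []
    for t in range(num_tasks):
        cls = set(range(t * classes_per_task, (t + 1) * classes_per_task))
        tr_idx = [i for i, y in enumerate(train.targets) if y in cls]
        te_idx = [i for i, y in enumerate(test.targets) if y in cls]
        tasks.append({
            "train_x": train.data[tr_idx],   # raw uint8 HWC images, untouched
            "train_y": np.array(train.targets)[tr_idx],
            "test_x": test.data[te_idx],
            "test_y": np.array(test.targets)[te_idx],
        })
    return tasks
```

Note that no transform is passed to CIFAR100 here, so .data holds the raw uint8 pixels; a transform would only be applied lazily through __getitem__.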
Hi, the computation of the accuracies sounds correct. It is possible that even that small change makes a difference. Could you try downloading the raw data directly from the repo here and re-running your experiments? This might matter if torchvision applies any sort of transformation to the images.
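If it helps, one quick way to check whether the two data sources differ at the pixel level is to compare the arrays directly. The raw-file path and its format below are placeholders; adapt them to however the repository ships its data.

```python
import numpy as np
from torchvision.datasets import CIFAR100

# Compare torchvision's copy of the training images against a raw copy
# from the repo. "cifar100_train_images.npy" is a placeholder filename.
tv = CIFAR100(root="./data", train=True, download=True)
raw = np.load("path/to/repo/cifar100_train_images.npy")

print("shapes:", tv.data.shape, raw.shape)          # expect (50000, 32, 32, 3) each
print("byte-identical:", np.array_equal(tv.data, raw))
```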