Results for Freeze Modules are higher than the ones reported (CIFAR-100 20 tasks) #1
After running this command:

I would say the only differences from the original setup are that I set the dropout to 0.2 and the batch size to 64, but these should not influence the outcome to that extent. The reported result for this baseline on CIFAR-100 is 48%. However, when I run the command above to reproduce it, I get an average accuracy of 71%.

I also obtain similar results with my own code while trying to reproduce the experiments.

Did I write the wrong command to reproduce these results?

Comments
Hi, thank you for your interest in our work and for raising this issue. The command as written looks okay to me. I implemented the dropout change and ran the experiment with the command you provided, and found the average accuracy on a single random seed to be 49.78%. One note on interpreting results: the average accuracy is computed from the log file of the last task trained (task 19), by averaging the performance of all tasks (tasks 0 through 19) in the last epoch (epoch 100). The reason I bring this up is that when I ran the evaluation, the final accuracy on the last task alone was 71.8%, but the performance we care about is the average across tasks after all tasks have been trained. I hope this helps.
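For concreteness, here is a minimal sketch of that averaging step. The log format it parses is purely hypothetical (the repository's real log layout will differ), but the computation it performs, taking each task's accuracy at the final epoch from the task-19 log and averaging across tasks 0 through 19, is the one described above.

```python
# Sketch only: assumes task-19 log lines of the hypothetical form
# "task <id> epoch <n> acc <value>"; adapt the parsing to the real format.

def average_accuracy(log_path: str, last_epoch: int = 100, num_tasks: int = 20) -> float:
    """Average the last-epoch accuracy of tasks 0..num_tasks-1."""
    last_acc = {}
    with open(log_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 6 and parts[0] == "task" and parts[2] == "epoch":
                task_id, epoch, acc = int(parts[1]), int(parts[3]), float(parts[5])
                if epoch == last_epoch:
                    last_acc[task_id] = acc
    assert len(last_acc) == num_tasks, "some tasks are missing from the log"
    return sum(last_acc.values()) / num_tasks
```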
Hello, thank you for your answer. The measure I took is the mean of the accuracies over all the individual tasks after training on all of them (the values present in the files of the task19/ folder, at the last epoch), so I believe my computation is correct on that point. Another change I made, which I only now remember, is that I changed the load_data method of the SplitCIFAR100 dataset so that it loads the data through the torchvision CIFAR100 dataset instead of reading it from directories that I did not have. Could this have an effect on the results?
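For reference, a change along those lines might look like the sketch below. The function name and the splitting scheme (20 disjoint blocks of 5 consecutive classes) are assumptions for illustration; the repository's actual SplitCIFAR100 class and its load_data method will differ.

```python
import numpy as np
from torchvision.datasets import CIFAR100

def load_split_cifar100(root: str = "./data", num_tasks: int = 20):
    """Load CIFAR-100 via torchvision and split it into class-disjoint tasks."""
    train = CIFAR100(root=root, train=True, download=True)
    test = CIFAR100(root=root, train=False, download=True)

    classes_per_task = 100 // num_tasks  # 5 classes per task for 20 tasks
    tasks = []
    for t in range(num_tasks):
        cls = set(range(t * classes_per_task, (t + 1) * classes_per_task))
        tr_idx = [i for i, y in enumerate(train.targets) if y in cls]
        te_idx = [i for i, y in enumerate(test.targets) if y in cls]
        tasks.append({
            "train_x": train.data[tr_idx],   # raw uint8 HWC images, untouched
            "train_y": np.array(train.targets)[tr_idx],
            "test_x": test.data[te_idx],
            "test_y": np.array(test.targets)[te_idx],
        })
    return tasks
```

Note that no transform is passed to CIFAR100 here, so .data holds the raw uint8 pixels; a transform would only be applied lazily through __getitem__.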
Hi, the computation of the accuracies sounds correct. It is possible that even that small change makes a difference. Could you try downloading the raw data directly from the repo here and re-running your experiments? This might matter if torchvision applies any sort of transformation to the images.
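If it helps, one quick way to check whether the two data sources differ at the pixel level is to compare the arrays directly. The raw-file path and its format below are placeholders; adapt them to however the repository ships its data.

```python
import numpy as np
from torchvision.datasets import CIFAR100

# Compare torchvision's copy of the training images against a raw copy
# from the repo. "cifar100_train_images.npy" is a placeholder filename.
tv = CIFAR100(root="./data", train=True, download=True)
raw = np.load("path/to/repo/cifar100_train_images.npy")

print("shapes:", tv.data.shape, raw.shape)          # expect (50000, 32, 32, 3) each
print("byte-identical:", np.array_equal(tv.data, raw))
```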