I think that something is missing in the ResNet18 implementation.

Two attributes are defined in the class BasicBlock. They represent the classical convolution operations used in ResNet-X architectures:

```python
self.conv1 = conv3x3(in_planes, planes, stride)
self.conv2 = conv3x3(planes, planes)
```

The problem is that in the forward pass only the first one (i.e. self.conv1) and the two BatchNormalization layers are used to compute the output. Furthermore, when the model is loaded onto the GPU, both conv1 and conv2 are moved to it, but the second one is never used in the forward. So I think the code of the forward pass should be:

```python
def forward(self, x):
    out = self.conv1(x)
    out = relu(out)
    out = self.IC1(out)
    out = self.conv2(out)  # conv2 is now actually applied
    out = self.IC2(out)
    out += self.shortcut(x)
    out = relu(out)
    return out
```
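For context, here is a minimal, self-contained sketch of how the corrected BasicBlock could look. The contents of IC1/IC2 (assumed here to wrap the two BatchNorm layers plus Dropout) and the shortcut branch are my assumptions based on the snippets above, not necessarily the repository's exact code:

```python
import torch.nn as nn
from torch.nn.functional import relu

def conv3x3(in_planes, planes, stride=1):
    # standard 3x3 convolution with padding, as used in ResNet-X blocks
    return nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

class BasicBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1, dropout=0.0):
        super().__init__()
        self.conv1 = conv3x3(in_planes, planes, stride)
        self.conv2 = conv3x3(planes, planes)
        # IC1/IC2 assumed to contain the two BatchNorm layers (plus Dropout)
        self.IC1 = nn.Sequential(nn.BatchNorm2d(planes), nn.Dropout(dropout))
        self.IC2 = nn.Sequential(nn.BatchNorm2d(planes), nn.Dropout(dropout))
        # identity shortcut, or a 1x1 projection when the shape changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(planes))

    def forward(self, x):
        out = relu(self.conv1(x))
        out = self.IC1(out)
        out = self.conv2(out)  # conv2 is now actually used
        out = self.IC2(out)
        out += self.shortcut(x)
        return relu(out)
```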
After fixing this problem, I am not able to reproduce the results of the paper. On CIFAR-100, using the hyperparameters provided in this closed issue:
```
--dataset cifar100 --tasks 20 --epochs-per-task 1 --lr 0.15 --gamma 0.85 --batch-size 10 --dropout 0.1 --seed 1234
```
I get average accuracy = 51.34, forgetting = 0.112.
If I instead use the hyperparameters provided in replicate_experiment_2.sh:
```
--dataset cifar100 --tasks 20 --epochs-per-task 1 --lr 0.1 --gamma 0.8 --hiddens 256 --batch-size 10 --dropout 0.5 --seed 1234
```
I get average accuracy = 44.55, forgetting = 0.054.
These results differ considerably from those reported in the paper: average accuracy = 59.9 and forgetting = 0.08.
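For reference, the two metrics above follow the standard continual-learning definitions: average accuracy over all tasks after training on the last one, and forgetting as the mean drop from each task's best earlier accuracy to its final accuracy. A minimal sketch of that computation (the helper name `summarize` and the accuracy-matrix layout are mine, not necessarily what this repository computes):

```python
import numpy as np

def summarize(acc: np.ndarray):
    """acc[i, j] = test accuracy on task j after finishing training on task i."""
    T = acc.shape[0]
    # mean accuracy over all tasks, measured after the final task
    average_accuracy = acc[-1].mean()
    # mean drop from each earlier task's best accuracy to its final accuracy
    forgetting = np.mean([acc[:T - 1, j].max() - acc[-1, j]
                          for j in range(T - 1)])
    return average_accuracy, forgetting
```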
Could you provide the correct hyperparameters?