
Issue with Resnet18 Implementation #8

Open
simomagi opened this issue Apr 22, 2021 · 0 comments

simomagi commented Apr 22, 2021

I think something is missing in the Resnet18 implementation:

    class BasicBlock(nn.Module):
        expansion = 1

        def __init__(self, in_planes, planes, stride=1, config={}):
            super(BasicBlock, self).__init__()
            self.conv1 = conv3x3(in_planes, planes, stride)
            self.conv2 = conv3x3(planes, planes)

            self.shortcut = nn.Sequential()
            if stride != 1 or in_planes != self.expansion * planes:
                self.shortcut = nn.Sequential(
                    nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1,
                              stride=stride, bias=False),
                )
            self.IC1 = nn.Sequential(
                nn.BatchNorm2d(planes),
                nn.Dropout(p=config['dropout'])
            )

            self.IC2 = nn.Sequential(
                nn.BatchNorm2d(planes),
                nn.Dropout(p=config['dropout'])
            )

        def forward(self, x):
            out = self.conv1(x)
            out = relu(out)
            out = self.IC1(out)

            out += self.shortcut(x)
            out = relu(out)
            out = self.IC2(out)
            return out

Two attributes are defined in the class BasicBlock; they represent the standard convolution operations used in ResNet-X architectures:

self.conv1 = conv3x3(in_planes, planes, stride)
self.conv2 = conv3x3(planes, planes)

The problem is that in the forward pass only the first one (i.e. self.conv1) and the two batch-normalization layers are used to compute the output. Furthermore, when the model is loaded on the GPU, both conv1 and conv2 are moved to it, yet the second one is never used in the forward pass. So I think the code of the forward pass should be:

    def forward(self, x):
        out = self.conv1(x)
        out = relu(out)
        out = self.IC1(out)
        out = self.conv2(out)
        out = self.IC2(out)

        out += self.shortcut(x)
        out = relu(out)
        return out
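
One way to confirm that conv2 never participates in the forward pass is to run a single forward/backward step and list the parameters that never receive a gradient. This is only a minimal sketch assuming a standard PyTorch training loop; the model, batch, and loss names below are placeholders, not code from this repository:

    import torch
    import torch.nn as nn

    def find_unused_parameters(model, inputs, targets, loss_fn):
        # Run one forward/backward pass and return the names of trainable
        # parameters that never received a gradient. Modules that are
        # registered (and hence moved to the GPU with the model) but skipped
        # in forward() keep p.grad == None after backward().
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        return [name for name, p in model.named_parameters()
                if p.requires_grad and p.grad is None]

    # Hypothetical usage: with the original forward(), every block's
    # conv2.weight is expected to show up in this list.
    # unused = find_unused_parameters(model, x_batch, y_batch, nn.CrossEntropyLoss())
    # print(unused)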

After fixing this problem, I am not able to reproduce the results of the paper. On CIFAR-100, using the hyperparameters provided in this closed issue:

--dataset cifar100 --tasks 20 --epochs-per-task 1 --lr 0.15 --gamma 0.85 --batch-size 10 --dropout 0.1 --seed 1234

average accuracy = 51.339999999999996, forget = 0.11200000000000002

If I instead use the hyperparameters provided in replicate_experiment_2.sh:

--dataset cifar100 --tasks 20 --epochs-per-task 1 --lr 0.1 --gamma 0.8 --hiddens 256 --batch-size 10 --dropout 0.5 --seed 1234

average accuracy = 44.550000000000004, forget = 0.05421052631578946

These results differ considerably from the results reported in the paper:

average_accuracy=59.9 and forgetting=0.08
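
For context, I assume "average accuracy" and "forgetting" here are the standard continual-learning metrics: the mean accuracy over all tasks after training on the last task, and the average gap between each task's best accuracy during training and its final accuracy. A small sketch of those definitions (assuming an accuracy matrix acc where acc[t, k] is the accuracy on task k after training on task t):

    import numpy as np

    def average_accuracy(acc):
        # Mean accuracy over all tasks, measured after the final task.
        T = acc.shape[0]
        return acc[T - 1].mean()

    def forgetting(acc):
        # For each task seen before the last one: best accuracy it ever
        # reached minus its accuracy at the end, averaged over tasks.
        T = acc.shape[0]
        drops = [acc[:T - 1, k].max() - acc[T - 1, k] for k in range(T - 1)]
        return float(np.mean(drops))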

Could you provide the correct hyperparameters?
