Commit cdb58f7

First Release
0 parents  commit cdb58f7

22 files changed

+3304
-0
lines changed

README.md

+236
@@ -0,0 +1,236 @@
1+
# Learning Architectures for Binary Networks
2+
3+
A PyTorch implementation of the paper [Learning Architectures for Binary Networks](https://arxiv.org/abs/2002.06963) (*BNAS*) (ECCV 2020 - To appear)<br>
4+
If you find any part of our code useful for your research, consider citing our paper.
5+
6+
```bibtex
7+
@inproceedings{kimSC2020BNAS,
8+
title={Learning Architectures for Binary Networks},
9+
author={Dahyun Kim and Kunal Pratap Singh and Jonghyun Choi},
10+
booktitle={ECCV},
11+
year={2020}
12+
}
13+
```
14+
15+
### Maintainer
16+
* [Dahyun Kim](mailto:[email protected])
17+
* [Kunal P. Singh](mailto:[email protected])
18+
19+
20+
## Introduction
21+
22+
We present a method for searching architectures of a network with both binary weights and activations.
23+
When using the same binarization scheme, our searched architectures outperform binary networks whose architectures are based on well-known floating-point networks.
24+
25+
**Note**: our searched architectures still achieve competitive results when compared to the state of the art without additional pretraining, new binarization schemes, or novel training methods.
26+
27+
## Prerequisite - Docker Containers
28+
29+
We recommend using the Docker container below, as it provides a complete runtime environment.
30+
You will need to install `nvidia-docker` and its related packages by following the instructions [here](https://github.com/NVIDIA/nvidia-docker).
31+
32+
Pull our image uploaded [here](https://hub.docker.com/r/killawhale/apex) using the following command.
33+
```console
34+
$ docker pull killawhale/apex:latest
35+
```
36+
You can then create the container to run our code via the following command.
37+
```console
38+
$ docker run --name [CONTAINER_NAME] --runtime=nvidia -it -v [HOST_FILE_DIR]:[CONTAINER_FILE_DIR] --shm-size 16g killawhale/apex:latest bash
39+
```
40+
- [CONTAINER_NAME]: the name of the created container
41+
- [HOST_FILE_DIR]: the path of the directory on the host machine which you want to sync your container with
42+
- [CONTAINER_FILE_DIR]: the name in which the synced directory will appear inside the container
43+
44+
**Note**: For those who do not want to use the docker container, we use PyTorch 1.2, torchvision 0.5, Python 3.6, CUDA 10.0, and Apex 0.1. You can also refer to the provided requirements.txt.
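
If you set up the environment by hand, a quick check like the following can confirm that the installed packages match the versions listed in the note above. This is only a sketch: the helper names are hypothetical, it assumes the usual import names (`torch`, `torchvision`, `apex`), and it assumes each package exposes a `__version__` attribute, which may not hold for every Apex build.

```python
import importlib
import sys

# Version prefixes quoted in the note above (assumed import names).
EXPECTED = {"torch": "1.2", "torchvision": "0.5", "apex": "0.1"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be determined."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Missing package, or a package that fails to import on this machine.
        return None
    return getattr(module, "__version__", None)


def check_environment(expected=EXPECTED):
    """Map each package to (expected_prefix, found_version, matches)."""
    report = {}
    for name, prefix in expected.items():
        found = installed_version(name)
        matches = found is not None and (found == prefix or found.startswith(prefix + "."))
        report[name] = (prefix, found, matches)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(the README uses 3.6)")
    for name, (prefix, found, ok) in check_environment().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {prefix}.x, found {found}, {status}")
```

A mismatch does not necessarily mean the code will fail; it only tells you that you are outside the configuration the authors used.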
45+
46+
## Dataset Preparation
47+
48+
### CIFAR10
49+
For CIFAR10, we use the copy of the dataset provided by torchvision.
50+
Run the following command to download it.
51+
```console
52+
$ python src/download_cifar10.py
53+
```
54+
This will create a directory named `data` and download the dataset into it.
55+
56+
### ImageNet
57+
For ImageNet, please follow the instructions below.
58+
59+
1. download the training set for the ImageNet dataset.
60+
```console
61+
$ wget http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_train.tar
62+
```
63+
This may take a long time depending on your internet connection.
64+
65+
2. download the validation set for the ImageNet dataset.
66+
```console
67+
$ wget http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar
68+
```
69+
70+
3. make a directory which will contain the ImageNet dataset and move your downloaded .tar files inside the directory.
71+
72+
4. extract and organize the training set into different categories.
73+
```console
74+
$ mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
75+
$ tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
76+
$ find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
77+
$ cd ..
78+
```
79+
5. do the same for the validation set as well.
80+
```console
81+
$ mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
82+
$ wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
83+
$ rm -rf ILSVRC2012_img_val.tar
84+
```
85+
86+
6. change the synset ids to integer labels.
87+
```console
88+
$ python src/prepare_imgnet.py [PATH_TO_IMAGENET]
89+
```
90+
- [PATH_TO_IMAGENET]: the path to the directory which has the ImageNet dataset
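
To make step 6 more concrete, here is a minimal sketch of one common way to turn synset ids into integer labels. It is not a copy of `src/prepare_imgnet.py`, whose actual behaviour is not shown in this README. The function name is hypothetical, and the sorted-name ordering is an assumption (it is the convention used by torchvision's `ImageFolder`).

```python
import os


def synset_to_index(train_dir):
    """Map each synset directory name (e.g. 'n01440764') to an integer label.

    Assumption: labels follow the sorted order of the directory names.
    """
    synsets = sorted(
        name for name in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, name))
    )
    return {synset: index for index, synset in enumerate(synsets)}
```

With the full ImageNet training set extracted as in step 4, the mapping would contain 1000 entries, one per class directory.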
91+
92+
You can optionally run the following if you only prepared the validation set.
93+
```console
94+
$ python src/prepare_imgnet.py [PATH_TO_IMAGENET] --val_only
95+
```
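
If you prefer not to rely on the shell loop in step 4, the same per-class extraction can be sketched in Python with the standard `tarfile` module. The function name is hypothetical, and unlike the `find` command this sketch only looks at archives directly inside the given directory, which is where the ImageNet training tarball places them.

```python
import os
import tarfile


def extract_class_archives(train_dir):
    """Unpack every '<synset>.tar' in train_dir into '<synset>/' and delete the archive."""
    extracted = []
    for name in sorted(os.listdir(train_dir)):
        if not name.endswith(".tar"):
            continue
        archive_path = os.path.join(train_dir, name)
        target_dir = os.path.join(train_dir, name[:-len(".tar")])
        os.makedirs(target_dir, exist_ok=True)
        with tarfile.open(archive_path) as archive:
            archive.extractall(target_dir)
        # Mirror the `rm -f` in the shell version to free disk space.
        os.remove(archive_path)
        extracted.append(target_dir)
    return extracted
```

Call it as `extract_class_archives("train")` from the ImageNet directory after unpacking `ILSVRC2012_img_train.tar`.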
96+
97+
## Inference with Pre-Trained Models
98+
99+
To reproduce the results reported in the paper, you can use the pretrained models provided [here](https://drive.google.com/drive/folders/14pmPSCb2u_4gHgeriRrBBkgIJ3J5p1O1?usp=sharing).
100+
101+
**Note**: For CIFAR10 we only share BNAS-A, as training other configurations for CIFAR10 does not take much time.
102+
For ImageNet, we currently provide all the models (BNAS-D,E,F,G,H).
103+
104+
For running validation on CIFAR10 using our pretrained models, use the following command.
105+
```console
106+
$ CUDA_VISIBLE_DEVICES=0,1 python -W ignore src/test.py --path_to_weights [PATH_TO_WEIGHTS] --arch latest_cell_zeroise --parallel
107+
```
108+
- [PATH_TO_WEIGHTS]: the path to the downloaded pretrained weights (for CIFAR10, it's BNAS-A)
109+
110+
For running validation on ImageNet using our pretrained models, use the following command.
111+
```console
112+
$ CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 src/test_imagenet.py --data [PATH_TO_IMAGENET] --path_to_weights [PATH_TO_WEIGHTS] --model_config [MODEL_CONFIG]
113+
```
114+
- [PATH_TO_IMAGENET]: the path to the directory which has the ImageNet dataset
115+
- [PATH_TO_WEIGHTS]: the path to the downloaded pretrained weights
116+
- [MODEL_CONFIG]: the model to run the validation with. Can be one of 'bnas-d', 'bnas-e', 'bnas-f', 'bnas-g', or 'bnas-h'
117+
118+
Expected result:
119+
120+
| Model | Reported Top-1(%) | Reported Top-5(%) | Reproduced Top-1(%) | Reproduced Top-5(%) |
121+
|:------:|:-----------------:|:-----------------:|:-------------------:|:-------------------:|
122+
| [BNAS-A](https://drive.google.com/file/d/1aSikzW8_sYJ5FuJsl54rnRg7kWsJKbf3/view?usp=sharing) | 92.70 | - | ~92.39 | - |
123+
| [BNAS-D](https://drive.google.com/file/d/165KA6Q2kn0Qrr8MKMruc10FtHM5JQgfP/view?usp=sharing) | 57.69 | 79.89 | ~57.60 | ~80.00 |
124+
| [BNAS-E](https://drive.google.com/file/d/10rH9o8N9UdV-J8rCic2y6iXYn6z9WOKz/view?usp=sharing) | 58.76 | 80.61 | ~58.15 | ~80.16 |
125+
| [BNAS-F](https://drive.google.com/file/d/1naNNrsBB78GvehmrsoRs4Qr3RlU40NCH/view?usp=sharing) | 58.99 | 80.85 | ~58.89 | ~80.87 |
126+
| [BNAS-G](https://drive.google.com/file/d/1XOI6krbbBx_7u3A8Ikshvoh5LkihHNB2/view?usp=sharing) | 59.81 | 81.61 | ~59.39 | ~81.03 |
127+
| [BNAS-H](https://drive.google.com/file/d/18wwejF135kX4dxfMtnFdEqHNEDRqNsIq/view?usp=sharing) | 63.51 | 83.91 | ~63.70 | ~84.49 |
128+
129+
*You can click on the links at the model name to download the corresponding model weights.*
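
For readers unfamiliar with the two metrics in the table, the sketch below shows the standard definition of Top-k accuracy: a prediction counts as correct when the true label is among the k highest-scoring classes. It is a plain-Python illustration with a hypothetical function name, not the evaluation code used in `src/test.py` or `src/test_imagenet.py`.

```python
def topk_accuracy(scores, targets, k=1):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if not scores:
        raise ValueError("scores must be non-empty")
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(scores)


# Three samples, three classes (toy numbers).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 1, 0]
top1 = topk_accuracy(scores, targets, k=1)  # one of three samples is correct
top2 = topk_accuracy(scores, targets, k=2)  # two of three samples are correct
```

Top-5 is the same computation with `k=5`, which is why it is only reported for the 1000-class ImageNet models and left blank for CIFAR10.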
130+
131+
**Note**: the provided pretrained weights were trained with [Apex](https://github.com/NVIDIA/apex) and with batch sizes different from those reported in the paper (due to limited computational resources), so the inference results may differ slightly from the results reported in the paper.
132+
133+
More comparisons with state-of-the-art binary networks are in [the paper](https://arxiv.org/abs/2002.06963).
134+
135+
## Searching Architectures
136+
137+
To search architectures, use the following command.
138+
```console
139+
$ CUDA_VISIBLE_DEVICES=0 python -W ignore src/search.py --save [ARCH_NAME]
140+
```
141+
- [ARCH_NAME]: the name of the searched architecture
142+
143+
This will automatically append the searched architecture to the `genotypes.py` file.
144+
Note that two genotypes will be appended, one for CIFAR10 and one for ImageNet.
145+
The validation accuracy at the end of the search is not indicative of the final performance of the searched architecture.
146+
To obtain the final performance, one must train the final architecture from scratch as described next.
147+
148+
149+
<p align="center">
150+
<img src="img/ours_normal-1.png" alt="BNAS-Normal Cell" width="40%">
151+
<img src="img/ours_reduction-1.png" alt="BNAS-Reduction Cell" width="45%">
152+
</p>
153+
<p align='center'>
154+
Figure: An example of normal (left) and reduction (right) cells found by BNAS
155+
</p>
156+
157+
## Training Searched Architectures from scratch
158+
159+
To train our best searched cell on CIFAR10, use the following command.
160+
```console
161+
$ CUDA_VISIBLE_DEVICES=0,1 python -W ignore src/train.py --learning_rate 0.05 --save [SAVE_NAME] --arch latest_cell_zeroise --parallel
162+
```
163+
- [SAVE_NAME]: experiment name
164+
165+
The validation accuracy is logged at every epoch as shown below, so there is no need to run inference separately.
166+
```console
167+
2019-12-29 11:47:42,166 args = Namespace(arch='latest_cell_zeroise', batch_size=256, data='../data', drop_path_prob=0.2, epochs=600, init_channels=36, layers=20, learning_rate=0.05, model_path='saved_models', momentum=0.9, num_skip=1, parallel=True, report_freq=50, save='eval-latest_cell_zeroise_train_repro_0.05', seed=0, weight_decay=3e-06)
168+
2019-12-29 11:47:46,893 param size = 5.578252MB
169+
2019-12-29 11:47:48,654 epoch 0 lr 2.500000e-03
170+
2019-12-29 11:47:55,462 train 000 2.623852e+00 9.375000 57.812500
171+
2019-12-29 11:48:34,952 train 050 2.103856e+00 22.533701 74.180453
172+
2019-12-29 11:49:14,118 train 100 1.943232e+00 27.440439 80.186417
173+
2019-12-29 11:49:53,748 train 150 1.867823e+00 30.114342 82.512417
174+
2019-12-29 11:50:29,680 train_acc 32.170000
175+
2019-12-29 11:50:30,057 valid 000 1.792161e+00 30.859375 88.671875
176+
2019-12-29 11:50:34,032 valid_acc 38.350000
177+
2019-12-29 11:50:34,101 epoch 1 lr 2.675926e-03
178+
2019-12-29 11:50:35,476 train 000 1.551331e+00 40.234375 92.187500
179+
2019-12-29 11:51:15,773 train 050 1.572010e+00 42.256434 90.502451
180+
2019-12-29 11:51:55,991 train 100 1.539024e+00 43.181467 90.976949
181+
2019-12-29 11:52:36,345 train 150 1.515295e+00 44.264797 91.395902
182+
2019-12-29 11:53:12,128 train_acc 45.016000
183+
2019-12-29 11:53:12,487 valid 000 1.419507e+00 46.484375 93.359375
184+
2019-12-29 11:53:16,366 valid_acc 48.640000
185+
```
186+
You should expect around 92% validation accuracy with our best searched cell once the training is finished at 600 epochs.
187+
To train a custom architecture, add it to the `genotypes.py` file and pass its name to the `--arch` flag.
188+
Note that you can also change the number of cells stacked and number of initial channels of the model by giving arguments to the `--layers` option and `--init_channels` option respectively.
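
To track progress over a long run, the per-epoch accuracy lines in the training log above can be collected with a small parser. This is a sketch based only on the log format shown in this README; the function name is hypothetical, and the pattern also accepts the `valid_acc_top1` and `valid_acc_top5` names that appear in the ImageNet log.

```python
import re

# Matches lines such as "2019-12-29 11:50:34,032 valid_acc 38.350000"
# and "2020-03-25 11:11:03,920 valid_acc_top1 11.714000".
_METRIC = re.compile(r"^\S+ \S+ (train_acc|valid_acc(?:_top[15])?) ([0-9.]+)$")


def parse_accuracies(log_lines):
    """Collect per-epoch accuracy values keyed by metric name."""
    history = {}
    for line in log_lines:
        match = _METRIC.match(line.strip())
        if match:
            history.setdefault(match.group(1), []).append(float(match.group(2)))
    return history


sample = [
    "2019-12-29 11:50:29,680 train_acc 32.170000",
    "2019-12-29 11:50:30,057 valid 000 1.792161e+00 30.859375 88.671875",
    "2019-12-29 11:50:34,032 valid_acc 38.350000",
    "2019-12-29 11:53:16,366 valid_acc 48.640000",
]
history = parse_accuracies(sample)
# history["valid_acc"] holds one value per finished epoch, in order.
```

The per-batch `train 050 ...` and `valid 000 ...` lines are skipped on purpose, since they report running averages rather than the final value for the epoch.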
189+
190+
With different architectures, the final network will have different computational costs and the default hyperparameters may not be optimal (such as the learning rate scheduling).
191+
Thus, you should expect the final validation accuracy on CIFAR10 to vary by around 1 to 1.5%.
192+
193+
To train our best searched cell on ImageNet, use the following command.
194+
```console
195+
$ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 src/train_imagenet.py --data [PATH_TO_IMAGENET] --arch latest_cell --model_config [MODEL_CONFIG] --save [SAVE_NAME]
196+
```
197+
- [PATH_TO_IMAGENET]: the path to the directory which has the ImageNet dataset
198+
- [MODEL_CONFIG]: the model to train. Can be one of 'bnas-d', 'bnas-e', 'bnas-f', 'bnas-g', or 'bnas-h'
199+
- [SAVE_NAME]: experiment name
200+
201+
The validation accuracy is logged at every epoch as shown below, so there is no need to run inference separately.
202+
```console
203+
2020-03-25 09:53:44,578 args = Namespace(arch='latest_cell', batch_size=256, class_size=1000, data='../../darts-norm/cnn/Imagenet/', distributed=True, drop_path_prob=0, epochs=250, gpu=0, grad_clip=5.0, init_channels=68, keep_batchnorm_fp32=None, label_smooth=0.1, layers=15, learning_rate=0.05, local_rank=0, loss_scale=None, momentum=0.9, num_skip=3, opt_level='O0', report_freq=100, resume=None, save='eval-bnas_f_retrain', seed=0, start_epoch=0, weight_decay=3e-05, world_size=4)
204+
2020-03-25 09:53:50,926 no of parameters 39781442.000000
205+
2020-03-25 09:53:56,444 epoch 0
206+
2020-03-25 09:54:06,220 train 000 7.844889e+00 0.000000 0.097656
207+
2020-03-25 10:00:01,717 train 100 7.059666e+00 0.315207 1.382658
208+
2020-03-25 10:06:09,138 train 200 6.909059e+00 0.498484 2.070215
209+
2020-03-25 10:12:21,804 train 300 6.750810e+00 0.838027 3.157119
210+
2020-03-25 10:18:30,815 train 400 6.627304e+00 1.203534 4.369691
211+
2020-03-25 10:24:37,901 train 500 6.526508e+00 1.601679 5.519625
212+
2020-03-25 10:30:44,522 train 600 6.439230e+00 2.016983 6.666298
213+
2020-03-25 10:36:50,776 train 700 6.360960e+00 2.424132 7.778648
214+
2020-03-25 10:42:58,087 train 800 6.291507e+00 2.830446 8.824784
215+
2020-03-25 10:49:04,204 train 900 6.228209e+00 3.251162 9.829681
216+
2020-03-25 10:55:12,315 train 1000 6.167705e+00 3.673670 10.844819
217+
2020-03-25 11:01:18,095 train 1100 6.112888e+00 4.080009 11.778710
218+
2020-03-25 11:07:23,347 train 1200 6.060676e+00 4.500969 12.712388
219+
2020-03-25 11:10:30,048 train_acc 4.697276
220+
2020-03-25 11:10:33,504 valid 000 4.754593e+00 10.839844 27.832031
221+
2020-03-25 11:11:03,920 valid_acc_top1 11.714000
222+
2020-03-25 11:11:03,920 valid_acc_top5 28.784000
223+
```
224+
You should expect similar accuracy to our pretrained models once the training is finished at 250 epochs.
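
Before committing to a full run, it helps to estimate how long 250 epochs will take. The sketch below does this arithmetic from two timestamps in the log above (the `epoch 0` line and the final `valid_acc_top5` line of that epoch). It assumes every epoch takes about as long as the first one on the same 4-GPU setup, which is a rough approximation; the function name is hypothetical.

```python
from datetime import datetime

# Timestamp layout used by the training log, e.g. "2020-03-25 09:53:56,444".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def estimate_total_days(epoch_start, epoch_end, epochs):
    """Extrapolate total training time in days from the duration of one epoch."""
    start = datetime.strptime(epoch_start, LOG_TIME_FORMAT)
    end = datetime.strptime(epoch_end, LOG_TIME_FORMAT)
    seconds_per_epoch = (end - start).total_seconds()
    return seconds_per_epoch * epochs / 86400.0


days = estimate_total_days(
    "2020-03-25 09:53:56,444",  # "epoch 0" line
    "2020-03-25 11:11:03,920",  # last validation line of epoch 0
    epochs=250,
)
print(f"about {days:.1f} days")  # prints "about 13.4 days"
```

One epoch in that log takes roughly 77 minutes, so a full 250-epoch run with this configuration is on the order of two weeks.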
225+
226+
To train a custom architecture, add it to the `genotypes.py` file and pass its name to the `--arch` flag.
227+
Note that you can also change the number of cells stacked and number of initial channels of the model by giving arguments to the `--layers` option and `--init_channels` option respectively.
228+
229+
With different architectures, the final network will have different computational costs and the default hyperparameters may not be optimal (such as the learning rate scheduling).
230+
Thus, you should expect the final validation accuracy on ImageNet to vary by around 0.2%.
231+
232+
233+
**Note**: we ran our experiments with at least two NVIDIA V100s. For running on a single GPU, omit the `--parallel` flag and specify the GPU id using the `CUDA_VISIBLE_DEVICES` environment variable in the command line.
234+
235+
236+

comb_grads.gif

2.45 MB

img/ours_normal-1.png

120 KB

img/ours_reduction-1.png

125 KB

src/CustomDataParallel.py

+8
@@ -0,0 +1,8 @@
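import torch.nn as nn


class MyDataParallel(nn.DataParallel):
    """DataParallel wrapper that forwards unknown attributes to the wrapped model."""

    def __getattr__(self, name):
        try:
            # Parameters, buffers and submodules registered on the wrapper (e.g. `module`).
            return super().__getattr__(name)
        except AttributeError:
            # Fall back to the wrapped model so its custom methods stay reachable.
            return getattr(self.module, name)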
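
The point of this wrapper is that code such as `model.arch_parameters()` keeps working after the model has been wrapped for multi-GPU use. The torch-free sketch below illustrates the same delegation pattern with two hypothetical stand-in classes (`Inner` for the model, `Wrapper` for the data-parallel container); it is not the class above, only a small demonstration of how the `__getattr__` fallback behaves.

```python
class Inner:
    """Stand-in for the user's model (hypothetical)."""

    def arch_parameters(self):
        return ["alpha_normal", "alpha_reduce"]


class Wrapper:
    """Stand-in for a container that owns `module` but not the model's methods."""

    def __init__(self, module):
        self.module = module

    def __getattr__(self, name):
        # __getattr__ runs only after normal lookup on the wrapper has failed.
        if name == "module":
            # Guard against infinite recursion if `module` was never set.
            raise AttributeError(name)
        return getattr(self.module, name)


wrapped = Wrapper(Inner())
params = wrapped.arch_parameters()  # resolved on the inner object
```

Attributes that exist on neither object still raise `AttributeError`, so typos are not silently swallowed.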

src/architect.py

+46
@@ -0,0 +1,46 @@
import torch
import numpy as np
import torch.nn as nn
from torch.autograd import Variable
import math
import torch.nn.functional as F


def _concat(xs):
    return torch.cat([x.view(-1) for x in xs])


class Architect(object):

    def __init__(self, model, args):
        self.network_momentum = args.momentum
        self.network_weight_decay = args.weight_decay
        self.model = model
        # Adam updates only the architecture parameters (alphas).
        self.optimizer = torch.optim.Adam(self.model.arch_parameters(),
            lr=args.arch_learning_rate, betas=(0.5, 0.999), weight_decay=args.arch_weight_decay)
        self.lamda = args.lamda  # weight of the entropy term
        self.tau = args.tau      # decay constant of the entropy term

    def step(self, input_train, target_train, input_valid, target_valid, eta, network_optimizer, step):
        self.optimizer.zero_grad()
        self._backward_step(input_valid, target_valid, step)
        self.optimizer.step()

    def _backward_step(self, input_valid, target_valid, step):
        normal_alphas = F.softmax(self.model.arch_parameters()[0], dim=-1)
        reduce_alphas = F.softmax(self.model.arch_parameters()[1], dim=-1)
        total_entropy = 0.0
        for i in range(normal_alphas.shape[0]):
            # Entropy of the i-th normal-cell edge.
            temp = -sum(normal_alphas[i, :] * torch.log(normal_alphas[i, :])).cuda()
            total_entropy += temp
        for j in range(reduce_alphas.shape[0]):
            # Entropy of the j-th reduction-cell edge.
            temp2 = -sum(reduce_alphas[j, :] * torch.log(reduce_alphas[j, :])).cuda()
            total_entropy += temp2

        # Mean entropy over all edges of both cell types.
        total_entropy = total_entropy / (normal_alphas.shape[0] + reduce_alphas.shape[0])
        # Validation loss minus an entropy bonus that decays exponentially with the step.
        loss = self.model._loss(input_valid, target_valid) - self.lamda * total_entropy * np.exp(-step / self.tau)
        loss.backward()
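
The architecture loss in `_backward_step` is the validation loss minus an entropy term, `lamda * mean_entropy * exp(-step / tau)`, so high-entropy (more uniform) operation weights are rewarded early in the search and the reward fades as `step` grows. The plain-Python sketch below reproduces just that term on toy numbers. The function names are hypothetical, and unlike the PyTorch code it skips zero probabilities instead of producing NaN.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of architecture parameters."""
    shift = max(row)
    exps = [math.exp(value - shift) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_bonus(alpha_rows, step, lamda, tau):
    """lamda * mean row entropy * exp(-step / tau), the term subtracted from the loss."""
    mean_entropy = sum(entropy(softmax(row)) for row in alpha_rows) / len(alpha_rows)
    return lamda * mean_entropy * math.exp(-step / tau)


# Two edges with four candidate operations each, all weights equal.
uniform_rows = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
bonus_start = entropy_bonus(uniform_rows, step=0, lamda=1.0, tau=10.0)   # equals log(4)
bonus_later = entropy_bonus(uniform_rows, step=10, lamda=1.0, tau=10.0)  # equals log(4) / e
```

With `step` equal to `tau`, the bonus has already dropped to about 37% of its initial value, which gives a feel for how `tau` controls the length of the exploratory phase.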

src/bin_utils.py

+82
@@ -0,0 +1,82 @@
import torch.nn as nn
import numpy


class BinOp():
    def __init__(self, model, args):
        # Count the Conv2d layers whose name does not contain 'res'.
        count_Conv2d = 0
        for n, m in model.named_modules():
            if isinstance(m, nn.Conv2d) and 'res' not in n:
                count_Conv2d = count_Conv2d + 1

        start_range = 1
        end_range = count_Conv2d
        self.bin_range = numpy.linspace(start_range,
                end_range, end_range-start_range+1)\
                .astype('int').tolist()

        self.saved_params = []
        self.target_params = []
        self.target_modules = []
        self.num_skip = args.num_skip
        index = 0
        for m in model.modules():
            if isinstance(m, nn.Conv2d):
                index = index + 1
                # NOTE: `n` still holds the last module name from the counting loop above,
                # not the name of `m`, so the 'res' test is the same for every layer.
                # The first `num_skip` convolutions are left out and are not binarized.
                if index in self.bin_range and 'res' not in n and index >= (self.num_skip + 1):
                    tmp = m.weight.data.clone()
                    self.saved_params.append(tmp)
                    self.target_modules.append(m.weight)
        self.num_of_params = len(self.target_modules)

    def binarization(self):
        # Call before the forward pass; call restore() afterwards.
        self.meancenterConvParams()
        self.clampConvParams()
        self.save_params()
        self.binarizeConvParams()

    def meancenterConvParams(self):
        # Subtract the mean over the input-channel dimension.
        for index in range(self.num_of_params):
            s = self.target_modules[index].data.size()
            negMean = self.target_modules[index].data.mean(1, keepdim=True).\
                    mul(-1).expand_as(self.target_modules[index].data)
            self.target_modules[index].data = self.target_modules[index].data.add(negMean)

    def clampConvParams(self):
        for index in range(self.num_of_params):
            self.target_modules[index].data = \
                    self.target_modules[index].data.clamp(-1.0, 1.0)

    def save_params(self):
        # Keep a copy of the real-valued weights.
        for index in range(self.num_of_params):
            self.saved_params[index].copy_(self.target_modules[index].data)

    def binarizeConvParams(self):
        for index in range(self.num_of_params):
            n = self.target_modules[index].data[0].nelement()
            s = self.target_modules[index].data.size()
            # Per-filter scale: mean absolute value of the filter's weights.
            m = self.target_modules[index].data.norm(1, 3, keepdim=True)\
                    .sum(2, keepdim=True).sum(1, keepdim=True).div(n)
            self.target_modules[index].data = \
                    self.target_modules[index].data.sign().mul(m.expand(s))

    def restore(self):
        # Put the saved real-valued weights back.
        for index in range(self.num_of_params):
            self.target_modules[index].data.copy_(self.saved_params[index])

    def updateBinaryGradWeight(self):
        for index in range(self.num_of_params):
            weight = self.target_modules[index].data
            n = weight[0].nelement()
            s = weight.size()
            m = weight.norm(1, 3, keepdim=True)\
                    .sum(2, keepdim=True).sum(1, keepdim=True).div(n).expand(s)
            # No gradient through the scale where the weight lies outside [-1, 1].
            m[weight.lt(-1.0)] = 0
            m[weight.gt(1.0)] = 0

            m = m.mul(self.target_modules[index].grad.data)
            m_add = weight.sign().mul(self.target_modules[index].grad.data)
            m_add = m_add.sum(3, keepdim=True)\
                    .sum(2, keepdim=True).sum(1, keepdim=True).div(n).expand(s)
            m_add = m_add.mul(weight.sign())
            self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
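
As a worked illustration of what `binarizeConvParams` computes, each output filter is replaced by the sign of its weights multiplied by a single scale, the mean absolute value of that filter. The plain-Python sketch below applies this to one flattened toy filter. The function names are hypothetical, and the mean-centering and clamping steps that precede it in `binarization()` are left out.

```python
def sign(x):
    """Return -1, 0 or 1, matching the behaviour of an element-wise sign."""
    return (x > 0) - (x < 0)


def binarize_filter(weights):
    """Replace each weight with sign(w) * mean(|w|) computed over the whole filter."""
    scale = sum(abs(w) for w in weights) / len(weights)
    return [sign(w) * scale for w in weights]


# One flattened filter with four weights; the mean absolute value is 0.5.
binary = binarize_filter([0.5, -1.0, 0.25, -0.25])
# binary == [0.5, -0.5, 0.5, -0.5]
```

Every entry of the result has the same magnitude, so the filter can be stored as one floating-point scale plus one bit per weight.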
