This repo is an attempt to implement the paper in TensorFlow. The initial `data.py`, `utils.py` and `logs.py` are taken from AlexNet.
Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575, 2014.
Dataset info:
- Link: ILSVRC2010
- Training size: 1261406 images
- Validation size: 50000 images
- Test size: 150000 images
- Dataset size: 124 GB
To save time:
I got one corrupted image (`n02487347_1956.JPEG`). The error read: `Can not identify image file '/path/to/image/n02487347_1956.JPEG'`. This happened when I read the image using `PIL`. Before using this code, please make sure you can open `n02487347_1956.JPEG` using `PIL`. If you cannot, delete the image; you won't lose anything by deleting 1 image out of more than 1 million.
So I trained on 1261405 images using an 8 GB GPU.
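To check the whole training set rather than a single file, a minimal sketch like the following could be run before training. It assumes the training images sit under one directory tree with `.JPEG` extensions; `find_unreadable_images` and `_pil_opener` are hypothetical helpers, not part of this repo. The default opener uses Pillow's `Image.open` and `Image.verify`, and a different `opener` callable can be passed in.

```python
import os


def _pil_opener(path):
    # Import Pillow lazily so the scanner only needs it when actually used.
    from PIL import Image

    with Image.open(path) as img:
        # verify() checks file integrity without decoding all pixel data.
        img.verify()


def find_unreadable_images(root, opener=_pil_opener, extensions=(".jpeg", ".jpg")):
    """Return the sorted paths under `root` that `opener` fails to open."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                opener(path)
            except Exception:  # Pillow raises OSError (incl. UnidentifiedImageError) among others
                bad.append(path)
    return sorted(bad)
```

For example, `find_unreadable_images("/path/to/train")` would be expected to list `n02487347_1956.JPEG` if your copy of that file is corrupted; scanning all 1.2 million images this way takes a while, so it is worth running only once.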
- To train: `python model.py <path-to-training-data> --train true --test false`
- To test: `python model.py <path-to-training-data> --train false --test true`
- `screenlog-train.0`: The log file after running `python model.py <path-to-training-data> --train true` in `screen`.
- Model and logs: Google Drive
The following preprocessing steps are performed:
- Rescaling: Isotropically rescale the image such that the smallest side is randomly drawn from `[256, 512]`. In short, isotropically means the width-to-height ratio of the original image is preserved in the rescaled image.
- Cropping: Randomly crop a `(224, 224)` patch from the rescaled image.
- Augmentation: Augment the data in two ways:
  - Horizontally flip the image with 50% probability.
  - Add PCA color noise, as computed in AlexNet, to the processed image to shift its colors.
- Subtract mean: Finally, subtract the mean activity from the processed image.
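The preprocessing steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the repo's TensorFlow code: all function names are hypothetical, only the rescaled target size is computed (the resampling itself is left to an image library), the PCA noise follows the AlexNet recipe with an assumed standard deviation of 0.1, and `eigvals`, `eigvecs` and `mean_rgb` are placeholders for the hardcoded values the repo uses.

```python
import numpy as np


def isotropic_target_size(width, height, rng, scale_range=(256, 512)):
    """Return (new_width, new_height) with the smaller side drawn from scale_range.

    Both sides are scaled by the same factor, so the aspect ratio is preserved.
    The actual resampling is left to an image library.
    """
    smallest = int(rng.integers(scale_range[0], scale_range[1] + 1))
    if width <= height:
        return smallest, round(height * smallest / width)
    return round(width * smallest / height), smallest


def random_crop(image, rng, size=224):
    """Cut a random size x size patch out of an H x W x 3 array."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]


def random_flip(image, rng, p=0.5):
    """Mirror the image left-right with probability p."""
    return image[:, ::-1] if rng.random() < p else image


def pca_color_noise(image, eigvals, eigvecs, rng, alpha_std=0.1):
    """Add AlexNet-style lighting noise: sum_i alpha_i * lambda_i * p_i per pixel.

    eigvecs holds the eigenvectors p_i as columns; alpha_i ~ N(0, alpha_std).
    """
    alpha = rng.normal(0.0, alpha_std, size=3)
    shift = np.asarray(eigvecs) @ (alpha * np.asarray(eigvals))
    # The same RGB shift is broadcast onto every pixel.
    return image + shift.astype(image.dtype)


def preprocess(resized, eigvals, eigvecs, mean_rgb, rng):
    """Crop, flip, add PCA noise and subtract the mean from a rescaled image."""
    patch = random_crop(np.asarray(resized, dtype=np.float32), rng)
    patch = random_flip(patch, rng)
    patch = pca_color_noise(patch, eigvals, eigvecs, rng)
    return patch - np.asarray(mean_rgb, dtype=np.float32)
```

Because the smaller side is always at least 256 after rescaling, a 224 x 224 crop always fits. Drawing a fresh `alpha` per call means every image gets a different color shift each time it is seen.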
Note: Calculating the eigenvalues and eigenvectors for the ImageNet dataset requires a significant amount of RAM, so the values are taken from StackOverflow and hardcoded for the PCA augmentation.
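For reference, the values in the note are the eigenvalues and eigenvectors of the 3 x 3 covariance matrix of RGB pixel values, as in AlexNet. If you want to compute them yourself instead of hardcoding them, one option is to keep only running sums, so memory stays constant no matter how many images are processed. A minimal sketch, assuming images arrive one at a time as H x W x 3 arrays; `rgb_pca` is a hypothetical helper, not part of this repo.

```python
import numpy as np


def rgb_pca(images):
    """Eigen-decompose the 3x3 covariance of RGB pixel values over `images`.

    Only running sums are kept, so memory does not grow with the dataset size.
    Returns (eigvals, eigvecs): eigenvalues in descending order and the matching
    eigenvectors as the columns of eigvecs.
    """
    count = 0
    total = np.zeros(3)
    outer = np.zeros((3, 3))
    for image in images:
        pixels = np.asarray(image, dtype=np.float64).reshape(-1, 3)
        count += pixels.shape[0]
        total += pixels.sum(axis=0)
        outer += pixels.T @ pixels
    mean = total / count
    # Population covariance: E[x x^T] - mean mean^T.
    cov = outer / count - np.outer(mean, mean)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]
```

Passing a generator that loads one image at a time keeps RAM usage at a single image plus a few 3 x 3 accumulators; the cost is one full pass over the dataset.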
Plots: top-1 accuracy, top-5 accuracy, loss.
- Top1 accuracy: 67.1013%
- Top5 accuracy: 85.1460%
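Top-1 and top-5 accuracy count a prediction as correct when the true label is the highest-scoring class, or among the five highest-scoring classes, respectively. A minimal NumPy sketch, assuming the model's outputs are collected into an `(N, num_classes)` score array; `top_k_accuracy` is a hypothetical helper, not the repo's evaluation code.

```python
import numpy as np


def top_k_accuracy(scores, labels, k):
    """Fraction of rows whose true label is among the k highest scores."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # argpartition moves the indices of the k largest scores (in arbitrary order)
    # into the last k positions of each row.
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())
```

Multiplying the result by 100 gives a percentage in the same form as the numbers reported above.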