Given an image containing a bird, the task is to predict a bounding box around the bird.
In the src/data directory, images.txt is the index of all images, bounding_boxes.txt holds the label box for each image, and images contains all the images. Each box is described by 4 values: the x and y coordinates of its top-left corner, its width and its height.
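As a rough illustration of the box format, the sketch below parses one label line and converts it to corner coordinates. It assumes each line looks like `<image_id> <x> <y> <width> <height>` separated by whitespace, which is not confirmed by this README; `parse_box_line` and `box_to_corners` are hypothetical helper names and the sample line is made up.

```python
def parse_box_line(line):
    """Parse one label line, assumed to be '<image_id> <x> <y> <width> <height>'."""
    image_id, x, y, w, h = line.split()
    return int(image_id), (float(x), float(y), float(w), float(h))


def box_to_corners(box):
    """Convert (x, y, w, h) with a top-left origin to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


# Hypothetical sample line, not taken from the real label file.
image_id, box = parse_box_line("1 60.0 27.0 325.0 304.0")
corners = box_to_corners(box)
print(image_id, box, corners)
```

The corner form is convenient for drawing and for overlap computations, while the (x, y, w, h) form is what the network predicts.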
Traditional CNNs and fully connected networks run into the degradation problem as they get deeper: adding more layers eventually makes accuracy worse instead of better.
In the paper Deep Residual Learning for Image Recognition, the authors address this problem with a residual block:
These blocks compose ResNet:
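The idea behind a residual block can be sketched without any deep learning framework. The toy code below treats a block as `relu(F(x) + x)` on a plain list of numbers, where `F` stands in for the block's convolution layers. All names here (`residual_block`, `stack_blocks`, `zero_transform`) are hypothetical and this is not the code used in this project.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Return relu(transform(x) + x); the '+ x' is the skip connection."""
    fx = transform(x)
    return relu([f + v for f, v in zip(fx, x)])


def stack_blocks(x, transforms):
    """Apply several residual blocks one after another."""
    for transform in transforms:
        x = residual_block(x, transform)
    return x


def zero_transform(values):
    """A block whose layers have learned nothing: F(x) = 0."""
    return [0.0 for _ in values]


x = [1.0, 2.0, 3.0]
out = stack_blocks(x, [zero_transform] * 8)
print(out)
```

With `F(x) = 0` every block passes its non-negative input through unchanged, so stacking more blocks does not have to hurt the result. That is the intuition for why residual networks can go deeper.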
I use ResNet-18 in this project and add a 4-dimensional output layer after it to predict the box's x, y, w and h.
Loss: smooth L1 loss
Metric: IoU of ground truth and prediction, threshold=0.75
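The loss and metric above can be sketched in plain Python as follows. This is a minimal sketch under assumptions: the smooth L1 loss uses the common transition point `beta=1.0` and is averaged over the 4 box values, and a prediction counts as correct when its IoU is at least the threshold. The project code may differ on these details, and the function names are hypothetical.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred, truth, threshold=0.75):
    """Assumed rule: a prediction is correct if IoU >= threshold."""
    return iou(pred, truth) >= threshold


truth = (0.0, 0.0, 10.0, 10.0)
pred = (1.0, 0.0, 10.0, 10.0)
print(smooth_l1(pred, truth), iou(pred, truth), is_hit(pred, truth))
```

Smooth L1 is less sensitive to outlier boxes than a squared error, while IoU measures overlap directly and does not depend on the box's scale.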
Resize all images to square dimensions (224*224*3 recommended), then normalize and standardize all pixel channels.
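One possible reading of "normalize and standardize" is sketched below: scale 8-bit pixel values to [0, 1], then subtract the per-channel mean and divide by the per-channel standard deviation. Whether the project computes these statistics per image or over the whole dataset is not stated here, so the per-image version is an assumption, and the helper names and the tiny 2x2 image are hypothetical.

```python
import math


def normalize(image):
    """Scale 8-bit pixel values of an H x W x C nested list to [0, 1]."""
    return [[[c / 255.0 for c in px] for px in row] for row in image]


def channel_stats(image):
    """Per-channel mean and standard deviation of an H x W x C image."""
    pixels = [px for row in image for px in row]
    n = len(pixels)
    channels = range(len(pixels[0]))
    means = [sum(px[c] for px in pixels) / n for c in channels]
    stds = [math.sqrt(sum((px[c] - means[c]) ** 2 for px in pixels) / n)
            for c in channels]
    return means, stds


def standardize(image, means, stds, eps=1e-8):
    """Shift each channel to zero mean and scale it to unit variance."""
    return [[[(px[c] - means[c]) / (stds[c] + eps) for c in range(len(px))]
             for px in row] for row in image]


# Tiny hypothetical 2x2 RGB image standing in for a resized 224x224 one.
image = [[[0, 0, 0], [255, 255, 255]],
         [[0, 255, 0], [255, 0, 255]]]
scaled = normalize(image)
means, stds = channel_stats(scaled)
result = standardize(scaled, means, stds)
print(means, stds)
```

Note that resizing an image also changes the scale of its bounding box, so the box values have to be rescaled by the same factors before training.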
Split the data into 75% training data and 25% testing data. Train the network on the training data using batch size=128, epoch=100 and validation split ratio=0.1.
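The split described above can be sketched as below. The ratios come from this README; shuffling with a fixed seed and taking the validation set from the front of the training portion are assumptions for illustration, and `split_dataset` is a hypothetical helper, not a function of this repository.

```python
import random


def split_dataset(items, test_ratio=0.25, val_ratio=0.1, seed=0):
    """Shuffle, hold out a test set, then hold out a validation set from the rest."""
    items = list(items)
    random.Random(seed).shuffle(items)

    n_test = round(len(items) * test_ratio)
    test, train_full = items[:n_test], items[n_test:]

    # The validation ratio applies to the training portion, not the whole dataset.
    n_val = round(len(train_full) * val_ratio)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

The test set is only used for the final evaluation, while the validation set is used to monitor training across the epochs.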
Training result:
Testing result:
The red box is the ground truth and the green box is the network's prediction.
Failure example:
You should keep the directory structure.
Python 3.6
Run pip install -r requirements.txt
In the git root directory, run python object_localization.py to start the master script, then follow the options it presents.
Deep Residual Learning for Image Recognition: https://arxiv.org/pdf/1512.03385.pdf
CKCZZJ
MIT