Image-Captioning-PyTorch

This repo contains code to preprocess, train, and evaluate sequence models on the Flickr8k image dataset in PyTorch. It was part of a Deep Learning project for the Machine Learning Sessional course of the Department of CSE, BUET, for the January 2020 session.

Models experimented with:

Pretrained CNN encoder & LSTM-based decoder
- VGG-16, Inception-v3, ResNet-50, ResNet-101, ResNeXt-101, DenseNet-201
Pretrained ResNet-101 & LSTM with attention mechanism

To try a pretrained model, open the Pretrained Attention Model's notebook or the Pretrained MonoLSTM Model's notebook in Colab and execute it from top to bottom.

Pre-requisites:

Datasets:
- Flickr8k Dataset: images and annotations
Pre-trained word embeddings:
- GloVe embeddings (glove.6B, trained on 6B tokens)
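
The GloVe files are plain text, with one token per line followed by its space-separated vector components. A minimal sketch of reading such a file into a dictionary, assuming that format; `load_glove` and its `vocab` parameter are hypothetical names for illustration, not functions from this repo.

```python
from pathlib import Path


def load_glove(path, vocab=None):
    """Read a GloVe-style text file into a {word: [float, ...]} dict.

    `vocab` (hypothetical parameter) is an optional set of words to keep;
    restricting to the caption vocabulary keeps memory use small.
    """
    vectors = {}
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            word, values = parts[0], parts[1:]
            if vocab is not None and word not in vocab:
                continue
            vectors[word] = [float(v) for v in values]
    return vectors
```

Words from the captions that do not appear in the file would still need an initialization of their own (for example random vectors); that choice is left open here.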

Data Folder Structure for training using train_torch.py or train_attntn.py:

data/
    flickr8k/
        Flicker8k_Dataset/
            *.jpg
        Flickr8k_text/
            Flickr8k.token.txt
            Flickr_8k.devImages.txt
            Flickr_8k.testImages.txt
            Flickr_8k.trainImages.txt
    glove.6B/
        glove.6B.50d.txt
        glove.6B.100d.txt
        glove.6B.200d.txt
        glove.6B.300d.txt
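
Before training, it may help to confirm that the layout above is in place and that the caption file parses. The sketch below is an assumption-laden helper, not part of this repo: `missing_paths` and `parse_captions` are hypothetical names, and the parser assumes each line of Flickr8k.token.txt has the form `<image>.jpg#<index>`, a tab, then the caption.

```python
from collections import defaultdict
from pathlib import Path

# Paths relative to the data/ folder, taken from the tree above.
EXPECTED = [
    "flickr8k/Flicker8k_Dataset",
    "flickr8k/Flickr8k_text/Flickr8k.token.txt",
    "flickr8k/Flickr8k_text/Flickr_8k.devImages.txt",
    "flickr8k/Flickr8k_text/Flickr_8k.testImages.txt",
    "flickr8k/Flickr8k_text/Flickr_8k.trainImages.txt",
    "glove.6B",
]


def missing_paths(data_root="data"):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED if not (root / p).exists()]


def parse_captions(lines):
    """Group captions by image file name.

    Assumes each non-empty line looks like '<image>.jpg#<index>\\t<caption>'.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_id, caption = line.split("\t", 1)
        captions[image_id.split("#")[0]].append(caption)
    return dict(captions)
```

A typical use would be to call `missing_paths()` from the repo root and stop early if the returned list is non-empty, then pass the open token file to `parse_captions`.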

Pretrained Models:
Some pre-trained weights are provided here

BLEU score comparison of trained models:
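
For readers unfamiliar with the metric, BLEU combines clipped n-gram precisions with a brevity penalty. The following is a minimal, unsmoothed sentence-level sketch with uniform weights; it is an illustration only and not necessarily how the scores in this repo were computed. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

Both arguments are expected to be already tokenized, for example `sentence_bleu("a dog runs on the grass".split(), ["a dog runs on the green grass".split()])`. Reported results are usually corpus-level scores, which pool n-gram counts across all test captions rather than averaging per-sentence values.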
