bcsamrudh/Image-Captioning-using-Attention-Mechanism

Image-Captioning-using-Attention-Mechanism

CourseWork Project, Mathematics for Machine Learning

Teammates

  1. Adya Bhat
  2. Aditya G H

It uses an encoder-decoder architecture to caption images. I used a pretrained ResNet model as the encoder to extract image features, combined with an LSTM decoder that generates captions from the extracted features.

Model Architecture

[Image: model architecture diagram]
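
The project name mentions an attention mechanism, but this README does not say which form of attention the decoder uses. As a rough illustration only, the sketch below shows one soft-attention step in plain Python: the decoder state scores each image-region feature, the scores are normalized with a softmax, and the weighted sum becomes the context vector passed to the LSTM. The dot-product score, the function name `soft_attention`, and the toy vectors are all assumptions made for this example and are not taken from the repository; many captioning models use a learned additive score instead.

```python
import math
from typing import List, Tuple


def soft_attention(
    features: List[List[float]], hidden: List[float]
) -> Tuple[List[float], List[float]]:
    """One soft-attention step (illustrative, dot-product scoring assumed).

    features: one vector per image region, e.g. from the encoder's feature map.
    hidden:   the decoder's current hidden state, same length as each feature.
    Returns (attention weights over regions, context vector).
    """
    # Score each region by its dot product with the decoder state.
    scores = [sum(f_i * h_i for f_i, h_i in zip(f, hidden)) for f in features]

    # Numerically stable softmax over the region scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the region features.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy data (hypothetical): three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = soft_attention(regions, state)
```

In a real model the features and the hidden state would be tensors, and the scoring function would have trainable parameters; the list-based version here is only meant to make the arithmetic visible.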

Dataset

Flickr 8K - It consists of a folder Images with images (.jpg format) and a text file captions.txt containing the captions corresponding to the image file names. Each image has five captions.

The format of the text file is:

<image-filename>,<caption>\n
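
A minimal sketch of how a file in this format might be read, grouping the captions by image file name. The function name `parse_captions`, the `skip_header` flag, and the sample lines are hypothetical and not taken from the repository. The sketch splits on the first comma only, on the assumption that a caption may itself contain commas.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_captions(
    lines: Iterable[str], skip_header: bool = False
) -> Dict[str, List[str]]:
    """Group captions by image file name.

    Each line is expected to look like "<image-filename>,<caption>".
    Only the first comma is treated as the separator.
    """
    captions: Dict[str, List[str]] = defaultdict(list)
    for index, raw in enumerate(lines):
        # Assumption: some copies of the file start with a header row.
        if skip_header and index == 0:
            continue
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        filename, separator, caption = line.partition(",")
        if not separator:
            continue  # no comma found, so this is not a caption line
        captions[filename.strip()].append(caption.strip())
    return dict(captions)


# Made-up sample lines in the documented format.
sample = [
    "dog.jpg,A dog runs across the grass .\n",
    "dog.jpg,A brown dog , mid-stride , on a lawn .\n",
    "cat.jpg,A cat sleeps on a sofa .\n",
]
grouped = parse_captions(sample)
```

To read the actual file, the same function could be called as `parse_captions(open("captions.txt", encoding="utf-8"))`, adjusting the path to wherever the dataset is stored.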

Results

[Image: sample captioning result]
