Automatic image captioning using deep learning and the Flickr-8k dataset, including a comparison between the Xception and Inception models.
This project generates captions and alt text for all kinds of images using Convolutional Neural Networks (CNNs) and a type of Recurrent Neural Network, the LSTM.
Image features are extracted by CNN models pretrained on the ImageNet dataset (see below). The features are then fed into the LSTM model, which generates the image captions.
This repo revolves around two models provided by Keras, Xception and Inception.
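To make the CNN-to-LSTM flow concrete, here is a minimal sketch of how a caption can be built one word at a time once image features are available. It uses greedy decoding, which is an assumption and not necessarily the strategy used in this repo's scripts. The names `greedy_caption`, `toy_predict_next`, and the tiny vocabulary are hypothetical; in practice `predict_next` would wrap the trained Keras model.

```python
from typing import Callable, Dict, List, Sequence


def greedy_caption(
    image_features: Sequence[float],
    predict_next: Callable[[Sequence[float], List[int]], Sequence[float]],
    index_to_word: Dict[int, str],
    start_id: int,
    end_id: int,
    max_length: int = 20,
) -> str:
    """Build a caption word by word, always picking the highest-scoring word."""
    token_ids = [start_id]
    for _ in range(max_length):
        # The model scores every vocabulary entry given the image and the words so far.
        scores = predict_next(image_features, token_ids)
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        if next_id == end_id:
            break
        token_ids.append(next_id)
    # Drop the start token before joining the words.
    return " ".join(index_to_word[i] for i in token_ids[1:])


def toy_predict_next(image_features, token_ids):
    """Hypothetical stand-in for the trained model: follows a fixed script."""
    script = {1: 3, 3: 4, 4: 2}  # <start> -> dog -> runs -> <end>
    scores = [0.0] * 5
    scores[script[token_ids[-1]]] = 1.0
    return scores


vocab = {1: "<start>", 2: "<end>", 3: "dog", 4: "runs"}
caption = greedy_caption([0.1, 0.2], toy_predict_next, vocab, start_id=1, end_id=2)
print(caption)
```

The loop stops either when the model emits the end token or when `max_length` words have been produced, which keeps generation bounded even if the model never predicts an end token.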
- Features Extracted can be found here
- Dataset used can be found here
- Jupyter Notebooks can be found here
- Models Trained can be found here
- Requirements and dependencies can be found here
Want to contribute? Suggestions, error reports, and bug fixes are highly appreciated. Please open an issue and/or PR here.
- Set up a virtual environment (HIGHLY RECOMMENDED).
- Activate the environment.
- Install the requirements with
pip3 install -r requirements.txt
- NOTE: GPU-accelerated hardware is recommended. After TF v2.1, there is no need to install GPU support separately, so there is no need to use pip3 install tensorflow-gpu. For GPU setup, separate guidelines are provided here.
- See the Google Drive links for features, dataset and models.
- Download the required files.
It is recommended to train these neural networks on GPU-accelerated hardware. You first need a CUDA-enabled graphics card. If you have one, download the CUDA Toolkit and the cuDNN library.
For installation and help, these links are helpful:
Official CUDA Installation Guide
Official CUDNN Installation Guide
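Once CUDA and cuDNN are installed, it can help to confirm that TensorFlow actually sees the GPU before starting a long training run. The following is a small sketch, not part of this repo's scripts; the helper name `visible_gpus` is hypothetical, and it assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
from typing import List


def visible_gpus() -> List[str]:
    """Return the names of GPUs TensorFlow can see, or [] if none (or TF is missing)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

If this prints "none" on a machine with a CUDA-enabled card, the CUDA or cuDNN installation is the first thing to recheck against the official guides.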
- Start by changing the hardcoded paths in the Python scripts.
- If you want to train the network yourself (GPU recommended), run the whole script.
- You can skip feature extraction and selection and model training by downloading the required files.
- Comment out the code you want to skip.
- The classification task is marked separately.
- To run classification only, execute only the functions that are invoked during that task.
- Check the notebooks for an interactive example.
- The Inception model uses GloVe word vectors (see this); download the file beforehand.
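As an illustration of the GloVe step, the sketch below reads a GloVe-style text file (one word per line followed by its space-separated vector components) and builds an embedding matrix indexed by a tokenizer's word index. The function names, the demo file, and the three-dimensional vectors are all hypothetical; the real GloVe file and vocabulary come from the download mentioned above, and this is not necessarily how the repo's scripts load them.

```python
import os
import tempfile
from typing import Dict, List


def load_glove(path: str) -> Dict[str, List[float]]:
    """Parse a GloVe-style text file into a word -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def build_embedding_matrix(
    word_index: Dict[str, int], vectors: Dict[str, List[float]], dim: int
) -> List[List[float]]:
    """Row i holds the vector for the word with index i; unknown words stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]  # row 0 is padding
    for word, index in word_index.items():
        if word in vectors:
            matrix[index] = list(vectors[word])
    return matrix


# Demo with a tiny made-up file instead of the real GloVe download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "glove_demo.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("dog 0.1 0.2 0.3\nruns 0.4 0.5 0.6\n")
    glove = load_glove(demo_path)

word_index = {"dog": 1, "runs": 2, "quickly": 3}
embedding_matrix = build_embedding_matrix(word_index, glove, dim=3)
print(len(embedding_matrix), "rows x", len(embedding_matrix[0]), "columns")
```

Words missing from the GloVe file ("quickly" in the demo) keep an all-zero row, which is a common convention when such a matrix initializes an embedding layer.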
Made by Vybhav Chaturvedi. I plan to update this project further, but that is beyond the scope of this repo: Rotten-Scripts is not meant for deep learning but is a collection of scripts in multiple languages.
Check this repo if you are interested in preprocessing using word ranking and in BLEU.
Training these networks on a GPU can lead to memory overflow errors, and long sessions can cause overheating and similar GPU computing problems.
Carefully read the CUDA guidelines to avoid any problems.
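One common mitigation for GPU memory errors, offered here as a suggestion and not as something this repo's scripts are known to do, is to ask TensorFlow to allocate GPU memory on demand instead of reserving it all up front. The sketch below assumes TensorFlow 2.1 or later; the helper name `enable_memory_growth` is hypothetical, and it must run before any model or tensor is created.

```python
def enable_memory_growth() -> int:
    """Request on-demand GPU memory allocation; return how many GPUs were configured."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so there is nothing to configure.
        return 0
    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError:
            # Raised when the GPU has already been initialized; the setting
            # can only be changed before TensorFlow starts using the device.
            pass
    return configured


count = enable_memory_growth()
print("GPUs configured for memory growth:", count)
```

This does not reduce the memory the model itself needs; if errors persist, a smaller batch size is usually the next thing to try.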