- Dan He
- Duke Lin
- Sneha Kondur
- Tyler Farnan
- Requires Python 3.6. Install the dependencies with pip install torch tqdm torchvision h5py scipy (PyTorch's pip package is named torch; pickle ships with the Python standard library and does not need to be installed)
- Optional: OpenCV (pip install opencv-python) for viewing glimpse maps
This project uses the network from https://arxiv.org/abs/1704.03162 to solve the visual question answering (VQA) problem. We extend the paper by testing implementations of the network with batch normalization and dropout added or ablated. The dataset can be downloaded from https://visualqa.org/download.html; it must be preprocessed (requiring about 99 GB of disk space), so it is not part of this repo. The pretrained network weights for all four experiments are also not part of the repo, as they exceed the 100 MB file limit; they can be found here: https://drive.google.com/drive/folders/1iSwnzx4nl-MU8RuQ6T--VIl32cOYuLId?usp=sharing
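The four experiments differ only in whether batch normalization and dropout are applied inside the model. As a rough sketch of that idea in PyTorch (not the repo's exact layout; the function name, layer sizes, and dimensions are placeholders), the toggle might look like this:

```python
import torch.nn as nn

def classifier_head(in_dim, hidden_dim, n_answers,
                    use_batchnorm=False, dropout=0.0):
    # Hypothetical sketch of a classifier head whose batch norm and dropout
    # can be toggled, mirroring the four experiment variants below.
    layers = [nn.Linear(in_dim, hidden_dim)]
    if use_batchnorm:
        layers.append(nn.BatchNorm1d(hidden_dim))  # batchnormDP* variants
    layers.append(nn.ReLU(inplace=True))
    if dropout > 0:
        layers.append(nn.Dropout(dropout))         # *05 variants use p=0.5
    layers.append(nn.Linear(hidden_dim, n_answers))
    return nn.Sequential(*layers)

# The four experiment settings:
# model_dropout0:      use_batchnorm=False, dropout=0.0
# model_dropout05:     use_batchnorm=False, dropout=0.5
# model_batchnormDP0:  use_batchnorm=True,  dropout=0.0
# model_batchnormDP05: use_batchnorm=True,  dropout=0.5
```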
- Download demo_weights.zip from: https://drive.google.com/drive/folders/1iSwnzx4nl-MU8RuQ6T--VIl32cOYuLId?usp=sharing
- Unzip demo_weights.zip
- Put the extracted weight file in the same directory as demo.ipynb
- Follow the steps in demo.ipynb (a minimal weight-loading sketch appears below)
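As a rough illustration, loading the demo weights in PyTorch might look like the following; the file name and the checkpoint's internal structure are assumptions, so defer to demo.ipynb for the exact steps.

```python
import torch

# File name is an assumption; use the actual file extracted from demo_weights.zip.
weights = torch.load('demo_weights.pth', map_location='cpu')

# Checkpoints are often dicts of tensors or nested training state; printing
# the keys shows what was stored before wiring it into the model.
if isinstance(weights, dict):
    print(list(weights.keys()))
```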
- Download dataset from: https://visualqa.org/download.html
  - VQA Annotations, Input Questions, and Input Images (training, validation, and testing)
- Unzip into ./cyanogenoid_code/data/ (create the directory if it does not exist)
- In ./cyanogenoid_code/:
  - Run preprocess-images.py and preprocess-vocab.py
  - Run train.py (a quick sanity check of the preprocessing outputs is sketched below)
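After preprocessing finishes, a quick sanity check is to open the generated files. This is a hedged sketch: the file names come from the scripts above, but the dataset keys inside resnet-14x14.h5 are assumptions, so the code just prints whatever is stored.

```python
import json
import h5py

# List everything inside the preprocessed image-feature file created by
# preprocess-images.py (dataset names/shapes are whatever the script wrote).
with h5py.File('resnet-14x14.h5', 'r') as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, 'shape', '')))

# Load the vocabulary created by preprocess-vocab.py.
with open('vocab.json') as fp:
    vocab = json.load(fp)
print(type(vocab))
```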
root/cyanogenoid_code/
|
+----demo
|    | vqaTools/vqa.py -- class to view images and questions
|    | demo jpgs -- images for demo.ipynb
|
+----resnet
|    | -- directory for preprocessing data
|
| config.py -- set up the parameters and paths for preprocessing and training
| data.py -- class for loading the vocab and image features from the processed dataset
| model.py -- original model layout
| model_batchnormDP0.py -- model layout with batch normalization and no dropout
| model_batchnormDP05.py -- model layout with batch normalization and 50% dropout
| model_dropout0.py -- model layout with no batch normalization and no dropout
| model_dropout05.py -- model layout with no batch normalization and 50% dropout
| preprocess-images.py -- preprocesses the images through ResNet, creating the image features (resnet-14x14.h5)
| preprocess-vocab.py -- preprocesses the question and answer tokens to build the vocabulary used by the model (vocab.json)
| train.py -- trains the model; can resume training from a saved model by setting parameters in the file
| utils.py -- utility functions (image transform, file paths, accuracy computation)
| demo.ipynb -- demos the model with batch normalization and 50% dropout; randomly runs 1 of 3 preset images and questions through the network
| graph_attention_QA.ipynb -- notebook used to view images, questions, network outputs, and glimpse maps
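Since OpenCV is listed as an optional dependency for viewing glimpse maps, the following is a minimal sketch of overlaying an attention (glimpse) map on an image. The helper name, the image path, and the 14x14 map shape are assumptions for illustration; graph_attention_QA.ipynb shows the project's actual visualization workflow.

```python
import cv2
import numpy as np

def overlay_glimpse(image_path, glimpse, out_path='glimpse_overlay.jpg'):
    # Hypothetical helper: upsample a low-resolution attention (glimpse) map
    # and blend it over the source image. The 14x14 shape used below is an
    # assumption based on the resnet-14x14.h5 feature file name.
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    heat = cv2.resize(glimpse, (img.shape[1], img.shape[0]))
    heat = 255 * (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    heat = cv2.applyColorMap(heat.astype(np.uint8), cv2.COLORMAP_JET)
    blended = cv2.addWeighted(img, 0.6, heat, 0.4, 0)
    cv2.imwrite(out_path, blended)

# Example with a random map standing in for real network output; the image
# path is a placeholder for one of the jpgs in the demo directory.
overlay_glimpse('demo/example.jpg', np.random.rand(14, 14).astype(np.float32))
```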