skasai5296/VSE

VSE: Visual Semantic Embedding in PyTorch

Description

This repository contains a PyTorch implementation of visual-semantic embedding.
Training and evaluation are done on the MSCOCO dataset.
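
For orientation, a visual-semantic embedding maps images and captions into one shared vector space so that matching pairs score higher than mismatched ones. The sketch below shows a hinge-based ranking loss of the kind commonly used to train such models, with an optional hardest-negative variant as popularized by VSE++. It is written in NumPy for illustration only; the function names, the margin of 0.2, and the assumption that row i of each batch forms a matching pair are hypothetical and are not taken from this repository's train.py.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def hinge_ranking_loss(image_emb, caption_emb, margin=0.2, hardest_only=False):
    """Bidirectional hinge ranking loss over a batch of matching pairs.

    Assumption: row i of image_emb matches row i of caption_emb.
    """
    # sim[i, j] = cosine similarity between image i and caption j
    sim = l2_normalize(image_emb) @ l2_normalize(caption_emb).T
    pos = np.diag(sim)  # similarity of each matching pair

    # For image i, penalize any wrong caption j that scores within `margin`
    # of the correct caption.
    cost_cap = np.maximum(0.0, margin + sim - pos[:, None])
    # For caption j, penalize any wrong image i in the same way.
    cost_img = np.maximum(0.0, margin + sim - pos[None, :])

    # The matching pair itself is not a negative.
    np.fill_diagonal(cost_cap, 0.0)
    np.fill_diagonal(cost_img, 0.0)

    if hardest_only:
        # Keep only the hardest negative per query.
        return cost_cap.max(axis=1).sum() + cost_img.max(axis=0).sum()
    # Otherwise sum over all negatives.
    return cost_cap.sum() + cost_img.sum()
```

With perfectly separated pairs the loss is zero; it grows as mismatched pairs approach or overtake the matching ones.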

Requirements (libraries)

python>=3.7
numpy
matplotlib
pytorch>=1.1.0
torchvision
Pillow
faiss-cpu (for nearest neighbor search)
accimage (optional, for fast loading of images)
torchtext (for vocabulary)
spacy (for spacy tokenizer)

Run the following command before training to download the spaCy English model.

$ python -m spacy download en

For Anaconda Users

  • The environment.yml file contains the environment details for Anaconda users.
  • Run conda env create -f environment.yml && conda activate mse to create and activate the environment.

Preparation of Dataset

Change to the directory where the data should be stored and run download_coco.sh.
This directory is referred to as $ROOTPATH below.

Training

$ python train.py --root_path $ROOTPATH

Reported Scores

Image to Caption

Method              R@1   R@5   R@10  Med r
VSE++               41.3  71.1  81.2  2.0
Our Implementation  31.7  61.5  72.6  3.0

Caption to Image

Method              R@1   R@5   R@10  Med r
VSE++               30.3  59.4  72.4  4.0
Our Implementation  22.4  48.8  61.9  6.0
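
For readers unfamiliar with the columns: R@K is the percentage of queries whose correct match appears among the top K retrieved items (higher is better), and Med r is the median rank of the correct match (lower is better). The sketch below computes both from a similarity matrix. It assumes exactly one correct candidate per query (candidate i for query i), which is a simplification, since MSCOCO pairs each image with several captions; the function name retrieval_metrics is hypothetical and this is not necessarily how this repository computes its scores.

```python
import numpy as np


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute R@K (in percent) and median rank from a similarity matrix.

    sim[i, j] is the similarity between query i and candidate j.
    Assumption: candidate i is the single ground truth for query i.
    """
    sim = np.asarray(sim, dtype=float)
    gt = np.diag(sim)[:, None]  # score of the correct candidate per query
    # Rank = 1 + number of candidates scoring strictly higher than the truth.
    ranks = 1 + (sim > gt).sum(axis=1)
    metrics = {f"R@{k}": 100.0 * float(np.mean(ranks <= k)) for k in ks}
    metrics["Med r"] = float(np.median(ranks))
    return metrics


# Toy example: three queries whose correct matches rank 1st, 2nd and 3rd.
toy_sim = [
    [0.9, 0.1, 0.0],
    [0.8, 0.5, 0.1],
    [0.2, 0.3, 0.1],
]
toy_metrics = retrieval_metrics(toy_sim)
```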

Evaluation, Visualization

$ python eval.py --root_path $ROOTPATH --checkpoint hogehoge.ckpt --image_path $IMAGE --caption $CAPTION

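$IMAGE denotes the path to the reference (query) image. Defaults to samples/sample1.jpg.
$CAPTION denotes the reference (query) caption. Defaults to "the cat is walking on the street".
hogehoge.ckpt is a placeholder; replace it with the path to a trained checkpoint.
Retrieval is done on the MSCOCO validation set.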
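
The retrieval step can be pictured as a nearest-neighbor search in the shared embedding space: the query is embedded, then compared against every embedded item in the validation set. The requirements list faiss-cpu for this search; the sketch below swaps in a brute-force NumPy search over cosine similarity so that it runs without extra dependencies. The function name top_k_neighbors, the choice of cosine similarity, and the random stand-in embeddings are assumptions made for illustration, not details of eval.py.

```python
import numpy as np


def top_k_neighbors(query_emb, gallery_emb, k=5):
    """Return indices and cosine scores of the k gallery rows nearest the query."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    scores = g @ q                      # cosine similarity to every gallery item
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]     # highest similarity first
    return order, scores[order]


rng = np.random.default_rng(0)
# Stand-in for the embedded validation captions (100 items, 16 dimensions).
gallery = rng.normal(size=(100, 16))
# Stand-in for an embedded query image that lies very close to item 42.
query = gallery[42] + 0.01 * rng.normal(size=16)

idx, scores = top_k_neighbors(query, gallery, k=3)
```

For a gallery the size of the MSCOCO validation set, brute force is usually adequate; an index library such as faiss becomes more useful as the gallery grows.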

TODO

  • add Flickr8k
  • add Flickr30k
  • clean up validation
  • find optimal hyperparams