This library provides code for training and evaluating neural module networks (NMNs). An NMN is a neural network that is assembled dynamically by composing shallow network fragments called modules into a deeper structure. These modules are jointly trained to be freely composable. For a general overview of the framework, refer to:
Neural module networks. Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein. CVPR 2016.
Learning to compose neural networks for question answering. Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein. NAACL 2016.
At present the code supports predicting network layouts from natural-language strings, with end-to-end training of modules. Various extensions should be straightforward to implement: alternative layout predictors, supervised training of specific modules, etc.
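To make the idea of dynamic assembly concrete, here is a toy sketch in Python with NumPy. The module names (find, and, is), the nested-tuple layout format and the fixed pseudo-random weights are illustrative assumptions, not this library's actual modules or API. In the real system each module is a small trained network and the layout is predicted from the question.

```python
import numpy as np

# Toy, hypothetical modules. In a real NMN each module is a small network
# with trained parameters; here the weights are fixed pseudo-random vectors.
FEATURES = np.random.default_rng(0).standard_normal((4, 4, 8))  # H x W x C

def find(word, features):
    """Attention map (H x W, values in [0, 1]) for locations matching word."""
    seed = sum(map(ord, word))  # deterministic per-word seed
    w = np.random.default_rng(seed).standard_normal(features.shape[-1])
    return 1.0 / (1.0 + np.exp(-(features @ w)))  # sigmoid of dot products

def and_(att_a, att_b):
    """Intersect two attention maps."""
    return np.minimum(att_a, att_b)

def is_(att):
    """Yes/no score: the strongest attention anywhere in the image."""
    return float(att.max())

def evaluate(layout, features):
    """Assemble and run a network from a nested-tuple layout, bottom up."""
    if isinstance(layout, str):  # a bare word becomes a find module
        return find(layout, features)
    head, *args = layout
    children = [evaluate(arg, features) for arg in args]
    if head == "and":
        return and_(*children)
    if head == "is":
        return is_(*children)
    raise ValueError(f"unknown module: {head}")

score = evaluate(("is", ("and", "modern", "train")), FEATURES)
print(f"yes-score: {score:.3f}")
```

The same handful of modules is reused for every layout; only the composition changes from question to question.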
Please cite the CVPR paper for the general NMN framework, and the NAACL paper for dynamic structure selection. Feel free to email me at [email protected] if you have questions. This code is released under the Apache 2 license, provided in LICENSE.txt.
You will need to build my fork of the excellent ApolloCaffe library. This fork may be found at jacobandreas/apollocaffe, and provides support for a few Caffe layers that haven't made it into the main Apollo repository. Ordinary Caffe users: note that you will have to install the runcython Python module in addition to the usual Caffe dependencies.
Once this is done, update APOLLO_ROOT at the top of run.sh to point to your ApolloCaffe installation.
You will also need to install the following packages:
- colorlogs
- sexpdata
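Before running anything, a quick check that the Python-side dependencies are importable can save a confusing failure later. This is a minimal sketch: the import names (apollocaffe, runcython, colorlogs, sexpdata) are assumptions inferred from the package names above, and apollocaffe may only be found once ApolloCaffe is on your PYTHONPATH.

```python
import importlib.util

# Assumed import names, inferred from the package names in this README;
# the names a distribution actually installs may differ.
REQUIRED_MODULES = ("apollocaffe", "runcython", "colorlogs", "sexpdata")

def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: bool} saying whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```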
All experiment data should be placed in the data directory.
In data, create a subdirectory named vqa. Follow the VQA setup instructions to install the data into this directory. (It should have children Annotations, Images, etc.)
We have modified the structure of the VQA Images directory slightly. Images should have two subdirectories, raw and conv. The raw directory contains the original VQA images, while conv contains the result of preprocessing these images with a 16-layer VGGNet as described in the paper. Every file in the conv directory should be of the form COCO_{SETNAME}_{IMAGEID}.jpg.npz and contain a 512x14x14 image map in zipped numpy format. Here's a gist with the code I use for doing the extraction.
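A sketch of how a conv feature file might be named and loaded with NumPy. Two details are assumptions rather than documented facts: that IMAGEID is zero-padded to 12 digits as in the original COCO image filenames, and that each archive holds a single array (its key name is not specified, so the loader takes the first one).

```python
import numpy as np

FEATURE_SHAPE = (512, 14, 14)  # channels x height x width

def conv_filename(set_name, image_id):
    """Build COCO_{SETNAME}_{IMAGEID}.jpg.npz.

    Assumes IMAGEID is zero-padded to 12 digits, as in the original COCO
    image filenames (e.g. COCO_val2014_000000000042.jpg).
    """
    return f"COCO_{set_name}_{image_id:012d}.jpg.npz"

def load_conv_features(path):
    """Load a zipped numpy feature map and check its shape.

    The array's key inside the archive is an assumption, so this simply
    takes the first (normally the only) array stored in it.
    """
    with np.load(path) as archive:
        features = archive[archive.files[0]]
    if features.shape != FEATURE_SHAPE:
        raise ValueError(f"{path}: expected {FEATURE_SHAPE}, got {features.shape}")
    return features
```

np.savez_compressed writes exactly this kind of zipped archive, so a dummy file for testing can be produced with np.savez_compressed(conv_filename("val2014", 42), np.zeros(FEATURE_SHAPE)).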
Download the GeoQA dataset from the LSP website and unpack it into data/geo.
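Once both datasets are in place, a small check like the following can confirm the layout. It is a sketch that only looks for the directories named in the steps above, not their full contents, and the data_root default of data assumes it is run from the repository root.

```python
from pathlib import Path

# Directories named in the setup steps above; contents are not checked.
EXPECTED_DIRS = [
    "vqa/Annotations",
    "vqa/Images/raw",
    "vqa/Images/conv",
    "geo",
]

def missing_data_dirs(data_root="data"):
    """Return the expected subdirectories of data_root that do not exist."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_data_dirs()
    print("data layout ok" if not missing else f"missing: {missing}")
```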
Every dataset fold should contain a file of parsed questions, one per line, formatted as S-expressions. If multiple parses are provided, they should be semicolon-delimited. As an example, for the question "is the train modern" we might have:
(is modern);(is train);(is (and modern train))
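Each line of a parse file can be split on semicolons and each candidate read as an S-expression. The project lists sexpdata as a dependency, and its sexpdata.loads could do the parsing (returning Symbol objects); the sketch below is instead a hypothetical, dependency-free stand-in that returns nested tuples of strings, not the code this repository uses.

```python
def tokenize(text):
    """Split an S-expression into "(", ")" and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse_sexp(tokens):
    """Consume tokens and return a nested tuple of atom strings.

    Unbalanced input (a missing ")") raises IndexError.
    """
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse_sexp(tokens))
        tokens.pop(0)  # drop the closing ")"
        return tuple(items)
    if token == ")":
        raise ValueError("unexpected ')'")
    return token

def parse_line(line):
    """Parse one line of semicolon-delimited candidate layouts."""
    return [parse_sexp(tokenize(part))
            for part in line.strip().split(";") if part.strip()]

print(parse_line("(is modern);(is train);(is (and modern train))"))
```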
For VQA, these files should be named Questions/{train2014,val2014,...}.sps2.
For GeoQA, they should be named environments/{fl,ga,...}/training.sps.
Parses used in our papers are provided in extra and should be installed in the appropriate location. The VQA parser script is also located under extra/vqa; instructions for running it are provided in the body of the script.
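A sketch of where the parse files would end up. It assumes the paths above are relative to data/vqa and data/geo respectively; both function names are hypothetical.

```python
from pathlib import Path

def vqa_parse_file(fold, data_root="data"):
    """Path of Questions/{fold}.sps2, assumed to live under data/vqa."""
    return Path(data_root) / "vqa" / "Questions" / f"{fold}.sps2"

def geo_parse_file(environment, data_root="data"):
    """Path of environments/{environment}/training.sps, assumed under data/geo."""
    return Path(data_root) / "geo" / "environments" / environment / "training.sps"

print(vqa_parse_file("train2014"))
print(geo_parse_file("fl"))
```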
You will first need to create the directories vis and logs (which respectively store visualizations and run logs).
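From the repository root, mkdir -p vis logs is enough. The equivalent Python sketch below passes exist_ok so that rerunning it is harmless; repo_root is a hypothetical parameter defaulting to the current directory.

```python
import os

def make_run_dirs(repo_root="."):
    """Create vis/ and logs/ under repo_root; existing directories are kept."""
    for name in ("vis", "logs"):
        os.makedirs(os.path.join(repo_root, name), exist_ok=True)

if __name__ == "__main__":
    make_run_dirs()
```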
Different experiments can be run by providing an appropriate configuration file on the command line (see the last line of run.sh). Examples for VQA and GeoQA are provided in the config directory.
Looking for SHAPES? I haven't finished integrating it with the rest of the codebase, but check out the shapes branch of this repository for data and code.
TODO:
- Configurable data location
- Model checkpointing