Seong Jae Hwang, Sathya N. Ravi, Zirui Tao, Hyunwoo J. Kim, Maxwell D. Collins, Vikas Singh, "Tensorize, Factorize and Regularize: Robust Visual Relationship Learning", Computer Vision and Pattern Recognition (CVPR), 2018.
http://pages.cs.wisc.edu/~sjh/
In total, there are the following files:
- Images (part1, part2)
- Image metadata
- VG scene graph
- Scene graph database: VG-SGG.h5
- Scene graph database metadata: VG-SGG-dicts.json
- RoI proposals: proposals.h5
- RoI distribution: bbox_distribution.npy
- Faster-RCNN model

The image database file imdb_1024.h5 is generated from files (1-3).
- Model link: download and extract all files. In ./checkpoints, there are two baseline model files: ./checkpoints/Xu_2 (by Xu et al.) and ./checkpoints/CKP_Vrd (by Lu et al.).
- Save the Faster-RCNN model to data/pretrained.
- Download the full imdb dataset, image metadata, VG scene graph, RoI database, and its metadata. Check that all files are under the data/vg directory, which should contain the following 5 files: imdb_1024.h5, bbox_distribution.npy, proposals.h5, VG-SGG-dicts.json, VG-SGG.h5
- Download the VisualGenome image metadata and its scene graph, extract the files, and place all the JSON files under ./data_tools/VG. Check that the following 3 files are under the data_tools/VG directory: image_data.json, objects.json, relationships.json
You need the following 5 files:
- Image database: imdb_1024.h5
- Scene graph database: VG-SGG.h5
- Scene graph database metadata: VG-SGG-dicts.json
- RoI proposals: proposals.h5
- RoI distribution: bbox_distribution.npy
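Once the files are downloaded or generated, the checklist above can be verified with a short script. This is a minimal sketch, not part of the repository; it only assumes the data/vg layout described in this README:

```python
import os

# The five files required under data/vg (from the list above)
REQUIRED_FILES = [
    "imdb_1024.h5",
    "VG-SGG.h5",
    "VG-SGG-dicts.json",
    "proposals.h5",
    "bbox_distribution.npy",
]

def missing_files(data_dir="data/vg"):
    """Return the required files that are not present in data_dir."""
    return [f for f in REQUIRED_FILES
            if not os.path.isfile(os.path.join(data_dir, f))]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from data/vg: %s" % ", ".join(missing))
    else:
        print("All 5 files found.")
```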
(i). Download the dataset images (part1, part2).
(ii). Save the Faster-RCNN model to data/pretrained.
(iii). Place all the JSON files under data_tools/VG/ and the images under data_tools/VG/images.
(iv). Create the image database file imdb_1024.h5 by executing ./create_imdb.sh in this directory. This script creates an HDF5 database of images, imdb_1024.h5. The longer dimension of an image is resized to 1024 pixels and the shorter side is scaled accordingly. You may also create an image database of a smaller dimension by editing the size argument of the script. You may skip to (viii) if you chose to download (2-4).
(v). Create an RoI database and its metadata by executing ./create_roidb.sh in this directory. The script creates a scene graph database file VG-SGG.h5 and its metadata VG-SGG-dicts.json. By default, the script reads the dimensions of the images from the imdb file created in (iv). If your imdb file is of a size other than 512 or 1024, you must add the size to the img_long_sizes list variable in the vg_to_roidb.py script.
(vi). Use the script provided by py-faster-rcnn to generate (4) proposals.h5.
(vii). Change line 93 of tools/train_net.py to True to generate (5) bbox_distribution.npy.
(viii). Finally, place (1-5) in data/vg.
(ix). Check that all files are under the data/vg directory and that it contains the following 5 files:
imdb_1024.h5
bbox_distribution.npy
proposals.h5
VG-SGG-dicts.json
VG-SGG.h5
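The resizing convention used when building imdb_1024.h5 (longer side scaled to the target size, shorter side scaled proportionally) can be sketched as follows; this is an illustration of the rule described above, not code from the repository:

```python
def resized_dims(height, width, long_size=1024):
    """Scale so the longer side equals long_size, preserving aspect ratio."""
    scale = float(long_size) / max(height, width)
    return int(round(height * scale)), int(round(width * scale))

# e.g. a 600x800 image becomes 768x1024
```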
Required dependencies:
- Python 2.7
- TensorFlow r0.12
- h5py
- numpy 1.11.0
- matplotlib
- scipy 0.12.0
- pyyaml
- easydict
- cython
- Pillow 2.3.0
- graphviz (optional, if you wish to visualize the graph structure)
- CUDA 8.0
- Create python 2.7 environment:
conda create -n tfr python=2.7
source activate tfr
- Install dependency packages:
pip install -r requirement.txt
(Helpful instructions for installing TensorFlow r0.12 on Ubuntu 14.04/16.04 and the associated software support can be found here.)
- After you have installed all the dependencies, run the following command to compile nms and bbox libraries:
cd lib
make
- Follow this instruction to see whether you can use the pre-compiled RoI-pooling custom op or have to compile the op yourself.
1. Run
./experiments/scripts/train.sh dual_graph_vrd_final 2 CHECKPOINT_DIRECTORY GPU_ID SIGMA
The program saves a checkpoint to ./checkpoints/<_CHECKPOINT_DIRECTORY_>/
every 50000 iterations. Training a full model on a desktop with an Intel i7 CPU, 64GB memory, and a Titan X graphics card takes around 20 hours. You may use TensorBoard to visualize the training process. By default, the TF log directory is set to checkpoints/<_CHECKPOINT_DIRECTORY_>/tf_logs/
.
- Run
./experiments/scripts/test.sh <gpu_id> <checkpoint_dir> <checkpoint_file_prefix> <model_options> <number_of_inference_for_dual_graph_vrd_final> <number_images> <mode>
Where <model_options> are:
dual_graph_vrd_final by Xu et al. (on which our implementation is based).
vrd by Lu et al.
The evaluation modes are:
sg_cls: predict the object and relationship (predicate) labels given the ground truth bounding boxes
sg_det (all): predict the object classes and relationships (predicates) using the bounding boxes proposed by the region proposal network as object proposals
e.g.
./experiments/scripts/test.sh 0 CHECKPOINT_DIRECTORY FILE_PREFIX dual_graph_vrd_final 2 100 all
- Run the same script as in Evaluation with one of the following three modes:
viz_cls: visualize the sg_cls results
viz_det: visualize the sg_det results
viz_gt: visualize the ground truth
Note: If the code is fetched from Xu et al.'s scene graph repository, then:
- Replace line 26 of lib/roi_data_layer/minibatch.py with the following code (learn more about why this is needed here):
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image).astype(np.int)
- Comment out the code block from lines 76 to 79 in tools/test_net.py, as the checkpoints do not explicitly contain .ckpt files and tf.train.Saver only needs the correct file prefix to successfully restore the model. Learn more here.
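The astype cast in the minibatch fix above is needed because np.round returns a floating-point value, which cannot be used directly as a sample count. A minimal demonstration (the fraction and RoI count below are illustrative values, not the repository's configuration):

```python
import numpy as np

fg_fraction, rois_per_image = 0.25, 64  # illustrative values only

rounded = np.round(fg_fraction * rois_per_image)
print(type(rounded))   # np.round yields a float scalar, not an integer

# Casting produces an integer usable wherever a sample count is required
fg_rois_per_image = np.round(fg_fraction * rois_per_image).astype(int)
print(fg_rois_per_image)  # 16
```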