Created by Tsung-Yu Lin, Subhransu Maji and Piotr Koniusz.
This repository contains the code for reproducing the results in our ECCV 2018 paper:
```
@inproceedings{lin2018o2dp,
    Author = {Tsung-Yu Lin and Subhransu Maji and Piotr Koniusz},
    Title = {Second-order Democratic Aggregation},
    Booktitle = {European Conference on Computer Vision (ECCV)},
    Year = {2018}
}
```
The paper analyzes various feature aggregators in the context of second-order features and proposes γ-democratic pooling, which generalizes sum pooling and democratic aggregation. See the project page and the paper for details. The code was tested on Ubuntu 14.04 using an NVIDIA Titan X GPU and MATLAB R2016a.
- MatConvNet: Our code was developed on MatConvNet version `1.0-beta24`.
- VLFEAT
- bcnn-package: The package includes our implementation of customized layers.
The packages are set up as git submodules. Check them out with the following commands and follow the instructions on the MatConvNet and VLFEAT project pages to install them.
```
>> git submodule init
>> git submodule update
```
To run the experiments, download the following datasets and edit `model_setup.m` to point to the dataset locations. For instance, you can point to the birds dataset directory by setting `opts.cubDir = 'data/cub'`.
- ImageNet LSVRC 2012 pre-trained models: The `vgg-verydeep-16` and `resnet-101` ImageNet pre-trained models are used as our base models. Download them from the MatConvNet pre-trained models page.
- B-CNN fine-tuned models: We also provide B-CNN models fine-tuned with `vgg-verydeep-16`, from which we can extract the CNN features and aggregate them to construct the image descriptor. Download the models for CUB Birds, FGVC Aircrafts, or Stanford Cars to reproduce the accuracy reported in the paper.
Solving for the coefficients of γ-democratic aggregation involves Sinkhorn iteration. The hyper-parameters for the Sinkhorn iteration are configurable in the entry scripts `run_experiments_o2dp.m` and `run_experiments_sketcho2dp_resnet.m`. See the comments in the code for details.
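For intuition, γ-democratic aggregation can be viewed as finding per-feature weights that interpolate between sum pooling and democratic pooling: with the second-order kernel k(x, y) = (xᵀy)², the weights α are driven toward α_i Σ_j α_j k(x_i, x_j) = (Σ_j k(x_i, x_j))^γ, so γ=1 recovers sum pooling (α = 1) and γ=0 equalizes every feature's contribution. Below is a minimal NumPy sketch of such a damped Sinkhorn-style fixed-point iteration — an illustrative re-implementation, not the repository's MATLAB code; the damping exponent `tau`, the iteration count, and this exact parameterization of the target contributions are assumptions:

```python
import numpy as np

def gamma_democratic_weights(X, gamma, n_iter=100, tau=0.5, eps=1e-12):
    # Second-order kernel k(x_i, x_j) = (x_i . x_j)^2 is non-negative,
    # which keeps the multiplicative updates below well defined.
    K = (X @ X.T) ** 2
    # Target contribution per feature: (sum_j k_ij)^gamma.
    sigma = np.maximum(K.sum(axis=1), eps) ** gamma
    alpha = np.ones(len(X))
    for _ in range(n_iter):
        contrib = alpha * (K @ alpha)  # alpha_i * sum_j alpha_j k(x_i, x_j)
        # Damped Sinkhorn-style multiplicative update toward the target.
        alpha *= (sigma / np.maximum(contrib, eps)) ** tau
    return alpha

def gamma_democratic_pool(X, gamma):
    # Weighted second-order aggregation: sum_i alpha_i * outer(x_i, x_i).
    alpha = gamma_democratic_weights(X, gamma)
    return (alpha[:, None] * X).T @ X
```

At γ=1 the update is a no-op (the weights stay at 1 and the pooled descriptor is the plain second-order sum XᵀX); at γ=0 the iteration equalizes the weighted contributions.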
- Second-order γ-democratic aggregation: Point the variable `model_path` to the location of the model in `run_experiments_o2dp.m` and run the command `run_experiments_o2dp(dataset, gamma, gpuidx)` in the MATLAB terminal. For example:

```matlab
% gamma is the hyper-parameter gamma for gamma-democratic aggregation
% gpuidx is the index of the GPU on which you run the experiment
run_experiments_o2dp('mit_indoor', 0.3, 1)
```
- Classification results: Sum and democratic aggregation can be achieved by setting the proper values of γ. The optimal γ values are indicated in parentheses. In general, γ=0.5 performs reasonably well. For `DTD` and `FMD` these numbers are reported on the first split. For the fine-grained recognition datasets (†) the results are obtained using the fine-tuned B-CNN models, while for the texture and indoor scene datasets the ImageNet pre-trained `vgg-verydeep-16` model is used.

| Dataset | Sum (γ=1) | Democratic (γ=0) | γ-democratic |
| --- | --- | --- | --- |
| Caltech UCSD Birds † | 84.0 | 84.7 | 84.9 (0.5) |
| Stanford Cars † | 90.6 | 89.7 | 90.8 (0.5) |
| FGVC Aircrafts † | 85.7 | 86.7 | 86.7 (0.0) |
| DTD | 71.2 | 72.2 | 72.3 (0.3) |
| FMD | 84.6 | 82.8 | 84.8 (0.8) |
| MIT Indoor | 79.5 | 79.6 | 80.4 (0.3) |
- Second-order γ-democratic aggregation in sketch space: Point the variable `model_path` to the location of the model in `run_experiments_sketcho2dp_resnet.m` and run the command `run_experiments_sketcho2dp_resnet(dataset, gamma, d, gpuidx)` in the MATLAB terminal. For example:

```matlab
% gamma is the hyper-parameter gamma for gamma-democratic aggregation
% d is the dimension of the sketch space
% gpuidx is the index of the GPU on which you run the experiment
run_experiments_sketcho2dp_resnet('mit_indoor', 0.5, 8192, 1)
```
- The script aggregates the second-order features of an ImageNet pre-trained ResNet in an 8192-dimensional sketch space with the γ-democratic aggregator. With ResNet features the model achieves the following results. For `DTD` and `FMD` the accuracy is averaged over 10 splits.

| | DTD | FMD | MIT Indoor |
| --- | --- | --- | --- |
| Accuracy | 76.2 ± 0.7 | 84.3 ± 1.5 | 84.3 |
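A standard tool for projecting second-order features into a compact sketch space is Tensor Sketch (Pham & Pagh), which approximates the outer product x ⊗ x in d dimensions so that inner products of sketches approximate (xᵀy)²; whether this exact variant matches the repository's implementation is an assumption. A minimal NumPy sketch, with the hash functions `h1`, `h2` and signs `s1`, `s2` drawn at random by the caller:

```python
import numpy as np

def count_sketch(x, h, s, d):
    # Scatter the signed entries of x into d bins: c[h[i]] += s[i] * x[i].
    c = np.zeros(d)
    np.add.at(c, h, s * x)
    return c

def tensor_sketch(x, h1, s1, h2, s2, d):
    # The count sketch of the outer product x (x) x equals the circular
    # convolution of two independent count sketches of x, computed via FFT.
    f1 = np.fft.fft(count_sketch(x, h1, s1, d))
    f2 = np.fft.fft(count_sketch(x, h2, s2, d))
    return np.real(np.fft.ifft(f1 * f2))
```

In expectation, the inner product of two sketches estimates the squared inner product of the original features, with variance shrinking as d grows — which is why a fairly large sketch dimension such as 8192 is used.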