If you want to evaluate your own descriptors on our image-based reconstruction benchmark, please follow steps 1-4 below to download the datasets and to run the benchmark. In addition, more detailed instructions for each individual step can be found at the end of this document. It is recommended to first run the pipeline on a smaller dataset (e.g., Fountain or Herzjesu) to test whether the computed results make sense.
-
Requirements:
-
Computer with CUDA-enabled GPU
-
Matlab R2016b or newer (for GPU feature matching)
-
VLFeat toolbox for Matlab
-
Download the benchmark code and compile COLMAP:
    git clone https://github.com/ahojnnes/local-feature-evaluation.git
    git clone https://github.com/colmap/colmap
    cd colmap
    git checkout 58d966c
    cp ../local-feature-evaluation/colmap-tools/* src/tools
    mkdir build
    cd build
    cmake .. -DTESTS_ENABLED=OFF
    make
-
-
Download the datasets:
    mkdir datasets
    cd datasets
    wget https://cvg-data.inf.ethz.ch/local-feature-evaluation-schoenberger2017/Databases.tar.gz
    wget https://cvg-data.inf.ethz.ch/local-feature-evaluation-schoenberger2017/Strecha-Fountain.zip
    wget https://cvg-data.inf.ethz.ch/local-feature-evaluation-schoenberger2017/Strecha-Herzjesu.zip
    wget https://cvg-data.inf.ethz.ch/local-feature-evaluation-schoenberger2017/South-Building.zip
    wget http://landmark.cs.cornell.edu/projects/1dsfm/images.Madrid_Metropolis.tar
    wget http://landmark.cs.cornell.edu/projects/1dsfm/images.Gendarmenmarkt.tar
    wget http://landmark.cs.cornell.edu/projects/1dsfm/images.Tower_of_London.tar
    wget http://landmark.cs.cornell.edu/projects/1dsfm/images.Alamo.tar
    wget http://landmark.cs.cornell.edu/projects/1dsfm/images.Roman_Forum.tar
    wget http://vision.soic.indiana.edu/disco_files/ArtsQuad_dataset.tar
    wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
-
Extract the datasets:
    tar xvfz Databases.tar.gz
    unzip Strecha-Fountain.zip
    unzip Strecha-Herzjesu.zip
    unzip South-Building.zip
    tar xvf images.Madrid_Metropolis.tar
    tar xvf images.Gendarmenmarkt.tar
    mv home/wilsonkl/projects/SfM_Init/dataset_images/Gendarmenmarkt/images Gendarmenmarkt/images
    rm -r home
    tar xvf images.Tower_of_London.tar
    tar xvf images.Alamo.tar
    tar xvf images.Roman_Forum.tar
    tar xvf ArtsQuad_dataset.tar
    mkdir -p Oxford5k/images
    cd Oxford5k/images
    tar xfvz ../../oxbuild_images.tgz
    cd ../..
-
Download and extract keypoints:
If you evaluate just a feature descriptor without a feature detection component, you should use the provided SIFT keypoints:
    wget https://cvg-data.inf.ethz.ch/local-feature-evaluation-schoenberger2017/Keypoints.tar.gz
    tar xvfz Keypoints.tar.gz
-
Run the evaluation:
You can now run the evaluation for each dataset. First, run the matching pipeline using the Matlab script scripts/matching_pipeline.m. All locations that require changes by you (the user) are marked with TODO in the Matlab script. After the matching pipeline has finished, run the reconstruction using:
    python scripts/reconstruction_pipeline.py \
        --dataset_path datasets/Fountain \
        --colmap_path colmap/build/src/exe
At the end of the reconstruction pipeline output, you should see all relevant statistics of the benchmark. For example:
    ==============================================================================
    Raw statistics
    ==============================================================================
    {'num_images': 11, 'num_inlier_pairs': 55, 'num_inlier_matches': 120944}
    {'num_reg_images': 11, 'num_sparse_points': 14472, 'num_observations': 68838, 'mean_track_length': 4.756633, 'num_observations_per_image': 6258.0, 'mean_reproj_error': 0.384562, 'num_dense_points': 298634}
    ==============================================================================
    Formatted statistics
    ==============================================================================
    | Fountain | METHOD | 11 | 11 | 14472 | 68838 | 4.756633 | 6258.0 | 0.384562 | 298634 |
    | | | | | 55 | 120944 |
Alternatively, you can find more details about each individual step in the pipeline scripts above in the detailed instructions below.
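The raw statistics in the example output are printed as Python dictionary literals, so they can be collected for further processing with a few lines of Python. The following is a minimal sketch, not part of the benchmark scripts; the function name parse_raw_statistics is hypothetical, and the relations checked at the end are simply observations about the Fountain numbers shown above.

```python
import ast

# The two dictionary lines copied from the "Raw statistics" example above.
raw_lines = [
    "{'num_images': 11, 'num_inlier_pairs': 55, 'num_inlier_matches': 120944}",
    "{'num_reg_images': 11, 'num_sparse_points': 14472, "
    "'num_observations': 68838, 'mean_track_length': 4.756633, "
    "'num_observations_per_image': 6258.0, 'mean_reproj_error': 0.384562, "
    "'num_dense_points': 298634}",
]


def parse_raw_statistics(lines):
    """Merge the printed dictionary literals into a single dict (hypothetical helper)."""
    stats = {}
    for line in lines:
        # literal_eval only accepts Python literals, so no code is executed.
        stats.update(ast.literal_eval(line))
    return stats


stats = parse_raw_statistics(raw_lines)

# Consistency checks on the example numbers.
track_length = stats["num_observations"] / stats["num_sparse_points"]
obs_per_image = stats["num_observations"] / stats["num_reg_images"]
all_pairs = stats["num_images"] * (stats["num_images"] - 1) // 2
```

For the Fountain example, track_length agrees with the reported mean_track_length, obs_per_image agrees with num_observations_per_image, and all_pairs equals num_inlier_pairs, i.e., every image pair was geometrically verified.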
-
Compute the keypoints:
The keypoints for each image ${IMAGE_NAME} in the images folder are stored in a binary file ${IMAGE_NAME}.bin in the keypoints folder.
-
Using the provided SIFT keypoints:
    wget https://cvg-data.inf.ethz.ch/local-feature-evaluation-schoenberger2017/Keypoints.tar.gz
    tar xvfz Keypoints.tar.gz
-
Using your own keypoints:
The keypoints for each image are stored in a binary file of the format:
    <N><D><KEY_1><KEY_2><...><KEY_N>
where N is the number of keypoints as a signed 4-byte integer, D = 4 is a signed 4-byte integer denoting the number of keypoint properties, and KEY_I is one single-precision floating point vector with D = 4 elements. In total, this binary file should consist of two signed 4-byte integers followed by N x D single-precision floating point values storing the N x 4 keypoint matrix in row-major format. In this matrix, each row contains the x, y, scale, orientation properties of the keypoint.
Note that we provide the Matlab functions scripts/read_keypoints.m and scripts/write_keypoints.m to read and write keypoints:
    keypoints = read_keypoints('Fountain/keypoints/0000.png.bin');
    write_keypoints('Fountain/keypoints/0000.png.bin', keypoints);
The corresponding patches of each keypoint can be easily extracted using the provided scripts/extract_patches.m Matlab function:
    image = single(rgb2gray(image));
    patches = extract_patches(image, keypoints, 32);
where 32 is the radius of the extracted patch centered at the keypoint.
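If your detector does not run in Matlab, the keypoint file layout described above can be produced directly. The following is a minimal Python sketch of a reader and writer, not the provided Matlab functions. It assumes little-endian byte order, which the format description does not state explicitly, and it uses only the standard library.

```python
import os
import struct
import tempfile

KEYPOINT_DIM = 4  # x, y, scale, orientation


def write_keypoints(path, keypoints):
    """Write a list of N keypoint rows as <N><D><KEY_1>...<KEY_N>."""
    with open(path, "wb") as f:
        # Header: two signed 4-byte integers (little-endian is an assumption).
        f.write(struct.pack("<ii", len(keypoints), KEYPOINT_DIM))
        for row in keypoints:
            if len(row) != KEYPOINT_DIM:
                raise ValueError("each keypoint needs x, y, scale, orientation")
            # Rows are written one after another, i.e., row-major order.
            f.write(struct.pack("<4f", *row))


def read_keypoints(path):
    """Read a keypoint file into a list of [x, y, scale, orientation] rows."""
    with open(path, "rb") as f:
        num_keypoints, dim = struct.unpack("<ii", f.read(8))
        if dim != KEYPOINT_DIM:
            raise ValueError("expected D = 4, got %d" % dim)
        payload = f.read(4 * num_keypoints * dim)
    values = struct.unpack("<%df" % (num_keypoints * dim), payload)
    return [list(values[i * dim:(i + 1) * dim]) for i in range(num_keypoints)]


# Round trip through a temporary file. The example values are exactly
# representable in single precision, so they compare equal after reading.
keypoints = [[10.5, 20.25, 2.0, 0.5], [300.0, 150.75, 4.5, -1.25]]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "0000.png.bin")
    write_keypoints(path, keypoints)
    file_size = os.path.getsize(path)
    restored = read_keypoints(path)
```

A quick plausibility check for any keypoint file is its size: 8 header bytes plus 16 bytes per keypoint.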
-
Compute the descriptors:
For each image images/${IMAGE_NAME} and each keypoint in keypoints/${IMAGE_NAME}.bin, you must save a corresponding descriptor file descriptors/${IMAGE_NAME}.bin in the following format:
    <N><D><DESC_1><DESC_2><...><DESC_N>
where N is the number of descriptors as a signed 4-byte integer, D is the dimensionality as a signed 4-byte integer, and DESC_I is one single-precision floating point vector with D elements. In total, this binary file should consist of two signed 4-byte integers followed by N x D single-precision floating point values storing the N x D descriptor matrix in row-major format. Note that we provide the Matlab functions scripts/read_descriptors.m and scripts/write_descriptors.m to read and write your descriptors:
    keypoints = read_keypoints('Fountain/keypoints/0000.png.bin');
    % Load the image as a single-precision grayscale matrix, as in the patch extraction example.
    image = single(rgb2gray(imread('Fountain/images/0000.png')));
    patches = extract_patches(image, keypoints, 32);
    descriptors = your_descriptor_function(keypoints, patches);
    assert(size(keypoints, 1) == size(descriptors, 1));
    write_descriptors('Fountain/descriptors/0000.png.bin', descriptors);
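Before starting the long-running matching step, it can be worth checking that every descriptor file agrees with its keypoint file. The following Python sketch reads only the <N><D> headers and compares them. The helper names are hypothetical, little-endian byte order is an assumption, and the demo files are zero-filled placeholders rather than real features.

```python
import os
import struct
import tempfile


def read_header(path):
    """Return (N, D), the two leading signed 4-byte integers of a file."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) != 8:
        raise ValueError("file too short for an <N><D> header: %s" % path)
    return struct.unpack("<ii", header)  # little-endian is an assumption


def check_descriptor_file(keypoint_path, descriptor_path, expected_dim=None):
    """Return a list of problems; an empty list means the files are consistent."""
    num_keypoints, _ = read_header(keypoint_path)
    num_descriptors, dim = read_header(descriptor_path)
    problems = []
    if num_descriptors != num_keypoints:
        problems.append("%d descriptors for %d keypoints"
                        % (num_descriptors, num_keypoints))
    if expected_dim is not None and dim != expected_dim:
        problems.append("dimensionality %d, expected %d" % (dim, expected_dim))
    # Two header integers plus N x D single-precision values.
    expected_size = 8 + 4 * num_descriptors * dim
    if os.path.getsize(descriptor_path) != expected_size:
        problems.append("file size does not match the header")
    return problems


def write_zero_matrix(path, num_rows, num_cols):
    """Demo helper: write a zero-filled N x D float matrix with its header."""
    count = num_rows * num_cols
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", num_rows, num_cols))
        f.write(struct.pack("<%df" % count, *([0.0] * count)))


with tempfile.TemporaryDirectory() as tmp_dir:
    keypoint_path = os.path.join(tmp_dir, "keypoints.bin")
    good_path = os.path.join(tmp_dir, "good.bin")
    bad_path = os.path.join(tmp_dir, "bad.bin")
    write_zero_matrix(keypoint_path, 3, 4)
    write_zero_matrix(good_path, 3, 128)
    write_zero_matrix(bad_path, 2, 128)
    good_problems = check_descriptor_file(keypoint_path, good_path, 128)
    bad_problems = check_descriptor_file(keypoint_path, bad_path, 128)
```

Here good_problems is empty, while bad_problems reports the mismatch between 2 descriptors and 3 keypoints.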
-
Build the visual vocabulary:
For matching the descriptors of the larger datasets, you need to build a visual vocabulary for your descriptor. This is done using the Oxford5k dataset. First, you must compute features for all images in the dataset as described in the previous steps. Note that if you use your own keypoint detector, every image should have around 1000 features on average in order to obtain a good quantization of the descriptor space. You can then build a visual vocabulary tree using the vocab_tree_builder_float binary:
    ./colmap/build/src/tools/vocab_tree_builder_float \
        --descriptor_path Oxford5k/descriptors \
        --database_path Oxford5k/database.db \
        --vocab_tree_path Oxford5k/vocab-tree.bin
Note that if your descriptors have a dimensionality different from 128, you have to change the kDescDim values in the vocab_tree_builder_float.cc and vocab_tree_retriever_float.cc source files accordingly.
-
Match the descriptors:
As an input to image-based reconstruction, you need to compute 2D-to-2D feature correspondences between pairs of images. For the smaller datasets (Fountain, Herzjesu, South Building, Madrid Metropolis, Gendarmenmarkt, Tower of London), this must be done exhaustively for all image pairs. For the larger datasets (Alamo, Roman Forum, Cornell (ArtsQuad)), this must be done by matching each image against its nearest neighbor image using the Bag-of-Words image retrieval system of COLMAP. First, make sure that all keypoints and descriptors exist. Then, run the corresponding section in the scripts/matching_pipeline.m script. It is strongly recommended that you run this step on a machine with a CUDA-enabled GPU to speed up the matching process. The benchmark pipeline script should run the matching fully automatically end-to-end, but you can find additional details below.
-
Exhaustive matching: Note that this step can take a significant amount of time. For example, the largest dataset for exhaustive matching (Madrid Metropolis) takes around 16 hours to match on a single NVIDIA Titan X GPU. It is therefore recommended to check whether everything works on one of the smaller datasets before running the bigger datasets. The code for this matching module is in scripts/exhaustive_matching.m.
-
Nearest neighbor matching: First, you need to execute the image retrieval pipeline to find the most similar image in the dataset for every image in the dataset:
    ./colmap/build/src/tools/vocab_tree_retriever_float \
        --descriptor_path Roman_Forum/descriptors \
        --database_path Roman_Forum/database.db \
        --vocab_tree_path Roman_Forum/vocab-tree.bin \
        > Roman_Forum/retrieval.txt
Then, run the scripts/approximate_matching.m Matlab script.
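To see why exhaustive matching becomes expensive, note that n images produce n(n-1)/2 unordered pairs, so the work grows quadratically with the dataset size. The short Python sketch below enumerates the pairs for an 11-image dataset. The image names are hypothetical placeholders in the style of the Fountain examples, and whether the provided Matlab script visits the pairs in this order is not something this sketch claims.

```python
from itertools import combinations


def exhaustive_pairs(image_names):
    """Return every unordered pair of distinct images, in sorted name order."""
    return list(combinations(sorted(image_names), 2))


# Hypothetical image names for an 11-image dataset.
image_names = ["%04d.png" % i for i in range(11)]
pairs = exhaustive_pairs(image_names)
num_pairs = len(pairs)
```

For 11 images this gives 55 pairs, which is the num_inlier_pairs value reported in the Fountain example output, where every pair was verified.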
In both cases, the output will be written to the matches folder inside the dataset folder. The matches for image pair ${IMAGE_NAME1} and ${IMAGE_NAME2} must be written to the matches/${IMAGE_NAME1}---${IMAGE_NAME2}.bin binary file in the format:
    <N><D><MATCH_1><MATCH_2><...><MATCH_N>
where N is the number of matches as a signed 4-byte integer, D = 2 is a signed 4-byte integer denoting the number of columns of the match matrix, and MATCH_I is a uint32 vector of two elements specifying the zero-based indices of the corresponding keypoints in ${IMAGE_NAME1} and ${IMAGE_NAME2}. In total, this binary file should consist of two signed 4-byte integers followed by N x 2 uint32 values storing the N x 2 match matrix in row-major format. In this matrix, each row contains the indices of one match.
Note that we provide the Matlab function scripts/write_matches.m to write the matches in this format:
    write_matches('Fountain/matches/0000.png---0001.png.bin', matches);
Here, the matches matrix contains the matching keypoints using one-based indexing as used by Matlab:
    keypoints1 = read_keypoints('Fountain/keypoints/0000.png.bin');
    keypoints2 = read_keypoints('Fountain/keypoints/0001.png.bin');
    % Select whole rows (x, y, scale, orientation) of the matched keypoints.
    matching_keypoints1 = keypoints1(matches(:,1), :);
    matching_keypoints2 = keypoints2(matches(:,2), :);
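The difference between the one-based indices used in Matlab and the zero-based indices stored on disk is an easy source of off-by-one errors when writing match files from other tools. The following Python sketch writes the match format described above from one-based index pairs and reads it back. It is an illustration rather than a port of the provided Matlab function, the function names are hypothetical, and little-endian byte order is an assumption.

```python
import os
import struct
import tempfile


def match_file_name(image_name1, image_name2):
    """Build the file name <IMAGE_NAME1>---<IMAGE_NAME2>.bin for an image pair."""
    return "%s---%s.bin" % (image_name1, image_name2)


def write_matches(path, matches_one_based):
    """Write (index1, index2) pairs given in one-based indexing to a match file."""
    with open(path, "wb") as f:
        # Header: N matches and D = 2 columns as signed 4-byte integers.
        f.write(struct.pack("<ii", len(matches_one_based), 2))
        for idx1, idx2 in matches_one_based:
            if idx1 < 1 or idx2 < 1:
                raise ValueError("one-based indices must be >= 1")
            # The file stores zero-based uint32 indices, one match per row.
            f.write(struct.pack("<II", idx1 - 1, idx2 - 1))


def read_matches(path):
    """Read a match file and return the zero-based index pairs as stored."""
    with open(path, "rb") as f:
        num_matches, num_cols = struct.unpack("<ii", f.read(8))
        if num_cols != 2:
            raise ValueError("expected D = 2, got %d" % num_cols)
        values = struct.unpack("<%dI" % (2 * num_matches), f.read(8 * num_matches))
    return [(values[2 * i], values[2 * i + 1]) for i in range(num_matches)]


matches = [(1, 3), (2, 1), (5, 4)]  # one-based, as in Matlab
file_name = match_file_name("0000.png", "0001.png")
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, file_name)
    write_matches(path, matches)
    file_size = os.path.getsize(path)
    stored = read_matches(path)
```

After the round trip, stored holds the same matches shifted to zero-based indexing, and the file occupies 8 header bytes plus 8 bytes per match.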
-
-
Import the features and matches into COLMAP:
Run the Python script scripts/colmap_import.py:
    python scripts/colmap_import.py --dataset_path path/to/Fountain
You can now verify the features and matches using COLMAP by opening the COLMAP GUI and clicking Processing > Database management. By selecting an image and clicking Show image you can see the detected keypoints. By clicking Show matches you can see the feature matches for each image.
At this point, the feature matches are not geometrically verified, which is why there are no inlier matches in the database. To verify the matches, you must run the COLMAP matches_importer executable:
    ./colmap/build/src/exe/matches_importer \
        --database_path path/to/Fountain/database.db \
        --match_list_path path/to/Fountain/image-pairs.txt \
        --match_type pairs
Alternatively, you can run the same operation from the COLMAP GUI by clicking Processing > Feature matching and then selecting the Custom tab. Here, select Image pairs as match type and select path/to/Fountain/image-pairs.txt as the match list path. Then, run the geometric verification by clicking Run.
To visualize the geometrically verified inlier matches, you can again use the database management tool. Alternatively, you can visualize the successfully verified image pairs by clicking Extras > Show match matrix.
-
Run the sparse reconstruction:
-
From the command-line:
    ./colmap/build/src/exe/colmap mapper \
        --database_path path/to/Fountain/database.db \
        --image_path path/to/Fountain/images \
        --export_path path/to/Fountain/sparse
-
From the GUI: Open the COLMAP GUI, click File > New project, then Open the path/to/Fountain/database.db database file and select the image path. Next, click Reconstruction > Start reconstruction.
-
-
Run the dense reconstruction:
Now, we run the dense reconstruction on the reconstructed sparse model with the most registered images. To find the largest sparse model, you can use the command:
    ./colmap/build/src/exe/model_analyzer \
        --path path/to/Fountain/sparse/0
Here, 0 is the folder containing the 0-th reconstructed sparse model. Then, execute the following commands on the largest sparse model, as determined previously (in this case it is the 0-th model):
    mkdir -p path/to/Fountain/dense/0
    ./colmap/build/src/exe/colmap image_undistorter \
        --image_path path/to/Fountain/images \
        --input_path path/to/Fountain/sparse/0 \
        --export_path path/to/Fountain/dense/0 \
        --max_image_size 1200
    ./colmap/build/src/exe/colmap patch_match_stereo \
        --workspace_path path/to/Fountain/dense/0 \
        --PatchMatchStereo.geom_consistency false
    ./colmap/build/src/exe/colmap stereo_fusion \
        --workspace_path path/to/Fountain/dense/0 \
        --StereoFusion.min_num_pixels 5 \
        --input_type photometric \
        --output_path path/to/Fountain/dense/0/fused.ply
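When the largest sparse model is not the 0-th one, every path in the commands above has to be changed consistently. The following Python sketch builds the same three commands for an arbitrary model index. It only reuses the subcommands and options listed above. The function names are hypothetical, and this is not the benchmark's own reconstruction_pipeline.py.

```python
import os
import subprocess


def dense_commands(colmap_exe, dataset_path, model_index=0):
    """Build the three dense reconstruction commands for one sparse model."""
    images = os.path.join(dataset_path, "images")
    sparse = os.path.join(dataset_path, "sparse", str(model_index))
    dense = os.path.join(dataset_path, "dense", str(model_index))
    return [
        [colmap_exe, "image_undistorter",
         "--image_path", images,
         "--input_path", sparse,
         "--export_path", dense,
         "--max_image_size", "1200"],
        [colmap_exe, "patch_match_stereo",
         "--workspace_path", dense,
         "--PatchMatchStereo.geom_consistency", "false"],
        [colmap_exe, "stereo_fusion",
         "--workspace_path", dense,
         "--StereoFusion.min_num_pixels", "5",
         "--input_type", "photometric",
         "--output_path", os.path.join(dense, "fused.ply")],
    ]


def run_dense(colmap_exe, dataset_path, model_index=0):
    """Create the dense workspace and run the commands, stopping on the first failure."""
    dense = os.path.join(dataset_path, "dense", str(model_index))
    os.makedirs(dense, exist_ok=True)  # same effect as mkdir -p
    for command in dense_commands(colmap_exe, dataset_path, model_index):
        subprocess.run(command, check=True)


# Only build the commands here; call run_dense(...) to actually execute them.
commands = dense_commands("./colmap/build/src/exe/colmap", "path/to/Fountain", 1)
```

Passing the commands as argument lists avoids shell quoting problems with dataset paths that contain spaces.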
-
Extract the statistics:
If you did not execute the evaluation script, you must now extract the relevant statistics from the output manually.