IGP: Efficient Multi-Vector Retrieval via Proximity Graph Index

Introduction

This is a fast algorithm for the multi-vector retrieval problem based on ColBERT.

Requirement (includes but not limited)

Linux OS (Ubuntu-20.04)
pybind, eigen, spdlog, CUDA (option)
some important libraries

How to build and run

Note that the python version must be 3.8

download the ColBERT pretrain model in https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz
decompress the ColBERT model and move it into the file {project_file}/multi-vector-retrieval-data/RawData/colbert-pretrain
mv {project_file}/multi-vector-retrieval-data /home/{username}/Dataset/multi-vector-retrieval
set username in the following scripts described

Note that you should move the project path as /home/{username}/

Install the necessary library when the system reports an error

Important scripts

script/data/build_index.py, build the index of plaid, it also generates the embedding
script/data/build_index_by_sample.py, if you want to test the program in a small dataset, use this one

Note that you can run the other competitors, to install it, you can refer to the installation guide from their repository

script/evaluation/eval_dessert.py,
script/evaluation/eval_emvb.py,
script/evaluation/eval_igp.py,
script/evaluation/eval_muvera.py,
script/evaluation/build_index.py

Some tips for library installation:

pybind: build file, then make install
eigen: sudo apt install libeigen3-dev
to install spdlog, you should install the spdlog-1.11.0 when build and install, run cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON .. && make -j to build the spdlog
when encountering the error: no CUDA runtime found, using CUDA_HOME = 'xxxx'

It may because of the incompatible version between CUDA and pytorch, the correct answer is to reinstall the pytorch and not do anything for the CUDA A command that has worked before: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

When using the colbert to connect to huggingface. If it report the error show that huggingface cannot connect, then please add the proxy for downloading. The detail is to add the proxy configuration at the line 'loadedConfig = AutoConfig.from_pretrained(name_or_path)' in ColBERT/colbert/modeling/hf_colbert.py You can learn how to use the proxychains for climbing the "wall"
to let PLAID run in the CPU, you can set the following

os.environ["CUDA_VISIBLE_DEVICES"] = ""

export CUDA_VISIBLE_DEVICES=""

in bash, setting CUDA_VISIBLE_DEVICES=0 to enable

export CUDA_VISIBLE_DEVICES="0"

For Colbert, when want to use GPU for build index / retrieval, set total_visible_gpus in ColBERT/colbert/infra/config/settings, the total_visible_gpus variable

cmake -DCMAKE_BUILD_TYPE=Debug ..

if you find the error AttributeError: module 'faiss' has no attribute 'StandardGpuResources', or find the error about the matrix multiplication for calling the openblas, then you need to uninstall faiss-gpu in pip version and install the faiss-gpu using conda command, pip install faiss-gpu-cu12

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.idea		.idea
baseline		baseline
cmake		cmake
deps		deps
multi-vector-retrieval-data		multi-vector-retrieval-data
script		script
src		src
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md
build-project.txt		build-project.txt
dataset.md		dataset.md
dataset_info.txt		dataset_info.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

IGP: Efficient Multi-Vector Retrieval via Proximity Graph Index

Introduction

Requirement (includes but not limited)

How to build and run

Important scripts

Some tips for library installation:

ENJOY!

About

Uh oh!

Releases

Packages

Languages

DBGroup-SUSTech/multi-vector-retrieval

Folders and files

Latest commit

History

Repository files navigation

IGP: Efficient Multi-Vector Retrieval via Proximity Graph Index

Introduction

Requirement (includes but not limited)

How to build and run

Important scripts

Some tips for library installation:

ENJOY!

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages