Created Python Code to set up the entire pipeline for benchmarking faiss-index with DuckDB and PGVector #1984

Open
wants to merge 29 commits into
base: master

SeanSong25
Contributor

This PR adds several Python classes and utilities that make it easier to benchmark DuckDB and PGVector on HNSW indexes. faiss_index_adaptor.py defines the base class for duckdb_faiss_index_adaptor and pgvector_faiss_index_adaptor. It uses the faiss_index_extractor class to download and extract a 768-dimensional Faiss index, then creates a database table in DuckDB/PGVector. Currently, run_benchmark.py invokes the benchmarking process; see experiment_vectordbs.md for detailed instructions.
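
For readers who have not opened the diff, the base-class/adaptor layering described above could look roughly like the sketch below. This is only an illustration under stated assumptions: the names `IndexAdaptorSketch`, `InMemoryAdaptor`, `create_table`, `insert`, `search`, and `load` are hypothetical and are not taken from the PR, and the toy in-memory backend stands in for DuckDB or PGVector so the example runs without a database.

```python
from abc import ABC, abstractmethod


class IndexAdaptorSketch(ABC):
    """Hypothetical base class: shared loading logic, backend-specific storage."""

    def __init__(self, vectors):
        # vectors: dict of doc id -> list of floats, standing in for the
        # vectors extracted from a Faiss index (768-dimensional in the PR).
        self.vectors = vectors

    @abstractmethod
    def create_table(self):
        """Create the backend table that will hold the vectors."""

    @abstractmethod
    def insert(self, doc_id, vector):
        """Store one (doc_id, vector) row in the backend."""

    @abstractmethod
    def search(self, query, k):
        """Return the ids of the k rows most similar to query."""

    def load(self):
        # Shared pipeline step: create the table, then insert every vector.
        self.create_table()
        for doc_id, vector in self.vectors.items():
            self.insert(doc_id, vector)


class InMemoryAdaptor(IndexAdaptorSketch):
    """Toy backend used only for illustration; not DuckDB or PGVector."""

    def create_table(self):
        self.rows = []

    def insert(self, doc_id, vector):
        self.rows.append((doc_id, vector))

    def search(self, query, k):
        # Brute-force inner-product ranking, highest score first.
        scored = [
            (sum(q * v for q, v in zip(query, vector)), doc_id)
            for doc_id, vector in self.rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


adaptor = InMemoryAdaptor({"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]})
adaptor.load()
top_two = adaptor.search([1.0, 0.0], k=2)
```

In a layout like this, a DuckDB or PGVector adaptor would override only the three abstract methods, and the benchmark script could treat both backends uniformly through `load` and `search`.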

Member

@lintool lintool left a comment

Please organize the files better, e.g., scripts should go in scripts/, docs should go in docs/, some files should be checked in, etc.

@@ -1 +0,0 @@
# This is the default directory for document collections. Placeholder so that directory is kept in git.
Member

This file shouldn't be removed.

The entire process may take over a day to complete, depending on your hardware setup. This code will download the index, extract the embedded vectors of the index, build the table in DuckDB, and run the benchmark.

## PGVector
PGVector is an extension of PostgreSQL, so you will need to install PostgreSQL and PGVector. Here, it is assumed that you have a PostgreSQL server running on your local machine, and you have the PGVector extension installed and enabled in PostgreSQL. Make sure you supply the correct database configuration in the `db_config.txt` file. For example:
Member

Can we install Postgres using conda? e.g., https://anaconda.org/anaconda/postgresql

If so, please provide instructions.
