Skip to content

Latest commit

 

History

History
122 lines (79 loc) · 3.92 KB

ARCHITECTURE.md

File metadata and controls

122 lines (79 loc) · 3.92 KB

sqlite-vec Architecture

Internal documentation for how sqlite-vec works under-the-hood. Not meant for users of the sqlite-vec project, consult the official sqlite-vec documentation for how-to-guides. Rather, this is for people interested in how sqlite-vec works and some guidelines to any future contributors.

Very much a WIP.

vec0

Shadow Tables

xyz_chunks

  • chunk_id INTEGER
  • size INTEGER
  • validity BLOB
  • rowids BLOB

xyz_rowids

  • rowid INTEGER
  • id
  • chunk_id INTEGER
  • chunk_offset INTEGER

xyz_vector_chunksNN

  • rowid INTEGER
  • vector BLOB

xyz_auxiliary

  • rowid INTEGER
  • valueNN [type]

xyz_metadatachunksNN

  • rowid INTEGER
  • data BLOB

xyz_metadatatextNN

  • rowid INTEGER
  • data TEXT

idxStr

The vec0 idxStr is a string composed of single "header" character and 0 or more "blocks" of 4 characters each.

The "header" charcter denotes the type of query plan, as determined by the enum vec0_query_plan values. The current possible values are:

Name Value Description
VEC0_QUERY_PLAN_FULLSCAN '1' Perform a full-scan on all rows
VEC0_QUERY_PLAN_POINT '2' Perform a single-lookup point query for the provided rowid
VEC0_QUERY_PLAN_KNN '3' Perform a KNN-style query on the provided query vector and parameters.

Each 4-character "block" is associated with a corresponding value in argv[]. For example, the 1st block at byte offset 1-4 (inclusive) is the 1st block and is associated with argv[1]. The 2nd block at byte offset 5-8 (inclusive) is associated with argv[2] and so on. Each block describes what kind of value or filter the given argv[i] value is.

VEC0_IDXSTR_KIND_KNN_MATCH ('{')

argv[i] is the query vector of the KNN query.

The remaining 3 characters of the block are _ fillers.

VEC0_IDXSTR_KIND_KNN_K ('}')

argv[i] is the limit/k value of the KNN query.

The remaining 3 characters of the block are _ fillers.

VEC0_IDXSTR_KIND_KNN_ROWID_IN ('[')

argv[i] is the optional rowid in (...) value, and must be handled with sqlite3_vtab_in_first() / sqlite3_vtab_in_next().

The remaining 3 characters of the block are _ fillers.

VEC0_IDXSTR_KIND_KNN_PARTITON_CONSTRAINT (']')

argv[i] is a "constraint" on a specific partition key.

The second character of the block denotes which partition key to filter on, using A to denote the first partition key column, B for the second, etc. It is encoded with 'A' + partition_idx and can be decoded with c - 'A'.

The third character of the block denotes which operator is used in the constraint. It will be one of the values of enum vec0_partition_operator, as only a subset of operations are supported on partition keys.

The fourth character of the block is a _ filler.

VEC0_IDXSTR_KIND_POINT_ID ('!')

argv[i] is the value of the rowid or id to match against for the point query.

The remaining 3 characters of the block are _ fillers.

VEC0_IDXSTR_KIND_METADATA_CONSTRAINT ('&')

argv[i] is the value of the WHERE constraint for a metdata column in a KNN query.

The second character of the block denotes which metadata column the constraint belongs to, using A to denote the first metadata column column, B for the second, etc. It is encoded with 'A' + metadata_idx and can be decoded with c - 'A'.

The third character of the block is the constraint operator. It will be one of enum vec0_metadata_operator, as only a subset of operators are supported on metadata column KNN filters.

The foruth character of the block is a _ filler.