Internal documentation for how sqlite-vec
works under-the-hood. Not meant for
users of the sqlite-vec
project, consult
the official sqlite-vec
documentation for
how-to-guides. Rather, this is for people interested in how sqlite-vec
works
and some guidelines to any future contributors.
Very much a WIP.
chunk_id INTEGER
size INTEGER
validity BLOB
rowids BLOB
rowid INTEGER
id
chunk_id INTEGER
chunk_offset INTEGER
rowid INTEGER
vector BLOB
rowid INTEGER
valueNN [type]
rowid INTEGER
data BLOB
rowid INTEGER
data TEXT
The vec0
idxStr is a string composed of single "header" character and 0 or
more "blocks" of 4 characters each.
The "header" charcter denotes the type of query plan, as determined by the
enum vec0_query_plan
values. The current possible values are:
Name | Value | Description |
---|---|---|
VEC0_QUERY_PLAN_FULLSCAN |
'1' |
Perform a full-scan on all rows |
VEC0_QUERY_PLAN_POINT |
'2' |
Perform a single-lookup point query for the provided rowid |
VEC0_QUERY_PLAN_KNN |
'3' |
Perform a KNN-style query on the provided query vector and parameters. |
Each 4-character "block" is associated with a corresponding value in argv[]
.
For example, the 1st block at byte offset 1-4
(inclusive) is the 1st block and
is associated with argv[1]
. The 2nd block at byte offset 5-8
(inclusive) is
associated with argv[2]
and so on. Each block describes what kind of value or
filter the given argv[i]
value is.
argv[i]
is the query vector of the KNN query.
The remaining 3 characters of the block are _
fillers.
argv[i]
is the limit/k value of the KNN query.
The remaining 3 characters of the block are _
fillers.
argv[i]
is the optional rowid in (...)
value, and must be handled with
sqlite3_vtab_in_first()
/ sqlite3_vtab_in_next()
.
The remaining 3 characters of the block are _
fillers.
argv[i]
is a "constraint" on a specific partition key.
The second character of the block denotes which partition key to filter on,
using A
to denote the first partition key column, B
for the second, etc. It
is encoded with 'A' + partition_idx
and can be decoded with c - 'A'
.
The third character of the block denotes which operator is used in the
constraint. It will be one of the values of enum vec0_partition_operator
, as
only a subset of operations are supported on partition keys.
The fourth character of the block is a _
filler.
argv[i]
is the value of the rowid or id to match against for the point query.
The remaining 3 characters of the block are _
fillers.
argv[i]
is the value of the WHERE
constraint for a metdata column in a KNN
query.
The second character of the block denotes which metadata column the constraint
belongs to, using A
to denote the first metadata column column, B
for the
second, etc. It is encoded with 'A' + metadata_idx
and can be decoded with
c - 'A'
.
The third character of the block is the constraint operator. It will be one of
enum vec0_metadata_operator
, as only a subset of operators are supported on
metadata column KNN filters.
The foruth character of the block is a _
filler.