Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added hnsw.ef_search variable to change expansion factor during search at runtime #199

Merged
merged 10 commits into from
Oct 25, 2023
Merged
17 changes: 16 additions & 1 deletion src/hnsw/options.c
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
static relopt_kind ldb_hnsw_index_withopts;

int ldb_hnsw_init_k;
int ldb_hnsw_ef_search;

// this variable is only set during testing and controls whether
// certain elog() calls are made
Expand Down Expand Up @@ -188,7 +189,7 @@ void _PG_init(void)

add_int_reloption(ldb_hnsw_index_withopts,
"ef",
"HNSW ef-construction hyperparameter",
"HNSW ef-search hyperparameter",
HNSW_DEFAULT_EF,
1,
HNSW_MAX_EF
Expand Down Expand Up @@ -220,6 +221,20 @@ void _PG_init(void)
NULL,
NULL);

DefineCustomIntVariable("hnsw.search_ef",
"Expansion factor to use during vector search in a scan",
"Valid values are in range [1, 400]",
&ldb_hnsw_ef_search,
// HNSW_DEFAULT_EF,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this supposed to be commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove, doesn't impact functionality at all

0, // sentinel value
1,
HNSW_MAX_EF,
PGC_USERSET,
0,
NULL,
NULL,
NULL);

DefineCustomBoolVariable("_lantern_internal.is_test",
"Whether or not the DB is in a regression test",
"set this to 1 to enable extra logging for use in lanterndb regression tests",
Expand Down
1 change: 1 addition & 0 deletions src/hnsw/options.h
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ usearch_metric_kind_t ldb_HnswGetMetricKind(Relation index);
bytea* ldb_amoptions(Datum reloptions, bool validate);

extern int ldb_hnsw_init_k;
extern int ldb_hnsw_ef_search;
extern bool ldb_is_test;

#endif // LDB_HNSW_OPTIONS_H
22 changes: 18 additions & 4 deletions src/hnsw/scan.c
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,8 @@ bool ldb_amgettuple(IndexScanDesc scan, ScanDirection dir)
// about the furtheest neighbors
Assert(ScanDirectionIsForward(dir));

int ef = ldb_hnsw_ef_search; // 0 if not set, but we pass it into usearch_custom_ef anyway since 0 is also a
// sentinel value there
if(scanstate->first) {
int num_returned;
Datum value;
Expand Down Expand Up @@ -193,8 +195,14 @@ bool ldb_amgettuple(IndexScanDesc scan, ScanDirection dir)
}

ldb_dlog("LANTERN querying index for %d elements", k);
num_returned = usearch_search(
scanstate->usearch_index, vec, usearch_scalar_f32_k, k, scanstate->labels, scanstate->distances, &error);
num_returned = usearch_search_custom_ef(scanstate->usearch_index,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to just add ef as optional parameter to usearch_search and if ef != 0, use the passed ef, otherwise use the ef from usearch header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do this but C doesn't support optional parameters in functions. We can make usearch_search a variadic function but that seems like overkill.

The "ef != 0 then use passed ef, otherwise use ef from usearch header" logic you mention is already added in the usearch library. So passing the ef from here, regardless of what it is, will lead to this desired functionality

Maybe just rename usearch_search_custom_ef to usearch_search? Or just to make it more readable, check in this code if ef != 0 then use usearch_search_custom_ef and if not, use the original usearch_search?

vec,
usearch_scalar_f32_k,
k,
ef,
scanstate->labels,
scanstate->distances,
&error);
ldb_wal_retriever_area_reset(scanstate->retriever_ctx, NULL);

scanstate->count = num_returned;
Expand Down Expand Up @@ -228,8 +236,14 @@ bool ldb_amgettuple(IndexScanDesc scan, ScanDirection dir)
scanstate->labels = repalloc(scanstate->labels, k * sizeof(usearch_label_t));

ldb_dlog("LANTERN - querying index for %d elements", k);
num_returned = usearch_search(
scanstate->usearch_index, vec, usearch_scalar_f32_k, k, scanstate->labels, scanstate->distances, &error);
num_returned = usearch_search_custom_ef(scanstate->usearch_index,
vec,
usearch_scalar_f32_k,
k,
ef,
scanstate->labels,
scanstate->distances,
&error);
ldb_wal_retriever_area_reset(scanstate->retriever_ctx, NULL);

scanstate->count = num_returned;
Expand Down
2 changes: 1 addition & 1 deletion third_party/usearch