Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added hnsw.ef_search variable to change expansion factor during search at runtime #199

Merged
merged 10 commits into from
Oct 25, 2023

Conversation

therealdarkknight
Copy link
Contributor

@therealdarkknight therealdarkknight commented Oct 11, 2023

fixes #171
Created a new custom variable hnsw.search_ef (used to be hnsw.ef_search, but that conflicted with pgvector's) that can be set at runtime to update the expansion factor during search. Currently, if this variable is not set, it will use the ef factor specified in the index relation option. However, as soon as this variable is set, it takes precedence over this factor and is used in all searches.

Relies on this PR for usearch which creates a new function usearch_search_custom_ef that takes a custom expansion factor.

@Ngalstyan4
Copy link
Contributor

When you add tests for this, make sure you add tests in for pgvector compatibility test file as well.

…h_ef to avoid conflicting with pgvector's custom variable name, which was breaking
Copy link
Contributor

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! could you add tests?

"Expansion factor to use during vector search in a scan",
"Valid values are in range [1, 400]",
&ldb_hnsw_ef_search,
// HNSW_DEFAULT_EF,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this supposed to be commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove, doesn't impact functionality at all

src/hnsw/scan.c Outdated
@@ -193,8 +195,14 @@ bool ldb_amgettuple(IndexScanDesc scan, ScanDirection dir)
}

ldb_dlog("LANTERN querying index for %d elements", k);
num_returned = usearch_search(
scanstate->usearch_index, vec, usearch_scalar_f32_k, k, scanstate->labels, scanstate->distances, &error);
num_returned = usearch_search_custom_ef(scanstate->usearch_index,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to just add ef as optional parameter to usearch_search and if ef != 0, use the passed ef, otherwise use the ef from usearch header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do this but C doesn't support optional parameters in functions. We can make usearch_search a variadic function but that seems like overkill.

The "ef != 0 then use passed ef, otherwise use ef from usearch header" logic you mention is already added in the usearch library. So passing the ef from here, regardless of what it is, will lead to this desired functionality

Maybe just rename usearch_search_custom_ef to usearch_search? Or just to make it more readable, check in this code if ef != 0 then use usearch_search_custom_ef and if not, use the original usearch_search?

@Ngalstyan4
Copy link
Contributor

Notes:

  • call this ef_search and not search_ef. For pgvector compatibility, it can be lantern_hnsw.ef_search. See how index name compatibility works
  • Change the C api of usearch_search and make it take an additional ef parameter with a documented invalid value (if the invalid value is passed, usearch uses the ef valued passed during index creation)
  • Add tests mainly to trigger relevant codepaths. I suspect if you choose a very small value, search quality will decrease and that will be another way to make sure the value was changed in regression tests. But even if search results do not change, it is fine. We will at least get code coverage for relevant code paths

@therealdarkknight
Copy link
Contributor Author

* [x]  call this `ef_search` and not `search_ef`. For pgvector compatibility, it can be `lantern_hnsw.ef_search`. See how index name compatibility works

I renamed it. Note that the other CustomIntVariable we have, hnsw.init_k does not follow this naming convention. We can rename it to something like lantern_hnsw.init_k but it is mentioned quite a few times in the codebase so I refrained from doing that now. Let me know if I should.

* [x]  Change the C api of `usearch_search` and make it take an additional `ef` parameter with a documented invalid value (if the invalid value is passed, usearch uses the `ef` valued passed during index creation)

I implemented this as well. This was actually how it was working already but I refactored the usearch changes so that this logic is clearly reflected in the C api of usearch_search. Also, there is a #define'd invalid value there like you mentioned

* [x]  Add tests mainly to trigger relevant codepaths. I suspect if you choose a very small value, search quality will decrease and that will be another way to make sure the value was changed in regression tests. But even if search results do not change, it is fine. We will at least get code coverage for relevant code paths

I added a test hnsw_ef_search. I noticed one of the test queries from the hnsw_index_from_file test was different with small values of ef and so I expanded upon this to make this new test which tests a range of ef values with this one query. Let me know if there should be anything else we're testing.

@Ngalstyan4
Copy link
Contributor

Looks great! just merged your usearch pr (Ngalstyan4/usearch#19). Will merge this as soon as you rebase the submodule onto that one

@Ngalstyan4 Ngalstyan4 merged commit be86766 into lanterndata:main Oct 25, 2023
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow changing ef at runtime
2 participants