This repository has been archived by the owner on Nov 21, 2024. It is now read-only.

Better Guidance on Determining m/efconstruction/efsearch values. #47

Open
dannyseismic opened this issue Aug 28, 2023 · 1 comment


@dannyseismic

It would be VERY helpful if there were some rules of thumb for how to right-size these parameters. For example, pgvector gives the following guidance: "Choose an appropriate number of lists - a good place to start is rows / 1000 for up to 1M rows and sqrt(rows) for over 1M rows." Could similar guidance be created for pg_embedding?

There is some guidance about "large values" and what they do, but I don't know what "higher" means in practice. Should a value be 10x higher or just 1 higher?
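
For reference, the pgvector rule quoted above fits in a few lines of Python. This is only a sketch: the function name `pgvector_lists` is made up for illustration, and the rounding and the minimum of 1 are assumptions, since the quoted guidance does not specify them.

```python
import math


def pgvector_lists(rows: int) -> int:
    """Starting value for pgvector's `lists`, per the guidance quoted above."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    if rows <= 1_000_000:
        # rows / 1000 for up to 1M rows; at least 1 list is an assumption here
        return max(1, rows // 1000)
    # sqrt(rows) for over 1M rows; rounding to the nearest integer is an assumption
    return round(math.sqrt(rows))
```

A comparable one-liner per parameter is the kind of guidance this issue is asking for.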

@raoufchebri
Contributor

Thanks for the suggestion, @dannyseismic. We are currently running more tests to determine the ideal parameters.
Here is a good place to start:
For millions of vectors, m should be between 48 and 64.
Start with efConstruction = 2 x m.
Set efSearch >= k, where k is the number of nearest neighbors you want returned.
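
As a rough illustration, the starting point above can be expressed as a small Python helper. The names `HnswStart` and `hnsw_starting_point` are hypothetical and not part of pg_embedding, and the defaults (m = 48, efSearch exactly equal to k) are assumptions that pick the low end of the suggested ranges. Treat the result as something to benchmark from, not a tuned configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswStart:
    m: int
    ef_construction: int
    ef_search: int


def hnsw_starting_point(k: int, m: int = 48) -> HnswStart:
    """Starting HNSW parameters for a dataset with millions of vectors.

    k is the number of nearest neighbors each query asks for.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 48 <= m <= 64:
        # The comment above suggests 48..64 for millions of vectors.
        raise ValueError("m should be between 48 and 64 for this rule of thumb")
    return HnswStart(
        m=m,
        ef_construction=2 * m,  # efConstruction = 2 x m
        ef_search=k,            # smallest value satisfying efSearch >= k
    )
```

For example, `hnsw_starting_point(k=10)` gives m = 48, efConstruction = 96, and efSearch = 10. If recall is too low, raising efSearch above k is the usual first adjustment, because it does not require rebuilding the index.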
