Better Guidance on Determining m/efconstruction/efsearch values. #47

dannyseismic · 2023-08-28T21:22:10Z

It would be VERY helpful if there were some rules of thumb for how to right-size these parameters. For example, pgvector gives the following guidance: "Choose an appropriate number of lists - a good place to start is rows / 1000 for up to 1M rows and sqrt(rows) for over 1M rows." Could similar guidance be created for pg_embedding?

There is some guidance about "large values" and what they do, but I don't know what a "higher value" means in practice. Should it be 10x higher or just 1 higher?
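
For reference, the pgvector rule of thumb quoted above can be written as a small helper. This is only a sketch of that quoted guidance: the function name `pgvector_lists_start` is hypothetical and not part of pgvector, and the rounding choices are assumptions.

```python
import math


def pgvector_lists_start(rows: int) -> int:
    """Starting value for pgvector's `lists`, per the quoted rule of thumb.

    Hypothetical helper: rows / 1000 up to 1M rows, sqrt(rows) above that.
    """
    if rows < 0:
        raise ValueError("rows must be non-negative")
    if rows <= 1_000_000:
        # Integer division is an assumption; never return fewer than 1 list.
        return max(1, rows // 1000)
    # Rounding to the nearest integer is also an assumption.
    return round(math.sqrt(rows))
```

Something equally concrete for m, efconstruction and efsearch is what this issue is asking for.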

raoufchebri · 2023-09-11T08:28:35Z

Thanks for the suggestion, @dannyseismic. We are currently running more tests to determine the ideal parameters.
In the meantime, here is a good place to start:
For millions of vectors, m should be between 48 and 64.
Start with efConstruction = 2 x m.
Set efSearch >= k, where k is the number of nearest neighbors requested.
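
The starting points above can be turned into a small helper, shown here as a minimal sketch. The function name `suggest_hnsw_params`, the default `m = 48` (the low end of the suggested range for millions of vectors), and the choice of `efsearch = k` (the smallest value satisfying efSearch >= k) are assumptions for illustration, not tested recommendations.

```python
def suggest_hnsw_params(k: int, m: int = 48) -> dict:
    """Return starting values for m, efconstruction and efsearch.

    Hypothetical helper that encodes the rules of thumb from this thread:
      - m between 48 and 64 for millions of vectors (default 48 is an assumption)
      - efconstruction = 2 * m
      - efsearch >= k (the minimum, k, is used here as an assumption)
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    return {
        "m": m,
        "efconstruction": 2 * m,
        "efsearch": k,
    }


if __name__ == "__main__":
    # Example: top-10 nearest-neighbor queries with the default m.
    print(suggest_hnsw_params(k=10))
```

These are starting values only. Since tests are still in progress, treat them as a baseline to tune against your own recall and latency measurements.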
