feat: Cost estimation #5

VoVAllen · 2024-09-29T02:15:34Z

Ref: https://github.com/pgvector/pgvector/blob/158d9340bca5796d8f98182e5a65356bac676b74/src/ivfflat.c#L64

cutecutecat · 2024-12-09T13:47:39Z

We show the cost estimation and actual time cost by EXPLAIN ANALYZE at dataset openai-500k and its subset.
dim=1536

size(list=4096 prob=200)	seq cost estimate	seq actual time	index actual time
500, 000	11307.36	2290.954	6.594
100, 000	3508.43	567.231	5.996
50, 000	1759.54	310.085	5.476
1, 000	~53	3.765	~2.1
10 (list=4 probe=2)	~53	0.163	~2.1

As we can see, at size 10 seq scan is better than index scan, but at size 1000, index scan can overtake it. However, the cost estimation of seq scan is always the same: about 53 at this small data situation. So it is impossible to find an expression of index cost estimator which should be <53 at size 1000 and >53 at size 10.

The most acceptable way is to take both size 1000 and 10 as an index scan, which might sacrifice some QPS at size range from 1 to 1000. But to be honest, any formula meet this condition is sured to be equivalence to the current implementation.

As a conclusion, as the vchordrq index is much more efficient than ivf, we might not need a cost estimation like pgvector.

VoVAllen · 2024-12-09T14:44:47Z

What's the first column "size"?

cutecutecat · 2024-12-10T01:29:23Z

What's the first column "size"?

Number of vector rows for inserting to the database. We select the first 10, 1000, 50000 and 100000 rows from the whole 500000 openai database as a subset, to validate the cost estimator.

VoVAllen added this to the v0.1.0 milestone Sep 29, 2024

VoVAllen assigned cutecutecat and unassigned cutecutecat Nov 29, 2024

cutecutecat self-assigned this Dec 9, 2024

cutecutecat closed this as completed Dec 9, 2024

VoVAllen reopened this Dec 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Cost estimation #5

feat: Cost estimation #5

VoVAllen commented Sep 29, 2024

cutecutecat commented Dec 9, 2024 •

edited

Loading

VoVAllen commented Dec 9, 2024

cutecutecat commented Dec 10, 2024 •

edited

Loading

feat: Cost estimation #5

feat: Cost estimation #5

Comments

VoVAllen commented Sep 29, 2024

cutecutecat commented Dec 9, 2024 • edited Loading

VoVAllen commented Dec 9, 2024

cutecutecat commented Dec 10, 2024 • edited Loading

cutecutecat commented Dec 9, 2024 •

edited

Loading

cutecutecat commented Dec 10, 2024 •

edited

Loading