Will there be a difference in search speed between the daily and monthly partitions? #4840
Will there be a difference in search speed between daily and monthly partitions? With daily partitions, I expect that at the end of each day some data will not have reached index_file_size, so no index is built for it, and each day leaves one segment without an index. If I keep data for 90 days, that means 90 segments without indexes for daily partitions, versus 3 for monthly partitions. The data is constantly growing: 1.5M vectors are added per day, with dimension 1536. Will the choice between daily and monthly partitions have a large performance impact?
Replies: 1 comment 3 replies
1.5M vectors, dimension=1536. Assume all indexes are created successfully.
For example: let one distance calculation between two vectors be one compute unit. For an IVF index, the search engine first calculates the distance between the target vector and each central vector, then takes the top 100 (nprobe) buckets and calculates the distance between the target vector and every vector in those buckets. So, for each segment, the compute workload is nlist + nprobe * bucket_vector_count.
Now let's estimate the compute workload for daily and monthly partitions: you plan to keep either (1) 90 daily partitions or (2) 3 monthly partitions. The compute workload of (2) is less than that of (1), so (2) is better than (1).
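The per-segment formula above can be turned into a rough estimate. This is only a sketch under stated assumptions: it takes nlist=10000 and about 650000 vectors per full segment from the figures given later in this reply, assumes buckets are evenly filled, and assumes every segment (including the partly filled last one) has an index. The function and variable names are made up for illustration and are not Milvus APIs.

```python
def ivf_segment_workload(vector_count, nlist, nprobe):
    """Distance computations for one IVF search over one segment.

    Assumes the vectors are spread evenly over the nlist buckets.
    """
    bucket_vector_count = vector_count / nlist
    return nlist + nprobe * bucket_vector_count


def split_into_segments(total_vectors, full_segment_size):
    """Segment sizes when data is cut into full segments plus one remainder."""
    full, rest = divmod(total_vectors, full_segment_size)
    return [full_segment_size] * full + ([rest] if rest else [])


NLIST = 10_000             # figure used in this reply
NPROBE = 100               # figure used in this reply
FULL_SEGMENT = 650_000     # approximate vectors per 4GB segment
DAILY_VECTORS = 1_500_000  # from the question

# (1) 90 daily partitions, each holding one day of data
daily_segments = split_into_segments(DAILY_VECTORS, FULL_SEGMENT)
daily_workload = 90 * sum(
    ivf_segment_workload(n, NLIST, NPROBE) for n in daily_segments
)

# (2) 3 monthly partitions, each holding 30 days of data
monthly_segments = split_into_segments(30 * DAILY_VECTORS, FULL_SEGMENT)
monthly_workload = 3 * sum(
    ivf_segment_workload(n, NLIST, NPROBE) for n in monthly_segments
)

print("segments per daily partition:  ", len(daily_segments))
print("segments per monthly partition:", len(monthly_segments))
print("daily workload (compute units):  ", daily_workload)
print("monthly workload (compute units):", monthly_workload)
```

Under these assumptions the nprobe * bucket_vector_count part is the same for both layouts, because the total number of vectors is the same. The difference comes from the nlist term, which is paid once per segment, and the daily layout ends up with more segments.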
1.5M vectors, dimension=1536
For each daily partition, data size = 1536 * 4 * 1.5M ≈ 9.2G.
For each monthly partition, data size = 30 * 9.2G ≈ 276G.
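As a quick check of the size arithmetic, here is a minimal sketch. It assumes each dimension is stored as a 4-byte float (the factor 4 in the formula), uses the stated dimension of 1536, and counts 1G as 10^9 bytes. The variable names are illustrative only.

```python
DIM = 1536                 # vector dimension from the question
BYTES_PER_VALUE = 4        # assumption: float32 components
VECTORS_PER_DAY = 1_500_000
DAYS_PER_MONTH = 30

# Raw vector data per partition, ignoring index overhead and IDs
daily_bytes = DIM * BYTES_PER_VALUE * VECTORS_PER_DAY
monthly_bytes = DAYS_PER_MONTH * daily_bytes

print(f"daily partition:   {daily_bytes / 1e9:.1f} GB")
print(f"monthly partition: {monthly_bytes / 1e9:.1f} GB")
```

This counts raw vector data only; the index files themselves may be larger or smaller depending on the index type.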
Assume all indexes are created successfully.
Assume the cache size is larger than a monthly partition's index file size.
Assume you have enough CPU cores for parallel computing.
The query performance will mainly depend on these factors:
For example:
index_file_size = 4GB, so each daily partition has 3 segments and each monthly partition has about 70 segments. Each full segment contains 4GB/1536/4 ≈ 650000 vectors.
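The segment counts can be reproduced with a few lines. This sketch assumes 4GB means 4 * 10^9 bytes, 4 bytes per dimension, and that a partition is simply cut into full segments plus one partly filled segment; real segment boundaries depend on how data is flushed and merged, so treat the result as an approximation.

```python
import math

DIM = 1536
BYTES_PER_VALUE = 4                # assumption: float32 components
INDEX_FILE_SIZE = 4 * 10**9        # assumption: 4GB counted as 4e9 bytes
VECTORS_PER_DAY = 1_500_000

# How many vectors fit into one full segment
vectors_per_segment = INDEX_FILE_SIZE // (DIM * BYTES_PER_VALUE)

# Segments per partition, rounding up for the partly filled last segment
daily_segment_count = math.ceil(VECTORS_PER_DAY / vectors_per_segment)
monthly_segment_count = math.ceil(30 * VECTORS_PER_DAY / vectors_per_segment)

print("vectors per full segment:", vectors_per_segment)
print("segments per daily partition:", daily_segment_count)
print("segments per monthly partition:", monthly_segment_count)
print("total segments, 90 daily partitions:", 90 * daily_segment_count)
print("total segments, 3 monthly partitions:", 3 * monthly_segment_count)
```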
nlist=10000, each segment has 10000 central vector…