Split load partition and search partition into two different steps and parallelize based on different configs #3459
base: main
Conversation
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
Codecov Report
Attention: Patch coverage is

@@ Coverage Diff @@
##             main    #3459      +/-   ##
==========================================
+ Coverage   78.80%   78.84%   +0.03%
==========================================
  Files         251      251
  Lines       92834    92887      +53
  Branches    92834    92887      +53
==========================================
+ Hits        73156    73234      +78
+ Misses      16702    16677      -25
  Partials     2976     2976

Flags with carried forward coverage won't be shown.
this makes sense to me, just need to run some benchmarks to see the search performance
@@ -215,3 +226,53 @@ pub trait VectorIndex: Send + Sync + std::fmt::Debug + Index {
    /// the index type of this vector index.
    fn sub_index_type(&self) -> (SubIndexType, QuantizationType);
}

#[async_trait]
pub trait ParallelSearchInPartitionFunctions: Send + Sync {
Probably rename it to PartitionSearcher?
+1
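A hedged sketch of the `PartitionSearcher` rename suggested above. This is a synchronous, stdlib-only stand-in, not the lance API: the real trait is async and returns Arrow record batches, and everything here except the `io_parallelism` name is an illustrative assumption.

```rust
// Illustrative stand-in for a per-partition result batch; the real code
// would return Arrow RecordBatches.
#[derive(Debug, PartialEq)]
struct Batch(Vec<u32>);

// Hypothetical shape of the renamed trait: it owns the "search one
// partition" step, while a higher-level driver decides how many
// partitions to search concurrently via io_parallelism().
trait PartitionSearcher: Send + Sync {
    /// How many partitions may be searched concurrently.
    fn io_parallelism(&self) -> usize;

    /// Search a single partition (the real trait method is async).
    fn search_partition(&self, partition_id: usize) -> Batch;
}

struct FlatSearcher;

impl PartitionSearcher for FlatSearcher {
    fn io_parallelism(&self) -> usize {
        8
    }

    fn search_partition(&self, partition_id: usize) -> Batch {
        // Toy result: just echo the partition id.
        Batch(vec![partition_id as u32])
    }
}

fn main() {
    let s = FlatSearcher;
    assert_eq!(s.io_parallelism(), 8);
    assert_eq!(s.search_partition(3), Batch(vec![3]));
    println!("ok");
}
```

The noun-style name reads better than `ParallelSearchInPartitionFunctions` because the trait describes a capability (searching partitions), not a bag of functions.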
    fn io_parallelism(&self) -> usize;
}

pub async fn parallel_search_in_partitions<T: Send + Sync + std::fmt::Debug>(
this is nice and it looks like it can reduce the cold search latency. It would be nice if you could add some benchmarks for this; that can also help us figure out what the perf impact is! See rust/lance/benches for more details.
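The real benches under rust/lance/benches use a benchmark harness; as a hedged, stdlib-only illustration of the comparison the reviewer is asking for, here is a micro-benchmark skeleton that times serial vs. parallel partition search over a toy workload (all names and the workload itself are stand-ins):

```rust
use std::time::Instant;

// Stand-in for the CPU work of searching one IVF partition.
fn search_partition(p: usize) -> u64 {
    (0..1_000u64).map(|i| i.wrapping_mul(p as u64 + 1)).sum()
}

fn main() {
    let partitions: Vec<usize> = (0..32).collect();

    // Serial baseline: search partitions one after another.
    let t = Instant::now();
    let serial: u64 = partitions.iter().map(|&p| search_partition(p)).sum();
    println!("serial:   {:?}", t.elapsed());

    // Parallel version: one thread per partition (a real benchmark would
    // cap concurrency at io_parallelism()).
    let t = Instant::now();
    let handles: Vec<_> = partitions
        .iter()
        .map(|&p| std::thread::spawn(move || search_partition(p)))
        .collect();
    let parallel: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("parallel: {:?}", t.elapsed());

    // Both strategies must produce identical results.
    assert_eq!(serial, parallel);
}
```

A proper benchmark in rust/lance/benches would also vary the partition count and dataset size, since the win from parallelizing the search step likely depends on both.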
yeah this is nice. we can now control dispatch of CPU-intensive tasks for IVF all in one place (on the warming path).
            .map_err(|e| {
                DataFusionError::Execution(format!(
                    "Failed to calculate KNN: {}",
                    e
                ))
            })
            .await
            .await?;
        concat_batches(&batches[0].schema(), &batches).map_err(|e| {
do we need to concat the batches? IIUC we didn't do this before this PR?
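For context on the question above: once each partition is searched independently, the caller ends up holding one result batch per partition, and the concat merges them back into a single batch. A minimal sketch of that shape, with plain vectors standing in for Arrow record batches (everything here is illustrative, not arrow's `concat_batches`):

```rust
fn main() {
    // One result "batch" per searched partition.
    let per_partition: Vec<Vec<i32>> = vec![vec![1, 2], vec![3], vec![4, 5]];

    // Analogue of concatenating the batches: merge into a single batch
    // so downstream operators see one stream of rows.
    let merged: Vec<i32> = per_partition.into_iter().flatten().collect();

    assert_eq!(merged, vec![1, 2, 3, 4, 5]);
    println!("ok");
}
```

Whether the concat is actually needed depends on whether downstream code can consume multiple batches; if it can, skipping the copy would be cheaper.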
@@ -215,3 +226,53 @@ pub trait VectorIndex: Send + Sync + std::fmt::Debug + Index {
    /// the index type of this vector index.
    fn sub_index_type(&self) -> (SubIndexType, QuantizationType);
}

#[async_trait]
@BubbleCal is this code used in V3 at all? I have async_trait-phobia now lol, because it makes some optimizations no longer possible
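For readers unfamiliar with the concern: `#[async_trait]` works by desugaring each `async fn` into a method returning a boxed, type-erased future, which costs a heap allocation and dynamic dispatch on every call and blocks optimizations like inlining the future's state machine. A stdlib-only sketch of that desugared shape (the `Search`/`Ivf` names and the minimal executor are illustrative assumptions, not lance code):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal executor sufficient for futures that complete without waiting.
fn block_on<F: Future>(mut f: F) -> F::Output {
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut pinned = unsafe { Pin::new_unchecked(&mut f) };
    loop {
        if let Poll::Ready(v) = pinned.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

// Roughly what #[async_trait] expands `async fn search(...)` into:
// every call returns Pin<Box<dyn Future>>, i.e. a fresh allocation
// plus vtable dispatch when the future is polled.
trait Search {
    fn search(&self, q: u32) -> Pin<Box<dyn Future<Output = u32> + Send + '_>>;
}

struct Ivf;

impl Search for Ivf {
    fn search(&self, q: u32) -> Pin<Box<dyn Future<Output = u32> + Send + '_>> {
        Box::pin(async move { q + 1 })
    }
}

fn main() {
    assert_eq!(block_on(Ivf.search(41)), 42);
    println!("ok");
}
```

Native `async fn` in traits avoids the per-call box when the concrete type is known, which is presumably the optimization being referred to.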
        pre_filter.wait_for_ready().await?;
        let query = self.preprocess_query(partition_id, query)?;

        spawn_cpu(move || {
nit: let's move spawn_cpu into parallel_search_in_partitions, so the execution topology decision is made at a higher layer.
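The refactor suggested in that nit can be sketched as follows, using plain threads as a stand-in for lance's `spawn_cpu` (all function bodies and names besides `spawn_cpu`/`parallel_search_in_partitions` are illustrative assumptions): the per-partition search stays a pure computation, and the high-level driver alone decides how work is dispatched.

```rust
use std::thread;

// Pure per-partition work: no knowledge of threads or executors,
// so the caller fully controls the execution topology.
fn search_partition(partition_id: usize, query: u32) -> u32 {
    partition_id as u32 * 10 + query
}

// High-level driver: the only place that decides how the CPU-intensive
// work is dispatched (thread::spawn here; spawn_cpu in lance).
fn parallel_search_in_partitions(partitions: &[usize], query: u32) -> Vec<u32> {
    let handles: Vec<_> = partitions
        .iter()
        .map(|&p| thread::spawn(move || search_partition(p, query)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let out = parallel_search_in_partitions(&[0, 1, 2], 7);
    assert_eq!(out, vec![7, 17, 27]);
    println!("ok");
}
```

Keeping dispatch in one place also makes it straightforward to cap concurrency with `io_parallelism()` later without touching the per-partition code.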
No description provided.