PERF : modify SMJ shuffle file reader to skip validation #15948

getChan · 2025-05-05T16:39:00Z

Which issue does this PR close?

Rationale for this change

#14078 and #15454 show that skipping validation is effective when reading shuffle files.

What changes are included in this PR?

When SortMergeJoinExec reads a shuffle file, it now skips validation.

Are these changes tested?

The existing cargo tests pass.
I verified that the shuffle file read back is identical to the one written.
Benchmark test is not included because it was difficult to limit them to shuffle read scope.

Are there any user-facing changes?

no

alamb · 2025-05-05T20:02:54Z

Benchmark test is not included because it was difficult to limit them to shuffle read scope.

Did you test it locally? Do you have any performance numbers you can share?

I wonder if we could consolidate the spill file managing into SpillManager that was introduced by @2010YOUY01 in refactor: Use SpillManager for all spilling scenarios #15405

2010YOUY01 · 2025-05-06T03:58:37Z

Benchmark test is not included because it was difficult to limit them to shuffle read scope.

Did you test it locally? Do you have any performance numbers you can share?

I wonder if we could consolidate the spill file managing into SpillManager that was introduced by @2010YOUY01 in refactor: Use SpillManager for all spilling scenarios #15405

I believe the SMJ reader is the only remaining component that hasn't been migrated to SpillManager (the write path of SMJ has already been refactored to use SpillManager). I agree it would be great to also refactor it to use SpillManager.

getChan · 2025-05-06T07:34:23Z

SMJExec already uses SpillManager on the spill write path:

datafusion/datafusion/physical-plan/src/joins/sort_merge_join.rs

Lines 1439 to 1451 in 11bf46e

    
    Err(_) if self.runtime_env.disk_manager.tmp_files_enabled() => {
        // Spill buffered batch to disk
        if let Some(batch) = buffered_batch.batch {
            let spill_file = self
                .spill_manager
                .spill_record_batch_and_finish(
                    &[batch],
                    "sort_merge_join_buffered_spill",
                )?
                .unwrap(); // Operation only return None if no batches are spilled, here we ensure that at least one batch is spilled
            buffered_batch.spill_file = Some(spill_file);
            buffered_batch.batch = None;

but on the spill read path it reads the spill files directly:

datafusion/datafusion/physical-plan/src/joins/sort_merge_join.rs

Lines 2311 to 2316 in 11bf46e

    
    (Some(spill_file), None) => {
        let mut buffered_cols: Vec<ArrayRef> =
            Vec::with_capacity(buffered_indices.len());
        let file = BufReader::new(File::open(spill_file.path())?);
        let reader = unsafe {StreamReader::try_new(file, None)?.with_skip_validation(true)};

I will check whether using SpillManager is possible here (as a separate issue).
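For readers unfamiliar with why the reader call above is wrapped in `unsafe`, here is a minimal std-only sketch of the same trade-off. It is a toy analogy, not Arrow's IPC format or DataFusion's code: the record layout and the names `write_records`, `read_validated`, and `read_unchecked` are all hypothetical. The validated reader checks every record for valid UTF-8, while the unchecked reader trusts that the bytes came from our own writer.

```rust
use std::io::{self, Read, Write};

/// Write each string as a little-endian u32 length followed by its UTF-8 bytes.
/// (Toy stand-in for a spill file; not the Arrow IPC format.)
fn write_records<W: Write>(mut out: W, records: &[&str]) -> io::Result<()> {
    for r in records {
        out.write_all(&(r.len() as u32).to_le_bytes())?;
        out.write_all(r.as_bytes())?;
    }
    Ok(())
}

/// Read the raw byte payload of every record until the input ends.
/// For brevity, a truncated length prefix is treated like a clean end of input.
fn read_raw<R: Read>(mut input: R) -> io::Result<Vec<Vec<u8>>> {
    let mut out = Vec::new();
    let mut len_buf = [0u8; 4];
    loop {
        match input.read_exact(&mut len_buf) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let mut bytes = vec![0u8; u32::from_le_bytes(len_buf) as usize];
        input.read_exact(&mut bytes)?;
        out.push(bytes);
    }
    Ok(out)
}

/// Safe reader: validates that every record is well-formed UTF-8.
fn read_validated<R: Read>(input: R) -> io::Result<Vec<String>> {
    read_raw(input)?
        .into_iter()
        .map(|b| {
            String::from_utf8(b).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        })
        .collect()
}

/// Fast reader: skips the UTF-8 check.
///
/// # Safety
/// The caller must guarantee the input was produced by `write_records`
/// and has not been modified since; otherwise the returned `String`s
/// may contain invalid UTF-8, which is undefined behavior.
unsafe fn read_unchecked<R: Read>(input: R) -> io::Result<Vec<String>> {
    Ok(read_raw(input)?
        .into_iter()
        // SAFETY: upheld by the caller, see the function-level contract.
        .map(|b| unsafe { String::from_utf8_unchecked(b) })
        .collect())
}
```

On data written by `write_records`, both readers return the same strings; only `read_validated` rejects a corrupted payload. The argument for skipping validation on spill files is the same: the engine wrote the file itself moments earlier, so the check is redundant as long as the file is not touched in between.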

getChan · 2025-05-06T08:14:34Z

I just added a benchmark for SMJExec execution with spill reads.
The diff is small, maybe because the benchmark is not limited to the spill read scope.

sort_merge_join_spill/SortMergeJoinExec_spill
                        time:   [321.88 µs 325.75 µs 330.51 µs]
                        change: [-2.8665% -1.3757% +0.1785%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

getChan · 2025-05-06T08:19:10Z

datafusion/physical-plan/src/lib.rs

@@ -92,5 +92,4 @@ pub mod udaf {
 }

 pub mod coalesce;
-#[cfg(test)]


This is so the test utilities can be used in the bench as well. Is that okay?

I think you could avoid doing this by using the actual operators -- see comment above

alamb · 2025-05-06T10:53:34Z

datafusion/physical-plan/benches/sort_merge_join.rs

+
+fn create_test_data() -> SortMergeJoinExec {
+    let left_batch = build_table_i32(
+        ("a1", &vec![0, 1, 2, 3, 4, 5]),


I think a benchmark that has only 5 rows is likely to measure only the overhead of plan setup rather than the actual performance of a large join that needs to spill

Perhaps we can increase this size to 1M rows or something (it is important that b1 and b2 remain sorted)
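A minimal std-only sketch of generating larger key columns that stay sorted, as suggested above. The helper names `sorted_keys` and `is_sorted` and the `dup` parameter are hypothetical; in the real benchmark the resulting vectors would still have to be turned into record batches (for example via the `build_table_i32` test helper used in this PR).

```rust
/// Build a non-decreasing join-key column with `rows` values,
/// where each distinct key is repeated `dup` times.
fn sorted_keys(rows: usize, dup: usize) -> Vec<i32> {
    assert!(dup > 0, "dup must be at least 1");
    (0..rows).map(|i| (i / dup) as i32).collect()
}

/// Check the precondition a sort-merge join input relies on:
/// the key column must be in non-decreasing order.
fn is_sorted(values: &[i32]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}
```

For example, `sorted_keys(1_000_000, 4)` yields one million rows with 250,000 distinct keys, and `is_sorted` can be asserted once before the benchmark loop so that an unsorted input fails fast instead of producing misleading timings.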

Thanks for the review!
I now run the benchmark with 1_048_576 rows, with all rows spilled.
But the benchmark shows only a small performance improvement...

SortMergeJoinExec_spill
                        time:   [79.761 s 79.858 s 79.974 s]
                        change: [-0.7912% -0.5805% -0.3386%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

It seems that skipping validation has only a small impact on overall execution.
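One way to reason about a result like this is Amdahl's law: if the spill read is only a small share of total runtime, even a large speedup of that phase barely moves the end-to-end number. The sketch below is illustrative arithmetic only. The function names are hypothetical, and the share of time spent in validation was not measured in this PR.

```rust
/// Fraction of the original runtime that remains after one phase is sped up.
/// `phase_fraction` is the share of total runtime spent in that phase (0.0..=1.0);
/// `phase_speedup` is how many times faster the phase becomes (> 0.0).
fn remaining_runtime(phase_fraction: f64, phase_speedup: f64) -> f64 {
    assert!((0.0..=1.0).contains(&phase_fraction));
    assert!(phase_speedup > 0.0);
    (1.0 - phase_fraction) + phase_fraction / phase_speedup
}

/// Overall runtime change in percent (negative means faster),
/// in the same sign convention as the benchmark's `change:` line.
fn overall_change_percent(phase_fraction: f64, phase_speedup: f64) -> f64 {
    (remaining_runtime(phase_fraction, phase_speedup) - 1.0) * 100.0
}
```

With assumed inputs, `overall_change_percent(0.02, 2.0)` is about -1.0: if the read phase were 2% of runtime and became twice as fast, the whole benchmark would improve by roughly 1%, the same order of magnitude as the change reported above.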

alamb · 2025-05-06T10:54:45Z

datafusion/physical-plan/src/lib.rs

@@ -92,5 +92,4 @@ pub mod udaf {
 }

 pub mod coalesce;
-#[cfg(test)]


I think you could avoid doing this by using the actual operators -- see comment above

alamb · 2025-05-06T10:58:09Z

datafusion/physical-plan/benches/sort_merge_join.rs

+use datafusion_physical_expr::expressions::Column;
+use datafusion_physical_plan::common::collect;
+use datafusion_physical_plan::joins::SortMergeJoinExec;
+use datafusion_physical_plan::test::{build_table_i32, TestMemoryExec};


I worry that using test-only structures like this means the benchmark is not measuring performance that will map directly to query performance. I think you could move the file from

datafusion/physical-plan/benches/sort_merge_join.rs

to

datafusion/core/benches/sort_merge_join.rs

and use a SessionContext and an actual query, to be closer to real query performance.

Here is an example that does something similar: https://github.com/apache/datafusion/blob/main/datafusion/core/benches/filter_query_sql.rs

I moved the file to /core/benches/.

I tried to use an actual query via SessionContext, but it is hard to simulate.
The query did reach SMJExec with spilling, but securing enough memory for operators like RepartitionExec was challenging.

github-actions · 2025-07-08T02:12:55Z

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

getChan force-pushed the smj-skip-validation branch 3 times, most recently from 9d16d52 to 1bc8ef3 Compare May 5, 2025 17:13

PERF : modify SMJ shuffle file reader to skip validation

11bf46e

getChan force-pushed the smj-skip-validation branch from 1bc8ef3 to 11bf46e Compare May 5, 2025 17:34

Merge branch 'main' into smj-skip-validation

c869542

Merge branch 'main' into smj-skip-validation

38255bd

add bench

4c19675

cargo fmt

ee2a6d6

getChan commented May 6, 2025


cargo fmt

a5bda29

alamb reviewed May 6, 2025


move bench

b17f5fd

github-actions bot added the core Core DataFusion crate label May 7, 2025

increase sample data size

28839e3

getChan force-pushed the smj-skip-validation branch from f0779c9 to 28839e3 Compare May 7, 2025 15:47

getChan added 2 commits May 8, 2025 01:00

clippy

dab92e1

fmt

0423d19

github-actions bot added the Stale PR has not had any activity for some time label Jul 8, 2025

Merge branch 'main' into smj-skip-validation

dccea7f

github-actions bot added the physical-plan Changes to the physical-plan crate label Jul 8, 2025
