[spark] PaimonSplitScan supports column pruning and filter push down #4217

Open
wants to merge 1 commit into
base: master

ulysses-you
Contributor

@ulysses-you ulysses-you commented Sep 19, 2024

Purpose

PaimonSplitScan is built for the internal scans performed by UPDATE, DELETE, and MERGE INTO. It is used to generate deletion vectors, collect touched files, etc. Its main use is to select metadata columns of the target table, e.g., row index and file path. In other words, it does not need to load data columns.

This PR makes PaimonSplitScan support column pruning and filter pushdown to improve performance:

  1. Introduce KnownSplitsTable, a ReadonlyTable that holds a set of known data splits.
  2. Introduce PaimonSplitScanBuilder, which is used when the table is a KnownSplitsTable and builds a PaimonSplitScan.
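
The two pieces above can be pictured with a small, self-contained sketch. It is only an illustration under simplifying assumptions: `KnownSplits`, `SplitScan`, and `SplitScanBuilder` are hypothetical stand-ins written for this example, columns and filters are plain strings, and none of this is the actual Paimon or Spark code in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the Paimon or Spark classes.
public class SplitScanSketch {

    // A read-only table view that exposes a fixed, already-known list of splits.
    static final class KnownSplits {
        final List<String> columns;
        final List<String> splits;

        KnownSplits(List<String> columns, List<String> splits) {
            this.columns = columns;
            this.splits = splits;
        }
    }

    // The scan produced by the builder: it reads only readColumns and carries the pushed filters.
    static final class SplitScan {
        final List<String> splits;
        final List<String> readColumns;
        final List<String> pushedFilters;

        SplitScan(List<String> splits, List<String> readColumns, List<String> pushedFilters) {
            this.splits = splits;
            this.readColumns = readColumns;
            this.pushedFilters = pushedFilters;
        }

        String description() {
            return "SplitScan: " + readColumns + ", PushedFilters: " + pushedFilters;
        }
    }

    // Collects the pruning and pushdown requests before the scan is built.
    static final class SplitScanBuilder {
        private final KnownSplits table;
        private List<String> required;
        private final List<String> pushed = new ArrayList<>();

        SplitScanBuilder(KnownSplits table) {
            this.table = table;
            this.required = table.columns; // read everything unless pruned
        }

        // Keep only the requested columns, preserving the table's column order.
        SplitScanBuilder pruneColumns(List<String> requiredColumns) {
            List<String> kept = new ArrayList<>();
            for (String column : table.columns) {
                if (requiredColumns.contains(column)) {
                    kept.add(column);
                }
            }
            this.required = kept;
            return this;
        }

        // Record a filter so the reader can use it to skip data.
        SplitScanBuilder pushFilter(String filter) {
            pushed.add(filter);
            return this;
        }

        SplitScan build() {
            return new SplitScan(table.splits, required, pushed);
        }
    }

    public static void main(String[] args) {
        KnownSplits table = new KnownSplits(
                Arrays.asList("c1", "c2", "c3", "c4", "__paimon_file_path"),
                Arrays.asList("split-0", "split-1"));
        SplitScan scan = new SplitScanBuilder(table)
                .pruneColumns(Arrays.asList("c2", "__paimon_file_path"))
                .pushFilter("Equal(c2, a)")
                .build();
        System.out.println(scan.description());
    }
}
```

The point of the sketch is the division of labor: the table only carries the splits that were already chosen, while the builder is the place where the engine's pruning and pushdown requests are recorded before the scan is created.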

For example:

update test set c1 = 9 where c2 = 'a';

before:

(1) BatchScan default.test
Output [5]: [c1#197, c2#198, c3#199, c4#200, __paimon_file_path#205]
class org.apache.paimon.spark.PaimonSplitScan

(2) Filter [codegen id : 1]
Input [5]: [c1#197, c2#198, c3#199, c4#200, __paimon_file_path#205]
Condition : (c2#198 = a)

(3) Project [codegen id : 1]
Output [1]: [__paimon_file_path#205]
Input [5]: [c1#197, c2#198, c3#199, c4#200, __paimon_file_path#205]

after:

(1) BatchScan default.test
Output [2]: [c2#137, __paimon_file_path#144]
PaimonSplitScan: [test], PushedFilters: [Equal(c2, a)]

(2) Filter [codegen id : 1]
Input [2]: [c2#137, __paimon_file_path#144]
Condition : (c2#137 = a)

(3) Project [codegen id : 1]
Output [1]: [__paimon_file_path#144]
Input [2]: [c2#137, __paimon_file_path#144]
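
The drop from five scan output columns to two follows from a simple rule: the scan only has to produce the columns that a later operator still references. In the plan above the Filter node is still present after pushdown, so `c2` is still needed, and the Project needs `__paimon_file_path`. The sketch below illustrates that rule only; `requiredColumns` is a hypothetical helper written for this example and is not how Spark's optimizer is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a hypothetical helper, not Spark or Paimon code.
public class RequiredColumnsSketch {

    // Columns the scan must read = columns referenced by the filter
    // plus columns output by the projection, kept in schema order.
    static List<String> requiredColumns(
            List<String> schema, Set<String> filterRefs, Set<String> projected) {
        List<String> required = new ArrayList<>();
        for (String column : schema) {
            if (filterRefs.contains(column) || projected.contains(column)) {
                required.add(column);
            }
        }
        return required;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("c1", "c2", "c3", "c4", "__paimon_file_path");
        Set<String> filterRefs = new HashSet<>(Arrays.asList("c2"));
        Set<String> projected = new HashSet<>(Arrays.asList("__paimon_file_path"));
        // Prints [c2, __paimon_file_path]
        System.out.println(requiredColumns(schema, filterRefs, projected));
    }
}
```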

Tests

Pass CI

API and Format

No

Documentation

@ulysses-you
Contributor Author

cc @JingsongLi @YannByron thank you
