[Enhancement][FlatJson] Improve flat json performance and extract strategy #50696

Merged
merged 8 commits into from
Sep 20, 2024

@Seaven Seaven commented Sep 4, 2024

Why I'm doing:

Performance:

  • Forbid pushing down JSON/map expressions to storage; the pushdown performs badly in most scenarios
  • Improve performance when reading remain data
    • Add a bloom filter over the JSON subfield keys, and check the path against it when reading remain data
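
The bloom filter idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `KeyBloomFilter`, its two-probe hashing, and the bit-array size are all assumptions made for the example. It only shows why a per-key filter lets a reader skip parsing remain data when the requested key is definitely absent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified bloom filter over JSON subfield keys.
// Not the StarRocks BloomFilter class; names and sizes are illustrative.
class KeyBloomFilter {
public:
    explicit KeyBloomFilter(std::size_t num_bits) : _bits(num_bits, false) {
        assert(num_bits > 0);
    }

    // Record that `key` occurs somewhere in the remain JSON.
    void add(const std::string& key) {
        for (std::size_t pos : positions(key)) _bits[pos] = true;
    }

    // false: the key is definitely absent, so the remain data can be skipped.
    // true: the key may be present, so the remain data has to be parsed.
    bool may_contain(const std::string& key) const {
        for (std::size_t pos : positions(key)) {
            if (!_bits[pos]) return false;
        }
        return true;
    }

private:
    // Derive two probe positions from a single hash value.
    std::vector<std::size_t> positions(const std::string& key) const {
        const std::uint64_t h1 = std::hash<std::string>{}(key);
        const std::uint64_t h2 = (h1 >> 17) | (h1 << 47);
        return {static_cast<std::size_t>(h1 % _bits.size()),
                static_cast<std::size_t>((h1 + h2) % _bits.size())};
    }

    std::vector<bool> _bits;
};
```

A writer would call `add` for every key it leaves in the remain data; a reader asked for a path calls `may_contain` first and only parses the remain JSON when it returns true. False positives are possible, false negatives are not.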

FlatJson Extract Strategy:

  • Improve the flat JSON extract strategy:
    • Before: only leaf nodes were extracted
    • Now: non-leaf nodes are checked too, and a non-leaf node is extracted when its leaf nodes don't meet the requirements
    • Rewrite the _finalize method as a bottom-up DFS, to support checking and extracting non-leaf nodes
  • Support extracting flat JSON when the JSON is a subfield of an array/struct/map
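
A bottom-up DFS of this kind might look like the sketch below. Everything here is an assumption for illustration: `PathNode`, the `hits`/`type_consistent` statistics, the `min_ratio` threshold, and the fallback rule are hypothetical and are not taken from the actual `_finalize` implementation. The sketch visits children before their parent, keeps qualifying leaves, and falls back to the non-leaf node itself when one of its children does not qualify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-key statistics; not the StarRocks data structure.
struct PathNode {
    std::string name;
    std::size_t hits = 0;          // rows in which this key appears
    bool type_consistent = true;   // assumed criterion: same value type in every row
    std::vector<PathNode> children;
};

// Assumed rule: a node qualifies when it appears in at least min_ratio of the rows
// and always has the same type.
bool qualifies(const PathNode& n, std::size_t total_rows, double min_ratio) {
    return n.type_consistent &&
           static_cast<double>(n.hits) >= min_ratio * static_cast<double>(total_rows);
}

// Post-order walk: children are decided before their parent.
// Returns true when every key under `node` is covered by a selected path.
bool collect(const PathNode& node, const std::string& prefix, std::size_t total_rows,
             double min_ratio, std::vector<std::string>* out) {
    const std::string path = prefix.empty() ? node.name : prefix + "." + node.name;
    if (node.children.empty()) {
        if (!qualifies(node, total_rows, min_ratio)) return false;
        out->push_back(path);
        return true;
    }
    std::vector<std::string> from_children;
    bool all_children_ok = true;
    for (const PathNode& child : node.children) {
        const bool ok = collect(child, path, total_rows, min_ratio, &from_children);
        all_children_ok = all_children_ok && ok;
    }
    if (!all_children_ok && qualifies(node, total_rows, min_ratio)) {
        out->push_back(path);  // fall back to extracting the non-leaf node as a whole
        return true;
    }
    out->insert(out->end(), from_children.begin(), from_children.end());
    return all_children_ok;
}

// The root stands for the whole JSON column, so only its children are candidates.
std::vector<std::string> select_paths(const PathNode& root, std::size_t total_rows,
                                      double min_ratio) {
    assert(total_rows > 0);
    std::vector<std::string> out;
    for (const PathNode& child : root.children) {
        collect(child, "", total_rows, min_ratio, &out);
    }
    return out;
}
```

With keys `a`, `b.c`, and a sparse `b.d`, this sketch selects `a` and `b`: the leaf `b.d` fails the threshold, so `b` is extracted as one JSON subfield instead of being left entirely in the remain data.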

Profile Enhancement:

  • Support flat_json_meta on primary key tables
  • Support flat_json_meta on array/struct/map columns
  • Merge all flat JSON subfields when querying the whole JSON

BugFixes:

  • Fix a bug with JSON paths containing ".": it affected the extracted subfield name, so a key whose name contains "." is no longer extracted
  • Fix a flat JSON compaction bug: some JSON subfields were lost when some data had remain and some did not
  • Fix flat JSON profile loss in shared-data mode
  • Fix a flat JSON crash when using the chunk accumulator
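
The rule from the first fix can be illustrated with a small filter. This is a sketch under assumptions: `extractable_keys` is a hypothetical helper, not a function from this PR. A key such as `b.c` is skipped because a dotted path built from it would be indistinguishable from key `c` nested under key `b`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: keep only keys that a dotted path can address unambiguously.
std::vector<std::string> extractable_keys(const std::vector<std::string>& keys) {
    std::vector<std::string> out;
    for (const std::string& key : keys) {
        // A "." inside the key name would be read as a path separator, so skip it.
        if (key.find('.') != std::string::npos) continue;
        out.push_back(key);
    }
    return out;
}
```

Skipped keys are not dropped from the data in this sketch; they are simply not candidates for extraction.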

What I'm doing:

Fixes https://github.com/StarRocks/StarRocksTest/issues/8533 https://github.com/StarRocks/StarRocksTest/issues/8534 https://github.com/StarRocks/StarRocksTest/issues/8536 https://github.com/StarRocks/StarRocksTest/issues/8568

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@Seaven Seaven requested review from a team as code owners September 4, 2024 13:05
@wanpengfei-git wanpengfei-git requested a review from a team September 4, 2024 13:05
@mergify mergify bot assigned Seaven Sep 4, 2024
@github-actions github-actions bot added the 3.3 label Sep 4, 2024
@Seaven Seaven requested review from a team as code owners September 6, 2024 13:11
Signed-off-by: Seaven <[email protected]>

sonarcloud bot commented Sep 18, 2024


[BE Incremental Coverage Report]

fail : 229 / 301 (76.08%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 be/src/types/logical_type.h | 0 | 4 | 00.00% | [165, 166, 172, 173] |
| 🔵 be/src/storage/meta_reader.cpp | 0 | 28 | 00.00% | [241, 242, 243, 244, 245, 247, 249, 250, 251, 252, 254, 255, 256, 257, 259, 260, 261, 262, 263, 265, 266, 268, 269, 277, 288, 289, 290, 291] |
| 🔵 be/src/connector/lake_connector.cpp | 0 | 3 | 00.00% | [498, 499, 721] |
| 🔵 be/src/column/json_column.cpp | 0 | 10 | 00.00% | [472, 473, 476, 477, 478, 481, 482, 485, 489, 491] |
| 🔵 be/src/storage/chunk_helper.cpp | 11 | 15 | 73.33% | [562, 563, 568, 569] |
| 🔵 be/src/storage/rowset/column_reader.cpp | 87 | 106 | 82.08% | [841, 842, 843, 876, 899, 900, 904, 905, 906, 907, 908, 909, 910, 913, 914, 915, 916, 917, 918] |
| 🔵 be/src/util/json_flattener.cpp | 112 | 116 | 96.55% | [579, 580, 581, 602] |
| 🔵 be/src/storage/rowset/json_column_compactor.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/column/column_access_path.cpp | 3 | 3 | 100.00% | [] |
| 🔵 be/src/storage/rowset/struct_column_writer.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/util/json_flattener.h | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/rowset/json_column_writer.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/storage/rowset/json_column_iterator.cpp | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/rowset/bloom_filter.h | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/rowset/map_column_writer.cpp | 4 | 4 | 100.00% | [] |
| 🔵 be/src/exprs/json_functions.cpp | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/rowset/array_column_writer.cpp | 2 | 2 | 100.00% | [] |


[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[FE Incremental Coverage Report]

pass : 3 / 3 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/catalog/FunctionSet.java | 3 | 3 | 100.00% | [] |

@Seaven Seaven merged commit 45d72ac into StarRocks:main Sep 20, 2024
64 of 65 checks passed

@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label Sep 20, 2024

mergify bot commented Sep 20, 2024

backport branch-3.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Sep 20, 2024
…tegy (#50696)

Signed-off-by: Seaven <[email protected]>
(cherry picked from commit 45d72ac)

# Conflicts:
#	be/src/exec/olap_scan_prepare.cpp
Seaven added a commit that referenced this pull request Sep 20, 2024
wanpengfei-git pushed a commit that referenced this pull request Sep 20, 2024

7 participants