[Feature](orc-reader) Implement new merge io facility for orc reader. #45966
TPC-H: Total hot run time: 32432 ms
TPC-DS: Total hot run time: 190964 ms
ClickBench: Total hot run time: 30.76 s
What problem does this PR solve?
Problem Summary:
The original merge IO mechanism, `MergeRangeFileReader`, requires ranges to be read in order. The requested ranges can be out of order, however, and a range that has already been passed cannot be read back. Turning on late materialization for ORC complex types produces exactly this stream read-back scenario, for example:
`select struct_element(info, 'age'), id from test_orc_struct where struct_element(info, 'name') = 'Alice'`
With late materialization turned on, the current stream of the parent node `info` is read first, and then `name` is read. When `age` is read afterwards, the stream of the parent node `info` needs to be read back. For this reason, late materialization of ORC complex types cannot be turned on at present.
Release note
The new merge IO mechanism classifies the ranges read by the streams of an ORC stripe into small ranges and large ranges according to `orc_once_max_read_bytes`. Ranges smaller than `orc_once_max_read_bytes` are treated as small ranges, and ranges exceeding `orc_once_max_read_bytes` are treated as large ranges.
Adjacent small ranges are then merged. The maximum length of a merged range is `orc_once_max_read_bytes`, and the maximum distance allowed between two ranges being merged is `orc_max_merge_distance_bytes`. Each merged range is held in an in-memory cache owned by a reader, and a corresponding input stream is built on top of it for the lower-layer ORC reader to read from. Large ranges are read directly through the underlying file reader. The current implementation can read at arbitrary positions within a merged range.
Future Work
Currently, implementations such as `OrcMergeRangeFileReader` and `RangeCacheFileReader` must memcpy from the cache into the result slice because of the limitations of the FileReader interface. In theory the memcpy could be avoided by having the slice point directly at the cache location. This can be refactored and optimized in the future.
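To illustrate the memcpy point raised under Future Work, here is a minimal sketch of a cached range that offers both a copying read and a view-returning read. The `RangeCache` type and its `read_copy` / `read_view` methods are hypothetical names made up for this example. They are not the PR's `RangeCacheFileReader` and do not reflect the actual Doris FileReader interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical in-memory cache for one merged range.
// `base` is the file offset of the first cached byte.
struct RangeCache {
    uint64_t base;
    std::vector<char> data;

    // True if [offset, offset + len) lies entirely inside the cached bytes.
    bool contains(uint64_t offset, size_t len) const {
        if (offset < base) return false;
        uint64_t rel = offset - base;
        return rel <= data.size() && len <= data.size() - rel;
    }

    // Copying read: the caller owns `out`, so the bytes must be copied.
    // Any position inside the range can be read, in any order.
    bool read_copy(uint64_t offset, char* out, size_t len) const {
        if (!contains(offset, len)) return false;
        std::memcpy(out, data.data() + (offset - base), len);
        return true;
    }

    // Zero-copy alternative: return a view pointing into the cache.
    // The view is valid only while the cache is alive and unmodified.
    std::string_view read_view(uint64_t offset, size_t len) const {
        if (!contains(offset, len)) return {};
        return std::string_view(data.data() + (offset - base), len);
    }
};
```

The view variant trades a copy for a lifetime constraint: the caller must not outlive or invalidate the cache, which is presumably why an interface change would be needed before it could be adopted.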
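The small/large classification and the merging of adjacent small ranges described in the release note can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `Range`, `MergePlan` and `plan_ranges` are hypothetical names, the two limits are passed as plain parameters standing in for `orc_once_max_read_bytes` and `orc_max_merge_distance_bytes`, and a range exactly equal to the size limit is assumed to count as small, which the description leaves open.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open byte range [start, end) within a file.
struct Range {
    uint64_t start;
    uint64_t end;
    uint64_t size() const { return end - start; }
};

struct MergePlan {
    std::vector<Range> merged_small;  // to be served from in-memory caches
    std::vector<Range> large;         // to be read directly from the file
};

// Hypothetical helper: splits ranges into large and small, then merges
// neighbouring small ranges subject to the two limits.
MergePlan plan_ranges(std::vector<Range> ranges, uint64_t once_max_read_bytes,
                      uint64_t max_merge_distance_bytes) {
    MergePlan plan;
    // Stream ranges may arrive out of order, so sort by start offset first.
    std::sort(ranges.begin(), ranges.end(),
              [](const Range& a, const Range& b) { return a.start < b.start; });

    bool has_current = false;
    Range current{0, 0};
    for (const Range& r : ranges) {
        // Assumption: a range exactly at the limit is treated as small.
        if (r.size() > once_max_read_bytes) {
            plan.large.push_back(r);
            continue;
        }
        if (has_current) {
            // The gap to the previous merged range must be small enough ...
            bool close_enough = r.start <= current.end + max_merge_distance_bytes;
            // ... and the merged range must not exceed the length limit.
            uint64_t merged_end = std::max(current.end, r.end);
            bool fits = merged_end - current.start <= once_max_read_bytes;
            if (close_enough && fits) {
                current.end = merged_end;
                continue;
            }
            plan.merged_small.push_back(current);
        }
        current = r;
        has_current = true;
    }
    if (has_current) plan.merged_small.push_back(current);
    return plan;
}
```

Because the merged length is capped at the same value that defines a large range, two small ranges separated by a large one can never be merged across it in this sketch: the gap alone would already exceed the cap.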