[Feature](orc-reader) Implement new merge io facility for orc reader. #45966

Draft: wants to merge 1 commit into base: master

kaka11chen
Contributor

@kaka11chen kaka11chen commented Dec 25, 2024

What problem does this PR solve?

Problem Summary:

The original merge IO mechanism, MergeRangeFileReader, requires ranges to be read in order. It does not support out-of-order access, so a range cannot be read back once the reader has moved past it.
Turning on late materialization of ORC complex types produces exactly this kind of stream read-back scenario, for example: select struct_element(info, 'age'), id from test_orc_struct where struct_element(info, 'name') = 'Alice'.
With late materialization on, the stream of the parent node info is read first, followed by name. When age is read afterwards, the stream of the parent node info has to be read back. For this reason, late materialization of ORC complex types cannot currently be turned on.
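
The read-back problem can be illustrated with a toy model. The sketch below is not the actual MergeRangeFileReader: `ForwardOnlyReader`, `Range`, and `first_read_back` are hypothetical names, and the byte offsets used for info, name, and age are made up for illustration. It only shows why an in-order reader cannot serve the access pattern described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte range [start, end) of one stream read.
struct Range {
    uint64_t start;
    uint64_t end;
};

// Toy forward-only reader: a read is accepted only if it does not start
// before the end of the previous accepted read.
class ForwardOnlyReader {
public:
    bool read(const Range& r) {
        if (r.start < _cursor) {
            return false; // read-back: the data was already passed
        }
        _cursor = r.end;
        return true;
    }

private:
    uint64_t _cursor = 0;
};

// Returns the index of the first read that a forward-only reader cannot
// serve, or -1 if every read is in order.
int first_read_back(const std::vector<Range>& reads) {
    ForwardOnlyReader reader;
    for (std::size_t i = 0; i < reads.size(); ++i) {
        if (!reader.read(reads[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With assumed offsets info = [0, 10), name = [10, 50), age = [50, 90), the sequence info, name, info, age fails at the third read (index 2), which is the read-back of the parent node.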

Release note

The new merge IO mechanism classifies the ranges read by the streams of an ORC stripe into small ranges and large ranges according to orc_once_max_read_bytes. Ranges smaller than orc_once_max_read_bytes are classified as small ranges, and ranges exceeding orc_once_max_read_bytes are classified as large ranges.
Adjacent small ranges are then merged. The maximum merged length is orc_once_max_read_bytes, and the maximum distance allowed between two ranges being merged is orc_max_merge_distance_bytes. Each merged range is backed by an in-memory cache reader, and a corresponding input stream is built on top of it for the lower-layer ORC reader to read from. Large ranges are read directly through the underlying file reader. The current implementation supports reads at arbitrary positions within a merged range.
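
The classification and merging rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Range`, `Plan`, and `plan_ranges` are hypothetical names, the two parameters stand in for orc_once_max_read_bytes and orc_max_merge_distance_bytes, and a range exactly equal to the threshold is treated as small, which the description does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical byte range [start, end) of one stream inside a stripe.
struct Range {
    uint64_t start;
    uint64_t end;
    uint64_t size() const { return end - start; }
};

struct Plan {
    std::vector<Range> merged_small; // each entry would be served from one in-memory cache
    std::vector<Range> large;        // each entry would be read directly from the file
};

// once_max_read_bytes stands in for orc_once_max_read_bytes and
// max_merge_distance_bytes for orc_max_merge_distance_bytes.
Plan plan_ranges(std::vector<Range> ranges, uint64_t once_max_read_bytes,
                 uint64_t max_merge_distance_bytes) {
    std::sort(ranges.begin(), ranges.end(),
              [](const Range& a, const Range& b) { return a.start < b.start; });
    Plan plan;
    for (const Range& r : ranges) {
        assert(r.start <= r.end);
        // Assumption: a range exactly at the threshold counts as small.
        if (r.size() > once_max_read_bytes) {
            plan.large.push_back(r);
            continue;
        }
        if (!plan.merged_small.empty()) {
            Range& last = plan.merged_small.back();
            uint64_t gap = r.start > last.end ? r.start - last.end : 0;
            uint64_t merged_end = std::max(last.end, r.end);
            // Merge only if the hole is small enough and the merged
            // range (holes included) stays within the length cap.
            if (gap <= max_merge_distance_bytes &&
                merged_end - last.start <= once_max_read_bytes) {
                last.end = merged_end;
                continue;
            }
        }
        plan.merged_small.push_back(r);
    }
    return plan;
}
```

For example, with a 1000-byte threshold and a 100-byte merge distance, the ranges [0, 100), [150, 250), [1000, 1100), and [2000, 12000) yield two merged small ranges, [0, 250) and [1000, 1100), plus one large range. Because the merged length includes the holes between ranges, two small ranges on either side of a large range are never merged across it.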

Future Work

Currently, implementations such as OrcMergeRangeFileReader and RangeCacheFileReader must memcpy from the cache into the result slice because of limitations of the FileReader interface. In theory, the memcpy could be avoided by having the slice point directly at the cache location. This can be refactored and optimized in the future.
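
The difference between the two approaches can be sketched as below. This is an illustration only: `CachedRange`, `read_at`, and `view_at` are hypothetical names and do not reflect the FileReader interface in Doris. The copying path fills a caller-owned buffer, while the zero-copy path hands back a view that points into the cache and is valid only while the cache is alive.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical in-memory cache holding the bytes of one merged range.
class CachedRange {
public:
    CachedRange(std::size_t file_offset, std::string bytes)
            : _file_offset(file_offset), _bytes(std::move(bytes)) {}

    // Zero-copy path: returns a view into the cache, clamped to the
    // cached bytes. Empty if the offset is outside the cached range.
    std::string_view view_at(std::size_t offset, std::size_t len) const {
        if (offset < _file_offset || offset >= _file_offset + _bytes.size()) {
            return {};
        }
        return std::string_view(_bytes).substr(offset - _file_offset, len);
    }

    // Copying path: one memcpy per read into a caller-owned buffer.
    // Returns the number of bytes copied.
    std::size_t read_at(std::size_t offset, char* out, std::size_t len) const {
        std::string_view v = view_at(offset, len);
        if (!v.empty()) {
            std::memcpy(out, v.data(), v.size());
        }
        return v.size();
    }

private:
    std::size_t _file_offset;
    std::string _bytes;
};
```

Both paths accept any offset inside the cached range, in any order. The trade-off of the view-based path is lifetime management: the caller must not use the view after the cache is released.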

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 25, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 95af48c to 772ffb6 Compare December 25, 2024 15:54
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 772ffb6 to 7df1d9d Compare December 25, 2024 17:34
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 7df1d9d to 2fecd9c Compare December 25, 2024 18:12
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 2fecd9c to ee35b47 Compare December 26, 2024 01:21
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from ee35b47 to 5b1e090 Compare December 27, 2024 08:55
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32432 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5b1e0902555b2fac4d90160d04ecd0671bd2b6ad, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17658	6117	6041	6041
q2	2055	308	162	162
q3	10498	1276	706	706
q4	10240	876	427	427
q5	7920	2244	1976	1976
q6	202	183	150	150
q7	912	730	620	620
q8	9252	1373	1172	1172
q9	5293	4896	4992	4896
q10	6747	2339	1865	1865
q11	473	277	241	241
q12	347	359	219	219
q13	17788	3658	2956	2956
q14	226	238	226	226
q15	563	491	505	491
q16	633	632	591	591
q17	575	860	322	322
q18	7258	6484	6358	6358
q19	2198	969	574	574
q20	301	311	182	182
q21	2836	2154	1946	1946
q22	365	325	311	311
Total cold run time: 104340 ms
Total hot run time: 32432 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6319	6206	6230	6206
q2	240	328	236	236
q3	2230	2650	2325	2325
q4	1421	1854	1343	1343
q5	4370	4781	4960	4781
q6	182	175	141	141
q7	2133	2008	1788	1788
q8	2624	2803	2674	2674
q9	7485	7339	7238	7238
q10	3046	3384	2812	2812
q11	580	539	494	494
q12	671	753	616	616
q13	3332	3762	3070	3070
q14	288	325	291	291
q15	571	514	507	507
q16	662	694	666	666
q17	1210	1712	1245	1245
q18	7730	7350	6953	6953
q19	787	997	1099	997
q20	1979	2032	1789	1789
q21	5502	5005	4771	4771
q22	594	631	546	546
Total cold run time: 53956 ms
Total hot run time: 51489 ms

@doris-robot

TPC-DS: Total hot run time: 190964 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5b1e0902555b2fac4d90160d04ecd0671bd2b6ad, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	998	416	383	383
query2	6513	2475	2468	2468
query3	6712	228	219	219
query4	33586	23619	23448	23448
query5	4326	623	460	460
query6	298	210	197	197
query7	4614	493	323	323
query8	307	253	235	235
query9	9697	2751	2742	2742
query10	481	312	267	267
query11	18287	15459	15137	15137
query12	165	107	114	107
query13	1685	536	425	425
query14	11039	6770	7251	6770
query15	249	188	187	187
query16	8090	578	469	469
query17	1552	743	550	550
query18	2135	390	290	290
query19	204	188	148	148
query20	120	111	106	106
query21	203	120	101	101
query22	4329	4509	4321	4321
query23	34489	33612	33710	33612
query24	6466	2254	2210	2210
query25	502	444	384	384
query26	1188	266	155	155
query27	2037	463	342	342
query28	5353	2452	2424	2424
query29	751	550	425	425
query30	225	180	153	153
query31	980	924	798	798
query32	99	61	59	59
query33	498	354	288	288
query34	774	843	511	511
query35	815	801	745	745
query36	1018	1052	947	947
query37	121	96	78	78
query38	4126	4137	4155	4137
query39	1501	1461	1389	1389
query40	212	118	103	103
query41	52	47	51	47
query42	120	108	105	105
query43	523	534	499	499
query44	1337	815	831	815
query45	188	181	172	172
query46	895	1041	695	695
query47	1946	1955	1860	1860
query48	385	409	322	322
query49	764	464	382	382
query50	621	656	390	390
query51	7145	7100	6960	6960
query52	108	102	93	93
query53	232	258	182	182
query54	473	493	399	399
query55	81	78	82	78
query56	259	252	236	236
query57	1225	1186	1139	1139
query58	228	218	226	218
query59	3164	3184	3082	3082
query60	271	271	243	243
query61	109	108	111	108
query62	867	810	744	744
query63	273	192	190	190
query64	4584	982	651	651
query65	3259	3242	3270	3242
query66	1057	408	315	315
query67	15958	15863	15541	15541
query68	8984	774	509	509
query69	468	280	245	245
query70	1226	1106	1155	1106
query71	441	282	249	249
query72	5785	3849	3825	3825
query73	658	754	361	361
query74	9898	9079	9104	9079
query75	4566	3166	2638	2638
query76	4182	1166	809	809
query77	807	372	296	296
query78	9976	10089	9612	9612
query79	3547	907	596	596
query80	720	525	438	438
query81	469	265	243	243
query82	609	153	123	123
query83	202	169	146	146
query84	282	94	77	77
query85	844	370	319	319
query86	353	331	299	299
query87	4628	4635	4653	4635
query88	4433	2216	2194	2194
query89	412	327	299	299
query90	1896	194	194	194
query91	138	139	114	114
query92	65	60	51	51
query93	995	882	524	524
query94	741	386	292	292
query95	333	271	258	258
query96	483	615	289	289
query97	2752	2824	2696	2696
query98	234	203	190	190
query99	1721	1582	1437	1437
Total cold run time: 295717 ms
Total hot run time: 190964 ms

@doris-robot

ClickBench: Total hot run time: 30.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5b1e0902555b2fac4d90160d04ecd0671bd2b6ad, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.60	0.11	0.12
query5	0.43	0.42	0.39
query6	1.19	0.65	0.66
query7	0.02	0.01	0.01
query8	0.04	0.02	0.03
query9	0.58	0.50	0.50
query10	0.55	0.57	0.55
query11	0.15	0.09	0.10
query12	0.13	0.11	0.12
query13	0.61	0.61	0.59
query14	2.71	2.79	2.74
query15	0.89	0.82	0.84
query16	0.38	0.39	0.38
query17	1.11	1.04	1.03
query18	0.22	0.20	0.20
query19	1.89	1.78	1.99
query20	0.01	0.01	0.02
query21	15.35	0.94	0.58
query22	0.76	0.72	0.71
query23	15.31	1.40	0.57
query24	2.69	0.63	1.73
query25	0.23	0.15	0.08
query26	0.24	0.14	0.12
query27	0.05	0.04	0.05
query28	14.24	1.60	1.05
query29	12.57	3.91	3.21
query30	0.25	0.09	0.06
query31	2.83	0.61	0.38
query32	3.22	0.53	0.47
query33	3.19	3.01	3.08
query34	16.59	5.05	4.48
query35	4.45	4.47	4.46
query36	0.63	0.51	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.14	0.14
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.02	0.03
Total cold run time: 105.93 s
Total hot run time: 30.76 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.88% (10125/26044)
Line Coverage: 29.88% (85541/286297)
Region Coverage: 29.02% (43719/150669)
Branch Coverage: 25.55% (22296/87270)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5b1e0902555b2fac4d90160d04ecd0671bd2b6ad_5b1e0902555b2fac4d90160d04ecd0671bd2b6ad/report/index.html

@morningman
Contributor

This pull request introduces a new OrcMergeRangeFileReader class and enhances the ORC file reading process with improved profiling and optimized I/O operations. The most important changes include adding new classes and methods, updating existing methods for better performance, and incorporating new profiling capabilities.

Enhancements to ORC file reading:

Updates to ORC reader implementation:

Profiling improvements:

These changes aim to optimize the ORC file reading process by merging small I/O operations, improving profiling, and handling large I/O operations more efficiently.
