[fix](hudi) support reading hudi read optimized table with orc format #44995

Merged
merged 4 commits into from
Dec 4, 2024

Conversation

suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Dec 4, 2024

What problem does this PR solve?

Problem Summary:
When a Hudi read-optimized (RO) table is read, the scan falls back from the JNI reader to the native reader. However, this fallback always sets the file format to Parquet and does not handle Hudi tables whose base files are stored in ORC format.
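
A minimal sketch of the idea, not the code merged in this PR: when the scan falls back to the native reader, the format can be chosen from the base file's extension instead of being assumed to be Parquet. The class `HudiFormatSketch`, the enum `FileFormat`, and the method `chooseFormat` are hypothetical names used only for illustration, and keeping the JNI reader for unrecognized extensions is an assumption.

```java
import java.util.Locale;

// Hypothetical illustration only; these names do not come from the Doris code base.
class HudiFormatSketch {
    enum FileFormat { PARQUET, ORC, JNI }

    // Pick the reader format for one Hudi base file.
    // readOptimized: the split has no log files to merge, so a native reader can be used.
    // forceJni: mirrors the force_jni_reader=true switch mentioned in the release note.
    static FileFormat chooseFormat(String baseFilePath, boolean readOptimized, boolean forceJni) {
        if (forceJni || !readOptimized) {
            return FileFormat.JNI;
        }
        String path = baseFilePath.toLowerCase(Locale.ROOT);
        if (path.endsWith(".parquet")) {
            return FileFormat.PARQUET;
        }
        if (path.endsWith(".orc")) {
            return FileFormat.ORC;
        }
        // Assumption: unknown extensions stay on the JNI reader rather than guessing Parquet.
        return FileFormat.JNI;
    }

    public static void main(String[] args) {
        System.out.println(chooseFormat("/warehouse/tbl/part-0001.orc", true, false));
        System.out.println(chooseFormat("/warehouse/tbl/part-0001.parquet", true, false));
        System.out.println(chooseFormat("/warehouse/tbl/part-0001.orc", true, true));
    }
}
```

The point of the sketch is that the decision is made per base file, so a table written in ORC is never handed to the Parquet reader by default.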

Release note

  1. Support reading Hudi read-optimized tables stored in ORC format
  2. Fix the explain output of hudiScanNode when force_jni_reader=true
  3. Add test cases for timestamps with different time zones

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39948 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 57371c8b073d4db01a83da0a7672eb4db953c3f6, data reload: false

------ Round 1 ----------------------------------
q1	17562	7387	7342	7342
q2	2049	197	189	189
q3	10596	1077	1213	1077
q4	10551	772	745	745
q5	7588	2718	2708	2708
q6	241	149	144	144
q7	994	635	591	591
q8	9247	1845	1929	1845
q9	6717	6530	6469	6469
q10	7037	2279	2293	2279
q11	464	274	264	264
q12	412	217	217	217
q13	17797	2997	3006	2997
q14	252	207	215	207
q15	570	526	505	505
q16	672	583	583	583
q17	981	541	515	515
q18	7120	6628	6639	6628
q19	1345	1056	1064	1056
q20	461	197	180	180
q21	3989	3137	3087	3087
q22	380	321	320	320
Total cold run time: 107025 ms
Total hot run time: 39948 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7254	7226	7188	7188
q2	325	227	226	226
q3	2854	2731	2929	2731
q4	2037	1799	1814	1799
q5	5689	5664	5676	5664
q6	222	145	143	143
q7	2282	1826	1755	1755
q8	3379	3594	3485	3485
q9	8859	8994	8891	8891
q10	3592	3544	3570	3544
q11	608	525	504	504
q12	839	609	608	608
q13	12339	3250	3254	3250
q14	318	287	269	269
q15	570	555	508	508
q16	700	622	636	622
q17	1833	1644	1633	1633
q18	8422	7813	7727	7727
q19	1700	1486	1573	1486
q20	2136	1855	1861	1855
q21	5608	5430	5444	5430
q22	638	561	591	561
Total cold run time: 72204 ms
Total hot run time: 59879 ms

@doris-robot

TPC-DS: Total hot run time: 196720 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 57371c8b073d4db01a83da0a7672eb4db953c3f6, data reload: false

query1	1536	962	935	935
query2	6215	2075	1988	1988
query3	10960	4397	4459	4397
query4	66906	28859	23635	23635
query5	4968	460	434	434
query6	407	186	184	184
query7	5687	303	304	303
query8	321	235	239	235
query9	9582	2702	2682	2682
query10	485	254	264	254
query11	17774	15307	15813	15307
query12	155	104	105	104
query13	1607	424	426	424
query14	10649	7654	6744	6744
query15	208	182	189	182
query16	7061	500	461	461
query17	1254	580	571	571
query18	1836	305	314	305
query19	219	153	147	147
query20	117	139	114	114
query21	208	100	105	100
query22	4702	4491	4534	4491
query23	34948	34341	34490	34341
query24	5516	2571	2429	2429
query25	509	388	392	388
query26	638	154	149	149
query27	2057	280	284	280
query28	4597	2494	2451	2451
query29	681	413	405	405
query30	208	150	148	148
query31	1026	853	868	853
query32	75	56	53	53
query33	447	285	290	285
query34	946	512	537	512
query35	912	766	748	748
query36	1107	990	997	990
query37	123	71	71	71
query38	4457	4386	4497	4386
query39	1504	1463	1457	1457
query40	196	98	97	97
query41	43	48	42	42
query42	107	98	96	96
query43	526	483	481	481
query44	1198	816	847	816
query45	184	180	190	180
query46	1157	710	726	710
query47	2046	1934	1945	1934
query48	408	332	312	312
query49	733	407	409	407
query50	855	393	399	393
query51	7365	7101	7140	7101
query52	102	86	86	86
query53	257	176	177	176
query54	507	394	383	383
query55	76	71	74	71
query56	247	231	231	231
query57	1273	1124	1150	1124
query58	215	201	209	201
query59	3123	2961	2982	2961
query60	269	242	244	242
query61	114	114	108	108
query62	802	662	663	662
query63	214	175	185	175
query64	1391	714	620	620
query65	3305	3217	3219	3217
query66	698	294	304	294
query67	16146	15592	15767	15592
query68	4272	570	546	546
query69	420	252	258	252
query70	1141	1060	1081	1060
query71	348	251	247	247
query72	6450	4151	4143	4143
query73	765	365	359	359
query74	10202	9110	9012	9012
query75	3422	2653	2648	2648
query76	1904	991	1116	991
query77	477	279	265	265
query78	10362	9443	9601	9443
query79	1756	602	606	602
query80	1374	426	453	426
query81	518	229	234	229
query82	1186	119	120	119
query83	211	141	142	141
query84	281	69	69	69
query85	997	299	310	299
query86	415	297	299	297
query87	4875	4636	4627	4627
query88	3417	2209	2170	2170
query89	414	298	293	293
query90	1957	185	183	183
query91	134	101	104	101
query92	67	49	49	49
query93	1983	536	526	526
query94	849	275	292	275
query95	354	243	260	243
query96	628	278	284	278
query97	2821	2660	2659	2659
query98	223	198	190	190
query99	1600	1323	1326	1323
Total cold run time: 321988 ms
Total hot run time: 196720 ms

@doris-robot

ClickBench: Total hot run time: 32.86 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 57371c8b073d4db01a83da0a7672eb4db953c3f6, data reload: false

query1	0.03	0.03	0.02
query2	0.07	0.03	0.04
query3	0.23	0.07	0.07
query4	1.62	0.10	0.11
query5	0.42	0.43	0.41
query6	1.16	0.66	0.66
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.58	0.52	0.52
query10	0.57	0.56	0.57
query11	0.14	0.11	0.10
query12	0.14	0.12	0.13
query13	0.60	0.60	0.60
query14	2.85	2.73	2.70
query15	0.91	0.83	0.82
query16	0.37	0.39	0.39
query17	1.09	0.98	1.06
query18	0.23	0.21	0.21
query19	1.95	1.85	1.97
query20	0.01	0.01	0.02
query21	15.36	0.60	0.58
query22	2.49	1.93	2.69
query23	17.00	0.92	0.90
query24	3.24	1.95	1.11
query25	0.32	0.25	0.05
query26	0.44	0.14	0.14
query27	0.05	0.05	0.04
query28	9.72	1.11	1.08
query29	12.56	3.22	3.20
query30	0.25	0.08	0.07
query31	2.82	0.38	0.38
query32	3.27	0.46	0.45
query33	3.02	3.01	3.05
query34	16.78	4.44	4.48
query35	4.53	4.48	4.53
query36	0.67	0.50	0.52
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.03	0.02	0.03
query40	0.16	0.13	0.14
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 106.02 s
Total hot run time: 32.86 s

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 4, 2024
Contributor

github-actions bot commented Dec 4, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Dec 4, 2024

PR approved by anyone and no changes requested.

Contributor

@kaka11chen kaka11chen left a comment

LGTM

@morningman morningman merged commit f3c415d into apache:master Dec 4, 2024
27 of 29 checks passed
@suxiaogang223 suxiaogang223 deleted the hudi_timezone branch December 5, 2024 11:25
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Dec 5, 2024
…apache#44995)

### What problem does this PR solve?
Problem Summary:
When reading the hudi ro table, it will be pushed back from jni to the
native reader. However, this process will default the file format to
parquet, and does not consider the situation that the hudi table is
stored in orc format.

1. support reading hudi read optimized table with orc format
2. fix explain results of hudiScanNode when force_jni_reader=true
3. add cases about  timestamp with different timezones
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Dec 5, 2024
…apache#44995)
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed

4 participants