branch-3.0: [enhance](nereids) add rule MultiDistinctSplit #45209 #46540

Open
wants to merge 1 commit into
base: branch-3.0

@github-actions github-actions bot commented Jan 7, 2025

Cherry-picked from #45209

### What problem does this PR solve?

Problem Summary:

This PR adds a rewrite rule that performs two kinds of rewrite:
1. It can greatly speed up queries that contain multiple
count(distinct) aggregates. With 3 BEs and NDV = 10000000, performance
improves by three to four times.

select count(distinct a),count(distinct b),count(distinct c) from t;
->
with tmp as (select * from t) 
select * from (select count(distinct a) from tmp) t1 cross join  (select count(distinct b) from tmp) t2 cross join  (select count(distinct c) from tmp) t3
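
The equivalence of the two forms above can be checked on a toy table. The following is a minimal sketch that runs both statements in SQLite through Python's `sqlite3` module, not in Doris; the table contents are made up for illustration, and it only demonstrates that the results match, not the performance gain described in the PR.

```python
import sqlite3

# Illustrative data only; this runs in SQLite, not in Doris.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (a int, b int, c int)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, 10, 100), (1, 20, 100), (2, 20, 100), (3, 20, 200), (None, 30, 200)],
)

# Original form: several count(distinct) aggregates in one select.
original = "select count(distinct a), count(distinct b), count(distinct c) from t"

# Rewritten form from the PR description: one subquery per distinct
# aggregate over a shared CTE, combined with cross joins.
rewritten = """
with tmp as (select * from t)
select * from (select count(distinct a) from tmp) t1
cross join (select count(distinct b) from tmp) t2
cross join (select count(distinct c) from tmp) t3
"""

original_row = conn.execute(original).fetchone()
rewritten_row = conn.execute(rewritten).fetchone()
print(original_row, rewritten_row)
```

Each subquery returns exactly one row, so the cross join also yields exactly one row carrying all three counts.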


2. Before this PR, the following SQL statement failed with the error
"The query contains multi count distinct or sum distinct, each can't
have multi columns". This PR rewrites this type of statement as
follows, so that it executes without an error.

select count(distinct a,d),count(distinct b,c),count(distinct c) from t;
->
with tmp as (select * from t) 
select * from (select count(distinct a,d) from tmp) t1 cross join  (select count(distinct b,c) from tmp) t2 cross join  (select count(distinct c) from tmp) t3
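
The shape of this second rewrite can be sketched the same way, with one caveat: SQLite does not accept a multi-column `count(distinct a, d)`, so the sketch below emulates each multi-column branch with `count(*)` over a `select distinct` subquery. It assumes MySQL-style semantics, in which a combination containing a NULL is not counted; whether Doris treats NULLs identically is an assumption here, and the data is made up.

```python
import sqlite3

# Illustrative data only; this runs in SQLite, not in Doris.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (a int, b int, c int, d int)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [(1, 1, 1, 1), (1, 1, 1, 1), (1, 2, 1, 2), (2, 2, 2, None), (2, 3, 2, 2)],
)

# Each branch is computed independently over the shared CTE and the
# single-row results are combined with cross joins, as in the PR.
# count(distinct x, y) is emulated as count(*) over the distinct
# (x, y) pairs that contain no NULL (assumed MySQL-style semantics).
emulated = """
with tmp as (select * from t)
select * from
  (select count(*) from
     (select distinct a, d from tmp where a is not null and d is not null)) t1
cross join
  (select count(*) from
     (select distinct b, c from tmp where b is not null and c is not null)) t2
cross join
  (select count(distinct c) from tmp) t3
"""

multi_col_row = conn.execute(emulated).fetchone()
print(multi_col_row)
```

Because every branch reads the same `tmp` and aggregates on its own, the branches do not constrain each other's column lists, which is why the split form can express what the single-select form could not.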

### Release note

Support multi count distinct with different parameters

Thearas commented Jan 7, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Jan 7, 2025
@dataroaring dataroaring reopened this Jan 7, 2025

Thearas commented Jan 7, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 40965 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 384746691581c00803a0c0da540afe37f4e784b5, data reload: false

------ Round 1 ----------------------------------
q1	17598	7517	7302	7302
q2	2072	172	169	169
q3	10764	1080	1234	1080
q4	10498	761	754	754
q5	7786	2898	2880	2880
q6	240	145	144	144
q7	973	608	599	599
q8	9353	1957	2023	1957
q9	6599	6436	6444	6436
q10	7025	2337	2289	2289
q11	477	270	271	270
q12	411	207	218	207
q13	17788	3018	3023	3018
q14	241	208	215	208
q15	577	504	531	504
q16	718	627	603	603
q17	991	580	548	548
q18	7220	6760	6652	6652
q19	1387	1124	1035	1035
q20	498	203	207	203
q21	4043	3269	3105	3105
q22	1110	1014	1002	1002
Total cold run time: 108369 ms
Total hot run time: 40965 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7358	7537	7257	7257
q2	327	224	224	224
q3	2975	2952	2911	2911
q4	2092	1849	1832	1832
q5	5733	5748	5727	5727
q6	230	142	142	142
q7	2211	1814	1841	1814
q8	3413	3551	3541	3541
q9	8909	8936	8863	8863
q10	3565	3600	3589	3589
q11	642	513	516	513
q12	832	579	609	579
q13	9851	3155	3133	3133
q14	307	271	266	266
q15	565	525	507	507
q16	723	689	696	689
q17	1865	1620	1630	1620
q18	8374	7868	7601	7601
q19	1662	1433	1574	1433
q20	2131	1861	1894	1861
q21	5715	5423	5240	5240
q22	1175	1038	1064	1038
Total cold run time: 70655 ms
Total hot run time: 60380 ms
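
The report does not label its columns. A minimal sketch for reading it, assuming each row is the query name, the cold run, two hot runs, and the faster of the two hot runs (an interpretation inferred from the numbers, not stated by the bot): under that assumption, the Round 1 totals above are the sums of the cold column and of the last column. The `summarize` helper is hypothetical, written only for this check.

```python
# Round 1 rows copied from the TPC-H report above.
ROUND_1 = """
q1 17598 7517 7302 7302
q2 2072 172 169 169
q3 10764 1080 1234 1080
q4 10498 761 754 754
q5 7786 2898 2880 2880
q6 240 145 144 144
q7 973 608 599 599
q8 9353 1957 2023 1957
q9 6599 6436 6444 6436
q10 7025 2337 2289 2289
q11 477 270 271 270
q12 411 207 218 207
q13 17788 3018 3023 3018
q14 241 208 215 208
q15 577 504 531 504
q16 718 627 603 603
q17 991 580 548 548
q18 7220 6760 6652 6652
q19 1387 1124 1035 1035
q20 498 203 207 203
q21 4043 3269 3105 3105
q22 1110 1014 1002 1002
"""


def summarize(report: str) -> tuple[int, int]:
    """Return (cold_total_ms, hot_total_ms) for one report round.

    Assumed column layout: name, cold, hot1, hot2, best hot.
    """
    cold_total = 0
    hot_total = 0
    for line in report.strip().splitlines():
        name, *values = line.split()
        cold, hot1, hot2, best = (int(v) for v in values)
        # Check the assumed layout: last column is the faster hot run.
        if best != min(hot1, hot2):
            raise ValueError(f"{name}: last column is not min of hot runs")
        cold_total += cold
        hot_total += best
    return cold_total, hot_total


cold_ms, hot_ms = summarize(ROUND_1)
print(cold_ms, hot_ms)
```

The computed sums agree with the reported "Total cold run time: 108369 ms" and "Total hot run time: 40965 ms" for Round 1, which supports the assumed column layout.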

@doris-robot

TPC-DS: Total hot run time: 197839 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 384746691581c00803a0c0da540afe37f4e784b5, data reload: false

query1	1323	918	926	918
query2	6244	2085	2106	2085
query3	10961	4228	4454	4228
query4	66507	29337	23450	23450
query5	4892	455	451	451
query6	410	172	171	171
query7	5602	319	317	317
query8	314	234	233	233
query9	8831	2671	2686	2671
query10	454	272	254	254
query11	17434	15180	15778	15180
query12	167	113	101	101
query13	1487	448	456	448
query14	10489	7377	7275	7275
query15	209	198	197	197
query16	7151	503	493	493
query17	1110	615	593	593
query18	1830	327	336	327
query19	228	164	161	161
query20	118	109	108	108
query21	213	106	104	104
query22	4879	4386	4298	4298
query23	34895	34089	34250	34089
query24	6359	2925	2974	2925
query25	552	433	428	428
query26	661	176	171	171
query27	1883	357	358	357
query28	4417	2488	2460	2460
query29	751	464	462	462
query30	244	175	176	175
query31	1017	831	839	831
query32	65	57	53	53
query33	472	285	280	280
query34	962	523	539	523
query35	845	735	744	735
query36	1092	983	991	983
query37	129	78	76	76
query38	4203	4027	4067	4027
query39	1496	1469	1472	1469
query40	203	106	102	102
query41	50	48	51	48
query42	113	106	99	99
query43	547	522	501	501
query44	1205	875	858	858
query45	188	179	168	168
query46	1201	755	740	740
query47	1981	1907	1929	1907
query48	479	389	382	382
query49	730	392	377	377
query50	876	434	426	426
query51	7335	7254	7220	7220
query52	102	87	90	87
query53	258	179	182	179
query54	555	447	453	447
query55	81	81	75	75
query56	265	260	247	247
query57	1239	1089	1112	1089
query58	214	205	205	205
query59	3274	3080	2965	2965
query60	274	258	250	250
query61	143	110	106	106
query62	801	666	675	666
query63	221	186	185	185
query64	1373	695	662	662
query65	3364	3199	3165	3165
query66	694	299	308	299
query67	15975	15662	15517	15517
query68	4186	580	558	558
query69	434	271	269	269
query70	1229	1124	1127	1124
query71	368	255	259	255
query72	6366	4172	4294	4172
query73	794	358	347	347
query74	10273	9080	8873	8873
query75	3396	2650	2659	2650
query76	1866	1153	1170	1153
query77	500	275	277	275
query78	10647	9682	9588	9588
query79	1321	588	607	588
query80	864	425	422	422
query81	506	243	240	240
query82	1278	124	117	117
query83	186	144	144	144
query84	282	78	81	78
query85	881	304	293	293
query86	337	286	294	286
query87	4410	4347	4254	4254
query88	3873	2399	2355	2355
query89	419	302	292	292
query90	2015	188	190	188
query91	188	155	151	151
query92	67	54	58	54
query93	1760	563	544	544
query94	827	260	301	260
query95	363	261	260	260
query96	666	286	281	281
query97	3326	3189	3202	3189
query98	211	220	202	202
query99	1589	1298	1306	1298
Total cold run time: 320065 ms
Total hot run time: 197839 ms

@doris-robot

ClickBench: Total hot run time: 33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 384746691581c00803a0c0da540afe37f4e784b5, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.04	0.03
query3	0.23	0.07	0.07
query4	1.64	0.11	0.10
query5	0.52	0.51	0.50
query6	1.14	0.73	0.74
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.56	0.49	0.49
query10	0.55	0.55	0.56
query11	0.14	0.11	0.10
query12	0.13	0.12	0.11
query13	0.61	0.59	0.60
query14	3.11	3.10	3.05
query15	0.90	0.84	0.83
query16	0.37	0.38	0.38
query17	1.02	0.96	0.99
query18	0.24	0.21	0.22
query19	1.98	1.95	1.94
query20	0.02	0.01	0.01
query21	15.36	0.60	0.60
query22	2.75	1.95	1.73
query23	16.97	1.01	0.98
query24	3.31	0.56	1.01
query25	0.28	0.19	0.09
query26	0.27	0.14	0.14
query27	0.05	0.04	0.04
query28	10.87	1.11	1.08
query29	12.65	3.21	3.24
query30	0.25	0.06	0.05
query31	2.89	0.39	0.39
query32	3.25	0.48	0.47
query33	2.99	3.04	3.03
query34	17.28	4.54	4.59
query35	4.51	4.57	4.58
query36	0.70	0.48	0.49
query37	0.09	0.07	0.07
query38	0.04	0.04	0.04
query39	0.03	0.03	0.02
query40	0.15	0.12	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 108.15 s
Total hot run time: 33 s
