[feat](nereids) add rewrite rule :EliminateGroupByKeyByUniform #43391

Merged


feiniaofeiafei
Contributor

@feiniaofeiafei feiniaofeiafei commented Nov 7, 2024

What problem does this PR solve?

This PR introduces two main changes:

  1. Adds an optional constant value to the uniform attribute in DataTrait. A slot whose constant value is not null is treated as both uniform and not null.
  2. Introduces a new rewrite rule, EliminateGroupByKeyByUniform, which uses the constant value newly recorded in the uniform attribute. An example transformation:
    +--aggregate(group by a,b output a,b,max(c))
    (a is uniform and not null, e.g. a is produced by the projection "2 as a" in a LogicalProject)
    ->
    +--aggregate(group by b output b,any_value(a) as a,max(c))
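
The two changes described above can be sketched with a toy model. This is an illustration only, not the Doris implementation: the class `UniformGroupBySketch`, its `Aggregate` holder, and the string-based expressions are all hypothetical, and the guard that keeps at least one group-by key is an assumption of the sketch rather than something stated in the PR description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model for illustration; these are not the Doris classes.
class UniformGroupBySketch {

    /** A simplified aggregate: group-by keys plus output expressions, all as strings. */
    static final class Aggregate {
        final List<String> groupBy;
        final List<String> output;

        Aggregate(List<String> groupBy, List<String> output) {
            this.groupBy = groupBy;
            this.output = output;
        }
    }

    /** A slot counts as uniform and not null when a constant value is recorded for it. */
    static boolean isUniformAndNotNull(Map<String, Optional<Object>> uniform, String slot) {
        return uniform.getOrDefault(slot, Optional.empty()).isPresent();
    }

    static Aggregate rewrite(Aggregate agg, Map<String, Optional<Object>> uniform) {
        List<String> kept = new ArrayList<>();
        List<String> removed = new ArrayList<>();
        for (String key : agg.groupBy) {
            (isUniformAndNotNull(uniform, key) ? removed : kept).add(key);
        }
        // Assumption of this sketch: never drop every key, because a global
        // aggregate returns one row on empty input while a grouped one returns none.
        if (removed.isEmpty() || kept.isEmpty()) {
            return agg;
        }
        List<String> keyOutputs = new ArrayList<>();
        List<String> constantOutputs = new ArrayList<>();
        List<String> otherOutputs = new ArrayList<>();
        for (String expr : agg.output) {
            if (removed.contains(expr)) {
                // The eliminated key is still produced, now through any_value.
                constantOutputs.add("any_value(" + expr + ") as " + expr);
            } else if (kept.contains(expr)) {
                keyOutputs.add(expr);
            } else {
                otherOutputs.add(expr); // aggregate functions pass through unchanged
            }
        }
        List<String> output = new ArrayList<>(keyOutputs);
        output.addAll(constantOutputs);
        output.addAll(otherOutputs);
        return new Aggregate(kept, output);
    }

    public static void main(String[] args) {
        // "a" carries the constant 2, as in the "2 as a" projection example.
        Map<String, Optional<Object>> uniform = Map.of("a", Optional.<Object>of(2));
        Aggregate before = new Aggregate(List.of("a", "b"), List.of("a", "b", "max(c)"));
        Aggregate after = rewrite(before, uniform);
        // prints: group by [b] output [b, any_value(a) as a, max(c)]
        System.out.println("group by " + after.groupBy + " output " + after.output);
    }
}
```

A slot that is uniform but has no recorded constant (an empty `Optional` in this sketch) is left in the group-by list, since it could still be null.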

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.
  • Release note

    None

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@feiniaofeiafei
Contributor Author

run buildall

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

import java.util.Map;
import java.util.Set;

/**ProjectFilterTransform*/
Contributor

The class comment should describe what the rule does and how it does it.

@feiniaofeiafei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41389 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8bd01425901bc1de525e568ff0c226f281889a31, data reload: false

------ Round 1 ----------------------------------
q1	17572	7461	7337	7337
q2	2060	177	184	177
q3	10660	1082	1143	1082
q4	10562	879	824	824
q5	7758	3071	3071	3071
q6	247	144	145	144
q7	1022	602	611	602
q8	9373	1971	2024	1971
q9	6583	6458	6555	6458
q10	7115	2460	2429	2429
q11	483	250	262	250
q12	404	210	212	210
q13	17791	2993	3005	2993
q14	241	216	212	212
q15	587	526	517	517
q16	673	581	600	581
q17	979	528	557	528
q18	7272	6753	6650	6650
q19	1338	1025	942	942
q20	467	182	176	176
q21	3968	3302	3246	3246
q22	1119	989	1005	989
Total cold run time: 108274 ms
Total hot run time: 41389 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7336	7257	7293	7257
q2	346	247	258	247
q3	2996	2989	2925	2925
q4	2141	1903	1841	1841
q5	5760	5829	5849	5829
q6	227	143	142	142
q7	2286	1856	1841	1841
q8	3447	3561	3556	3556
q9	8912	8918	8966	8918
q10	3612	3637	3582	3582
q11	649	510	515	510
q12	851	627	650	627
q13	10088	3218	3167	3167
q14	322	270	298	270
q15	601	579	562	562
q16	674	656	648	648
q17	1859	1667	1679	1667
q18	8188	7827	7627	7627
q19	1714	1587	1579	1579
q20	2092	1899	1901	1899
q21	5586	5514	5440	5440
q22	1154	1052	1047	1047
Total cold run time: 70841 ms
Total hot run time: 61181 ms

@feiniaofeiafei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41292 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e668096307fe4a2ded45eb474b160a5d2f76e293, data reload: false

------ Round 1 ----------------------------------
q1	17613	7399	7282	7282
q2	2069	186	191	186
q3	10529	1055	1165	1055
q4	10498	905	880	880
q5	7755	3087	3006	3006
q6	239	145	149	145
q7	1028	630	617	617
q8	9380	2020	2087	2020
q9	6559	6410	6417	6410
q10	7101	2406	2413	2406
q11	459	252	253	252
q12	397	212	209	209
q13	17793	3006	3027	3006
q14	237	210	205	205
q15	577	508	506	506
q16	652	570	588	570
q17	966	544	504	504
q18	7337	6755	6698	6698
q19	1332	1066	1014	1014
q20	450	180	179	179
q21	3964	3291	3182	3182
q22	1120	1014	960	960
Total cold run time: 108055 ms
Total hot run time: 41292 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7300	7283	7206	7206
q2	351	257	249	249
q3	2974	2966	2995	2966
q4	2067	1948	1811	1811
q5	5756	5801	5840	5801
q6	232	137	139	137
q7	2300	1846	1850	1846
q8	3418	3524	3531	3524
q9	8936	8919	8888	8888
q10	3611	3561	3559	3559
q11	614	500	498	498
q12	848	651	627	627
q13	10752	3189	3254	3189
q14	297	273	268	268
q15	607	568	548	548
q16	698	641	650	641
q17	1855	1644	1613	1613
q18	8201	7694	7757	7694
q19	1679	1505	1511	1505
q20	2171	1885	1873	1873
q21	5723	5430	5600	5430
q22	1161	1030	1053	1030
Total cold run time: 71551 ms
Total hot run time: 60903 ms

@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch from e668096 to 452bb5f Compare November 12, 2024 13:37
@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch from 0c01e45 to 6333e6f Compare November 12, 2024 13:40
@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch 2 times, most recently from 6f13f10 to 892b7ec Compare November 13, 2024 04:22
@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch 2 times, most recently from 6b4a551 to 852065c Compare November 14, 2024 09:46
@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch 2 times, most recently from b393ff3 to 0c470a6 Compare November 14, 2024 10:04
@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei feiniaofeiafei changed the title [feat](nereids) add rewrite rule :EliminateGroupByKeyByUniform and PredicateDrivenProjectionSimplification [feat](nereids) add rewrite rule :EliminateGroupByKeyByUniform Nov 14, 2024
@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch 2 times, most recently from 73c21af to c2bc25a Compare November 15, 2024 10:39
@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch 2 times, most recently from cdb345c to c99e369 Compare November 15, 2024 10:47
@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch 2 times, most recently from 1d2ad46 to d211163 Compare November 15, 2024 12:01
@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei feiniaofeiafei force-pushed the project_filter_transform_const branch from 27e9155 to 1c2bb29 Compare November 27, 2024 04:23
@feiniaofeiafei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39943 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1c2bb29dd64f003618525558e10c77773c15650e, data reload: false

------ Round 1 ----------------------------------
q1	17594	7460	7296	7296
q2	2042	178	179	178
q3	10595	1112	1169	1112
q4	10215	767	756	756
q5	7601	2651	2664	2651
q6	236	151	155	151
q7	980	660	616	616
q8	9260	1842	1913	1842
q9	6452	6365	6321	6321
q10	6970	2271	2318	2271
q11	457	262	269	262
q12	427	217	217	217
q13	17751	3018	3065	3018
q14	238	217	224	217
q15	573	524	522	522
q16	683	575	595	575
q17	987	662	582	582
q18	7308	6726	6719	6719
q19	1332	1052	958	958
q20	462	184	179	179
q21	3947	3182	3203	3182
q22	387	318	322	318
Total cold run time: 106497 ms
Total hot run time: 39943 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7346	7364	7256	7256
q2	324	231	234	231
q3	2957	2794	3043	2794
q4	2113	1916	1780	1780
q5	5549	5717	5629	5629
q6	227	141	145	141
q7	2239	1828	1853	1828
q8	3373	3577	3472	3472
q9	8896	8811	8867	8811
q10	3582	3585	3580	3580
q11	601	520	518	518
q12	870	615	622	615
q13	13429	3215	3083	3083
q14	285	283	280	280
q15	571	507	505	505
q16	679	636	630	630
q17	1775	1562	1540	1540
q18	7792	7555	7562	7555
q19	1655	1607	1444	1444
q20	2011	1816	1775	1775
q21	5427	5256	5265	5256
q22	623	567	577	567
Total cold run time: 72324 ms
Total hot run time: 59290 ms

@doris-robot

TPC-DS: Total hot run time: 191723 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1c2bb29dd64f003618525558e10c77773c15650e, data reload: false

query1	969	374	373	373
query2	6534	2121	2058	2058
query3	6717	217	217	217
query4	34292	23731	23581	23581
query5	4388	469	463	463
query6	309	201	189	189
query7	4612	301	314	301
query8	299	231	227	227
query9	9474	2741	2713	2713
query10	488	262	290	262
query11	18120	15140	15230	15140
query12	152	101	101	101
query13	1642	405	409	405
query14	9093	7620	7795	7620
query15	249	184	196	184
query16	8197	440	462	440
query17	1657	549	526	526
query18	2125	289	296	289
query19	359	152	147	147
query20	129	113	114	113
query21	208	100	99	99
query22	4640	4395	4206	4206
query23	35734	34067	34404	34067
query24	10536	2420	2530	2420
query25	611	418	438	418
query26	1225	161	161	161
query27	2797	289	295	289
query28	7739	2458	2428	2428
query29	709	400	412	400
query30	299	151	155	151
query31	1017	818	800	800
query32	92	56	61	56
query33	756	286	270	270
query34	956	515	533	515
query35	898	731	736	731
query36	1076	929	950	929
query37	121	75	84	75
query38	4408	4214	4285	4214
query39	1478	1423	1448	1423
query40	278	98	99	98
query41	49	44	48	44
query42	113	105	98	98
query43	554	485	485	485
query44	1222	803	816	803
query45	185	169	164	164
query46	1150	713	680	680
query47	1977	1851	1872	1851
query48	417	309	319	309
query49	1181	393	385	385
query50	810	383	393	383
query51	7268	7110	7171	7110
query52	104	96	88	88
query53	256	178	181	178
query54	1091	400	409	400
query55	87	79	80	79
query56	263	239	230	230
query57	1308	1188	1146	1146
query58	241	218	218	218
query59	3162	3140	2938	2938
query60	294	252	245	245
query61	107	109	108	108
query62	872	663	685	663
query63	214	192	191	191
query64	5257	649	729	649
query65	3302	3225	3297	3225
query66	1414	335	362	335
query67	15921	16188	15652	15652
query68	4953	555	559	555
query69	411	260	256	256
query70	1215	1130	1092	1092
query71	332	243	248	243
query72	6292	4010	4026	4010
query73	776	361	367	361
query74	10227	9089	8931	8931
query75	3420	2676	2645	2645
query76	2884	1026	1116	1026
query77	430	289	290	289
query78	10417	9547	9423	9423
query79	1115	612	620	612
query80	724	437	445	437
query81	509	238	238	238
query82	846	119	182	119
query83	269	157	154	154
query84	235	69	76	69
query85	1065	299	294	294
query86	311	299	302	299
query87	4747	4631	4578	4578
query88	3441	2244	2204	2204
query89	400	298	300	298
query90	2089	187	190	187
query91	150	104	104	104
query92	67	55	51	51
query93	1132	540	550	540
query94	785	303	298	298
query95	352	244	246	244
query96	617	281	280	280
query97	2862	2667	2641	2641
query98	221	195	199	195
query99	1564	1294	1306	1294
Total cold run time: 299060 ms
Total hot run time: 191723 ms

@doris-robot

ClickBench: Total hot run time: 32.14 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1c2bb29dd64f003618525558e10c77773c15650e, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.23	0.07	0.07
query4	1.62	0.10	0.10
query5	0.42	0.41	0.41
query6	1.16	0.67	0.66
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.56	0.53	0.52
query10	0.55	0.55	0.56
query11	0.15	0.11	0.10
query12	0.14	0.12	0.10
query13	0.61	0.61	0.59
query14	2.71	2.73	2.84
query15	0.92	0.83	0.83
query16	0.39	0.39	0.39
query17	1.05	1.07	1.03
query18	0.22	0.21	0.21
query19	1.98	1.84	2.09
query20	0.02	0.02	0.01
query21	15.36	0.61	0.60
query22	2.56	2.56	1.37
query23	17.18	0.88	0.71
query24	3.33	1.79	0.85
query25	0.20	0.15	0.12
query26	0.57	0.14	0.14
query27	0.05	0.05	0.04
query28	9.83	1.12	1.09
query29	12.58	3.28	3.23
query30	0.25	0.06	0.06
query31	2.85	0.39	0.39
query32	3.26	0.47	0.47
query33	3.04	3.00	3.05
query34	17.01	4.48	4.52
query35	4.55	4.58	4.56
query36	0.66	0.50	0.48
query37	0.09	0.06	0.06
query38	0.05	0.03	0.04
query39	0.03	0.02	0.02
query40	0.16	0.14	0.13
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 106.65 s
Total hot run time: 32.14 s

@feiniaofeiafei
Contributor Author

run p0

sql "insert into test1 values(1,1),(2,1),(3,1);"
sql "create table test2(a int, b int) distributed by hash(a) properties('replication_num'='1');"
sql "insert into test2 values(1,105),(2,105);"
qt_full_join_uniform_should_not_eliminate_group_by_key "select t2.b,t1.b from test1 t1 full join (select * from test2 where b=105) t2 on t1.a=t2.a group by t2.b,t1.b order by 1,2;"
Contributor

Does the original code have a bug for this case?
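
As background for the regression case under review, the following sketch simulates the query on the inserted rows. Inside the subquery the filter b=105 makes test2.b constant, but the full join pads unmatched test1 rows with NULL, so t2.b is no longer uniform above the join and must stay in the group-by list. The class `FullJoinUniformSketch` is hypothetical, and because the CREATE TABLE for test1 is not shown, its (a int, b int) layout is an assumption. The sketch says nothing about how the original code behaved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy simulation for illustration; not Doris code.
class FullJoinUniformSketch {
    // Rows are {a, b}. The test1 layout is assumed to match test2.
    static final int[][] TEST1 = {{1, 1}, {2, 1}, {3, 1}};
    static final int[][] TEST2 = {{1, 105}, {2, 105}};

    /** Full outer join on a, returning (t2.b, t1.b) pairs; null marks the padded side. */
    static List<List<Integer>> fullJoin() {
        List<List<Integer>> rows = new ArrayList<>();
        boolean[] matchedRight = new boolean[TEST2.length];
        for (int[] left : TEST1) {
            boolean matched = false;
            for (int i = 0; i < TEST2.length; i++) {
                // Apply the subquery filter b = 105, then the join condition on a.
                if (TEST2[i][1] == 105 && TEST2[i][0] == left[0]) {
                    rows.add(Arrays.<Integer>asList(TEST2[i][1], left[1]));
                    matched = true;
                    matchedRight[i] = true;
                }
            }
            if (!matched) {
                rows.add(Arrays.<Integer>asList(null, left[1])); // t2 side padded with NULL
            }
        }
        for (int i = 0; i < TEST2.length; i++) {
            if (TEST2[i][1] == 105 && !matchedRight[i]) {
                rows.add(Arrays.<Integer>asList(TEST2[i][1], null)); // t1 side padded with NULL
            }
        }
        return rows;
    }

    /** Groups produced by GROUP BY t2.b, t1.b. */
    static Set<List<Integer>> groupByBoth() {
        return new LinkedHashSet<>(fullJoin());
    }

    /** Groups that would result if t2.b were wrongly dropped from the keys. */
    static Set<Integer> groupByT1bOnly() {
        Set<Integer> groups = new LinkedHashSet<>();
        for (List<Integer> row : fullJoin()) {
            groups.add(row.get(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupByBoth());    // prints [[105, 1], [null, 1]]
        System.out.println(groupByT1bOnly()); // prints [1]
    }
}
```

Keeping both keys yields two groups, while dropping t2.b collapses them into one, which is why the test name says the key should not be eliminated.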

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 29, 2024
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit ed4f7fb into apache:master Dec 2, 2024
28 of 29 checks passed
feiniaofeiafei added a commit to feiniaofeiafei/doris that referenced this pull request Dec 6, 2024
…e#43391)
morrySnow pushed a commit that referenced this pull request Dec 10, 2024
morrySnow pushed a commit that referenced this pull request Jan 8, 2025
…46352)

### What problem does this PR solve?

Related PR: #43391

Problem Summary:
The uniform trait computed for repeat is not correct, so it is removed temporarily.
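
The commit message does not say what exactly was wrong. As general background only, a repeat (the expansion behind GROUPING SETS, ROLLUP, and CUBE) emits one copy of each input row per grouping set and sets the columns outside that set to NULL, so a column that was constant below the repeat need not be constant above it. The sketch below illustrates that effect for ROLLUP(a, b); the class `RepeatUniformSketch` and its helpers are hypothetical and are not taken from the follow-up PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Toy simulation for illustration; not Doris code.
class RepeatUniformSketch {

    /** Expands each (a, b) row for ROLLUP(a, b): grouping sets (a, b), (a), and (). */
    static List<List<Integer>> rollupExpand(List<List<Integer>> input) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : input) {
            out.add(Arrays.<Integer>asList(row.get(0), row.get(1))); // grouping set (a, b)
            out.add(Arrays.<Integer>asList(row.get(0), null));       // grouping set (a)
            out.add(Arrays.<Integer>asList(null, null));             // grouping set ()
        }
        return out;
    }

    /** True when every row holds the same value (NULL included) in the given column. */
    static boolean isUniform(List<List<Integer>> rows, int col) {
        Integer first = rows.get(0).get(col);
        for (List<Integer> row : rows) {
            if (!Objects.equals(first, row.get(col))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Column a is the constant 2 in every input row.
        List<List<Integer>> input = List.of(List.of(2, 10), List.of(2, 20));
        System.out.println(isUniform(input, 0));               // prints true
        System.out.println(isUniform(rollupExpand(input), 0)); // prints false
    }
}
```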
github-actions bot pushed a commit that referenced this pull request Jan 8, 2025
…46352)
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.4-merged reviewed