branch-3.0: [opt](metrics) Remove IntervalHistogramStat #47459 #47606

Open
wants to merge 1 commit into base: branch-3.0

zhiqiang-hhhh
Contributor

Cherry-picked from #47459.

### What problem does this PR solve?

Computing average values in Prometheus is better than maintaining them with `IntervalHistogramStat`.

Related PR: apache#43144

For example, we use `task_execution_time_ns_avg_in_last_1000_times`,
which equals `SUM(cost 0, ... cost 999) / 1000`, to represent the
average execution time. This has two problems:

1. Every update of its data source, `_task_execution_time_ns_statistic`,
acquires a lock.
2. `task_execution_time_ns_avg_in_last_1000_times` does not drop to zero
after a set of tasks has finished and there are no more tasks to run. For
example, the graph keeps showing a flat, non-zero line long after all
tasks have finished.
<img width="416" alt="image"
src="https://github.com/user-attachments/assets/e874a077-1e74-4700-9dd8-4cf9625bc8f8"
/>
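
To illustrate the second problem, here is a minimal Python sketch. It is not the Doris implementation: `LastNAverage` and `scrape_after_idle` are hypothetical names that only mimic an average over the last N samples. Once tasks stop, no new samples arrive, so every later scrape returns the same stale value.

```python
from collections import deque


class LastNAverage:
    """Average of the most recent `capacity` samples (hypothetical stand-in)."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest sample when full.
        self._samples = deque(maxlen=capacity)

    def add(self, value: int) -> None:
        self._samples.append(value)

    def average(self) -> float:
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)


def scrape_after_idle(costs_ns, idle_scrapes, capacity=1000):
    """Record every task cost, then read the metric `idle_scrapes` times
    while no further tasks run."""
    stat = LastNAverage(capacity)
    for cost in costs_ns:
        stat.add(cost)
    # Nothing is added during the idle period, so the value never changes.
    return [stat.average() for _ in range(idle_scrapes)]


print(scrape_after_idle([100, 200, 300], idle_scrapes=3))
# prints [200.0, 200.0, 200.0]
```

The repeated `200.0` corresponds to the straight line in the screenshot above.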

The problem can be fixed by:
1. Using `task_execution_time_ns_total`, an atomic counter, to store the
running sum of the execution time of each iteration.
2. Using the Prometheus `irate` function to build an equivalent
substitute, such as
`irate(doris_be_task_execution_time_ns_total[$__rate_interval])/doris_be_thread_pool_active_threads`

<img width="349" alt="image"
src="https://github.com/user-attachments/assets/2b0b41bd-8709-4cab-84c1-cf11c4cc3ac9"
/>
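
A rough Python sketch of why the counter-based query falls to zero, assuming `irate` divides the increase between the last two samples in the range by the time between them. The `irate` helper and the sample values below are illustrative only and are not taken from this PR.

```python
def irate(samples):
    """Per-second instant rate from the last two (timestamp_s, value) samples.

    Simplified model of the Prometheus function; returns None when the
    range holds fewer than two samples.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    delta = v_last - v_prev
    if delta < 0:
        # Assumption: a decreasing counter is treated as a reset,
        # so the last value itself is taken as the increase.
        delta = v_last
    return delta / (t_last - t_prev)


# Counter grows while tasks run, then stays flat once they have finished.
busy = [(0, 0), (15, 3_000_000_000)]
idle = [(15, 3_000_000_000), (30, 3_000_000_000)]

print(irate(busy))  # prints 200000000.0
print(irate(idle))  # prints 0.0
```

Because the counter only ever increases, updating it needs no lock, and a flat counter yields a rate of zero as soon as two consecutive scrapes see the same value.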

After all tasks have finished, the curve drops to zero, which is more
reasonable.
@zhiqiang-hhhh
Contributor Author

run buildall

@Thearas
Contributor

Thearas commented Feb 7, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 40829 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4e52c84964bf467c64d56aeb88aec2da3eab72bf, data reload: false

------ Round 1 ----------------------------------
q1	17572	7400	7274	7274
q2	2065	172	176	172
q3	10564	1108	1188	1108
q4	10499	770	759	759
q5	7752	2896	2825	2825
q6	241	150	146	146
q7	1017	610	622	610
q8	9370	1994	2013	1994
q9	6600	6421	6445	6421
q10	7042	2292	2369	2292
q11	461	269	263	263
q12	406	217	215	215
q13	17781	3022	2998	2998
q14	257	217	210	210
q15	584	519	571	519
q16	687	577	602	577
q17	986	546	543	543
q18	7260	6723	6847	6723
q19	1389	1081	961	961
q20	469	207	202	202
q21	4065	3262	3067	3067
q22	1077	985	950	950
Total cold run time: 108144 ms
Total hot run time: 40829 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7234	7227	7223	7223
q2	328	235	236	235
q3	2910	2927	2906	2906
q4	2049	1851	1903	1851
q5	5716	5755	5690	5690
q6	228	139	140	139
q7	2294	1778	1813	1778
q8	3354	3537	3544	3537
q9	8820	8919	8823	8823
q10	3597	3582	3565	3565
q11	612	507	504	504
q12	820	618	605	605
q13	9593	3188	3178	3178
q14	313	266	268	266
q15	582	531	538	531
q16	696	668	646	646
q17	1862	1631	1641	1631
q18	8198	7822	7652	7652
q19	1667	1479	1517	1479
q20	2120	1842	1884	1842
q21	5539	5370	5345	5345
q22	1173	1032	1052	1032
Total cold run time: 69705 ms
Total hot run time: 60458 ms

@doris-robot

TPC-DS: Total hot run time: 197589 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4e52c84964bf467c64d56aeb88aec2da3eab72bf, data reload: false

query1	1307	960	924	924
query2	6237	2099	2004	2004
query3	10828	4389	4145	4145
query4	67122	28662	23802	23802
query5	5003	478	473	473
query6	422	183	200	183
query7	5596	317	315	315
query8	313	227	222	222
query9	8951	2724	2708	2708
query10	455	286	259	259
query11	17406	15200	15977	15200
query12	166	115	104	104
query13	1517	446	440	440
query14	9893	7948	6809	6809
query15	211	184	189	184
query16	6743	482	537	482
query17	1121	578	616	578
query18	1798	345	333	333
query19	208	161	180	161
query20	119	114	114	114
query21	211	102	106	102
query22	4901	4433	4428	4428
query23	34625	33923	34271	33923
query24	6087	2905	2895	2895
query25	541	437	425	425
query26	654	172	174	172
query27	1862	357	352	352
query28	4050	2504	2456	2456
query29	708	452	444	444
query30	241	163	170	163
query31	985	826	858	826
query32	66	56	57	56
query33	479	293	289	289
query34	901	510	538	510
query35	875	738	728	728
query36	1071	981	993	981
query37	121	77	77	77
query38	4118	4015	3988	3988
query39	1523	1465	1456	1456
query40	204	108	101	101
query41	50	52	47	47
query42	117	103	101	101
query43	529	493	507	493
query44	1204	852	847	847
query45	183	175	167	167
query46	1158	727	744	727
query47	2045	1911	1959	1911
query48	498	380	386	380
query49	726	412	398	398
query50	828	431	426	426
query51	7319	7256	7112	7112
query52	99	91	89	89
query53	262	183	178	178
query54	573	447	461	447
query55	75	77	76	76
query56	257	256	242	242
query57	1225	1165	1116	1116
query58	223	204	206	204
query59	3083	2974	2841	2841
query60	272	254	250	250
query61	133	106	108	106
query62	862	749	725	725
query63	212	186	198	186
query64	1392	673	643	643
query65	3279	3186	3168	3168
query66	719	287	291	287
query67	16057	15717	15820	15717
query68	4299	605	578	578
query69	470	276	269	269
query70	1138	1144	1151	1144
query71	352	263	259	259
query72	6430	4007	4066	4007
query73	746	358	358	358
query74	10250	9104	9078	9078
query75	3380	2634	2658	2634
query76	1902	1091	1110	1091
query77	486	276	275	275
query78	10544	9628	9583	9583
query79	1645	614	607	607
query80	884	430	425	425
query81	516	237	238	237
query82	1247	121	122	121
query83	250	149	143	143
query84	285	87	89	87
query85	888	300	298	298
query86	338	310	300	300
query87	4464	4402	4231	4231
query88	3770	2364	2332	2332
query89	416	292	289	289
query90	2023	191	186	186
query91	181	149	172	149
query92	68	51	51	51
query93	1907	566	552	552
query94	759	302	290	290
query95	358	254	259	254
query96	615	281	292	281
query97	3308	3201	3181	3181
query98	214	211	212	211
query99	1690	1428	1392	1392
Total cold run time: 318842 ms
Total hot run time: 197589 ms

@doris-robot

ClickBench: Total hot run time: 32.67 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4e52c84964bf467c64d56aeb88aec2da3eab72bf, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.06	0.07
query4	1.62	0.10	0.10
query5	0.54	0.50	0.51
query6	1.19	0.72	0.73
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.55	0.50	0.50
query10	0.55	0.55	0.55
query11	0.14	0.11	0.11
query12	0.14	0.11	0.10
query13	0.60	0.60	0.60
query14	2.83	2.85	2.73
query15	0.89	0.82	0.84
query16	0.38	0.41	0.40
query17	1.05	1.00	1.07
query18	0.23	0.22	0.22
query19	1.97	1.82	2.04
query20	0.01	0.01	0.02
query21	15.35	0.58	0.58
query22	2.62	2.83	1.77
query23	17.03	0.96	0.87
query24	3.18	0.93	1.49
query25	0.21	0.17	0.07
query26	0.52	0.14	0.13
query27	0.06	0.03	0.04
query28	10.29	1.12	1.06
query29	12.61	3.29	3.25
query30	0.25	0.06	0.07
query31	2.85	0.39	0.38
query32	3.29	0.46	0.45
query33	2.98	3.03	3.04
query34	16.98	4.42	4.49
query35	4.49	4.51	4.54
query36	0.66	0.47	0.51
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.02
query40	0.15	0.13	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 106.94 s
Total hot run time: 32.67 s
