[Enhancement] Support some compress functions #47307

Open
wants to merge 1 commit into
base: master

@lzyy2024 lzyy2024 commented Jan 22, 2025

What problem does this PR solve?

Added the compress and uncompress functions, similar to MySQL's COMPRESS() and UNCOMPRESS().

Issue Number: close #45530

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

@zclllyybb zclllyybb left a comment

And remember to format your file.


Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
uint32_t result, size_t input_rows_count) const override {
// LOG(INFO) << "Executing FunctionCompress with " << input_rows_count
Contributor

Remove these commented-out lines.

col_data[idx] = '0', col_data[idx + 1] = 'x';
for (int i = 0; i < 4; i++) {
unsigned char byte = (value >> (i * 8)) & 0xFF;
col_data[idx + 2 + i * 2] = "0123456789ABCDEF"[byte >> 4]; // 高4位 (high 4 bits)
Contributor

Don't use Chinese.

Contributor

And replace the magic values with named constants.


auto st = compression_codec->compress(data, &compressed_str);

if (!st.ok()) {
Contributor

Add a comment about when it will fail.

Contributor

We don't need to modify this file anymore.

std::string func_name = "compress";
InputTypeSet input_types = {TypeIndex::String};

// 压缩多个不同的字符串 (compress multiple different strings)
Contributor

Don't use Chinese comments.

std::string uncompressed;
Slice data;
Slice uncompressed_slice;
for (int row = 0; row < input_rows_count; row++) {
Contributor

use size_t, not int

illegal = 1;
} else {
if (data[0] != '0' || data[1] != 'x') {
LOG(INFO) << "illegal: "
Contributor

Don't log info here.

if (x >= 'A' && x <= 'F') return true;
return false;
};
auto trans = [](char x) {
Contributor

Just use from_chars and to_chars to replace your hand-written lambdas.
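As a rough illustration of the suggestion above, the following sketch decodes pairs of hexadecimal digits with std::from_chars instead of a hand-written digit lambda. The helper names parse_hex_byte and hex_to_bytes are hypothetical and are not taken from the PR; the sketch only assumes the input is a run of two-digit hex pairs.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: converts exactly two hex digits into one byte.
// std::from_chars accepts both upper- and lower-case digits in base 16 and
// does not accept a "0x" prefix, a sign (for unsigned types), or whitespace.
std::optional<unsigned char> parse_hex_byte(std::string_view two_digits) {
    if (two_digits.size() != 2) {
        return std::nullopt;
    }
    const char* first = two_digits.data();
    const char* last = first + 2;
    unsigned value = 0;
    auto [ptr, ec] = std::from_chars(first, last, value, 16);
    // Require that both characters were consumed, so "1G" is rejected.
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    return static_cast<unsigned char>(value);
}

// Hypothetical helper: decodes a string of hex digit pairs into raw bytes.
std::optional<std::string> hex_to_bytes(std::string_view hex) {
    if (hex.size() % 2 != 0) {
        return std::nullopt;
    }
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        auto byte = parse_hex_byte(hex.substr(i, 2));
        if (!byte) {
            return std::nullopt;
        }
        out.push_back(static_cast<char>(*byte));
    }
    return out;
}
```

Checking that the returned pointer equals the end of the range matters here: from_chars stops at the first non-digit without reporting an error if at least one digit was parsed.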

// Print the compressed string (after compression)
// LOG(INFO) << "Compressed string at row " << row << ": "
// << std::string(reinterpret_cast<const char*>(col_data.data()));
col_offset[row] = col_offset[row - 1] + 10 + compressed_str.size() * 2;
Contributor

What's this value for?

Author

The first ten characters of the compressed value are "0x" followed by eight hexadecimal digits for the length; after that, each compressed byte is split into two hexadecimal digits.
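The layout described in this reply can be sketched as follows. This is a minimal illustration, not the PR's code: the names encode_compressed, append_hex_byte, and the k* constants are hypothetical, and the sketch assumes the length is written as four little-endian bytes, as the quoted loop over `value >> (i * 8)` suggests.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical named constants replacing the magic number 10.
constexpr std::size_t kPrefixLen = 2;        // "0x"
constexpr std::size_t kLengthHexDigits = 8;  // 4 length bytes, 2 hex digits each
constexpr std::size_t kHeaderLen = kPrefixLen + kLengthHexDigits;
constexpr std::array<char, 16> kHexDigits = {'0', '1', '2', '3', '4', '5', '6', '7',
                                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

// Appends one byte as two uppercase hex digits.
inline void append_hex_byte(std::string& out, unsigned char byte) {
    out.push_back(kHexDigits[byte >> 4]);
    out.push_back(kHexDigits[byte & 0x0F]);
}

// Builds "0x" + little-endian length (8 hex digits) + 2 hex digits per byte.
std::string encode_compressed(std::uint32_t uncompressed_len, std::string_view compressed) {
    std::string out;
    out.reserve(kHeaderLen + 2 * compressed.size());
    out += "0x";
    for (int i = 0; i < 4; ++i) {
        append_hex_byte(out, static_cast<unsigned char>((uncompressed_len >> (i * 8)) & 0xFF));
    }
    for (char c : compressed) {
        append_hex_byte(out, static_cast<unsigned char>(c));
    }
    // This is the size the offset arithmetic (10 + 2 * size) accounts for.
    assert(out.size() == kHeaderLen + 2 * compressed.size());
    return out;
}
```

With this layout, an uncompressed length of 5 and compressed bytes 0x78 0x9C encode as 0x05000000789C, which matches the start of the expected regression output quoted later in the review.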

Contributor

@zclllyybb zclllyybb left a comment

Fix the correctness.

be/src/vec/functions/function_compress.cpp (outdated)
Slice data;
for (size_t row = 0; row < input_rows_count; row++) {
null_map[row] = false;
const auto& str = arg_column.get_data_at(row);
Contributor

No need to use a virtual function here.

\N \N

-- !const_not_nullable --
0x05000000789C73C92FCA2C060005B00202 0x446F726973
Contributor

carefully review your result!!!

Slice data;
Slice uncompressed_slice;
for (size_t row = 0; row < input_rows_count; row++) {
auto check = [](char x) {
Contributor

Try to use a standard library function first.

const auto& str = arg_column.get_data_at(row);
data = Slice(str.data, str.size);

int illegal = 0;
Contributor

why not bool?

unsigned char* src = compressed_str.data();
{
for (size_t i = 0; i < compressed_str.size(); i++) {
col_data[idx] =
Contributor

This is too tricky. Try to simplify code like this.

Slice data;
Slice uncompressed_slice;
for (size_t row = 0; row < input_rows_count; row++) {
std::function<bool(char)> check = [](char x) {
Contributor

why not use isxdigit?
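A sketch of what the isxdigit suggestion could look like for input validation. The function name is_valid_compressed_text is hypothetical. The "0x" prefix check and the minimum size of 10 come from the quoted snippets; the even-length requirement is an assumption made here. Note that std::isxdigit also accepts lowercase a-f, whereas the quoted lambda appears to check only 'A' to 'F'.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical validator for the textual form "0x" + 8 length digits + payload.
bool is_valid_compressed_text(std::string_view data) {
    // At least the 10-character header, and whole bytes only (assumption).
    if (data.size() < 10 || data.size() % 2 != 0) {
        return false;
    }
    if (data[0] != '0' || data[1] != 'x') {
        return false;
    }
    // std::isxdigit requires a value representable as unsigned char,
    // so cast before the call to avoid undefined behavior on negative chars.
    return std::all_of(data.begin() + 2, data.end(), [](char c) {
        return std::isxdigit(static_cast<unsigned char>(c)) != 0;
    });
}
```

Using a plain bool return value here also lines up with the earlier review question about using bool instead of an int flag.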

@@ -854,8 +854,13 @@ class ZlibBlockCompression : public BlockCompressionCodec {
Slice s(*output);

auto zres = ::compress((Bytef*)s.data, &s.size, (Bytef*)input.data, input.size);
if (zres != Z_OK) {
return Status::InvalidArgument("Fail to do ZLib compress, error={}", zError(zres));
if (zres == Z_MEM_ERROR) {
Contributor

Also change the other similar calls.

Contributor

It may be better to split them into another PR.

implements UnaryExpression, ExplicitlyCastableSignature, PropagateNullable {

public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
FunctionSignature.ret(StringType.INSTANCE).args(StringType.INSTANCE));

This comment was marked as resolved.

implements UnaryExpression, ExplicitlyCastableSignature, AlwaysNullable {

public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
FunctionSignature.ret(StringType.INSTANCE).args(StringType.INSTANCE));

This comment was marked as resolved.


unsigned int length = 0;
for (size_t i = 2; i <= 9; i += 2) {
unsigned char byte = (hex_ctoi.at(data[i]) << 4) + hex_ctoi.at(data[i + 1]);
Contributor

remove hex_ctoi and just use from_chars

unsigned int length = 0;
for (size_t i = 2; i <= 9; i += 2) {
unsigned char byte = (hex_ctoi.at(data[i]) << 4) + hex_ctoi.at(data[i + 1]);
length += (byte << (8 * (i / 2 - 1))); //Little Endian : 0x01000000 -> 1

This comment was marked as resolved.
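The length-decoding loop quoted above reads four little-endian bytes from hex digits 2 through 9. One possible alternative, sketched here under the assumption that the header is always "0x" plus eight hex digits, is to parse all eight digits in one std::from_chars call and then reverse the byte order. The name parse_length_header is hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: reads the 4-byte little-endian length stored as the
// eight hex digits that follow the "0x" prefix.
std::optional<std::uint32_t> parse_length_header(std::string_view data) {
    if (data.size() < 10 || data[0] != '0' || data[1] != 'x') {
        return std::nullopt;
    }
    const char* first = data.data() + 2;
    const char* last = first + 8;
    std::uint32_t big_endian = 0;
    // Parsing the digits left to right yields the bytes in big-endian order.
    auto [ptr, ec] = std::from_chars(first, last, big_endian, 16);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    // Reverse the four bytes to obtain the little-endian value.
    return ((big_endian & 0x000000FFu) << 24) | ((big_endian & 0x0000FF00u) << 8) |
           ((big_endian & 0x00FF0000u) >> 8) | ((big_endian & 0xFF000000u) >> 24);
}
```

This agrees with the comment in the quoted code: the header 0x01000000 decodes to 1.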

std::string uncompressed;
Slice data;
Slice uncompressed_slice;
for (size_t row = 0; row < input_rows_count; row++) {

This comment was marked as resolved.

}
idx += 10;

col_data.resize(col_data.size() + 2 * compressed_str.size());

This comment was marked as resolved.

//Converts a hexadecimal readable string to a compressed byte stream
std::string s(((int)data.size - 10) / 2, ' '); // byte stream data.size >= 10
for (size_t i = 10, j = 0; i < data.size; i += 2, j++) {
s[j] = (hex_ctoi.at(data[i]) << 4) + hex_ctoi.at(data[i + 1]);
Contributor

ditto

@lzyy2024
Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32100 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false

------ Round 1 ----------------------------------
q1	17575	5496	5376	5376
q2	2052	334	182	182
q3	10468	1303	735	735
q4	10229	969	517	517
q5	7663	2384	2167	2167
q6	191	165	136	136
q7	925	764	608	608
q8	9235	1394	1149	1149
q9	5219	4920	4833	4833
q10	6811	2314	1890	1890
q11	476	280	258	258
q12	341	358	214	214
q13	17760	3661	3052	3052
q14	228	244	206	206
q15	512	471	459	459
q16	646	619	598	598
q17	558	860	317	317
q18	7189	6475	6417	6417
q19	1807	966	537	537
q20	304	319	185	185
q21	2804	2173	1957	1957
q22	356	330	307	307
Total cold run time: 103349 ms
Total hot run time: 32100 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5511	5460	5437	5437
q2	248	327	233	233
q3	2242	2638	2307	2307
q4	1439	1838	1365	1365
q5	4323	4725	4681	4681
q6	167	156	124	124
q7	2080	1986	1810	1810
q8	2656	2811	2662	2662
q9	7293	7156	7173	7156
q10	2932	3258	2769	2769
q11	572	516	494	494
q12	717	749	595	595
q13	3494	3934	3293	3293
q14	267	289	267	267
q15	505	474	464	464
q16	658	693	641	641
q17	1207	1731	1256	1256
q18	7613	7379	7400	7379
q19	768	1157	1031	1031
q20	2009	2029	1866	1866
q21	5644	5218	4986	4986
q22	597	652	555	555
Total cold run time: 52942 ms
Total hot run time: 51371 ms

@doris-robot

TPC-DS: Total hot run time: 184954 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false

query1	962	376	368	368
query2	6516	2097	2071	2071
query3	6802	211	218	211
query4	33731	23198	22991	22991
query5	4407	575	438	438
query6	270	185	173	173
query7	4596	496	311	311
query8	282	244	214	214
query9	9565	2688	2704	2688
query10	472	319	261	261
query11	18192	15054	15022	15022
query12	156	112	103	103
query13	1649	517	409	409
query14	9175	7286	6882	6882
query15	251	188	182	182
query16	8042	641	482	482
query17	1621	744	564	564
query18	2108	401	306	306
query19	229	188	155	155
query20	115	109	110	109
query21	212	123	100	100
query22	4110	4433	4267	4267
query23	33827	33022	32880	32880
query24	6450	2291	2288	2288
query25	529	488	377	377
query26	1198	265	156	156
query27	1997	463	333	333
query28	5369	2458	2448	2448
query29	719	545	418	418
query30	234	181	152	152
query31	934	849	774	774
query32	90	59	67	59
query33	496	354	330	330
query34	734	845	492	492
query35	800	817	741	741
query36	978	1063	968	968
query37	120	104	75	75
query38	4136	4246	4012	4012
query39	1461	1381	1398	1381
query40	221	112	100	100
query41	53	49	55	49
query42	118	97	101	97
query43	511	505	477	477
query44	1332	841	806	806
query45	175	173	163	163
query46	853	1032	637	637
query47	1802	1810	1729	1729
query48	380	404	305	305
query49	783	495	390	390
query50	621	641	390	390
query51	4237	4228	4089	4089
query52	111	103	90	90
query53	223	255	184	184
query54	472	480	413	413
query55	81	82	79	79
query56	258	257	245	245
query57	1158	1141	1066	1066
query58	243	237	245	237
query59	3158	2999	2979	2979
query60	275	266	260	260
query61	119	119	115	115
query62	777	729	637	637
query63	240	199	182	182
query64	4436	995	654	654
query65	3214	3196	3159	3159
query66	1064	407	313	313
query67	15922	15579	15428	15428
query68	4284	817	546	546
query69	466	290	261	261
query70	1212	1103	1118	1103
query71	374	281	250	250
query72	5796	3862	3799	3799
query73	648	747	360	360
query74	10488	8941	8914	8914
query75	3156	3155	2682	2682
query76	3139	1143	755	755
query77	492	339	271	271
query78	9923	10078	9404	9404
query79	2446	797	608	608
query80	788	528	465	465
query81	538	316	244	244
query82	348	151	125	125
query83	170	173	154	154
query84	237	88	77	77
query85	754	360	304	304
query86	440	321	306	306
query87	4424	4481	4498	4481
query88	4173	2165	2210	2165
query89	398	321	302	302
query90	1919	190	188	188
query91	137	139	108	108
query92	70	60	55	55
query93	2644	879	535	535
query94	746	408	294	294
query95	338	262	262	262
query96	484	603	277	277
query97	2775	2886	2750	2750
query98	237	198	191	191
query99	1286	1372	1254	1254
Total cold run time: 281702 ms
Total hot run time: 184954 ms

@doris-robot

ClickBench: Total hot run time: 30.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false

query1	0.03	0.03	0.04
query2	0.08	0.04	0.03
query3	0.23	0.07	0.07
query4	1.61	0.11	0.10
query5	0.42	0.42	0.38
query6	1.15	0.66	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.58	0.49	0.53
query10	0.55	0.56	0.56
query11	0.14	0.10	0.10
query12	0.13	0.11	0.11
query13	0.60	0.60	0.60
query14	2.85	2.81	2.88
query15	0.89	0.82	0.81
query16	0.39	0.38	0.39
query17	1.05	1.06	1.05
query18	0.22	0.21	0.20
query19	1.90	1.83	2.02
query20	0.02	0.01	0.01
query21	15.36	1.02	0.60
query22	0.75	0.75	0.65
query23	15.37	1.36	0.60
query24	2.91	1.92	0.87
query25	0.16	0.19	0.14
query26	0.23	0.14	0.13
query27	0.06	0.05	0.06
query28	14.17	1.02	0.43
query29	12.60	3.98	3.27
query30	0.26	0.09	0.06
query31	2.82	0.61	0.37
query32	3.24	0.55	0.46
query33	2.99	3.02	3.09
query34	16.54	5.17	4.51
query35	4.52	4.44	4.45
query36	0.65	0.49	0.52
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.02
query40	0.17	0.13	0.13
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 106.04 s
Total hot run time: 30.68 s

@lzyy2024
Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32971 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false

------ Round 1 ----------------------------------
q1	17843	6172	5422	5422
q2	2040	300	178	178
q3	10412	1224	728	728
q4	10882	965	536	536
q5	8400	2410	2141	2141
q6	192	176	134	134
q7	906	820	595	595
q8	9228	1339	1150	1150
q9	5785	5158	5032	5032
q10	6988	2361	1956	1956
q11	483	290	268	268
q12	344	370	227	227
q13	18216	3998	3387	3387
q14	272	251	243	243
q15	528	482	477	477
q16	649	626	589	589
q17	569	872	337	337
q18	8233	6545	6461	6461
q19	2878	984	543	543
q20	303	310	192	192
q21	2714	2218	2045	2045
q22	362	332	330	330
Total cold run time: 108227 ms
Total hot run time: 32971 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5710	5517	5469	5469
q2	231	318	233	233
q3	2257	2622	2340	2340
q4	1412	1807	1395	1395
q5	4328	4781	4883	4781
q6	165	162	129	129
q7	2091	1922	1827	1827
q8	2687	2805	2659	2659
q9	7273	7260	7262	7260
q10	3020	3212	2759	2759
q11	586	520	498	498
q12	675	790	601	601
q13	3504	3971	3293	3293
q14	284	306	281	281
q15	519	485	467	467
q16	661	677	630	630
q17	1209	1777	1246	1246
q18	7790	7450	7409	7409
q19	765	1154	1076	1076
q20	2000	2050	1919	1919
q21	5653	5124	5012	5012
q22	631	607	571	571
Total cold run time: 53451 ms
Total hot run time: 51855 ms

@doris-robot

TPC-DS: Total hot run time: 191631 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false

query1	1306	964	932	932
query2	6184	2038	2033	2033
query3	11103	4702	4399	4399
query4	61069	29129	23025	23025
query5	5535	611	458	458
query6	432	204	183	183
query7	5529	511	307	307
query8	331	247	233	233
query9	8032	2708	2701	2701
query10	469	305	259	259
query11	17709	15224	15513	15224
query12	168	122	114	114
query13	1465	546	409	409
query14	11082	7040	6994	6994
query15	210	206	197	197
query16	7241	636	484	484
query17	1201	730	591	591
query18	1910	422	335	335
query19	205	194	165	165
query20	123	114	118	114
query21	225	131	106	106
query22	4449	4470	4543	4470
query23	34433	33834	33260	33260
query24	5996	2367	2297	2297
query25	460	467	404	404
query26	649	279	157	157
query27	1809	459	333	333
query28	4055	2489	2456	2456
query29	525	545	431	431
query30	214	192	158	158
query31	929	915	837	837
query32	64	60	57	57
query33	438	366	306	306
query34	742	872	503	503
query35	816	867	758	758
query36	1033	1051	950	950
query37	115	107	78	78
query38	4310	4362	4265	4265
query39	1508	1448	1442	1442
query40	217	113	103	103
query41	51	51	50	50
query42	124	109	102	102
query43	507	516	494	494
query44	1338	846	857	846
query45	183	173	171	171
query46	873	1054	654	654
query47	1891	1979	1874	1874
query48	396	407	342	342
query49	718	493	409	409
query50	649	707	400	400
query51	4265	4313	4172	4172
query52	111	105	99	99
query53	228	254	200	200
query54	485	513	426	426
query55	81	82	82	82
query56	260	266	244	244
query57	1237	1210	1154	1154
query58	233	231	236	231
query59	3223	3364	3053	3053
query60	279	271	266	266
query61	139	112	117	112
query62	736	720	663	663
query63	225	184	185	184
query64	1286	1034	656	656
query65	3273	3124	3142	3124
query66	689	435	332	332
query67	16065	15658	15451	15451
query68	5022	809	539	539
query69	475	295	264	264
query70	1178	1161	1126	1126
query71	416	286	253	253
query72	6050	3899	3797	3797
query73	803	764	353	353
query74	9860	8792	8698	8698
query75	3220	3156	2703	2703
query76	3796	1195	748	748
query77	536	353	275	275
query78	10087	10047	9345	9345
query79	2453	805	603	603
query80	1199	524	485	485
query81	540	279	227	227
query82	355	160	126	126
query83	242	165	159	159
query84	291	92	70	70
query85	746	342	301	301
query86	377	321	301	301
query87	4546	4476	4482	4476
query88	3486	2174	2136	2136
query89	393	332	292	292
query90	1576	182	188	182
query91	135	133	110	110
query92	61	56	55	55
query93	2163	848	534	534
query94	737	402	294	294
query95	321	260	265	260
query96	489	618	279	279
query97	2815	2894	2811	2811
query98	224	192	193	192
query99	1302	1402	1318	1318
Total cold run time: 309730 ms
Total hot run time: 191631 ms

@doris-robot

ClickBench: Total hot run time: 30.67 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.25	0.06	0.07
query4	1.61	0.11	0.10
query5	0.42	0.42	0.40
query6	1.17	0.65	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.58	0.49	0.50
query10	0.56	0.56	0.54
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.60	0.60	0.61
query14	2.85	2.74	2.72
query15	0.90	0.83	0.82
query16	0.39	0.38	0.36
query17	1.05	1.01	1.00
query18	0.24	0.20	0.20
query19	1.86	1.88	2.01
query20	0.01	0.01	0.01
query21	15.36	0.99	0.58
query22	0.77	0.82	0.75
query23	15.21	1.49	0.53
query24	3.25	0.92	0.84
query25	0.17	0.26	0.12
query26	0.18	0.15	0.14
query27	0.05	0.04	0.04
query28	13.60	1.09	0.44
query29	12.60	3.98	3.33
query30	0.26	0.08	0.06
query31	2.84	0.62	0.39
query32	3.23	0.54	0.46
query33	2.97	3.06	2.99
query34	16.60	5.17	4.52
query35	4.61	4.63	4.54
query36	0.65	0.48	0.50
query37	0.09	0.06	0.05
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.13	0.14
query41	0.08	0.03	0.03
query42	0.04	0.03	0.02
query43	0.04	0.03	0.02
Total cold run time: 105.79 s
Total hot run time: 30.67 s

@lzyy2024
Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32328 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false

------ Round 1 ----------------------------------
q1	17587	5520	5400	5400
q2	2046	311	168	168
q3	10541	1284	722	722
q4	10240	962	540	540
q5	8273	2482	2182	2182
q6	195	165	135	135
q7	904	774	641	641
q8	9245	1366	1178	1178
q9	5286	4870	4929	4870
q10	6871	2353	1879	1879
q11	456	280	259	259
q12	352	358	216	216
q13	17765	3713	3109	3109
q14	232	240	206	206
q15	536	483	459	459
q16	634	616	600	600
q17	567	876	320	320
q18	7111	6386	6397	6386
q19	1677	953	548	548
q20	312	323	190	190
q21	2862	2253	2005	2005
q22	364	331	315	315
Total cold run time: 104056 ms
Total hot run time: 32328 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5710	5507	5510	5507
q2	237	324	254	254
q3	2251	2600	2267	2267
q4	1411	1801	1361	1361
q5	4357	4730	4650	4650
q6	173	163	129	129
q7	2075	1965	1892	1892
q8	2583	2829	2689	2689
q9	7428	7143	7226	7143
q10	3027	3345	2815	2815
q11	592	521	496	496
q12	672	778	609	609
q13	3498	3913	3403	3403
q14	290	305	283	283
q15	524	479	464	464
q16	631	694	636	636
q17	1240	1724	1262	1262
q18	7669	7556	7362	7362
q19	761	1067	1128	1067
q20	1975	2072	1898	1898
q21	5701	5295	5131	5131
q22	592	572	577	572
Total cold run time: 53397 ms
Total hot run time: 51890 ms

@doris-robot

TPC-DS: Total hot run time: 185720 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false

query1	979	388	367	367
query2	6520	2069	2002	2002
query3	6799	218	219	218
query4	33222	23366	23107	23107
query5	4314	612	458	458
query6	285	210	187	187
query7	4590	486	314	314
query8	302	245	229	229
query9	9612	2703	2704	2703
query10	466	306	249	249
query11	17939	15268	15179	15179
query12	157	103	102	102
query13	1670	539	391	391
query14	10394	7012	6984	6984
query15	230	190	186	186
query16	7215	619	477	477
query17	1595	701	574	574
query18	1740	393	312	312
query19	237	190	171	171
query20	122	117	111	111
query21	213	123	103	103
query22	4125	4424	4309	4309
query23	34421	33045	33169	33045
query24	6612	2295	2392	2295
query25	506	475	398	398
query26	1221	277	158	158
query27	1968	461	347	347
query28	5186	2468	2451	2451
query29	607	571	453	453
query30	232	186	168	168
query31	964	891	814	814
query32	73	64	62	62
query33	524	419	307	307
query34	741	838	519	519
query35	794	804	762	762
query36	1022	1063	941	941
query37	121	105	80	80
query38	4089	4163	4004	4004
query39	1492	1380	1454	1380
query40	205	111	103	103
query41	55	52	63	52
query42	120	103	109	103
query43	517	506	484	484
query44	1373	814	816	814
query45	177	170	163	163
query46	859	1031	647	647
query47	1779	1835	1791	1791
query48	388	401	327	327
query49	758	478	397	397
query50	633	670	392	392
query51	4188	4212	4141	4141
query52	101	106	93	93
query53	236	251	196	196
query54	488	496	404	404
query55	83	77	79	77
query56	263	267	242	242
query57	1151	1168	1073	1073
query58	241	227	246	227
query59	3010	2995	2755	2755
query60	277	265	251	251
query61	117	109	113	109
query62	792	720	664	664
query63	217	192	194	192
query64	4076	1017	637	637
query65	3245	3205	3143	3143
query66	906	414	311	311
query67	15870	15809	15640	15640
query68	5346	836	541	541
query69	443	289	253	253
query70	1195	1164	1083	1083
query71	387	282	260	260
query72	5798	3822	3776	3776
query73	655	760	363	363
query74	9923	8945	9249	8945
query75	3187	3129	2656	2656
query76	3199	1183	785	785
query77	481	367	283	283
query78	10003	10019	9345	9345
query79	3024	829	604	604
query80	682	529	446	446
query81	498	277	282	277
query82	423	155	124	124
query83	165	174	153	153
query84	239	89	76	76
query85	787	337	301	301
query86	390	323	305	305
query87	4520	4423	4439	4423
query88	5058	2177	2157	2157
query89	386	327	294	294
query90	1809	192	206	192
query91	131	133	105	105
query92	62	59	56	56
query93	2315	876	541	541
query94	664	421	308	308
query95	334	278	262	262
query96	491	619	290	290
query97	2761	2875	2725	2725
query98	229	205	195	195
query99	1287	1394	1251	1251
Total cold run time: 282296 ms
Total hot run time: 185720 ms

@doris-robot

ClickBench: Total hot run time: 30.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false

query1	0.03	0.03	0.04
query2	0.06	0.04	0.03
query3	0.24	0.06	0.07
query4	1.61	0.11	0.10
query5	0.43	0.44	0.41
query6	1.16	0.65	0.65
query7	0.02	0.01	0.01
query8	0.04	0.03	0.04
query9	0.59	0.49	0.51
query10	0.55	0.58	0.56
query11	0.14	0.11	0.11
query12	0.13	0.10	0.11
query13	0.63	0.60	0.60
query14	2.73	2.89	2.85
query15	0.89	0.84	0.82
query16	0.39	0.39	0.38
query17	1.01	1.03	1.00
query18	0.22	0.20	0.20
query19	1.86	1.77	2.09
query20	0.02	0.01	0.01
query21	15.37	0.94	0.56
query22	0.75	0.86	0.77
query23	15.15	1.48	0.60
query24	2.94	1.71	0.36
query25	0.28	0.09	0.13
query26	0.34	0.14	0.13
query27	0.05	0.07	0.06
query28	13.57	1.03	0.44
query29	12.58	3.96	3.27
query30	0.25	0.09	0.07
query31	2.82	0.60	0.37
query32	3.25	0.55	0.46
query33	3.01	3.01	3.05
query34	16.61	5.27	4.56
query35	4.51	4.54	4.52
query36	0.65	0.49	0.48
query37	0.10	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.14	0.14
query41	0.09	0.03	0.03
query42	0.04	0.02	0.03
query43	0.04	0.04	0.03
Total cold run time: 105.39 s
Total hot run time: 30.3 s

@lzyy2024
Author

run buildall

Contributor

PR approved by anyone and no changes requested.

namespace doris::vectorized {

class FunctionCompress : public IFunction {
string hex_itoc = "0123456789ABCDEF";
Contributor

Name it HEX_ITOC since it is constant data. It needs constexpr, and would be better as a std::array.

}

// first ten digits represent the length of the uncompressed string
col_data.resize(col_data.size() + 10 + 2 * compressed_str.size());
Contributor

@HappenLee HappenLee Jan 29, 2025

So what does the function do? It seems the result may be bigger than the input was before compression. Does MySQL do the same thing?

Author

Yes, MySQL does the same thing. What I do is convert the compressed byte stream into a visible hexadecimal string.

Contributor

I don't think you need to convert the compressed bytes into a visible hexadecimal string.

  1. Doing so may make the result bigger than the input was before compression.
  2. Nobody cares about the content of the compressed bytes. People only care that compress really compresses the data and that decompressing returns the same data as before compression.

Contributor

That may not be very reasonable. There's a reason for MySQL to behave like this.

  1. After compression, the corresponding memory is just a stream of bytes, so any byte value is possible. Simply interpreting it as chars doesn’t keep consistency. Consider a memory region containing “a\b”: after printing, it’s “” because ‘\b’ deletes ‘a’.
  2. As for the compression ratio, it’s guaranteed by the compression algorithm, and it is very large. So even if we print it as chars, which doubles its length, it doesn’t matter.
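To make the first point concrete, the sketch below counts how many bytes of a buffer are not printable in the default "C" locale. The helper name count_non_printable is hypothetical; the sample payload in the usage note is the one from the expected regression output quoted earlier in this review (0x05000000789C73C92FCA2C060005B00202).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts bytes that std::isprint rejects. In the default
// "C" locale this covers control characters, NUL, and bytes above 0x7E.
std::size_t count_non_printable(std::string_view bytes) {
    std::size_t count = 0;
    for (char c : bytes) {
        // Cast first: passing a negative char to std::isprint is undefined.
        if (!std::isprint(static_cast<unsigned char>(c))) {
            ++count;
        }
    }
    return count;
}
```

For the 13 payload bytes of the quoted result (78 9C 73 C9 2F CA 2C 06 00 05 B0 02 02), several are control characters or NUL, so printing them raw would not round-trip through a terminal or a text-based result file. The same example also shows the size trade-off raised in this thread: the 5-byte input "Doris" becomes a 36-character string (2 + 8 + 26).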

@lzyy2024
Author

run buildall

@lzyy2024 lzyy2024 requested a review from HappenLee January 30, 2025 01:56
@doris-robot

TPC-H: Total hot run time: 32303 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, data reload: false

------ Round 1 ----------------------------------
q1	17578	5496	5373	5373
q2	2047	308	164	164
q3	10425	1238	769	769
q4	10214	991	547	547
q5	7944	2416	2160	2160
q6	201	171	132	132
q7	904	762	596	596
q8	9228	1371	1173	1173
q9	5320	5065	4890	4890
q10	6842	2342	1902	1902
q11	458	277	250	250
q12	341	359	217	217
q13	17750	3697	3118	3118
q14	240	239	219	219
q15	519	479	474	474
q16	641	621	584	584
q17	584	869	333	333
q18	6978	6336	6500	6336
q19	1875	974	556	556
q20	317	322	195	195
q21	2850	2295	2005	2005
q22	366	340	310	310
Total cold run time: 103622 ms
Total hot run time: 32303 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5621	5527	5489	5489
q2	251	326	238	238
q3	2247	2709	2351	2351
q4	1376	1855	1373	1373
q5	4350	4792	4871	4792
q6	182	164	128	128
q7	2078	1972	1847	1847
q8	2620	2874	2757	2757
q9	7344	7220	7301	7220
q10	3036	3279	2807	2807
q11	574	505	488	488
q12	632	723	607	607
q13	3795	3914	3382	3382
q14	288	294	274	274
q15	519	491	454	454
q16	657	704	648	648
q17	1257	1750	1255	1255
q18	7770	7681	7387	7387
q19	831	1214	1076	1076
q20	2000	2041	1893	1893
q21	5802	5259	5102	5102
q22	633	597	568	568
Total cold run time: 53863 ms
Total hot run time: 52136 ms

@doris-robot

TPC-DS: Total hot run time: 190787 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, data reload: false

query1	1313	946	942	942
query2	6103	2107	2037	2037
query3	10970	4393	4500	4393
query4	60657	29230	23228	23228
query5	5571	599	463	463
query6	440	203	175	175
query7	5543	507	293	293
query8	330	240	229	229
query9	8401	2659	2634	2634
query10	462	303	252	252
query11	17194	14930	15369	14930
query12	157	112	109	109
query13	1413	560	440	440
query14	10384	7112	6468	6468
query15	216	214	199	199
query16	7329	650	481	481
query17	1131	745	618	618
query18	1899	419	322	322
query19	228	196	164	164
query20	121	118	117	117
query21	215	125	106	106
query22	4377	4666	4599	4599
query23	34466	33134	33559	33134
query24	5719	2304	2365	2304
query25	465	462	389	389
query26	644	278	154	154
query27	1661	457	334	334
query28	4026	2490	2433	2433
query29	527	574	429	429
query30	210	194	154	154
query31	942	909	810	810
query32	75	54	57	54
query33	457	364	307	307
query34	747	873	511	511
query35	806	831	736	736
query36	1021	1010	948	948
query37	120	109	73	73
query38	4384	4349	4310	4310
query39	1487	1443	1447	1443
query40	205	114	102	102
query41	52	50	56	50
query42	125	109	102	102
query43	526	543	507	507
query44	1374	850	818	818
query45	185	179	166	166
query46	870	1071	658	658
query47	1908	1875	1846	1846
query48	389	425	326	326
query49	743	483	381	381
query50	649	654	392	392
query51	4334	4259	4251	4251
query52	114	99	94	94
query53	225	252	187	187
query54	502	506	432	432
query55	87	83	79	79
query56	246	271	245	245
query57	1200	1201	1141	1141
query58	240	227	237	227
query59	3147	3203	3131	3131
query60	302	268	259	259
query61	113	125	115	115
query62	748	741	661	661
query63	221	183	184	183
query64	1280	1000	645	645
query65	3228	3163	3188	3163
query66	722	395	291	291
query67	15918	15522	15416	15416
query68	3911	813	572	572
query69	480	309	258	258
query70	1162	1154	1149	1149
query71	411	288	255	255
query72	5928	4035	3821	3821
query73	658	767	370	370
query74	9969	8949	8745	8745
query75	3243	3133	2642	2642
query76	3065	1178	767	767
query77	485	362	275	275
query78	9974	10069	9362	9362
query79	2676	797	592	592
query80	1623	533	441	441
query81	555	275	237	237
query82	352	147	116	116
query83	267	166	147	147
query84	291	94	79	79
query85	768	341	303	303
query86	414	312	307	307
query87	4506	4496	4456	4456
query88	3689	2170	2209	2170
query89	393	324	284	284
query90	1583	185	189	185
query91	131	141	104	104
query92	59	58	55	55
query93	2363	897	543	543
query94	695	398	307	307
query95	328	264	310	264
query96	488	611	280	280
query97	2864	2881	2700	2700
query98	224	212	198	198
query99	1276	1401	1213	1213
Total cold run time: 306695 ms
Total hot run time: 190787 ms

@doris-robot

ClickBench: Total hot run time: 30.96 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.24	0.06	0.07
query4	1.61	0.11	0.10
query5	0.43	0.41	0.40
query6	1.14	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.50	0.51
query10	0.57	0.55	0.56
query11	0.14	0.10	0.12
query12	0.14	0.11	0.11
query13	0.62	0.61	0.60
query14	2.84	2.86	2.90
query15	0.90	0.82	0.83
query16	0.37	0.39	0.38
query17	1.04	1.06	1.09
query18	0.22	0.22	0.22
query19	1.86	1.89	1.99
query20	0.02	0.01	0.01
query21	15.35	0.88	0.59
query22	0.76	0.86	0.65
query23	15.23	1.47	0.62
query24	3.01	0.99	1.16
query25	0.14	0.14	0.10
query26	0.37	0.17	0.14
query27	0.05	0.06	0.05
query28	13.36	1.02	0.42
query29	12.65	3.97	3.32
query30	0.25	0.10	0.06
query31	2.82	0.60	0.38
query32	3.22	0.56	0.46
query33	3.00	3.02	3.04
query34	16.52	5.12	4.50
query35	4.51	4.45	4.50
query36	0.64	0.52	0.49
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.12	0.12
query41	0.07	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.02	0.02
Total cold run time: 105.26 s
Total hot run time: 30.96 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.07% (10997/26138)
Line Coverage: 32.36% (92890/287059)
Region Coverage: 31.51% (47633/151146)
Branch Coverage: 27.54% (24107/87544)
Coverage Report: http://coverage.selectdb-in.cc/coverage/23e089b95f2a690fe4e2f913b1ec7550fceabdd3_23e089b95f2a690fe4e2f913b1ec7550fceabdd3/report/index.html

namespace doris::vectorized {

class FunctionCompress : public IFunction {
std::array<char, 16> hex_itoc = {'0', '1', '2', '3', '4', '5', '6', '7',
Contributor

Please rename this to `HEX_ITOC` and declare it `static constexpr`.

@lzyy2024
Author

lzyy2024 commented Feb 2, 2025

run buildall

@lzyy2024 lzyy2024 requested a review from HappenLee February 2, 2025 06:22
@doris-robot

TPC-H: Total hot run time: 32141 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false

------ Round 1 ----------------------------------
q1	17579	5518	5389	5389
q2	2047	322	182	182
q3	10386	1225	746	746
q4	10204	975	542	542
q5	7539	2347	2134	2134
q6	190	168	137	137
q7	891	752	598	598
q8	9236	1355	1165	1165
q9	5147	4841	4905	4841
q10	6875	2388	1923	1923
q11	481	282	254	254
q12	350	368	229	229
q13	17840	3739	3095	3095
q14	224	220	214	214
q15	521	474	478	474
q16	638	616	598	598
q17	555	854	314	314
q18	6846	6285	6405	6285
q19	1729	939	531	531
q20	317	312	190	190
q21	2781	2226	1992	1992
q22	366	332	308	308
Total cold run time: 102742 ms
Total hot run time: 32141 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5643	5487	5516	5487
q2	233	327	227	227
q3	2283	2639	2308	2308
q4	1466	1833	1381	1381
q5	4278	4727	4627	4627
q6	166	156	126	126
q7	2025	2023	1823	1823
q8	2608	2803	2687	2687
q9	7280	7115	7205	7115
q10	3053	3278	2803	2803
q11	581	532	508	508
q12	654	715	566	566
q13	3458	4004	3326	3326
q14	277	297	270	270
q15	520	478	473	473
q16	681	673	630	630
q17	1216	1757	1258	1258
q18	7663	7491	7348	7348
q19	804	1175	1043	1043
q20	2035	2046	1952	1952
q21	5888	5287	4919	4919
q22	650	664	573	573
Total cold run time: 53462 ms
Total hot run time: 51450 ms

@doris-robot

TPC-DS: Total hot run time: 190795 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false

query1	1297	975	940	940
query2	6141	2024	2029	2024
query3	11107	4731	4655	4655
query4	32350	23169	22881	22881
query5	3580	611	437	437
query6	285	198	183	183
query7	3980	486	306	306
query8	299	244	246	244
query9	9497	2619	2603	2603
query10	459	305	255	255
query11	17585	15238	14891	14891
query12	155	109	101	101
query13	1575	523	420	420
query14	8863	6441	7457	6441
query15	242	189	189	189
query16	8175	672	516	516
query17	1670	775	595	595
query18	2125	406	313	313
query19	211	191	166	166
query20	121	122	114	114
query21	206	124	109	109
query22	4590	4538	4527	4527
query23	34342	33421	33389	33389
query24	6708	2245	2343	2245
query25	510	458	401	401
query26	982	279	150	150
query27	2362	475	320	320
query28	5385	2511	2416	2416
query29	730	566	442	442
query30	213	188	159	159
query31	934	867	837	837
query32	92	61	58	58
query33	486	357	326	326
query34	755	883	518	518
query35	811	817	756	756
query36	990	1068	955	955
query37	124	106	80	80
query38	4280	4331	4225	4225
query39	1490	1437	1431	1431
query40	203	110	102	102
query41	49	54	50	50
query42	124	103	102	102
query43	519	538	507	507
query44	1353	812	805	805
query45	198	175	170	170
query46	854	1030	639	639
query47	1902	1944	1845	1845
query48	386	411	336	336
query49	743	486	396	396
query50	634	659	393	393
query51	4299	4261	4267	4261
query52	101	103	95	95
query53	223	257	183	183
query54	505	486	404	404
query55	82	79	81	79
query56	263	279	253	253
query57	1243	1204	1157	1157
query58	250	234	238	234
query59	3105	3196	3059	3059
query60	285	266	250	250
query61	114	119	115	115
query62	789	742	705	705
query63	233	196	199	196
query64	4253	1028	642	642
query65	3299	3277	3257	3257
query66	973	395	302	302
query67	16048	15696	15286	15286
query68	4949	823	519	519
query69	470	294	267	267
query70	1188	1154	1128	1128
query71	388	282	278	278
query72	5837	3964	3812	3812
query73	651	753	356	356
query74	10122	8763	9005	8763
query75	3158	3133	2658	2658
query76	3106	1179	760	760
query77	464	370	281	281
query78	9951	9994	9371	9371
query79	3119	797	600	600
query80	689	526	445	445
query81	501	277	245	245
query82	442	157	119	119
query83	167	174	147	147
query84	236	94	79	79
query85	784	348	307	307
query86	390	336	301	301
query87	4422	4615	4491	4491
query88	4783	2172	2157	2157
query89	400	326	293	293
query90	1843	247	189	189
query91	132	138	107	107
query92	73	57	51	51
query93	2388	869	540	540
query94	656	379	286	286
query95	343	264	259	259
query96	492	604	297	297
query97	2870	2870	2752	2752
query98	233	202	204	202
query99	1280	1377	1294	1294
Total cold run time: 285264 ms
Total hot run time: 190795 ms

@doris-robot

ClickBench: Total hot run time: 30.65 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false

query1	0.03	0.03	0.05
query2	0.07	0.04	0.03
query3	0.24	0.07	0.06
query4	1.61	0.10	0.10
query5	0.42	0.42	0.41
query6	1.15	0.65	0.66
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.51	0.51
query10	0.56	0.57	0.55
query11	0.14	0.10	0.10
query12	0.14	0.10	0.12
query13	0.60	0.59	0.59
query14	2.86	2.88	2.76
query15	0.90	0.84	0.85
query16	0.40	0.38	0.38
query17	1.01	1.00	1.06
query18	0.24	0.20	0.21
query19	1.88	1.78	1.98
query20	0.01	0.01	0.01
query21	15.36	0.95	0.57
query22	0.74	1.07	0.78
query23	14.93	1.37	0.52
query24	2.60	1.40	0.77
query25	0.28	0.10	0.14
query26	0.21	0.14	0.14
query27	0.08	0.06	0.04
query28	14.02	1.06	0.42
query29	12.64	3.99	3.26
query30	0.25	0.10	0.07
query31	2.84	0.59	0.38
query32	3.23	0.55	0.46
query33	3.08	3.08	3.06
query34	16.61	5.16	4.57
query35	4.58	4.58	4.58
query36	0.67	0.48	0.50
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.03
query43	0.03	0.04	0.03
Total cold run time: 105.52 s
Total hot run time: 30.65 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.08% (10998/26138)
Line Coverage: 32.37% (92919/287059)
Region Coverage: 31.52% (47635/151146)
Branch Coverage: 27.55% (24114/87544)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b702390db852ea9772db6d961cd374efc0e1148d_b702390db852ea9772db6d961cd374efc0e1148d/report/index.html

namespace doris::vectorized {

class FunctionCompress : public IFunction {
static constexpr std::array<char, 16> hex_itoc = {'0', '1', '2', '3', '4', '5', '6', '7',
Contributor

Please name constexpr constants in UPPER CASE, i.e. `HEX_ITOC` rather than `hex_itoc`.
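
For reference, a minimal sketch of what the requested declaration could look like. `HexEncoder` and `to_hex` are hypothetical names used only for illustration and are not taken from this PR. The quoted diff shows only the digits '0' through '7', so the uppercase 'A' to 'F' entries below are an assumption.

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical stand-in for the class under review; not the PR's actual code.
struct HexEncoder {
    // static: one table shared by every instance instead of a per-object copy.
    // constexpr: the table is a compile-time constant, hence the UPPER_CASE name.
    // Assumption: digits after '7' are uppercase 'A'..'F' (elided in the diff).
    static constexpr std::array<char, 16> HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
                                                      '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

    // Encode each input byte as two hex digits, high nibble first.
    static std::string to_hex(std::string_view bytes) {
        std::string out;
        out.reserve(bytes.size() * 2);
        for (unsigned char b : bytes) {
            out.push_back(HEX_ITOC[b >> 4]);
            out.push_back(HEX_ITOC[b & 0x0F]);
        }
        return out;
    }
};
```

With a non-static member, each function object would carry its own 16-byte table; a `static constexpr` member avoids that. It needs no out-of-class definition under C++17, where such members are implicitly inline.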

@lzyy2024 lzyy2024 requested a review from HappenLee February 3, 2025 05:21
@lzyy2024
Author

lzyy2024 commented Feb 3, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32399 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false

------ Round 1 ----------------------------------
q1	17649	5525	5426	5426
q2	2065	317	191	191
q3	10463	1219	744	744
q4	10221	969	561	561
q5	7602	2429	2164	2164
q6	192	170	144	144
q7	922	779	608	608
q8	9240	1374	1175	1175
q9	5305	4922	4915	4915
q10	6845	2355	1894	1894
q11	483	271	259	259
q12	342	359	226	226
q13	17777	3654	3063	3063
q14	229	236	214	214
q15	517	471	472	471
q16	631	627	586	586
q17	568	877	328	328
q18	7133	6523	6421	6421
q19	1949	960	546	546
q20	307	319	194	194
q21	2793	2151	1956	1956
q22	368	333	313	313
Total cold run time: 103601 ms
Total hot run time: 32399 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5548	5460	5501	5460
q2	243	338	230	230
q3	2279	2658	2302	2302
q4	1437	1826	1400	1400
q5	4312	4725	4637	4637
q6	166	161	131	131
q7	2014	1959	1866	1866
q8	2627	2835	2707	2707
q9	7364	7270	7250	7250
q10	3050	3255	2786	2786
q11	596	499	496	496
q12	637	722	559	559
q13	3476	3944	3257	3257
q14	297	294	298	294
q15	521	468	464	464
q16	668	695	646	646
q17	1259	1755	1263	1263
q18	7607	7633	7340	7340
q19	800	1160	1089	1089
q20	2016	2053	1891	1891
q21	5848	5202	4891	4891
q22	634	614	590	590
Total cold run time: 53399 ms
Total hot run time: 51549 ms

@doris-robot

TPC-DS: Total hot run time: 191844 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false

query1	1304	934	923	923
query2	6204	2079	2067	2067
query3	10972	4380	4367	4367
query4	61069	29284	23235	23235
query5	5532	589	437	437
query6	414	214	193	193
query7	5469	515	297	297
query8	332	247	224	224
query9	7705	2664	2656	2656
query10	459	301	253	253
query11	17717	15342	15375	15342
query12	161	107	105	105
query13	1387	550	396	396
query14	11778	6928	6877	6877
query15	210	188	186	186
query16	6789	640	462	462
query17	1114	742	583	583
query18	1728	421	300	300
query19	201	182	164	164
query20	126	115	115	115
query21	212	126	108	108
query22	4646	4783	4441	4441
query23	34279	33373	33404	33373
query24	5515	2277	2330	2277
query25	462	481	403	403
query26	640	274	153	153
query27	1556	471	327	327
query28	3952	2523	2484	2484
query29	575	578	447	447
query30	218	193	156	156
query31	899	879	824	824
query32	70	61	59	59
query33	441	399	309	309
query34	727	864	517	517
query35	865	847	755	755
query36	1017	1039	980	980
query37	124	106	85	85
query38	4313	4336	4217	4217
query39	1551	1462	1437	1437
query40	205	117	108	108
query41	56	54	53	53
query42	121	109	113	109
query43	535	536	503	503
query44	1319	833	857	833
query45	189	175	171	171
query46	895	1051	678	678
query47	1898	1908	1870	1870
query48	387	410	334	334
query49	728	481	405	405
query50	650	673	400	400
query51	4225	4258	4290	4258
query52	138	101	90	90
query53	239	257	196	196
query54	507	491	424	424
query55	83	80	77	77
query56	278	269	256	256
query57	1187	1217	1166	1166
query58	237	233	239	233
query59	3138	3198	2996	2996
query60	277	257	268	257
query61	116	119	115	115
query62	746	705	656	656
query63	217	189	182	182
query64	1248	1014	678	678
query65	3352	3143	3166	3143
query66	746	388	295	295
query67	16208	15807	15507	15507
query68	5020	822	525	525
query69	486	293	264	264
query70	1224	1118	1143	1118
query71	412	277	252	252
query72	6422	3912	3873	3873
query73	782	744	361	361
query74	9833	9302	8668	8668
query75	3320	3125	2695	2695
query76	3803	1176	770	770
query77	480	359	361	359
query78	10156	10155	9334	9334
query79	2892	795	603	603
query80	1700	525	445	445
query81	547	275	237	237
query82	355	149	132	132
query83	267	166	146	146
query84	298	89	71	71
query85	765	346	347	346
query86	423	333	304	304
query87	4401	4750	4415	4415
query88	3650	2163	2135	2135
query89	394	321	287	287
query90	1649	192	188	188
query91	135	139	114	114
query92	67	57	56	56
query93	2134	853	530	530
query94	756	405	300	300
query95	317	262	248	248
query96	486	616	287	287
query97	2854	2863	2797	2797
query98	227	196	200	196
query99	1290	1377	1261	1261
Total cold run time: 310203 ms
Total hot run time: 191844 ms

@doris-robot

ClickBench: Total hot run time: 31.38 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.63	0.10	0.10
query5	0.42	0.41	0.40
query6	1.16	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.49	0.50
query10	0.56	0.56	0.56
query11	0.15	0.10	0.10
query12	0.14	0.11	0.10
query13	0.61	0.59	0.59
query14	2.88	2.74	2.77
query15	0.88	0.85	0.83
query16	0.38	0.38	0.38
query17	1.01	1.04	1.05
query18	0.23	0.20	0.20
query19	1.83	1.75	2.03
query20	0.02	0.01	0.02
query21	15.38	0.92	0.57
query22	0.74	0.82	0.71
query23	15.15	1.42	0.62
query24	2.93	1.82	1.73
query25	0.13	0.10	0.09
query26	0.29	0.16	0.15
query27	0.07	0.06	0.04
query28	14.47	0.99	0.43
query29	12.58	3.92	3.24
query30	0.24	0.08	0.06
query31	2.84	0.58	0.39
query32	3.24	0.55	0.46
query33	2.99	2.97	3.01
query34	16.50	5.16	4.50
query35	4.55	4.53	4.57
query36	0.68	0.48	0.47
query37	0.09	0.06	0.05
query38	0.05	0.04	0.03
query39	0.04	0.02	0.03
query40	0.16	0.13	0.13
query41	0.08	0.02	0.02
query42	0.04	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.16 s
Total hot run time: 31.38 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.08% (11000/26139)
Line Coverage: 32.37% (92931/287083)
Region Coverage: 31.52% (47645/151150)
Branch Coverage: 27.55% (24120/87544)
Coverage Report: http://coverage.selectdb-in.cc/coverage/376422f094b5ed32dcc058cd1f75940d1dd30081_376422f094b5ed32dcc058cd1f75940d1dd30081/report/index.html

@lzyy2024 lzyy2024 force-pushed the CompressFunctions branch 2 times, most recently from 124faf8 to 42df82b Compare February 4, 2025 04:37
@lzyy2024
Author

lzyy2024 commented Feb 4, 2025

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.06% (10995/26139)
Line Coverage: 32.33% (92807/287076)
Region Coverage: 31.49% (47594/151142)
Branch Coverage: 27.52% (24088/87536)
Coverage Report: http://coverage.selectdb-in.cc/coverage/42df82b0be1d0ed027e05062f36c438e3bf32308_42df82b0be1d0ed027e05062f36c438e3bf32308/report/index.html

Successfully merging this pull request may close these issues.

[Enhancement](good-first-issue) Support some compress functions
5 participants