feat: ChunkedArray uses a builder to implement to_canonical #2511

danking · 2025-02-25T17:28:14Z

No description provided.

codspeed-hq · 2025-02-25T17:35:34Z

CodSpeed Performance Report

Merging #2511 will degrade performances by 19.58%

Comparing `dk/chunked-array-into-canonical-canonical-into-2` (41d62cc) with `develop` (8ba5f27)

Summary

⚡ 42 improvements
❌ 4 regressions
✅ 729 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
⚡	`chunked_dict_fsst_into_canonical[(1000, 1000, 10)]`	1.3 ms	1.2 ms	+10.05%
⚡	`chunked_dict_primitive_into_canonical[f32, (1000, 10, 10)]`	186.4 µs	86.7 µs	×2.1
⚡	`chunked_dict_primitive_into_canonical[f32, (1000, 10, 100)]`	1,593.1 µs	717.2 µs	×2.2
⚡	`chunked_dict_primitive_into_canonical[f32, (1000, 100, 10)]`	186.1 µs	88.6 µs	×2.1
⚡	`chunked_dict_primitive_into_canonical[f32, (1000, 100, 100)]`	1,628.5 µs	733.5 µs	×2.2
⚡	`chunked_dict_primitive_into_canonical[f32, (1000, 1000, 10)]`	205.6 µs	105.1 µs	+95.53%
⚡	`chunked_dict_primitive_into_canonical[f32, (1000, 1000, 100)]`	1,779.2 µs	896.7 µs	+98.42%
⚡	`chunked_dict_primitive_into_canonical[f64, (1000, 10, 10)]`	212.2 µs	105.2 µs	×2
⚡	`chunked_dict_primitive_into_canonical[f64, (1000, 10, 100)]`	1,849.1 µs	901.3 µs	×2.1
⚡	`chunked_dict_primitive_into_canonical[f64, (1000, 100, 10)]`	215.9 µs	108.5 µs	+98.99%
⚡	`chunked_dict_primitive_into_canonical[f64, (1000, 100, 100)]`	1,903.9 µs	933.6 µs	×2
⚡	`chunked_dict_primitive_into_canonical[f64, (1000, 1000, 10)]`	250.2 µs	141.2 µs	+77.13%
⚡	`chunked_dict_primitive_into_canonical[f64, (1000, 1000, 100)]`	2.2 ms	1.3 ms	+75.78%
⚡	`chunked_dict_primitive_into_canonical[u32, (1000, 10, 10)]`	184.1 µs	86.6 µs	×2.1
⚡	`chunked_dict_primitive_into_canonical[u32, (1000, 10, 100)]`	1,611.1 µs	717.4 µs	×2.2
⚡	`chunked_dict_primitive_into_canonical[u32, (1000, 100, 10)]`	188.1 µs	88.3 µs	×2.1
⚡	`chunked_dict_primitive_into_canonical[u32, (1000, 100, 100)]`	1,627.7 µs	733.3 µs	×2.2
⚡	`chunked_dict_primitive_into_canonical[u32, (1000, 1000, 10)]`	206.2 µs	104.8 µs	+96.79%
⚡	`chunked_dict_primitive_into_canonical[u32, (1000, 1000, 100)]`	1,778.5 µs	896.5 µs	+98.39%
⚡	`chunked_dict_primitive_into_canonical[u64, (1000, 10, 10)]`	217.1 µs	105 µs	×2.1
...	...	...	...	...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

danking · 2025-02-25T17:58:46Z

I think we should merge this. One of the regressions seems flaky; I've seen it before on unrelated code. The other one is probably real. I'm not entirely sure of the cause, but the profile is dominated by the cost of filtering patches when the Mask is 1-ε full. Our implementation for that case is not great.
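To make the "Mask is 1-ε full" case concrete, here is a minimal sketch of why a nearly full mask can be handled by walking the few unset positions instead of the whole mask. Everything in it is an assumption for illustration: `Patches`, `filter_patches_naive`, and `filter_patches_dense` are hypothetical names, not Vortex's types or functions, and the sketch makes no claim about how Vortex actually filters patches.

```rust
/// Hypothetical toy model of patches: sorted positions and the values stored there.
#[derive(Debug, PartialEq)]
struct Patches {
    indices: Vec<usize>,
    values: Vec<i64>,
}

/// Baseline: compute the rank of every mask position, then remap each patch.
/// Cost grows with the mask length, however few positions are unset.
fn filter_patches_naive(patches: &Patches, mask: &[bool]) -> Patches {
    // rank[i] = number of set positions strictly before i.
    let mut rank = Vec::with_capacity(mask.len());
    let mut seen = 0usize;
    for &keep in mask {
        rank.push(seen);
        if keep {
            seen += 1;
        }
    }
    let mut out = Patches { indices: Vec::new(), values: Vec::new() };
    for (&idx, &val) in patches.indices.iter().zip(&patches.values) {
        if mask[idx] {
            out.indices.push(rank[idx]);
            out.values.push(val);
        }
    }
    out
}

/// Dense-mask variant: the mask is described by its sorted *unset* positions.
/// Cost grows with the number of patches plus the number of unset positions.
fn filter_patches_dense(patches: &Patches, unset: &[usize]) -> Patches {
    let mut out = Patches { indices: Vec::new(), values: Vec::new() };
    let mut dropped = 0; // unset positions strictly before the current patch
    for (&idx, &val) in patches.indices.iter().zip(&patches.values) {
        while dropped < unset.len() && unset[dropped] < idx {
            dropped += 1;
        }
        if dropped < unset.len() && unset[dropped] == idx {
            continue; // the patch itself is filtered out
        }
        out.indices.push(idx - dropped);
        out.values.push(val);
    }
    out
}
```

Both functions return the same patches for the same logical mask. The second one only does work proportional to the patches and the holes in the mask, which is the regime described above.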

gatesn · 2025-02-25T18:02:15Z

The big difference here is that into_builder is a recursive canonicalize, whereas to_canonical only canonicalizes a single level.

So before, a Chunked gets cheaply turned into a Struct. Now it does a full recursive decompression.

I think it might be beneficial to retain the distinction between the two for now? Not sure though...
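The distinction between single-level and recursive canonicalization can be sketched with a toy array model. The `Array` enum and the functions `to_canonical_shallow`, `to_canonical_recursive`, and `append_values` are hypothetical and exist only for this illustration; they are not Vortex's API, and the real `to_canonical` and `into_builder` paths may differ in detail.

```rust
/// Hypothetical toy model of array encodings.
#[derive(Debug, Clone, PartialEq)]
enum Array {
    /// Already-canonical primitive values.
    Primitive(Vec<i64>),
    /// A compressed encoding: `len` copies of `value`.
    Constant { value: i64, len: usize },
    /// A struct with one child array per field.
    Struct(Vec<Array>),
    /// A sequence of chunks that share a dtype.
    Chunked(Vec<Array>),
}

/// Builder-style append: decode a flat (non-struct) array into `out`.
fn append_values(array: &Array, out: &mut Vec<i64>) {
    match array {
        Array::Primitive(values) => out.extend_from_slice(values),
        Array::Constant { value, len } => out.extend(std::iter::repeat(*value).take(*len)),
        Array::Chunked(chunks) => chunks.iter().for_each(|c| append_values(c, out)),
        Array::Struct(_) => panic!("toy model: struct chunks are regrouped by the caller"),
    }
}

/// Single-level canonicalization: only the outermost encoding is resolved.
fn to_canonical_shallow(array: &Array) -> Array {
    match array {
        Array::Primitive(_) | Array::Struct(_) => array.clone(),
        Array::Constant { value, len } => Array::Primitive(vec![*value; *len]),
        Array::Chunked(chunks) => {
            if let Some(Array::Struct(first)) = chunks.first() {
                // Chunked-of-struct becomes struct-of-chunked. The fields are
                // regrouped but stay in whatever encoding they already had.
                let fields: Vec<Array> = (0..first.len())
                    .map(|i| {
                        let column: Vec<Array> = chunks
                            .iter()
                            .map(|chunk| match chunk {
                                Array::Struct(fields) => fields[i].clone(),
                                _ => panic!("toy model: all chunks must be structs"),
                            })
                            .collect();
                        Array::Chunked(column)
                    })
                    .collect();
                Array::Struct(fields)
            } else {
                // Flat chunks are appended into a single output buffer.
                let mut out = Vec::new();
                for chunk in chunks {
                    append_values(chunk, &mut out);
                }
                Array::Primitive(out)
            }
        }
    }
}

/// Recursive canonicalization: every nested field is decoded as well.
fn to_canonical_recursive(array: &Array) -> Array {
    match to_canonical_shallow(array) {
        Array::Struct(fields) => {
            Array::Struct(fields.iter().map(to_canonical_recursive).collect())
        }
        other => other,
    }
}
```

In this model, the shallow path turns a chunked struct into a struct of chunked fields without decoding any `Constant` child, while the recursive path decodes every child into `Primitive` values. That extra decoding is the cost difference described above.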

danking · 2025-02-25T19:39:34Z

Hmm. The benchmark does not use chunked arrays (in develop or in this PR). I am a bit confused how the changes here could possibly have affected it.

danking · 2025-02-25T20:06:22Z

I think the microbenchmark is a red herring. The profiles look very similar, albeit reliably slower on this PR. ChunkedArray does not appear anywhere in the profile. The only difference I've found is that the left parts dictionary is reordered. I don't know why this would cause a slowdown though.

robert3005 · 2025-02-25T20:24:28Z

~~the benchmarks that improved were using gen_dict_primitive_chunks which collects to ChunkedArray. I think what @gatesn says seems correct. Though there's no Structs in the benchmarks :/~~ Ignore my comment; I focused on the benchmarks and not on the semantics.

danking · 2025-02-25T20:56:19Z

blocked by #2514

github-actions · 2025-02-26T14:41:08Z

Benchmarks: random_access

Table of Results

name	PR `2e01f59`	base `76bc8eb`	ratio (PR/base)	unit
random-access/vortex-tokio-local-disk	2254363	2.82066e+06	0.799231	ns
random-access/parquet-tokio-local-disk	233198812	2.43644e+08	0.957129	ns

github-actions · 2025-02-26T14:44:30Z

Benchmarks: TPC-H on S3

Table of Results

name	PR `bfd7ebc`	base `b219016`	ratio (PR/base)	unit
tpch_q01/parquet	272495586	251795772	1.08221	ns
tpch_q02/parquet	673640913	665037561	1.01294	ns
tpch_q03/parquet	439900464	413809822	1.06305	ns
tpch_q04/parquet	228764737	228971274	0.999098	ns
tpch_q05/parquet	587232875	576962593	1.0178	ns
tpch_q06/parquet	181682130	183908066	0.987896	ns
tpch_q07/parquet	639136770	618863454	1.03276	ns
tpch_q08/parquet	807393566	771955250	1.04591	ns
tpch_q09/parquet	699416349	670133506	1.0437	ns
tpch_q10/parquet	551074152	511994449	1.07633	ns
tpch_q11/parquet	312303426	262665731	1.18898	ns
tpch_q12/parquet	280729393	271553345	1.03379	ns
tpch_q13/parquet	404933153	388898302	1.04123	ns
tpch_q14/parquet	250933143	251891397	0.996196	ns
tpch_q15/parquet	472018712	474607290	0.994546	ns
tpch_q16/parquet	283457412	259052906	1.09421	ns
tpch_q17/parquet	395709968	393557295	1.00547	ns
tpch_q18/parquet	572017809	532286841	1.07464	ns
tpch_q19/parquet	303663532	274725653	1.10533	ns
tpch_q20/parquet	522712285	491389649	1.06374	ns
tpch_q21/parquet	634719147	617048250	1.02864	ns
tpch_q22/parquet	269424674	269338439	1.00032	ns
tpch_q01/vortex-file-compressed	148724032	146931023	1.0122	ns
tpch_q02/vortex-file-compressed	393747267	410850458	0.958371	ns
tpch_q03/vortex-file-compressed	299819895	295770574	1.01369	ns
tpch_q04/vortex-file-compressed	206011130	202789533	1.01589	ns
tpch_q05/vortex-file-compressed	326968651	326737852	1.00071	ns
tpch_q06/vortex-file-compressed	117060001	114910974	1.0187	ns
tpch_q07/vortex-file-compressed	390520071	375021593	1.04133	ns
tpch_q08/vortex-file-compressed	472730023	452631757	1.0444	ns
tpch_q09/vortex-file-compressed	401483065	385339055	1.0419	ns
tpch_q10/vortex-file-compressed	404405772	383550542	1.05437	ns
tpch_q11/vortex-file-compressed	167382127	155928761	1.07345	ns
tpch_q12/vortex-file-compressed	221176591	215370645	1.02696	ns
tpch_q13/vortex-file-compressed	181937634	180040657	1.01054	ns
tpch_q14/vortex-file-compressed	145561001	136321572	1.06778	ns
tpch_q15/vortex-file-compressed	313609523	300765561	1.0427	ns
tpch_q16/vortex-file-compressed	188262306	176935723	1.06402	ns
tpch_q17/vortex-file-compressed	215218971	209906636	1.02531	ns
tpch_q18/vortex-file-compressed	288329415	268366585	1.07439	ns
tpch_q19/vortex-file-compressed	207082590	193328990	1.07114	ns
tpch_q20/vortex-file-compressed	345435286	323877576	1.06656	ns
tpch_q21/vortex-file-compressed	489633761	469592629	1.04268	ns
tpch_q22/vortex-file-compressed	156341661	141245042	1.10688	ns

github-actions · 2025-02-26T14:46:29Z

Benchmarks: TPC-H on NVME

Table of Results

name	PR `bfd7ebc`	base `b219016`	ratio (PR/base)	unit
tpch_q01/arrow	43998318	41445469	1.0616	ns
tpch_q02/arrow	49835900	47749448	1.0437	ns
tpch_q03/arrow	31408098	28979094	1.08382	ns
tpch_q04/arrow	23901107	21051576	1.13536	ns
tpch_q05/arrow	50031159	46510524	1.0757	ns
tpch_q06/arrow	10205290	8354462	1.22154	ns
tpch_q07/arrow	78411279	72673027	1.07896	ns
tpch_q08/arrow	57963243	54915112	1.05551	ns
tpch_q09/arrow	72017515	66517283	1.08269	ns
tpch_q10/arrow	49147547	41755850	1.17702	ns
tpch_q11/arrow	25556917	23128159	1.10501	ns
tpch_q12/arrow	29983035	24619012	1.21788	ns
tpch_q13/arrow	17542449	14605459	1.20109	ns
tpch_q14/arrow	15795730	13420881	1.17695	ns
tpch_q15/arrow	30870879	24611050	1.25435	ns
tpch_q16/arrow	23216415	20169133	1.15109	ns
tpch_q17/arrow	65343870	58219368	1.12237	ns
tpch_q18/arrow	100238596	94017968	1.06616	ns
tpch_q19/arrow	29312270	24539984	1.19447	ns
tpch_q20/arrow	36220791	32559218	1.11246	ns
tpch_q21/arrow	117931172	108190737	1.09003	ns
tpch_q22/arrow	16149777	13463171	1.19955	ns
tpch_q01/parquet	113274262	111236442	1.01832	ns
tpch_q02/parquet	118372551	107163302	1.1046	ns
tpch_q03/parquet	105442328	101481361	1.03903	ns
tpch_q04/parquet	58992908	56292531	1.04797	ns
tpch_q05/parquet	118662122	114925283	1.03252	ns
tpch_q06/parquet	26698538	24712052	1.08039	ns
tpch_q07/parquet	138874046	129245017	1.0745	ns
tpch_q08/parquet	161805639	156743967	1.03229	ns
tpch_q09/parquet	213716805	202616504	1.05478	ns
tpch_q10/parquet	135349209	124801455	1.08452	ns
tpch_q11/parquet	56090799	50037172	1.12098	ns
tpch_q12/parquet	94630186	89472473	1.05765	ns
tpch_q13/parquet	151623856	145033936	1.04544	ns
tpch_q14/parquet	45228659	43500773	1.03972	ns
tpch_q15/parquet	74591417	62233733	1.19857	ns
tpch_q16/parquet	52039600	47877018	1.08694	ns
tpch_q17/parquet	139159507	125865059	1.10562	ns
tpch_q18/parquet	200965460	181216284	1.10898	ns
tpch_q19/parquet	74841741	72509083	1.03217	ns
tpch_q20/parquet	99804839	96697644	1.03213	ns
tpch_q21/parquet	189198791	173122983	1.09286	ns
tpch_q22/parquet	50816884	47164365	1.07744	ns
tpch_q01/vortex-file-compressed	36467117	34374898	1.06086	ns
tpch_q02/vortex-file-compressed	61426686	55267941	1.11143	ns
tpch_q03/vortex-file-compressed	34010416	29441298	1.15519	ns
tpch_q04/vortex-file-compressed	20937882	18505629	1.13143	ns
tpch_q05/vortex-file-compressed	49959730	44454222	1.12385	ns
tpch_q06/vortex-file-compressed	10217916	9356980	1.09201	ns
tpch_q07/vortex-file-compressed	74065604	66531394	1.11324	ns
tpch_q08/vortex-file-compressed	60506136	51724332	1.16978	ns
tpch_q09/vortex-file-compressed	71585000	64091059	1.11693	ns
tpch_q10/vortex-file-compressed	56439635	52369767	1.07771	ns
tpch_q11/vortex-file-compressed	25084898	23331648	1.07514	ns
tpch_q12/vortex-file-compressed	27155804	24499472	1.10842	ns
tpch_q13/vortex-file-compressed	24708265	24117325	1.0245	ns
tpch_q14/vortex-file-compressed	17229523	15151764	1.13713	ns
tpch_q15/vortex-file-compressed	31694796	27334273	1.15953	ns
tpch_q16/vortex-file-compressed	31140002	27117911	1.14832	ns
tpch_q17/vortex-file-compressed	54318520	49345209	1.10079	ns
tpch_q18/vortex-file-compressed	94245731	81621384	1.15467	ns
tpch_q19/vortex-file-compressed	37302764	30286117	1.23168	ns
tpch_q20/vortex-file-compressed	40340189	35217884	1.14545	ns
tpch_q21/vortex-file-compressed	95032648	85821445	1.10733	ns
tpch_q22/vortex-file-compressed	30573640	27261661	1.12149	ns

github-actions · 2025-02-26T15:13:11Z

Benchmarks: compress

Table of Results

name	PR `8b420cf`	base `dffeeb8`	ratio (PR/base)	unit
compress time/taxi throughput	0.223259	0.213525	1.04559	bytes/ns
parquet_rs-zstd compress time/taxi throughput	0.281972	0.275991	1.02167	bytes/ns
decompress time/taxi throughput	1.90431	1.92364	0.989951	bytes/ns
parquet_rs-zstd decompress time/taxi throughput	1.66478	1.64399	1.01265	bytes/ns
compress time/AirlineSentiment throughput	0.00277229	0.0027646	1.00278	bytes/ns
parquet_rs-zstd compress time/AirlineSentiment throughput	0.0543871	0.0513611	1.05892	bytes/ns
decompress time/AirlineSentiment throughput	0.0274608	0.025891	1.06063	bytes/ns
parquet_rs-zstd decompress time/AirlineSentiment throughput	0.0906794	0.102908	0.881167	bytes/ns
compress time/Arade throughput	0.15937	0.157153	1.01411	bytes/ns
parquet_rs-zstd compress time/Arade throughput	0.398245	0.393812	1.01126	bytes/ns
decompress time/Arade throughput	1.87581	1.85093	1.01344	bytes/ns
parquet_rs-zstd decompress time/Arade throughput	1.92723	1.88557	1.0221	bytes/ns
compress time/Bimbo throughput	0.383936	0.365445	1.0506	bytes/ns
parquet_rs-zstd compress time/Bimbo throughput	0.335391	0.326937	1.02586	bytes/ns
decompress time/Bimbo throughput	2.20343	2.20185	1.00072	bytes/ns
parquet_rs-zstd decompress time/Bimbo throughput	2.80494	2.76901	1.01298	bytes/ns
compress time/CMSprovider throughput	0.0506614	0.0535887	0.945374	bytes/ns
parquet_rs-zstd compress time/CMSprovider throughput	0.351077	0.353393	0.993448	bytes/ns
decompress time/CMSprovider throughput	4.00317	4.07612	0.982103	bytes/ns
parquet_rs-zstd decompress time/CMSprovider throughput	1.79826	1.78508	1.00738	bytes/ns
compress time/Euro2016 throughput	0.155374	0.154625	1.00484	bytes/ns
parquet_rs-zstd compress time/Euro2016 throughput	0.302725	0.299284	1.0115	bytes/ns
decompress time/Euro2016 throughput	2.43934	2.54213	0.959567	bytes/ns
parquet_rs-zstd decompress time/Euro2016 throughput	0.999457	0.995602	1.00387	bytes/ns
compress time/Food throughput	0.190328	0.184741	1.03024	bytes/ns
parquet_rs-zstd compress time/Food throughput	0.316722	0.313067	1.01167	bytes/ns
decompress time/Food throughput	5.2472	5.18618	1.01176	bytes/ns
parquet_rs-zstd decompress time/Food throughput	1.6208	1.60187	1.01182	bytes/ns
compress time/HashTags throughput	0.19306	0.191881	1.00614	bytes/ns
parquet_rs-zstd compress time/HashTags throughput	0.809605	0.788074	1.02732	bytes/ns
decompress time/HashTags throughput	5.86061	5.79372	1.01155	bytes/ns
parquet_rs-zstd decompress time/HashTags throughput	2.79485	2.59373	1.07754	bytes/ns
compress time/TPC-H l_comment chunked throughput	0.217801	0.208791	1.04315	bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment chunked throughput	0.285559	0.277746	1.02813	bytes/ns
decompress time/TPC-H l_comment chunked throughput	3.09212	3.03214	1.01978	bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput	1.05759	1.0636	0.994351	bytes/ns
compress time/TPC-H l_comment canonical throughput	0.0271484	0.028362	0.95721	bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment canonical throughput	0.286067	0.281179	1.01738	bytes/ns
decompress time/TPC-H l_comment canonical throughput	3.10137	3.15698	0.982387	bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput	1.07627	1.05025	1.02477	bytes/ns
compress time/wide table cols=10 chunks=1 rows=1000 throughput	0.120606	0.128001	0.94223	bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=1 rows=1000 throughput	0.181194	0.181452	0.998582	bytes/ns
decompress time/wide table cols=10 chunks=1 rows=1000 throughput	0.732692	0.622132	1.17771	bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=1 rows=1000 throughput	0.475465	0.515869	0.921676	bytes/ns
compress time/wide table cols=100 chunks=1 rows=1000 throughput	0.120016	0.127274	0.942977	bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=1 rows=1000 throughput	0.195428	0.181696	1.07558	bytes/ns
decompress time/wide table cols=100 chunks=1 rows=1000 throughput	1.06175	1.10568	0.96027	bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=1 rows=1000 throughput	0.498996	0.504599	0.988896	bytes/ns
compress time/wide table cols=1000 chunks=1 rows=1000 throughput	0.109319	0.116863	0.935448	bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=1 rows=1000 throughput	0.16931	0.168818	1.00292	bytes/ns
decompress time/wide table cols=1000 chunks=1 rows=1000 throughput	0.848971	0.825598	1.02831	bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=1 rows=1000 throughput	0.455769	0.448721	1.01571	bytes/ns
compress time/wide table cols=10 chunks=50 rows=1000 throughput	0.0503496	0.0735901	0.684189	bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=50 rows=1000 throughput	0.131767	0.125877	1.04679	bytes/ns
decompress time/wide table cols=10 chunks=50 rows=1000 throughput	0.832106	0.877309	0.948476	bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=50 rows=1000 throughput	0.529507	0.541026	0.97871	bytes/ns
compress time/wide table cols=100 chunks=50 rows=1000 throughput	0.0495447	0.0677259	0.731548	bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=50 rows=1000 throughput	0.133699	0.115987	1.15271	bytes/ns
decompress time/wide table cols=100 chunks=50 rows=1000 throughput	1.07806	1.1413	0.944587	bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=50 rows=1000 throughput	0.512486	0.526451	0.973473	bytes/ns
compress time/wide table cols=1000 chunks=50 rows=1000 throughput	0.0415182	0.0605364	0.685839	bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=50 rows=1000 throughput	0.101328	0.0981726	1.03215	bytes/ns
decompress time/wide table cols=1000 chunks=50 rows=1000 throughput	0.898782	0.926083	0.970519	bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=50 rows=1000 throughput	0.45219	0.478746	0.94453	bytes/ns
vortex:raw size/taxi	0.117729	0.117729	1
vortex size/taxi	5.82471e+07	5.82471e+07	1
vortex:parquet-zstd size/taxi	1.04085	1.04085	1
vortex:raw size/AirlineSentiment	1.25903	1.25903	1
vortex size/AirlineSentiment	4112	4112	1
vortex:parquet-zstd size/AirlineSentiment	4.25233	4.25233	1
vortex:raw size/Arade	0.255852	0.255852	1
vortex size/Arade	3.03615e+08	3.03615e+08	1
vortex:parquet-zstd size/Arade	0.994178	0.994178	1
vortex:raw size/Bimbo	0.115538	0.115538	1
vortex size/Bimbo	8.25997e+08	8.25997e+08	1
vortex:parquet-zstd size/Bimbo	2.12803	2.12803	1
vortex:raw size/CMSprovider	0.188928	0.188928	1
vortex size/CMSprovider	1.18655e+09	1.18655e+09	1
vortex:parquet-zstd size/CMSprovider	1.54193	1.54193	1
vortex:raw size/Euro2016	0.471339	0.471345	0.999986
vortex size/Euro2016	2.1447e+08	2.14473e+08	0.999986
vortex:parquet-zstd size/Euro2016	1.80394	1.80396	0.999986
vortex:raw size/Food	0.177321	0.177321	1
vortex size/Food	5.97293e+07	5.97293e+07	1
vortex:parquet-zstd size/Food	1.64861	1.64861	1
vortex:raw size/HashTags	0.142738	0.142738	1
vortex size/HashTags	2.73374e+08	2.73374e+08	1
vortex:parquet-zstd size/HashTags	2.03913	2.03913	1
vortex:raw size/TPC-H l_comment chunked	0.418263	0.417913	1.00084
vortex size/TPC-H l_comment chunked	1.0423e+08	1.04143e+08	1.00084
vortex:parquet-zstd size/TPC-H l_comment chunked	1.83066	1.82926	1.00077
vortex:raw size/TPC-H l_comment canonical	0.425506	0.425404	1.00024
vortex size/TPC-H l_comment canonical	1.06032e+08	1.06007e+08	1.00024
vortex:parquet-zstd size/TPC-H l_comment canonical	1.86245	1.86183	1.00033
vortex:raw size/wide table cols=10 chunks=1 rows=1000	0.622482	0.622482	1
vortex size/wide table cols=10 chunks=1 rows=1000	99688	99688	1
vortex:parquet-zstd size/wide table cols=10 chunks=1 rows=1000	1.06636	1.06636	1
vortex:raw size/wide table cols=100 chunks=1 rows=1000	0.620919	0.620919	1
vortex size/wide table cols=100 chunks=1 rows=1000	994288	994288	1
vortex:parquet-zstd size/wide table cols=100 chunks=1 rows=1000	1.06363	1.06363	1
vortex:raw size/wide table cols=1000 chunks=1 rows=1000	0.620763	0.620763	1
vortex size/wide table cols=1000 chunks=1 rows=1000	9.94029e+06	9.94029e+06	1
vortex:parquet-zstd size/wide table cols=1000 chunks=1 rows=1000	1.06336	1.06336	1
vortex:raw size/wide table cols=10 chunks=50 rows=1000	0.597263	0.597263	1
vortex size/wide table cols=10 chunks=50 rows=1000	99688	99688	1
vortex:parquet-zstd size/wide table cols=10 chunks=50 rows=1000	1.06636	1.06636	1
vortex:raw size/wide table cols=100 chunks=50 rows=1000	0.597024	0.597024	1
vortex size/wide table cols=100 chunks=50 rows=1000	994288	994288	1
vortex:parquet-zstd size/wide table cols=100 chunks=50 rows=1000	1.06363	1.06363	1
vortex:raw size/wide table cols=1000 chunks=50 rows=1000	0.597	0.597	1
vortex size/wide table cols=1000 chunks=50 rows=1000	9.94029e+06	9.94029e+06	1
vortex:parquet-zstd size/wide table cols=1000 chunks=50 rows=1000	1.06336	1.06336	1

robert3005 · 2025-03-06T18:23:59Z

@danking @gatesn I have rebased this PR and made canonicalization of list and struct dtypes non-recursive

danking · 2025-03-06T18:45:52Z

🧟

github-actions · 2025-03-06T18:56:42Z

Benchmarks: Clickbench on NVME

Table of Results

name	PR `4e147b2`	base `590c413`	ratio (PR/base)	unit
clickbench_q00/parquet	2201274	2.46163e+06	0.894235	ns
clickbench_q01/parquet	33908458	3.11474e+07	1.08865	ns
clickbench_q02/parquet	66921331	6.1963e+07	1.08002	ns
clickbench_q03/parquet	54290967	5.08026e+07	1.06867	ns
clickbench_q04/parquet	331501412	3.03316e+08	1.09292	ns
clickbench_q05/parquet	313314632	2.94382e+08	1.06431	ns
clickbench_q06/parquet	1887919	2.30354e+06	0.819571	ns
clickbench_q07/parquet	32842041	3.13745e+07	1.04677	ns
clickbench_q08/parquet	386289755	3.64969e+08	1.05842	ns
clickbench_q09/parquet	574763190	5.36035e+08	1.07225	ns
clickbench_q10/parquet	119441969	1.11639e+08	1.0699	ns
clickbench_q11/parquet	140601339	1.3517e+08	1.04018	ns
clickbench_q12/parquet	315466317	3.00894e+08	1.04843	ns
clickbench_q13/parquet	486150909	4.58776e+08	1.05967	ns
clickbench_q14/parquet	323846475	3.03767e+08	1.0661	ns
clickbench_q15/parquet	351110224	3.4184e+08	1.02712	ns
clickbench_q16/parquet	745337931	7.39723e+08	1.00759	ns
clickbench_q17/parquet	691183583	6.38369e+08	1.08273	ns
clickbench_q18/parquet	1584497347	1.49463e+09	1.06012	ns
clickbench_q19/parquet	43416247	3.99775e+07	1.08602	ns
clickbench_q20/parquet	567920588	5.2527e+08	1.0812	ns
clickbench_q21/parquet	617688969	6.09546e+08	1.01336	ns
clickbench_q22/parquet	977115832	9.22323e+08	1.05941	ns
clickbench_q23/parquet	3915226010	3.69577e+09	1.05938	ns
clickbench_q24/parquet	196975832	1.8275e+08	1.07784	ns
clickbench_q25/parquet	171461905	1.61239e+08	1.0634	ns
clickbench_q26/parquet	218620723	2.0607e+08	1.0609	ns
clickbench_q27/parquet	755316714	7.11813e+08	1.06112	ns
clickbench_q28/parquet	4451513650	4.14999e+09	1.07266	ns
clickbench_q29/parquet	254876238	2.33725e+08	1.0905	ns
clickbench_q30/parquet	327758475	3.10847e+08	1.0544	ns
clickbench_q31/parquet	381224308	3.55963e+08	1.07097	ns
clickbench_q32/parquet	1872624986	1.7588e+09	1.06472	ns
clickbench_q33/parquet	1499204345	1.46961e+09	1.02013	ns
clickbench_q34/parquet	1484310010	1.43205e+09	1.0365	ns
clickbench_q35/parquet	510484128	4.86946e+08	1.04834	ns
clickbench_q36/parquet	146897131	1.42492e+08	1.03091	ns
clickbench_q37/parquet	68620007	6.39827e+07	1.07248	ns
clickbench_q38/parquet	93853569	8.97205e+07	1.04607	ns
clickbench_q39/parquet	282249989	2.70175e+08	1.04469	ns
clickbench_q40/parquet	45496531	4.29045e+07	1.06041	ns
clickbench_q41/parquet	42531240	4.12096e+07	1.03207	ns
clickbench_q42/parquet	56754558	5.09657e+07	1.11358	ns
clickbench_q00/vortex-file-compressed	4069814	3.96859e+06	1.02551	ns
clickbench_q01/vortex-file-compressed	14881589	1.45143e+07	1.02531	ns
clickbench_q02/vortex-file-compressed	27911228	2.66694e+07	1.04656	ns
clickbench_q03/vortex-file-compressed	39186402	3.59457e+07	1.09016	ns
clickbench_q04/vortex-file-compressed	339967497	3.29912e+08	1.03048	ns
clickbench_q05/vortex-file-compressed	322798344	3.22507e+08	1.0009	ns
clickbench_q06/vortex-file-compressed	4559938	4.02405e+06	1.13317	ns
clickbench_q07/vortex-file-compressed	17411821	1.72854e+07	1.00731	ns
clickbench_q08/vortex-file-compressed	403244028	3.71656e+08	1.08499	ns
clickbench_q09/vortex-file-compressed	482041025	4.56185e+08	1.05668	ns
clickbench_q10/vortex-file-compressed	63908467	6.10863e+07	1.0462	ns
clickbench_q11/vortex-file-compressed	70964724	6.84318e+07	1.03701	ns
clickbench_q12/vortex-file-compressed	254936992	2.45874e+08	1.03686	ns
clickbench_q13/vortex-file-compressed	361451038	3.37806e+08	1.07	ns
clickbench_q14/vortex-file-compressed	252009717	2.40862e+08	1.04628	ns
clickbench_q15/vortex-file-compressed	414336090	3.80475e+08	1.089	ns
clickbench_q16/vortex-file-compressed	790523249	7.46148e+08	1.05947	ns
clickbench_q17/vortex-file-compressed	777178470	7.38743e+08	1.05203	ns
clickbench_q18/vortex-file-compressed	1282206947	1.24054e+09	1.03358	ns
clickbench_q19/vortex-file-compressed	26336693	2.48946e+07	1.05793	ns
clickbench_q20/vortex-file-compressed	448499121	4.24246e+08	1.05717	ns
clickbench_q21/vortex-file-compressed	466356086	4.39311e+08	1.06156	ns
clickbench_q22/vortex-file-compressed	615462681	6.01514e+08	1.02319	ns
clickbench_q23/vortex-file-compressed	1234784394	1.19407e+09	1.0341	ns
clickbench_q24/vortex-file-compressed	87274871	7.9318e+07	1.10032	ns
clickbench_q25/vortex-file-compressed	98263513	9.20108e+07	1.06796	ns
clickbench_q26/vortex-file-compressed	115366777	1.08503e+08	1.06326	ns
clickbench_q27/vortex-file-compressed	700695208	6.61765e+08	1.05883	ns
clickbench_q28/vortex-file-compressed	4857061682	4.88988e+09	0.993289	ns
clickbench_q29/vortex-file-compressed	266409360	2.40578e+08	1.10737	ns
clickbench_q30/vortex-file-compressed	222537362	2.20949e+08	1.00719	ns
clickbench_q31/vortex-file-compressed	234471104	2.22688e+08	1.05291	ns
clickbench_q32/vortex-file-compressed	1341180150	1.24597e+09	1.07642	ns
clickbench_q33/vortex-file-compressed	1340678622	1.28734e+09	1.04143	ns
clickbench_q34/vortex-file-compressed	1342920116	1.28347e+09	1.04632	ns
clickbench_q35/vortex-file-compressed	606887934	5.7486e+08	1.05571	ns
clickbench_q36/vortex-file-compressed	60921294	5.36623e+07	1.13527	ns
clickbench_q37/vortex-file-compressed	37854695	3.30074e+07	1.14685	ns
clickbench_q38/vortex-file-compressed	26181144	2.3982e+07	1.0917	ns
clickbench_q39/vortex-file-compressed	109165798	1.0528e+08	1.03691	ns
clickbench_q40/vortex-file-compressed	24674606	1.98978e+07	1.24007	ns
clickbench_q41/vortex-file-compressed	21375346	1.87571e+07	1.13959	ns
clickbench_q42/vortex-file-compressed	30615420	2.7645e+07	1.10745	ns

robert3005 · 2025-03-07T15:14:19Z

eh @danking you wiped the changes I have made to preserve the old behaviour

danking · 2025-03-07T15:46:04Z

whoops, truly a sin that `-f` is `--force` and not `--force-with-lease`.

danking · 2025-03-07T15:47:59Z

I don't understand why the SQL benchmarks aren't running

danking · 2025-03-07T15:48:41Z

maybe now working?

robert3005 · 2025-03-07T15:49:42Z

they're running

danking · 2025-03-07T16:01:24Z

Hmm. It seems to have no macro effect though most microbenchmarks improve.

robert3005 · 2025-03-07T16:03:28Z

that's expected imho at this point. We are mostly bottlenecked on I/O and DataFusion logic
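Amdahl's law gives a rough way to see why a large microbenchmark gain can vanish at the query level. The sketch below is illustrative only: the function name `overall_speedup` is hypothetical, and the share of query time spent in canonicalization was not measured in this thread.

```rust
/// Amdahl's law: overall speedup when `fraction` of the runtime (0.0..=1.0)
/// becomes `local_speedup` times faster and the rest is unchanged.
fn overall_speedup(fraction: f64, local_speedup: f64) -> f64 {
    1.0 / ((1.0 - fraction) + fraction / local_speedup)
}
```

For example, if canonicalization were 5% of a query's runtime (an assumed figure) and became 2x faster, `overall_speedup(0.05, 2.0)` would be about 1.026, a gain that is hard to separate from run-to-run noise in the SQL benchmarks.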

robert3005 · 2025-03-07T16:03:57Z

but this is the last piece of builder canonical migration so would be nice to have it done

danking marked this pull request as ready for review February 25, 2025 17:28

danking added the benchmark Run benchmarks on this branch label Feb 25, 2025

danking enabled auto-merge (squash) February 25, 2025 17:28

github-actions bot removed the benchmark Run benchmarks on this branch label Feb 25, 2025

danking mentioned this pull request Feb 25, 2025

feat: ChunkedArray uses into_canonical for to_canonical #2503

Closed

danking force-pushed the dk/chunked-array-into-canonical-canonical-into-2 branch from e8c8cad to 4035433 Compare February 26, 2025 14:37

danking added the benchmark Run benchmarks on this branch label Feb 26, 2025

github-actions bot removed the benchmark Run benchmarks on this branch label Feb 26, 2025

danking added benchmark Run benchmarks on this branch benchmark-sql labels Feb 26, 2025

github-actions bot removed benchmark Run benchmarks on this branch benchmark-sql labels Feb 26, 2025

robert3005 force-pushed the dk/chunked-array-into-canonical-canonical-into-2 branch from 4035433 to 2d57fca Compare March 6, 2025 18:23

danking added the benchmark Run benchmarks on this branch label Mar 6, 2025

github-actions bot removed the benchmark Run benchmarks on this branch label Mar 6, 2025

danking added the benchmark-sql label Mar 6, 2025

github-actions bot removed the benchmark-sql label Mar 6, 2025

robert3005 force-pushed the dk/chunked-array-into-canonical-canonical-into-2 branch from 2d57fca to 5be2f29 Compare March 7, 2025 10:49

danking force-pushed the dk/chunked-array-into-canonical-canonical-into-2 branch from 5be2f29 to 91cf1f9 Compare March 7, 2025 14:22

danking added the benchmark-sql label Mar 7, 2025

github-actions bot removed the benchmark-sql label Mar 7, 2025

robert3005 force-pushed the dk/chunked-array-into-canonical-canonical-into-2 branch from 91cf1f9 to 5be2f29 Compare March 7, 2025 15:14

danking and others added 4 commits March 7, 2025 15:14

feat: ChunkedArray uses a builder to implement to_canonical

3d10a94

fixes

9c0bcfb

bringback

1a765fe

less

21db3bf

robert3005 force-pushed the dk/chunked-array-into-canonical-canonical-into-2 branch from 5be2f29 to 21db3bf Compare March 7, 2025 15:14

inline

41d62cc

danking added the benchmark-sql label Mar 7, 2025

github-actions bot removed the benchmark-sql label Mar 7, 2025

robert3005 approved these changes Mar 7, 2025

danking merged commit 250c31b into develop Mar 7, 2025
32 of 33 checks passed

danking deleted the dk/chunked-array-into-canonical-canonical-into-2 branch March 7, 2025 16:13

robert3005 mentioned this pull request Mar 12, 2025

Teach chunked array to use canonical_into as the implementation of into_canonical #2356

Closed
