Releases: ashvardanian/StringZilla
Release v3.10.9
Release: v3.10.9 [skip ci]
Release v3.10.8
Release: v3.10.8 [skip ci]
Release v3.10.7
Release: v3.10.7 [skip ci]
Release v3.10.6
Release: v3.10.6 [skip ci]
Release v3.10.5
Release: v3.10.5 [skip ci]
Release v3.10.4
Release: v3.10.4 [skip ci]
Release v3.10.3
Release: v3.10.3 [skip ci]
Release v3.10.2
Release: v3.10.2 [skip ci]
Release v3.10.1
Release: v3.10.1 [skip ci]
v3.10: Improved Memory Operations
This update brings many performance optimizations ahead of the next wave of breaking major releases, which will add new functionality and support a wider range of CPUs. Time to get excited 🥳
Faster memcpy and memset
On Intel Sapphire Rapids:
$ build_release/stringzilla_bench_memory leipzig1M.txt
StringZilla. Starting memory benchmarks.
Parsed the dataset with:
- 8388608 words of mean length ~ 5.12 bytes
- 262144 lines of mean length ~ 128.64 bytes
Benchmarking on entire dataset:
- memcpy<aligned> 19.7128 GB/s 3404322.4 ns 0 errors in 7344 iterations
- sz_copy_serial<aligned> 11.7727 GB/s 5700374.0 ns 0 errors in 4388 iterations
- sz_copy_avx512<aligned> 20.0675 GB/s 3344156.1 ns 0 errors in 7476 iterations
- sz_copy_avx2<aligned> 11.4429 GB/s 5864690.5 ns 0 errors in 4264 iterations
- memcpy<unaligned> 19.4694 GB/s 3446883.2 ns 0 errors in 7256 iterations
- sz_copy_serial<unaligned> 11.6158 GB/s 5777373.4 ns 0 errors in 4328 iterations
- sz_copy_avx512<unaligned> 20.3848 GB/s 3292099.3 ns 0 errors in 7596 iterations
- sz_copy_avx2<unaligned> 11.2894 GB/s 5944407.9 ns 0 errors in 4208 iterations
- memset 27.9879 GB/s 2397785.1 ns 0 errors in 10428 iterations
- sz_fill_serial 28.0284 GB/s 2394315.1 ns 0 errors in 10444 iterations
- sz_fill_avx512 28.9894 GB/s 2314942.1 ns 0 errors in 10800 iterations
- sz_fill_avx2 27.7442 GB/s 2418845.8 ns 0 errors in 10336 iterations
On AWS Graviton 4 there is still room for improvement: memset outpaces every sz_fill variant, and memcpy stays slightly ahead of sz_copy_sve.
Non-temporal stores on large payloads are one potential source of further gains.
$ build_release/stringzilla_bench_memory leipzig1M.txt
StringZilla. Starting memory benchmarks.
Parsed the dataset with:
- 8388608 words of mean length ~ 5.12 bytes
- 262144 lines of mean length ~ 128.64 bytes
Benchmarking on entire dataset:
- memcpy<aligned> 28.4008 GB/s 2362924.1 ns 0 errors in 10584 iterations
- sz_copy_serial<aligned> 23.0014 GB/s 2917600.0 ns 0 errors in 8572 iterations
- sz_copy_sve<aligned> 27.5536 GB/s 2435573.1 ns 0 errors in 10268 iterations
- sz_copy_neon<aligned> 21.1320 GB/s 3175702.1 ns 0 errors in 7876 iterations
- memcpy<unaligned> 26.9551 GB/s 2489652.6 ns 0 errors in 10044 iterations
- sz_copy_serial<unaligned> 22.6073 GB/s 2968456.4 ns 0 errors in 8424 iterations
- sz_copy_sve<unaligned> 25.6073 GB/s 2620692.7 ns 0 errors in 9540 iterations
- sz_copy_neon<unaligned> 20.8439 GB/s 3219593.9 ns 0 errors in 7768 iterations
- memset 66.9055 GB/s 1003039.9 ns 0 errors in 24928 iterations
- sz_fill_serial 44.1775 GB/s 1519072.9 ns 0 errors in 16460 iterations
- sz_fill_sve 34.5010 GB/s 1945126.1 ns 0 errors in 12856 iterations
- sz_fill_neon 44.5696 GB/s 1505708.6 ns 0 errors in 16604 iterations
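To make the GB/s column easier to interpret, here is a minimal sketch of how such a figure can be derived, assuming the conventional definition of throughput as bytes moved divided by elapsed time. The helper names `gigabytes_per_second` and `time_memcpy_ns` are hypothetical and are not part of StringZilla or its benchmark suite; the sketch times the standard `memcpy` with the C11 `timespec_get` clock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: converts a byte count and a duration in nanoseconds
 * into GB/s. One byte per nanosecond equals 1e9 bytes per second. */
static double gigabytes_per_second(size_t bytes, double nanoseconds) {
    assert(nanoseconds > 0.0);
    return (double)bytes / nanoseconds;
}

/* Hypothetical helper: returns the mean time in nanoseconds of one
 * `memcpy` call over `iterations` repetitions. */
static double time_memcpy_ns(void *dst, const void *src, size_t bytes,
                             size_t iterations) {
    assert(dst != NULL && src != NULL && bytes > 0 && iterations > 0);
    volatile unsigned char sink = 0; /* discourages eliding the copies */
    struct timespec start, stop;
    timespec_get(&start, TIME_UTC);
    for (size_t i = 0; i < iterations; ++i) {
        memcpy(dst, src, bytes);
        sink = ((const unsigned char *)dst)[i % bytes];
    }
    timespec_get(&stop, TIME_UTC);
    (void)sink;
    double elapsed = (double)(stop.tv_sec - start.tv_sec) * 1e9 +
                     (double)(stop.tv_nsec - start.tv_nsec);
    return elapsed / (double)iterations;
}
```

A caller would pass the same byte count to both helpers, for example `gigabytes_per_second(n, time_memcpy_ns(dst, src, n, 1000))`. A wall clock is used here only for portability; a monotonic clock and warm-up iterations would give steadier numbers.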
256-byte Look-Up Table Transform
On Intel Sapphire Rapids:
$ build_release/stringzilla_bench_memory leipzig1M.txt
StringZilla. Starting memory benchmarks.
Parsed the dataset with:
- 8388608 words of mean length ~ 5.12 bytes
- 262144 lines of mean length ~ 128.64 bytes
Benchmarking on entire dataset:
- str::transform<lookup> 3.8070 GB/s 17627743.2 ns 0 errors in 1420 iterations
- str::transform<increment> 23.9881 GB/s 2797588.7 ns 0 errors in 8940 iterations
- sz_look_up_transform_serial 3.6020 GB/s 18630895.7 ns 0 errors in 1344 iterations
- sz_look_up_transform_avx512 21.1733 GB/s 3169507.5 ns 0 errors in 7888 iterations
- sz_look_up_transform_avx2 8.3881 GB/s 8000528.7 ns 0 errors in 3128 iterations
On AWS Graviton 4:
$ build_release/stringzilla_bench_memory leipzig1M.txt
StringZilla. Starting memory benchmarks.
Parsed the dataset with:
- 8388608 words of mean length ~ 5.12 bytes
- 262144 lines of mean length ~ 128.64 bytes
Benchmarking on entire dataset:
- str::transform<lookup> 2.6494 GB/s 25329887.2 ns 0 errors in 988 iterations
- str::transform<increment> 23.7150 GB/s 2829809.9 ns 0 errors in 8836 iterations
- sz_look_up_transform_serial 2.6069 GB/s 25742844.6 ns 0 errors in 972 iterations
- sz_look_up_transform_neon 8.4908 GB/s 7903721.1 ns 0 errors in 3164 iterations
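For readers unfamiliar with the operation being benchmarked, a 256-byte look-up table transform replaces every input byte with the table entry at that byte's value. The following is a minimal serial sketch of the idea, not StringZilla's implementation or signature; `lut_transform_serial` and `lut_init_ascii_upper` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps each source byte through a 256-entry table.
 * `target` may alias `source` for an in-place transform. */
static void lut_transform_serial(const unsigned char *source, size_t length,
                                 const unsigned char lut[256],
                                 unsigned char *target) {
    assert(lut != NULL);
    assert(length == 0 || (source != NULL && target != NULL));
    for (size_t i = 0; i < length; ++i) target[i] = lut[source[i]];
}

/* Hypothetical helper: fills the table so that ASCII lowercase letters map
 * to uppercase and every other byte maps to itself. */
static void lut_init_ascii_upper(unsigned char lut[256]) {
    assert(lut != NULL);
    for (int i = 0; i < 256; ++i) lut[i] = (unsigned char)i;
    for (int c = 'a'; c <= 'z'; ++c) lut[c] = (unsigned char)(c - 'a' + 'A');
}
```

The loop performs one dependent table load per byte, which limits how well a compiler can vectorize it; that is consistent with the gap between the serial rows and the AVX-512 and NEON rows above.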