
Releases: ashvardanian/StringZilla

Release v3.10.9

09 Nov 16:37

Release: v3.10.9 [skip ci]

Release v3.10.8

06 Nov 21:11

Release: v3.10.8 [skip ci]

Release v3.10.7

30 Oct 10:34

Release: v3.10.7 [skip ci]

Release v3.10.6

29 Oct 18:37

Release: v3.10.6 [skip ci]

Release v3.10.5

19 Oct 08:51

Release: v3.10.5 [skip ci]

Release v3.10.4

17 Oct 06:45

Release: v3.10.4 [skip ci]

Release v3.10.3

16 Oct 13:05

Release: v3.10.3 [skip ci]

Release v3.10.2

16 Oct 02:48

Release: v3.10.2 [skip ci]

Release v3.10.1

15 Oct 17:09

Release: v3.10.1 [skip ci]

v3.10: Improved Memory Operations

13 Oct 02:36

This update brings many performance optimizations ahead of the next wave of breaking major releases, which will add new functionality and support for a wider range of CPUs. Time to get excited 🥳

Faster memcpy and memset

On Intel Sapphire Rapids:

$ build_release/stringzilla_bench_memory leipzig1M.txt 
StringZilla. Starting memory benchmarks.
Parsed the dataset with:
- 8388608 words of mean length ~ 5.12 bytes
- 262144 lines of mean length ~ 128.64 bytes
Benchmarking on entire dataset:
- memcpy<aligned>                          19.7128 GB/s       3404322.4 ns          0 errors in       7344 iterations                     
- sz_copy_serial<aligned>                  11.7727 GB/s       5700374.0 ns          0 errors in       4388 iterations                     
- sz_copy_avx512<aligned>                  20.0675 GB/s       3344156.1 ns          0 errors in       7476 iterations                     
- sz_copy_avx2<aligned>                    11.4429 GB/s       5864690.5 ns          0 errors in       4264 iterations                     
- memcpy<unaligned>                        19.4694 GB/s       3446883.2 ns          0 errors in       7256 iterations                     
- sz_copy_serial<unaligned>                11.6158 GB/s       5777373.4 ns          0 errors in       4328 iterations                     
- sz_copy_avx512<unaligned>                20.3848 GB/s       3292099.3 ns          0 errors in       7596 iterations                     
- sz_copy_avx2<unaligned>                  11.2894 GB/s       5944407.9 ns          0 errors in       4208 iterations                     
- memset                                   27.9879 GB/s       2397785.1 ns          0 errors in      10428 iterations                     
- sz_fill_serial                           28.0284 GB/s       2394315.1 ns          0 errors in      10444 iterations                     
- sz_fill_avx512                           28.9894 GB/s       2314942.1 ns          0 errors in      10800 iterations                     
- sz_fill_avx2                             27.7442 GB/s       2418845.8 ns          0 errors in      10336 iterations
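To help read the tables above, here is how a GB/s figure can be derived from bytes moved and mean nanoseconds per iteration. This is a minimal, hypothetical C harness, not the source of stringzilla_bench_memory: `bench_memcpy_gbps` is an illustrative name, and the payload size is an assumption. It times memcpy, averages the time per iteration, and counts mismatching copies, loosely mirroring the ns and errors columns. Since 1 GB is 1e9 bytes, bytes per nanosecond equals GB/s.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in nanoseconds via C11 timespec_get.
   A real benchmark would prefer a monotonic clock. */
static double now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical harness: copies `size` bytes `iterations` times and returns the
   throughput in GB/s (bytes per nanosecond). Stores the number of copies that
   failed validation in `*errors`. Returns -1.0 on bad input or allocation failure. */
double bench_memcpy_gbps(size_t size, size_t iterations, size_t *errors) {
    *errors = 0;
    unsigned char *src = malloc(size), *dst = malloc(size);
    if (!src || !dst || !iterations) { free(src); free(dst); return -1.0; }
    for (size_t i = 0; i != size; ++i) src[i] = (unsigned char)(i * 31u);

    double total_ns = 0.0;
    for (size_t it = 0; it != iterations; ++it) {
        memset(dst, 0, size);                    /* reset outside the timed region */
        double start = now_ns();
        memcpy(dst, src, size);                  /* the operation being measured */
        total_ns += now_ns() - start;
        *errors += memcmp(dst, src, size) != 0;  /* validate each copy */
    }
    free(src);
    free(dst);
    double mean_ns = total_ns / (double)iterations;
    return mean_ns > 0.0 ? (double)size / mean_ns : 0.0;
}
```

For example, `bench_memcpy_gbps(1u << 20, 8, &errors)` times eight 1 MiB copies and reports their mean throughput.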

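On AWS Graviton 4 we still have room for improvement: all StringZilla copy and fill kernels trail the system memcpy and memset, most visibly for fills (44.6 GB/s at best versus 66.9 GB/s for memset).
Non-temporal stores on large payloads are one potential improvement.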
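As a rough illustration of the non-temporal idea, here is a hedged C sketch built on x86 SSE2 streaming stores (`_mm_stream_si128`), which write around the cache instead of polluting it. On Arm, the closest counterpart is the STNP (store pair, non-temporal) instruction, which this sketch does not use. The function name `copy_maybe_nontemporal` and the 1 MiB cutoff are assumptions; on targets without SSE2 the function simply falls back to memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128; pulls in _mm_sfence */
#endif

/* Hypothetical cutoff: below it, regular cached stores are usually the better choice. */
#define NT_THRESHOLD_BYTES ((size_t)1 << 20)

/* Copy that uses streaming (non-temporal) stores for large payloads. */
void copy_maybe_nontemporal(void *dst, void const *src, size_t n) {
    assert(n == 0 || (dst && src));
#if defined(__SSE2__)
    if (n >= NT_THRESHOLD_BYTES) {
        unsigned char *d = dst;
        unsigned char const *s = src;
        /* Streaming stores need a 16-byte aligned destination: copy a short head first. */
        size_t head = (16 - ((uintptr_t)d & 15)) & 15;
        memcpy(d, s, head);
        d += head, s += head, n -= head;
        /* Unaligned 16-byte loads, aligned 16-byte non-temporal stores. */
        for (; n >= 16; d += 16, s += 16, n -= 16)
            _mm_stream_si128((__m128i *)d, _mm_loadu_si128((__m128i const *)s));
        _mm_sfence();    /* order the weakly-ordered streaming stores before later stores */
        memcpy(d, s, n); /* copy the remaining tail bytes */
        return;
    }
#endif
    memcpy(dst, src, n); /* small payloads or non-SSE2 targets */
}
```

Whether streaming stores help depends on the payload exceeding the caches and on the destination not being read again soon, so the cutoff would need tuning per CPU.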

$ build_release/stringzilla_bench_memory leipzig1M.txt 
StringZilla. Starting memory benchmarks.
Parsed the dataset with:
- 8388608 words of mean length ~ 5.12 bytes
- 262144 lines of mean length ~ 128.64 bytes
Benchmarking on entire dataset:
- memcpy<aligned>                          28.4008 GB/s       2362924.1 ns          0 errors in      10584 iterations                     
- sz_copy_serial<aligned>                  23.0014 GB/s       2917600.0 ns          0 errors in       8572 iterations                     
- sz_copy_sve<aligned>                     27.5536 GB/s       2435573.1 ns          0 errors in      10268 iterations                     
- sz_copy_neon<aligned>                    21.1320 GB/s       3175702.1 ns          0 errors in       7876 iterations                     
- memcpy<unaligned>                        26.9551 GB/s       2489652.6 ns          0 errors in      10044 iterations                     
- sz_copy_serial<unaligned>                22.6073 GB/s       2968456.4 ns          0 errors in       8424 iterations                     
- sz_copy_sve<unaligned>                   25.6073 GB/s       2620692.7 ns          0 errors in       9540 iterations                     
- sz_copy_neon<unaligned>                  20.8439 GB/s       3219593.9 ns          0 errors in       7768 iterations                     
- memset                                   66.9055 GB/s       1003039.9 ns          0 errors in      24928 iterations                     
- sz_fill_serial                           44.1775 GB/s       1519072.9 ns          0 errors in      16460 iterations                     
- sz_fill_sve                              34.5010 GB/s       1945126.1 ns          0 errors in      12856 iterations                     
- sz_fill_neon                             44.5696 GB/s       1505708.6 ns          0 errors in      16604 iterations
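For context on the serial fill numbers, one common way to write a portable fill without SIMD is SWAR (SIMD within a register): broadcast the byte into a 64-bit word and store eight bytes at a time. The sketch below is hypothetical; `fill_swar` is an illustrative name, and this is not StringZilla's `sz_fill_serial`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable fill. Multiplying a byte by 0x0101010101010101
   repeats it in all 8 bytes of the 64-bit word. */
void fill_swar(void *target, size_t length, unsigned char value) {
    assert(length == 0 || target);
    unsigned char *p = target;
    uint64_t const word = (uint64_t)value * 0x0101010101010101ull;
    /* Byte-wise head until p is 8-byte aligned. */
    while (length && ((uintptr_t)p & 7)) { *p++ = value; --length; }
    /* 8-byte stores; a fixed-size 8-byte memcpy typically compiles to one store. */
    for (; length >= 8; p += 8, length -= 8) memcpy(p, &word, 8);
    /* Byte-wise tail. */
    while (length--) *p++ = value;
}
```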

256-byte Look-Up Table Transform

On Intel Sapphire Rapids:

$ build_release/stringzilla_bench_memory leipzig1M.txt 
StringZilla. Starting memory benchmarks.
Parsed the dataset with:
- 8388608 words of mean length ~ 5.12 bytes
- 262144 lines of mean length ~ 128.64 bytes
Benchmarking on entire dataset:
- str::transform<lookup>                    3.8070 GB/s      17627743.2 ns          0 errors in       1420 iterations                     
- str::transform<increment>                23.9881 GB/s       2797588.7 ns          0 errors in       8940 iterations                     
- sz_look_up_transform_serial               3.6020 GB/s      18630895.7 ns          0 errors in       1344 iterations                     
- sz_look_up_transform_avx512              21.1733 GB/s       3169507.5 ns          0 errors in       7888 iterations                     
- sz_look_up_transform_avx2                 8.3881 GB/s       8000528.7 ns          0 errors in       3128 iterations

On AWS Graviton 4:

$ build_release/stringzilla_bench_memory leipzig1M.txt 
StringZilla. Starting memory benchmarks.
Parsed the dataset with:
- 8388608 words of mean length ~ 5.12 bytes
- 262144 lines of mean length ~ 128.64 bytes
Benchmarking on entire dataset:
- str::transform<lookup>                    2.6494 GB/s      25329887.2 ns          0 errors in        988 iterations                     
- str::transform<increment>                23.7150 GB/s       2829809.9 ns          0 errors in       8836 iterations                     
- sz_look_up_transform_serial               2.6069 GB/s      25742844.6 ns          0 errors in        972 iterations                     
- sz_look_up_transform_neon                 8.4908 GB/s       7903721.1 ns          0 errors in       3164 iterations
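A 256-byte look-up table transform maps every input byte through a table, `dst[i] = lut[src[i]]`. The serial baseline can be sketched in C as follows; `lut_transform` and `make_upper_lut` are hypothetical names for illustration, not the StringZilla API.

```c
#include <assert.h>
#include <stddef.h>

/* Serial 256-byte look-up table transform: each byte is replaced by lut[byte]. */
void lut_transform(unsigned char const *src, size_t length,
                   unsigned char const lut[256], unsigned char *dst) {
    assert(length == 0 || (src && dst && lut));
    for (size_t i = 0; i != length; ++i) dst[i] = lut[src[i]];
}

/* Example table: ASCII lowercase to uppercase, all other bytes unchanged. */
void make_upper_lut(unsigned char lut[256]) {
    for (int i = 0; i != 256; ++i)
        lut[i] = (unsigned char)(i >= 'a' && i <= 'z' ? i - 'a' + 'A' : i);
}
```

Because each output byte depends only on the matching input byte, the loop also works in place (`dst == src`).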