Commit f78be3f

integrating AVX-512

Author: Daniel Lemire
Parent: 1a0efdb

3 files changed: 85 additions, 69 deletions

README.md

Lines changed: 33 additions & 30 deletions
@@ -99,49 +99,52 @@ cd benchmark
 sudo dotnet run -c Release
 ```
 
+
+--anyCategories sse avx avx512
 ## Results (x64)
 
-On an Intel Ice Lake system, our validation function is up to seven times
+On an Intel Ice Lake system, our validation function is up to 13 times
 faster than the standard library.
-A realistic input is Twitter.json which is mostly ASCII with some Unicode content.
+A realistic input is Twitter.json which is mostly ASCII with some Unicode content
+where we are 2.4 times faster.
 
-| data set | SimdUnicode current AVX2 (GB/s) | .NET speed (GB/s) |
-|:----------------|:------------------------|-------------------|
-| Twitter.json | 24 | 12 |
-| Arabic-Lipsum | 9.0 | 2.3 |
-| Chinese-Lipsum | 9.0 | 3.9 |
-| Emoji-Lipsum | 7.1 | 0.9 |
-| Hebrew-Lipsum | 8.0 | 2.3 |
-| Hindi-Lipsum | 8.0 | 2.1 |
-| Japanese-Lipsum | 8.0  | 3.5 |
-| Korean-Lipsum | 8.0 | 1.3 |
-| Latin-Lipsum | 76 | 96 |
-| Russian-Lipsum | 8.0 | 1.2 |
+| data set | SimdUnicode current AVX2 (GB/s) | .NET speed (GB/s) | speed up |
+|:----------------|:------------------------|:-------------------|:-------------------|
+| Twitter.json | 29 | 12 | 2.4 x |
+| Arabic-Lipsum | 12 | 2.3 | 5.2 x |
+| Chinese-Lipsum | 12 | 3.9 | 3.0 x |
+| Emoji-Lipsum | 12 | 0.9 | 13 x |
+| Hebrew-Lipsum |12 | 2.3 | 5.2 x |
+| Hindi-Lipsum | 12 | 2.1 | 5.7 x |
+| Japanese-Lipsum | 10  | 3.5 | 2.9 x |
+| Korean-Lipsum | 10 | 1.3 | 7.7 x |
+| Latin-Lipsum | 76 | 76 | --- |
+| Russian-Lipsum | 12 | 1.2 | 10 x |
 
-On the pure ASCII inputs (Latin-Lipsum) has a small advantage but both
-functions are extremely fast.
 
 
 On x64 system, we offer several functions: a fallback function for legacy systems,
-a SSE42 function for older CPUs, and an AVX2 function for current x64 systems.
+a SSE42 function for older CPUs, an AVX2 function for current x64 systems and
+an AVX-512 function for the most recent processors (AMD Zen 4 or better, Intel
+Ice Lake, etc.).
 
 ## Results (ARM)
 
-On an Apple M2 system, our validation function is two to three times
+On an Apple M2 system, our validation function is 1.5 to four times
 faster than the standard library.
 
-| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) |
-|:----------------|:-----------|:--------------------------|
-| Twitter.json | 25 | 14 |
-| Arabic-Lipsum | 7.4 | 3.5 |
-| Chinese-Lipsum | 7.4 | 4.8 |
-| Emoji-Lipsum | 7.4 | 2.5 |
-| Hebrew-Lipsum | 7.4 | 3.5 |
-| Hindi-Lipsum | 7.3 | 3.0 |
-| Japanese-Lipsum | 7.3 | 4.6  |
-| Korean-Lipsum | 7.4 | 1.8 |
-| Latin-Lipsum | 87 | 38 |
-| Russian-Lipsum | 7.4 | 2.7 |
+| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) | speed up |
+|:----------------|:-----------|:--------------------------|:-------------------|
+| Twitter.json | 25 | 14 | 1.8 x |
+| Arabic-Lipsum | 7.4 | 3.5 | 2.1 x |
+| Chinese-Lipsum | 7.4 | 4.8 | 1.5 x |
+| Emoji-Lipsum | 7.4 | 2.5 | 3.0 x |
+| Hebrew-Lipsum | 7.4 | 3.5 | 2.1 x |
+| Hindi-Lipsum | 7.3 | 3.0 | 2.4 x |
+| Japanese-Lipsum | 7.3 | 4.6  | 1.6 x |
+| Korean-Lipsum | 7.4 | 1.8 | 4.1 x |
+| Latin-Lipsum | 87 | 38 | 2.3 x |
+| Russian-Lipsum | 7.4 | 2.7 | 2.7 x |
 
 
 ## Building the library
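
Review note: the README hunk lists four x64 code paths (fallback, SSE42, AVX2, AVX-512). The sketch below shows one way a caller could check which path the current machine would support, using the `IsSupported` properties in `System.Runtime.Intrinsics`. The `KernelSelector` class and its `Describe` method are hypothetical and not part of SimdUnicode. Mapping the AVX-512 path to `Avx512Vbmi` is an assumption based on the README's mention of Ice Lake and Zen 4; the library's real dispatch condition may differ. The AVX-512 classes require .NET 8 or later.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, not part of SimdUnicode: reports which code path
// a runtime dispatcher could pick on the current machine.
public static class KernelSelector
{
    public static string Describe()
    {
        // Probe the widest instruction set first, then step down.
        // Assumption: AVX-512 VBMI stands in for "Ice Lake / Zen 4 or better".
        if (Avx512Vbmi.IsSupported) return "avx512";
        if (Avx2.IsSupported) return "avx2";
        if (Sse42.IsSupported) return "sse42";
        if (AdvSimd.Arm64.IsSupported) return "arm64";
        return "fallback";
    }

    public static void Main()
    {
        Console.WriteLine($"Selected kernel: {Describe()}");
    }
}
```

Probing from widest to narrowest means an AVX-512 machine never settles for the AVX2 path, while older CPUs still get the best path they support.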

benchmark/Benchmark.cs

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ public string GetValue(Summary summary, BenchmarkCase benchmarkCase)
 public class RealDataBenchmark
 {
 // We only informs the user once about the SIMD support of the system.
-private static bool printed = false;
+private static bool printed;
 #pragma warning disable CA1812
 private sealed class Config : ManualConfig
 {
