@@ -99,49 +99,52 @@ cd benchmark
sudo dotnet run -c Release
```
+
+ --anyCategories sse avx avx512
## Results (x64)

- On an Intel Ice Lake system, our validation function is up to seven times
+ On an Intel Ice Lake system, our validation function is up to 13 times
faster than the standard library.
- A realistic input is Twitter.json which is mostly ASCII with some Unicode content.
+ A realistic input is Twitter.json, which is mostly ASCII with some Unicode content,
+ where we are 2.4 times faster.

- | data set | SimdUnicode current AVX2 (GB/s) | .NET speed (GB/s) |
- | :----------------| :------------------------| -------------------|
- | Twitter.json | 24 | 12 |
- | Arabic-Lipsum | 9.0 | 2.3 |
- | Chinese-Lipsum | 9.0 | 3.9 |
- | Emoji-Lipsum | 7.1 | 0.9 |
- | Hebrew-Lipsum | 8.0 | 2.3 |
- | Hindi-Lipsum | 8.0 | 2.1 |
- | Japanese-Lipsum | 8.0 | 3.5 |
- | Korean-Lipsum | 8.0 | 1.3 |
- | Latin-Lipsum | 76 | 96 |
- | Russian-Lipsum | 8.0 | 1.2 |
+ | data set | SimdUnicode current AVX2 (GB/s) | .NET speed (GB/s) | speed up |
+ | :----------------| :------------------------| :------------------- | :------------------- |
+ | Twitter.json | 29 | 12 | 2.4 x |
+ | Arabic-Lipsum | 12 | 2.3 | 5.2 x |
+ | Chinese-Lipsum | 12 | 3.9 | 3.0 x |
+ | Emoji-Lipsum | 12 | 0.9 | 13 x |
+ | Hebrew-Lipsum | 12 | 2.3 | 5.2 x |
+ | Hindi-Lipsum | 12 | 2.1 | 5.7 x |
+ | Japanese-Lipsum | 10 | 3.5 | 2.9 x |
+ | Korean-Lipsum | 10 | 1.3 | 7.7 x |
+ | Latin-Lipsum | 76 | 76 | --- |
+ | Russian-Lipsum | 12 | 1.2 | 10 x |

- On the pure ASCII inputs (Latin-Lipsum) has a small advantage but both
- functions are extremely fast.

On x64 systems, we offer several functions: a fallback function for legacy systems,
- a SSE42 function for older CPUs, and an AVX2 function for current x64 systems.
+ a SSE42 function for older CPUs, an AVX2 function for current x64 systems and
+ an AVX-512 function for the most recent processors (AMD Zen 4 or better, Intel
+ Ice Lake, etc.).

## Results (ARM)

- On an Apple M2 system, our validation function is two to three times
+ On an Apple M2 system, our validation function is 1.5 to four times
faster than the standard library.

- | data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) |
- | :----------------| :-----------| :--------------------------|
- | Twitter.json | 25 | 14 |
- | Arabic-Lipsum | 7.4 | 3.5 |
- | Chinese-Lipsum | 7.4 | 4.8 |
- | Emoji-Lipsum | 7.4 | 2.5 |
- | Hebrew-Lipsum | 7.4 | 3.5 |
- | Hindi-Lipsum | 7.3 | 3.0 |
- | Japanese-Lipsum | 7.3 | 4.6 |
- | Korean-Lipsum | 7.4 | 1.8 |
- | Latin-Lipsum | 87 | 38 |
- | Russian-Lipsum | 7.4 | 2.7 |
+ | data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) | speed up |
+ | :----------------| :-----------| :--------------------------| :------------------- |
+ | Twitter.json | 25 | 14 | 1.8 x |
+ | Arabic-Lipsum | 7.4 | 3.5 | 2.1 x |
+ | Chinese-Lipsum | 7.4 | 4.8 | 1.5 x |
+ | Emoji-Lipsum | 7.4 | 2.5 | 3.0 x |
+ | Hebrew-Lipsum | 7.4 | 3.5 | 2.1 x |
+ | Hindi-Lipsum | 7.3 | 3.0 | 2.4 x |
+ | Japanese-Lipsum | 7.3 | 4.6 | 1.6 x |
+ | Korean-Lipsum | 7.4 | 1.8 | 4.1 x |
+ | Latin-Lipsum | 87 | 38 | 2.3 x |
+ | Russian-Lipsum | 7.4 | 2.7 | 2.7 x |

## Building the library