# x86-simd-sort

C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type
- sorting on x86 processors. Source header files are available in src directory.
- We currently only have AVX-512 based implementation of quicksort. This
- repository also includes a test suite which can be built and run to test the
- sorting algorithms for correctness. It also has benchmarking code to compare
- its performance relative to std::sort.
+ sorting algorithms on x86 processors. Source header files are available in src
+ directory. We currently only have AVX-512 based implementations of quicksort,
+ argsort, quickselect, partialsort and key-value sort. This repository also
+ includes a test suite which can be built and run to test the sorting algorithms
+ for correctness. It also has benchmarking code to compare its performance
+ relative to std::sort. The following APIs are currently supported:
+
+ #### Quicksort
+
+ ```
+ void avx512_qsort<T>(T* arr, int64_t arrsize)
+ ```
+ Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+ uint64_t, int64_t and double`
+
+ #### Argsort
+
+ ```
+ std::vector<int64_t> arg = avx512_argsort<T>(T* arr, int64_t arrsize)
+ void avx512_argsort<T>(T* arr, int64_t *arg, int64_t arrsize)
+ ```
+ Supported datatypes: `uint32_t, int32_t, float, uint64_t, int64_t and double`.
+ The algorithm resorts to scalar `std::sort` if the array contains NAN.
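
To make the argsort contract concrete, here is a minimal scalar sketch of what an argsort returns: the indices that would sort the input, with the input itself left untouched. `reference_argsort` and `is_nan` are hypothetical names introduced for illustration and are not part of x86-simd-sort. Ordering NAN last is an assumption of this sketch, not a documented guarantee of the library's scalar fallback.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helpers for this sketch (not library code).
// Floating-point overloads detect NaN; every other type has no NaN.
inline bool is_nan(float x) { return std::isnan(x); }
inline bool is_nan(double x) { return std::isnan(x); }
template <typename T>
bool is_nan(T) { return false; }

// Scalar reference argsort: returns the indices that sort arr ascending.
// Assumption: NaNs are ordered after every other value.
template <typename T>
std::vector<std::int64_t> reference_argsort(const T* arr, std::int64_t arrsize) {
    assert(arrsize >= 0 && (arr != nullptr || arrsize == 0));
    std::vector<std::int64_t> arg(static_cast<std::size_t>(arrsize));
    std::iota(arg.begin(), arg.end(), std::int64_t{0});  // 0, 1, 2, ...
    std::sort(arg.begin(), arg.end(), [arr](std::int64_t a, std::int64_t b) {
        const bool a_nan = is_nan(arr[a]);
        const bool b_nan = is_nan(arr[b]);
        // Keeps the comparator a strict weak ordering even with NaNs present.
        if (a_nan || b_nan) return !a_nan && b_nan;
        return arr[a] < arr[b];
    });
    return arg;
}
```

For the input `{3.0f, NaN, 1.0f, 2.0f}` this sketch yields the indices `{2, 3, 0, 1}`.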
+
+ #### Quickselect
+
+ ```
+ void avx512_qselect<T>(T* arr, int64_t arrsize)
+ void avx512_qselect<T>(T* arr, int64_t arrsize, bool hasnan)
+ ```
+ Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+ uint64_t, int64_t and double`. Use an additional optional argument `bool
+ hasnan` if you expect your arrays to contain NAN.
+
+ #### Partialsort
+
+ ```
+ void avx512_partialsort<T>(T* arr, int64_t arrsize)
+ void avx512_partialsort<T>(T* arr, int64_t arrsize, bool hasnan)
+ ```
+ Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+ uint64_t, int64_t and double`. Use an additional optional argument `bool
+ hasnan` if you expect your arrays to contain NAN.
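
For readers unfamiliar with the two operations, the closest standard-library analogues are `std::nth_element` (selection) and `std::partial_sort`. The sketch below only illustrates those semantics in scalar code. The names `reference_qselect` and `reference_partialsort`, and in particular the rank argument `k`, are assumptions made for this illustration: the signatures listed above do not show such an argument, and the library's actual parameters may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference (not library code).
// After the call, arr[k] holds the value it would have in a fully sorted
// array; everything before it is <= arr[k], everything after is >= arr[k].
template <typename T>
void reference_qselect(T* arr, std::int64_t k, std::int64_t arrsize) {
    assert(0 <= k && k < arrsize);
    std::nth_element(arr, arr + k, arr + arrsize);
}

// Hypothetical scalar reference (not library code).
// After the call, the first k elements are the k smallest, in ascending
// order; the order of the remaining elements is unspecified.
template <typename T>
void reference_partialsort(T* arr, std::int64_t k, std::int64_t arrsize) {
    assert(0 <= k && k <= arrsize);
    std::partial_sort(arr, arr + k, arr + arrsize);
}
```

Both sketches assume the input contains no NAN; with NAN present the default `<` comparison is not a valid ordering for the standard algorithms.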
+
+ #### Key-value sort
+ ```
+ void avx512_qsort_kv<T>(T* key, uint64_t* value, int64_t arrsize)
+ ```
+ Supported datatypes: `uint64_t, int64_t and double`
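
As an illustration of what a key-value sort is expected to produce, the following scalar sketch sorts the keys in ascending order and applies the same permutation to the values, so each value stays attached to its key. `reference_sort_kv` is a hypothetical name used only here; it mirrors the parameter layout shown above but says nothing about how the AVX-512 routine is implemented. It assumes the keys contain no NAN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical scalar reference (not library code).
// Sorts key[] ascending and reorders value[] with the same permutation.
// Assumption: key[] contains no NaN.
template <typename T>
void reference_sort_kv(T* key, std::uint64_t* value, std::int64_t arrsize) {
    assert(arrsize >= 0);
    const std::size_t n = static_cast<std::size_t>(arrsize);

    // Compute the permutation that sorts the keys.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [key](std::size_t a, std::size_t b) { return key[a] < key[b]; });

    // Apply that permutation to both arrays through temporaries.
    std::vector<T> sorted_key(n);
    std::vector<std::uint64_t> sorted_value(n);
    for (std::size_t i = 0; i < n; ++i) {
        sorted_key[i] = key[order[i]];
        sorted_value[i] = value[order[i]];
    }
    std::copy(sorted_key.begin(), sorted_key.end(), key);
    std::copy(sorted_value.begin(), sorted_value.end(), value);
}
```

With keys `{3.0, 1.0, 2.0}` and values `{30, 10, 20}`, the sketch leaves keys `{1.0, 2.0, 3.0}` and values `{10, 20, 30}`.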

## Algorithm details

@@ -20,13 +64,14 @@ network. The core implementations of the vectorized qsort functions
`avx512_qsort<T>(T*, int64_t)` are modified versions of avx2 quicksort
presented in the paper [2] and source code associated with that paper [3].

- ## Handling NAN in float and double arrays
+ ## A note on NAN in float and double arrays

If you expect your array to contain NANs, please be aware that these
- routines **do not preserve your NANs as you pass them**. The
- `avx512_qsort<T>()` routine will put all your NAN's at the end of the sorted
- array and replace them with `std::nan("1")`. Please take a look at
- `avx512_qsort<float>()` and `avx512_qsort<double>()` functions for details.
+ routines **do not preserve your NANs as you pass them**. The quicksort,
+ quickselect, partialsort and key-value sorting routines will sort NANs to the
+ end of the array and replace them with `std::nan("1")`. `avx512_argsort`
+ routines instead fall back to a scalar argsort that uses `std::sort` when the
+ array contains NAN.
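
The behaviour described for the sorting routines (NANs moved to the end and overwritten with `std::nan("1")`) can be modelled in a few lines of scalar code. This is only a sketch of the observable result under that description, not the library's implementation, and `reference_sort_nan_last` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar model of the documented NaN handling (not library code).
// Intended for float and double arrays.
template <typename T>
void reference_sort_nan_last(T* arr, std::int64_t arrsize) {
    assert(arrsize >= 0);
    T* end = arr + arrsize;

    // Move every non-NaN value to the front; [first_nan, end) holds the NaNs.
    T* first_nan = std::partition(arr, end, [](T x) { return !std::isnan(x); });

    // Sort only the NaN-free prefix, where operator< is a valid ordering.
    std::sort(arr, first_nan);

    // Overwrite the tail with one canonical NaN: the original NaN payloads
    // and sign bits are not preserved.
    std::fill(first_nan, end, static_cast<T>(std::nan("1")));
}
```

The practical consequence is that code relying on specific NAN payloads or sign bits should record them before sorting.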

## Example to include and build this in C++ code

@@ -36,7 +81,7 @@ array and replace them with `std::nan("1")`. Please take a look at
#include "src/avx512-32bit-qsort.hpp"

int main() {
-     const int ARRSIZE = 10;
+     const int ARRSIZE = 1000;
    std::vector<float> arr;

    /* Initialize elements in reverse order */
@@ -45,7 +90,7 @@ int main() {
    }

    /* call avx512 quicksort */
-     avx512_qsort<float>(arr.data(), ARRSIZE);
+     avx512_qsort(arr.data(), ARRSIZE);
    return 0;
}

@@ -54,7 +99,7 @@ int main() {
### Build using gcc

```
- gcc main.cpp -mavx512f -mavx512dq -O3
+ g++ main.cpp -mavx512f -mavx512dq -O3
```

This is a header file only library and we do not provide any compile time and
@@ -75,33 +120,40 @@ compiler to build.
gcc >= 8.x
```

+ ### Build using Meson
+
+ Meson is the recommended build system to build the test and benchmark suite.
+
+ ```
+ meson setup builddir && cd builddir && ninja
+ ```
+
+ It builds two executables:
+
+ - `testexe`: runs a bunch of tests written in ./tests directory.
+ - `benchexe`: measures performance of these algorithms for various data types.
+
+
### Build using Make

- `make` command builds two executables:
+ Makefile uses `-march=sapphirerapids` as a global compile flag and hence it
+ will require g++-12. `make` command builds two executables:
- `testexe`: runs a bunch of tests written in ./tests directory.
- `benchexe`: measures performance of these algorithms for various data types
  and compares them to std::sort.

You can use `make test` and `make bench` to build just the `testexe` and
`benchexe` respectively.

- ### Build using Meson
-
- You can also build `testexe` and `benchexe` using Meson/Ninja with the following
- command:
-
- ```
- meson setup builddir && cd builddir && ninja
- ```
-
## Requirements and dependencies

The sorting routines rely only on the C++ Standard Library and require a
relatively modern compiler to build (gcc 8.x and above). Since they use the
AVX-512 instruction set, they can only run on processors that have AVX-512.
Specifically, the 32-bit and 64-bit routines require the AVX-512F and AVX-512DQ
instruction sets. The 16-bit sorting requires the AVX-512F, AVX-512BW and AVX-512 VBMI2
- instruction sets. The test suite is written using the Google Test framework.
+ instruction sets. The test suite is written using the Google Test framework. The
+ benchmark is written using the Google Benchmark framework.

## References
