Commit 587af58

Merge pull request #25 from r-devulap/upate-README
Update README file
2 parents a4e57cb + b2e482f commit 587af58

1 file changed, +76 -24 lines changed

README.md

@@ -1,11 +1,55 @@
11
# x86-simd-sort
22

33
C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type
4-
sorting on x86 processors. Source header files are available in src directory.
5-
We currently only have AVX-512 based implementation of quicksort. This
6-
repository also includes a test suite which can be built and run to test the
7-
sorting algorithms for correctness. It also has benchmarking code to compare
8-
its performance relative to std::sort.
4+
sorting algorithms on x86 processors. Source header files are available in src
5+
directory. We currently have only AVX-512 based implementations of quicksort,
6+
argsort, quickselect, partialsort and key-value sort. This repository also
7+
includes a test suite which can be built and run to test the sorting algorithms
8+
for correctness. It also has benchmarking code to compare its performance
9+
relative to std::sort. The following APIs are currently supported:
10+
11+
#### Quicksort
12+
13+
```
14+
void avx512_qsort<T>(T* arr, int64_t arrsize)
15+
```
16+
Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
17+
uint64_t, int64_t and double`
18+
19+
#### Argsort
20+
21+
```
22+
std::vector<int64_t> arg = avx512_argsort<T>(T* arr, int64_t arrsize)
23+
void avx512_argsort<T>(T* arr, int64_t *arg, int64_t arrsize)
24+
```
25+
Supported datatypes: `uint32_t, int32_t, float, uint64_t, int64_t and double`.
26+
The algorithm resorts to scalar `std::sort` if the array contains NAN.
27+
28+
#### Quickselect
29+
30+
```
31+
void avx512_qselect<T>(T* arr, int64_t arrsize)
32+
void avx512_qselect<T>(T* arr, int64_t arrsize, bool hasnan)
33+
```
34+
Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
35+
uint64_t, int64_t and double`. Use an additional optional argument `bool
36+
hasnan` if you expect your arrays to contain NAN.
37+
38+
#### Partialsort
39+
40+
```
41+
void avx512_partialsort<T>(T* arr, int64_t arrsize)
42+
void avx512_partialsort<T>(T* arr, int64_t arrsize, bool hasnan)
43+
```
44+
Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
45+
uint64_t, int64_t and double`. Use an additional optional argument `bool
46+
hasnan` if you expect your arrays to contain NAN.
47+
48+
#### Key-value sort
49+
```
50+
void avx512_qsort_kv<T>(T* key, uint64_t* value, int64_t arrsize)
51+
```
52+
Supported datatypes: `uint64_t, int64_t and double`
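
As a rough picture of what a key-value sort does, the scalar sketch below sorts the keys in ascending order and moves each value along with its key. It assumes that is the intended semantics and that the keys contain no NAN. `scalar_sort_kv` is a hypothetical name, and the code is not the library's AVX-512 implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical scalar reference, NOT the library implementation.
// Sorts key[] ascending and applies the same permutation to value[].
// Assumption: keys contain no NAN.
template <typename T>
void scalar_sort_kv(T* key, uint64_t* value, int64_t arrsize) {
    assert(arrsize >= 0);
    const std::size_t n = static_cast<std::size_t>(arrsize);
    // Sort a permutation of indices by key.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [key](std::size_t a, std::size_t b) { return key[a] < key[b]; });
    // Gather keys and values in sorted order, then copy them back.
    std::vector<T> sorted_key(n);
    std::vector<uint64_t> sorted_value(n);
    for (std::size_t i = 0; i < n; ++i) {
        sorted_key[i] = key[order[i]];
        sorted_value[i] = value[order[i]];
    }
    std::copy(sorted_key.begin(), sorted_key.end(), key);
    std::copy(sorted_value.begin(), sorted_value.end(), value);
}
```

The sketch uses temporary buffers for clarity rather than speed.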
953

1054
## Algorithm details
1155

@@ -20,13 +64,14 @@ network. The core implementations of the vectorized qsort functions
2064
`avx512_qsort<T>(T*, int64_t)` are modified versions of avx2 quicksort
2165
presented in the paper [2] and source code associated with that paper [3].
2266

23-
## Handling NAN in float and double arrays
67+
## A note on NAN in float and double arrays
2468

2569
If you expect your array to contain NANs, please be aware that these
26-
routines **do not preserve your NANs as you pass them**. The
27-
`avx512_qsort<T>()` routine will put all your NAN's at the end of the sorted
28-
array and replace them with `std::nan("1")`. Please take a look at
29-
`avx512_qsort<float>()` and `avx512_qsort<double>()` functions for details.
70+
routines **do not preserve your NANs as you pass them**. The quicksort,
71+
quickselect, partialsort and key-value sorting routines will sort NANs to the
72+
end of the array and replace them with `std::nan("1")`. The `avx512_argsort`
73+
routines fall back to a scalar argsort that uses `std::sort` when the array
74+
contains NAN.
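
The observable effect described above (NANs moved to the end and overwritten with `std::nan("1")`) can be sketched in scalar code as follows. This is only an illustration of the documented behavior, not how the AVX-512 routines are implemented, and `sort_nan_last` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical scalar illustration, NOT the library implementation.
// Moves NANs to the end, sorts the remaining values, and overwrites
// every NAN with std::nan("1"), so the original NAN payloads are lost.
inline void sort_nan_last(std::vector<double>& v) {
    auto first_nan = std::partition(v.begin(), v.end(),
                                    [](double x) { return !std::isnan(x); });
    std::sort(v.begin(), first_nan);
    assert(std::is_sorted(v.begin(), first_nan));
    std::fill(first_nan, v.end(), std::nan("1"));
}
```

After the call, the leading non-NAN values are in ascending order and the tail holds only `std::nan("1")` values, which is why NAN payloads are not preserved.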
3075

3176
## Example to include and build this in a C++ code
3277

@@ -36,7 +81,7 @@ array and replace them with `std::nan("1")`. Please take a look at
3681
#include "src/avx512-32bit-qsort.hpp"
3782

3883
int main() {
39-
const int ARRSIZE = 10;
84+
const int ARRSIZE = 1000;
4085
std::vector<float> arr;
4186

4287
/* Initialize elements in reverse order */
@@ -45,7 +90,7 @@ int main() {
4590
}
4691

4792
/* call avx512 quicksort */
48-
avx512_qsort<float>(arr.data(), ARRSIZE);
93+
avx512_qsort(arr.data(), ARRSIZE);
4994
return 0;
5095
}
5196

@@ -54,7 +99,7 @@ int main() {
5499
### Build using gcc
55100

56101
```
57-
gcc main.cpp -mavx512f -mavx512dq -O3
102+
g++ main.cpp -mavx512f -mavx512dq -O3
58103
```
59104

60105
This is a header file only library and we do not provide any compile time and
@@ -75,33 +120,40 @@ compiler to build.
75120
gcc >= 8.x
76121
```
77122

123+
### Build using Meson
124+
125+
Meson is the recommended build system for the test and benchmark suites.
126+
127+
```
128+
meson setup builddir && cd builddir && ninja
129+
```
130+
131+
It builds two executables:
132+
133+
- `testexe`: runs a bunch of tests written in ./tests directory.
134+
- `benchexe`: measures performance of these algorithms for various data types.
135+
136+
78137
### Build using Make
79138

80-
`make` command builds two executables:
139+
The Makefile uses `-march=sapphirerapids` as a global compile flag and hence
140+
requires g++-12. The `make` command builds two executables:
81141
- `testexe`: runs a bunch of tests written in ./tests directory.
82142
- `benchexe`: measures performance of these algorithms for various data types
83143
and compares them to std::sort.
84144

85145
You can use `make test` and `make bench` to build just the `testexe` and
86146
`benchexe` respectively.
87147

88-
### Build using Meson
89-
90-
You can also build `testexe` and `benchexe` using Meson/Ninja with the following
91-
command:
92-
93-
```
94-
meson setup builddir && cd builddir && ninja
95-
```
96-
97148
## Requirements and dependencies
98149

99150
The sorting routines rely only on the C++ Standard Library and require a
100151
relatively modern compiler to build (gcc 8.x and above). Since they use the
101152
AVX-512 instruction set, they can only run on processors that have AVX-512.
102153
Specifically, the 32-bit and 64-bit sorting requires the AVX-512F and AVX-512DQ
103154
instruction sets. The 16-bit sorting requires the AVX-512F, AVX-512BW and AVX-512 VBMI2
104-
instruction set. The test suite is written using the Google test framework.
155+
instruction sets. The test suite is written using the Google Test framework. The
156+
benchmark is written using the Google Benchmark framework.
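
Because the routines only run on processors with AVX-512, a caller may want to check for support at run time before dispatching to them. The sketch below assumes GCC or Clang on x86 and uses the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. The helper name `cpu_has_avx512f_dq` is hypothetical and is not part of this library.

```cpp
// Hypothetical helper, not part of the library.
// Returns true when the running CPU reports both AVX-512F and AVX-512DQ,
// the instruction sets listed above for 32-bit and 64-bit sorting.
// On other compilers or architectures it conservatively returns false.
inline bool cpu_has_avx512f_dq() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialize the CPU feature data
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512dq");
#else
    return false;
#endif
}
```

A caller could use the result to choose between the AVX-512 routines and `std::sort`.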
105157

106158
## References
107159
