Commit 7559f70

Merge pull request #96 from r-devulap/libname
Prep for v4.0
2 parents 2e5e136 + 6491d42 commit 7559f70

6 files changed, +260 -157 lines changed

.github/workflows/c-cpp.yml (+1 -1)

@@ -163,7 +163,7 @@ jobs:
 
 - name: List exported symbols
   run: |
-    nm --demangle --dynamic --defined-only --extern-only builddir/libx86simdsort.so
+    nm --demangle --dynamic --defined-only --extern-only builddir/libx86simdsortcpp.so
 
 - name: Run test suite on SPR
   run: sde -spr -- ./builddir/testexe

README.md (+64 -152)

@@ -1,173 +1,85 @@
 # x86-simd-sort
 
-C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type
-sorting algorithms on x86 processors. Source header files are available in src
-directory. We currently only have AVX-512 based implementation of quicksort,
-argsort, quickselect, paritalsort and key-value sort. This repository also
-includes a test suite which can be built and run to test the sorting algorithms
-for correctness. It also has benchmarking code to compare its performance
-relative to std::sort. The following API's are currently supported:
-
-#### Quicksort
-
-```cpp
-void avx512_qsort<T>(T* arr, int64_t arrsize)
-```
-Supported datatypes: `uint16_t`, `int16_t`, `_Float16`, `uint32_t`, `int32_t`,
-`float`, `uint64_t`, `int64_t` and `double`.
-
-For floating-point types, if `arr` contains NaNs, they are moved to the end and
-replaced with a quiet NaN. That is, the original, bit-exact NaNs in the input
-are not preserved.
-
-#### Argsort
+C++ template library for high performance SIMD based sorting routines for
+16-bit, 32-bit and 64-bit data types. The sorting routines are accelerated
+using AVX-512/AVX2 when available. The library auto picks the best version
+depending on the processor it is run on. If you are looking for the AVX-512 or
+AVX2 specific implementations, please see
+[README](https://github.com/intel/x86-simd-sort/src/README.md) file under
+`src/` directory. The following routines are currently supported:
 
 ```cpp
-std::vector<int64_t> arg = avx512_argsort<T>(T* arr, int64_t arrsize)
-void avx512_argsort<T>(T* arr, int64_t *arg, int64_t arrsize)
+x86simdsort::qsort(T* arr, size_t size, bool hasnan);
+x86simdsort::qselect(T* arr, size_t k, size_t size, bool hasnan);
+x86simdsort::partial_qsort(T* arr, size_t k, size_t size, bool hasnan);
+std::vector<size_t> arg = x86simdsort::argsort(T* arr, size_t size, bool hasnan);
+std::vector<size_t> arg = x86simdsort::argselect(T* arr, size_t k, size_t size, bool hasnan);
 ```
-Supported datatypes: `uint32_t`, `int32_t`, `float`, `uint64_t`, `int64_t` and
-`double`.
 
-The algorithm resorts to scalar `std::sort` if the array contains NaNs.
+### Build/Install
 
-#### Quickselect
+[meson](https://github.com/mesonbuild/meson) is the used build system. Command
+to build and install the library:
 
-```cpp
-void avx512_qselect<T>(T* arr, int64_t arrsize)
-void avx512_qselect<T>(T* arr, int64_t arrsize, bool hasnan)
-```
-Supported datatypes: `uint16_t`, `int16_t`, `_Float16`, `uint32_t`, `int32_t`,
-`float`, `uint64_t`, `int64_t` and `double`.
-
-For floating-point types, if `bool hasnan` is set, NaNs are moved to the end of
-the array, preserving the bit-exact NaNs in the input. If NaNs are present but
-`hasnan` is `false`, the behavior is undefined.
-
-#### Partialsort
-
-```cpp
-void avx512_partial_qsort<T>(T* arr, int64_t arrsize)
-void avx512_partial_qsort<T>(T* arr, int64_t arrsize, bool hasnan)
 ```
-Supported datatypes: `uint16_t`, `int16_t`, `_Float16`, `uint32_t`, `int32_t`,
-`float`, `uint64_t`, `int64_t` and `double`.
-
-For floating-point types, if `bool hasnan` is set, NaNs are moved to the end of
-the array, preserving the bit-exact NaNs in the input. If NaNs are present but
-`hasnan` is `false`, the behavior is undefined.
-
-#### Key-value sort
-```cpp
-void avx512_qsort_kv<T>(T* key, uint64_t* value , int64_t arrsize)
+meson setup --buildtype release builddir && cd builddir
+meson compile
+sudo meson install
 ```
-Supported datatypes: `uint64_t, int64_t and double`
 
-## Algorithm details
+Once installed, you can use `pkg-config --cflags --libs x86simdsortcpp` to
+populate the right cflags and ldflags to compile and link your C++ program.
+This repository also contains a test suite and benchmarking suite which are
+written using [googletest](https://github.com/google/googletest) and [google
+benchmark](https://github.com/google/benchmark) frameworks respectively. You
+can configure meson to build them both by using `-Dbuild_tests=true` and
+`-Dbuild_benchmarks=true`.
 
-The ideas and code are based on these two research papers [1] and [2]. On a
-high level, the idea is to vectorize quicksort partitioning using AVX-512
-compressstore instructions. If the array size is < 128, then use Bitonic
-sorting network implemented on 512-bit registers. The precise network
-definitions depend on the size of the dtype and are defined in separate files:
-`avx512-16bit-qsort.hpp`, `avx512-32bit-qsort.hpp` and
-`avx512-64bit-qsort.hpp`. Article [4] is a good resource for bitonic sorting
-network. The core implementations of the vectorized qsort functions
-`avx512_qsort<T>(T*, int64_t)` are modified versions of avx2 quicksort
-presented in the paper [2] and source code associated with that paper [3].
-
-## Example to include and build this in a C++ code
-
-### Sample code `main.cpp`
+### Example usage
 
 ```cpp
-#include "src/avx512-32bit-qsort.hpp"
+#include "x86simdsort.h"
 
 int main() {
-    const int ARRSIZE = 1000;
-    std::vector<float> arr;
-
-    /* Initialize elements is reverse order */
-    for (int ii = 0; ii < ARRSIZE; ++ii) {
-        arr.push_back(ARRSIZE - ii);
-    }
-
-    /* call avx512 quicksort */
-    avx512_qsort(arr.data(), ARRSIZE);
+    std::vector<float> arr{1000};
+    x86simdsort::qsort(arr, 1000, true);
     return 0;
 }
-
-```
-
-### Build using gcc
-
-```
-g++ main.cpp -mavx512f -mavx512dq -O3
 ```
 
-This is a header file only library and we do not provide any compile time and
-run time checks which is recommended while including this your source code. A
-slightly modified version of this source code has been contributed to
-[NumPy](https://github.com/numpy/numpy) (see this [pull
-request](https://github.com/numpy/numpy/pull/22315) for details). This NumPy
-pull request is a good reference for how to include and build this library with
-your source code.
-
-## Build requirements
-
-None, its header files only. However you will need `make` or `meson` to build
-the unit tests and benchmarking suite. You will need a relatively modern
-compiler to build.
-
-```
-gcc >= 8.x
-```
-
-### Build using Meson
-
-meson is the recommended build system to build the test and benchmark suite.
-
-```
-meson setup builddir && cd builddir && ninja
-```
-
-It build two executables:
-
-- `testexe`: runs a bunch of tests written in ./tests directory.
-- `benchexe`: measures performance of these algorithms for various data types.
-
-
-### Build using Make
-
-Makefile uses `-march=sapphirerapids` as a global compile flag and hence it
-will require g++-12. `make` command builds two executables:
-- `testexe`: runs a bunch of tests written in ./tests directory.
-- `benchexe`: measures performance of these algorithms for various data types
-and compares them to std::sort.
-
-You can use `make test` and `make bench` to build just the `testexe` and
-`benchexe` respectively.
-
-## Requirements and dependencies
-
-The sorting routines relies only on the C++ Standard Library and requires a
-relatively modern compiler to build (gcc 8.x and above). Since they use the
-AVX-512 instruction set, they can only run on processors that have AVX-512.
-Specifically, the 32-bit and 64-bit require AVX-512F and AVX-512DQ instruction
-set. The 16-bit sorting requires the AVX-512F, AVX-512BW and AVX-512 VMBI2
-instruction set. The test suite is written using the Google test framework. The
-benchmark is written using the google benchmark framework.
-
-## References
-
-* [1] Fast and Robust Vectorized In-Place Sorting of Primitive Types
-https://drops.dagstuhl.de/opus/volltexte/2021/13775/
-
-* [2] A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel
-Skylake https://arxiv.org/pdf/1704.08579.pdf
-
-* [3] https://github.com/simd-sorting/fast-and-robust: SPDX-License-Identifier: MIT
-
-* [4] http://mitp-content-server.mit.edu:18180/books/content/sectbyfn?collid=books_pres_0&fn=Chapter%2027.pdf&id=8030
 
-* [5] https://bertdobbelaere.github.io/sorting_networks.html
+### Details
+
+- `x86simdsort::qsort` is equivalent to `qsort` in
+[C](https://www.tutorialspoint.com/c_standard_library/c_function_qsort.htm)
+or `std::sort` in [C++](https://en.cppreference.com/w/cpp/algorithm/sort).
+- `x86simdsort::qselect` is equivalent to `std::nth_element` in
+[C++](https://en.cppreference.com/w/cpp/algorithm/nth_element) or
+`np.partition` in
+[NumPy](https://numpy.org/doc/stable/reference/generated/numpy.partition.html).
+- `x86simdsort::partial_qsort` is equivalent to `std::partial_sort` in
+[C++](https://en.cppreference.com/w/cpp/algorithm/partial_sort).
+- `x86simdsort::argsort` is equivalent to `np.argsort` in
+[NumPy](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html).
+- `x86simdsort::argselect` is equivalent to `np.argpartition` in
+[NumPy](https://numpy.org/doc/stable/reference/generated/numpy.argpartition.html).
+
+Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+uint64_t, int64_t, double`. Note that `_Float16` will require building this
+library with g++ >= 12.x. All the functions have an optional argument `bool
+hasnan` set to `false` by default (these are relevant to floating point data
+types only). If your array has NAN's, the the behaviour of the sorting routine
+is undefined. If `hasnan` is set to true, NAN's are always sorted to the end of
+the array. In addition to that, qsort will replace all your NAN's with
+`std::numeric_limits<T>::quiet_NaN`. The original bit-exact NaNs in
+the input are not preserved. Also note that the arg methods (argsort and
+argselect) will not use the SIMD based algorithms if they detect NAN's in the
+array. You can read details of all the implementations
+[here](https://github.com/intel/x86-simd-sort/src/README.md).
+
+### Downstream projects using x86-simd-sort
+
+- NumPy uses this as a [submodule](https://github.com/numpy/numpy/pull/22315) to accelerate `np.sort, np.argsort, np.partition and np.argpartition`.
+- A slightly modifed version this library has been integrated into [openJDK](https://github.com/openjdk/jdk/pull/14227).
+- [GRAPE](https://github.com/alibaba/libgrape-lite.git): C++ library for parallel graph processing.
+- AVX-512 version of the key-value sort has been submitted to [Oceanbase](https://github.com/oceanbase/oceanbase/pull/1325).
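
Note on the new `Example usage` snippet in this README: as written it probably does not do what is intended. `std::vector<float> arr{1000}` builds a one-element vector holding `1000.0f` rather than 1000 elements, and the listed signature `x86simdsort::qsort(T* arr, size_t size, bool hasnan)` takes a pointer, not a `std::vector`. The sketch below shows one way the example could be corrected. It assumes the header name `x86simdsort.h` and the `qsort` signature exactly as given in the README. The helpers `make_reversed` and `sort_floats` and the `std::sort` fallback are hypothetical, added only so the sketch compiles without the library installed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the installed header is "x86simdsort.h", as the README states.
// If it is not on the include path, fall back to std::sort.
#if defined(__has_include)
#if __has_include("x86simdsort.h")
#include "x86simdsort.h"
#define HAVE_X86SIMDSORT 1
#endif
#endif

// Hypothetical helper: build {n, n-1, ..., 1}, like the old README sample.
std::vector<float> make_reversed(std::size_t n) {
    std::vector<float> arr(n);  // parentheses: n elements, not one element equal to n
    for (std::size_t i = 0; i < n; ++i) {
        arr[i] = static_cast<float>(n - i);
    }
    return arr;
}

// Hypothetical helper: sort in place, passing a raw pointer and the real size.
void sort_floats(std::vector<float>& arr) {
#ifdef HAVE_X86SIMDSORT
    // The input has no NaNs, so hasnan is false (the documented default).
    x86simdsort::qsort(arr.data(), arr.size(), /*hasnan=*/false);
#else
    std::sort(arr.begin(), arr.end());
#endif
    assert(std::is_sorted(arr.begin(), arr.end()));
}
```

When building against the installed library, the compile and link flags would come from `pkg-config --cflags --libs x86simdsortcpp`, as the README describes.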

meson.build (+4 -4)

@@ -19,7 +19,7 @@ int main() {
 cancompilefp16 = cpp.compiles(fp16code, args:'-march=sapphirerapids')
 
 subdir('lib')
-libsimdsort = shared_library('x86simdsort',
+libsimdsort = shared_library('x86simdsortcpp',
     'lib/x86simdsort.cpp',
     include_directories : [utils, lib],
     link_with : [libtargets],
@@ -31,9 +31,9 @@ libsimdsort = shared_library('x86simdsort',
 pkg_mod = import('pkgconfig')
 pkg_mod.generate(libraries : libsimdsort,
     version : '4.0',
-    name : 'libx86simdsort',
-    filebase : 'x86simdsort',
-    description : 'High performance SIMD based sorting routines.')
+    name : 'libx86simdsortcpp',
+    filebase : 'x86simdsortcpp',
+    description : 'C++ template library for high performance SIMD based sorting routines.')
 
 # Build test suite if option build_tests set to true
 if get_option('build_tests')
