
Commit 5b5884c

Merge pull request #130 from r-devulap/v5.0
Update README with object_qsort
2 parents: 6362001 + eaadd5e

5 files changed: +652 -15 lines

README.md

+88 -14
@@ -1,32 +1,57 @@
 # x86-simd-sort
 
 C++ template library for high performance SIMD based sorting routines for
-16-bit, 32-bit and 64-bit data types. The sorting routines are accelerated
-using AVX-512/AVX2 when available. The library auto picks the best version
-depending on the processor it is run on. If you are looking for the AVX-512 or
-AVX2 specific implementations, please see
-[README](https://github.com/intel/x86-simd-sort/blob/main/src/README.md) file under
-`src/` directory. The following routines are currently supported:
+built-in integers and floats (16-bit, 32-bit and 64-bit data types) and
+custom-defined C++ objects. The sorting routines are accelerated using
+AVX-512/AVX2 when available. The library automatically picks the best version
+for the processor it is run on. If you are looking for the AVX-512 or AVX2
+specific implementations, please see the
+[README](https://github.com/intel/x86-simd-sort/blob/main/src/README.md) file
+under the `src/` directory. The following routines are currently supported:
+
+## Sort an array of custom-defined class objects (uses `O(N)` space)
+```cpp
+template <typename T, typename Func>
+void x86simdsort::object_qsort(T *arr, uint32_t arrsize, Func key_func)
+```
+`T` is any user-defined struct or class, and `arr` is a pointer to the first
+element of an array of objects of type `T`. `Func` is a lambda function that
+computes the `key` value of each object, which is the metric used to sort the
+objects. `Func` needs to have the following signature:
 
+```cpp
+[] (T obj) -> key_t { key_t key; /* compute key for obj */ return key; }
+```
 
-### Sort routines on arrays
+Note that the key type `key_t` returned by `Func` must be one of the following:
+`[float, uint32_t, int32_t, double, uint64_t, int64_t]`. `object_qsort` has a
+space complexity of `O(N)`. Specifically, it requires `arrsize * sizeof(key_t)`
+bytes to store a vector with all the keys and an additional
+`arrsize * sizeof(uint32_t)` bytes to store the indexes of the object array.
+For performance reasons, `object_qsort` is supported only when the array size
+is less than or equal to `UINT32_MAX`. An example usage of `object_qsort` is
+provided in the [examples](#sort-an-array-of-points-using-object_qsort)
+section. Refer to the
+[performance section](#performance-comparison-on-avx-512-object_qsort-vs-stdsort)
+to get a sense of how fast it is relative to `std::sort`.
+
+## Sort an array of built-in integers and floats
 ```cpp
-x86simdsort::qsort(T* arr, size_t size, bool hasnan);
-x86simdsort::qselect(T* arr, size_t k, size_t size, bool hasnan);
-x86simdsort::partial_qsort(T* arr, size_t k, size_t size, bool hasnan);
+void x86simdsort::qsort(T* arr, size_t size, bool hasnan);
+void x86simdsort::qselect(T* arr, size_t k, size_t size, bool hasnan);
+void x86simdsort::partial_qsort(T* arr, size_t k, size_t size, bool hasnan);
 ```
 Supported datatypes: `T` $\in$ `[_Float16, uint16_t, int16_t, float, uint32_t,
 int32_t, double, uint64_t, int64_t]`
 
-### Key-value sort routines on pairs of arrays
+## Key-value sort routines on pairs of arrays
 ```cpp
-x86simdsort::keyvalue_qsort(T1* key, T2* val, size_t size, bool hasnan);
+void x86simdsort::keyvalue_qsort(T1* key, T2* val, size_t size, bool hasnan);
 ```
 Supported datatypes: `T1`, `T2` $\in$ `[float, uint32_t, int32_t, double,
 uint64_t, int64_t]`. Note that key-value sort is not yet supported for 16-bit
 data types.
 
-### Arg sort routines on arrays
+## Arg sort routines on arrays
 ```cpp
 std::vector<size_t> arg = x86simdsort::argsort(T* arr, size_t size, bool hasnan);
 std::vector<size_t> arg = x86simdsort::argselect(T* arr, size_t k, size_t size, bool hasnan);
@@ -55,16 +80,38 @@ can configure meson to build them both by using `-Dbuild_tests=true` and
 
 ## Example usage
 
+#### Sort an array of floats
+
 ```cpp
 #include "x86simdsort.h"
 
 int main() {
     std::vector<float> arr(1000);
-    x86simdsort::qsort(arr, 1000, true);
+    x86simdsort::qsort(arr.data(), 1000, true);
     return 0;
 }
 ```
 
+#### Sort an array of Points using object_qsort
+```cpp
+#include "x86simdsort.h"
+#include <cmath>
+#include <vector>
+
+struct Point {
+    double x, y, z;
+};
+
+int main() {
+    std::vector<Point> arr(1000);
+    // Sort the Points by their x value:
+    x86simdsort::object_qsort(arr.data(), 1000, [](Point p) { return p.x; });
+    // Sort the Points by their distance from the origin:
+    x86simdsort::object_qsort(arr.data(), 1000, [](Point p) {
+        return std::sqrt(p.x*p.x + p.y*p.y + p.z*p.z);
+    });
+    return 0;
+}
+```
 
 ## Details
 
@@ -95,6 +142,33 @@ argselect) will not use the SIMD based algorithms if they detect NAN's in the
 array. You can read details of all the implementations
 [here](https://github.com/intel/x86-simd-sort/blob/main/src/README.md).
 
+## Performance comparison on AVX-512: `object_qsort` v/s `std::sort`
+Performance of `object_qsort` can vary significantly depending on the definition
+of the custom class, and we highly recommend benchmarking before using it. For
+the sake of illustration, we provide a few examples in
+[./benchmarks/bench-objsort.hpp](./benchmarks/bench-objsort.hpp), which measure
+the performance of `object_qsort` relative to `std::sort` when sorting an array
+of 3D points represented by the classes `struct Point {double x, y, z;}` and
+`struct Point {float x, y, z;}`. We sort these points based on several
+different metrics:
+
++ sort by coordinate `x`
++ sort by Manhattan distance (relative to origin): `abs(x) + abs(y) + abs(z)`
++ sort by Euclidean distance (relative to origin): `sqrt(x*x + y*y + z*z)`
++ sort by Chebyshev distance (relative to origin): `max(abs(x), abs(y), abs(z))`
+
+The performance data (shown in the plot below) can be collected by building the
+benchmark suite and running `./builddir/benchexe --benchmark_filter=.*obj.*`.
+The data in the plot below was collected on a processor with AVX-512 because
+`object_qsort` is currently accelerated only on AVX-512 (we plan to add the
+AVX2 version soon). In the simplest case, sorting an array of structs by one of
+its members, `object_qsort` can be up to 5x faster for 32-bit data types and
+about 4x faster for 64-bit data types. It tends to do even better when the
+metric to sort by gets more complicated: sorting by Euclidean distance can be
+up to 10x faster.
+
+![object_qsort vs std::sort performance](./misc/object_qsort-perf.jpg?raw=true)
+
 ## Downstream projects using x86-simd-sort
 
 - NumPy uses this as a [submodule](https://github.com/numpy/numpy/pull/22315) to accelerate `np.sort, np.argsort, np.partition and np.argpartition`.
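
The README states that `object_qsort` needs `arrsize * sizeof(key_t)` bytes for the keys plus `arrsize * sizeof(uint32_t)` bytes for the indexes. The following is a minimal scalar sketch of how such a key-plus-index scheme can work, under stated assumptions. It is not the library's SIMD implementation. `object_sort_sketch` and `scratch_bytes` are hypothetical names, and `std::sort` stands in for a vectorized key-index sort. After sorting the indexes by key, the sketch applies the permutation in place by following its cycles, so the only extra memory is the two `O(N)` vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: scratch memory of the key-plus-index scheme
// (arrsize keys of type KeyT plus arrsize uint32_t indexes).
template <typename KeyT>
constexpr std::size_t scratch_bytes(uint32_t arrsize)
{
    return static_cast<std::size_t>(arrsize) * sizeof(KeyT)
            + static_cast<std::size_t>(arrsize) * sizeof(uint32_t);
}

// Hypothetical scalar sketch, NOT the library's SIMD implementation.
// Assumes the keys contain no NaNs (a NaN breaks the ordering std::sort needs).
template <typename T, typename Func>
void object_sort_sketch(T *arr, uint32_t arrsize, Func key_func)
{
    assert(arrsize == 0 || arr != nullptr);
    using KeyT = std::decay_t<decltype(key_func(arr[0]))>;

    // 1. Compute every key once: arrsize * sizeof(KeyT) bytes.
    std::vector<KeyT> keys(arrsize);
    for (uint32_t i = 0; i < arrsize; ++i) {
        keys[i] = key_func(arr[i]);
    }

    // 2. Sort the indexes 0..arrsize-1 by key: arrsize * sizeof(uint32_t) bytes.
    //    std::sort stands in for a SIMD key-index sort here.
    std::vector<uint32_t> idx(arrsize);
    std::iota(idx.begin(), idx.end(), 0u);
    std::sort(idx.begin(), idx.end(),
              [&keys](uint32_t a, uint32_t b) { return keys[a] < keys[b]; });

    // 3. Apply the permutation in place by following its cycles:
    //    position j must receive the object currently at idx[j].
    for (uint32_t i = 0; i < arrsize; ++i) {
        if (idx[i] == i) continue; // already in place, or fixed by an earlier cycle
        T tmp = std::move(arr[i]);
        uint32_t j = i;
        while (idx[j] != i) {
            uint32_t k = idx[j];
            arr[j] = std::move(arr[k]);
            idx[j] = j; // mark position j as done
            j = k;
        }
        arr[j] = std::move(tmp);
        idx[j] = j;
    }
}
```

For example, `object_sort_sketch(pts.data(), n, [](Point p) { return p.x; })` orders points by `x`. Objects with equal keys may come out in any order because `std::sort` is not stable.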

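The float example passes a literal length of `1000` to `qsort`. As a side note on standard C++ semantics, which the commit itself does not discuss: for a `std::vector` of an arithmetic type, braces and parentheses select different constructors. `{1000}` builds a one-element vector holding `1000`, while `(1000)` builds 1000 zero-initialized elements. A hard-coded length larger than `arr.size()` would make the sort read past the end of the buffer. The helpers below are hypothetical and only demonstrate the difference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a zero-filled buffer of n floats.
// Parentheses select the vector(size_type) count constructor.
std::vector<float> make_float_buffer(std::size_t n)
{
    std::vector<float> buf(n);
    assert(buf.size() == n); // the count constructor yields exactly n elements
    return buf;
}

// Pitfall: braces with a single integer select the
// std::initializer_list<float> constructor, so this vector has ONE
// element whose value is 1000.0f.
std::vector<float> braced_thousand()
{
    return std::vector<float>{1000};
}
```

Passing `arr.size()` instead of a literal length, as in `x86simdsort::qsort(arr.data(), arr.size(), true)`, keeps the length and the buffer in sync.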
benchmarks/bench-objsort.hpp

+1 -1
@@ -29,7 +29,7 @@ struct Point3D {
             return std::abs(x) + std::abs(y) + std::abs(z);
         }
         else if constexpr (name == "chebyshev") {
-            return std::max(std::max(x, y), z);
+            return std::max(std::max(std::abs(x), std::abs(y)), std::abs(z));
         }
     }
 };
