Everything's Reduced!

This is a collection of key reduction kernel patterns collated from other benchmarks.

TODO

oneDPL
RAJA custom reducer for complex min

Benchmarks

List of benchmark kernels, and their sources.

Reduction	Original benchmark
r += a[i] * b[i]	Dot product
r += c[i]	Sum of complex numbers
(r1,r2) += (c1[i],c2[i])	Sum of complex numbers, stored as SoA
min(abs(c[i]))	Minimum absolute value of complex numbers
Various scalar sums	CloverLeaf Field Summary kernel
count, min, max, mean, std	Pandas Series.Describe

Building

The benchmark is structured as a driver calling routines from a linked library. Each library implements the benchmark kernels.

Build using CMake:

cmake -Bbuild -H. -DMODEL=<model>    # Valid: OpenMP, Kokkos, RAJA, OpenMP-target, SYCL, oneDPL
cmake --build build

OpenMP version

No extra stages are required to build with OpenMP (for the CPU).

OpenMP Target version

The MODEL name is OpenMP-target; you must also specify OMP_TARGET to indicate which backend to use. Currently, only Intel is supported.

We recommend installing oneAPI to use the Intel backend. The HPC Kit is needed for OpenMP. Remember to run /opt/intel/oneapi/setvars.sh (or wherever you have installed it) before building / running.

Backend	Options
Intel	`-DMODEL=OpenMP-target -DOMP_TARGET=Intel -DCMAKE_CXX_COMPILER=icpx`

Kokkos version

This code builds Kokkos inline.

Download the Kokkos source.
Add -DKOKKOS_SRC=/path/to/downloaded/kokkos to the CMake configure stage.
Pass any Kokkos options to the CMake configure too: -DKokkos_.... For example:

Backend	Options
CUDA	`-DKokkos_ENABLE_CUDA=On -DKokkos_ENABLE_CUDA_LAMBDA=On -DKokkos_ARCH_VOLTA72=On`

RAJA version

Download the latest RAJA source: git clone --recursive https://github.com/LLNL/RAJA.git
Add -DRAJA_SRC=/path/to/downloaded/RAJA to the CMake configure stage.
Add other options according to the table below:

Backend	Options
OpenMP	`-DENABLE_OpenMP=On`
CUDA	`-DENABLE_CUDA=On -DCMAKE_CUDA_ARCHITECTURES=XX -DCUDA_ARCH=sm_XX -DCUDA_TOOLKIT_ROOT_DIR=/path/to/cuda`
HIP	`-DENABLE_HIP=On`

SYCL version

We recommend installing the oneAPI basekit. Remember to run /opt/intel/oneAPI/setvars.sh (or wherever you have installed it) before building / running.

We also recommend using a recent 'nightly' build of the oneAPI DPC++ compiler, which can be found here (see the 'assets' arrow and download dpcpp_compiler.tar.gz.) When you have untarred the package, you should source <path-to-nightly>/startup.sh.

Toolchain	Options
oneAPI	`-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_CXX_FLAGS='-fsycl -fsycl-unnamed-lambda'`

oneDPL version

We recommend installing the oneAPI basekit. Remember to run /opt/intel/oneAPI/setvars.sh (or wherever you have installed it) before building / running.

This code has only been tested with the 2021.3 release of oneAPI.

Toolchain	Options
oneAPI	`-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_CXX_FLAGS='-fsycl -fsycl-unnamed-lambda'`

Organisation

The repository is arranged as follows:

<root>
    README.md               # This README file
    AUTHORS                 # A list of contributors
    LICENSE                 # The software license governing the software in this repository
    src/
        main.cpp            # Main driver code
        CMakeLists.txt      # CMake build system configuration
        config.hpp.in       # Input file for printing software version at runtime
        *.hpp               # Benchmark definition header files
        omp/*.cpp           # OpenMP implementations of the benchmarks
        kokkos/*.cpp        # Kokkos implementations of the benchmarks
        raja/*.cpp          # RAJA implementations of the benchmarks

Each benchmark follows a standardised interface:

Constructor: Set up the programming model (if required).
setup(): Allocate and initialise benchmark data.
run(): Run the benchmark once. This will return a result that can be verified.
teardown(): Deallocate benchmark data.
Deconstructor: Programming model finalisation (if required).

The main driver code calls (and times) these routines in the above order.

The benchmark definition defines a expect() routine, which specifies the expected result of the benchmark, as returned byrun().

Each implementation in a particular programming model supplies a .cpp file for each of the benchmarks. These take the form of the class definition for each of the benchmark classes in the corresponding headers.

At build time, one implementation (programming model) is selected, and linked against the main application. We use a PImpl approach for storing the benchmark data. Each programming model uses very different types for data arrays (pointers, Views, Buffers, etc.). Using this approach we can avoid templating the main driver code, and supply different implementations for each benchmark at link time.

Citing

To be announced.

Name		Name	Last commit message	Last commit date
Latest commit History 184 Commits
results		results
src		src
AUTHORS		AUTHORS
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Everything's Reduced!

Contents

TODO

Benchmarks

Building

OpenMP version

OpenMP Target version

Kokkos version

RAJA version

SYCL version

oneDPL version

Organisation

Citing

About

Releases

Packages

Contributors 3

Languages

License

UoB-HPC/everythingsreduced

Folders and files

Latest commit

History

Repository files navigation

Everything's Reduced!

Contents

TODO

Benchmarks

Building

OpenMP version

OpenMP Target version

Kokkos version

RAJA version

SYCL version

oneDPL version

Organisation

Citing

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages