benchParanumal

benchParanumal contains several benchmark problems set out in, or inspired by, the Center for Efficient Exascale Discretizations (CEED) Bake-off Problems.

How to compile `benchParanumal`

There are a couple of prerequisites for building benchParanumal:

MPI
OpenBLAS

MPI and OpenBLAS can be installed using whatever package manager your operating system provides.

OCCA is packaged with benchParanumal in a git submodule. Either clone with --recursive or run

git submodule init
git submodule update

To build benchParanumal:

$ git clone --recursive https://github.com/paranumal/benchparanumal
$ cd benchparanumal
$ export LIBP_BLAS_DIR=</path/to/openblas>
$ make -j `nproc`
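
Before running the build steps above, it can help to confirm that the prerequisites are visible from your shell. The following is a minimal sketch, not part of benchParanumal: `check_prereqs` is a hypothetical helper, and it assumes your MPI installation provides `mpirun` on the PATH and that `LIBP_BLAS_DIR` should point at an existing directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check; not shipped with benchParanumal.
# Assumes MPI exposes `mpirun` on PATH and LIBP_BLAS_DIR names a directory.
check_prereqs() {
  local rc=0
  local tool
  # Tools used by the build and run commands in this README.
  for tool in git make mpirun; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  # The build reads the OpenBLAS location from LIBP_BLAS_DIR.
  if [ -z "${LIBP_BLAS_DIR:-}" ]; then
    echo "unset: LIBP_BLAS_DIR"
    rc=1
  elif [ ! -d "$LIBP_BLAS_DIR" ]; then
    echo "not a directory: $LIBP_BLAS_DIR"
    rc=1
  fi
  return "$rc"
}
```

Calling `check_prereqs && make -j "$(nproc)"` would then start the build only when nothing is reported missing.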

If your MPI supports GPU-aware RDMA functionality, you can optionally build benchParanumal with this support via:

$ make -j `nproc` gpu-aware-mpi=true

How to run `benchParanumal`

benchParanumal contains two distinct types of tests:

BK - Benchmark Kernels
BP - Benchmark Problems

The usage of each benchmark, outside of the provided run scripts, can be found with the -h option. For example:

$ mpirun -np 1 ./BK/BK1/BK1 -h

Name:     [THREAD MODEL]
CL keys:  [-m, --mode]
Description: OCCA's Parallel execution platform
Possible values: { Serial, OpenMP, CUDA, HIP, OpenCL }

Name:     [PLATFORM NUMBER]
CL keys:  [-pl, --platform]
Description: Parallel platform number (used in OpenCL mode)

Name:     [DEVICE NUMBER]
CL keys:  [-d, --device]
Description: Parallel device number

Name:     [ELEMENT TYPE]
CL keys:  [-e, --elements]
Description: Type of mesh elements
Possible values: { Tri, Quad, Tet, Hex }

Name:     [BOX NX]
CL keys:  [-nx, --dimx]
Description: Number of elements in X-dimension per rank

Name:     [BOX NY]
CL keys:  [-ny, --dimy]
Description: Number of elements in Y-dimension per rank

Name:     [BOX NZ]
CL keys:  [-nz, --dimz]
Description: Number of elements in Z-dimension per rank

Name:     [POLYNOMIAL DEGREE]
CL keys:  [-p, --degree]
Description: Degree of polynomial finite element space
Possible values: { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }

Name:     [VERBOSE]
CL keys:  [-v, --verbose]
Description: Enable verbose output
Possible values: { TRUE, FALSE }

Name:     [HELP]
CL keys:  [-h, --help]
Description: Print this help message

Here is an example large problem size that you can run on one GPU:

$ mpirun -np 1 ./BP/BP5/BP5 -m HIP -nx 24 -ny 24 -nz 24 -p 15 -v

Running on multiple GPUs can be done by passing a larger argument to -np:

$ mpirun -np 4 ./BP/BP5/BP5 -m HIP -nx 24 -ny 24 -nz 24 -p 15 -v
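
To compare several polynomial degrees, the run command above can be wrapped in a small loop. This is a sketch under stated assumptions: `bp5_cmd` and the `RUN` variable are hypothetical names introduced here, the chosen degrees are arbitrary, and only flags that appear in the examples above are used.

```shell
#!/usr/bin/env bash
# Sketch of a degree sweep for BP5. Hypothetical helper, not part of benchParanumal.
# Only flags shown in the README examples are used.

# Print the mpirun command for one run.
# Arguments: ranks, mode, elements per dimension, polynomial degree.
bp5_cmd() {
  local ranks=$1 mode=$2 n=$3 degree=$4
  echo "mpirun -np $ranks ./BP/BP5/BP5 -m $mode -nx $n -ny $n -nz $n -p $degree -v"
}

# Dry run by default: commands are printed. Set RUN=1 to execute them.
for p in 1 2 4 8 15; do
  cmd=$(bp5_cmd 1 HIP 24 "$p")
  if [ "${RUN:-0}" = "1" ]; then
    $cmd
  else
    echo "$cmd"
  fi
done
```

The help text describes -nx, -ny, and -nz as per-rank element counts, so raising the first argument of `bp5_cmd` presumably grows the total problem size along with the number of ranks.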

Verifying correctness

To verify that the computation is correct, add the -v option to the command line. Example output towards the end of the run may look like this:

CG: it 96, r norm 1.405229334496e-04, alpha = 2.686587e+00
CG: it 97, r norm 1.375460859099e-04, alpha = 2.540830e+00
CG: it 98, r norm 1.198097786957e-04, alpha = 2.780510e+00
CG: it 99, r norm 1.108821042895e-04, alpha = 2.907639e+00
CG: it 100, r norm 9.086922290200e-05, alpha = 2.946219e+00
BP5: N, DOFs, elapsed, iterations, time per DOF, avg BW (GB/s), avg GFLOPs, DOFs*iterations/ranks*time
4, 68921, 0.0216, 100, 3.13e-07, 81.5, 49.8, 3.20e+08

The r norm value printed for the final CG iteration (iteration 100 in this example) should be small.
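
The check above can be scripted against a saved log. The sketch below assumes the output has exactly the layout shown in the example; `final_rnorm`, `rnorm_below`, and `throughput` are hypothetical helpers, and the tolerance is a value you choose, since the README does not state one. `throughput` assumes the last column is to be read as DOFs*iterations/(ranks*time), with the rank count supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical log helpers; they assume the output layout shown in this README.

# Print the residual norm from the last "CG: it ..." line read on stdin.
final_rnorm() {
  awk '/^CG: it / { sub(/,$/, "", $6); last = $6 }
       END { if (last != "") print last }'
}

# Succeed if residual $1 is below tolerance $2 (tolerance is user-chosen).
rnorm_below() {
  awk -v r="$1" -v tol="$2" 'BEGIN { exit !(r + 0 < tol + 0) }'
}

# Recompute DOFs*iterations/(ranks*time) from the summary data line on stdin.
# $1 = number of MPI ranks used for the run.
throughput() {
  awk -F', *' -v ranks="$1" \
    'NF >= 4 && $1 ~ /^[0-9]+$/ { printf "%.2e\n", $2 * $4 / (ranks * $3) }'
}
```

For example, `final_rnorm < run.log` prints the last residual, where `run.log` is a placeholder file name. For the sample summary line, and assuming a single rank, `throughput 1` gives 68921 * 100 / 0.0216, about 3.19e+08. This is close to the printed 3.20e+08, and the gap is consistent with the elapsed time being rounded in the output.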

How to clean build objects

To clean the benchParanumal build objects:

$ make realclean

Please invoke make help for more supported options.
