BlockCRS Benchmark
We demonstrate how to configure Trilinos for Intel CPU and NVIDIA GPU architectures. We first show the base configuration commonly used for our target architectures, followed by the custom CMake variables and environment setup for each.
#!/bin/bash
USE_CUDA=OFF # ON if GPU
USE_OPENMP=ON
EXAMPLE=ON
TEST=ON
BUILD_TYPE=RELEASE # or DEBUG
TRILINOS_DIR=/your/trilinos/source/directory
INSTALL_DIR=/your/trilinos/install/directory
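# Remove a previous CMake configuration (cache and generated files) from the build directory
rm -rf CMakeCache.txt CMakeFiles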
cmake \
-D BUILD_SHARED_LIBS:BOOL=OFF \
-D Trilinos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=ON \
-D Trilinos_ENABLE_INSTALL_CMAKE_CONFIG_FILES:BOOL=ON \
-D Trilinos_ENABLE_EXAMPLES:BOOL=${EXAMPLE} \
-D Trilinos_ENABLE_TESTS:BOOL=${TEST} \
-D Trilinos_ENABLE_Fortran:BOOL=OFF \
-D Trilinos_ENABLE_KokkosCore:BOOL=ON \
-D Trilinos_ENABLE_KokkosAlgorithms:BOOL=ON \
-D Trilinos_ENABLE_ALL_PACKAGES:BOOL=OFF \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
-D Trilinos_ENABLE_Tpetra:BOOL=ON \
-D Teuchos_ENABLE_LONG_LONG_INT:BOOL=OFF \
-D CMAKE_BUILD_TYPE:STRING=${BUILD_TYPE} \
-D CMAKE_CXX_COMPILER:FILEPATH="mpicxx" \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=OFF \
-D CMAKE_SKIP_RULE_DEPENDENCY=ON \
-D CMAKE_INSTALL_PREFIX:PATH=${INSTALL_DIR} \
-D TPL_ENABLE_GLM=OFF \
-D TPL_ENABLE_MPI:BOOL=ON \
-D TPL_ENABLE_LAPACK:BOOL=ON \
-D TPL_ENABLE_BLAS:BOOL=ON \
-D Trilinos_ENABLE_OpenMP=${USE_OPENMP} \
-D Kokkos_ENABLE_OpenMP:BOOL=${USE_OPENMP} \
-D Kokkos_ENABLE_TESTS:BOOL=ON \
-D TPL_ENABLE_CUDA:BOOL=${USE_CUDA} \
-D TPL_ENABLE_CUSPARSE:BOOL=${USE_CUDA} \
-D Kokkos_ENABLE_Cuda:BOOL=${USE_CUDA} \
-D Kokkos_ENABLE_Cuda_UVM:BOOL=${USE_CUDA} \
$TRILINOS_DIR
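The CPU and GPU builds above differ only in the backend switches, so those switches can be generated from a single target name. The sketch below uses a hypothetical helper, backend_flags (not part of Trilinos), that prints the backend-related options of the script in the attached -DNAME=VALUE form, one per line. It assumes OpenMP stays enabled as the host backend for GPU builds, matching the script's default USE_OPENMP=ON.

```shell
#!/bin/bash
# Hypothetical helper (not part of Trilinos): print the backend-related
# options of the configure script above for a target of "cpu" or "cuda".
# Assumption: OpenMP stays on as the host backend for GPU builds, matching
# the script's default USE_OPENMP=ON.
backend_flags() {
  local use_cuda use_openmp=ON
  case "$1" in
    cpu)  use_cuda=OFF ;;
    cuda) use_cuda=ON ;;
    *)    echo "usage: backend_flags cpu|cuda" >&2; return 1 ;;
  esac
  # One option per line, with no space after -D, so each line can become a
  # single element of a bash array passed to cmake.
  printf '%s\n' \
    "-DTrilinos_ENABLE_OpenMP=${use_openmp}" \
    "-DKokkos_ENABLE_OpenMP:BOOL=${use_openmp}" \
    "-DTPL_ENABLE_CUDA:BOOL=${use_cuda}" \
    "-DTPL_ENABLE_CUSPARSE:BOOL=${use_cuda}" \
    "-DKokkos_ENABLE_Cuda:BOOL=${use_cuda}" \
    "-DKokkos_ENABLE_Cuda_UVM:BOOL=${use_cuda}"
}
```

One way to use it is to read the lines into an array with mapfile -t backend < <(backend_flags cuda) and pass "${backend[@]}" to cmake in place of the corresponding -D lines.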
- Specify KOKKOS_ARCH:
-D KOKKOS_ARCH="[OPT]", where the available options are:
[AMD]
AMDAVX = AMD CPU
[ARM]
ARMv80 = ARMv8.0 Compatible CPU
ARMv81 = ARMv8.1 Compatible CPU
ARMv8-ThunderX = ARMv8 Cavium ThunderX CPU
[IBM]
Power7 = IBM POWER7 and POWER7+ CPUs
Power8 = IBM POWER8 CPUs
Power9 = IBM POWER9 CPUs
[Intel]
WSM = Intel Westmere CPUs
SNB = Intel Sandy/Ivy Bridge CPUs
HSW = Intel Haswell CPUs
BDW = Intel Broadwell Xeon E-class CPUs
SKX = Intel Sky Lake Xeon E-class HPC CPUs (AVX512)
[Intel Xeon Phi]
KNC = Intel Knights Corner Xeon Phi
KNL = Intel Knights Landing Xeon Phi
[NVIDIA]
Kepler30 = NVIDIA Kepler generation CC 3.0
Kepler32 = NVIDIA Kepler generation CC 3.2
Kepler35 = NVIDIA Kepler generation CC 3.5
Kepler37 = NVIDIA Kepler generation CC 3.7
Maxwell50 = NVIDIA Maxwell generation CC 5.0
Maxwell52 = NVIDIA Maxwell generation CC 5.2
Maxwell53 = NVIDIA Maxwell generation CC 5.3
Pascal60 = NVIDIA Pascal generation CC 6.0
Pascal61 = NVIDIA Pascal generation CC 6.1
Volta70 = NVIDIA Volta generation CC 7.0
Volta72 = NVIDIA Volta generation CC 7.2
For heterogeneous architectures, list each architecture separated by a comma, e.g., "Power8,Pascal60".
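Joining several architectures with commas, and catching misspelled names before a long configure run, can be scripted. The sketch below defines a hypothetical helper, kokkos_arch_flag (not part of Trilinos or Kokkos), that checks each name against the list above and prints a single KOKKOS_ARCH option. The accepted names are copied from that list and may differ in other Kokkos versions.

```shell
#!/bin/bash
# Hypothetical helper (not part of Trilinos/Kokkos): build the KOKKOS_ARCH
# option from one or more architecture names, comma-separated.
# The accepted names are copied from the list above; they may differ in
# other Kokkos versions.
KNOWN_ARCHS=(AMDAVX ARMv80 ARMv81 ARMv8-ThunderX Power7 Power8 Power9
  WSM SNB HSW BDW SKX KNC KNL
  Kepler30 Kepler32 Kepler35 Kepler37 Maxwell50 Maxwell52 Maxwell53
  Pascal60 Pascal61 Volta70 Volta72)

# Return 0 if $1 is one of the names in KNOWN_ARCHS.
is_known_arch() {
  local a
  for a in "${KNOWN_ARCHS[@]}"; do
    [ "$a" = "$1" ] && return 0
  done
  return 1
}

kokkos_arch_flag() {
  local arch joined=""
  if [ "$#" -eq 0 ]; then
    echo "usage: kokkos_arch_flag ARCH [ARCH...]" >&2
    return 1
  fi
  for arch in "$@"; do
    if ! is_known_arch "$arch"; then
      echo "unknown KOKKOS_ARCH value: $arch" >&2
      return 1
    fi
    # Add a comma separator only after the first entry.
    joined="${joined:+${joined},}${arch}"
  done
  printf '%s\n' "-DKOKKOS_ARCH=${joined}"
}
```

For example, kokkos_arch_flag Power8 Pascal60 prints -DKOKKOS_ARCH=Power8,Pascal60, the heterogeneous setting mentioned above.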
- Specify the LAPACK and BLAS libraries:
-D TPL_LAPACK_LIBRARIES:FILEPATH="-llapack" (or "-mkl" if an Intel compiler is used)
-D TPL_BLAS_LIBRARIES:FILEPATH="-lblas" (or "-mkl" if an Intel compiler is used)
If your BLAS and LAPACK libraries are located in a non-standard path, append that directory to LD_LIBRARY_PATH so they are found at run time; the linker may also need the directory given explicitly, e.g. "-L/path/to/lib -llapack".
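The library choice and the path setup can also be scripted. The sketch below uses two hypothetical helpers (not part of Trilinos). blas_lapack_flags takes the underlying C++ compiler as an argument, since CMAKE_CXX_COMPILER is the mpicxx wrapper, and assumes "Intel compiler" means the classic icpc/icc. append_ld_library_path adds a directory to LD_LIBRARY_PATH without a stray leading colon.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Trilinos) for the BLAS/LAPACK step.

# Print the TPL_*_LIBRARIES options: "-mkl" when the underlying C++
# compiler is the classic Intel compiler (icpc/icc), "-llapack"/"-lblas"
# otherwise. The compiler name is passed in because mpicxx hides it.
blas_lapack_flags() {
  local lapack="-llapack" blas="-lblas"
  case "$(basename "$1")" in
    icpc|icc) lapack="-mkl"; blas="-mkl" ;;
  esac
  printf '%s\n' \
    "-DTPL_LAPACK_LIBRARIES:FILEPATH=${lapack}" \
    "-DTPL_BLAS_LIBRARIES:FILEPATH=${blas}"
}

# Append a directory to LD_LIBRARY_PATH. When the variable is empty or
# unset, no leading colon is added, because the dynamic loader treats an
# empty entry as the current working directory.
append_ld_library_path() {
  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}$1"
}
```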
- For CUDA, set the following CUDA-specific environment variables:
export OMPI_CXX=${TRILINOS_DIR}/packages/kokkos/bin/nvcc_wrapper
export CUDA_LAUNCH_BLOCKING=1
export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
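The same exports can be wrapped in a function so they are applied consistently before configuring and running. The sketch below is a hypothetical helper, setup_cuda_env (not part of Trilinos). OMPI_CXX selects the compiler used by Open MPI's mpicxx; the helper also sets MPICH_CXX on the assumption that an MPICH-based wrapper may be in use instead.

```shell
#!/bin/bash
# Hypothetical helper (not part of Trilinos): export the CUDA-specific
# variables listed above for a given Trilinos source directory.
setup_cuda_env() {
  if [ "$#" -ne 1 ]; then
    echo "usage: setup_cuda_env TRILINOS_DIR" >&2
    return 1
  fi
  local wrapper="$1/packages/kokkos/bin/nvcc_wrapper"
  if [ ! -x "$wrapper" ]; then
    echo "warning: $wrapper not found or not executable" >&2
  fi
  # Make mpicxx compile through nvcc_wrapper (Open MPI reads OMPI_CXX;
  # setting MPICH_CXX as well is an assumption for MPICH-based wrappers).
  export OMPI_CXX="$wrapper"
  export MPICH_CXX="$wrapper"
  # Make CUDA kernel launches synchronous, as in the original setup.
  export CUDA_LAUNCH_BLOCKING=1
  export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
}
```

Typical use is setup_cuda_env "${TRILINOS_DIR}" before running the configure script with USE_CUDA=ON.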
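- The benchmark executable is built at $BUILD/packages/tpetra/core/example/BlockCrs/TpetraCore_BlockCrsPerfTest.exe, where $BUILD is your Trilinos build directory. From that directory, --help lists the available options:
$ ./TpetraCore_BlockCrsPerfTest.exe --help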
Usage: ./TpetraCore_BlockCrsPerfTest.exe [options]
options:
--help Prints this help message
--pause-for-debugging Pauses for user input to allow attaching a debugger
--echo-command-line Echo the command-line but continue as normal
--num-elements-i int Number of cells in the I dimension.
(default: --num-elements-i=2)
--num-elements-j int Number of cells in the J dimension.
(default: --num-elements-j=2)
--num-elements-k int Number of cells in the K dimension.
(default: --num-elements-k=2)
--num-procs-i int Processor grid of (npi,npj,npk); npi*npj*npk should be equal to the number of MPI ranks.
(default: --num-procs-i=1)
--num-procs-j int Processor grid of (npi,npj,npk); npi*npj*npk should be equal to the number of MPI ranks.
(default: --num-procs-j=1)
--num-procs-k int Processor grid of (npi,npj,npk); npi*npj*npk should be equal to the number of MPI ranks.
(default: --num-procs-k=1)
--blocksize int Block size. The # of DOFs coupled in a multiphysics flow problem.
(default: --blocksize=5)
--nrhs int Number of right hand sides to solve for.
(default: --nrhs=1)
--repeat int Number of iterations of matvec operations to measure performance.
(default: --repeat=100)
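Two quantities are worth checking before a run: the problem size implied by the mesh and block size, and the processor-grid constraint stated in the help text (npi*npj*npk must equal the number of MPI ranks). The sketch below uses hypothetical helpers (not part of the benchmark) and assumes one block row per cell, so the scalar unknowns are cells times blocksize.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the benchmark) for choosing arguments.

# Cells and scalar unknowns for an ni x nj x nk cell mesh with the given
# block size, assuming one block row (blocksize DOFs) per cell.
problem_size() {
  local ni="$1" nj="$2" nk="$3" bs="$4"
  local cells=$(( ni * nj * nk ))
  echo "cells=${cells} unknowns=$(( cells * bs ))"
}

# From the help text: npi*npj*npk should equal the number of MPI ranks.
check_proc_grid() {
  local npi="$1" npj="$2" npk="$3" nranks="$4"
  if [ $(( npi * npj * npk )) -ne "$nranks" ]; then
    echo "proc grid ${npi}x${npj}x${npk} does not match ${nranks} MPI ranks" >&2
    return 1
  fi
}
```

For the 32x32x32 mesh with --blocksize=5 used in the runs below, problem_size 32 32 32 5 prints cells=32768 unknowns=163840. On a system that launches MPI jobs with mpirun (the launcher name is an assumption), a guarded run could look like check_proc_grid 2 2 2 8 && mpirun -np 8 ./TpetraCore_BlockCrsPerfTest.exe --num-procs-i=2 --num-procs-j=2 --num-procs-k=2.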
- Single Node OpenMP Strong Scale
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
OMP_NUM_THREADS=1 ./TpetraCore_BlockCrsPerfTest.exe --num-elements-i=32 --num-elements-j=32 --num-elements-k=32 --blocksize=5 --nrhs=1 --repeat=20
OMP_NUM_THREADS=4 ./TpetraCore_BlockCrsPerfTest.exe --num-elements-i=32 --num-elements-j=32 --num-elements-k=32 --blocksize=5 --nrhs=1 --repeat=20
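The two runs above generalize to a thread-count sweep: the mesh stays fixed while OMP_NUM_THREADS grows. The sketch below is a hypothetical driver (strong_scale, EXE and DRY_RUN are illustrative names, not part of the benchmark) that reuses the arguments and OpenMP placement settings shown above; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Hypothetical strong-scaling driver: fixed 32x32x32 mesh, growing thread
# count. Set DRY_RUN=1 to print the commands instead of running them.
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

EXE="${EXE:-./TpetraCore_BlockCrsPerfTest.exe}"
ARGS=(--num-elements-i=32 --num-elements-j=32 --num-elements-k=32
      --blocksize=5 --nrhs=1 --repeat=20)

strong_scale() {
  local nt
  for nt in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "OMP_NUM_THREADS=${nt} ${EXE} ${ARGS[*]}"
    else
      # Same problem each time; only the thread count changes.
      OMP_NUM_THREADS="$nt" "$EXE" "${ARGS[@]}"
    fi
  done
}
```

For example, strong_scale 1 2 4 8 runs the benchmark at each of those thread counts; choose the largest value to match the cores available on your node.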
- Single Node CUDA
- Multi Node Weak Scale
- Platform used:
- Summary or screenshot:
Copyright © Trilinos a Series of LF Projects, LLC
For web site terms of use, trademark policy and other project policies please see https://lfprojects.org.