Skip to content
This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Integrate changes from NERSC GPU hackathon. #713

Merged
merged 31 commits into from
Dec 23, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
d452e1a
Update nmodl to hackathon_main.
olupton Nov 23, 2021
8ab49e9
[Hackathon] disable a lot of CI (#694)
olupton Nov 23, 2021
560cc3f
[Hackathon] Add a temporary option for benchmark data. (#695)
olupton Nov 25, 2021
de4e433
Minor changes for building on perlmutter (#697)
pramodk Nov 26, 2021
81dd5ef
Enable OpenMP in CoreNEURON CI. (#698)
olupton Nov 29, 2021
3e394c4
Set by default the number of warps to execute in a large reasonable n…
iomaganaris Nov 29, 2021
a8bb716
Add memory pool for Random123 streams. (#702)
olupton Dec 1, 2021
9649814
Fix Boost-free compilation. (#703)
olupton Dec 2, 2021
21dc2c8
Basic OpenACC -> OpenMP migration. (#693)
olupton Dec 7, 2021
02abf78
GPU data management using OpenACC as well as OpenMP API (#704)
pramodk Dec 9, 2021
57f7724
small openacc fixes (#707)
Dec 10, 2021
56889cc
Fixup to make the CI work better while finalising hackathon changes.
olupton Dec 13, 2021
01a39d7
solve_interleaved2_launcher (CUDA interface) : fixing size of blocksP…
Dec 13, 2021
0fe815e
OpenMP offload: use #pragma instead of runtime API (#708)
alkino Dec 13, 2021
78081b4
Remove unused GPU code (#711)
alkino Dec 14, 2021
781d34f
Fixes and improvements from LLVM/XLC work. (#716)
olupton Dec 14, 2021
1f01552
Use pragmas instead of omp_get_mapped_ptr (#705)
alkino Dec 16, 2021
d03c45f
GPU implementation improvements (#718)
iomaganaris Dec 17, 2021
3fc7037
More CI + disable OpenACC in OpenMP builds (#717)
olupton Dec 17, 2021
9a98f73
NMODL -> hackathon_main.
olupton Dec 17, 2021
5ce52d5
Separate handling of ml inside nrn_acc_manager (#719)
alkino Dec 21, 2021
a6c7078
Fixing jenkins tests
iomaganaris Dec 21, 2021
6b8b6c3
Address review comments.
olupton Dec 21, 2021
531c4fe
Add CUDA toolkit includes.
olupton Dec 21, 2021
e3aeafc
Fixup cmake-format.
olupton Dec 21, 2021
9fddc7d
Compile with -cuda. (#721)
olupton Dec 22, 2021
1fbba17
Cleanup CMake for GPU offload.
olupton Dec 22, 2021
847d415
fixup
olupton Dec 22, 2021
53b0c5f
fixup the fixup :facepalm:
olupton Dec 22, 2021
2c7377c
NMODL -> master after #783.
olupton Dec 22, 2021
5c5b8a3
Drop two OpenMP taskwait directives.
olupton Dec 22, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .clang-format.changes
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
SortIncludes: false
IndentCaseLabels: true
SortIncludes: false
StatementMacros: [nrn_pragma_acc, nrn_pragma_omp]
5 changes: 0 additions & 5 deletions .cmake-format.changes.yaml
Original file line number Diff line number Diff line change
@@ -1,9 +1,4 @@
additional_commands:
cuda_add_library:
pargs: '*'
flags: ["STATIC", "SHARED", "MODULE", "EXCLUDE_FROM_ALL"]
kwargs:
OPTIONS: '*'
cpp_cc_build_time_copy:
flags: ['NO_TARGET']
kwargs:
Expand Down
39 changes: 36 additions & 3 deletions .gitlab-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ spack_setup:
variables:
SPACK_PACKAGE: neuron
SPACK_PACKAGE_REF: ''
SPACK_PACKAGE_SPEC: +coreneuron+debug+tests~legacy-unit
SPACK_PACKAGE_SPEC: +coreneuron+debug+tests~legacy-unit model_tests=channel-benchmark,olfactory
.gpu_node:
variables:
bb5_constraint: volta
Expand Down Expand Up @@ -93,7 +93,19 @@ build:coreneuron+nmodl:gpu:
SPACK_PACKAGE: coreneuron
# +report pulls in a lot of dependencies and the tests fail.
# See https://github.com/BlueBrain/CoreNeuron/issues/518 re: build_type
SPACK_PACKAGE_SPEC: +nmodl+gpu+tests~legacy-unit~report build_type=RelWithDebInfo
SPACK_PACKAGE_SPEC: +nmodl+openmp+gpu+tests~legacy-unit~report~sympy build_type=RelWithDebInfo
extends:
- .spack_build
- .spack_nvhpc
needs: ["build:nmodl:gpu"]

build:coreneuron+nmodl~openmp:gpu:
variables:
SPACK_PACKAGE: coreneuron
# +report pulls in a lot of dependencies and the tests fail.
# See https://github.com/BlueBrain/CoreNeuron/issues/518 re: build_type
# Sympy + OpenMP target offload does not currently work with NVHPC
SPACK_PACKAGE_SPEC: +nmodl~openmp+gpu+tests~legacy-unit~report+sympy build_type=RelWithDebInfo
extends:
- .spack_build
- .spack_nvhpc
Expand All @@ -104,7 +116,7 @@ build:coreneuron:gpu:
SPACK_PACKAGE: coreneuron
# +report pulls in a lot of dependencies and the tests fail.
# See https://github.com/BlueBrain/CoreNeuron/issues/518 re: build_type
SPACK_PACKAGE_SPEC: +gpu+tests~legacy-unit~report build_type=RelWithDebInfo
SPACK_PACKAGE_SPEC: +gpu+openmp+tests~legacy-unit~report build_type=RelWithDebInfo
extends:
- .spack_build
- .spack_nvhpc
Expand All @@ -121,6 +133,10 @@ test:coreneuron+nmodl:gpu:
extends: [.ctest, .gpu_node]
needs: ["build:coreneuron+nmodl:gpu"]

test:coreneuron+nmodl~openmp:gpu:
extends: [.ctest, .gpu_node]
needs: ["build:coreneuron+nmodl~openmp:gpu"]

test:coreneuron:gpu:
extends: [.ctest, .gpu_node]
needs: ["build:coreneuron:gpu"]
Expand Down Expand Up @@ -153,6 +169,18 @@ build:neuron+nmodl:gpu:
- !reference [.spack_build, before_script]
needs: ["build:coreneuron+nmodl:gpu"]

build:neuron+nmodl~openmp:gpu:
stage: build_neuron
extends:
- .spack_build
- .spack_neuron
- .spack_nvhpc
before_script:
# Build py-cython and py-numpy with GCC instead of NVHPC.
- SPACK_PACKAGE_DEPENDENCIES="${SPACK_PACKAGE_DEPENDENCIES}^py-cython%gcc^py-numpy%gcc"
- !reference [.spack_build, before_script]
needs: ["build:coreneuron+nmodl~openmp:gpu"]

build:neuron:gpu:
stage: build_neuron
extends:
Expand Down Expand Up @@ -180,6 +208,11 @@ test:neuron+nmodl:gpu:
extends: [.ctest, .gpu_node]
needs: ["build:neuron+nmodl:gpu"]

test:neuron+nmodl~openmp:gpu:
stage: test_neuron
extends: [.ctest, .gpu_node]
needs: ["build:neuron+nmodl~openmp:gpu"]

test:neuron:gpu:
stage: test_neuron
extends: [.ctest, .gpu_node]
Expand Down
34 changes: 22 additions & 12 deletions CMake/OpenAccHelper.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@
if(CORENRN_ENABLE_GPU)
# Enable cudaProfiler{Start,Stop}() behind the Instrumentor::phase... APIs
add_compile_definitions(CORENEURON_CUDA_PROFILING CORENEURON_ENABLE_GPU)
# Plain C++ code in CoreNEURON may need to use CUDA runtime APIs for, for example, starting and
# stopping profiling. This makes sure those headers can be found.
include_directories(${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
# cuda unified memory support
if(CORENRN_ENABLE_CUDA_UNIFIED_MEMORY)
add_compile_definitions(CORENEURON_UNIFIED_MEMORY)
Expand Down Expand Up @@ -47,25 +50,32 @@ if(CORENRN_ENABLE_GPU)
endif()
set(CORENRN_CUDA_VERSION_SHORT "${CUDAToolkit_VERSION_MAJOR}.${CUDAToolkit_VERSION_MINOR}")
endif()
# -acc enables OpenACC support, -cuda links CUDA libraries and (very importantly!) seems to be
# required to make the NVHPC compiler do the device code linking. Otherwise the explicit CUDA
# device code (.cu files in libcoreneuron) has to be linked in a separate, earlier, step, which
# apparently causes problems with interoperability with OpenACC. Passing -cuda to nvc++ when
# compiling (as opposed to linking) seems to enable CUDA C++ support, which has other consequences
# due to e.g. __CUDACC__ being defined. See https://github.com/BlueBrain/CoreNeuron/issues/607 for
# more information about this. -gpu=cudaX.Y ensures that OpenACC code is compiled with the same
# CUDA version as is used for the explicit CUDA code.
set(NVHPC_ACC_COMP_FLAGS "-acc -gpu=cuda${CORENRN_CUDA_VERSION_SHORT}")
set(NVHPC_ACC_LINK_FLAGS "-acc -cuda")
# -cuda links CUDA libraries and also seems to be important to make the NVHPC do the device code
# linking. Without this, we had problems with linking between the explicit CUDA (.cu) device code
# and offloaded OpenACC/OpenMP code. Using -cuda when compiling seems to improve error messages in
# some cases, and to be recommended by NVIDIA. We pass -gpu=cudaX.Y to ensure that OpenACC/OpenMP
# code is compiled with the same CUDA version as the explicit CUDA code.
set(NVHPC_ACC_COMP_FLAGS "-cuda -gpu=cuda${CORENRN_CUDA_VERSION_SHORT},lineinfo")
# Make sure that OpenACC code is generated for the same compute capabilities as the explicit CUDA
# code. Otherwise there may be confusing linker errors. We cannot rely on nvcc and nvc++ using the
# same default compute capabilities as each other, particularly on GPU-less build machines.
foreach(compute_capability ${CMAKE_CUDA_ARCHITECTURES})
string(APPEND NVHPC_ACC_COMP_FLAGS ",cc${compute_capability}")
endforeach()
if(CORENRN_ACCELERATOR_OFFLOAD STREQUAL "OpenMP")
# Enable OpenMP target offload to GPU and if both OpenACC and OpenMP directives are available
# for a region then prefer OpenMP.
add_compile_definitions(CORENEURON_PREFER_OPENMP_OFFLOAD)
string(APPEND NVHPC_ACC_COMP_FLAGS " -mp=gpu")
elseif(CORENRN_ACCELERATOR_OFFLOAD STREQUAL "OpenACC")
# Only enable OpenACC offload for GPU
string(APPEND NVHPC_ACC_COMP_FLAGS " -acc")
else()
message(FATAL_ERROR "${CORENRN_ACCELERATOR_OFFLOAD} not supported with NVHPC compilers")
endif()
# avoid PGI adding standard compliant "-A" flags
set(CMAKE_CXX14_STANDARD_COMPILE_OPTION --c++14)
string(APPEND CMAKE_EXE_LINKER_FLAGS " ${NVHPC_ACC_LINK_FLAGS}")
string(APPEND CMAKE_EXE_LINKER_FLAGS " ${NVHPC_ACC_COMP_FLAGS}")
# Use `-Mautoinline` option to compile .cpp files generated from .mod files only. This is
# especially needed when we compile with -O0 or -O1 optimisation level where we get link errors.
# Use of `-Mautoinline` ensure that the necessary functions like `net_receive_kernel` are inlined
Expand All @@ -81,7 +91,7 @@ if(CORENRN_ENABLE_GPU)
GLOBAL
PROPERTY
CORENEURON_LIB_LINK_FLAGS
"${NVHPC_ACC_COMP_FLAGS} ${NVHPC_ACC_LINK_FLAGS} -rdynamic -lrt -Wl,--whole-archive -L${CMAKE_HOST_SYSTEM_PROCESSOR} -lcorenrnmech -L${CMAKE_INSTALL_PREFIX}/lib -lcoreneuron -Wl,--no-whole-archive"
"${NVHPC_ACC_COMP_FLAGS} -rdynamic -lrt -Wl,--whole-archive -L${CMAKE_HOST_SYSTEM_PROCESSOR} -lcorenrnmech -L${CMAKE_INSTALL_PREFIX}/lib -lcoreneuron -Wl,--no-whole-archive"
)
else()
set_property(GLOBAL PROPERTY CORENEURON_LIB_LINK_FLAGS
Expand Down
17 changes: 16 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ add_subdirectory(${CORENEURON_PROJECT_SOURCE_DIR}/external/CLI11)
# Build options
# =============================================================================
option(CORENRN_ENABLE_OPENMP "Build the CORE NEURON with OpenMP implementation" ON)
option(CORENRN_ENABLE_OPENMP_OFFLOAD "Prefer OpenMP target offload to OpenACC" ON)
option(CORENRN_ENABLE_TIMEOUT "Enable nrn_timeout implementation" ON)
option(CORENRN_ENABLE_REPORTING "Enable use of ReportingLib for soma reports" OFF)
option(CORENRN_ENABLE_MPI "Enable MPI-based execution" ON)
Expand Down Expand Up @@ -117,6 +118,7 @@ else()
set(CORENRN_HAVE_NVHPC_COMPILER OFF)
endif()

set(CORENRN_ACCELERATOR_OFFLOAD "Disabled")
if(CORENRN_ENABLE_GPU)
# Older CMake versions than 3.15 have not been tested for GPU/CUDA/OpenACC support after
# https://github.com/BlueBrain/CoreNeuron/pull/609.
Expand All @@ -135,7 +137,7 @@ if(CORENRN_ENABLE_GPU)

# Set some sensible default CUDA architectures.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES 60 70 80)
set(CMAKE_CUDA_ARCHITECTURES 70 80)
message(STATUS "Setting default CUDA architectures to ${CMAKE_CUDA_ARCHITECTURES}")
endif()

Expand Down Expand Up @@ -185,6 +187,18 @@ if(CORENRN_ENABLE_GPU)
set(CMAKE_CUDA_FLAGS
"${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr -Xcudafe --diag_suppress=3057,--diag_suppress=3085"
)

if(CORENRN_ENABLE_NMODL)
# NMODL supports both OpenACC and OpenMP target offload
if(CORENRN_ENABLE_OPENMP AND CORENRN_ENABLE_OPENMP_OFFLOAD)
set(CORENRN_ACCELERATOR_OFFLOAD "OpenMP")
else()
set(CORENRN_ACCELERATOR_OFFLOAD "OpenACC")
endif()
else()
# MOD2C only supports OpenACC offload
set(CORENRN_ACCELERATOR_OFFLOAD "OpenACC")
endif()
endif()

# =============================================================================
Expand Down Expand Up @@ -526,6 +540,7 @@ message(STATUS "MOD2CPP PATH | ${CORENRN_MOD2CPP_BINARY}")
message(STATUS "GPU Support | ${CORENRN_ENABLE_GPU}")
if(CORENRN_ENABLE_GPU)
message(STATUS " CUDA | ${CUDAToolkit_LIBRARY_DIR}")
message(STATUS " Offload | ${CORENRN_ACCELERATOR_OFFLOAD}")
message(STATUS " Unified Memory | ${CORENRN_ENABLE_CUDA_UNIFIED_MEMORY}")
endif()
message(STATUS "Auto Timeout | ${CORENRN_ENABLE_TIMEOUT}")
Expand Down
10 changes: 10 additions & 0 deletions coreneuron/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,16 @@ target_include_directories(coreneuron SYSTEM
target_include_directories(coreneuron SYSTEM
PRIVATE ${CORENEURON_PROJECT_SOURCE_DIR}/external/CLI11/include)

if(CORENRN_ENABLE_GPU)
# nrnran123.cpp possibly-temporarily uses Boost.Pool in GPU builds if it's available.
olupton marked this conversation as resolved.
Show resolved Hide resolved
find_package(Boost QUIET)
if(Boost_FOUND)
olupton marked this conversation as resolved.
Show resolved Hide resolved
message(STATUS "Boost found, enabling use of memory pools for Random123...")
target_include_directories(coreneuron SYSTEM PRIVATE ${Boost_INCLUDE_DIRS})
target_compile_definitions(coreneuron PRIVATE CORENEURON_USE_BOOST_POOL)
endif()
endif()

set_target_properties(
coreneuron scopmath
PROPERTIES ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
Expand Down
7 changes: 6 additions & 1 deletion coreneuron/apps/corenrn_parameters.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,12 @@ corenrn_parameters::corenrn_parameters() {
"Print number of instances of each mechanism and detailed memory stats.");

auto sub_gpu = app.add_option_group("GPU", "Commands relative to GPU.");
sub_gpu->add_option("-W, --nwarp", this->nwarp, "Number of warps to balance.", true)
sub_gpu
->add_option("-W, --nwarp",
this->nwarp,
"Number of warps to execute in parallel the Hines solver. Each warp solves a "
"group of cells. (Only used with cell permute 2)",
true)
->check(CLI::Range(0, 1'000'000));
sub_gpu
->add_option("-R, --cell-permute",
Expand Down
4 changes: 2 additions & 2 deletions coreneuron/apps/corenrn_parameters.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,8 @@ struct corenrn_parameters {
unsigned ms_subint = 2; /// Number of multisend interval. 1 or 2
unsigned spkcompress = 0; /// Spike Compression
unsigned cell_interleave_permute = 0; /// Cell interleaving permutation
unsigned nwarp = 0; /// Number of warps to balance for cell_interleave_permute == 2
unsigned num_gpus = 0; /// Number of gpus to use per node
unsigned nwarp = 65536; /// Number of warps to balance for cell_interleave_permute == 2
unsigned num_gpus = 0; /// Number of gpus to use per node
unsigned report_buff_size = report_buff_size_default; /// Size in MB of the report buffer.
int seed = -1; /// Initialization seed for random number generator (int)

Expand Down
13 changes: 6 additions & 7 deletions coreneuron/apps/main1.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -193,10 +193,11 @@ void nrn_init_and_load_data(int argc,
// precedence is: set by user, globals.dat, 34.0
celsius = corenrn_param.celsius;

#if _OPENACC
#if CORENEURON_ENABLE_GPU
if (!corenrn_param.gpu && corenrn_param.cell_interleave_permute == 2) {
fprintf(stderr,
"compiled with _OPENACC does not allow the combination of --cell-permute=2 and "
"compiled with CORENEURON_ENABLE_GPU does not allow the combination of "
"--cell-permute=2 and "
"missing --gpu\n");
exit(1);
}
Expand Down Expand Up @@ -497,7 +498,7 @@ extern "C" void mk_mech_init(int argc, char** argv) {
}
#endif

#ifdef _OPENACC
#ifdef CORENEURON_ENABLE_GPU
if (corenrn_param.gpu) {
init_gpu();
}
Expand Down Expand Up @@ -558,10 +559,8 @@ extern "C" int run_solve_core(int argc, char** argv) {
#endif
bool compute_gpu = corenrn_param.gpu;

// clang-format off

#pragma acc update device(celsius, secondorder, pi) if (compute_gpu)
// clang-format on
nrn_pragma_acc(update device(celsius, secondorder, pi) if (compute_gpu))
nrn_pragma_omp(target update to(celsius, secondorder, pi) if (compute_gpu))
{
double v = corenrn_param.voltage;
double dt = corenrn_param.dt;
Expand Down
Loading