Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: add RAJA kernel launches and basic CUDA support #1026

Open
wants to merge 100 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 74 commits
Commits
Show all changes
100 commits
Select commit Hold shift + click to select a range
b47db88
Clang format
johnbowen42 Oct 18, 2023
325f868
Cleanup print statements and commented code
johnbowen42 Oct 18, 2023
9c5d5b1
Refactor more raja for all loops
johnbowen42 Oct 25, 2023
4e9e7a9
Passing more tests
johnbowen42 Nov 29, 2023
a8c1dd6
tmp
johnbowen42 Dec 4, 2023
8b5dc3d
adding memcpy ops
johnbowen42 Dec 11, 2023
162052a
Add memcpy operations
johnbowen42 Dec 13, 2023
f35d677
Unit test passing for hexahedron
johnbowen42 Feb 8, 2024
bb181ec
enable more unit tests
johnbowen42 Feb 13, 2024
8d51959
Remove Cmakelists cruft
johnbowen42 Feb 13, 2024
dc9739f
Format RAJA launch kernels
johnbowen42 Feb 14, 2024
831ec04
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Feb 14, 2024
6aa995c
format RAJA kernels
johnbowen42 Feb 15, 2024
7c598cb
re-enable functional shape derivatives
johnbowen42 Feb 22, 2024
83f8bcd
Re-enable functional_basic_h1_vector
johnbowen42 Feb 22, 2024
fdb4b23
Fix more unit tests not compiling
johnbowen42 Feb 23, 2024
2480dc3
Convert lambda use to functors. Change signature of interpolate to a…
johnbowen42 Feb 26, 2024
96049f1
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Feb 26, 2024
61130b4
Refactor interpolate API for finite elements
johnbowen42 Feb 27, 2024
7838993
Change umpire usage
johnbowen42 Feb 27, 2024
0e02e98
Refactor boundary_integral_kernels to use CUDA. Make interpolate void
johnbowen42 Feb 28, 2024
e4224e2
Eliminate unused code
johnbowen42 Feb 28, 2024
bffae94
Fixing some unit tests
johnbowen42 Feb 29, 2024
4c1f6e5
Fix more unit tests
johnbowen42 Mar 4, 2024
274f432
Fix various solid unit tests
johnbowen42 Mar 5, 2024
fcbf4a8
fix thermal unit tests
johnbowen42 Mar 6, 2024
be9abc1
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Mar 6, 2024
72e6d30
TMP: use mfem vector/device to manage memory
johnbowen42 Mar 12, 2024
be05190
Fix functional with domain heap overflow
johnbowen42 Mar 14, 2024
caeec16
tmp
johnbowen42 Mar 14, 2024
865848e
Fix bug
johnbowen42 Mar 14, 2024
9836836
reenable test
johnbowen42 Mar 14, 2024
1bca89d
Fix compilation error
johnbowen42 Mar 14, 2024
f3d03f7
Fix unused variable and narrowing warnings
johnbowen42 Mar 18, 2024
10da80a
modify CMakeLists
johnbowen42 Mar 19, 2024
5dca932
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Mar 19, 2024
46bf790
fix compilation errors
johnbowen42 Mar 19, 2024
dddf080
Fix internal compiler error in functional shape derivatives. Add do…
johnbowen42 Mar 20, 2024
c47ded4
More docs and gcc build errors
johnbowen42 Mar 20, 2024
2c96666
add doc strings
johnbowen42 Mar 21, 2024
474d0bd
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Mar 21, 2024
da53cca
docs and format
johnbowen42 Mar 21, 2024
b4ec3c2
clang format
johnbowen42 Mar 21, 2024
25223b3
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Mar 22, 2024
7e823ff
remove headers
johnbowen42 Mar 22, 2024
e1014e7
delete more headers
johnbowen42 Mar 22, 2024
bec4cde
decrease level of optimization
johnbowen42 Mar 22, 2024
6a0a0d4
Merge branch 'develop' of github.com:LLNL/serac into feature/bowen/ra…
jamiebramwell Mar 25, 2024
e2165ea
Revert assemble()
jamiebramwell Mar 25, 2024
0f843d0
lower job slots to see if memory is the issue
white238 Mar 26, 2024
e5c91f6
remove unneeded options
white238 Mar 27, 2024
a6605b6
quiet warnings, unify defines
white238 Mar 27, 2024
84feae7
Merge branch 'develop' into feature/bowen/raja-for-all
white238 Mar 27, 2024
a9da335
style
white238 Mar 27, 2024
0c95469
remove using statements from elements
johnbowen42 Apr 9, 2024
7d58b5d
remove using statements from headers
johnbowen42 Apr 9, 2024
3545588
Move RAJA types into header, cleanup conditional compilation in evalu…
johnbowen42 Apr 10, 2024
5d7cdc7
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Apr 10, 2024
d77c24b
Format element headers
johnbowen42 Apr 10, 2024
5f4d580
increase parallelism in build script
johnbowen42 Apr 11, 2024
974d6c4
Add RAJA includes
johnbowen42 Apr 11, 2024
194f0c3
re-enable tests
johnbowen42 Apr 11, 2024
9d16d2c
Rename CUDA execution macro. Add comments. Remove unnecessary expli…
johnbowen42 Apr 11, 2024
2047461
Add cuda scalar unit test
johnbowen42 Apr 11, 2024
ced8b7b
clang style
johnbowen42 Apr 16, 2024
3665bdf
Add CUDA unit tests.
johnbowen42 Apr 29, 2024
d822b96
Merge branch 'develop' into feature/bowen/raja-for-all
white238 May 16, 2024
723062e
fix merge error
white238 May 16, 2024
98fc557
get bug_boundary_qoi to compile
white238 May 16, 2024
54d01c9
Merge branch 'develop' of github.com:LLNL/serac into feature/bowen/ra…
jamiebramwell May 28, 2024
4954a36
Merge branch 'develop' of github.com:LLNL/serac into feature/bowen/ra…
jamiebramwell May 28, 2024
13c7c20
Merge branch 'feature/bowen/raja-for-all' of github.com:LLNL/serac in…
johnbowen42 Aug 1, 2024
ea744ec
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Aug 1, 2024
27510b4
Enable functional comparisons unit test for CUDA
johnbowen42 Aug 29, 2024
6125197
Fix build issues
johnbowen42 Sep 6, 2024
e58dfab
blt submodule
johnbowen42 Sep 6, 2024
09f4282
Fix unit tests
johnbowen42 Sep 9, 2024
091ccf8
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Sep 9, 2024
1d8a6aa
fix compilation bug
johnbowen42 Sep 9, 2024
364b46a
fix compilation bug
johnbowen42 Sep 10, 2024
fa98c4c
Plum ExecutionSpace template parameter through more classes
johnbowen42 Sep 13, 2024
1e2f7dc
Add exec space parameter to finite elements
johnbowen42 Sep 13, 2024
7c4067f
attempt to fix ambiguous call error
johnbowen42 Sep 17, 2024
c062c52
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Sep 17, 2024
2ded3ec
delete whitespace changes
johnbowen42 Sep 17, 2024
e912aab
Merge branch 'feature/bowen/raja-v2' into feature/bowen/raja-for-all
johnbowen42 Sep 18, 2024
9721e45
tmp
johnbowen42 Sep 19, 2024
7ac6b25
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Sep 19, 2024
8995b61
Change cuda execution interface
johnbowen42 Sep 19, 2024
abac93d
remove whitespace changes
johnbowen42 Sep 19, 2024
bdafb37
Remove conditional compilation
johnbowen42 Sep 20, 2024
0a8b42e
Complete addition of execution space parameter to code
johnbowen42 Sep 24, 2024
17cd58c
Fix link error
johnbowen42 Sep 24, 2024
26b99de
Fix fit test
johnbowen42 Sep 25, 2024
151d793
Add docs and try to fix build error
johnbowen42 Sep 25, 2024
0dc4d88
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Sep 25, 2024
0a720b7
increase timeout
johnbowen42 Sep 25, 2024
7686463
Fix docs
johnbowen42 Sep 25, 2024
4b2916e
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Sep 26, 2024
01c8e6a
Merge branch 'develop' into feature/bowen/raja-for-all
johnbowen42 Sep 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -13,5 +13,6 @@
*.orig
__pycache__/
view
*.cache*
/_serac_build_and_test*
build-linux-*-*-*
6 changes: 3 additions & 3 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ if (NOT SERAC_STYLE_CI_ONLY)
set(ENABLE_MPI ON CACHE BOOL "")

if (NOT MPI_C_COMPILER OR NOT MPI_CXX_COMPILER)
message(FATAL_ERROR
message(FATAL_ERROR
"Serac requires MPI. It is required to provide the MPI C/C++ "
"compiler wrappers via the CMake variables, "
"MPI_C_COMPILER and MPI_CXX_COMPILER.")
Expand Down Expand Up @@ -146,7 +146,7 @@ endif()
if (SERAC_STYLE_CI_ONLY)
# Exit processing the rest of the build in style only build to avoid any possible
# CMake configuration issues outside of just enabling `make style`. This build,
# is not capable of building Serac and should not be advertised. It is for CI
# is not capable of building Serac and should not be advertised. It is for CI
# purposes only.
return()
endif()
Expand Down Expand Up @@ -212,7 +212,7 @@ if (SERAC_ENABLE_CODEVELOP)
)
endif()

install(EXPORT serac-targets
install(EXPORT serac-targets
NAMESPACE serac::
DESTINATION lib/cmake
)
4 changes: 2 additions & 2 deletions cmake/SeracMacros.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
#------------------------------------------------------------------------------
# Adds code checks for all cpp/hpp files recursively under the current directory
# that regex match INCLUDES and excludes any files that regex match EXCLUDES
#
#
# This creates the following parent build targets:
# check - Runs a non file changing style check and CppCheck
# style - In-place code formatting
Expand Down Expand Up @@ -71,7 +71,7 @@ macro(serac_add_code_checks)
blt_add_clang_tidy_target(NAME ${arg_PREFIX}_guidelines_check_tests
CHECKS "clang-analyzer-*,clang-analyzer-cplusplus*,cppcoreguidelines-*,-cppcoreguidelines-avoid-magic-numbers"
SRC_FILES ${_test_sources})

if (ENABLE_COVERAGE)
blt_add_code_coverage_target(NAME ${arg_PREFIX}_coverage
RUNNER ${CMAKE_MAKE_PROGRAM} test
Expand Down
2 changes: 1 addition & 1 deletion cmake/blt
Submodule blt updated 404 files
24 changes: 12 additions & 12 deletions cmake/thirdparty/SetupSeracThirdParty.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
# Manually set includes as system includes
foreach(_target cuda_runtime cuda)
get_target_property(_dirs ${_target} INTERFACE_INCLUDE_DIRECTORIES)
set_property(TARGET ${_target}
set_property(TARGET ${_target}
APPEND PROPERTY INTERFACE_SYSTEM_INCLUDE_DIRECTORIES
"${_dirs}")
endforeach()
Expand Down Expand Up @@ -114,7 +114,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)

# Manually set includes as system includes
get_target_property(_dirs conduit::conduit INTERFACE_INCLUDE_DIRECTORIES)
set_property(TARGET conduit::conduit
set_property(TARGET conduit::conduit
APPEND PROPERTY INTERFACE_SYSTEM_INCLUDE_DIRECTORIES
"${_dirs}")

Expand All @@ -125,7 +125,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
# Note: Sundials is currently only used via MFEM and MFEM's target contains it's information
serac_assert_is_directory(DIR_VARIABLE SUNDIALS_DIR)
set(SERAC_USE_SUNDIALS ON CACHE BOOL "")

# Note: MFEM sets SUNDIALS_FOUND itself
if (NOT SERAC_ENABLE_CODEVELOP)
set(SUNDIALS_FOUND TRUE)
Expand All @@ -143,7 +143,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
serac_assert_is_directory(DIR_VARIABLE PETSC_DIR)
# NOTE: PETSc is built and used through MFEM
set(SERAC_USE_PETSC ON CACHE BOOL "")

# Note: MFEM *does not* set PETSC_FOUND itself, likely because we skip petsc build tests
set(PETSC_FOUND TRUE)
else()
Expand All @@ -159,7 +159,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
serac_assert_is_directory(DIR_VARIABLE SLEPC_DIR)
# NOTE: SLEPc is built and used through MFEM
set(SERAC_USE_SLEPC ON CACHE BOOL "")

# Note: MFEM sets SLEPC_FOUND itself
if (NOT SERAC_ENABLE_CODEVELOP)
set(SLEPC_FOUND TRUE)
Expand Down Expand Up @@ -196,7 +196,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)

#### Store Data that MFEM clears
set(tpls_to_save ADIAK AMGX AXOM CALIPER CAMP CONDUIT HDF5
HYPRE LUA METIS MFEM NETCDF PARMETIS PETSC RAJA
HYPRE LUA METIS MFEM NETCDF PARMETIS PETSC RAJA
SLEPC SUPERLU_DIST STRUMPACK SUNDIALS TRIBOL
UMPIRE)
foreach(_tpl ${tpls_to_save})
Expand Down Expand Up @@ -311,7 +311,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
else()
add_subdirectory(${PROJECT_SOURCE_DIR}/mfem ${CMAKE_BINARY_DIR}/mfem)
endif()

set(MFEM_FOUND TRUE CACHE BOOL "" FORCE)

# Patch the mfem target with the correct include directories
Expand Down Expand Up @@ -447,7 +447,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
else()
set(TRIBOL_FOUND OFF)
endif()

message(STATUS "Tribol support is " ${TRIBOL_FOUND})
else()
set(ENABLE_FORTRAN OFF CACHE BOOL "" FORCE)
Expand All @@ -466,7 +466,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
endif()

add_subdirectory(${tribol_repo_dir} ${CMAKE_BINARY_DIR}/tribol)

target_include_directories(redecomp PUBLIC
$<BUILD_INTERFACE:${tribol_repo_dir}/src>
)
Expand All @@ -475,7 +475,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
$<BUILD_INTERFACE:${CMAKE_BINARY_DIR}/tribol/include>
$<INSTALL_INTERFACE:include>
)

set(TRIBOL_FOUND TRUE CACHE BOOL "" FORCE)
set(ENABLE_FORTRAN ON CACHE BOOL "" FORCE)
endif()
Expand Down Expand Up @@ -525,7 +525,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
if(STRUMPACK_DIR)
list(GET MPI_C_LIBRARIES 0 _first_mpi_lib)
get_filename_component(_mpi_lib_dir ${_first_mpi_lib} DIRECTORY)

foreach(_target ${_mfem_targets})
if(TARGET ${_target})
message(STATUS "Adding MPI link directory to target [${_target}]")
Expand Down Expand Up @@ -591,7 +591,7 @@ if (NOT SERAC_THIRD_PARTY_LIBRARIES_FOUND)
message(FATAL_ERROR "Serac+Caliper+CUDA requires CMake > 3.17.")
else()
find_package(CUDAToolkit REQUIRED)
endif()
endif()
endif()

find_dependency(caliper REQUIRED PATHS "${CALIPER_DIR}")
Expand Down
6 changes: 3 additions & 3 deletions scripts/llnl/common_build_functions.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
"""
file: common_build_functions.py

description:
description:
helpers for installing src and tpls on llnl lc systems.

"""
Expand Down Expand Up @@ -374,7 +374,7 @@ def build_and_test_host_configs(prefix, timestamp, use_generated_host_configs, r

test_root = get_build_and_test_root(prefix, timestamp)
os.mkdir(test_root)
write_build_info(pjoin(test_root,"info.json"))
write_build_info(pjoin(test_root,"info.json"))
ok = []
bad = []
for host_config in host_configs:
Expand Down Expand Up @@ -417,7 +417,7 @@ def build_and_test_host_configs(prefix, timestamp, use_generated_host_configs, r
def set_group_and_perms(directory):
"""
Sets the proper group and access permissions of given input
directory.
directory.
"""

skip = True
Expand Down
29 changes: 28 additions & 1 deletion src/serac/infrastructure/accelerator.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
*/

#pragma once

#include "RAJA/RAJA.hpp"
#if defined(__CUDACC__)
#define SERAC_HOST_DEVICE __host__ __device__
#define SERAC_HOST __host__
Comment on lines 15 to 17
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#if defined(__CUDACC__)
#define SERAC_HOST_DEVICE __host__ __device__
#define SERAC_HOST __host__
#define SERAC_HOST_DEVICE RAJA_HOST_DEVICE
#define SERAC_HOST RAJA_HOST
#if defined(__CUDACC__)

Expand Down Expand Up @@ -72,6 +72,26 @@ enum class ExecutionSpace
Dynamic // Corresponds to execution that can "legally" happen on either the host or device
};

#ifdef SERAC_USE_CUDA_KERNEL_EVALUATION

/// @brief Alias for parallel threads policy on GPU
using threads_x = RAJA::LoopPolicy<RAJA::cuda_thread_x_direct>;
using teams_e = RAJA::LoopPolicy<RAJA::cuda_block_x_direct>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a more descriptive name than x and e?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that e is about MFEM element iterations. I am guessing x is about iterating over vectors?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to not have to guess.

using launch_policy = RAJA::LaunchPolicy<RAJA::cuda_launch_t<false>>;
using forall_policy = RAJA::cuda_exec<128>;

#else

/// @brief Alias for parallel threads policy on GPU.
using threads_x = RAJA::LoopPolicy<RAJA::seq_exec>;
/// @brief Alias for number of teams for GPU kernel launches.
using teams_e = RAJA::LoopPolicy<RAJA::seq_exec>;
/// @brief Alias for GPU kernel launch policy.
using launch_policy = RAJA::LaunchPolicy<RAJA::seq_launch_t>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// @brief Alias for parallel threads policy on GPU.
using threads_x = RAJA::LoopPolicy<RAJA::seq_exec>;
/// @brief Alias for number of teams for GPU kernel launches.
using teams_e = RAJA::LoopPolicy<RAJA::seq_exec>;
/// @brief Alias for GPU kernel launch policy.
using launch_policy = RAJA::LaunchPolicy<RAJA::seq_launch_t>;
/// @brief Alias for sequential policy on normal CPU.
using threads_x = RAJA::LoopPolicy<RAJA::seq_exec>;
/// @brief Alias for sequential policy on normal CPU.
using teams_e = RAJA::LoopPolicy<RAJA::seq_exec>;
/// @brief Alias for sequential policy on normal CPU.
using launch_policy = RAJA::LaunchPolicy<RAJA::seq_launch_t>;

using forall_policy = RAJA::seq_exec;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about also defining OpenMP policies to take advantage of multi-threading on normal multi-core CPUs? Or enabling simd vectorizations with a RAJA::simd_exec policy?

What about policies for El Capitain that doesn't use CUDA, right?

#endif

/**
* @brief The default execution space for serac builds
*/
Expand All @@ -88,6 +108,13 @@ struct execution_to_memory {
static constexpr axom::MemorySpace value = axom::MemorySpace::Dynamic;
};

/// @brief This helper is needed to suppress -Werror compilation errors caused by the
/// explicit captures in the main execution lambdas.
template <typename... T>
SERAC_HOST_DEVICE void suppress_unused_capture_warnings(T...)
{
}

#ifdef SERAC_USE_UMPIRE
/// @overload
template <>
Expand Down
20 changes: 11 additions & 9 deletions src/serac/infrastructure/debug_print.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -131,16 +131,18 @@ void printCUDAMemUsage()
{
int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);
int i = 0;
cudaSetDevice(i);
for (int i = 0; i < deviceCount; ++i) {
cudaSetDevice(i);

size_t freeBytes, totalBytes;
cudaMemGetInfo(&freeBytes, &totalBytes);
size_t usedBytes = totalBytes - freeBytes;
size_t freeBytes, totalBytes;
cudaMemGetInfo(&freeBytes, &totalBytes);
size_t usedBytes = totalBytes - freeBytes;

std::cout << "Device Number: " << i << std::endl;
std::cout << " Total Memory (MB): " << (totalBytes / 1024.0 / 1024.0) << std::endl;
std::cout << " Free Memory (MB): " << (freeBytes / 1024.0 / 1024.0) << std::endl;
std::cout << " Used Memory (MB): " << (usedBytes / 1024.0 / 1024.0) << std::endl;
std::cout << "Device Number: " << i << std::endl;
std::cout << " Total Memory (MB): " << (totalBytes / 1024.0 / 1024.0) << std::endl;
std::cout << " Free Memory (MB): " << (freeBytes / 1024.0 / 1024.0) << std::endl;
std::cout << " Used Memory (MB): " << (usedBytes / 1024.0 / 1024.0) << std::endl;
}
cudaSetDevice(0);
}
#endif
Loading