Squashed 'tpls/kokkos/' changes from 1a3ea28..08ceff9
08ceff92b Merge pull request #7202 from ndellingwood/master-release-4.4.00
948c13463 update master_history.txt for 4.4.00
6068673cb Merge branch 'release-candidate-4.4.00' for 4.4.00
f4ef4dab4 Merge pull request #7207 from dalg24/cherry_pick_automated_releases_master
76ffeea8d Add workflow to create releases with SLSA provenance generation
f15e90c9c [ci skip] update changelog for 4.4.0 (#7188)
818b82712 Merge pull request #7190 from dalg24/rc440_desul_atomics_config
c8bbbe2ef Fix atomic accessor for pre-volta GPU architectures (#7189)
ca0efd501 Merge pull request #7185 from ndellingwood/cherry-pick-7181-to-rc4.4.0
c8c11c2f7 Fix bogus warnings for cuda/11.4 with gcc/8.5 (#7181)
608fcbea8 Set version number to 4.4.0
94e62755d Hide `IMPL_REF_COUNT_BRANCH_UNLIKELY` option (#7175)
dbb0abb67 Merge pull request #7174 from rgayatri23/ompt_lock_code_delete
e933bfd5b Implement KOKKOS_ENABLE_IMPL_VIEW_OF_VIEWS_DESTRUCTOR_PRECONDITION_VIOLATION_WORKAROUND (#7168)
c8c6ae9bd OpenMPTarget: Delete unused code.
3bf86311e Merge pull request #7173 from dalg24/prefer_exec_space_name_in_tutorial
3a4fb03be Prefer ExecutionSpace::name() to a typeid expression in hello world
39cfac80e Merge pull request #7170 from dalg24/gcc_deprecated_declarations_warnings
af6fb10d5 Disable deprecated warnings with GCC < 11.1 for Pair<T1, void>
02ed662e8 Enable deprecation warnings in the GCC 8.4 build
a6beb5a56 Merge pull request #7166 from tjhei/examples_c++11
b13cef88b tutorials: do not mention requiring c++11
73b6d032b Add nvidia Grace Architecture (#7158)
5a2292743 Fix Kokkos_CoreUnitTest_DeviceAndThreads (#7159)
c22d638b9 Merge pull request #7163 from kokkos/dependabot/github_actions/ossf/scorecard-action-2.4.0
a65a6ce36 Bump ossf/scorecard-action from 2.3.3 to 2.4.0
eb11070f6 Make ExecutionSpace constructors explicit (#7156)
7d0e3e818 Fix Kokkos::Array<T, 0> default initialization for icpc (#7154)
5e2f147cd Make struct "ChunkSize" constructor explicit to avoid implicit construction in RangePolicy (#7151)
277c616e3 OpenMPTarget: Update docker clang build. (#7147)
bc7e02a16 Hidden friend operator== for Kokkos::Array (#7148)
d306b408b SYCL: Use sycl::shift_group_[left|right] and sycl::select_from_group (#7146)
49eccc859 Merge pull request #6828 from masterleinad/sycl_use_auto_range
28a7788b7 Use sycl::ext::oneapi::experimental::auto_range
2b8380992 Drop `Experimental::RawMemoryAllocationFailure` and don't catch exceptions to rethrow them in shared alloc (#7145)
3290a3de1 Fix using View without corresponding mdspan-type (#7140)
23b8813dc Add CMake options to control compilation flags for AMD GPUs (#7127)
f53b26d44 Merge pull request #7143 from dalg24/ompt_bad_alloc
f33443b49 fixup! Throw bad alloc if omp_target_alloc() returns nullptr
84a60e56c Fix Trilinos nightly failure due to `create_mirror*` refactor (#7126)
a07cbe4eb Fix gcc-14 C++26 nightly jenkins build (#7137)
dca818b1a Enable test_view_allocation_error with OpenMPTarget
5154f9d0b Throw bad alloc if omp_target_alloc() returns nullptr
571fbae26 Add `likely` and `unlikely` attribute from C++20 to ref counting in views (#6730)
b02c83a4c Merge pull request #7141 from masterleinad/disable_failing_nan_tests_nvhpc
0a64dfc74 Get rid of `RawMemoryAllocationFailure::AllocationMechanism` and derived backend-specific exceptions (#7139)
7247c7f4f Check for LIBCXX 10 or later for C++20 and later (#7123)
1dd782dbf no_device_stack is unknown
4dc70d811 NVHPC: Disable failing NaN tests
8e682582f Merge pull request #6987 from masterleinad/remove_nvhpc_as_device_compiler_support
7bcd7acaf [ci skip] rename jenkins build
981163a67 Merge pull request #7138 from masterleinad/fix_sycl_sstream
ee50eeb17 SYCL: Add support for Graphs (#6912)
d7c0be575 SYCL: Add missing include for std::stringstream
62e60260d OpenMP: Ensure kernels submitted by multiple threads to the same instance don't run concurrently (#6151)
c0edea91c Merge pull request #7133 from dalg24/simplify_finalize_logic
7ff2e10b3 Disable the PushFinalizeHookTerminate test on Windows
54cd8dac4 Merge pull request #7134 from masterleinad/fix_hip_nightly
59773deec [ci skip] Fix ROCm version to 6.1.2 in nightly CI
db8326fb1 Merge pull request #7132 from dalg24/get_rid_of_exception_swallowers
1fbfc7248 Simplify the logic when finalizing and calling the registered functions
e215b6bb3 Drop (unused) KOKKOS_ADD_ADVANCED_TEST TriBITS function
cbed0764b Let the throwing push finalize hook calls terminate test actually run
44befe5a0 Do not swallow errors when deallocating memory with CUDA
63141b5b8 Merge pull request #7128 from masterleinad/c++20_minimum_compiler_versions
41eb0b6e1 Fix using and, or, xor in desul with MSVC (#7124)
1c04d0464 Merge pull request #7131 from dalg24/avoid_catch_mem_alloc_failure_and_rethrow
0e0307313 Do not bother catching memory allocation failure and rethrow
af933fe32 Merge pull request #7129 from dalg24/drop_cuda_uvm_allocation_count
971c88440 Drop (unused) cuda uvm allocations counter
cca439ff5 Define minimum compiler versions for C++20 support
73ac2492d Merge pull request #7081 from Rombur/gcc_14
2b248b0d2 Refactor: Move logic of `create_mirror*` to `Impl::create_mirror*` (#7061)
a6e7f0d1f Merge pull request #7121 from kokkos/dependabot/github_actions/actions/upload-artifact-4.3.4
669b9f279 Deprecate `RawMemoryAllocationFailure::FailureMode::MaximumCudaUVMAllocationsExceeded` (#7120)
35b3f288f Merge pull request #7122 from dalg24/update_hip_nightly
72d9d077f Update HIP nightly build base image Ubuntu 20.04 -> 22.04
a18bcb059 Bump actions/upload-artifact from 4.3.3 to 4.3.4
93e372cbb Fix and test with -fsanitize=undefined in GitHub CI (#7104)
487b310c9 Merge pull request #7119 from ldh4/simd_fix_div_by_zero
56a40db0a Fix div by zero in math ops testing
33c9b8cef Merge pull request #7118 from crtrott/update-mdspan
b84125ed6 Add AtomicAccessorRelaxed (#7089)
dc175068b Update mdspan to 98a12b01b51b2
ba2075b3d Merge pull request #7117 from Rombur/hip_20
2a465fa8c Merge pull request #7102 from vicentebolea/fix-relative-dir-install
7b3e7c872 Merge pull request #7112 from masterleinad/fx_sycl_ci
4ddc65077 Merge pull request #7114 from crtrott/add-concepts-include-in-test
e396c8f50 Update base image for ROCm 5.6
83f975a94 Github CI: Test with C++17, C++20, and C++23 (#7082)
4a54fb34b Add missing concepts include in test
14093185a SYCL CI: Manually build oneDPL
fc45a032c move view allocation related functionality to a new header (#7110)
b5f51b9c8 Workaround to ice with icpc when using -no-ip (#7106)
f562ca246 Merge pull request #6802 from ldh4/simd_use_larger_vec_width
9f7a92f05 Clean up KOKKOS_LIB_INCLUDE_DIRECTORIES, append include directories to associated targets in Trilinos builds (#7103)
d9f7dfe85 Merge pull request #7108 from masterleinad/restrict_jenkins_cuda
720490e7c cmake: fix relative to find kokkos_compiler_launcher
f10076cb9 Restrict jenkins CI not to run on hopper for nvcc < 11.8
b650199ca clang formatting
4d1278ec2 Added a comment about is_type structs
6e167f26a Workaround for the compilation failure for rocm 5.6-6.0
1eb1abe5e Disabling simd unit tests from building for Windows+CUDA build
61de582d1 clang-formatted
e02c6a351 Added for width 4 for NEON
aa833570a Added for AVX512
e320e0054 Added width 8 abi for avx2
2d7715239 Fix SpaceAwareAccessor based on usage experiment in View (#7088)
93db4f783 Merge pull request #7094 from aprokop/exec_spaces
28614907c Remove FIXME_NVHPC 23.7 guards
53b320221 Cleanup KokkosP hooks in `Profiling::` (#7096)
46df6c18f Merge pull request #7093 from seyonglee/disable_mdspanerror_openacc
b69cf9eab Merge pull request #7099 from ndellingwood/fix-werror-icpc
a10c912db Merge pull request #7097 from kokkos/dependabot/github_actions/Jimver/cuda-toolkit-0.2.16
a4e7eab6c Couple more icpc -Werror fixes
546bb2bd4 Merge pull request #7098 from DerNils-git/develop
1771bfd90 Copy print_configuration setting in combination of kokkos settings.
c2a586338 Bump Jimver/cuda-toolkit from 0.2.15 to 0.2.16
70d50fe6f Merge pull request #7091 from JBludau/remove_overwrite_of_default_space
e0d99fdd7 Merge pull request #7095 from ndellingwood/fix-more-icpc
e826e7fc8 Fix more icpc issues
8fc95f871 Add missing space
304ad9d3e Temporarily disable failing parts in the TestMDSpan.hpp for the OpenACC backend.
c7acfb75b remove cmake options to change default spaces
3c30f4023 Remove support for NVHPC as CUDA device compiler
24454fa82 Resolve various bogus icpc -Werror (#7079)
2c0bd1644 Merge pull request #7080 from masterleinad/threads_safety_serial
a0b8deab0 Merge pull request #7078 from crtrott/update-desul
8501d5a90 Update desul version in github workflow
ea4b96f8f Update internal desul file copies to 60c1115
04f6a4f5e Merge pull request #7083 from crtrott/add-missing-include
cf14f1c71 Complex needs a tuple include
362c9d724 Merge pull request #7074 from crtrott/space-aware-accessor
9ac81a88d Don't delete special member functions explicitly
1e14d047c fix refcount exception safety (#6289)
3de267cb8 Improve performance for deleting an instance.
7b962ce29 Use correct includes for spaceawareaccessor
708abe21a Move `layout_iterate_type_selector` into Impl namespace (#7076)
e2d5815bb Update from GCC 13 to 14 and use C++ 26 in Jenkins nightly
3309d9332 Fix thread-safety for the Serial backend
2678194c7 Structured binding support for Kokkos::complex (#7040)
3d27bf596 Merge pull request #7077 from crtrott/fix-dynamic-extent-definition
3418084ea OpenACC: Skip exec_space_thread_safety_range_scan (#7022)
549858227 Fix using shared libraries and -fvisibility=hidden (#7065)
6c78f4b1f SpaceAwareAccessor: fix issues (no-unique-address, is_empty)
a1f1255ac Fix incompatible dynamic_extent definition in Kokkos
34db5182e Address review comments
a6b95e9f5 Add specialization of SpaceAwareAcc for AnonymousSpace
e2d68fd2b Use SpaceAwareAccessor in View mdspan-interop
b2046a40e Add basic tests for SpaceAwareAccessor
afbff6c53 Add SpaceAwareAccessor
0d5cc923a Enable MDSPAN support by default (#7069)
892e13c8c Merge pull request #7062 from masterleinad/use_find_cudatoolkit
b967b1012 Merge pull request #7072 from ndellingwood/issue-7071
63d8093c1 Workaround icpc "missing return statement at end of non-void function"
a3e2b84a7 KOKKOS_CUDA_ERROR->DEFAULT_MSG
64406064d Fix closing brackets
8229477b4 Move check CMake 3.20.1 with nvhpc
40cf84f91 Fix using CUDAToolkit for CMake 3.28.4 and higher
d54619970 Merge pull request #7068 from masterleinad/fix_msvc_cuda
90d877036 Avoid lambda in sort_by_key_via_sort
d50a87832 Workaround MSVC compiler issues in Views
f96df0277 Merge pull request #7070 from masterleinad/fix_mdspan_test
363b464f5 Update to CUDA 12.4.1 in MSVC CI
1d7ccd8df Fix mdspan test
043f87304 Switch to using functors in sort_by_key_via_sort (#7059)
660136f5d Merge pull request #7021 from masterleinad/use_werror_for_cuda
00b4e7fe2 Merge pull request #7063 from masterleinad/restrict_to_array_subtest
0a5fac076 Merge pull request #7066 from Rombur/rocm_61
7c4f2b40a [ci skip] Use ROCM 6.1 in the nightly CI and disable one test
5d0983823 Restrict to_array subtest to NVCC >= 11.4.0
63f05204d Merge pull request #7058 from cedricchevalier19/bump-version-readme
013ef0cad Bump version in the readme
f0a7c764a Merge pull request #7057 from kokkos/dependabot/github_actions/DoozyX/clang-format-lint-action-0.17
517f48a4b Merge pull request #7056 from kokkos/dependabot/github_actions/Jimver/cuda-toolkit-0.2.15
5269803eb Bump DoozyX/clang-format-lint-action from 0.16.2 to 0.17
e7ddeee49 Bump Jimver/cuda-toolkit from 0.2.14 to 0.2.15
8e0d4a923 Merge pull request #7055 from masterleinad/move_dependabot
4b913d3e7 Move dependabot to .github
2c3fd02aa Use -Xcudafe --diag_suppress=20208 in Makefile build
1625ec210 Try moving pragma suppress to tests
5b0d94518 Use -Xcudafe --diag_suppress=20208 for 11.6 build; nothing else seems to help
2a15c75c2 Suppress 'long double' is treated as 'double' in device code
0e88744dc Fix dangling reference
9d1842e71 Only use -Werror all-warnings with explicit nvcc_wrapper
5906cba05 Fix .jenkins whitespace
13447c433 Fix gtest
d0d99bd58 Fix array size
1876867d6 Fix kokkos_swap
726a8f296 Fix quotation marks in CXX flags
e89955018 Cuda: Fix nvcc warnings
fad664c8f Merge pull request #7051 from tpadioleau/fix-unused-symbols-ctad-tests
69b0db4c1 Fix unused symbols in CTAD tests
f53f905ec Merge pull request #7054 from masterleinad/update_scorecard
150f9009d Update scorecard GitHub workflow
1f602905c Add nightly CI on Frontier (#7048)
63a3cef18 Introduce `KOKKOS_DEDUCTION_GUIDE` macro to allow user-defined deduction guide in device code for clang compiler (#6954)
9f1cc4c97 Merge pull request #7046 from masterleinad/add_dependabot
669746ef8 Improve Kokkos Graphs (#7039)
e011753fb Merge pull request #7047 from nliber/array-structured-binding-improvements
f0704b39d Add tests to `ScopeGuard` (#7028)
c8a5870c2 Merge pull request #7042 from masterleinad/fix_msvc_warnings
2a7ca1a37 Added static_asserts for out of range tuple_element and get (to match checks in complex structured bindings)
5071f2fd3 Add dependabot for GitHub Actions
d65b67bbf (Rebase) Partial fix to compile time issues w/nvcc + Kokkos_ENABLE_DEBUG_BOUNDS_CHECK (#7013)
9bd74ee75 Avoid using "#if not defined"
561818bcd Merge pull request #7041 from ndellingwood/issue-7038
0f9efac16 TestArray: add intel guard to to_array implicit conversion test
e06ddf6c1 Fix adjacent difference (#6922)
7472ed7ac Merge pull request #6812 from tcclevenger/unorderedmap_deepcopy
580dba58d Merge pull request #7034 from ndellingwood/issue-7031
1c60c8007 Merge pull request #7030 from nliber/ctad-teampolicy-v3
cf791bc2e Adding `Kokkos::to_array` (#6375)
7c67b020c Workaround icpc warnings
0410363d7 Refactor: Replace SFINAE by `if constexpr` for `create_mirror*` functions (#6955)
a78d4ddb2 Copied the deduction guides and test cases over from branch nliber/ctad-teampolicy-crtp
24b24d075 Merge pull request #7006 from masterleinad/test_no_default_constructor_dualview
07a500982 Merge pull request #7024 from masterleinad/sycl_cuda_fix_graph_tests
6a3d918a4 Merge pull request #6834 from mhoemmen/fix-README-FENL-link
a5bb0d41b Fix Kokkos README's FENL link
c8e0a95cb HIP: Use builtin atomic for compare_exchange (#7000)
cb27c9941 SYCL: Skip launch_six Graph test
6f176cde0 OpenMPTarget: Fix compiling Graph tests (#7020)
083fb014c Improve `Impl::is_zero_byte()` (#7017)
068d46882 Merge pull request #7018 from dalg24/disable_openmptarget_graph_test
42e83f165 Merge pull request #7023 from dalg24/remove_unused_cuda_api_wrappers
f3bd253d3 Remove unused CudaInternal::cuda_{malloc,free}_async_wrapper
02433b625 Merge pull request #7019 from dalg24/nvhpc_suppress_deprecation_warnings
bfe9aa2f1 Fixup for disabling deprecation warnings with NVC++
fa8b50102 Disable OpenMPTarget Kokkos::Graph test (does not compile)
468faaa37 Merge pull request #7015 from G-071/fix_hpx_execution_space_nvcc_compilation
ce0915b5e Fix undefined behavior in is_zero_byte (#7014)
f8f0cc473 Always run Graph tests (#7011)
6aa2ad7da Add a CITATION.cff file (#7008)
64fe75637 SYCL: Don't use shuffles for top-level reductions (#7009)
81b63c5c5 mdspan converting constructors (#6830)
226aecfb8 Properly guard deprecated `Kokkos_Vector.hpp` header self contained test (#7016)
fc4383ab6 Fix unique_any_senders nvcc template deduction
2b7b98a1a Use parallel_for instead of parallel_reduce for check
835dbf594 Merge pull request #7012 from seyonglee/openacc_default_async_val_for_team
da8be2257 Change the default execution behavior of the parallel_for(team-policy) constructs in the OpenACC backend. This handles a missing case not covered by the previous PR #6772 and also fixes the OpenACC backend error in the thread-safety test in PR #6938.
df018d97f Suppress deprecated warnings via pragma push/pop in the tests (#6999)
cadab6c1e Test DualView resize/realloc for types without default constructor
1d9d0df2e SYCL: Print submission command queue property (#7004)
506da184f Merge pull request #7002 from dalg24/rm_tpl_cusparse
00170ae80 Remove cuSPARSE TPL
5a5306c4e Merge pull request #6997 from masterleinad/sycl_fix_custom_parallel_for_range_deprecations
a69e81a59 Merge pull request #6998 from rgayatri23/ompt_scan_lock
7cad3e7c3 OpenMPTarget: Use mutex lock for parallel scan.
37986fde4 [ci skip] update changelog for 4.3.1 (#6995)
6ecdf605e Merge pull request #6994 from ndellingwood/master-release-4.3.01
f5b34222c SYCL: Fix deprecation in custom parallel_for RangePolicy implementation
50a862cf6 SYCL: Prepare Parallel* for Graphs (#6988)
d61d75ace Fix a bug when using realloc on views of non-default constructible element types (#6993)
c80cdafef update master_history.txt
262d2d6e8 Merge branch 'release-candidate-4.3.01' for 4.3.01
e4cc6862c Merge pull request #6990 from masterleinad/fix_32bit_tpl_library_path
06e4c5bdc Merge pull request #6989 from dalg24/deprecated_attribute_comparison_operators_pair_t1_void
ccadc7d9b Disable failing parallel_scan_with_reducers test
28260178f Avoid duplicated definition of KOKKOS_IMPL_32BIT
7b8e3a68f Fix TPL_LIBRARY_SUFFIXES for 32-bit build
9c7920291 Fix deprecation warnings with GCC for pair<T1,void> comparison operators
69567f305 Add thread-safety tests (#6938)
c6d86474a Also use is_nothrow_swappable workaround for Intel Classic Compilers (#6983)
68fabc8a2 Merge pull request #6980 from ndellingwood/update-changelog-4301
fecc96c9e Merge pull request #6978 from ndellingwood/cherrypick-6951-4.3.01
4ee802725 Merge pull request #6979 from ndellingwood/cherrypick-6877-4.3.01
49e265601 Merge pull request #6977 from ndellingwood/cherrypick-6931-4.3.01
cd34c2e8b Merge pull request #6976 from ndellingwood/cherrypick-6578-4.3.01
a75dc70d8 Merge pull request #6982 from masterleinad/fix_fedora
b7bb509d8 Merge pull request #6985 from crtrott/copyright-rc
85610f455 Merge pull request #6984 from crtrott/Copyright
83498bdc6 Fix Copyright file
45a140491 Fix Copyright file
ccd0126b8 Fix fedora CI builds with flang-new
9fccb6107 Update changelog for 4.3.01
cf7f87c19 Merge pull request #6951 from masterleinad/fix_serial_space_team_policy
fbab8bdf0 bring back --fmad option to nvcc_wrapper (#6931)
4d7258c26 MI300: support unified memory (#6877)
30979fb93 cuda: reduction with `RangePolicy`: fix grid dimensions to work for large values and avoid overflow (#6578)
6486a9d68 Merge pull request #6975 from ndellingwood/update-version-4_3_01
dbd7f583a Merge pull request #6962 from dalg24/kokkos_array_const_qualified_element_type
775023262 changelog: header for version 4.3.01
73a7a41ba update to version 4.3.01
ed4d2544f Merge pull request #6972 from dalg24/fix_kokkos_compile_language_cuda_hip_w_omp
15d13f23b Merge pull request #6882 from Rombur/hip_atomic_fetch
27b3ced35 Merge pull request #6949 from Rombur/nightly_deprecated
2574b8029 Fix OpenMP+CUDA when `Kokkos_ENABLE_COMPILE_AS_CMAKE_LANGUAGE` is `ON`
f699a2c7a Fix enabling OpenMP with HIP and "compile as CMake language"
4f416f3b7 Merge pull request #6965 from dalg24/cmake_openmp_cxx
77ea52f97 Threads: Don't silently allow m_instance to be a nullptr (#6969)
4ec82963f OpenMPTarget: Update loop order in MDRange (#6925)
7e7709fdb SYCL: Avoid deprecated floating-point number abs overloads (#6959)
18642875f Merge pull request #6967 from crtrott/update-readme-kk-version
968639211 Add Linux Foundation notice and fix C++ standard
19ca9ce97 Update version
d434f87e9 Do not require OpenMP support for languages other than CXX
2391f1765 Avoid introducing a 2nd definition of the Impl::swappable trait
031f6d94a Alternate definition of Impl::is_nothrow_swappable_v for NVCC version less than 11.4
ebb1cb308 Revert "Try to fix the CUDA 11.0 build"
63eef4623 Try to fix the CUDA 11.0 build
2e82fdd87 Merge pull request #6961 from dalg24/fixup_deprcated_guards_pair_void
fafe861d0 Fix support for Kokkos::Array of const-qualified element type
ab3cae486 Fix wrong macro guards for deprecated Kokkos::pair<T1,void> specialization
cf59f3120 Merge pull request #6943 from dalg24/kokkos_swap_specialization_for_kokkos_arrays
e2b7bb99e Merge pull request #6958 from masterleinad/sycl_replace_deprecated_usm_address_spaces
a7827731c Kokkos::Impl::SYCLTypes:: -> Kokkos::Impl::sycl_
5932685c9 Introduce alias based on feature macro
205fd156d Replace deprecated sycl::device_ptr/sycl::host_ptr
cc602957c Merge pull request #6951 from masterleinad/fix_serial_space_team_policy
86f5988b3 Fix noexcept specification for kokkos_swap on zero-sized arrays
8706b68d5 kokkos_swap(Array) member friend should not be templated on some other type U
44fde213f Use Kokkos::AUTO for OpenMPTarget
34d0db2f4 Add test
04bc3d9e3 Merge pull request #6952 from nliber/changelog43
d5fd51274 Merge pull request #6947 from dalg24/deprecate_kokkos_pair_void_specialization
0859ab0af Fixed the link for P6601 (Threads backend change)
e7b486ff6 Serial: Use the provided execution space instance in TeamPolicy
69c527a42 [ci skip] Enable deprecated code and deprecated warnings in nightly CI
d914fe316 Fix deprecated warning from `Kokkos::Array` specialization (#6945)
906e8ce3c Merge pull request #6942 from dalg24/fix_nightlies_cxx20_requires_expression
730d8d828 Deprecate specialization of Kokkos::pair for a single element
c9e21ce2a Add `kokkos_swap(Array<T, N>)` specialization
f94e8d34d Prefer standard C++ feature testing to guard the C++20 requires expression
a8115e5df Adding converting constructor in Kokkos::RandomAccessIterator (#6929)
8c7cc95f9 Merge pull request #6940 from dalg24/unused_limits_header_include_in_kokkos_array
f2d37801d Remove unnecessary header include
de3a2632c Merge pull request #6934 from dalg24/deprecate_kokkos_array_proxy_template_param
d88e2a5b0 bring back --fmad option to nvcc_wrapper (#6931)
b5ec79bc9 Merge pull request #6936 from rgayatri23/issue_6874
92e02b50c CUDA: Update nvcc_wrapper
a2af4e0d4 Deprecate trailing Proxy template argument in Kokkos::Array
b0c2566c8 Merge pull request #6930 from Rombur/fix_nightly
0099c10be Fix nightly CI
6ea7be76e cuda: reduction with `RangePolicy`: fix grid dimensions to work for large values and avoid overflow (#6578)
164519d7d MI300: support unified memory (#6877)
74c81228f Merge pull request #6926 from Rombur/latest_rocm
1fe8108fb Merge pull request #6906 from dalg24/make_view_of_arrays_less_special
3a27cdbc2 Add ROCm 6.0 in the nightly CI
7b41536c4 Merge pull request #6924 from masterleinad/fix_sycl_workgroup_scan
8cf841076 SYCL: Fix range in subgroup scan for workgroup_scan
55c575750 Use recommended/max team size functions in Cuda ParallelFor and Reduce constructors (#6891)
e52cda370 Merge pull request #6785 from Rombur/memory_test
e93b168ba Merge pull request #6907 from dalg24/rm_experimental_layout_tiled
98b1a38e5 SYCL: Improve team_reduce implementation (#6562)
1256f6919 Merge pull request #6822 from CExA-project/fix-deep-copy
486cc745c Merge pull request #6908 from ndellingwood/master-release-4.3.00
4b9093099 Refactor: Uniformize `create_mirror*` parameter name for views (#6917)
077ea33c4 Remove trailing whitespace in changelog
cc21a5482 Merge pull request #6919 from ndellingwood/dev-changelog-4300
497b438f1 CHANGELOG.md: 4.3.00 update
a833fb00b Preparing readme for develop as the default branch (#6796)
caa139c9b SYCL: Unroll shuffle loops for top-level parallel_reduce and parallel_scan (#6750)
47a50ac3c Update master_history.txt for 4.3.0
f08217a49 Accommodate users that depend on code that defines silly macros (#6909)
5cf09513c Merge pull request #6910 from tpadioleau/remove-return-functor-copy-for_each
2aecb1d24 SYCL: Fix multi-GPU support and add test (#6887)
059cd15c0 Accommodate users that depend on code that defines silly macros (#6909)
e33da600f Fix merge artifact
b6678539a Drop specialization of ViewMapping for Kokkos::Array
391e0408b Do not return a copy of the input functor for Kokkos::Experimental::for_each
635551058 Move `Kokkos::Array` tests to a more suitable place (#6905)
5f9214049 Merge branch 'release-candidate-4.3.00' for 4.3.0
06850bf74 [ci skip] Update changelog (#6886)
5eac0bc6f Merge pull request #6876 from masterleinad/disable_fedora_rawhide
1efeb5d76 Deprecate is_layouttiled trait
51b98e1d7 Get rid of now unnecessary use of is_layouttiled trait
e2cfdec54 Drop Experimental::LayoutTiled class template
68c668469 Update Intel GPU architectures in Makefile (#6895)
a53d30aab Merge pull request #6896 from masterleinad/fix_makefile_threads
7ddc2d39c [4.3.00] Cuda: Fix configuring with CMake 3.28.4 (#6903)
8d734b026 Cuda: Fix configuring with CMake 3.28.4 (#6898)
772e745a1 Merge pull request #6899 from ndellingwood/cherrypick-6892
0834a1281 Fix a bug in Makefile when using AMD GPU architectures (#6892)
2035e313d Fix a bug in Makefile when using AMD GPU architectures (#6892)
872dc422f Fix Makefile.kokkos for Threads
46354d25d Use builtin for atomic_fetch in the HIP backend
ae4d0013d TestViewCopy_c.hpp: better handling for OpenMPTarget
a2f2ba404 TestViewCopy_c.hpp: add new unit test for deep copy (ViewFill)
841b3a9f9 Fix deep copy when filling Rank-7 views
9fff1e066 Merge pull request #6881 from dalg24/bump_develop_to_4_3_99
05bd48516 [ci skip] Bump version number to 4.3.99
1c60a32b7 Set version number to 4.3.0
a34d910ac Merge pull request #6879 from ndellingwood/update-rocthrust-check-trilinos
096e72437 Scratch space fix for MultiGPU (#6866)
49bd895ae kokkos_tpls.cmake: update default option to enable rocthrust
c1a800650 Don't use Fedora development version in GitHub CI
5931cbd29 Merge pull request #6871 from masterleinad/fix_link_rocthrust
5e7cab99b SYCL: Make sure to call find_dependency for oneDPL if necessary (#6870)
8062a6020 Fix linking with rocthrust in downstream applications
a2b64e0e8 Improve message on view out of bounds access and always abort (#6861)
da77d6e14 Merge pull request #6868 from Rombur/hip_sort_by_key
128caa1df Merge pull request #6869 from masterleinad/mdrange_ctad_test_warning
3a765351c Fix unused variable warning in TestMDRangePolicyCTAD.cpp
e5126e929 Add HIP specialization for sort-by-key
35ad698e0 Add support for rocThrust in sort when using HIP (#6793)
4e835e136 Merge pull request #6816 from crtrott/add-security-md
5ffcc1dcc Merge pull request #6840 from CExA-project/cmake-bench
cfc260ac0 CTAD (deduction guides) for MDRangePolicy (#5516)
6db04b3b5 CTAD (deduction guides) for RangePolicy (#6850)
c7ad79c4f Merge pull request #6862 from nmm0/update-mdspan-tpl
82b0f2a60 Merge pull request #6860 from masterleinad/fix_cstyle_cast_clang_tidy
121964a93 update mdspan tpl
9a7e7958a Split some classes from Kokkos_ViewMapping (#6859)
c3c8a70d2 Update the unsafe implicit conversion error message in MDRangePolicy (#6855)
9feb104d9 Fix fallback implementation for sort_by_key (#6856)
99c7e1b1c Fix amdclang++ compilation (#6857)
97a94b60a Fix C-style cast
3d485c19d bytes_and_flops: fix a counter name
52c41e6b3 Merge pull request #6854 from dalg24/bump_google_benchmark
4dcbff2cf Benchmarks: disable 2 benchmarks for OpenMPTarget
715d6156e policy_benchmark: fix indentation
97fa76f29 fix some warnings in policy_performance benchmark
750ef211a add policy_performance benchmark to CMake
16d2edbb3 add atomic benchmark to CMake
932466f21 add gather benchmark to CMake
5c9a4aa3c bytes_and_flops fix a small bug in command line argument
277339090 bytes_and_flops with CMake
e83619830 Merge pull request #6858 from masterleinad/fix_unused_variable_ctad
dc524910d Avoid unused variable warning in TestRangePolicyCTAD.cpp
8b8de2cf4 Remove variadic range policy constructor (#6845)
0cdc9eb76 Bump Google Benchmark version v1.{6.2 -> 7.1} in CMake FetchContent
04a5334c6 Remove redundant RangePolicy constructor (#6841)
058c3a08e Fix scorecard workflow (#6831)
c90a9c6f7 Implement sort_by_key (#6801)
549c50b9c Merge pull request #6800 from masterleinad/sycl_clean_device_selection
16a5ebe95 multi-GPU support: Add test for all policies (#6782)
bb734012e Merge pull request #6837 from masterleinad/fix_unwanted_fence_parallel_scan_no_fence_test
99510b131 Merge pull request #6825 from masterleinad/cleanup_kokkos_configure_core
24f251a85 Add test for current CTAD support with RangePolicy (#6803)
e2c810e1f Avoid detecting unwanted fences in the parallel_scan_no_fence test
e2689abc0 Merge pull request #6829 from ndellingwood/update-changelog-421
9d33cb772 Clean up shift_{right, left}_team_impl (#6821)
361bdbf49 [4.2.01]: changelog update (#6656)
74b421b19 Merge pull request #6826 from masterleinad/update_github_actions
e67ce088d Merge pull request #6824 from masterleinad/fix_sycl_ci
1112e07eb Update GitHub actions to use Node 20
c3f0a2698 Cleanup KOKKOS_CONFIGURE_CORE
df68761f9 SYCL CI: Avoid setvars.sh
696654a1c Only call deep_copy_view() from deep_copy(), add deprecation warning
0f3b727b1 Merge pull request #6813 from fnrizzi/fix_constness_for_views_std_algos
48588d08b Add CodeQL GitHub Action (#6818)
fe6a937af Merge pull request #6815 from masterleinad/fix_sort_fence
a46a5a14e Merge pull request #6806 from dalg24/rm_older_cpu_archs
a1199b3df Explicitly pass template params to ZeroMemset for intel icpc compilers (#6807)
cdb634dba Merge pull request #6817 from dalg24/license_in_readme
bd9db1562 [ci skip] Update license badge and links in the README
2a8ac6f48 Adding SECURITY.md file
c95f9542f Fix fence in Kokkos::sort when using std::sort
8e8d45724 Remove unused typedef
513d8db05 fix constness for views
59e2fd08d Redefine deep_copy for UnorderedMap
f40d55528 Merge pull request #6810 from crtrott/update-workflow-permissions
54c2336c5 Update workflow permissions
8963927d0 Merge pull request #6808 from crtrott/ossf-scorecard
4b84ae0e6 Add OpenSSF scorecard workflow
e0dc0128e Merge pull request #6770 from ndellingwood/master-release-4.2.01
3611cfef3 SYCL: Improve print_configuration (#6795)
37962b3d2 SYCL: Cleanup device selection
17d074259 Drop Intel Westmere and SSE4.2 extension
5b86415d6 Drop IBM Blue Gene/Q and POWER7 architectures
65dca527a Merge pull request #6798 from dalg24/rm_librt
54d41bdb8 Merge pull request #6797 from dalg24/intel_mm_alloc
3b515c99e Cuda multi-GPU support: Pass the correct device id to get_cuda_kernel_func_attributes (#6767)
aced864ec Drop librt TPL and associated KOKKOS_ENABLE_LIBRT macro
21b110542 Drop KOKKOS_ENABLE_INTEL_MM_ALLOC macro
7ff87a5b2 SYCL: Filter GPU devices (#6758)
8d58aadf0 Merge pull request #6790 from dalg24/impl_get_gpu_returns_optional
3e405209d Add support for RISCV and the Milk-V's Pioneer (#6773)
49f646283 Merge pull request #6786 from masterleinad/fix_cuda_occupancy
76f740fdb Merge pull request #6791 from dalg24/rm_hbw_space
3496c6fde Remove stray include header
393509470 Merge pull request #6792 from ldh4/simd_add_missing_vector_aligned_in_neon
1502379d0 Added missing copy_from() in neon for vector_aligned
95f70b3f4 Remove support for memkind
473cd5313 Remove DummyPolicy
136360bb3 Restore TestCommonPolicyConstructors.hpp
e28b57976 Fixup bogus shared alloc fence labels mentioning HBWSpace
f07a537c4 Drop Experimental::HBWSpace
0ed2ebfee Make ranges non-trivial
391f2d12f Fix SharedAllocationRecord to allocate using the correct execution space instance (#6789)
3db377e15 Fixup select from visible devices
1327c3779 Let Impl::get_gpu return std::optional and delegate device selection when appropriate
26060fed7 Don't try to compile the test for any backend with MSVC+Cuda
19dcd64da test_execution_policy_occupancy_and_hint might be unused
99b2e46ec Run OccupancyControlTrait on all execution spaces
91cc45e3a Split runtime checks from TestCommonPolicyConstructors into OccupancyControlTrait
442e4d42a Add checks for unsafe implicit conversions in RangePolicy (#6754)
4d29e39ab Disable test for MSVC+Cuda
31fb4761d simd: support vector_aligned_tag (#6243)
5f128d27d Merge pull request #6787 from ldh4/simd_skip_reduction_omptarget
01d5f8149 SYCL: Error out on initialization if the backend is different from ext_oneapi_* (#6784)
97997807d Temporarily disable simd_reduction test for omptarget build
20d52fb1c Fix Occupancy for Cuda
b4bc40614 Reenable TestHIP_Memory_Requirements
63a1208b3 Fixup use provided execution space when copying host inaccessible reduction result (#6777)
7d2ea7212 Cuda multi-GPU support: Make some variables device-specific, update Kokkos::fence (#6753)
2d273c86a Merge pull request #6778 from Rombur/fix_in_parallel
379007a35 Merge pull request #6768 from dalg24/fix_device_id_test_omp_target
917baa6d6 Fix typo in deprecatation macro used in HIP
cc6ecf058 Merge pull request #6772 from seyonglee/openacc_default_async_val
4c94f089b Get rid of `ZeroMemset`'s silly trailing value argument (#6769)
af806fb5d Drop 2-arguments `ZeroMemset` constructor overloads (#6764)
729940c87 Attempt to fix device id test with OpenMPTarget
69fc8f851 Merge pull request #6763 from masterleinad/cuda_dont_use_singleton_wrapper_tasks
b4c61a8f2 Merge pull request #6766 from dalg24/std_algo_tests_drop_print_statements
7d5fff958 Get rid of print statements in parallel algorithms unit tests
408e8be5b OpenMPTarget on Intel GPUs update (#6735)
61a07cf2a Merge pull request #6762 from masterleinad/cuda_dont_use_singleton_wrapper_space_instance
bbb895a34 Remove redundant calls in rangepolicy constructors (#6765)
71b246d67 Deprecate `in_parallel` (#6032) (#6582)
7439ec9d4 Avoid calling wrapper functions with singleton in Kokkos_Cuda_Task.cpp
eecd917f6 Change the default execution policy behavior of the OpenACC backend from synchronous to asynchronous executions. - Change the default OpenACC async_arg value from acc_async_sync to acc_async_noval. - Add acc_wait(async_arg) to scalar reduction operations (parallel_reduce()).
92307a5ec Update master_history.txt for 4.2.01
221e5f7a2 Merge branch 'release-candidate-4.2.01' for 4.2.01
26ad2643c Merge pull request #6761 from dalg24/cuda_get_last_error
7b5fbd414 [4.2.01]: changelog update  (#6656)
a082f820d Avoid calling wrapper functions with singleton in some classes
b3d8643e8 Drop CudaInternal::cuda_get_last_error_wrapper()
2ca8e73a6 Merge pull request #6751 from ndellingwood/cherrypick-6746-to-rc4201
11b58159e Merge pull request #6756 from ndellingwood/cherrypick-6742-rc-4201
d2913cb38 Add runtime function to query the number of devices and make device ID consistent with `KOKKOS_VISIBLE_DEVICES` (#6713)
e2f452882 Merge pull request #6742 from masterleinad/cleanup_trilinos_cmake_cxx_flags
4621c8643 Merge pull request #6742 from masterleinad/cleanup_trilinos_cmake_cxx_flags
20150550e Merge pull request #6747 from uliegecsm/fix-remove-if
540368114 std(remove-if): fixing tmp view alloc + avoid evaluating twice the predicate during final pass
d4a099599 Merge pull request #6749 from ndellingwood/cherrypick-6510-to-rc4201
6229367eb Merge pull request #6746 from tcclevenger/cuda_warp_sync_to_avoid_race_conditions
d8ace9763 Merge pull request #6746 from tcclevenger/cuda_warp_sync_to_avoid_race_conditions
8845cee38 Merge pull request #6510 from ndellingwood/fix-werror-pedantic
650ac4067 Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739)
d560c4719 Drop support for deprecated command-line arguments and environment variables (#6744)
57126af31 add more warp sync for cuda reductions
e1415f8fc Merge pull request #6630 from tcclevenger/potential_racecondition_in_cuda_reduce
dcf93fc08 Merge pull request #6738 from dalg24/shared_allocation_record
2dc7cbcc9 Cuda multi-GPU support: Allow execution space instance constructor to run (#6706)
a1a6ea14c Fix TestThreadVectorMDRangeParallelReduce (#6734)
c17969f33 Trilinos: Don't let Kokkos set CMAKE_CXX_FLAGS
d18ad8f34 Untangle SharedAllocationRecord spaghetti code
34973c773 Merge pull request #6731 from Rombur/hip_ci_new
abd50dc36 Merge pull request #6733 from simon-schlepphorst/fix_cmake_for_cxx26
5610068c5 Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732)
407e18dc8 Use team_size_max to fix "Team size too large" error in reducer test (#6725)
523d70189 Disabling failing HIP test in the CI
c4e1b86c8 Reenable HIP testing
87f32846b Add KOKKOS_ENABLE_CXX26 to the configuration metadata
39a0f3d67 Add support for C++26 in generated makefiles
bd3c0a552 Add C++26 standard to CMake Setup
6912b3998 Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726)
8a914909d Merge pull request #6729 from dalg24/acc_allocation_error
5781d176e Disable openacc.view_allocation_error test
000fccc50 Merge pull request #6728 from Rombur/hip_ci_tmp_fix
3d33665ff Fixup using declaration
f9f3c6e13 [OpenACC] throw if acc_malloc returned nullptr
a3aa567af Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator
8f743cf95 Ensure view_allocation_error does not silently ignore that no exception was thrown
9eca17795 Fix Docker env variables
86f5bb7d8 Let the smart pointer manage the CUDA/HIP stream (#6721)
f42a8cb03 Temporary fix to reenable HIP CI
179d2e67f Add bound checks in RangePolicy and MDRangePolicy (#6617)
ea564a274 Merge pull request #6723 from ndellingwood/cherrypick-6671-rc-4.2.01
4d784fe01 CHANGELOG.md: remove stray trailing whitespaces
f53b18b6c Merge pull request #6671 from rbberger/add_mi300_gfx940
95934133f Merge pull request #6722 from ndellingwood/fix-hip-missing-header
256c0ca62 Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors
35a867d37 Make initialize and finalize of the Cuda/HIP singleton less special (#6714)
bed3064ef Merge pull request #6712 from dalg24/cuda_error_cleanup
1e10099a1 Merge pull request #6715 from dalg24/hip_extraneous_closing_brace
4e33b3bf9 HIP: Forgot to delete matching brace closing the namespace
e6ff1a469 No need to jump through so many hoops to print the error message
868e42e7b Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper
c75d730d2 Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710)
474366af4 [4.2.01] Fix msvc cuda release (#6660)
9393b358f Don't use the compiler launcher script if the compile language is CUDA. (#6704)
fa91c962a Merge pull request #6711 from dalg24/pointless_cudaexec_forward_declaration
0254c631b Drop pointless Kokkos::Impl::CudaExec forward declaration
be0c796c4 Merge pull request #6658 from masterleinad/fill_random_sync
cad863fca Merge pull request #6708 from fnrizzi/inplace_transform_inclusive_scan
36da6cca7 add tests for in-place `inclusive_scan` (#6682)
ee5cbfc25 Fix TeamThreadMDRange parallel_reduce (#6511)
89ba3fbae Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687)
0ba8c40fc Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697)
673401038 add tests
96d530a24 Remove Kokkos::[b]half_t volatile overloads (#6579)
0e4a158a7 Check matching static extents in View constructor (#5190)
27286c32d Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692)
23b02f064 Merge pull request #6701 from masterleinad/fix_enable_compile_as_cmake_language
c9038983a Merge pull request #6681 from masterleinad/disable_bessel_sycl
5bb3ba32a Merge pull request #6695 from dalg24/cleanup_profiling_section
3523bc3e7 Enable `{transform_}exclusive_scan` in place (#6667)
bec13acd0 Merge pull request #6703 from ndellingwood/issue-6702
68de5ce19 Merge pull request #6700 from dalg24/fixup_print_tolerance
716bef2a4 test_array_ctad: disable test for intel versions < 2021
3358970c2 Try linking against CUDA libararies even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE
cbf1c644c Fixup cast tolerance to double before printing
9f5e38e97 SYCL: Address deprecations after oneAPI 2023.2.0 (#6577)
06de563f9 Add CI for MSVC+Cuda (#6661)
efc0c365c Kokkos::Array deduction guide (#6373)
654a51f60 GitHub CI: Test with AddressSanitizer (#6676)
4eae6a99f Cosmetic changes to ProfilingSection
f485cfa53 Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690)
4078a0d8a Cuda: Allocate using the correct device (#6392)
02b46c09c #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334)
f02539e35 Merge pull request #6647 from dalg24/ulp_should_be_integral
79a36295d Merge pull request #6649 from dalg24/we_dont_need_no_dual_view_converting_assignment_operator
2ac06ce63 Merge pull request #6689 from dalg24/profiling_section
73c750755 Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp
5aa0ceee4 Drop unnecessary guarding for a tool library being loaded in ProfilingSection
391daefd5 fill_random without exceution space instance should fence
8de16ea35 Disable more Bessel tests for SYCL on INtel GPUs
cbbe09b93 OpenMP: Use `omp_get_nested` for older gcc versions (#6685)
fe06b6f36 Merge pull request #6652 from masterleinad/ompt_printf
7e73c2b47 Merge pull request #6675 from brian-kelley/DeepCopyMsg
f38553cb0 Merge pull request #6361 from masterleinad/cuda_multiple_devices_constructor
79164a43a Improve handling of printf in OMPT on Intel GPUs
52e44d6cf SYCL: Force inlining of Kokkos::printf (#6650)
2092c01a4 Merge pull request #6651 from masterleinad/disable_hip_ci
f71052b5f Merge pull request #6680 from Rombur/rocm_60
5df22b87b Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support
64a7774b6 Merge pull request #6671 from rbberger/add_mi300_gfx940
154a57df8 src->source, dst->destination
72bc7ed42 Add missing include sstream
838f8938e Add a unit test for new deep_copy exception msg
316ceac58 Improve "no copy mechanism" exception message
18d7d78f5 Merge pull request #6664 from dalg24/openacc_not_always_true
e4a7cfc78 Per review prefer always_false<Arg>::value to is_void_v<Arg>
33db3046a Add Impl::always_false type-dendent false trait
293319c58 Add missing gfx940
62855dcf1 Merge pull request #6662 from cwpearson/feature/cmake-stream
cedbf56f6 Merge pull request #6665 from dalg24/not_accomodating_external_definition_of_kokkos_assert
1bd9ce7a5 Merge pull request #6659 from crtrott/fix-msvc-cuda-develop
fb668b143 Merge pull request #6666 from masterleinad/openmp_use_omp_get_max_active_levels
a996c12a0 Use omp_get_max_active_levels() when supported
ae71e4002 Drop guards to accommodate external code defining KOKKOS_ASSERT
76ea3a3a9 Do not negate the dependent true traits helper
379d5db1a Add CMakeLists.txt for stream benchmark
ed08974c7 Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076)
e524ec777 Move header for Damien because he is right
c6d01e943 Fix formatting
249f8b4fb Sidestep lacking CTAD support msvc/cuda
7dcf1deba Avoid lambdas in constexpr branch for msvc/cuda
458910fbf Fix missing include on msvc/cuda
fb0380b91 Fix builtin_unreachable use for MSVC/CUDA
843fca336 OpenMPTarget: clang extensions for dynamic shared memory. (#6380)
1abcca9d4 Merge pull request #6626 from masterleinad/cherry_pick_6608_4_2_01
d0548d658 Merge pull request #6631 from tcclevenger/cherry-pick-6630
4e4a047a2 Merge pull request #6627 from masterleinad/cherry_pick_6623_4_2_01
6e1865714 Merge pull request #6638 from dalg24/rc421_early_tools_profiling
84299466a Merge pull request #6655 from kokkos/cherry_pick_6653
232114fcd Merge pull request #6653 from masterleinad/remove_deprecation_allocation_mechanism_gcc_11_0
24b64848e Merge pull request #6653 from masterleinad/remove_deprecation_allocation_mechanism_gcc_11_0
8e16df3cf Merge pull request #6557 from dalg24/rm_logical_spaces
eadc210bf Remove deprecation warning for AllocationMechanism for gcc <11.0
dcdfcac91 Diable HIP CI
9fd95ebcb Don't use rocm-docker for clang-format
a35bc6890 Merge pull request #6643 from seyonglee/fix_openacc_toomuchwarning
b9b63dfd8 Drop DualView converting copy assignment operator
71729af71 Fixup test math functions ulp should double -> int
b877a6e9b Merge pull request #6645 from fnrizzi/fix_6644
a41dba586 SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532)
07cdd7000 add missing header fix #6644
c3dde624d Merge pull request #6642 from uliegecsm/kokkos-tools-typo
685620918 This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. This PR also re-enables the OpenACC CI test.
9041bdaf3 Merge pull request #6625 from Rombur/jenkins_multibranch
4a6a92056 Merge pull request #6634 from Rombur/ubuntu_18
ed64cea7f tools(profiling): type (related to kokkos/kokkos-tools/pull/221)
30f020777 Merge pull request #6640 from uliegecsm/unorderedmap-types
52d5c3738 nvcc wrapper: remove troubling flag to fix 6628 (#6629)
12af5769b Merge pull request #6639 from Rombur/disable_openacc
e9899a5b1 unorderedmap: modernize traits
7739ca191 Disabling OpenACC in the CI because it emits too many warnings
e4753753b Merge pull request #6635 from uliegecsm/kokkos-profiling-fix
f8788ef2a Merge pull request #6635 from uliegecsm/kokkos-profiling-fix
b00c1e068 update comment to include final() mention
54c62d15d Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format
c9d7bbad1 kokkos(profiling): do not finalize in any backend
6dcd72b9b Add warp sync for Cuda parallel reduce
4d4a343e5 Add warp sync for Cuda parallel reduce
c9540f51c Merge pull request #6624 from uliegecsm/kokkos-graph-hip-fix
0c617db8a Merge pull request #6608 from masterleinad/fix_numeric_traits_bfloat16
16972af28 Add jenkins multibranch pipeline options
0d3428087 Merge pull request #6614 from masterleinad/gh_workflows_icpc_fedora_to_intel
a7bf142d5 Merge pull request #6604 from fnrizzi/fix_test_dev_and_th
81580ca15 Merge pull request #6615 from uliegecsm/nvcc-wrapper-missing
b54105701 Merge pull request #6624 from uliegecsm/kokkos-graph-hip-fix
f31436a09 graph(HIP): adding inline keyword to fix #6623
f1d466622 Merge pull request #6608 from masterleinad/fix_numeric_traits_bfloat16
a4720ce41 Add clang-format check to GitHub workflows (#6612)
0262f7405 nvcc(wrapper): adding missing `--generate-line-info` arg
71a9bcae5 Merge pull request #6613 from ndellingwood/master-release-4.2.00
ae75d3895 GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing
0a40d16b8 Merge pull request #6611 from cz4rs/fix-formatting
3dd0b8253 [ci skip] fix formatting
374064ab7 add branching
33a1106da use reference
61842b7d1 remove comments
68e4bedc4 fix for macos
17af2f3c4 try
2779b29b5 avoid pyt package
f0af4672c try fix
aa2ff89fb Merge pull request #6598 from uliegecsm/kokkos-unique-fix
81a958653  Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF  (#6593)
ff7104cee [ci skip] Update changelog on develop for 4.2.00 (#6592)
932c1fb2f Added missing operator* to NEON simd
8fd8c94aa Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601)
1a145311f Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3
ee655c08a Fix TestNumericTriats.hpp for SYCL with bfloat16 support
c60716df4 try fix
38cbde408 Update master_history.txt for 4.2.00
abe01c88f Merge pull request #6600 from masterleinad/cherry-pick-6590
08efbb919 Merge pull request #6595 from masterleinad/cherry-pick-6543
9c37437ea Use binary wrapper for consistency in definition of half types numeric traits (#6590)
61b93ec7f kokkos(unique): fix allocation of temporary view to enfore using the provided space instance
2f5723bd6 Merge pull request #6585 from masterleinad/ompt_guard_scratch_allocations
d5a480291 Fix infinity, quiet_NaN, signaling_Nan, isfinite, isnan, isinf for half_t and bhalf_t (#6543)
97a90d5dd OpenACC: add atomics support (#6446)
fb73a7359 Merge pull request #6589 from kokkos/revert-6586-desul_sycl_device_global_supported
6023c1919 Merge branch 'release-candidate-4.2.00' for 4.2.00
81e308e7d Revert "Desul atomics: Trade SYCL-specific compile definition for a macro defintion in the configuration header"
3f773d057 OpenMP: No memset in viewfill (#6573)
0a83695e5 Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556)
91ee4e1e1 Merge pull request #6569 from masterleinad/cleanup_static_assert_kokkos_impl_do_not_use_printf
13c6c5783 Merge pull request #6586 from dalg24/desul_sycl_device_global_supported
605784265 [ci skip] Adding Changelog for Release 4.2.0 (#6583)
c8b4fe848 Desul atomics: Trade SYCL-specific compile definition for a macro defintion in the configuration header
26464df04 SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534)
fcb0452d0 OpenMPTarget: Guard scratch memory usage in ParallelReduce
d4a517f82 Set the device id explicitly for CUDA API calls in impl_initialize
80084960c simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574)
a453e9fc3 Simplify fence functions in the Threads backend (#6571)
8d9400e29 Add crtrott's launch_latency benchmark (#6379)
403c34f30 m_cudaDev isn't static anymore
a7b16b351 OpenMPTarget: CI compiler upgrade. (#6545)
c7a162342 Merge pull request #6576 from dalg24/remove_cuda_clang_workaround
1e1ed1318 Drop Clang+CUDA workaround
6fc7a4930 Merge pull request #6575 from dalg24/drop_unused_memory_fence_header
cead4f559 [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header
3a285ecf1 Merge pull request #6276 from ldh4/simd_add_missing_unit_tests
21a3d6f12 Merge pull request #6570 from ldh4/simd_move_fallback_impls
3b8c449f1 Remove empty quotation marks for static_assert
b76e1dcc1 fallback implementation cleanup
024d6c21b Remove unused Sandia testing files (#6568)
6da3fa7e9 Threads remove unused variables and functions (#6566)
0e5aa1503 Merge pull request #6553 from masterleinad/avoid_redundance_algorithm_unit_test_variables
a07c7a2b6 Address reviewer comments
8c4fe6b06 Merge remote-tracking branch 'upstream/develop' into cuda_multiple_devices_constructor
6eb12dbc9 Rollback changes to view constructors to reduce the number of instantiations (#6564)
fd80cbef4 Merge pull request #6541 from Rombur/threads_refactor_3
6d95b621e Remove logical memory spaces
54f2e7f23 Merge pull request #6548 from Rombur/hip_split
3093a0e64 Only define STDALGO_TEAM_SOURCES_* once
3edbef33d Merge pull request #6531 from masterleinad/sycl_use_ext_oneapi_device_global_feature_macro
400dd1d99 Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544)
201d1dead Merge pull request #6536 from dalg24/view_constructor_from_label
6b4ee34ee Split files in HIP backend
13efa71ac Merge pull request #6547 from msimberg/bump-hpx-1.9.0
a41df08a7 Bump HPX version used in CI to 1.9.0
840d6b775 Reduce number of View constructor instantiations
189aaa6da Merge pull request #6542 from Rombur/fix_guard
b4f27c87f Fix typo in macro guard
c4d0dfe02 Fix indentation
33010ecc3 Add comments
a417450bb Remove spawn function
bb759df49 Remove useless forward declaration
7d31c2273 Small cleanup of ThreadsInternal::initialize
6ac5aa846 Remove Sentinel struct from Threads
b875be75d Remove unused variables
09756717d SYCL: Use host-pinned memory to copy reduction/scan result (#6500)
6056c6b1f Merge pull request #6537 from Rombur/threads_refactor_2
3bcf9657f Prefer defaulted default constructor for Bitset (#6524)
1fcce6936 Remove extra constructor
9158785df Remove sleep and wake functions
ae0bd54eb Added unit tests for reduction ops and few intel svml intrinsics
cf5a859bf Threads: replace enum with constexpr int and enum class (#6514)
e156d5859 Check that device associated with stream matches requested device
0b59a1b40 Merge pull request #6512 from eltociear/patch-1
a30b9aa78 Fixup in README (github -> GitHub)
4383d1c1e Merge pull request #6528 from ndellingwood/cherrypick-6516
a440ac9e3 Merge pull request #6527 from ndellingwood/cherrypick-6518
4c88f2569 Merge pull request #6525 from dalg24/bitset_do_not_mess_with_labels
92cc6ce95 Merge pull request #6523 from dalg24/rm_deprecated_code_3
2bc1721d7 SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables
e9a540605 Merge pull request #6516 from fnrizzi/fix_6502
86e3c8db7 Merge pull request #6518 from fnrizzi/fix_6515
0120c431b Merge pull request #6516 from fnrizzi/fix_6502
ef889a7ab with_updated_label -> append_to_label
380754b91 Do not append " - blocks" to the bitset label
589ad55b0 fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions
2e6765a2a [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option
7c63c32bd [deprecated code 3] remove MasterLock
fb0bd5297 Get rid of FIXME_OPENMP
ca49c65f6 OpenMP backend cleanup following removal of deprecated code 3
0505ce294 [deprecated code 3] remove {OpenMP,HPX}::partition_master
3172fd1b0 [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants
d515a51ea [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions
35dda2ac6 [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax
57c0aa61f [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros
dfd0a6d31 [deprecated code 3] remove InitArguments
e6c51df7f [deprecated code 3] remove all default device init tests
3490ec1e7 Merge pull request #6520 from masterleinad/update_kokkos_version_develop
505c396dc Merge pull request #6518 from fnrizzi/fix_6515
629135a0f [ci skip] Update Kokkos version to 4.2.99
b26a1f735 avoid auto
7b86b80a9 add guards
0c0cafaba Merge pull request #6509 from Rombur/threads_team
58f53a6a2 Merge pull request #6510 from ndellingwood/fix-werror-pedantic
78c1ed885 Kokkos_SIMD_Scalar.hpp: remove extra ';'
a0cacc305 Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp
654d283a6 Update version number for 4.2.00 release
bd361e562 Merge pull request #6505 from Rombur/threads_instance
3beb7f191 Merge pull request #6499 from masterleinad/nvhpc_impl_only
e2ad3b36b Merge pull request #6506 from masterleinad/promote_kokkos_printf_header
89bd35cc3 Merge pull request #6198 from uliegecsm/unordered-map-space
0cad570ad Merge pull request #6503 from fnrizzi/relax_guards_team_algos
04a631081 Update CI in OpenMPTarget to use llvm-17  (#6472)
c586fa172 simd: add floor, ceil, round, trunc operations (#6393)
1095b640e Merge pull request #6497 from fnrizzi/openmptarget_scan_return
5518eb99e Promote Kokkos_Printf.hpp to public include
6ff5721a6 Rename Kokkos_ThreadsExec to align with the other backends
5544c0c22 UnorderedMap(space instance): proposal for #6067
adc885184 remove guards
377b3f057 fix order
02e6bdcce ad threadvector
bc83a8912 Merge pull request #6430 from aelovikov-intel/fix-red
578bc7f91 Merge branch 'develop' into openmptarget_scan_return
e8687a5df Merge pull request #6495 from masterleinad/hpx_parallel_scan_team_thread_thread_vector
63d9ae201 Merge pull request #6493 from fnrizzi/fix_team_uniquecopy_copyif
a856f973e Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON
e4038bcd0 Merge pull request #6498 from Rombur/threads_split_files
f511dca95 SIMD: Split math functions from SIMD_Common.hpp (#6487)
8420c2f00 Update to HIP TeamPolicy Block number heuristic (#6284)
4e69e4010 Merge pull request #6370 from Rombur/hip_graph
5b693fd95 address review comment
fdfeaf916 add overload for TeamThreadRange
d97f16f44 Merge pull request #6479 from uliegecsm/fix_hip_concurrency
ebef19bdf Serial: Allow for distinct execution space instances (#6441)
23496b47e HPX: Implement TeamThread and ThreadVector parallel_scan with return value
cb22b8061 Split Kokkos_Threads_Parallel files
f5c0cc5a4 Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp
ad9eb209f Merge pull request #6308 from thearusable/5635-sycl-parallel-scan-with-value-ThreadVectorRange
a601d81f9 Merge pull request #6490 from masterleinad/fix_build_cmake_installed_different_compiler
1f4e3d5db fix impl
8659ffa0b Fix example/build_cmake_installed_different_compiler
1ebb3afc4 Merge pull request #6484 from masterleinad/fix_bessel_function
ef1922d30 Merge pull request #6394 from masterleinad/simd_checks_neon
7fafc641a Merge pull request #6485 from masterleinad/fix_simd_cuda_compilations
8181d7075 Fix atomic operations bug for Min and Max (#6435)
ee8d58ea3 Merge pull request #6482 from uliegecsm/iostream
9035ab2d3 Merge pull request #6486 from ajpowelsnl/fix/6451
c5bf8705d #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange
1ccf4995b Also fix annotations for generator constructor for AVX512 and NEON
e52b957db Merge pull request #6305 from thearusable/5635-threads-parallel-scan-with-value-ThreadVectorRange
e40f026ef team-level std algos: part 13 (#6351)
890148e5d Fix NVCC warnings (#6483)
4d3958bec guards to ensure DBL_EPSILON return for POWER8,9
96edf73bd Fix compiling SIMD unit tests on NVIDIA
6b21fde9e cleaning: remove iostream from headers where possible (IWYU)
6ff0deb9b Fix implementation for cyl_bessel_i0
29d4ffdbf Add KOKKOS_ARCH_ARM_NEON
4ce289baa Allow detecting SIMD types based on compiler macros  (#6188)
495b1ccfd Add parallel_scan overloads with value for Threads
c63f125ec Add test for parallel_scan with return value for ThreadVectorRange
0bf937cdd Moving abort and assert into their own public headers (#6445)
2075ae79b core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407)
e04f637dc Merge pull request #6307 from thearusable/5635-sycl-parallel-scan-with-value-TeamThreadRange
567524c8f Merge pull request #6242 from thearusable/5635-hip-parallel-scan-with-value
3ad6473f6 Merge pull request #6465 from masterleinad/simd_math_functions
fbdb0e04f Merge pull request #6478 from masterleinad/minimum_version_google_benchmark
872ffb770 Merge pull request #6474 from uliegecsm/dualview_compatible_copy_constructor_assignment
148e6a6c3 Merge pull request #6471 from masterleinad/fix_openmp_teamthreadrange_parallel_scan_return
60e4d1359 team-level std algos: part 12 (#6350)
d8f8142ab Use call operator instead of run_me function
94c5d9ab6 Modify test so that source and destination view are of different type
744711864 Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU
39316fa8c Add test of copy constructor/assignment operator for DualView.
e1f2cf545 Fix minimum version for Google benchmark
82044c696 Add compatible copy assignment operator to DualView
b610a288b OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1
bbfe63981 Use std::is_same_v
7d817b88b #5635: SYCL: Add parallel_scan overload with return value
e4eb204ee #5635: Move some tests for parallel_scan to TestTeamScan
df1901b1c Use std::is_same_v
6c6a26ab1 Add parallel_scan overloads with value for HIP backend
41cf2e51c Merge pull request #6303 from thearusable/5635-threads-parallel-scan-with-value-TeamThreadRange
68a97a1fb Merge pull request #6235 from thearusable/5635-cuda-parallel-scan-with-value
5150a9fad Merge pull request #6463 from masterleinad/sycl_disable_bessel_test_intel_gpus
c395c0cf1 Fix formatting
61e7b262d Skip testing for non-power-of-two team sizes
b8d4feb26 use shortcut
4f6ddd190 #5635: Move some tests for parallel_scan to TestTeamScan
190bfe4ab #5635: Add parallel_scan overloads with value for Threads
1675997f2 #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302)
9db1ea46d team-level std algos: part 11 (#6258)
925032879 Merge pull request #6378 from cwpearson/feature/gups-permute-mode
d458fdadb team-level std algos: part 10 (#6256)
f6977cf43 Check for default device
6e2ca15da Merge pull request #6464 from masterleinad/restrict_avx2_workaround_rocm5_67
c195ee69a SYCL: Disable another bessel function test for Intel GPUs
b85563160 SIMD: Math functions should be in namespace Kokkos
e542e989a improve tests to check intra-team result (#6431)
9b3778134 fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459)
b813f2bb3 HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7
e1c82660e Merge pull request #6462 from fnrizzi/fix_warning_random_test_windows
b9fa28cfb Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449)
b2a1820b0 fix casting warning in Random test
988a9e6a9 Move final assignment to correct scope
2e743674a improve tests (#6437)
1f0183bc1 improve tests (#6432)
773e34648 team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425)
5f279b02c Fix parallel_scan_with_reducers test
8e4820194 Fix race condition in functor_vec_scan_ret_val test
e13f67ca6 team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436)
5ed274c93 Skip bessel function tests known to fail on Intel GPUs (#6434)
3cd281376 team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426)
d8fa85644 std_algos: improving min, max, minmax (#6421)
6f9e50c83 Merge pull request #6301 from thearusable/5635-cuda-parallel-scan-TeamThreadRange
06c6a73ab Merge pull request #6213 from fnrizzi/team_level_p10
ecbe79507 Merge pull request #6452 from masterleinad/disable_simd_compiler_macro_check_ompt
a56b433ce Merge pull request #6455 from fnrizzi/fix_6442
8239de526 Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443)
36af9d6e4 Merge pull request #6456 from fnrizzi/fix_6440
fb20a482d Assign final sum in Cuda parallel_scan ThreadVectorRange
4a819b6b7 Fix Cuda parallel_scan ThreadVectorRange range
6f85f19eb Merge pull request #6454 from fnrizzi/fix_copyif_team_test_assert
7e2749632 OpenMPTarget init-join fix (#6444)
56cc35bdf re-enable unit tests for sort and random via makefile (#6422)
e95075930 fix unreachable for intel
bcb92a619 fix intel compile error
ba9165994 add intra team check for missing test
4a266d8ee #5635: Add test for parallel_scan with return value for ThreadVectorRange
2b7eb0b0b add missing assert
96320555f #5635: Add parallel_scan with value for CUDA and ThreadVectorRange
7284cd215 OpenMPTarget: Disable check for SIMD compiler macros
e743017e8 benchmark/gups: use CMake
002cce07f Clean up benchmarks/gups
6d794df99 #5635: Enable TeamThreadRange test for CUDA
db8498389 remove old impl
1fb6f4a74 #5635: Add parallel_scan changes for CUDA and TeamThreadRange
47aecc6c4 Merge remote-tracking branch 'upstream/develop' into team_level_p10
6a95b5f3a Merge pull request #6292 from thearusable/5635-serial-parallel-scan-part-2
6494d96c1 Merge pull request #6212 from fnrizzi/team_level_p9
8bff82ff8 try fix for unique, previous impl to remove later
615fc1aef Fixes for Kokkos::Array (#6372)
7e35f1087 Merge pull request #6433 from masterleinad/cuda_fix_m_num_scratch_locks_initialization
541f67468 improve tests with intra-team result check
d270e064a Merge pull request #6428 from masterleinad/enable_kokkos_isnan_for_bhalf_t
c046bdba1 Merge pull request #6429 from tcclevenger/hip_potential_race_condition
6979f67ca Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy
fea838822 Same for scan
89a42341c [SYCL][Reduction] Group counter should use at least memory_order::acq_rel
41253bd55 Set the device id in cuda_kernel_arch
111371f10 Merge remote-tracking branch 'upstream/develop' into cuda_multiple_devices_constructor
96bb26b0c avoid potential race condition HIP
732d39219 Fix guard for isnan test for bhalf_t
9081d366c Merge pull request #6423 from uliegecsm/viewmapping_comparison
3bbbe2ba5 improve tests to address review
e3a608bb5 add comment
4389d81c4 Fix to avoid #186-D pointless comparison warning.
ba1bd2303 Merge pull request #6418 from dalg24/uvm_warn_once
cb459e741 Merge pull request #6419 from dalg24/cuda_bock_size_deduction_device_properties
035d28487 Address reviewer' comments
45646ab3b Use execution space instance argument to get device properties in block size deduction
582dfeac7 check-copyright improvements (#6399)
26a4cd43a Only warn once (at initialization) when forcing allocation in unified memory
03ba69e07 Merge pull request #6417 from dalg24/drop_check_support_unified_addressing
eb8ee282a fix single as per Christian's suggestion
21f72433b Drop check whether device supports unified addressing
c32f9c90b Merge pull request #6415 from cz4rs/fedora-enable-death-tests
1c0c73402 Merge pull request #6413 from dalg24/pre_kepler_arch_not_supported
1affb05d0 core/src: Add half math functions to private header (#6124)
c19926fd5 Enable death tests for fedora rawhide
7d9394ddb formatting
1cb10cb27 Merge remote-tracking branch 'upstream/develop' into team_level_p10
f4d7ea559 Merge remote-tracking branch 'upstream/develop' into team_level_p9
fc213ead1 Team-level std algos: part 7 (#6211)
db591e369 formatting
afac7784f address comments
8a3ec1bf5 Merge remote-tracking branch 'upstream/develop' into team_level_p10
dd5c624b7 use single
c4c9ed551 Merge remote-tracking branch 'upstream/develop' into team_level_p9
fd774d916 Merge pull request #6411 from dalg24/precondition_not_initalized
28061e84d Drop pre-Kepler logic in Cuda::impl_initialize
7b4d0a6f7 Merge pull request #6410 from dalg24/unused_hip_internal_data_members
c692a816d !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize
d8846bf84 Merge pull request #6409 from dalg24/host_exec_initialized_before_device
881801cb4 Drop unused HIPInternal::m_hipArch static data member
6aaf3736b Drop unused HIPInternal::m_maxSharedWords data member
53b2b2285 Merge pull request #6408 from dalg24/drop_cuda_internal_maximum_shared_words
c6cd24794 Drop check that the host backend is initialized before the Cuda/HIP/SYCL one
34aebe55c Drop (unused) `Cuda::cuda_internal_maximum_shared_words`
3c97512cc OpenMP backend refactor files. (#6403)
93fe629cf address comments
c53a95e0c Merge pull request #6402 from dalg24/cuda_malloc_async_on_by_default
bda5326b1 team-level std algos: part 6 (#6210)
49d4048cd Merge pull request #6405 from cz4rs/fix-cmake-warning-benchmarks
31c060ab5 Merge pull request #6401 from dalg24/manage_stream_should_be_private
629128cb7 Merge pull request #6406 from cz4rs/appveyor-disable-benchmarks
201f78d6d Merge pull request #6400 from dalg24/checked_integer_ops_death_category
e5a23d10e Disable performance benchmarks in AppVeyor CI
926a6420f Use archive extraction time for timestamps…
etphipp committed Sep 12, 2024
1 parent faeaf92 commit 5db4f44
Showing 800 changed files with 66,747 additions and 35,204 deletions.
25 changes: 13 additions & 12 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -1,21 +1,22 @@
---
name: Bug report
about: Create a report to correct failures and improve our code
about: Create a report (for github issue tracker) to correct failures
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
Please provide a concise, clear description of the bug, as well as any available error logs.

**Please also include the following items to support reproducing the bug**
1. compilers (with versions)
Please provide a concise, clear description of the bug, as well as any available error logs. Feel free to contact the Kokkos Slack `# build` channel for further discussion of your issue.

**Please include the following for a minimal reproducer**

1. Compilers (with versions)
2. Kokkos release or commit used (i.e., the sha1 number)
3. platform and backend
4. cmake configure command
5. output from cmake command
6. code needed to reproduce the bug
7. command line needed to reproduce the bug
7. please also attach the `KokkosCore_config.h` header file (generated during the build);
**Any additional info**
Please provide any additional context about the issue here.
3. Platform, architecture and backend
4. CMake configure command
5. Output from CMake configure command
6. Minimum, complete code needed to reproduce the bug
7. Command line needed to reproduce the bug
8. `KokkosCore_config.h` header file (generated during the build)
9. Please provide any additional relevant error logs
6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: /
schedule:
interval: "weekly"
15 changes: 15 additions & 0 deletions .github/workflows/clang-format-check.yml
@@ -0,0 +1,15 @@
name: clang-format check

on: [push, pull_request]

permissions: read-all

jobs:
formatting-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run clang-format style check.
uses: DoozyX/[email protected]
with:
clangFormatVersion: 8
51 changes: 51 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,51 @@
name: "CodeQL"

on:
push:
branches: [ "master", "develop", "release-*" ]
pull_request:
branches: [ "develop" ]

permissions: read-all

jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
timeout-minutes: 360
permissions:
# required for all workflows
security-events: write

# only required for workflows in private repositories
actions: read
contents: read

steps:
- name: Checkout repository
uses: actions/checkout@v4

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
with:
languages: c-cpp

- name: configure
run:
cmake -B build .
-DKokkos_ENABLE_OPENMP=ON
-DCMAKE_CXX_STANDARD=17
-DKokkos_ENABLE_DEPRECATED_CODE_4=OFF
-DKokkos_ENABLE_TESTS=ON
-DKokkos_ENABLE_EXAMPLES=ON
-DKokkos_ENABLE_BENCHMARKS=ON
-DCMAKE_BUILD_TYPE=Debug
- name: build
run:
cmake --build build --parallel 2

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3
with:
category: "/language:c-cpp"
16 changes: 13 additions & 3 deletions .github/workflows/continuous-integration-workflow-32bit.yml
@@ -1,5 +1,15 @@
name: github-Linux-32bit
on: [push, pull_request]

on:
push:
branches:
- develop
pull_request:
paths-ignore:
- '**/*.md'
types: [ opened, reopened, synchronize ]

permissions: read-all

concurrency:
group: ${ {github.event_name }}-${{ github.workflow }}-${{ github.ref }}
@@ -13,7 +23,7 @@ jobs:
image: ghcr.io/kokkos/ci-containers/ubuntu:latest
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: install_multilib
run: sudo apt-get update && sudo apt-get install -y gcc-multilib g++-multilib gfortran-multilib
- name: Configure Kokkos
@@ -26,7 +36,7 @@ jobs:
-DKokkos_ENABLE_DEPRECATED_CODE_4=ON \
-DKokkos_ENABLE_DEPRECATION_WARNINGS=OFF \
-DKokkos_ENABLE_COMPILER_WARNINGS=ON \
-DCMAKE_CXX_FLAGS="-Werror -m32 -DKOKKOS_IMPL_32BIT" \
-DCMAKE_CXX_FLAGS="-Werror -m32" \
-DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=RelWithDebInfo
- name: Build
21 changes: 15 additions & 6 deletions .github/workflows/continuous-integration-workflow-hpx.yml
@@ -1,19 +1,28 @@
name: github-Linux-hpx

on: [push, pull_request]
on:
push:
branches:
- develop
pull_request:
paths-ignore:
- '**/*.md'
types: [ opened, reopened, synchronize ]

concurrency:
group: ${ {github.event_name }}-${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{github.event_name == 'pull_request'}}

permissions: read-all

jobs:
hpx:
name: hpx
runs-on: [ubuntu-latest]

steps:
- name: checkout code
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
path: kokkos
- name: setup hpx dependencies
@@ -26,12 +35,12 @@ jobs:
libboost-all-dev \
ninja-build
- name: checkout hpx
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
repository: STELLAR-GROUP/hpx
ref: 1.8.0
ref: v1.9.0
path: hpx
- uses: actions/cache@v3
- uses: actions/cache@v4
id: cache-hpx
with:
path: ./hpx/install
@@ -73,7 +82,7 @@ jobs:
-DKokkos_ENABLE_DEPRECATED_CODE_4=OFF \
-DKokkos_ENABLE_EXAMPLES=ON \
-DKokkos_ENABLE_HPX=ON \
-DKokkos_ENABLE_SERIAL=OFF \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_TESTS=ON \
..
56 changes: 42 additions & 14 deletions .github/workflows/continuous-integration-workflow.yml
@@ -1,60 +1,87 @@
name: github-Linux
on: [push, pull_request]

on:
push:
branches:
- develop
pull_request:
paths-ignore:
- '**/*.md'
types: [ opened, reopened, synchronize ]

concurrency:
group: ${ {github.event_name }}-${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{github.event_name == 'pull_request'}}

permissions: read-all

jobs:
CI:
continue-on-error: true
strategy:
matrix:
distro: ['fedora:latest', 'fedora:rawhide', 'ubuntu:latest']
distro: ['fedora:latest', 'ubuntu:latest']
cxx: ['g++', 'clang++']
cxx_extra_flags: ['']
cmake_build_type: ['Release', 'Debug']
backend: ['OPENMP']
clang-tidy: ['']
stdcxx: [17]
include:
- distro: 'fedora:intel'
- distro: 'ubuntu:intel'
cxx: 'icpc'
cxx_extra_flags: '-diag-disable=177,10441'
cmake_build_type: 'Release'
backend: 'OPENMP'
- distro: 'fedora:intel'
stdcxx: '17'
- distro: 'ubuntu:intel'
cxx: 'icpc'
cxx_extra_flags: '-diag-disable=177,10441'
cmake_build_type: 'Debug'
backend: 'OPENMP'
- distro: 'fedora:intel'
stdcxx: '17'
- distro: 'ubuntu:intel'
cxx: 'icpx'
cxx_extra_flags: '-fp-model=precise -Wno-pass-failed'
cxx_extra_flags: '-fp-model=precise -Wno-pass-failed -fsanitize=address,undefined -fno-sanitize=function -fno-sanitize-recover=all'
extra_linker_flags: '-fsanitize=address,undefined -fno-sanitize=function -fno-sanitize-recover=all'
cmake_build_type: 'Release'
backend: 'OPENMP'
- distro: 'fedora:intel'
stdcxx: '17'
- distro: 'ubuntu:intel'
cxx: 'icpx'
cxx_extra_flags: '-fp-model=precise -Wno-pass-failed'
cmake_build_type: 'Debug'
backend: 'OPENMP'
stdcxx: '20'
- distro: 'ubuntu:latest'
cxx: 'clang++'
cxx_extra_flags: '-fsanitize=address,undefined -fno-sanitize=function -fno-sanitize-recover=all'
extra_linker_flags: '-fsanitize=address,undefined -fno-sanitize=function -fno-sanitize-recover=all'
cmake_build_type: 'RelWithDebInfo'
backend: 'THREADS'
clang-tidy: '-DCMAKE_CXX_CLANG_TIDY="clang-tidy;-warnings-as-errors=*"'
stdcxx: '23'
- distro: 'ubuntu:latest'
cxx: 'clang++'
cxx_extra_flags: '-fsanitize=address,undefined -fno-sanitize=function -fno-sanitize-recover=all'
extra_linker_flags: '-fsanitize=address,undefined -fno-sanitize=function -fno-sanitize-recover=all'
cmake_build_type: 'RelWithDebInfo'
backend: 'SERIAL'
stdcxx: '20'
- distro: 'ubuntu:latest'
cxx: 'g++'
cmake_build_type: 'RelWithDebInfo'
backend: 'THREADS'
stdcxx: '23'
runs-on: ubuntu-latest
container:
image: ghcr.io/kokkos/ci-containers/${{ matrix.distro }}
steps:
- name: Checkout desul
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
repository: desul/desul
ref: 477da9c8f40f8db369c28dd3f93a67e376d8511b
ref: 779d0441a778c7088a36d38c4cbf8df3cfa182cc
path: desul
- name: Install desul
working-directory: desul
@@ -66,15 +93,12 @@ jobs:
cmake -DDESUL_ENABLE_TESTS=OFF -DCMAKE_INSTALL_PREFIX=/usr/desul-install ..
sudo cmake --build . --target install --parallel 2
- name: Checkout code
uses: actions/checkout@v3
- uses: actions/cache@v3
uses: actions/checkout@v4
- uses: actions/cache@v4
with:
path: ~/.cache/ccache
key: kokkos-${{ matrix.distro }}-${{ matrix.cxx }}-${{ matrix.cmake_build_type }}-${{ matrix.openmp }}-${{ github.ref }}-${{ github.sha }}
restore-keys: kokkos-${{ matrix.distro }}-${{ matrix.cxx }}-${{ matrix.cmake_build_type }}-${{ matrix.openmp }}-${{ github.ref }}
- name: maybe_disable_death_tests
if: ${{ matrix.distro == 'fedora:rawhide' }}
run: echo "GTEST_FILTER=-*DeathTest*" >> $GITHUB_ENV
- name: maybe_use_flang_new
if: ${{ matrix.cxx == 'clang++' && startsWith(matrix.distro,'fedora:') }}
run: echo "FC=flang-new" >> $GITHUB_ENV
@@ -100,6 +124,8 @@ jobs:
-DKokkos_ENABLE_DEPRECATION_WARNINGS=OFF \
-DKokkos_ENABLE_COMPILER_WARNINGS=ON \
-DCMAKE_CXX_FLAGS="-Werror ${{ matrix.cxx_extra_flags }}" \
-DCMAKE_CXX_STANDARD="${{ matrix.stdcxx }}" \
-DCMAKE_EXE_LINKER_FLAGS="${{ matrix.extra_linker_flags }}" \
-DCMAKE_CXX_COMPILER=${{ matrix.cxx }} \
-DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
-DCMAKE_BUILD_TYPE=${{ matrix.cmake_build_type }}
@@ -112,6 +138,7 @@ jobs:
working-directory: builddir
run: ctest --output-on-failure
- name: Test linking against build dir
if: ${{ !contains(matrix.cxx_extra_flags, '-fsanitize=address') }}
working-directory: example/build_cmake_installed
run: |
cmake -B builddir_buildtree -DCMAKE_CXX_COMPILER=${{ matrix.cxx }} -DKokkos_ROOT=../../builddir
@@ -122,6 +149,7 @@ jobs:
- name: Install
run: sudo cmake --build builddir --target install
- name: Test install
if: ${{ !contains(matrix.cxx_extra_flags, '-fsanitize=address') }}
working-directory: example/build_cmake_installed
run: |
cmake -B builddir -DCMAKE_CXX_COMPILER=${{ matrix.cxx }}
13 changes: 11 additions & 2 deletions .github/workflows/osx.yml
@@ -1,11 +1,20 @@
name: github-OSX

on: [push, pull_request]
on:
push:
branches:
- develop
pull_request:
paths-ignore:
- '**/*.md'
types: [ opened, reopened, synchronize ]

concurrency:
group: ${ {github.event_name }}-${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{github.event_name == 'pull_request'}}

permissions: read-all

jobs:
osxci:
name: osx-ci
@@ -24,7 +33,7 @@ jobs:
cmake_build_type: "Release"

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: configure
run:
cmake -B build .
9 changes: 7 additions & 2 deletions .github/workflows/performance-benchmark.yml
@@ -4,6 +4,11 @@ on:
branches:
- develop
pull_request:
paths-ignore:
- '**/*.md'
types: [ opened, reopened, synchronize ]

permissions: read-all

jobs:
CI:
@@ -20,8 +25,8 @@ jobs:
BUILD_ID: ${{ matrix.distro }}-${{ matrix.cxx }}-${{ matrix.backend }}
steps:
- name: Checkout code
uses: actions/checkout@v3
- uses: actions/cache@v3
uses: actions/checkout@v4
- uses: actions/cache@v4
with:
path: ~/.cache/ccache
key: kokkos-${{ matrix.distro }}-${{ matrix.cxx }}-${{ matrix.backend }}-${{ github.ref }}-${{ github.sha }}
