Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

[1.2.0] - 2024-10-02

Added

  • device functions to simplify writing kernel code #2337 #2369 #2383
  • support Clang 18 and 19 #2387
  • support oneAPI 2024.2 #2368
  • support for mapped memory allocation for the SYCL backend #2375
  • support for pre-commit #2253
  • support for device and constant global variables in the SYCL backend #2242
  • alpaka::meta::isList, alpaka::meta::ToList and alpaka::meta::toTuple #2269
  • accelerator trait to check for single and multi-threads per block #2263
  • trait IsKernelTriviallyCopyable #2302
  • trait AccIsEnabled #2267
  • documentation: cmake flag to build alpaka benchmarks #2272
  • benchmark: babelstream support for different Accs #2299
  • example: using MdSpan to pass 2D data #2293
  • example: 2D heat equation #2365 #2383
  • example: Convolution #2228 #2220

Changed

  • update cheatsheet.rst #2398 #2386 #2241
  • signature of [get|is]ValidWorkDiv* #2349
  • use shared CUDA libraries by default #2348 #2342
  • add thread count to CPU blocks accelerators #2338
  • link libcudart even when libcurand is not used #2329
  • ctest: display output only for failed tests #2322
  • example: Matrix Multiplication use MdSpan #2317
  • move the Complex class to internal namespace #2301
  • run examples with all enabled accelerators #2280
  • template order allocMappedBuf #2270
  • slow getWarpSize problem #2246
  • simplification of workdiv creation #2240
  • benchmarks: move from examples into own directory #2237

Fixed

  • [get|is]ValidWorkDiv* #2349 #2335
  • cray clang compiler errors #2392
  • fix and update SYCL targets #2390 #2361
  • single thread acc throw for invalid workdiv fix #2391
  • explicitly call alpaka::detail to achieve SYCL compilation #2385
  • deduction guide for vector #2376
  • issue with device global variables with CUDA 12.4 #2303
  • clang9/nvcc11.2 boost bug #2294
  • HIP: fix CMake relocatable device code option #2290
  • Re-enable AtomicAtomicRef #2288
  • alpaka_add_library relocatable device code #2273
  • forwarding of msvc compiler flag '/Zo' #2266
  • Windows: usage of Idx to alpaka::Idx #2265
  • compiler detection for clang 17 and 18 as CUDA compiler with libstdc++ (gcc) #2256
  • support for non-integral types in Vec generator constructor #2236
  • memcpy warning #2295

Removed

  • support for nvcc11.0 and nvcc11.1 #2310

[1.1.0] - 2024-01-18

Added

  • Warp Shfl- Up, Down and Xor #1924
  • Add alpaka-ls #2175 #2218
  • Named access to Vec .x(), .y() #2201
  • Add CMake presets #2203
  • Add trait IsKernelArgumentTriviallyCopyable #2198
  • Add alpaka::getPreferredWarpSize(dev) #2216
  • ROCm
    • ROCm 5.3 and later support asynchronous memory operations #2197
    • Support for ROCM 5.6 - 6.0 #2207 #2210
    • Use CMake's native HIP support #2215
  • CUDA
    • Support for CUDA 12.3 #2211

Changed

  • Trim device names #2193
  • Change all CUDA warp operations to synchronise all threads #2204

Fixed

  • Fix a few warnings #2164
  • Workaround gcc warning on uninitialized PlatformCpu #2165
  • Fix icpx 2024.0 OpenMP atomics #2213

Removed

  • Remove ALPAKA_ASSERT_OFFLOAD, introduce ALPAKA_ASSERT_ACC #2199
  • ROCm
    • Remove support for HIP ROCm 5.0 #2214

[1.0.0] - 2023-11-14

Added

  • g++:
    • Added support for g++-13 #1967
    • Added support for g++-12 #1721 #1754 #1765 #1867
  • clang++:
    • Added support for clang-17 #2171 #2174
    • Added support for clang-16 #1971 #2006
    • Added support for clang-15 #1898
    • Added support for clang-14 #1766
    • Added support for clang-13 #1756
  • icpx:
    • Added support for the Intel® oneAPI DPC++/C++ Compiler (icpx) #1700 #1706 #1884 #2064 #2081
  • Xcode:
    • Added support for Xcode 14.3.1 #1973
    • Added support for Xcode 14.2 #1899
  • CUDA:
    • Added support for CUDA 12.2 #2043
    • Added support for CUDA 12.1 #1957
    • Added support for CUDA 11.{6,7,8} and 12.0 #1890
  • ROCm:
    • Added support for ROCm 5.5 #1961
    • Added support for ROCm 5.4 #1915
    • Added support for ROCm 5.3 #1838
    • Added support for ROCm 5.2.3 #1812
  • alpaka::math:
    • Added alpaka::math::copysign function #2050
    • Added alpaka::math::log2 and alpaka::math::log10 functions #2029
    • Added alpaka::math::fma functions #2015
    • Added hyperbolic functions #1828 #2030
    • Added constants namespace which contains constants such as π, e, etc. #1710
  • alpaka::Vec:
    • Added generator constructor #2085
    • Added front and back methods #2085
    • Added elementwise_{min,max} methods #1805
    • Vec now features a deduction guide for easier construction #1610
  • Documentation:
    • Added example illustrating typical data-parallel patterns with alpaka #1712
    • Added documentation about the behaviour of constexpr functions in kernel code #1699
    • Added documentation about CUDA function attributes #1697
    • Added documentation about setting the C++ standard library for clang #1695
  • Test cases:
    • Added test for alpaka::ViewSubView #2095
    • Added queue test which checks that a task is destroyed after execution #2047
    • Added test for alpaka::getValidWorkDiv with Idx type #1830
    • Added tests for alpaka::subDivideGridElements #1829
  • CI:
    • Run test cases with -Werror #2163
    • Added UBSan CI job #2059
    • Added CI job to create amalgamated alpaka.hpp #1956 #1965 #1972
    • Made GitLab CI jobs interruptible #1904
    • Updated used Boost and CMake versions #1903 #1969
    • Added agc-manager support #1871 #1921
    • Added TSan CI job #1851 #2103 #2137
    • GitLab CI jobs are now automatically generated #1785 #1889 #1896 #1951 #1952 #2005 #2041
  • Upgraded to clang-format-16 #2147
  • Added alpaka::getPitchesInBytes function which returns all pitches for a given view as an alpaka::Vec #2092 #2093 #2116 #2125
  • Added alpaka::get{Extents,Offsets} functions which return all extents/offsets for a given view as an alpaka::Vec #2080
  • Added alpaka_DISABLE_VENDOR_RNG CMake flag and its corresponding preprocessor macro ALPAKA_DISABLE_VENDOR_RNG to optionally disable vendor RNG libraries #2036
  • Added alpaka port of BabelStream #1846 #1934
  • Added utility functions alpaka::core::{divCeil,intPow,nthRootFloor} #1830
  • Added operator== for alpaka::WorkDivMembers #1829
  • Added alpaka::is{Accelerator,Device,Platform,Queue} variable templates #1818
  • Added accelerator tags which allow for accelerator-specific code paths without enabling the corresponding back-end #1804 #1814
  • Added experimental support for std::mdspan #1788 #2048 #2052 #2053
  • Added alpaka::ViewConst which wraps another view but prevents modifying accesses #1746
  • alpaka::{memcpy,memset} now support temporary destination views #1743
  • Host memory alignment can now be specified by using the ALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT macro #1686
  • Added alpaka::allocMappedBuf for allocating device-accessible pinned host memory #1685 #1782 #2162
    • Added related trait alpaka::trait::hasMappedBufSupport to query the host CPU for device-accessible pinned memory support #1782
    • Added related utility function alpaka::allocMappedBufIfSupported to allocate device-accessible pinned memory, if supported, and regular memory otherwise #1782 #2120
  • Relocatable device code can now be enabled using the alpaka_RELOCATABLE_DEVICE_CODE CMake option #1467
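
The integer helpers listed above (`alpaka::core::{divCeil,intPow,nthRootFloor}`) are the kind of arithmetic typically needed when splitting an extent into blocks. The following is a minimal sketch of what such helpers compute, going only by their names. The function names with the `Sketch` suffix, their signatures and their preconditions are assumptions made for illustration, not alpaka's implementation.

```cpp
#include <type_traits>

// Hypothetical stand-ins: names, signatures and preconditions are illustrative.

// Smallest q with q * b >= a. Assumes a >= 0 and b > 0.
template<typename T>
constexpr T divCeilSketch(T a, T b)
{
    static_assert(std::is_integral_v<T>, "integral types only");
    return static_cast<T>(a / b + (a % b != 0 ? 1 : 0));
}

// base^exp by repeated squaring. Overflow of the result is not checked.
template<typename T>
constexpr T intPowSketch(T base, unsigned exp)
{
    T result = 1;
    while(exp != 0)
    {
        if((exp & 1u) != 0)
            result = static_cast<T>(result * base);
        exp >>= 1u;
        if(exp != 0) // skip the final squaring, which would be unused
            base = static_cast<T>(base * base);
    }
    return result;
}

// True if r^n <= value, evaluated without overflowing T. Assumes r >= 1.
template<typename T>
constexpr bool powFitsSketch(T r, unsigned n, T value)
{
    T acc = 1;
    for(unsigned i = 0; i < n; ++i)
    {
        if(acc > value / r)
            return false; // acc * r would exceed value
        acc = static_cast<T>(acc * r);
    }
    return true;
}

// Largest r with r^n <= value, found by binary search. Assumes value >= 0, n >= 1.
template<typename T>
constexpr T nthRootFloorSketch(T value, unsigned n)
{
    T lo = 0;
    T hi = value;
    while(lo < hi)
    {
        T const mid = static_cast<T>(lo + (hi - lo + 1) / 2); // upper middle, always >= 1
        if(powFitsSketch(mid, n, value))
            lo = mid;
        else
            hi = static_cast<T>(mid - 1);
    }
    return lo;
}

static_assert(divCeilSketch(7, 2) == 4);
static_assert(intPowSketch(2, 10) == 1024);
static_assert(nthRootFloorSketch(27, 3) == 3);
static_assert(nthRootFloorSketch(26, 3) == 2);
```

For example, `divCeilSketch(extent, blockSize)` gives the number of blocks needed to cover `extent` elements.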

Changed

  • API changes:
    • Breaking change: alpaka::get{Width,Height,Depth} now always return 1 for unavailable dimensions instead of static_asserting #2148
    • Breaking change: alpaka platforms have been renamed from alpaka::Pltf* to alpaka::Platform* #2024 #2032
    • Breaking change: alpaka platforms are now full objects instead of types #1988 #2051 #2165
    • operator<<(std::ostream&, WorkDivMembers const&) is now a friend of alpaka::WorkDivMembers instead of a method #1829
    • Potentially breaking change: Switched several view-related methods from ALPAKA_FN_HOST_ACC to ALPAKA_FN_HOST #1826
    • Accelerators' copy/move constructors and assignment operators are now explicitly deleted #1825
    • alpaka::test::allocAsyncBufIfSupported was moved into the general namespace alpaka #1782
    • Removed unnecessary attribute ALPAKA_FN_HOST_ACC from defaulted functions #1761
    • The UniformCudaHip types are now templated on traits-like structs which encapsulate the CUDA or HIP API #1665
  • General behavioural changes:
    • Improved handling of CMake generator expressions #2146
    • Improved detection of C++20 features #2138
    • Simplified internals of alpaka_add_{executable,library} #2072 #2082
    • Breaking change: Removed dummy atomics from memory fence implementations. Users now need to guarantee correctness themselves #2071
    • In debug mode MSVC will use the /Od optimization level #1977
    • In debug mode clang-based compilers will explicitly use the -O0 optimization level #1977
    • In debug mode g++ will use the -Og optimization level #1977
    • -Werror and its MSVC equivalent /WX are no longer enabled by default when BUILD_TESTING is set to ON #1977
    • A platform's internal std::vector containing the alpaka::Devices now reserves the necessary memory before initialization #1926
    • Potentially breaking change: ALPAKA_FN_INLINE now enforces inlining for platforms other than CUDA and HIP #1918
    • Replaced alpaka::core::ConcurrentExecPool with alpaka::core::CallbackThread in all queue implementations #1870
    • If no back-end is enabled, alpaka automatically selects the serial back-end for examples and test cases #1843
    • On Linux platforms, the free global memory is now determined by a call to sysconf(_SC_AVPHYS_PAGES) instead of querying /proc/sysinfo #1776
    • Potentially breaking change: Changed CMake's look-up of MSVC's runtime libraries (see here for an in-depth explanation) #1751
    • Unified alpaka::{memcpy,memset}'s internal static_asserts #1748
    • alpaka::core::aligned{Alloc,Free} now internally use aligned new/delete instead of OS-specific APIs #1689
  • CUDA/HIP back-end changes:
    • nvcc now makes correct use of --Werror and more CUDA-related warnings #2135
    • Unified ALPAKA_UNIFORM_CUDA_HIP_RT_CHECK macros #2090
    • Made some internal constants constexpr #2063
    • The CUDA/HIP back-ends will now always use std::size_t for internal pitch calculations #2056
    • Breaking change: clang as CUDA compiler will only work in Release build mode #2027
    • Potentially breaking change: In debug mode nvcc will now use the -G flag which enables device-side debug symbols #1977
    • Starting from HIP 5.2.0, the HIP back-end includes <hip/hiprand_kernel.h> instead of <hiprand_kernel.h> #1914
    • Starting from HIP 5.2.0, the HIP back-end makes use of hip{Malloc,Free}Async #1894
    • If clang is used as CUDA compiler together with CUDA 11.3 a warning will be printed #1890
    • Starting from HIP 5.4.0, the HIP back-end internally uses hipLaunchHostFunc instead of a work-around #1883
    • Adapted to API changes in CUDA 11.7's stream memory operations #1878 #1919
    • Shortened mangled CUDA kernel names #1795
    • CUDA runtime versions checks are now based upon CUDART_VERSION instead of BOOST_LANG_CUDA #1777
    • Because of a HIP performance regression the HIP back-end now uses the emulated atomicAdd(float) on the Threads hierarchy level #1771
    • Changed look-up of built-in and emulated atomic functions for the CUDA and HIP back-ends #1768
    • The HIP back-end now uses the built-in atomicAdd(double) #1767
    • CUDA/HIP queues now internally make use of callback threads #1719 #1735 #1976 #2011
  • SYCL back-end changes:
    • Removed unnecessary -fintelfpga flag from CMake build system when compiling the SYCL back-end for Intel FPGAs #2179
    • Breaking change: Support for the activemask intrinsic is disabled for the SYCL back-end #2161
    • Updated README_SYCL.md #2140
    • Breaking change: Reworked CMake handling for SYCL targets #1970 #2066
    • Breaking change: The SYCL back-end now accepts SYCL USM pointers as kernel parameters #1845 #2042
    • Breaking change: The SYCL CPU selector was generalized to both Intel and non-Intel CPUs and therefore renamed #1845
    • Breaking change: The SYCL back-end replaced sycl::stream with printf for device side printing #1845 #2045
    • The SYCL back-end now features a kernel trait which allows setting the SYCL sub-group (= warp) size #1845
    • The SYCL back-end now supports RNG through the Intel oneAPI libraries #1845
    • The SYCL back-end is now based upon the SYCL 2020 specification #1845 #1981
  • RNG changes:
    • Breaking change: Philox RNG is now counter-based and stateless #1792
    • Philox random engines are now trivially copyable #1778
  • Documentation:
    • Improved documentation of ALPAKA_FN_INLINE #2091
    • Reduced example work sizes #2084
    • Improved documentation of alpaka::QueueCpuOmp2Collective #2025
    • Clarified kernel and kernel argument requirements #1944
    • Replaced license headers with SPDX license identifiers #1917
    • Collapsed compiler support matrix in README.md #1860
  • Refactorings:
    • Refactored test classes #2156 #2158
    • Use nested namespace specifiers #2152
    • Removed unnecessary member initialization calls #2151
    • Avoid unnecessary indentation #2149
    • Renamed internal variables of ViewSubViewTest.cpp and ViewPlainPtrTest.cpp to prevent name shadowing #2144
    • Refactored the internals of alpaka::{mapIdx,mapIdxPitchBytes} #2136
    • Replaced Codeplay's STLTuple implementation with std::tuple #2106
    • Replaced ALPAKA_DECAY_T macro with std::decay_t #2104
    • Refactored alpaka::internal::ViewAccessOps #2094
    • Breaking change: Replaced alpaka::createVecFromIndexedFn family of functions with alpaka::Vec's new generator constructor #2085
    • Refactored alpaka::QueueCpuOmp2Collective #2013
    • Refactored alpaka::meta::ndLoop #1999
    • Refactored alpaka::TaskKernelCpuThreads #1998
    • Refactored alpaka::core::ConcurrentExecPool and related classes #1852 #2000
    • Refactored alpaka::subDivideGridElements #1830
    • Refactored includes inside alpaka/dev/cpu/SysInfo.hpp #1776
  • Test changes:
    • Catch2 is no longer built with fast math enabled when using icpx as compiler #2128
    • -pedantic is no longer added when compiling CUDA code #2096
    • Reduced noise from helloWorld, helloWorldLambda and TestTemplate #2076
    • Renamed fenceTest to FenceTest #2037
    • The Any intrinsic unit test now assumes a sub-group size of 4 #2017
    • The NativeHandleTest no longer assumes that a native handle is an int #2008
    • Test cases are now compiled with MSVC's two phase lookup enabled #1986
    • Kernel names in the test cases are now demangled #1983
    • CUDA/HIP/SYCL atomic tests are now restricted to explicitly supported types #1980
    • Test cases are no longer executed for zero-dimensional SYCL accelerators #1979
    • Tests are disabled by default when using alpaka via CMake's add_subdirectory #1912
  • CI changes:
    • Removed unused sanitizer blacklists #2154
    • Simplified CI oneTBB installation #2145
    • The GitLab CI now features runtime tests built with g++ and clang++ #2131 #2141
    • Upgraded ASan CI job to clang-16 #2057
    • Upgraded special CUDA jobs to newer versions #2055
    • Re-enabled g++-9 + CUDA jobs #2040
    • Updated Read the Docs configuration to v2 #2010
    • For ROCm versions <= 5.3 certain warnings are ignored #1932
    • Split compile and runtime CI runners into separate GitLab pipelines #1908
    • Switched more CI runners to C++20 mode #1902
    • LLVM sanitizer libraries are explicitly installed #1900
    • Re-enabled CUDA + gcc-10 jobs #1890
    • Moved all GitHub jobs from ubuntu-latest to ubuntu-20.04 #1872
    • More jobs are only compiling the test cases but no longer execute them #1869
    • CUDA CI runners no longer manually install the GPU driver #1853
    • Change ROCm CI node #1844
    • Reworked Xcode OpenMP installation #1840 #1922
    • Upgraded to GitHub checkout action v3 #1832
    • Upgraded test infrastructure to Catch2 v3 #1749 #1815 #1861 #1911
    • Upgraded headercheck CI run to clang-13 and CUDA 11.2 #1803
    • Simplified CI clang installation #1763
    • Running CI workflows are now automatically cancelled when their corresponding PRs are updated #1717
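
One of the behavioural changes above states that free memory on Linux is now determined through `sysconf(_SC_AVPHYS_PAGES)`. The following is a minimal sketch of such a query, not alpaka's actual code. The function name `availablePhysicalMemoryBytes` is hypothetical, and multiplying by the page size from `sysconf(_SC_PAGESIZE)` is an assumption about how the page count would be converted to bytes. `_SC_AVPHYS_PAGES` is a glibc extension, so this compiles on Linux only.

```cpp
#include <cstdint>
#include <optional>
#include <unistd.h>

// Hypothetical helper: returns the currently available physical memory in
// bytes, or std::nullopt if either sysconf query fails.
inline std::optional<std::uint64_t> availablePhysicalMemoryBytes()
{
    long const pages = sysconf(_SC_AVPHYS_PAGES); // available physical pages
    long const pageSize = sysconf(_SC_PAGESIZE); // bytes per page
    if(pages < 0 || pageSize < 0)
        return std::nullopt;
    return static_cast<std::uint64_t>(pages) * static_cast<std::uint64_t>(pageSize);
}
```

Compared with parsing a file under `/proc`, this needs no string handling and fails in one well-defined way (a negative return value).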

Deprecated

  • Breaking change: deprecated alpaka::getPitchBytes[Vec] functions in favour of new alpaka::getPitchesInBytes function #2092 #2116
  • Breaking change: deprecated alpaka::get{Extent,Offset}[Vec] functions in favour of new alpaka::get{Extents,Offsets} functions #2080 #2139

Removed

  • g++:
    • Dropped support for g++-{7,8} #1872
  • clang++:
    • Removed work-around for very old clang versions #1916
    • Dropped support for clang as CUDA compiler for all versions before clang-14 #1890
    • Dropped support for clang-{6,7,8,9} #1872
    • Dropped support for clang-5 #1750
  • icpc:
    • Dropped support for the Intel® C++ Compiler Classic (icpc) #1702
  • MSVC:
    • Temporarily dropped support for MSVC + CUDA due to a nvcc bug #1958
    • Dropped support for MSVC 2019 #1887
  • Xcode:
    • Dropped support for Xcode 12.4.0 #1759
  • CUDA:
    • Dropped support for CUDA 10 #1872
    • Dropped support for CUDA 9.2 #1855
  • ROCm:
    • Dropped support for ROCm 4 #1886
  • SYCL:
    • Removed Xilinx platform support #1970
  • Removed floating point contractions for math test cases #2155
  • Removed alpaka::set{Extent,Offset} functions #2087
  • Removed alpaka's experimental accessors #2054 #2062
  • Catch2 is no longer compiled with CATCH_CONFIG_FAST_COMPILE set to ON #1978
  • Removed OpenMP 5 back-end #1947
  • Removed OpenACC back-end #1941
  • Removed warning for Boost 1.73 since alpaka requires Boost >= 1.74 #1849
  • Removed previously deprecated alpaka::time functionality #1841
  • Removed alpaka::{map,unmap,pin,unpin,isPinned,prepareForAsyncCopy}() free functions #1790
  • Removed unused alpaka::ConceptUniformCudaHip #1736
  • Removed Boost.fiber back-end #1718

Fixed

  • Fixed warnings uncovered by nvcc + clang++ -Werror #2157 #2159 #2164 #2167
  • Removed useless semicolon #2129
  • Fixed debug information for SYCL zero-dimensional buffer allocations #2127
  • Fixed missing [[maybe_unused]] inside extent/Traits.hpp #2122
  • Fixed several minor issues with the documentation #2121 #2176
  • Fixed unsigned integer conversion inside ViewAccessOps.hpp #2119
  • Fixed several warnings issued by nvcc #2118
  • Fixed compiler explorer link #2117
  • alpaka::core::detail::ThreadPool now handles a task's noexcept specifier correctly #2115
  • Fixed missing <cstdint> include in BlockSyncBarrierOmp.hpp #2114
  • Fixed integer conversions inside memViewTest #2113
  • Fixed alpaka::BufUniformCudaHipRt declarations sometimes being a struct and sometimes a class #2109
  • Fixed alpaka::wait() behaviour for events and devices #2108
  • Fixed alpaka::ViewPlainPtr not being copyable and moveable #2105
  • Potentially breaking change: Fixed alpaka::core::{CallbackThread,ThreadPool} not propagating exceptions #2067
  • Fixed missing ALPAKA_UNIFORM_CUDA_HIP_RT_CHECK calls in debug mode #2034
  • Worked around Catch2 macros not being thread-safe #2022
  • Fixed alpaka::test::KernelExecutionFixture's delegating constructor #2021
  • Fixed missing <cstdint> include in alpaka/rand/Traits.hpp #1977
  • Fixed ill-formed spelling of alpaka::EventUniformCudaHipRt's constructor in C++20 mode #1968
  • Fixed typo in memory fence documentation #1944
  • Fixed compilation issues for CPU-only jobs running on GPU CI runners #1939
  • Fixed clang-specific warning suppression occurring for other compilers in HIP back-end #1914
  • Fixed CI clang installation #1907
  • Fixed CUDA async / mapped memory allocation bug #1868
  • Fixed several bugs related to thread safety #1850 #1975 #1987 #1989 #2026 #2057
  • Fixed alpaka::createView for containers without a size argument #1847
  • Fixed behaviour of alpaka::detail::nextDivisorLowerOrEqual #1829
  • Fixed missing final keyword for accelerator inheritance #1816
  • Fixed missing template parameters in alpaka::allocBuf(host, extent) #1777
  • Fixed look-up of atomic*_block() functions for the CUDA back-end when clang is the device compiler #1773
  • Fixed mixed-type and mixed-precision alpaka::math::pow implementation #1733
  • Fixed alpaka::QueueGenericThreadsNonBlocking not completing running tasks upon its destruction #1728
  • Fixed host memory allocation / pinning on OpenPOWER platforms #1725
  • Fixed alpaka::ffs CPU intrinsic in C++20 mode #1716
  • Fixed typo in cheatsheet example for alpaka::getWorkDiv #1711
  • Fixed missing braces around aggregate initializers #1704
  • Fixed CI installation of CUDA apt repository keys #1703

[0.9.0] - 2022-04-21

Compatibility Changes:

  • Platform support added:
    • oneTBB #1456
    • clang 13 #1476
    • CUDA 11.5 #1486
    • Visual Studio 2022 #1583
    • CUDA 11.6 #1616
    • ROCm 5.0 #1631
    • Xcode 12.4 / 13.2.1 #1638
  • Platform support removed:
    • CUDA 11.0 / 11.1 + MSVC #1331
    • clang 5 + CUDA #1466
    • Ubuntu 18.04 #1471
    • TBB versions before oneTBB #1456
    • clang 6 / 7 + CUDA #1506
    • Boost < 1.74 #1521
    • CUDA 11.3 - 11.5 + clang #1627
    • Xcode 11.3.1 #1638

Bug Fixes:

  • alpaka TBB kernels are now protected when called from within existing parallel TBB code #1450
  • The cheat sheet now reflects the 0.8 changes to alpaka's RNG features #1469
  • alpaka Queues will now wait for active asynchronous operations before destructing #1514
  • The test cases no longer fail on non-x86 hardware because of -Werror #1516
  • Several small fixes for the OpenACC and OpenMP 5 back-ends #1564
  • Avoid locking in CPU atomic operations #1566
  • alpaka's NormReal RNG distribution is now copyable (like the other distributions) #1591
  • The class layout of BufCpu no longer depends on whether the CUDA and HIP back-ends are enabled. #1612
  • Fixed several smaller bugs in alpaka::Vec #1620
  • Destructors no longer throw an exception #1632
  • Implemented work-around for Intel compiler bug with OpenMP back-ends #1677

New Features:

  • alpaka now has native complex number support #1336
  • alpaka now requires C++17 (or newer). This release therefore includes many refactoring PRs that migrate the code base to C++17:
    • Set CMake requirements, remove versions checks, fix warnings, etc. #1466
    • Removed pre-C++17 workarounds #1483
    • Replaced alpaka::meta::apply with std::apply #1493
    • Replaced a lot of macros and template metaprogramming sections with if constexpr blocks #1495
    • Replaced alpaka::meta::Void with std::void_t #1499
    • Replaced some of alpaka's metafunctions with their standard counterparts #1501
    • Make use of C++17 mandatory copy elision #1502
    • Simplified CPU kernel launches #1511
    • Make use of generic std container interfaces #1554
    • Replaced std::enable_if with if constexpr where possible #1556
    • Replaced alpaka::ignore_unused with C++17 [[maybe_unused]] and std::ignore #1563
    • Make use of nested namespaces #1587 #1592
    • Make use of variable template versions of std traits #1594
  • alpaka Events can now be queried for their device type #1479
  • Some alpaka buffers can now be allocated asynchronously within a device queue (queue-ordered memory buffers) #1481
    • This capability can be queried with the hasAsyncBufSupport trait #1578
  • alpaka buffers can now be zero-dimensional (scalar) #1536
  • Apply alpaka::memset and alpaka::memcpy to the whole buffer if no extent is supplied by the user #1547
  • Added an accessor-like interface to buffers and views #1570
  • Host code utilizing the CUDA and HIP back-ends can now be compiled with a non-CUDA/HIP compiler if there is no device code in the translation unit #1567
  • Added alpaka::getNativeHandle() to obtain the back-end specific handles from alpaka Devices #1579
    • alpaka::getNativeHandle() can also be called on Queues and Events #1623
  • Added an experimental SYCL back-end. All SYCL back-end functionality currently lives in the alpaka::experimental namespace. See the README_SYCL.md for more information about the usage and the restrictions of this back-end. #1598
  • alpaka's memory fences can now also be applied to the grid level #1641
  • alpaka::getWarpSize() was renamed to alpaka::getWarpSizes() and will now return a std::vector of supported warp sizes #1644
  • Added previously missing atomic functions for some datatypes #1658
  • ALPAKA_ASSERT is now variadic #1661
  • Documentation updates:
    • Improved installation and usage documentation #1571
    • Added documentation on how to write unit tests #1609
    • The HIP portion of the compiler support matrix has been simplified #1637
    • The OpenMP 5 documentation has been extended #1672
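
Several of the C++17 migration items above name standard facilities that replace hand-written metaprogramming: `std::void_t`, `if constexpr`, `std::apply`, `[[maybe_unused]]`, nested namespaces and the `_v` trait variable templates. The following sketch shows these idioms in isolation. All names in the `sketch` namespace are hypothetical and none of this is taken from alpaka's code base.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch::detail // nested namespace definition (C++17)
{
    // Detection via std::void_t: does T have a member function size()?
    template<typename T, typename = void>
    struct HasSize : std::false_type
    {
    };

    template<typename T>
    struct HasSize<T, std::void_t<decltype(std::declval<T const&>().size())>> : std::true_type
    {
    };
} // namespace sketch::detail

namespace sketch
{
    // One function with if constexpr replaces two std::enable_if overloads.
    template<typename T>
    constexpr std::size_t elementCount([[maybe_unused]] T const& value)
    {
        if constexpr(detail::HasSize<T>::value)
            return value.size();
        else
            return 1; // anything without size() counts as one element
    }

    inline int sum3(int a, int b, int c)
    {
        return a + b + c;
    }

    // std::apply unpacks the tuple elements into the call to sum3.
    inline int sumOfTuple(std::tuple<int, int, int> const& t)
    {
        return std::apply(sum3, t);
    }

    // Variable template form of a standard trait (C++17).
    static_assert(std::is_arithmetic_v<int>);
    static_assert(!detail::HasSize<int>::value);
    static_assert(detail::HasSize<std::string>::value);
} // namespace sketch
```

With these definitions, `sketch::elementCount(42)` selects the scalar branch at compile time, while `sketch::elementCount(std::string("abc"))` calls `size()`.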

Misc:

  • The CUDA and HIP back-ends no longer explicitly set the device where this is unnecessary #1515
  • clang-tidy's modernization suggestions have been applied to the code base #1584
  • alpaka's math headers have been squashed. For each back-end there is now only one header instead of one for each math function. #1585
  • Updated the Boost predefinition header to reflect the upgrade to Boost 1.74 #1586
  • Removed the alpaka::extent namespace (the contents now live in the main alpaka namespace) #1593
  • Refactored implementations of BufCpu and BufUniformCudaHipRt #1608
  • Removed unnecessary specializations of GetPitchBytes trait #1614
  • All alpaka-specific CMake variables follow the ${PROJECT_NAME}_VARIABLE_FOO_BAR pattern. This means that all alpaka-specific CMake variables look like this: alpaka_VARIABLE_FOO_BAR. #1653
  • alpaka now enforces that kernel arguments are trivially copyable #1635
  • Renamed namespace traits to trait #1651
  • alpaka now enforces that kernel functions are trivially copyable #1654
  • Replaced the internal hipLaunchKernelGGL() call with a kernel<<<...>>>() call #1663
  • BOOST_LANG_HIP will now report a (somewhat) correct version number (for internal consumption) #1664
  • Refactored Queue implementation for CUDA and HIP to reduce code duplication #1667
  • core/CudaHipMath.hpp was merged back into math/MathUniformCudaHipBuiltIn.hpp #1668
  • The OpenMP 5 memory fence no longer explicitly sets the acq_rel memory order clause since it is the default #1673
  • Improved handling of std::shared_ptr inside the CUDA/HIP queues #1674
  • Internally replaced the deprecated cudaStreamAddCallback with cudaLaunchHostFunc #1675
  • Added CUDA- and HIP-specific aliases for Events, Platforms and Buffers #1678

Breaking Changes:

  • C++14 is no longer supported (see above)
  • alpaka now uses Boost.Atomic by default if the latter can be found by CMake. This can be turned off by passing -DALPAKA_ACC_CPU_DISABLE_ATOMIC_REF=ON during the CMake configuration phase. #1566
    • When compiling in C++20 mode, alpaka will use std::atomic_ref<T> instead #1671
  • Removed the alpaka::extent namespace (the contents now live in the main alpaka namespace) #1593
  • Kernel arguments are required to be trivially copyable. This was always a requirement but is now enforced by alpaka #1635
  • All alpaka-specific CMake variables follow the ${PROJECT_NAME}_VARIABLE_FOO_BAR pattern. This means that all alpaka-specific CMake variables look like this: alpaka_VARIABLE_FOO_BAR. #1653
  • alpaka::getWarpSize() was renamed to alpaka::getWarpSizes() and now returns a vector of supported warp sizes #1644
  • alpaka::clock() is now deprecated and will be removed in the next release. The compiler will warn about its usage. #1645
  • Renamed namespace traits to trait #1651
  • Removed support for std::function kernel functions #1654
  • Kernel functions are required to be trivially copyable. With the exception of std::function (see bullet point directly above) this was always a requirement but is now enforced by alpaka. #1654
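
The trivially-copyable requirements above can be checked at compile time with the standard type traits. The following is a minimal sketch of such a check, assuming a hypothetical `launchSketch` wrapper that calls the kernel on the host. It illustrates the kind of `static_assert` involved and is not alpaka's enforcement code.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: true if every argument type is trivially copyable.
template<typename... TArgs>
inline constexpr bool allTriviallyCopyable = (std::is_trivially_copyable_v<std::decay_t<TArgs>> && ...);

struct Params // plain aggregate: trivially copyable
{
    float scale;
    int count;
};

struct Owning // holds a std::string, so it is not trivially copyable
{
    std::string label;
};

static_assert(allTriviallyCopyable<Params, float*>);
static_assert(!allTriviallyCopyable<Owning>);

// Hypothetical launch wrapper: rejects unsuitable kernels and arguments at
// compile time, then calls the kernel directly as a stand-in for a device launch.
template<typename TKernel, typename... TArgs>
void launchSketch(TKernel const& kernel, TArgs const&... args)
{
    static_assert(std::is_trivially_copyable_v<TKernel>, "kernel function object must be trivially copyable");
    static_assert(allTriviallyCopyable<TArgs...>, "kernel arguments must be trivially copyable");
    kernel(args...);
}

struct ScaleKernel // empty function object: trivially copyable
{
    void operator()(Params p, float* out) const
    {
        *out = p.scale * static_cast<float>(p.count);
    }
};
```

Passing an `Owning` value to `launchSketch` would fail to compile with the second message. A `std::function` kernel fails the first check for the same reason, which is consistent with its removal above.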

Test Cases / CI:

  • Added clang 11, 12 + CUDA 11 tests to CI #1466
  • Removed Ubuntu 18.04 from CI #1471
  • Upgraded all Linux CI runners to Ubuntu 20.04 #1484
  • Test whether alpaka can be installed through cmake --install #1488
  • GitHub CI runners now use all available cores #1508
  • Migrated all CUDA runners from GitHub to GitLab #1520
  • Refactored matMul test #1526
  • macOS CI runners now install and test OpenMP 2.x #1533
  • Always enable the serial back-end when building the test cases and/or examples #1534
  • GitLab CI runners are executed in stages to reduce CI pressure #1537
  • Fixed the pitch calculation in randomCells2D example #1549
  • Updated test infrastructure to Catch2 v2.13.8 #1557
  • GitLab CI runners will display all required information to locally reproduce the test environment #1589
  • Refactored alpaka::test #1596
  • matMul test will now measure the performance of alpaka::memcpy #1599
  • The move constructors and assignment operators of buffers are now unit-tested #1611
  • Unit tests will now be run with zero dimensionality, too #1619
  • Added more tests for alpaka::Vec #1633
  • Added test for alpaka::Vec being trivially copyable #1639
  • CI runners will retry downloading Boost when necessary #1640
  • Updated used CMake versions to their latest point releases #1638 #1649

[0.8.0] - 2021-12-20

Compatibility Changes:

  • Platform support added:
    • clang 12 #1385
    • CUDA 11.4 #1380
    • GCC 11 #1383
    • HIP-clang #1338
    • Xcode 12.5.1 #1385
    • Xcode 13 #1421
  • Platform support removed:
    • clang < 5.0 #1385
    • CUDA < 9.2 #1385
    • GCC < 7.0 #1385
    • HIP-nvcc #1337
    • Visual Studio < 2019 #1385
    • Ubuntu 16.04 #1352
    • Xcode 11.x < 11.3.1 #1385
    • Xcode 12.x < 12.4 #1385

Bug Fixes:

  • Added missing #include <limits> in a few places which would lead to compilation errors for CPU back-ends #1327
  • Added missing std:: to fixed-width integers where necessary #1327
  • Fixed behavior of assert and printf for OpenMP offloading targets #1351
  • ALPAKA_STATIC_ACC_MEM_CONSTANT now works correctly for clang-CUDA as well as HIP #1386
  • The OpenMP 5 and OpenACC back-ends now correctly pass the parameters as an is_trivially_copyable type #1387
  • Fixed alpaka_compiler_option checking for the wrong variable name #1392
  • Fixed a bug in the HIP back-end's peer-to-peer memcpy implementation #1400
  • The CMake function alpaka_compiler_option has been turned into a macro which solves parameter scope issues #1401
  • Fixed compilation error occurring with CUDA >= 11.3 #1404
  • The -pthread flag is now correctly passed to the (host) compiler and linker #1420
  • alpaka now correctly sets the CUDA host compiler #1423
  • alpaka's headers are now treated as CMake SYSTEM headers so that internal warnings no longer annoy users #1451
  • Projects using alpaka can now set ALPAKA_CXX_STANDARD as variable in their CMakeLists.txt without alpaka ignoring this #1463

New Features:

  • alpaka now supports the Philox random number generator #1319
  • alpaka's kernel language now supports memory fences (a.k.a. thread fences) #1379
  • alpaka::Vec now supports structured bindings #1393
  • The OpenMP 5 and OpenACC back-ends now support statically mapped memory #1394
  • alpaka now has factory methods for creating memory views #1398
  • If OpenMP >= 5.1 is supported the back-end makes use of atomic capture compare #1411
  • alpaka now experimentally supports accessors instead of pointers to access memory #1433
  • Added wrapper for CUDA's native vector types so that they may be handled like arrays #1435
  • Added function for exact floating point comparisons #1440
  • Added portable implementations of random number distributions (default: TinyMersenneTwister) #1444
  • Added new math functions: isnan, isinf, isfinite #1446
  • New type trait that removes __restrict__ from pointers #1474
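
As a rough illustration of what structured-binding support for a fixed-size vector type involves, the sketch below uses the standard tuple protocol (`std::tuple_size`, `std::tuple_element`, and a `get<I>` found by ADL). The `demo::Vec3` type and all function names are assumptions made for this example; this is not alpaka's `alpaka::Vec` implementation. C++17 is required.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace demo
{
    // Hypothetical 3-element vector; stands in for a library vector type.
    struct Vec3
    {
        int data[3];
    };

    // get<I> overloads; structured bindings find them through ADL.
    template<std::size_t I>
    constexpr int& get(Vec3& v)
    {
        static_assert(I < 3, "index out of range");
        return v.data[I];
    }

    template<std::size_t I>
    constexpr int const& get(Vec3 const& v)
    {
        static_assert(I < 3, "index out of range");
        return v.data[I];
    }

    template<std::size_t I>
    constexpr int&& get(Vec3&& v)
    {
        static_assert(I < 3, "index out of range");
        return static_cast<int&&>(v.data[I]);
    }
} // namespace demo

namespace std
{
    // Opt the type into the tuple protocol.
    template<>
    struct tuple_size<demo::Vec3> : integral_constant<size_t, 3>
    {
    };

    template<size_t I>
    struct tuple_element<I, demo::Vec3>
    {
        using type = int;
    };
} // namespace std

// Sums the three components after unpacking them with a structured binding.
inline int sumComponents(demo::Vec3 const& v)
{
    auto const& [z, y, x] = v;
    return z + y + x;
}

// Usage: unpack by reference and modify a component in place.
inline void demoStructuredBinding()
{
    demo::Vec3 v{{1, 2, 3}};
    auto& [a, b, c] = v;
    a = 10;
    assert(v.data[0] == 10 && b == 2 && c == 3);
}
```

The three `get` overloads cover binding by lvalue reference, by const reference, and by value (where the hidden copy is accessed as an rvalue).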

Misc:

  • If TBB is enabled CMake is now able to pick up both oneAPI TBB and legacy TBB #1329
  • Eclipse project files are no longer tracked by git #1347
  • OpenACC atomics are now well-defined #1358
  • Headers in alpaka/test are now installable #1360
  • HIP no longer receives special treatment inside alpaka_add_library #1410
  • Removed unnecessary annotations on default constructors #1416
  • Removed unnecessary defaulted and deleted special member functions #1418
  • Removed unnecessary explicit specifiers #1419
  • Simplified implementation of ALPAKA_UNROLL #1437
  • Math traits have been (internally) simplified #1457
  • Visual Studio project files are no longer tracked by git #1464
  • The alpaka CMake project now enables the CXX language by default everywhere (previously some test cases would enable C) #1470

Breaking Changes:

  • Legacy TBB support is deprecated. alpaka will move to oneTBB in 0.9 #1329
  • HIP + nvcc is no longer supported #1337
  • The behavior of ALPAKA_STATIC_ACC_MEM_CONSTANT and ALPAKA_STATIC_ACC_MEM_GLOBAL was changed #1386
  • alpaka::rand::engine::createDefault() now features an additional offset parameter #1434

Test cases / CI:

  • NVHPC is now tested #1308
  • Disabled MSVC + CUDA 11.{0,1} runners due to a CUDA bug #1332
  • Some runtime tests are offloaded to HZDR's own CI #1375
  • install_clang.sh now installs the correct versions of libc++ and libc++abi #1385
  • Fixed overflow in AccDevPropsTest #1395
  • Fixed a bug in the HIP peer-to-peer memcpy test #1399
  • Math tests no longer fail for clang-cuda #1406
  • CUDA/HIP test cases are in part tested by HZDR's own CI #1407, #1409
  • CI now uses clang-format 12.0.1 #1417, #1430
  • atomic tests now also test float and double #1431
  • all test executables have been renamed: executable is now called executableTest #1432
  • test infrastructure is now based on Catch2 v2.13.7 #1461
  • alpaka and all subprojects now only enable CXX by default #1473
  • Removed unnecessary disabling of MySQL #1524
  • Reflect HZDR GitLab CI node changes for HIP #1530

[0.7.0] - 2021-08-03

Compatibility Changes:

  • Visual Studio 2017 is no longer supported #1251
  • 32bit Windows is no longer supported #1251
  • CUDA 11.3 is now supported #1295
  • clang < 9 is no longer supported as CUDA compiler #1300
  • clang 11 is now supported #1310

Bug Fixes:

  • fixed ALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLED being checked without being defined #1259

New Features:

  • when no specialization is provided by the user alpaka's math functions will now fall back to ADL to find a candidate #1248
  • the HIP back-end now supports callbacks #1269
  • added warp::shfl functionality #1273
  • added Front and Contains type list meta functions #1306
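
The ADL fallback mentioned above can be pictured with the common `using std::...;` idiom. The following is a minimal sketch under stated assumptions: `userlib::Fixed`, `userlib::sqrt`, and `mathdemo::genericSqrt` are hypothetical names, and alpaka's actual trait-based dispatch is not reproduced here.

```cpp
#include <cassert>
#include <cmath>

namespace userlib
{
    // Hypothetical user-defined number type with its own sqrt overload.
    struct Fixed
    {
        double value;
    };

    inline Fixed sqrt(Fixed f)
    {
        assert(f.value >= 0.0);
        return Fixed{std::sqrt(f.value)};
    }
} // namespace userlib

namespace mathdemo
{
    // Generic entry point without a specialization for T: the unqualified
    // call resolves to std::sqrt for built-in types and, through
    // argument-dependent lookup, to userlib::sqrt for userlib::Fixed.
    template<typename T>
    auto genericSqrt(T const& x)
    {
        using std::sqrt;
        return sqrt(x);
    }
} // namespace mathdemo
```

With this pattern a user type only needs a free function in its own namespace; no specialization inside the library namespace is required.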

Misc:

  • alpaka's CMake build system now uses CMake's first-class CUDA support #1146
  • updated documentation for clang-format usage #1222
  • increased the static shared memory size to 47 KiB #1247
  • fixed table markup in README.md #1256
  • added example showcasing how to specialize kernels for particular back-ends #1271
  • removed section comments #1275
  • updated cheatsheet (added warp info, fixed names) #1281

Breaking Changes:

  • alpaka now requires CMake 3.18 or newer #1146
  • the CUDA and HIP back-ends no longer enable fast-math by default #1285
  • the CMake options ALPAKA_CUDA_FAST_MATH and ALPAKA_HIP_FAST_MATH have been replaced by ALPAKA_FAST_MATH #1289
  • the CMake options ALPAKA_CUDA_FTZ and ALPAKA_HIP_FTZ have been replaced by ALPAKA_FTZ #1289
  • the CMake option ALPAKA_CUDA_NVCC_SEPARABLE_COMPILATION has been replaced by the native CMake property CUDA_SEPARABLE_COMPILATION #1289
  • the CMake option ALPAKA_CUDA_NVCC_EXPT_EXTENDED_LAMBDA has been replaced by ALPAKA_CUDA_EXPT_EXTENDED_LAMBDA #1289

Test cases / CI:

  • enabled OpenMP back-ends for more Visual Studio builds #1219
  • fixed gh-pages #1230
  • added ICPC / ICC 2021.x to CI #1235
  • fixed deadlock in Ubuntu 20.04 container #1270
  • now CI-testing CMake 3.20 #1283

[0.6.1] - 2021-06-29

Compatibility Changes:

  • rework implementation of OpenMP schedule support #1279 #1309 #1313 #1341
    • alpaka::omp::Schedule is replaced by ompScheduleKind and ompScheduleChunkSize

Bug Fixes:

  • fix OpenMP 5 shared memory allocation #1254
  • fix static shared memory alignment #1282
  • fix BlockSharedMemStMemberImpl::getVarPtr for last var #1280
  • fix CPU static shared memory implementation #1258
  • unit tests: fix queue test #1266
  • fix CtxBlockOacc: SyncBlockThreads #1291
  • fix assert in DeclareSharedVar (OpenAcc) #1303
  • CMake CUDA: dev compile options not propagated #1294
  • example: fix warning (NVCC+OpenMP) #1307
  • TBB: Add missing header and fix integer namespace #1327
  • OpenAcc: TaskKernelOacc: copyin(all used local vars) #1342
  • port macOSX CI fix from #1283
  • CI: use ubuntu-18.04 for gcc-5 and gcc-6 builds #1252
  • CI: disable GCC 10.3 + NVCC tests #1302
  • CI: MSVC + nvcc workarounds and fixes #1332
  • CI: fix warp test #1339

Misc:

  • add ALPAKA_ASSERT_OFFLOAD Macro #1260
  • document return value of empty() and isComplete() #1265
  • Prefer TBBConfig.cmake over FindTBB.cmake #1329

[0.6.0] - 2021-01-20

Compatibility Changes:

  • support for CUDA 11, 11.1, and 11.2 #1076 #1086 #1147 #1231
  • remove support for CUDA 11.0 with MSVC 2019 #1227
  • support for CMake 3.18.0 and 3.19.0 #1087 #1217
  • set minimal HIP version to 3.5 #1110
  • remove CMake HIP module shipped with alpaka #1189
  • set HIP-clang as default compiler for HIP #1113
  • support for NVCC + VS 2019 #1121
  • support for boost-1.74.0 #1142
  • explicitly require backends and do not enable them by default #1111
  • remove support for Xcode 11.1 #1206
  • support Xcode 11.21 - 12.2.0 #1206
  • update to Catch 2.13.3 #1215

Bug Fixes:

  • apply some clang-tidy fixes #1044
  • fix CUDA/HIP accelerator concept usage #1064
  • fix Intel compiler detection #1070
  • CMake: build type CXX flag not passed to nvcc #1073
  • work around Intel ICE (Internal Compiler Error) when using std::decay on empty template parameter packs #1074
  • BoostPredef.hpp: Add redefinition of BOOST_COMP_PGI #1082
  • fix min/max return type deduction #1085
  • CMake: fix boost fiber linking #1088
  • fix HIP-clang compile #1107
  • fix CUDA/HIP cmake flags #1152
  • fix error handling CUDA/HIP #1108
  • ALPAKA_DECAY_T: Fix Intel detection, Add PGI #1116
  • fix how to set HIP target architecture #1112
  • fix and improve block shared mem st member sanity checks #1128
  • HIP: remove copy device2device workaround #1188
  • pass native pointers to kernel instead of buffer objects #1193
  • fix bug in isPinned() and pin() #1196
  • fix marking of unit tests for concepts #1226

New Features:

  • add functions alpaka::atomicAnd et al. as shortcuts to alpaka::atomicOp<alpaka::AtomicAnd> et al. #1005
  • warp voting functions #1003 #1049 #1090 #1092
  • Sphinx Doc: Fix Doxygen integration on readthedocs #1042 #1093 #1151
  • add cheat sheet to the docs #1057 #1177
  • extend AccDevProps with shared memory size per block #1084
  • OpenMP 5 target offload backend #1126
  • OpenACC backend #1127
  • option to set OpenMP schedule for the Omp2Blocks backend #1223

Misc:

  • tests for BufferSlicing #1024
  • use std::invoke_result_t instead of std::result_of_t when available #1047
  • simplify shared memory usage in tests #1075
  • remove boost::aligned_alloc #1094
  • add unit tests for work div #1095
  • change examples (except reduce) to use getValidWorkDiv #1104
  • example monte-carlo-integration #1106
  • invoke docker run only once instead of twice #1109
  • cpu/SysInfo.hpp: Add #else for cpuid; Add PGI #1119
  • Pgi std atomic workaround #1120
  • make BlockSharedMemDynMember::staticAllocBytes a function #1118
  • add IntrinsicFallback: basic fallback implementations #1122
  • allow ALPAKA_CXX_STANDARD to propagate to nvcc with MSVC 1920 and above #1130
  • add set kernel #1132
  • make Queue test generic to handle QueueGenericThreads* with different devices #1133
  • IdxBtOmp: Add GetIdx specialization for 1d #1140
  • test CMAKE_CXX_EXTENSIONS=OFF #1153
  • change block memory size back to be stored as 32 bit #1187
  • add comments to math function traits that explain valid argument range #1190
  • provide docker_retry #1191
  • add .clang-format file #1204
  • add CI check whether code is correctly formatted #1213
  • make test/common a CMake INTERFACE library #1228

Breaking changes:

The namespace structure of alpaka is now flattened. The script can help you to apply the changes to your code. The script only works if you used the full namespace alpaka::* for alpaka functions.

  • removed namespace alpaka::dev
  • removed namespace alpaka::pltf
  • renamed function alpaka::vec::cast to alpaka::castVec
  • renamed function alpaka::vec::reverse to alpaka::reverseVec
  • renamed function alpaka::vec::concat to alpaka::concatVec
  • removed namespace alpaka::vec
  • removed namespace alpaka::workdiv
  • removed namespace alpaka::acc
  • renamed functors alpaka::atomic::op::And et al. to alpaka::AtomicAnd et al. #1185
  • removed namespace alpaka::atomic::op
  • removed namespace alpaka::atomic
  • removed namespace alpaka::queue
  • removed namespace alpaka::idx
  • removed namespace alpaka::dim
  • removed namespace alpaka::kernel
  • removed namespace alpaka::wait
  • removed namespace alpaka::mem
  • removed namespace alpaka::offset
  • removed namespace alpaka::elem
  • removed namespace alpaka::intrinsic
  • renamed function alpaka::event::test to alpaka::isComplete
  • removed namespace alpaka::event
  • removed namespace alpaka::time
  • removed namespace alpaka::example
  • renamed function alpaka::alloc::alloc to alpaka::malloc
  • renamed function alpaka::buf::alloc to alpaka::allocBuf
  • removed namespace alpaka::alloc
  • removed namespace alpaka::buf
  • renamed function alpaka::view::set to alpaka::memset
  • renamed function alpaka::view::copy to alpaka::memcpy
  • removed namespace alpaka::view
  • removed namespace alpaka::block::shared::st
  • removed namespace alpaka::block::shared::dyn
  • removed namespace alpaka::block::sync
  • renamed function getMem to getDynSharedMem #1197
  • renamed function getVar to declareSharedVar #1197
  • renamed function freeMem to freeSharedVars #1197
  • renamed functors alpaka::block::op::LogicalAnd et al. to alpaka::BlockAnd et al.
  • removed namespace alpaka::block::op
  • removed namespace alpaka::block
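
To make the function renames in the list above concrete, here is a small sketch of a textual rewrite over fully qualified names. It is not the script referred to above; `renameTable` and `applyRenames` are hypothetical helpers, and only the old/new name pairs are taken from this changelog. Like the note above, it assumes the code spells out the full `alpaka::*` namespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Old -> new fully qualified names, taken from the function renames listed above.
inline std::vector<std::pair<std::string, std::string>> const& renameTable()
{
    static std::vector<std::pair<std::string, std::string>> const table = {
        {"alpaka::vec::cast", "alpaka::castVec"},
        {"alpaka::vec::reverse", "alpaka::reverseVec"},
        {"alpaka::vec::concat", "alpaka::concatVec"},
        {"alpaka::event::test", "alpaka::isComplete"},
        {"alpaka::alloc::alloc", "alpaka::malloc"},
        {"alpaka::buf::alloc", "alpaka::allocBuf"},
        {"alpaka::view::set", "alpaka::memset"},
        {"alpaka::view::copy", "alpaka::memcpy"},
    };
    return table;
}

// Replaces every occurrence of each old name in `source` with its new name.
// This is plain substring matching: an identifier that merely starts with an
// old name would also be rewritten, so the result should be reviewed.
inline std::string applyRenames(std::string source)
{
    for(auto const& entry : renameTable())
    {
        std::string const& oldName = entry.first;
        std::string const& newName = entry.second;
        assert(!oldName.empty());
        std::size_t pos = 0;
        while((pos = source.find(oldName, pos)) != std::string::npos)
        {
            source.replace(pos, oldName.size(), newName);
            pos += newName.size(); // continue after the inserted text
        }
    }
    return source;
}
```

The removed namespaces (for example `alpaka::dev`) are deliberately left out of the table, since the list above does not spell out a one-to-one replacement for every symbol they contained.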

[0.5.0] - 2020-06-26

Compatibility Changes:

  • the minimum required C++ version has been raised from C++11 to C++14 #900
  • drop support for CUDA 8.0 (does not support c++14)
  • drop support for gcc 4.9 (does not support c++14)
  • drop support for CMake versions lower than 3.15 (3.11, 3.12, 3.13 and 3.14)
  • raise minimum supported boost version from 1.62.0 to 1.65.1 #906
  • require HIP version 3.3.0 #1006
  • drop HIP-hcc support #945

Bug Fixes:

  • fix CMake error #941
  • fix HIP math includes #947
  • fix: missing hipRand and rocRand library #948
  • fix VS 2017 CUDA builds #953
  • fix uninitialized pitch #963
  • fix windows CI builds #965
  • fix conversion warning in TinyMT #997

New Features:

  • add automated gh-pages deployment for branch develop #916
  • unify CUDA/HIP backend #928 #904 #950 #980 #981
  • add support for Visual Studio 2019 #949
  • simplify vector operator construction #977
  • example heat-equation #978
  • extend supported compiler combinations gcc-8+nvcc 10.1-10.2 #985
  • add support for CMake 3.17 #988
  • adds initial files for sphinx/rst and readthedocs. #990 #1017 #1048
  • add support for clang 10 #998
  • add popcount intrinsic #1004
  • emulate hip/cuda-Memcpy3D with a kernel #1014
  • simplify alpaka usage #1017

[0.4.0] - 2020-01-14

Compatibility Changes:

  • added support for CUDA 10.0, 10.1 and 10.2
  • dropped support for CUDA 7.0 and 7.5
  • added official support for Visual Studio 2017 on Windows with CUDA 10 (built on Travis CI instead of appveyor now)
  • added support for xcode10.2-11.3 (no official CUDA support yet)
  • added support for Ubuntu 18.04
  • added support for gcc 9
  • added support for clang 7.0, 8.0 and 9.0
  • dropped support for clang 3.5, 3.6, 3.7, 3.8 and 3.9
  • added support for CMake 3.13, 3.14, 3.15 and 3.16
  • dropped support for CMake 3.11.3 and lower, 3.11.4 is the lowest supported version
  • added support for Boost 1.69, 1.70 and 1.71
  • added support for usage of libc++ instead of libstdc++ for clang builds
  • removed dependency to Boost.MPL and BOOST_CURRENT_FUNCTION
  • replaced Boost.Test with Catch2 using an internal version of Catch2 by default but allowing to use an external one

Bug Fixes:

  • fixed some incorrect host/device function attributes
  • fixed warning about comparison unsigned < 0
  • There is no need to disable all other backends manually when using ALPAKA_ACC_GPU_CUDA_ONLY_MODE anymore
  • fixed static block shared memory of types with alignment higher than defaultAlignment
  • fixed race-condition in HIP/NVCC queue
  • fixed data races when a GPU updates host memory by always aligning host memory buffers to 4 KiB

New Features:

  • Added a new alpaka Logo!
  • the whole alpaka code has been relicensed to MPL2 and the examples to ISC
  • added ALPAKA_CXX_STANDARD CMake option which allows to select the C++ standard to be used
  • added ALPAKA_CUDA_NVCC_SEPARABLE_COMPILATION option to enable separable compilation for nvcc
  • added ALPAKA_CUDA_NVCC_EXPT_EXTENDED_LAMBDA and ALPAKA_CUDA_NVCC_EXPT_RELAXED_CONSTEXPR CMake options to enable/disable those nvcc options (they were always ON before)
  • added headers for standalone usage without CMake (alpaka/standalone/GpuCudaRt.h, ...) which set the backend defines
  • added experimental HIP back-end using nvcc (HIP >= 1.5.1 required, latest rocRand). More on HIP setup: doc/markdown/user/implementation/mapping/HIP.md
  • added sincos math function implementations
  • allowed to copy and move construct ViewPlainPtr
  • added support for CUDA atomics using "unsigned long int"
  • added compile-time error for atomic CUDA ops which are not available due to sm restrictions
  • added explicit errors for unsupported types/operations for CUDA atomics
  • replaced usages of assert with ALPAKA_ASSERT
  • replaced BOOST_VERIFY by ALPAKA_CHECK and returned success from all test kernels
  • added alpaka::ignore_unused as replacement for boost::ignore_unused

Breaking changes:

  • renamed QueueAsync to QueueNonBlocking and QueueSync to QueueBlocking
  • renamed alpaka::size::Size to alpaka::idx::Idx, alpaka::size::SizeType to alpaka::idx::IdxType (and TSize to TIdx internally)
  • replaced ALPAKA_FN_ACC_NO_CUDA by ALPAKA_FN_HOST
  • replaced ALPAKA_FN_ACC_CUDA_ONLY by direct usage of __device__
  • renamed ALPAKA_STATIC_DEV_MEM_CONSTANT to ALPAKA_STATIC_ACC_MEM_CONSTANT and ALPAKA_STATIC_DEV_MEM_GLOBAL to ALPAKA_STATIC_ACC_MEM_GLOBAL
  • renamed alpaka::kernel::createTaskExec to alpaka::kernel::createTaskKernel
  • QueueCpuSync now correctly blocks when called from multiple threads
    • This broke some previous use-cases (e.g. usage within existing OpenMP parallel regions)
    • This use case can now be handled with the support for external CPU queues as can be seen in the example QueueCpuOmp2CollectiveImpl
  • previously it was possible to have kernels return values even though they were always ignored. Now kernels are checked to always return void
  • renamed all files with *Stl suffix to *StdLib
  • renamed BOOST_ARCH_CUDA_DEVICE to BOOST_ARCH_PTX
  • executors have been renamed due to the upcoming standard C++ feature with a different meaning. All files within alpaka/exec/ have been moved to alpaka/kernel/ and the files and classes have been renamed from Exec* to TaskKernel*. This should not affect users of alpaka but will affect extensions.
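
The "kernels must return void" check described above can be sketched with standard type traits. This is an illustration only, assuming C++17 (`std::invoke_result_t`); `GoodKernel`, `BadKernel`, `returnsVoid`, and `launch` are hypothetical names and do not mirror alpaka's internal implementation.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical kernel function objects; the names are illustrative only.
struct GoodKernel
{
    void operator()(int idx, float* out) const
    {
        assert(out != nullptr);
        out[idx] = static_cast<float>(idx);
    }
};

struct BadKernel
{
    // The returned value would be silently dropped by a launcher.
    int operator()(int, float*) const
    {
        return 0;
    }
};

// True if invoking TKernel with TArgs... yields void.
template<typename TKernel, typename... TArgs>
constexpr bool returnsVoid = std::is_void<std::invoke_result_t<TKernel const&, TArgs...>>::value;

// A launch helper that rejects non-void kernels at compile time.
template<typename TKernel, typename... TArgs>
void launch(TKernel const& kernel, TArgs... args)
{
    static_assert(returnsVoid<TKernel, TArgs...>, "kernels must return void");
    kernel(args...);
}
```

Calling `launch(BadKernel{}, 0, ptr)` would fail to compile with the message from the `static_assert`, which turns a silently ignored return value into a visible error.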

[0.3.6] - 2020-01-06

Bug Fixes:

  • fix cuda stream race condition #850
  • fix: cuda exceptions #844
  • math/abs: Added trait specialisation for double. #862
  • alpaka/math Overloaded float specialization #837
  • Fixes name conflicts in alpaka math functions. #784

[0.3.5] - 2018-11-18

New Features:

  • used OpenMP atomics instead of critical sections

[0.3.4] - 2018-10-17

Compatibility Changes:

  • added support for boost-1.68.0
  • added support for CUDA 10
  • support for glibc < 2.18 (fix missing macros)
  • added checks for available OpenMP versions

Bug Fixes:

  • fixed empty(StreamCpuAsync) returning true even though the last task is still in progress
  • fixed integer overflows in case of int16_t being used as accelerator index type
  • made some throwing destructors not throwing to support clang 7
  • fixed broken alpaka::math::min for non-integral types

New Features:

  • added prepareForAsyncCopy which can be called to enable async copies for a specific buffer (if it is supported)
  • allowed to run alpaka OpenMP 2 block accelerated kernels within existing parallel region
  • added alpaka::ignore_unused which can be used in kernels
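
For readers unfamiliar with the idiom, an `ignore_unused` helper is typically a variadic function with an empty body. The sketch below is a hypothetical re-implementation (`demo::ignoreUnused`), not the code shipped in alpaka, and the `scale` function only exists to show a call site.

```cpp
#include <cassert>

namespace demo
{
    // Accepts any arguments and does nothing, which marks them as "used"
    // for the compiler's unused-variable and unused-parameter warnings.
    template<typename... TArgs>
    constexpr void ignoreUnused(TArgs const&...) noexcept
    {
    }
} // namespace demo

// `verbose` is only read inside assert, so builds with NDEBUG defined
// would warn about an unused parameter without the ignoreUnused call.
inline int scale(int value, bool verbose)
{
    demo::ignoreUnused(verbose);
    assert(!verbose || value >= 0);
    return value * 2;
}
```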

[0.3.3] - 2018-08-10

New Features:

  • added CPU random number generators based on std::random_device and TinyMT32
  • made TinyMT32 the default random number generator
  • added alpaka::ignore_unused

[0.3.2] - 2018-10-17

New Features:

  • Enhanced the compiler compatibility checks within the CMake scripts

Bugs Fixed:

  • fixed a missing error that was not triggered outside of debug mode when the runtime used a wrong OpenMP thread count
  • fixed CUDA driver API error handling
  • fixed CUDA memcpy and memset for zero sized buffers (division by zero)
  • fixed OpenMP 4 execution
  • fixed the VS2017 CUDA build (not officially supported)
  • fixed CUDA callback execution not waiting for the task to finish executing
  • fixed cudaOnly test being part of make test when cuda only mode is not enabled

Compatibility Changes:

  • added support for CUDA 9.2

[0.3.1] - 2018-06-11

New Features:

  • CMake: added option to control tests BUILD_TESTING
  • CMake: unified requirement of CMake 3.7.0+
  • CMake: used targets for Boost dependencies
  • CMake: made alpaka a pure interface library

Bugs Fixed:

  • fixed getDevCount documentation
  • fixed undefined define warnings
  • fixed self containing header check for CUDA

[0.3.0] - 2018-03-15

Bugs Fixed:

  • fixed multiple bugs where CPU streams/events could deadlock or behaved differently from the native CUDA events
  • fixed a bug where the block synchronization of the Boost.Fiber backend crashed due to uninitialized variables

New Features / Enhancements:

  • added support for stream callbacks allowing to enqueue arbitrary host code using alpaka::stream::enqueue(stream, [&](){...});
  • added support for compiling for multiple architectures using e.g. ALPAKA_CUDA_ARCH="20;35"
  • added support for using host constexpr code within device code
  • enhanced the CUDA error handling
  • enhanced the documentation for mapping CUDA to alpaka

Compatibility Changes:

  • added support for CUDA 9.0 and 9.1
  • added support for CMake 3.9 and 3.10
  • removed support for CMake 3.6 and older
  • added support for boost-1.65.0
  • removed support for boost-1.61.0 and older
  • added support for gcc 7
  • added support for clang 4 and 5
  • removed support for VS2015

[0.2.0] - 2017-06-19

Compatibility fixes and small enhancements:

  • the documentation has been greatly enhanced
  • adds support for CUDA 8.0
  • adds support for CMake versions 3.6, 3.7 and 3.8
  • adds support for Boost 1.62, 1.63 and 1.64
  • adds support for clang-3.9
  • adds support for Visual Studio 2017
  • alpaka now compiles clean even with clang -Weverything
  • re-enabled the boost::fiber accelerator backend which was disabled in the last release

API changes:

  • mapIdx is moved from namespace alpaka::core to alpaka::idx
  • Vec is moved from namespace alpaka to alpaka::vec
  • vec::Vec is now allowed to be zero-dimensional (was previously forbidden)
  • added vec::concat
  • added element-wise operator< for vec::Vec which returns a vector of bool
  • CPU accelerators now support arbitrary dimensionality (both kernel execution as well as memory operations)
  • added support for syncBlockThreadsPredicate with block::sync::op::LogicalOr, block::sync::op::LogicalAnd and block::sync::op::Count
  • memory allocations are now aligned optimally for the underlying architecture (16 bytes for SSE, 32 bytes for AVX, 64 bytes for AVX512) instead of 16 bytes for all architectures in the previous release
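
A minimal sketch of architecture-aware alignment, assuming a 64-byte boundary (the width of an AVX-512 register) and C++17 aligned `operator new`. The names `kAlignment`, `allocAligned`, `freeAligned`, and `isAligned` are hypothetical; how alpaka selects the alignment internally is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Assumed alignment in bytes; 64 matches the width of an AVX-512 register.
constexpr std::size_t kAlignment = 64;

// Returns true if ptr is a multiple of kAlignment.
inline bool isAligned(void const* ptr) noexcept
{
    return reinterpret_cast<std::uintptr_t>(ptr) % kAlignment == 0;
}

// Allocates `bytes` bytes whose address is a multiple of kAlignment.
inline void* allocAligned(std::size_t bytes)
{
    return ::operator new(bytes, std::align_val_t{kAlignment});
}

// Memory from allocAligned must be released with the matching aligned delete.
inline void freeAligned(void* ptr) noexcept
{
    assert(isAligned(ptr));
    ::operator delete(ptr, std::align_val_t{kAlignment});
}
```

A buffer obtained from `allocAligned` can then be used with aligned vector loads and stores, and is released with `freeAligned`.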