Changelog

This file may not always be up to date, in particular for unreleased commits. For a comprehensive list, use the following command:

git log --first-parent

Unreleased

Please visit our wiki Changelog for unreleased changes.

Version 1.8.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.8.0. This release brings new features such as:

  • A brand new file-based configuration for Ginkgo objects: you can now construct Ginkgo objects (solvers, preconditioners, ...) from a JSON configuration file. This simplifies interfacing with Ginkgo as well as exploring different settings for solving a problem.
  • An expanded batched feature set: a batched CSR matrix format, a batched CG solver, a batched (block-)Jacobi preconditioner, a usage example, and other features such as scaling,
  • A new Distributed Multigrid and the PGM coarsening method,
  • New CUDA and HIP kernels for Reverse Cuthill-McKee (RCM) reordering,
  • Better Ginkgo and Kokkos interaction thanks to a mapping from simple Ginkgo types to native Kokkos types,

and more!

If you face an issue, please first check our known issues page and the open issues list, and if you do not find a solution, feel free to open a new issue or ask a question using GitHub Discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.16+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2019+
    • Apple Clang: 14.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
    • HIP module: CMake 3.21+, and ROCm 4.5+
    • DPC++ module: Intel oneAPI 2023.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
    • MPI: standard version 3.1+, ideally GPU Aware, for best performance
  • Windows
    • MinGW: GCC 5.5+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 10.1+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Version support changes

  • The Ginkgo license header now uses the SPDX format. #1404
  • Ginkgo now requires oneAPI 2023.1+ for SYCL support #1396
  • Ginkgo's HIP backend now requires CMake 3.21 #1334

Interface changes

  • The gko::dim single-parameter constructor is now explicit to avoid accidental conversion from integers #1474
  • The CMake option GINKGO_BUILD_HWLOC is now set to OFF by default, and if it is set to ON, then HWLOC is required to be available #1513.

Behavior changes

  • gko::write_raw now defaults to writing sparse output unless otherwise specified #1533
  • Ginkgo now adheres to the --prefix option for cmake --install, instead of overwriting it #1534

Deprecations

  • array::get_num_elems() has been renamed to get_size() #1400
  • matrix_data::ensure_row_major_order() has been renamed to sort_row_major() #1400
  • device_matrix_data::get_num_elems() has been renamed to get_num_stored_elements() #1400
  • The CMake parameter GINKGO_COMPILER_FLAGS has been superseded by CMAKE_CXX_FLAGS, and GINKGO_CUDA_COMPILER_FLAGS has been superseded by CMAKE_CUDA_FLAGS #1535
  • The std::initializer_list overloads of matrix create methods and constructors are deprecated in favor of explicit array parameters #1433

Summary of previous deprecations

  • The device_reset parameter of CUDA and HIP executors no longer has an effect, and its allocation_mode parameters have been deprecated in favor of the Allocator interface.
  • The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL.
  • The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation.
  • The Permutation class' permute_mask functionality.
  • Multiple functions with typos (set_complex_subpsace(), range functions such as conj_operaton etc).
  • gko::lend() is not necessary anymore.
  • The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
  • The class AmgxPgm is deprecated in favor of Pgm.
  • Default constructors for the CSR load_balance and automatical strategies
  • The PolymorphicObject's move-semantic copy_from variant
  • The templated SolverBase class.
  • The class MachineTopology is deprecated in favor of machine_topology.
  • Logger constructors and create functions with the executor parameter.
  • The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
  • Logger events for solvers and criterion without the additional implicit_tau_sq parameter.
  • The global gko::solver::default_krylov_dim, use instead gko::solver::gmres_default_krylov_dim.

Added features

  • Add a batched CG solver #1598, #1609
  • Add a batched Jacobi (scalar/block) preconditioner, #1542, #1600
  • Add an example for batched iterative solver #1553
  • Add add_scaled_identity and scale_add for batch matrix formats. #1528
  • Add scaling for batch objects (matrix formats and multi-vectors). #1527
  • Add a batch::Csr matrix format class and core, and support for batched SpMV kernels on CUDA, HIP and SYCL. #1450
  • Add a script for comparing benchmark JSON outputs #1467
  • Add an example for reordered preconditioned linear solver #1465
  • Add single-value access functions load_value and store_value to array #1485
  • Add the BlockOperator format to represent block-matrices #1435
  • Add CUDA and HIP kernels for Reverse Cuthill-McKee (RCM) reordering #1503
  • Add FileConfig #1389, #1392, #1395, #1479, #1480, #1607
  • Add Distributed Multigrid #1269 and coarsening method PGM #1403
  • Add a mapping from simple Ginkgo types to native Kokkos types #1358
  • Add a segmented array class #1545
  • Add a class for mapping between global and local indexing #1543

Improvements

  • Ginkgo installation now has separate Ginkgo_Runtime and Ginkgo_Development components for easier packaging #1502
  • The HIP backend now supports complex number operations for sparse matrices based on hipSPARSE #1538
  • The create functions are now documented explicitly instead of using the EnableCreateMethod mixin #1433
  • The solver benchmark now supports Ginkgo's binary format for right-hand side vector inputs #1584
  • The build system now uses native HIP support for CMake, which also provides support for ROCm 6.0 #1334
  • The Multigrid solver generated from a distributed::Matrix now uses a global scalar Jacobi smoother and a GMRES solver as the coarse grid solver #1612

Fixes

  • Fix compilation with libc++ #1463
  • Replace __cplusplus with _MSVC_LANG for language standard detection under MSVC #1496
  • Coo::read(const T&) and Csr::read(const T&) will no longer overwrite the locally stored arrays and instead copy directly into them #1476
  • Fix the interaction of ProfilerHook::create(_nested)_summary, executors and GPU timers, which led to the summary not being printed #1509
  • Fix compilation in environments where CPATH contains the current working directory #1531
  • Fix read from matrix-market files with CR line endings #1557
  • Fix undefined behavior that shows up with libstdc++ debug builds #1176
  • Fix for CUDA 12.4 bug and METIS detection #1569
  • Fix the pkgconfig installation with DESTDIR #1597
  • Fix various issues causing build or test failures #1619

Version 1.7.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:

  • A complete feature set and interfaces for GPU-resident sparse direct solvers,
  • Improved Cholesky factorization performance,
  • A new MC64 reordering,
  • Batched iterative solver support via the BiCGSTAB solver, with batched Dense and ELL matrix types,
  • MPI support for the SYCL backend,
  • Improved ParILU(T)/ParIC(T) preconditioner convergence, and more!

If you face an issue, please first check our known issues page and the open issues list, and if you do not find a solution, feel free to open a new issue or ask a question using GitHub Discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.16+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2019+
    • Apple Clang: 14.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
    • HIP module: ROCm 4.5+
    • DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
    • MPI: standard version 3.1+, ideally GPU Aware, for best performance
  • Windows
    • MinGW: GCC 5.5+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 10.1+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Version support changes

  • CUDA 9.2 is no longer supported and 10.0 is untested #1382
  • Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) #1368

Interface changes

  • const Factory parameters can no longer be modified through with_* functions, as this breaks const-correctness #1336 #1439

New Deprecations

  • The device_reset parameter of CUDA and HIP executors no longer has an effect, and its allocation_mode parameters have been deprecated in favor of the Allocator interface. #1315
  • The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL. #1350
  • The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation. #1418
  • The Permutation class' permute_mask functionality. #1415
  • Multiple functions with typos (set_complex_subpsace(), range functions such as conj_operaton etc). #1348

Summary of previous deprecations

  • gko::lend() is not necessary anymore.
  • The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
  • The class AmgxPgm is deprecated in favor of Pgm.
  • Default constructors for the CSR load_balance and automatical strategies
  • The PolymorphicObject's move-semantic copy_from variant
  • The templated SolverBase class.
  • The class MachineTopology is deprecated in favor of machine_topology.
  • Logger constructors and create functions with the executor parameter.
  • The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
  • Logger events for solvers and criterion without the additional implicit_tau_sq parameter.
  • The global gko::solver::default_krylov_dim, use instead gko::solver::gmres_default_krylov_dim.

Added features

  • Add a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solvers and preconditioners #1379
  • Add a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors #1371
  • Add a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. #1413
  • Add a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. #1416 #1437
  • Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems #1438.
  • Add device kernels (CUDA, HIP, and DPCPP) for the batch::Bicgstab solver. #1443.
  • New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibration #1120
  • New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices #1415
  • LU and Cholesky Factorizations can now be separated into their factors #1432
  • New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern #1445
  • Sorting kernels for SparsityCsr on all backends #1343
  • Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner #1426
  • Add DPCPP kernels for Partition #1034, and CSR's check_diagonal_entries and add_scaled_identity functionality #1436
  • Add a helper function to create a partition based on either local sizes or local ranges #1227
  • Add function to compute arithmetic mean of dense and distributed vectors #1275
  • Add icpx compiler support #1350
  • All backends can be built simultaneously #1333
  • Emit a CMake warning in downstream projects that use different compilers than the installed Ginkgo #1372
  • Reordering algorithms in sparse_blas benchmark #1354
  • Benchmarks gained an -allocator parameter to specify device allocators #1385
  • Benchmarks gained an -input_matrix parameter that initializes the input JSON based on the filename #1387
  • Benchmark inputs can now be reordered as a preprocessing step #1408

Improvements

  • Significantly improve Cholesky factorization performance #1366
  • Improve parallel build performance #1378
  • Allow constrained parallel test execution using CTest resources #1373
  • Use arithmetic type more inside mixed precision ELL #1414
  • Most factory parameters of factory type no longer need to be constructed explicitly via .on(exec) #1336 #1439
  • Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations #1434

Fixes

  • Fix an over-allocation for OpenMP reductions #1369
  • Fix DPCPP's common-kernel reduction for empty input sizes #1362
  • Fix several typos in the API and documentation #1348
  • Fix inconsistent Threads between generations #1388
  • Fix benchmark median condition #1398
  • Fix HIP 5.6.0 compilation #1411
  • Fix missing destruction of rand_generator from cuda/hip #1417
  • Fix PAPI logger destruction order #1419
  • Fix TAU logger compilation #1422
  • Fix relative criterion to not iterate if the residual is already zero #1079
  • Fix memory_order invocations with C++20 changes #1402
  • Fix check_diagonal_entries_exist to report correctly when diagonal values are missing only in the last rows #1440
  • Fix checking OpenMPI version in cross-compilation settings #1446
  • Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) #1444

Version 1.6.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.6.0. This release brings new features such as:

  • Several building blocks for GPU-resident sparse direct solvers like symbolic and numerical LU and Cholesky factorization, ...,
  • A distributed Schwarz preconditioner,
  • New FGMRES and GCR solvers,
  • Distributed benchmarks for the SpMV operation, solvers, ...
  • Support for non-default streams in the CUDA and HIP backends,
  • Mixed precision support for the CSR SpMV,
  • A new profiling logger which integrates with NVTX, ROCTX, TAU and VTune to provide internal Ginkgo knowledge to most HPC profilers!

and much more.

If you face an issue, please first check our known issues page and the open issues list, and if you do not find a solution, feel free to open a new issue or ask a question using GitHub Discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.13+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2018+
    • Apple Clang: 14.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CUDA 9.2+ or NVHPC 22.7+
    • HIP module: ROCm 4.5+
    • DPC++ module: Intel OneAPI 2021.3+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp.
  • Windows
    • MinGW: GCC 5.5+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 9.2+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Version Support Changes

  • The HIP module now requires ROCm 4.5+ instead of 4.0+ #1303
  • Removed Cygwin pipeline and support #1283

Interface Changes

  • Due to internal changes, ConcreteExecutor::run will now always throw if the corresponding module for the ConcreteExecutor is not built #1234
  • The constructor of experimental::distributed::Vector was changed to only accept local vectors as std::unique_ptr #1284
  • The default parameters for the solver::MultiGrid were improved. In particular, the smoother defaults to one iteration of Ir with Jacobi preconditioner, and the coarse grid solver uses the new direct solver with LU factorization. #1291 #1327
  • The iteration_complete event gained a more expressive overload with additional parameters, the old overloads were deprecated. #1288 #1327

Deprecations

  • Deprecated the less expressive iteration_complete event. Users are advised to now implement the function void iteration_complete(const LinOp* solver, const LinOp* b, const LinOp* x, const size_type& it, const LinOp* r, const LinOp* tau, const LinOp* implicit_tau_sq, const array<stopping_status>* status, bool stopped) #1288
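
For users writing their own loggers, the richer overload can be picked up roughly as in the sketch below. This is an illustration only: the on_ prefix on the hook name and the const qualifier follow Ginkgo's usual Logger conventions and are assumptions here; the parameter list is the one quoted above.

#include <ginkgo/ginkgo.hpp>
#include <iostream>

// Minimal sketch of a logger using the more expressive event
// (assumption: the virtual hook is named on_iteration_complete).
class IterationPrinter : public gko::log::Logger {
public:
    void on_iteration_complete(
        const gko::LinOp* solver, const gko::LinOp* b, const gko::LinOp* x,
        const gko::size_type& it, const gko::LinOp* r, const gko::LinOp* tau,
        const gko::LinOp* implicit_tau_sq,
        const gko::array<gko::stopping_status>* status,
        bool stopped) const override
    {
        // Only the iteration counter and the stop flag are used in this sketch.
        std::cout << "iteration " << it << (stopped ? " (stopped)" : "") << '\n';
    }
};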

Added Features

  • A distributed Schwarz preconditioner. #1248
  • A GCR solver #1239
  • Flexible Gmres solver #1244
  • Enable Gmres solver for distributed matrices and vectors #1201
  • An example that uses Kokkos to assemble the system matrix #1216
  • A symbolic LU factorization allowing the gko::experimental::factorization::Lu and gko::experimental::solver::Direct classes to be used for matrices with non-symmetric sparsity pattern #1210
  • A numerical Cholesky factorization #1215
  • Symbolic factorizations in host-side operations are now wrapped in a host-side Operation to make their execution visible to loggers. This means that profiling loggers and benchmarks are no longer missing a separate entry for their runtime #1232
  • Symbolic factorization benchmark #1302
  • The ProfilerHook logger allows annotating the Ginkgo execution (apply, operations, ...) for profiling frameworks like NVTX, ROCTX and TAU. #1055
  • ProfilerHook::create_(nested_)summary allows the generation of a lightweight runtime profile over all Ginkgo functions written to a user-defined stream #1270 for both host and device timing functionality #1313
  • It is now possible to enable host buffers for MPI communications at runtime even if the compile option GINKGO_FORCE_GPU_AWARE_MPI is set. #1228
  • A stencil matrix generator (5-pt, 7-pt, 9-pt, and 27-pt) for benchmarks #1204
  • Distributed benchmarks (multi-vector blas, SpMV, solver) #1204
  • Benchmarks for CSR sorting and lookup #1219
  • A timer for MPI benchmarks that reports the longest time #1217
  • A timer_method=min|max|average|median flag for benchmark timing summary #1294
  • Support for non-default streams in CUDA and HIP executors #1236
  • METIS integration for nested dissection reordering #1296
  • SuiteSparse AMD integration for fill-in reducing reordering #1328
  • Csr mixed-precision SpMV support #1319
  • A with_loggers function for all Factory parameters #1337

Improvements

  • Improve naming of kernel operations for loggers #1277
  • Annotate solver iterations in ProfilerHook #1290
  • Allow using the profiler hooks and inline input strings in benchmarks #1342
  • Allow passing smart pointers in place of raw pointers to most matrix functions. This means that things like vec->compute_norm2(x.get()) or vec->compute_norm2(lend(x)) can be simplified to vec->compute_norm2(x), as shown in the sketch after this list #1279 #1261
  • Catch overflows in prefix sum operations, which makes Ginkgo's operations much less likely to crash. This also improves the performance of the prefix sum kernel #1303
  • Make the installed GinkgoConfig.cmake file relocatable and follow more best practices #1325
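
As a small illustration of the smart-pointer change mentioned above, here is a sketch on the reference executor; the vector values are arbitrary placeholders:

#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    // A 2x1 vector and a 1x1 result holding its norm
    auto vec = gko::initialize<gko::matrix::Dense<>>({3.0, 4.0}, exec);
    auto norm = gko::initialize<gko::matrix::Dense<>>({0.0}, exec);
    // Previously: vec->compute_norm2(norm.get()) or vec->compute_norm2(gko::lend(norm));
    // the smart pointer can now be passed directly:
    vec->compute_norm2(norm);
}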

Fixes

  • Fix OpenMPI version check #1200
  • Fix the MPI C++ type bindings by using the corresponding C bindings #1306
  • Fix runtime failures for one-sided MPI wrapper functions observed on some OpenMPI versions #1249
  • Disable thread pinning with GPU executors due to poor performance #1230
  • Fix hwloc version detection #1266
  • Fix PAPI detection in non-implicit include directories #1268
  • Fix PAPI support for newer PAPI versions #1321
  • Fix pkg-config file generation for library paths outside prefix #1271
  • Fix various build failures with ROCm 5.4, CUDA 12 and OneAPI 6 #1214, #1235, #1251
  • Fix incorrect read for skew-symmetric MatrixMarket files with explicit diagonal entries #1272
  • Fix handling of missing diagonal entries in symbolic factorizations #1263
  • Fix segmentation fault in benchmark matrix construction #1299
  • Fix the stencil matrix creation for benchmarking #1305
  • Fix the additional residual check in IR #1307
  • Fix the cuSPARSE CSR SpMM issue on a single strided vector when CUDA >= 11.6 #1322 #1331
  • Fix Isai generation for large sparsity powers #1327
  • Fix Ginkgo compilation and test with NVHPC >= 22.7 #1331
  • Fix Ginkgo compilation of 32 bit binaries with MSVC #1349

Version 1.5.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:

  • MPI-based multi-node support for all matrix formats and most solvers,
  • full DPC++/SYCL support,
  • functionality and interface for GPU-resident sparse direct solvers,
  • an interface for wrapping solvers with scaling and reordering applied,
  • a new algebraic Multigrid solver/preconditioner,
  • improved mixed-precision support,
  • support for device matrix assembly,

and much more.

If you face an issue, please first check our known issues page and the open issues list, and if you do not find a solution, feel free to open a new issue or ask a question using GitHub Discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.13+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2018+
    • Apple LLVM: 8.0+
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CUDA 9.2+ or NVHPC 22.7+
    • HIP module: ROCm 4.0+
    • DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to dpcpp.
  • Windows
    • MinGW and Cygwin: GCC 5.5+
    • Microsoft Visual Studio: VS 2019
    • CUDA module: CUDA 9.2+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions

  • Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). (#676, #908, #909, #932, #951, #961, #971, #976, #985, #1007, #1030, #1054, #1100, #1148)
  • Port the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance (#896, #924, #928, #929, #933, #943, #960, #1057, #1110, #1142)
  • Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more (#957, #1058, #1072, #1082)
  • Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings (#1059)
  • Add a Multigrid solver and improve the aggregation-based PGM coarsening scheme (#542, #913, #980, #982, #986)
  • Add infrastructure for unified, lambda-based, backend-agnostic kernels and utilize it for some simple kernels (#833, #910, #926)
  • Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface (#904, #973, #1044, #1117)
  • Add a device_matrix_data type for device-side matrix assembly (#886, #963, #965)
  • Add support for mixed real/complex BLAS operations (#864)
  • Add a FFT LinOp for all but DPC++/SYCL (#701)
  • Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP (#775)
  • Add CSR scaling (#848)
  • Add array::const_view and equivalent to create constant matrices from non-const data (#890)
  • Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows (#901)
  • Add mixed precision SparsityCsr SpMV support (#970)
  • Allow creating CSR submatrix including from (possibly discontinuous) index sets (#885, #964)
  • Add a scaled identity addition (M <- aI + bM) feature interface and implementations for Csr and Dense (#942)

Deprecations and important changes

  • Deprecate AmgxPgm in favor of the new Pgm name (#1149).
  • Deprecate specialized residual norm classes in favor of a common ResidualNorm class (#1101)
  • Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) (#1031, #1052)
  • Bug fix: restrict gko::share to rvalue references (possible interface break) (#1020)
  • Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter num_rhs is now required when solving for more than one right-hand side, otherwise an exception is thrown (#1184).
  • Drop official support for old CUDA < 9.2 (#887)

Improved performance additions

  • Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers (#1013, #1028)
  • Add HIP unsafe atomic option for AMD (#1091)
  • Prefer vendor implementations for Dense dot, conj_dot and norm2 when available (#967).
  • Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS (#809)

Fixes

  • Fix various compilation warnings (#1076, #1183, #1189)
  • Fix issues with hwloc-related tests (#1074)
  • Fix include headers for GCC 12 (#1071)
  • Fix for simple-solver-logging example (#1066)
  • Fix for potential memory leak in Logger (#1056)
  • Fix logging of mixin classes (#1037)
  • Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones (#753)
  • Fix some matrix SpMV and conversion corner cases (#905, #978)
  • Fix uninitialized data (#958)
  • Fix CUDA version requirement for cusparseSpSM (#953)
  • Fix several issues within bash-script (#1016)
  • Fixes for NVHPC compiler support (#1194)

Other additions

  • Simplify and properly name GMRES kernels (#861)
  • Improve pkg-config support for non-CMake libraries (#923, #1109)
  • Improve gdb pretty printer (#987, #1114)
  • Add a logger highlighting inefficient allocation and copy patterns (#1035)
  • Improved and optimized test random matrix generation (#954, #1032)
  • Better CSR strategy defaults (#969)
  • Add move_from to PolymorphicObject (#997)
  • Remove unnecessary device_guard usage (#956)
  • Improvements to the generic accessor for mixed-precision (#727)
  • Add a naive lower triangular solver implementation for CUDA (#764)
  • Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM (#897)
  • Add an L1 norm implementation (#900)
  • Add reduce_add for arrays (#831)
  • Add utility to simplify Dense View creation from an existing Dense vector (#1136).
  • Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types (#1123)
  • Make IDR random initialization deterministic (#1116)
  • Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter (#1088)
  • Update CUDA archCoresPerSM (#1175)
  • Add kernels for Csr sparsity pattern lookup (#994)
  • Differentiate between structural and numerical zeros in Ell/Sellp (#1027)
  • Add a binary IO format for matrix data (#984)
  • Add a tuple zip_iterator implementation (#966)
  • Simplify kernel stubs and declarations (#888)
  • Simplify GKO_REGISTER_OPERATION with lambdas (#859)
  • Simplify copy to device in tests and examples (#863)
  • More verbose output to array assertions (#858)
  • Allow parallel compilation for Jacobi kernels (#871)
  • Change clang-format pointer alignment to left (#872)
  • Various improvements and fixes to the benchmarking framework (#750, #759, #870, #911, #1033, #1137)
  • Various documentation improvements (#892, #921, #950, #977, #1021, #1068, #1069, #1080, #1081, #1108, #1153, #1154)
  • Various CI improvements (#868, #874, #884, #889, #899, #903, #922, #925, #930, #936, #937, #958, #882, #1011, #1015, #989, #1039, #1042, #1067, #1073, #1075, #1083, #1084, #1085, #1139, #1178, #1187)

Version 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem which enables Intel-GPU and CPU execution. The only Ginkgo features which have not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:

  1. The new Accessor concept, which allows writing kernels featuring on-the-fly memory compression, among other features. The accessor can be used as a header-only library; see the accessor BLAS benchmarks repository for a usage example.
  2. All LinOps now transparently support mixed-precision execution. By default, this is done through a temporary copy which may have a performance impact but already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which avoid this cost. The accessor is also leveraged in a new CB-GMRES solver which allows for performance improvements by compressing the Krylov basis vectors. Many other features have been added to Ginkgo, such as reordering support, a new IDR solver, an Incomplete Cholesky preconditioner, matrix assembly support (only on CPUs for now), machine topology information, and more!

Supported systems and requirements:

  • For all platforms, cmake 3.13+
  • C++14 compliant compiler
  • Linux and MacOS
    • gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • clang: 3.9+
    • Intel compiler: 2018+
    • Apple LLVM: 8.0+
    • CUDA module: CUDA 9.0+
    • HIP module: ROCm 4.0+
    • DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to dpcpp.
  • Windows
    • MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • Microsoft Visual Studio: VS 2019
    • CUDA module: CUDA 9.0+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions

  • Add a new DPC++ Executor for SYCL execution and other base utilities #648, #661, #757, #832
  • Port matrix formats, solvers and related kernels to DPC++. For some kernels, also make use of a shared kernel implementation for all executors (except Reference). #710, #799, #779, #733, #844, #843, #789, #845, #849, #855, #856
  • Add accessors which allow multi-precision kernels, among other things. #643, #708
  • Add support for mixed precision operations through apply in all LinOps. #677
  • Add incomplete Cholesky factorizations and preconditioners as well as some improvements to ILU. #672, #837, #846
  • Add an AMGX implementation and kernels on all devices but DPC++. #528, #695, #860
  • Add a new mixed-precision capability solver, Compressed Basis GMRES (CB-GMRES). #693, #763
  • Add the IDR(s) solver. #620
  • Add a new fixed-size block CSR matrix format (for the Reference executor). #671, #730
  • Add native mixed-precision support to the ELL format. #717, #780
  • Add Reverse Cuthill-McKee reordering #500, #649
  • Add matrix assembly support on CPUs. #644
  • Extend ISAI from triangular to general and SPD matrices. #690

Other additions

  • Add possibility to apply real matrices to complex vectors. #655, #658
  • Add functions to compute the absolute of a matrix format. #636
  • Add symmetric permutation and improve existing permutations. #684, #657, #663
  • Add a MachineTopology class with HWLOC support #554, #697
  • Add an implicit residual norm criterion. #702, #818, #850
  • Row-major accessor is generalized to more than 2 dimensions and a new "block column-major" accessor has been added. #707
  • Add a heat equation example. #698, #706
  • Add ccache support in CMake and CI. #725, #739
  • Allow tuning and benchmarking variables non-intrusively. #692
  • Add triangular solver benchmark #664
  • Add benchmarks for BLAS operations #772, #829
  • Add support for different precisions and consistent index types in benchmarks. #675, #828
  • Add a GitHub bot system to facilitate development and PR management. #667, #674, #689, #853
  • Add Intel (DPC++) CI support and enable CI on HPC systems. #736, #751, #781
  • Add SSH debugging for GitHub Actions CI. #749
  • Add pipeline segmentation for better CI speed. #737

Changes

  • Add a Scalar Jacobi specialization and kernels. #808, #834, #854
  • Add implicit residual log for solvers and benchmarks. #714
  • Change handling of the conjugate in the dense dot product. #755
  • Improved Dense stride handling. #774
  • Multiple improvements to the OpenMP kernels performance, including COO, an exclusive prefix sum, and more. #703, #765, #740
  • Allow specialization of submatrix and other dense creation functions in solvers. #718
  • Improved Identity constructor and treatment of rectangular matrices. #646
  • Allow CUDA/HIP executors to select allocation mode. #758
  • Check if executors share the same memory. #670
  • Improve test install and smoke testing support. #721
  • Update the JOSS paper citation and add publications in the documentation. #629, #724
  • Improve the version output. #806
  • Add some utilities for dim and span. #821
  • Improved solver and preconditioner benchmarks. #660
  • Improve benchmark timing and output. #669, #791, #801, #812

Fixes

  • Sorting fix for the Jacobi preconditioner. #659
  • Also log the first residual norm in CGS #735
  • Fix BiCG and HIP CSR to work with complex matrices. #651
  • Fix Coo SpMV on strided vectors. #807
  • Fix segfault of extract_diagonal, add short-and-fat test. #769
  • Fix device_reset issue by moving counter/mutex to device. #810
  • Fix EnableLogging superclass. #841
  • Support ROCm 4.1.x and breaking HIP_PLATFORM changes. #726
  • Decreased test size for a few device tests. #742
  • Fix multiple issues with our CMake HIP and RPATH setup. #712, #745, #709
  • Cleanup our CMake installation step. #713
  • Various simplification and fixes to the Windows CMake setup. #720, #785
  • Simplify third-party integration. #786
  • Improve Ginkgo device arch flags management. #696
  • Other fixes and improvements to the CMake setup. #685, #792, #705, #836
  • Clarification of dense norm documentation #784
  • Various development tools fixes and improvements #738, #830, #840
  • Make multiple operators/constructors explicit. #650, #761
  • Fix some issues, memory leaks and warnings found by MSVC. #666, #731
  • Improved solver memory estimates and consistent iteration counts #691
  • Various logger improvements and fixes #728, #743, #754
  • Fix for ForwardIterator requirements in iterator_factory. #665
  • Various benchmark fixes. #647, #673, #722
  • Various CI fixes and improvements. #642, #641, #795, #783, #793, #852

Version 1.3.0

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.3.0. This release brings CUDA 11 support, changes the default C++ standard to be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for diagonal extraction, significantly improves the CMake configuration output format, adds the Ginkgo paper which got accepted into the Journal of Open Source Software (JOSS), and fixes multiple issues.

Supported systems and requirements:

  • For all platforms, cmake 3.9+
  • Linux and MacOS
    • gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • clang: 3.9+
    • Intel compiler: 2017+
    • Apple LLVM: 8.0+
    • CUDA module: CUDA 9.0+
    • HIP module: ROCm 2.8+
  • Windows
    • MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • Microsoft Visual Studio: VS 2017 15.7+
    • CUDA module: CUDA 9.0+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

The current known issues can be found on the known issues page.

Additions

  • Add paper for Journal of Open Source Software (JOSS). #479
  • Add a DiagonalExtractable interface. #563
  • Add a new Diagonal matrix format. #580
  • Add CUDA 11 support. #603
  • Add information output after CMake configuration. #610
  • Add a new preconditioner export example. #595
  • Add a new cuda-memcheck CI job. #592

Changes

  • Use unified memory in CUDA debug builds. #621
  • Improve BENCHMARKING.md with more detailed info. #619
  • Use C++14 standard instead of C++11. #611
  • Update the Ampere sm information and CudaArchitectureSelector. #588

Fixes

  • Fix documentation warnings and errors. #624
  • Fix warnings for diagonal matrix format. #622
  • Fix criterion factory parameters in CUDA. #586
  • Fix the norm-type in the examples. #612
  • Fix the WAW race in OpenMP is_sorted_by_column_index. #617
  • Fix the example's exec_map by creating the executor only if requested. #602
  • Fix some CMake warnings. #614
  • Fix Windows building documentation. #601
  • Warn when CXX and CUDA host compiler do not match. #607
  • Fix reduce_add, prefix_sum, and doc-build. #593
  • Fix find_library(cublas) issue on machines with multiple CUDA installations. #591
  • Fix allocator in sellp read. #589
  • Fix the CAS with HIP and NVIDIA backends. #585

Deletions

  • Remove unused preconditioner parameter in LowerTrs. #587

Version 1.2.0

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.2.0. This release brings full HIP support to Ginkgo, new preconditioners (ParILUT, ISAI), conversion between double and float for all LinOps, and many more features and fixes.

Supported systems and requirements:

  • For all platforms, cmake 3.9+
  • Linux and MacOS
    • gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • clang: 3.9+
    • Intel compiler: 2017+
    • Apple LLVM: 8.0+
    • CUDA module: CUDA 9.0+
    • HIP module: ROCm 2.8+
  • Windows
    • MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • Microsoft Visual Studio: VS 2017 15.7+
    • CUDA module: CUDA 9.0+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

The current known issues can be found on the known issues page.

Additions

Here are the main additions to the Ginkgo library. Other thematic additions are listed below.

  • Add full HIP support to Ginkgo #344, #357, #384, #373, #391, #396, #395, #393, #404, #439, #443, #567
  • Add a new ISAI preconditioner #489, #502, #512, #508, #520
  • Add support for ParILUT and ParICT factorization with ILU preconditioners #400
  • Add a new BiCG solver #438
  • Add a new permutation matrix format #352, #469
  • Add CSR SpGEMM support #386, #398, #418, #457
  • Add CSR SpGEAM support #556
  • Make all solvers and preconditioners transposable #535
  • Add CsrBuilder and CooBuilder for intrusive access to matrix arrays #437
  • Add a standard-compliant allocator based on the Executors #504
  • Support conversions for all LinOp between double and float #521
  • Add a new boolean to the CUDA and HIP executors to control DeviceReset (default off) #557
  • Add a relaxation factor to IR to represent Richardson Relaxation #574
  • Add two new stopping criteria, for relative (to norm(b)) and absolute residual norm #577

Example additions

  • Templatize all examples to simplify changing the precision #513
  • Add a new adaptive precision block-Jacobi example #507
  • Add a new IR example #522
  • Add a new Mixed Precision Iterative Refinement example #525
  • Add a new example on iterative trisolves in ILU preconditioning #526, #536, #550

Compilation and library changes

  • Auto-detect compilation settings based on environment #435, #537
  • Add SONAME to shared libraries #524
  • Add clang-cuda support #543

Other additions

  • Add sorting, searching and merging kernels for GPUs #403, #428, #417, #455
  • Add gko::as support for smart pointers #493
  • Add setters and getters for criterion factories #527
  • Add a new method to check whether a solver uses x as an initial guess #531
  • Add contribution guidelines #549

Fixes

Algorithms

  • Improve the classical CSR strategy's performance #401
  • Improve the CSR automatical strategy #407, #559
  • Memory, speed improvements to the ELL kernel #411
  • Multiple improvements and fixes to ParILU #419, #427, #429, #456, #544
  • Fix multiple issues with GMRES #481, #523, #575
  • Optimize OpenMP matrix conversions #505
  • Ensure the linearity of the ILU preconditioner #506
  • Fix IR's use of the advanced apply #522
  • Fix empty matrices conversions and add tests #560

Other core functionalities

  • Fix complex number support in our math header #410
  • Fix CUDA compatibility of the main ginkgo header #450
  • Fix isfinite issues #465
  • Fix the Array::view memory leak and the array/view copy/move #485
  • Fix typos preventing use of some interface functions #496
  • Fix gko::dim to abide by the C++ standard #498
  • Simplify the executor copy interface #516
  • Optimize intermediate storage for Composition #540
  • Provide an initial guess for relevant Compositions #561
  • Better management of nullptr as criterion #562
  • Fix the norm calculations for complex support #564

CUDA and HIP specific

  • Use the return value of the atomic operations in our wrappers #405
  • Improve the portability of warp lane masks #422
  • Extract thread ID computation into a separate function #464
  • Reorder kernel parameters for consistency #474
  • Fix the use of pragma unroll in HIP #492

Other

  • Fix the Ginkgo CMake installation files #414, #553
  • Fix the Windows compilation #415
  • Always use demangled types in error messages #434, #486
  • Add CUDA header dependency to appropriate tests #452
  • Fix several sonarqube or compilation warnings #453, #463, #532, #569
  • Add shuffle tests #460
  • Fix MSVC C2398 error #490
  • Fix missing interface tests in test install #558

Tools and ecosystem

Benchmarks

  • Add better norm support in the benchmarks #377
  • Add CUDA 10.1 generic SpMV support in benchmarks #468, #473
  • Add sparse library ILU in benchmarks #487
  • Add overhead benchmarking capacities #501
  • Allow benchmarking from a matrix list file #503
  • Fix benchmarking issue with JSON and non-finite numbers #514
  • Fix benchmark logger crashes with OpenMP #565

CI related

  • Improvements to the CI setup with HIP compilation #421, #466
  • Add MacOSX CI support #470, #488
  • Add Windows CI support #471, #488, #510, #566
  • Use sanitizers instead of valgrind #476
  • Add automatic container generation and update facilities #499
  • Fix the CI parallelism settings #517, #538, #539
  • Make the codecov patch check informational #519
  • Add support for LLVM sanitizers with improved thread sanitizer support #578

Test suite

  • Add an assertion for sparsity pattern equality #416
  • Add core and reference multiprecision tests support #448
  • Speed up GPU tests by avoiding device reset #467
  • Change test matrix location string #494

Other

  • Add Ginkgo badges from our tools #413
  • Update the create_new_algorithm.sh script #420
  • Bump copyright and improve license management #436, #433
  • Set clang-format minimum requirement #441, #484
  • Update git-cmake-format #446, #484
  • Disable the development tools by default #442
  • Add a script for automatic header formatting #447
  • Add GDB pretty printer for gko::Array #509
  • Improve compilation speed #533
  • Add editorconfig support #546
  • Add a compile-time check for header self-sufficiency #552

Version 1.1.1

This version of Ginkgo provides a few fixes in Ginkgo's core routines. The supported systems and requirements are unchanged from version 1.1.0.

Fixes

  • Improve Ginkgo's installation and fix the test_install step (#406),
  • Fix some documentation issues (#406),
  • Fix multiple code issues reported by sonarqube (#406),
  • Update the git-cmake-format repository (#399),
  • Improve the global update header script (#390),
  • Fix broken bounds checks (#388),
  • Fix CSR strategies and improve performance (#379),
  • Fix a small typo in the stencil examples (#381),
  • Fix ELL error on small matrices (#375),
  • Fix SellP read function (#374),
  • Add factorization support in create_new_algorithm.sh (#371)

Version 1.1.0

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.1.0. This release brings several performance improvements, adds Windows support, adds support for factorizations inside Ginkgo and a new ILU preconditioner based on the ParILU algorithm, among other things. For detailed information, check the respective issue.

Supported systems and requirements:

  • For all platforms, cmake 3.9+
  • Linux and MacOS
    • gcc: 5.3+, 6.3+, 7.3+, 8.1+
    • clang: 3.9+
    • Intel compiler: 2017+
    • Apple LLVM: 8.0+
    • CUDA module: CUDA 9.0+
  • Windows
    • MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, 8.1+
    • Microsoft Visual Studio: VS 2017 15.7+
    • CUDA module: CUDA 9.0+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

The current known issues can be found on the known issues page.

Additions

  • Upper and lower triangular solvers (#327, #336, #341, #342)
  • New factorization support in Ginkgo, and addition of the ParILU algorithm (#305, #315, #319, #324)
  • New ILU preconditioner (#348, #353)
  • Windows MinGW and Cygwin support (#347)
  • Windows Visual Studio support (#351)
  • New example showing how to use ParILU as a preconditioner (#358)
  • New example on using loggers for debugging (#360)
  • Add two new 9pt and 27pt stencil examples (#300, #306)
  • Allow benchmarking CuSPARSE spmv formats through Ginkgo's benchmarks (#303)
  • New benchmark for sparse matrix format conversions (#312, #317)
  • Add conversions between CSR and Hybrid formats (#302, #310)
  • Support for sorting rows in the CSR format by column indices (#322)
  • Addition of a CUDA COO SpMM kernel for improved performance (#345)
  • Addition of a LinOp to handle perturbations of the form (identity + scalar * basis * projector) (#334)
  • New sparsity matrix representation format with Reference and OpenMP kernels (#349, #350)

Fixes

  • Accelerate GMRES solver for CUDA executor (#363)
  • Fix BiCGSTAB solver convergence (#359)
  • Fix CGS logging by reporting the residual for every sub iteration (#328)
  • Fix CSR,Dense->Sellp conversion's memory access violation (#295)
  • Accelerate CSR->Ell,Hybrid conversions on CUDA (#313, #318)
  • Fixed slowdown of COO SpMV on OpenMP (#340)
  • Fix gcc 6.4.0 internal compiler error (#316)
  • Fix compilation issue on Apple clang++ 10 (#322)
  • Make Ginkgo able to compile on Intel 2017 and above (#337)
  • Make the benchmarks spmv/solver use the same matrix formats (#366)
  • Fix self-written isfinite function (#348)
  • Fix Jacobi issues shown by cuda-memcheck

Tools and ecosystem improvements

  • Multiple improvements to the CI system and tools (#296, #311, #365)
  • Multiple improvements to the Ginkgo containers (#328, #361)
  • Add sonarqube analysis to Ginkgo (#304, #308, #309)
  • Add clang-tidy and iwyu support to Ginkgo (#298)
  • Improve Ginkgo's support of xSDK M12 policy by adding the TPL_ arguments to CMake (#300)
  • Add support for the xSDK R7 policy (#325)
  • Fix examples in html documentation (#367)

Version 1.0.0

The Ginkgo team is proud to announce the first release of Ginkgo, the next-generation high-performance on-node sparse linear algebra library. Ginkgo leverages the features of modern C++ to give you a tool for the iterative solution of linear systems that is:

  • Easy to use. Interfaces with cryptic naming schemes and dozens of parameters are a thing of the past. Ginkgo was built with good software design in mind, making simple things simple to express.
  • High performance. Our optimized CUDA kernels ensure you are reaching the potential of today's GPU-accelerated high-end systems, while Ginkgo's open design allows extension to future hardware architectures.
  • Controllable. While Ginkgo can automatically move your data when needed, you remain in control by optionally specifying when the data is moved and what its ownership scheme is.
  • Composable. Iterative solution of linear systems is an extremely versatile field, where effective methods are built by mixing and matching various components. Need a GMRES solver preconditioned with a block-Jacobi enhanced BiCGSTAB? Thanks to its novel linear operator abstraction, Ginkgo can do it!
  • Extensible. Did not find a component you were looking for? Ginkgo is designed to be easily extended in various ways. You can provide your own loggers, stopping criteria, matrix formats, preconditioners and solvers to Ginkgo and have them integrate as well as the natively supported ones, without the need to modify or recompile the library.

Ease of Use

Ginkgo uses high level abstractions to develop an efficient and understandable vocabulary for high-performance iterative solution of linear systems. As a result, the solution of a system stored in matrix market format via a preconditioned Krylov solver on an accelerator is only 20 lines of code away:

#include <ginkgo/ginkgo.hpp>
#include <iostream>

int main()
{
    // Instantiate a CUDA executor
    auto gpu = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
    // Read data
    auto A = gko::read<gko::matrix::Csr<>>(std::cin, gpu);
    auto b = gko::read<gko::matrix::Dense<>>(std::cin, gpu);
    auto x = gko::read<gko::matrix::Dense<>>(std::cin, gpu);
    // Create the solver
    auto solver =
        gko::solver::Cg<>::build()
            .with_preconditioner(gko::preconditioner::Jacobi<>::build().on(gpu))
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(1000u).on(gpu),
                gko::stop::ResidualNormReduction<>::build()
                    .with_reduction_factor(1e-15)
                    .on(gpu))
            .on(gpu);
    // Solve system
    solver->generate(give(A))->apply(lend(b), lend(x));
    // Write result
    write(std::cout, lend(x));
}

Notice that Ginkgo is not a tool that generates C++. It is C++. So just install the library (which is extremely simple due to its CMake-based build system), include the header and start using Ginkgo in your projects.

Already have an existing application and want to use Ginkgo to implement some part of it? Check out our integration example for a demonstration on how Ginkgo can be used with raw data already available in the application. If your data is in one of the formats supported by Ginkgo, it may be possible to use it directly, without creating a Ginkgo-dedicated copy of it.

Designed for HPC

Ginkgo is designed to quickly adapt to rapid changes in the HPC architecture. Every component in Ginkgo is built around the executor abstraction which is used to describe the execution and memory spaces where the operations are run, and the programming model used to realize the operations. The low-level performance critical kernels are implemented directly using each executor's programming model, while the high-level operations use a unified implementation that calls the low-level kernels. Consequently, the cost of developing new algorithms and extending existing ones to new architectures is kept relatively low, without compromising performance. Currently, Ginkgo supports CUDA, reference and OpenMP executors.

The CUDA executor features highly-optimized kernels able to efficiently utilize NVIDIA's latest hardware. Several of these kernels appeared in recent scientific publications, including the optimized COO and CSR SpMV, and the block-Jacobi preconditioner with its adaptive precision version.

The reference executor can be used to verify the correctness of the code. It features a straightforward single threaded C++ implementation of the kernels which is easy to understand. As such, it can be used as a baseline for implementing other executors, verifying their correctness, or figuring out if unexpected behavior is the result of a faulty kernel or an error in the user's code.

Ginkgo 1.0.0 also offers initial support for the OpenMP executor. OpenMP kernels are currently implemented as minor modifications of the reference kernels with OpenMP pragmas and are considered experimental. Full OpenMP support with highly-optimized kernels is reserved for a future release.

Memory Management

As a result of its executor-based design and high level abstractions, Ginkgo has explicit information about the location of every piece of data it needs and can automatically allocate, free and move the data where it is needed. However, lazily moving data around is often not optimal, and determining when a piece of data should be copied or shared in general cannot be done automatically. For this reason, Ginkgo also gives explicit control of sharing and moving its objects to the user via the dedicated ownership commands: gko::clone, gko::share, gko::give and gko::lend. If you are interested in a detailed description of the problems the C++ standard has with these concepts, check out this Ginkgo Wiki page, and for more details about Ginkgo's solution to the problem and the description of ownership commands, take a look at this issue.
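
A minimal sketch of how these commands fit together (the operator and its size are placeholders; the solver calls in the comments mirror the example above):

#include <ginkgo/ginkgo.hpp>
#include <utility>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    auto A = gko::matrix::Dense<>::create(exec, gko::dim<2>{2, 2});

    // clone: create an independent deep copy from a lent (non-owning) pointer
    auto copy = gko::clone(gko::lend(A));
    // share: convert unique ownership into shared ownership
    auto shared_copy = gko::share(std::move(copy));
    // give: transfer ownership into the callee,
    //       e.g. solver_factory->generate(gko::give(A))
    // lend: temporarily hand out a raw pointer for the duration of a call,
    //       e.g. solver->apply(gko::lend(b), gko::lend(x))
}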

Components

Instead of providing a single method to solve a linear system, Ginkgo provides a selection of components that can be used to tailor the solver to your specific problem. It is also possible to use each component separately, as part of larger software. The provided components include matrix formats, solvers and preconditioners (commonly referred to as "linear operators" in Ginkgo), as well as executors, stopping criteria and loggers.

Matrix formats are used to represent the system matrix and the vectors of the system. The following are the supported matrix formats (see this Matrix Format wiki page for more details):

  • gko::matrix::Dense - the row-major storage dense matrix format;
  • gko::matrix::Csr - the Compressed Sparse Row (CSR) sparse matrix format;
  • gko::matrix::Coo - the Coordinate (COO) sparse matrix format;
  • gko::matrix::Ell - the ELLPACK (ELL) sparse matrix format;
  • gko::matrix::Sellp - the SELL-P sparse matrix format based on the sliced ELLPACK representation;
  • gko::matrix::Hybrid - the hybrid matrix format that represents a matrix as a sum of an ELL and COO matrix.

All formats offer support for the apply operation that performs a (sparse) matrix-vector product between the matrix and one or multiple vectors. Conversion routines between the formats are also provided. gko::matrix::Dense offers an extended interface that includes simple vector operations such as addition, scaling, dot product and norm, which are applied on each column of the matrix separately. The interface for all operations is designed to allow any type of matrix format as a parameter. However, version 1.0.0 of this library supports only instances of gko::matrix::Dense as vector arguments (the matrix arguments do not have any limitations).
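
For illustration, the apply operation described above can be called directly on any of these formats. The following sketch mirrors the solver example from the previous section, but performs a plain SpMV (x = A * b) on the reference executor:

#include <ginkgo/ginkgo.hpp>
#include <iostream>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    // Read the matrix and the vectors in matrix market format
    auto A = gko::read<gko::matrix::Csr<>>(std::cin, exec);
    auto b = gko::read<gko::matrix::Dense<>>(std::cin, exec);
    auto x = gko::read<gko::matrix::Dense<>>(std::cin, exec);
    // apply computes x = A * b, independently of the matrix format
    A->apply(gko::lend(b), gko::lend(x));
    gko::write(std::cout, gko::lend(x));
}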

Solvers are utilized to solve the system with a given system matrix and right hand side. Currently, you can choose from several high-performance Krylov methods implemented in Ginkgo:

  • gko::solver::Cg - the Conjugate Gradient method (CG) suitable for symmetric positive definite problems;
  • gko::solver::Fcg - the flexible variant of Conjugate Gradient (FCG) that supports non-constant preconditioners;
  • gko::solver::Cgs - the Conjugate Gradient Squared method (CGS) for general problems;
  • gko::solver::Bicgstab - the BiConjugate Gradient Stabilized method (BiCGSTAB) for general problems;
  • gko::solver::Gmres - the restarted Generalized Minimal Residual method (GMRES) for general problems.

All solvers work with system matrices stored in any of the matrix formats described above, and any other general linear operator, such as combinations and compositions of other operators, or any matrix format you defined specifically for your application.

Preconditioners can be effective at improving the convergence rate of Krylov methods. All solvers listed above are implemented with preconditioning support. This version of Ginkgo has support for one preconditioner type, but stay tuned, as more preconditioners are coming in future releases:

  • gko::preconditioner::Jacobi - a highly optimized version of the block-Jacobi preconditioner (block-diagonal scaling), optionally enhanced with adaptive precision storage scheme for additional performance gains.

You can use the block-Jacobi preconditioner with system matrices stored in any of the built-in matrix formats and any custom format that has a defined conversion into a CSR matrix.

Any linear operator (matrix, solver, preconditioner) can be combined into complex operators by using the following utilities:

  • gko::Combination - creates a linear combination α₁ A₁ + ... + αₙ Aₙ of linear operators;
  • gko::Composition - creates a composition A₁ ... Aₙ of linear operators.

You can utilize these utilities (together with a solver which represents the inversion operation) to compute complex expressions, such as x = (3A - B⁻¹C)⁻¹b.

As described in the "Designed for HPC" section, you have a choice between 3 different executors:

  • gko::CudaExecutor - offers a highly optimized GPU implementation tailored for recent HPC systems;
  • gko::ReferenceExecutor - single-threaded reference implementation for easy development and testing on systems without a GPU;
  • gko::OmpExecutor - preliminary OpenMP-based implementation for CPUs.
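
Executors are created once and then passed to every Ginkgo object. The following short sketch mirrors the executor creation in the example above:

#include <ginkgo/ginkgo.hpp>

int main()
{
    // Single-threaded reference executor for development and testing
    auto ref = gko::ReferenceExecutor::create();
    // OpenMP executor for CPUs
    auto omp = gko::OmpExecutor::create();
    // CUDA executor for GPU device 0; the second argument is the host ("master")
    // executor used for temporary host-side data
    auto cuda = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
}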

With Ginkgo, you have fine control over the solver iteration process to ensure that you obtain your solution under the time and accuracy constraints of your application. Ginkgo supports the following stopping criteria out of the box:

  • gko::stop::Iteration - the iteration process is stopped once the specified iteration count is reached;
  • gko::stop::ResidualNormReduction - the iteration process is stopped once the initial residual norm is reduced by the specified factor;
  • gko::stop::Time - the iteration process is stopped if the specified time limit is reached.

You can combine multiple criteria to achieve the desired result, and even add your own criteria to the mix.
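
For example, the following hedged sketch (all limits are arbitrary illustration values) passes three criteria to a BiCGSTAB factory; the solver stops as soon as any one of them is satisfied:

```cpp
#include <ginkgo/ginkgo.hpp>

#include <chrono>

// A BiCGSTAB factory that stops on the first criterion that is satisfied.
auto make_bicgstab_factory(std::shared_ptr<const gko::Executor> exec)
{
    using bicgstab = gko::solver::Bicgstab<double>;
    return bicgstab::build()
        .with_criteria(
            // stop after at most 1000 iterations ...
            gko::stop::Iteration::build().with_max_iters(1000u).on(exec),
            // ... or once the residual norm dropped by 12 orders of magnitude ...
            gko::stop::ResidualNormReduction<double>::build()
                .with_reduction_factor(1e-12)
                .on(exec),
            // ... or after a time limit of 10 seconds.
            gko::stop::Time::build()
                .with_time_limit(std::chrono::seconds(10))
                .on(exec))
        .on(exec);
}
```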

Ginkgo also allows you to keep track of the events that happen while using the library, by providing hooks to those events via the gko::log::Logger abstraction. These hooks include everything from low-level events, such as memory allocations, deallocations, copies and kernel launches, up to high-level events, such as linear operator applications and completions of solver iterations. While the true power of logging is enabled by writing application-specific loggers, Ginkgo does provide several built-in solutions that can be useful for debugging and profiling:

  • gko::log::Convergence - allows access to the final iteration count and residual of a Krylov solver;
  • gko::log::Stream - prints events in human-readable format to the given output stream as they are emitted;
  • gko::log::Record - saves all emitted events in a data structure for subsequent processing;
  • gko::log::Papi - converts between Ginkgo's logging hooks and the standard PAPI Software Defined Events (SDE) interface (note that some details are lost, as PAPI can represent only a subset of data Ginkgo's logging can provide).
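
As a hedged sketch (the exact create() arguments may differ slightly between Ginkgo versions), the built-in loggers are attached to any Ginkgo object via add_logger; here, a Stream and a Convergence logger are attached to an existing solver:

```cpp
#include <ginkgo/ginkgo.hpp>

#include <iostream>

// Attaches two built-in loggers to an existing solver and returns the
// convergence logger so its recorded data can be queried later.
std::shared_ptr<gko::log::Convergence<double>> attach_loggers(
    std::shared_ptr<const gko::Executor> exec, gko::LinOp* solver)
{
    // Print every event to std::clog as it is emitted.
    std::shared_ptr<gko::log::Stream<double>> stream_logger =
        gko::log::Stream<double>::create(
            exec, gko::log::Logger::all_events_mask, std::clog);
    solver->add_logger(stream_logger);

    // Record the final iteration count and residual of the solver.
    std::shared_ptr<gko::log::Convergence<double>> convergence_logger =
        gko::log::Convergence<double>::create(exec);
    solver->add_logger(convergence_logger);

    // After solver->apply(...), query e.g.
    // convergence_logger->get_num_iterations().
    return convergence_logger;
}
```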

Extensibility

If you did not find what you need among the built-in components, you can add your own implementation of a component. New matrices, solvers and preconditioners can be implemented by inheriting from the gko::LinOp abstract class, while new stopping criteria and loggers can be implemented by inheriting from the gko::stop::Criterion and gko::log::Logger abstract classes, respectively. Ginkgo aims to be developer-friendly and provides features that simplify the development of new components. To help handle the various memory spaces, the gko::Array type template encapsulates memory allocations, deallocations and copies between them. Macros and mixins (realized via the C++ CRTP idiom) that implement common utilities on Ginkgo's objects are also provided, allowing you to focus on the implementation of your algorithm instead of the various utilities required by the interface.
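
For instance, the following hedged sketch (function name and data are illustrative placeholders) shows how gko::Array moves data between memory spaces:

```cpp
#include <ginkgo/ginkgo.hpp>

// Creates an array on the host and copies it to the executor's memory space.
gko::Array<double> copy_to_device(std::shared_ptr<const gko::Executor> exec)
{
    // Array of four doubles allocated on the host (master) executor.
    gko::Array<double> host_values(exec->get_master(), {1.0, 2.0, 3.0, 4.0});
    // Constructing with a different executor allocates on that executor and
    // copies the data over.
    gko::Array<double> device_values(exec, host_values);
    return device_values;
}
```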

License

Ginkgo is available under the BSD 3-clause license. Optional third-party tools and libraries needed to run the unit tests, benchmarks, and developer tools are available under their own open-source licenses, but a fully functional installation of Ginkgo can be obtained without any of them. Check ABOUT-LICENSING.md for details.

Getting Started

To learn how to use Ginkgo, and get ideas for your own projects, take a look at the following examples:

  • minimal-solver-cuda is probably one of the smallest complete programs you can write in Ginkgo, and can be used as a quick reference for assembling Ginkgo's components.
  • simple-solver is a slightly more complex example that reads the matrices from files, computes the final residual, and selects a different executor based on the command-line parameter.
  • preconditioned-solver is a slightly modified simple-solver example that demonstrates how a solver can be enhanced with a preconditioner.
  • simple-solver-logging is yet another modification of the simple-solver example that prints information about the solution process to the screen by using built-in loggers.
  • poisson-solver is a more elaborate example that builds a small application for the solution of the 1D Poisson equation using Ginkgo.
  • three-pt-stencil-solver is a variation of the poisson-solver that demonstrates how one could use Ginkgo with software that was not originally designed with Ginkgo support. It encapsulates everything related to Ginkgo in a single function that accepts the raw data of the problem and shows how such data can be directly used with Ginkgo's components.
  • inverse-iteration is another full application that uses Ginkgo's solver as a component for implementing the inverse iteration eigensolver.

You can also check out Ginkgo's core and reference unit tests and benchmarks for more detailed examples of using each of the components. A complete Doxygen-generated reference is available online, or you can find the same information by directly browsing Ginkgo's headers. We are investing significant efforts in maintaining good code quality, so you should not find them difficult to read and understand.

If you want to use your own functionality with Ginkgo, these examples are the best way to start:

  • custom-logger demonstrates how Ginkgo's logging API can be leveraged to implement application-specific callbacks for Ginkgo's events.
  • custom-stopping-criterion creates a custom stopping criterion that controls when the solver is stopped from another execution thread.
  • custom-matrix-format demonstrates how new linear operators can be created, by modifying the poisson-solver example to use a more efficient matrix format designed specifically for this application.

Ginkgo's sources can also serve as a good example, since built-in components are mostly implemented using publicly available utilities.

Contributing

Our principal goal for the development of Ginkgo is to provide high-quality software to researchers in HPC, and to application scientists who are interested in using this software. We believe that by investing more effort in the initial development of production-ready methods, the entire scientific community benefits in the long run. HPC researchers can save time by using Ginkgo's components as a starting point for their algorithms, or by comparing Ginkgo's implementations with their own methods. Since Ginkgo is used for bleeding-edge research, application scientists immediately get access to production-ready new methods that help solve their problems more efficiently.

Thus, if you are interested in making this project even better, we would love to hear from you:

  • If you have any questions, comments, suggestions, problems, or think you have found a bug, do not hesitate to post an issue (you will have to register on GitHub first to be able to do it). In case you really do not want your comment to be publicly available, you can send us an e-mail to [email protected].
  • If you developed, or would like to develop, your own component that you think could be useful to others, we would be glad to accept a pull request and distribute your component as part of Ginkgo. The community will benefit by having the new method easily available, and you will get the chance to improve your code further as part of the review process with our development team. You may also want to consider writing an issue or sending an e-mail about the feature you are trying to implement before you get started, to get tips on how to best realize it in Ginkgo and avoid going down the wrong path.
  • If you just like Ginkgo and want to help, but do not have a specific project in mind, feel free to take on one of the open issues, or send us an issue or an e-mail describing your interests and background, and we will find a project you could work on.

Backward Compatibility Guarantee and Future Support

This is a major 1.0.0 release of Ginkgo. All future patch releases of the form 1.0.x are guaranteed to keep exactly the same interface as the major release. All minor releases of the form 1.x.y are guaranteed not to change existing interfaces, but only add new capabilities.

Thus, all code conforming to the 1.0.0 release will continue to compile and run on all future Ginkgo versions up to (but not including) version 2.0.0.

About

Ginkgo 1.0.0 is brought to you by:

Karlsruhe Institute of Technology, Germany
Universitat Jaume I, Spain
University of Tennessee, Knoxville, US

These universities, along with various project grants, supported the development team and provided resources needed for the development of Ginkgo.

Ginkgo 1.0.0 contains contributions from:

Hartwig Anzt, Karlsruhe Institute of Technology
Yenchen Chen, National Taiwan University
Terry Cojean, Karlsruhe Institute of Technology
Goran Flegar, Universitat Jaume I
Fritz Göbel, Karlsruhe Institute of Technology
Thomas Grützmacher, Karlsruhe Institute of Technology
Pratik Nayak, Karlsruhe Institute of Technology
Tobias Ribizel, Karlsruhe Institute of Technology
Yuhsiang Tsai, National Taiwan University

Supporting materials are provided by the following individuals:

David Rogers - the Ginkgo logo
Frithjof Fleischhammer - the Ginkgo website

The development team is grateful to the following individuals for discussions and comments:

Erik Boman
Jelena Držaić
Mike Heroux
Mark Hoemmen
Timo Heister
Jens Saak