Modern Fortran implementation of a template method pattern with two hardware backend specialisations (pure CPU and CPU/GPU backends).
Consider two functions f and g of an array of numbers a, with f(a) = sum(a + 1) and g(a) = max(a * 2).
Given an input array a, the algorithm is:
- Compute f(a)
- Compute g(a)
- Compute f(a) + g(a)
This algorithm depends only on the interface of f and g: their
argument and what they return. It does not depend on the actual
implementation of f and g.
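As a quick check of the arithmetic, here is a minimal sketch that evaluates the three steps directly with Fortran intrinsics. The input (the integers 1 to 16) and the program name are assumptions made for illustration; they are not taken from the repository.

```fortran
program worked_example
  implicit none
  integer :: i
  real :: a(16)
  real :: fa, ga

  ! Assumed input: the integers 1 to 16 (not taken from the repository).
  a = [(real(i), i = 1, 16)]

  fa = sum(a + 1.0)       ! f(a) = 2 + 3 + ... + 17 = 152
  ga = maxval(a * 2.0)    ! g(a) = 2 * 16 = 32

  print '(f8.3)', fa + ga ! f(a) + g(a) = 152 + 32 = 184
end program worked_example
```

For this particular input, f(a) + g(a) evaluates to 184.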
$ cd backends_example
$ FC=nvfortran cmake -S . -B build
The above configures the build of three executables:
- main_cpu: CPU-only version.
- main_gpu: Version with f and g implemented as accelerated GPU kernels.
- main_hybrid: Execution of GPU kernels is enabled/disabled at runtime.
From the build directory:
$ make main_hybrid # Build once, run everywhere.
$ ./main_hybrid
Executing on CPU only
184.000
$ ./main_hybrid --gpu
Executing CUDA kernels
184.00
The algorithm itself is defined once, as the type-bound procedure doit
of the abstract type basetype (base.f90). This type is abstract because,
although doit is defined, f and g are not: they are deferred bindings.
This makes instantiating an object of type basetype impossible. The
point is that we can now extend basetype with concrete types that
provide a definition for both functions.
The basetype abstract type is extended by cputype (cpu/cpu.f90)
and gputype (gpu/gpu.f90). The former implements f and g using
standard Fortran to be executed on a CPU. The latter, gputype,
provides an implementation of f and g based on CUDA Fortran, using
kernel procedures to be executed on NVIDIA GPUs.
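The structure described above can be sketched as follows. This is a simplified illustration, not the repository's code. The type names basetype and cputype and the procedure names doit, f and g come from the text. The module names, the interface name fun_iface, the procedure names f_cpu and g_cpu, and the use of a plain real array as the argument are assumptions.

```fortran
module base_sketch
  implicit none
  private
  public :: basetype

  ! Abstract type: doit is implemented here, f and g are deferred.
  type, abstract :: basetype
  contains
    procedure(fun_iface), deferred :: f
    procedure(fun_iface), deferred :: g
    procedure :: doit
  end type basetype

  ! Hypothetical interface shared by f and g: array in, scalar out.
  abstract interface
    function fun_iface(this, a) result(res)
      import :: basetype
      class(basetype), intent(in) :: this
      real, intent(in) :: a(:)
      real :: res
    end function fun_iface
  end interface

contains

  ! The algorithm, written once against the interface of f and g.
  function doit(this, a) result(res)
    class(basetype), intent(in) :: this
    real, intent(in) :: a(:)
    real :: res
    res = this%f(a) + this%g(a)
  end function doit

end module base_sketch

module cpu_sketch
  use base_sketch, only: basetype
  implicit none
  private
  public :: cputype

  ! Concrete extension providing CPU implementations of f and g.
  type, extends(basetype) :: cputype
  contains
    procedure :: f => f_cpu
    procedure :: g => g_cpu
  end type cputype

contains

  function f_cpu(this, a) result(res)
    class(cputype), intent(in) :: this
    real, intent(in) :: a(:)
    real :: res
    res = sum(a + 1.0)
  end function f_cpu

  function g_cpu(this, a) result(res)
    class(cputype), intent(in) :: this
    real, intent(in) :: a(:)
    real :: res
    res = maxval(a * 2.0)
  end function g_cpu

end module cpu_sketch

program main_sketch
  use cpu_sketch, only: cputype
  implicit none
  type(cputype) :: backend
  real :: a(4)

  a = [1.0, 2.0, 3.0, 4.0]
  ! f(a) = 2 + 3 + 4 + 5 = 14, g(a) = 8, so doit returns 22.
  print '(f8.3)', backend%doit(a)
end program main_sketch
```

A GPU backend would follow the same shape as cpu_sketch, with f and g bound to procedures that launch kernels. It is left out here so the sketch compiles with any standard Fortran compiler.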
The input array is represented by the memblock (memblock.f90) type
(memory block). The cpublock (cpu/cpublock.f90) type holds an
allocatable real array, whilst the gpublock (gpu/gpublock) type holds
an allocatable real device array.
The current (main_hybrid.f90) implementation uses a pointer to the right type depending on the execution target:
case('gpu')
gpublk = gpublock(16); blk => gpublk
A preferable approach would be to rely on automatic (re)allocation of polymorphic entities:
class(memblock), allocatable :: blk
case('gpu')
blk = gpublock(16) ! Automatic allocation of the dynamic type
Unfortunately, this is not supported by the NVIDIA Fortran compiler
(nvfortran 22.5).
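For reference, a minimal standard-Fortran sketch of the allocatable approach is given below. It assumes a compiler that implements intrinsic assignment to polymorphic allocatable variables (a Fortran 2008 feature). The type fakegpublock is a hypothetical host-side stand-in for gpublock, used so the example needs no CUDA Fortran. The component name values, the variable target_name and the use of default structure constructors (instead of the gpublock(16) constructor shown above) are also assumptions.

```fortran
module memblock_sketch
  implicit none

  type, abstract :: memblock
  end type memblock

  type, extends(memblock) :: cpublock
    real, allocatable :: values(:)
  end type cpublock

  ! Hypothetical stand-in for gpublock: a host array instead of a device array.
  type, extends(memblock) :: fakegpublock
    real, allocatable :: values(:)
  end type fakegpublock

end module memblock_sketch

program select_sketch
  use memblock_sketch
  implicit none
  class(memblock), allocatable :: blk
  character(len=8) :: target_name
  integer :: i

  ! Hard-coded here; the real program would read it from the command line.
  target_name = 'cpu'

  select case (trim(target_name))
  case ('gpu')
    ! Assignment allocates blk with dynamic type fakegpublock.
    blk = fakegpublock(values=[(real(i), i = 1, 16)])
  case default
    ! Assignment allocates blk with dynamic type cpublock.
    blk = cpublock(values=[(real(i), i = 1, 16)])
  end select

  ! Inspect the dynamic type chosen at run time.
  select type (blk)
  type is (cpublock)
    print '(a, i0, a)', 'cpublock with ', size(blk%values), ' elements'
  type is (fakegpublock)
    print '(a, i0, a)', 'fakegpublock with ', size(blk%values), ' elements'
  end select
end program select_sketch
```

Compared with the pointer version, no separately declared target variable is needed for each concrete type: blk owns its data and takes on the dynamic type of the right-hand side.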