<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>GPU Vendor / Programming Model Compatibility Matrix</title>
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>
<div class="main">
<section id="compat-matrix">
<dl>
<dt><svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg></dt>
<dd>Full vendor support</dd>
<dt><svg height="7.92" overflow="visible" version="1.1" width="15.85"><g transform="translate(0,7.92) matrix(1 0 0 -1 0 0) translate(7.92,0) translate(0,4.75)" fill="#D3C65D" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M -7.92 3.17 C -7.92 -1.21 -4.38 -4.75 0 -4.75 C 4.38 -4.75 7.92 -1.21 7.92 3.17 Z" style="stroke:none"></path></g></svg></dt>
<dd>Indirect, but comprehensive support, by vendor</dd>
<dt><svg height="12.64" overflow="visible" version="1.1" width="12.64"><g transform="translate(0,12.64) matrix(1 0 0 -1 0 0) translate(6.32,0) translate(0,6.32)" fill="#FBBC6A" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 6.32 6.32 L -6.32 6.32 L -6.32 -6.32 L 6.32 -6.32 Z" style="stroke:none"></path></g></svg></dt>
<dd>Vendor support, but not (yet) entirely comprehensive</dd>
<dt><svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg></dt>
<dd>Comprehensive support, but not by vendor</dd>
<dt><svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg></dt>
<dd>Limited, probably indirect support -- but at least some</dd>
<dt><svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg></dt>
<dd>No direct support available, but of course you could ISO-C-bind your way through it or link the libraries directly</dd>
<dt>C</dt>
<dd>C++ (sometimes also C)</dd>
<dt>F</dt>
<dd>Fortran</dd>
</dl>
<table id="compat-table">
<thead>
<tr>
<td class="empty"></td>
<td colspan="2" class="level-1">CUDA</td>
<td colspan="2" class="level-1">HIP</td>
<td colspan="2" class="level-1">SYCL</td>
<td colspan="2" class="level-1">OpenACC</td>
<td colspan="2" class="level-1">OpenMP</td>
<td colspan="2" class="level-1">Standard</td>
<td colspan="2" class="level-1">Kokkos</td>
<td colspan="2" class="level-1">ALPAKA</td>
<td class="level-1 etc">etc</td>
</tr>
<tr>
<td class="empty"></td>
<td class="level-2">C</td>
<td class="level-2">F</td>
<td class="level-2">C</td>
<td class="level-2">F</td>
<td class="level-2">C</td>
<td class="level-2">F</td>
<td class="level-2">C</td>
<td class="level-2">F</td>
<td class="level-2">C</td>
<td class="level-2">F</td>
<td class="level-2">C</td>
<td class="level-2">F</td>
<td class="level-2">C</td>
<td class="level-2">F</td>
<td class="level-2">C</td>
<td class="level-2">F</td>
<td class="level-2">Python</td>
</tr>
</thead>
<tbody>
<tr>
<td class="vendor">NVIDIA</td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="CUDA C/C++ is supported on NVIDIA GPUs through the CUDA Toolkit. First released in 2007, the toolkit covers nearly all aspects of the NVIDIA platform: an API for programming (incl. language extensions), libraries, tools for profiling and debugging, compiler, management tools, and more. At the time of writing, the current version is CUDA 12.2. Usually, when referring to CUDA without any additional context, the CUDA API is meant. While incorporating some Open Source components, the CUDA platform in its entirety is proprietary and closed source. The low-level CUDA instruction set architecture is PTX, to which higher-level languages like CUDA C/C++ are translated. PTX is compiled to SASS, the binary code executed on the device. As it is the reference for the platform, the support for NVIDIA GPUs through CUDA C/C++ is very comprehensive. In addition to support through the CUDA Toolkit, NVIDIA GPUs can also be used by Clang, utilizing the LLVM toolchain to emit PTX code and compile it subsequently."><a href="#desc-cudac">1</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="CUDA Fortran, a proprietary Fortran extension by NVIDIA, is supported on NVIDIA GPUs via the NVIDIA HPC SDK (NVHPC). NVHPC implements most features of the CUDA API in Fortran and is activated through the -cuda switch in the nvfortran compiler. The CUDA extensions for Fortran are modeled closely after the CUDA C/C++ definitions. In addition to creating explicit kernels in Fortran, CUDA Fortran also supports cuf kernels, a way to let the compiler generate GPU parallel code automatically. Very recently, CUDA Fortran support was also merged into Flang, the LLVM-based Fortran compiler."><a href="#desc-cudafortran">2</a></sup></td>
<td class="status">
<svg height="7.92" overflow="visible" version="1.1" width="15.85"><g transform="translate(0,7.92) matrix(1 0 0 -1 0 0) translate(7.92,0) translate(0,4.75)" fill="#D3C65D" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M -7.92 3.17 C -7.92 -1.21 -4.38 -4.75 0 -4.75 C 4.38 -4.75 7.92 -1.21 7.92 3.17 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="HIP programs can directly use NVIDIA GPUs via a CUDA backend. As HIP is strongly inspired by CUDA, the mapping is relatively straight-forward; API calls are named similarly (for example: hipMalloc() instead of cudaMalloc()) and keywords of the kernel syntax are identical. HIP also supports some CUDA libraries and creates interfaces to them (like hipblasSaxpy() instead of cublasSaxpy()). To target NVIDIA GPUs through the HIP compiler (hipcc), HIP_PLATFORM=nvidia needs to be set in the environment. In order to initially create a HIP code from CUDA, AMD offers the HIPIFY conversion tool."><a href="#desc-nvidiahip">3</a></sup></td>
<td class="status"><svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="No Fortran version of HIP exists; HIP is solely a C/C++ model. But AMD offers an extensive set of ready-made interfaces to the HIP API and to HIP and ROCm libraries with hipfort (MIT-licensed). All interfaces implement C functionality; CUDA-like Fortran extensions, for example to write kernels, are not available."><a href="#desc-nvidiahipfortran">4</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="No direct support for SYCL is available by NVIDIA, but SYCL can be used on NVIDIA GPUs through multiple avenues. First, SYCL can be used through DPC++, an Open-Source LLVM-based compiler project led by Intel. The DPC++ infrastructure is also available through Intel's commercial oneAPI toolkit (Intel oneAPI DPC++/C++) as a dedicated plugin. Upstreaming SYCL support directly into LLVM is an ongoing effort, which started in 2019. Further, SYCL can be used via Open SYCL (previously called hipSYCL), an independently developed SYCL implementation, using NVIDIA GPUs either through the CUDA support of LLVM or the nvc++ compiler of NVHPC. A third popular possibility was the NVIDIA GPU support in ComputeCpp of Codeplay, but the product became unsupported in September 2023. In case LLVM is involved, SYCL implementations can rely on CUDA support in LLVM, which needs the CUDA Toolkit available for the final compilation steps beyond PTX. In order to translate CUDA code to SYCL, Intel offers the SYCLomatic conversion tool."><a href="#desc-nvidiasycl">5</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="SYCL is a C++-based programming model (C++17) and by its nature does not support Fortran. Also, no pre-made bindings are available."><a href="#desc-syclfortran">6</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="OpenACC C/C++ on NVIDIA GPUs is supported most extensively through the NVIDIA HPC SDK. Beyond the bundled libraries, frameworks, and other models, the NVIDIA HPC SDK also features the nvc/nvc++ compilers, in which OpenACC support can be enabled with the -acc -gpu options. The support of OpenACC in this vendor-delivered compiler is very comprehensive; it conforms to version 2.7 of the specification. A variety of compile options are available to modify the compilation process. In addition to NVIDIA HPC SDK, good support is also available in GCC since GCC 5.0, supporting OpenACC 2.6 through the nvptx architecture. The compiler switch to enable OpenACC in gcc/g++ is -fopenacc; further options are available. Further, the Clacc compiler implements OpenACC support in the LLVM toolchain, adapting the Clang frontend. As a central design aspect, it translates OpenACC to OpenMP as part of the compilation process. OpenACC can be activated in Clacc's clang via -fopenacc, and further compiler options exist, mostly leveraging OpenMP options. A recent study by Jarmusch et al. compared these compilers for coverage of the OpenACC 3.0 specification."><a href="#desc-openaccc">7</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="Support of OpenACC Fortran on NVIDIA GPUs is similar to OpenACC C/C++, albeit not identical. First, NVIDIA HPC SDK supports OpenACC in Fortran through the included nvfortran compiler, with the same options as for the C/C++ compilers. In addition, GCC supports OpenACC through the gfortran compiler, with compiler options identical to those of the C/C++ compilers. Further, similar to OpenACC support in LLVM for C/C++ through Clacc contributions, the LLVM frontend for Fortran, Flang (the successor of F18, not classic Flang), supports OpenACC as well. Support was initially contributed through the Flacc project and now resides in the main LLVM project. Finally, the HPE Cray Programming Environment supports OpenACC Fortran; in ftn, OpenACC can be enabled through -hacc."><a href="#desc-openaccfortran">8</a></sup></td>
<td class="status"><svg height="12.64" overflow="visible" version="1.1" width="12.64"><g transform="translate(0,12.64) matrix(1 0 0 -1 0 0) translate(6.32,0) translate(0,6.32)" fill="#FBBC6A" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 6.32 6.32 L -6.32 6.32 L -6.32 -6.32 L 6.32 -6.32 Z" style="stroke:none"></path></g></svg><svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="OpenMP in C/C++ is supported on NVIDIA GPUs (offloading) through multiple avenues, similarly to OpenACC. First, the NVIDIA HPC SDK supports OpenMP GPU offloading in both nvc and nvc++, albeit only a subset of the entire OpenMP 5.0 standard (see the documentation for supported/unsupported features). The key compiler option is -mp. GCC also supports OpenMP offloading to NVIDIA GPUs; the compiler switch is -fopenmp, with options delivered through -foffload and -foffload-options. GCC currently supports OpenMP 4.5 entirely, while OpenMP features of 5.0, 5.1, and 5.2 are currently being implemented. Similarly, Clang supports OpenMP offloading to NVIDIA GPUs, enabled through -fopenmp -fopenmp-targets=nvptx64, with offload architectures selected via --offload-arch=native (or similar). Clang implements nearly all OpenMP 5.0 features and most of OpenMP 5.1/5.2. In the HPE Cray Programming Environment, a subset of OpenMP 5.0/5.1 is supported for NVIDIA GPUs. It can be activated through -fopenmp. AOMP, AMD's Clang/LLVM-based compiler, also supports NVIDIA GPUs. Support of OpenMP features in the compilers was recently discussed in the OpenMP ECP BoF 2022."><a href="#desc-nvidiaopenmpc">9</a></sup></td>
<td class="status"><svg height="12.64" overflow="visible" version="1.1" width="12.64"><g transform="translate(0,12.64) matrix(1 0 0 -1 0 0) translate(6.32,0) translate(0,6.32)" fill="#FBBC6A" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 6.32 6.32 L -6.32 6.32 L -6.32 -6.32 L 6.32 -6.32 Z" style="stroke:none"></path></g></svg><svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="OpenMP in Fortran is supported on NVIDIA GPUs nearly identically to C/C++. Support is implemented by NVIDIA HPC SDK's nvfortran, GCC's gfortran, LLVM's Flang (through -mp, and only when Flang is compiled via Clang), and the HPE Cray Programming Environment."><a href="#desc-nvidiaopenmpfortran">10</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="Standard language parallelism of C++, namely algorithms and data structures of the parallel STL, is supported on NVIDIA GPUs through the nvc++ compiler of the NVIDIA HPC SDK. The key compiler option is -stdpar=gpu, which enables offloading of parallel algorithms to the GPU. Also, currently Open SYCL is in the process of implementing support for pSTL algorithms, enabled via --hipsycl-stdpar. Further, NVIDIA GPUs can be targeted from Intel's DPC++ compiler, enabling usage of pSTL algorithms implemented in Intel's Open Source oneDPL (oneAPI DPC++ Library) on NVIDIA GPUs. Finally, a current proposal in the LLVM community aims at implementing pSTL support through an OpenMP backend."><a href="#desc-nvidiastandardc">11</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="Standard language parallelism of Fortran, mainly do concurrent, is supported on NVIDIA GPUs through the nvfortran compiler of the NVIDIA HPC SDK. As for the C++ case, it is enabled through the -stdpar=gpu compiler option."><a href="#desc-nvidiastandardfortran">12</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Kokkos supports NVIDIA GPUs in C++. Kokkos has multiple backends available with NVIDIA GPU support: a native CUDA C/C++ backend (using nvcc), an NVIDIA HPC SDK backend (using CUDA support in nvc++), and a Clang backend, using either Clang's CUDA support directly or via the OpenMP offloading facilities (via clang++)."><a href="#desc-nvidiakokkosc">13</a></sup></td>
<td class="status">
<svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Kokkos is a C++ programming model, but an official compatibility layer for Fortran (Fortran Language Compatibility Layer, FLCL) is available. Through this layer, GPUs can be used as supported by Kokkos C++."><a href="#desc-nvidiakokkosfortran">14</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Alpaka supports NVIDIA GPUs in C++ (C++17), either through the NVIDIA CUDA C/C++ compiler nvcc or LLVM/Clang's support of CUDA in clang++."><a href="#desc-nvidiaalpakac">15</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="Alpaka is a C++ programming model and no ready-made Fortran support exists."><a href="#desc-nvidiaalpakafortran">16</a></sup></td>
<td class="status"><svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Using NVIDIA GPUs from Python code can be achieved through multiple avenues. NVIDIA itself offers CUDA Python, a package delivering low-level interfaces to CUDA C/C++. Typically, code is not directly written using CUDA Python, but rather CUDA Python functions as a backend for higher-level models. CUDA Python is available on PyPI as cuda-python. An alternative to CUDA Python from the community is PyCUDA, which adds some higher-level features and functionality and comes with its own C++ base layer. PyCUDA is available on PyPI as pycuda. The most well-known, higher-level abstraction is CuPy, which implements primitives known from NumPy with GPU support, offers functionality for defining custom kernels, and provides bindings to libraries. CuPy is available on PyPI as cupy-cuda12x (for CUDA 12.x). Two packages arguably providing even higher abstractions are Numba and cuNumeric. Numba offers access to NVIDIA GPUs and features acceleration of functions through Python decorators (functions wrapping functions); it is available as numba on PyPI. cuNumeric, a project by NVIDIA, allows accessing the GPU via NumPy-inspired functions (like CuPy), but utilizes the Legate library to transparently scale to multiple GPUs."><a href="#desc-nvidiapython">17</a></sup></td>
</tr>
<tr>
<td class="vendor">AMD</td>
<td class="status">
<svg height="7.92" overflow="visible" version="1.1" width="15.85"><g transform="translate(0,7.92) matrix(1 0 0 -1 0 0) translate(7.92,0) translate(0,4.75)" fill="#D3C65D" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M -7.92 3.17 C -7.92 -1.21 -4.38 -4.75 0 -4.75 C 4.38 -4.75 7.92 -1.21 7.92 3.17 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="While CUDA is not directly supported on AMD GPUs, it can be translated to HIP through AMD's HIPIFY. Using hipcc and HIP_PLATFORM=amd in the environment, CUDA-to-HIP-translated code can be executed."><a href="#desc-amdcudac">18</a></sup></td>
<td class="status">
<svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="No direct support for CUDA Fortran on AMD GPUs is available, but AMD offers a source-to-source translator, GPUFORT, to convert some CUDA Fortran to either Fortran with OpenMP (via AOMP) or Fortran with HIP bindings and extracted C kernels (via hipfort). As stated in the project repository, the covered functionality is driven by use-case requirements; the last commit is two years old."><a href="#desc-amdcudafortran">19</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="HIP C++ is the native programming model for AMD GPUs and, as such, fully supports the devices. It is part of AMD's GPU-targeted ROCm platform, which includes compilers, libraries, tools, and drivers and mostly consists of Open Source Software. HIP code can be compiled with hipcc, utilizing the correct environment variables (like HIP_PLATFORM=amd) and compiler options (like --offload-arch=gfx90a). hipcc is a compiler driver (wrapper script) which assembles the correct compilation string, finally calling AMD's Clang compiler to generate host/device code (using the AMDGPU backend)."><a href="#desc-amdhipc">20</a></sup></td>
<td class="status"><svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="No Fortran version of HIP exists; HIP is solely a C/C++ model. But AMD offers an extensive set of ready-made interfaces to the HIP API and to HIP and ROCm libraries with hipfort (MIT-licensed). All interfaces implement C functionality; CUDA-like Fortran extensions, for example to write kernels, are not available."><a href="#desc-nvidiahipfortran">4</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="No direct support for SYCL is available by AMD for their GPU devices. But as in the NVIDIA ecosystem, SYCL C++ can be used on AMD GPUs through third-party software. First, Open SYCL (previously hipSYCL) supports AMD GPUs, relying on HIP/ROCm support in Clang. All available internal compilation models can target AMD GPUs. Second, AMD GPUs can also be targeted through both DPC++, Intel's LLVM-based Open Source compiler, and the commercial version included in the oneAPI toolkit (via an AMD ROCm plugin). In contrast to SYCL support for CUDA, no conversion tool like SYCLomatic exists."><a href="#desc-amdsyclc">21</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="SYCL is a C++-based programming model (C++17) and by its nature does not support Fortran. Also, no pre-made bindings are available."><a href="#desc-syclfortran">6</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="OpenACC C/C++ is not supported by AMD itself, but third-party support is available for AMD GPUs through GCC or Clacc (similarly to their support of OpenACC C/C++ for NVIDIA GPUs). In GCC, OpenACC support can be activated through -fopenacc, and further specified for AMD GPUs with, for example, -foffload=amdgcn-amdhsa=&quot;-march=gfx906&quot;. Clacc also supports OpenACC C/C++ on AMD GPUs by translating OpenACC to OpenMP and using LLVM's AMD support. The enabling compiler switch is -fopenacc, and AMD GPU targets can be further specified by, for example, -fopenmp-targets=amdgcn-amd-amdhsa. Intel's OpenACC to OpenMP source-to-source translator can also be used for AMD's platform."><a href="#desc-amdopenaccc">22</a></sup></td>
<td class="status"><svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="No native support for OpenACC on AMD GPUs for Fortran is available, but AMD supplies GPUFORT, a research project to source-to-source translate OpenACC Fortran to either Fortran with added OpenMP or Fortran with HIP bindings and extracted C kernels (using hipfort). The covered functionality of GPUFORT is driven by use-case requirements; the last commit is two years old. Support for OpenACC Fortran is also available from the community through GCC (gfortran) and upcoming in LLVM (Flacc). The HPE Cray Programming Environment also supports OpenACC Fortran on AMD GPUs. In addition, Intel's translator tool for converting OpenACC source to OpenMP source can be used."><a href="#desc-amdopenaccfortran">23</a></sup></td>
<td class="status"><svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="AMD offers AOMP, a dedicated, Clang-based compiler for using OpenMP C/C++ on AMD GPUs (offloading). AOMP is usually shipped with ROCm. The compiler supports most OpenMP 4.5 and some OpenMP 5.0 features. Since the compiler is Clang-based, the usual Clang compiler options apply (-fopenmp to enable OpenMP parsing, and others). Also in the upstream Clang compiler, AMD GPUs can be targeted through OpenMP; as outlined for NVIDIA GPUs, the support for OpenMP 5.0 is nearly complete, and support for OpenMP 5.1/5.2 is comprehensive. In addition, the HPE Cray Programming Environment supports OpenMP on AMD GPUs."><a href="#desc-amdopenmpc">24</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="Through AOMP, AMD supports OpenMP offloading to AMD GPUs in Fortran, using the flang executable and Clang-typical compiler options (foremost -fopenmp). Support for AMD GPUs is also available through the HPE Cray Programming Environment."><a href="#desc-amdopenmpfortran">25</a></sup></td>
<td class="status"><svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><svg height="12.64" overflow="visible" version="1.1" width="12.64"><g transform="translate(0,12.64) matrix(1 0 0 -1 0 0) translate(6.32,0) translate(0,6.32)" fill="#FBBC6A" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 6.32 6.32 L -6.32 6.32 L -6.32 -6.32 L 6.32 -6.32 Z" style="stroke:none"></path></g></svg><svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="AMD does not yet provide production-grade support for Standard-language parallelism in C++ for their GPUs. Currently under development is roc-stdpar (ROCm Standard Parallelism Runtime Implementation), which aims to supply pSTL algorithms on the GPU and merge the implementation with upstream LLVM. Support for GPU-parallel algorithms is enabled with -stdpar. An alternative proposal in the LLVM community aims to support the pSTL via an OpenMP backend. Also Open SYCL is in the process of creating support for C++ parallel algorithms via a --hipsycl-stdpar switch. By using Open SYCL's backends, also AMD GPUs are supported. Intel provides the Open Source oneDPL (oneAPI DPC++ Library) which implements pSTL algorithms through the DPC++ compiler (see also C++ Standard Parallelism for Intel GPUs). DPC++ has experimental support for AMD GPUs."><a href="#desc-amdstandardc">26</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="There is no (known) way to launch Standard-based parallel algorithms in Fortran on AMD GPUs."><a href="#desc-amdstandardfortran">27</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Kokkos supports AMD GPUs in C++ mainly through the HIP/ROCm backend. Also, an OpenMP offloading backend is available."><a href="#desc-amdkokkosc">28</a></sup></td>
<td class="status">
<svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Kokkos is a C++ programming model, but an official compatibility layer for Fortran (Fortran Language Compatibility Layer, FLCL) is available. Through this layer, GPUs can be used as supported by Kokkos C++."><a href="#desc-nvidiakokkosfortran">14</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Alpaka supports AMD GPUs in C++ through HIP or through an OpenMP backend."><a href="#desc-amdalpakac">29</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="Alpaka is a C++ programming model and no ready-made Fortran support exists."><a href="#desc-nvidiaalpakafortran">16</a></sup></td>
<td class="status">
<svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="AMD does not officially support GPU programming with Python, but third-party solutions are available. CuPy experimentally supports AMD GPUs/ROCm. The package can be found on PyPI as cupy-rocm-5-0. Numba once had support for AMD GPUs, but it is not maintained anymore. Low-level bindings from Python to HIP exist, for example PyHIP (available as pyhip-interface on PyPI). Bindings to OpenCL also exist (PyOpenCL)."><a href="#desc-amdpython">30</a></sup></td>
</tr>
<tr>
<td class="vendor">Intel</td>
<td class="status"><svg height="7.92" overflow="visible" version="1.1" width="15.85"><g transform="translate(0,7.92) matrix(1 0 0 -1 0 0) translate(7.92,0) translate(0,4.75)" fill="#D3C65D" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M -7.92 3.17 C -7.92 -1.21 -4.38 -4.75 0 -4.75 C 4.38 -4.75 7.92 -1.21 7.92 3.17 Z" style="stroke:none"></path></g></svg><svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Intel itself does not support CUDA C/C++ on their GPUs. They offer SYCLomatic, though, an Open Source tool to translate CUDA code to SYCL code, allowing it to run on Intel GPUs. The commercial variant of SYCLomatic is called the DPC++ Compatibility Tool and is bundled with the oneAPI toolkit. The community project chipStar (previously called CHIP-SPV; a 1.0 version was recently released) allows targeting Intel GPUs from CUDA C/C++ code by using the CUDA support in Clang. chipStar delivers a Clang wrapper, cuspv, which replaces calls to nvcc. ZLUDA also exists, which implements CUDA support for Intel GPUs; it is not maintained anymore, though."><a href="#desc-intelcudac">31</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="No direct support exists for CUDA Fortran on Intel GPUs. A simple example to bind SYCL to a (CUDA) Fortran program (via ISO C BINDING) can be found on GitHub."><a href="#desc-intelcudafortran">32</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="No native support for HIP C++ on Intel GPUs exists. The Open Source third-party project chipStar (previously called CHIP-SPV), though, supports HIP on Intel GPUs by mapping it to OpenCL or Intel's Level Zero runtime. The compiler uses an LLVM-based toolchain and relies on its HIP and SPIR-V functionality."><a href="#desc-intelhipc">33</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="HIP for Fortran does not exist, and no translation efforts for Intel GPUs exist either."><a href="#desc-intelhipfortran">34</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="SYCL is a C++17-based standard and was selected by Intel as the prime programming model for Intel GPUs. Intel implements SYCL support for their GPUs via DPC++, an LLVM-based compiler toolchain. Currently, Intel maintains its own fork of LLVM, but plans to upstream the changes to the main LLVM repository. Based on DPC++, Intel releases a commercial Intel oneAPI DPC++ compiler as part of the oneAPI toolkit. The third-party project Open SYCL also supports Intel GPUs, by leveraging/creating LLVM support (either SPIR-V or Level Zero). A previous solution for targeting Intel GPUs from SYCL was ComputeCpp of CodePlay. The project became unsupported in September 2023 (in favor of contributions to the DPC++ project)."><a href="#desc-intelsyclc">35</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="SYCL is a C++-based programming model (C++17) and by its nature does not support Fortran. Also, no pre-made bindings are available."><a href="#desc-syclfortran">6</a></sup></td>
<td class="status">
<svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="No direct support for OpenACC C/C++ is available for Intel GPUs. Intel offers a Python-based tool to translate source files with OpenACC C/C++ to OpenMP C/C++, the Application Migration Tool for OpenACC to OpenMP API."><a href="#desc-intelopenaccc">36</a></sup></td>
<td class="status">
<svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Also for OpenACC Fortran, no direct support is available for Intel GPUs. Intel's source-to-source translation tool from OpenACC to OpenMP also supports Fortran, though."><a href="#desc-intelopenaccfortran">37</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="OpenMP is a second key programming model for Intel GPUs and well-supported by Intel. For C++, the support is built into the commercial version of DPC++/C++, Intel oneAPI DPC++/C++. All OpenMP 4.5 and most OpenMP 5.0 and 5.1 features are supported. OpenMP can be enabled through the -qopenmp compiler option of icpx; a suitable offloading target can be given via -fopenmp-targets=spir64."><a href="#desc-intelopenmpc">38</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="OpenMP in Fortran is Intel's main selected route to bring Fortran applications to their GPUs. OpenMP offloading in Fortran is supported through Intel's Fortran Compiler ifx (the new LLVM-based version, not the Fortran Compiler Classic), part of the oneAPI HPC Toolkit. Similarly to C++, OpenMP offloading can be enabled through a combination of -qopenmp and -fopenmp-targets=spir64."><a href="#desc-intelopenmpfortran">39</a></sup></td>
<td class="status"><svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><svg height="12.64" overflow="visible" version="1.1" width="12.64"><g transform="translate(0,12.64) matrix(1 0 0 -1 0 0) translate(6.32,0) translate(0,6.32)" fill="#FBBC6A" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 6.32 6.32 L -6.32 6.32 L -6.32 -6.32 L 6.32 -6.32 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Intel supports C++ standard parallelism (pSTL) through the Open Source oneDPL (oneAPI DPC++ Library), also available as part of the oneAPI toolkit. It implements the pSTL on top of the DPC++ compiler; algorithms, data structures, and policies live in the oneapi::dpl:: namespace. In addition, Open SYCL is currently adding support for C++ parallel algorithms, to be enabled via the --hipsycl-stdpar compiler option."><a href="#desc-intelstandardc">40</a></sup></td>
<td class="status">
<svg height="11.92" overflow="visible" version="1.1" width="11.92"><g transform="translate(0,11.92) matrix(1 0 0 -1 0 0) translate(5.96,0) translate(0,5.96)" fill="#85924E" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 0 M 5.96 0 C 5.96 3.29 3.29 5.96 0 5.96 C -3.29 5.96 -5.96 3.29 -5.96 0 C -5.96 -3.29 -3.29 -5.96 0 -5.96 C 3.29 -5.96 5.96 -3.29 5.96 0 Z M 0 0" style="stroke:none"></path></g></svg><sup class="footnote" title="Standard language parallelism of Fortran is supported by Intel on their GPUs through the Intel Fortran Compiler ifx (the new, LLVM-based compiler, not the Classic version), part of the oneAPI HPC toolkit. In the oneAPI update 2022.1, the do concurrent support was added and extended in further releases. It can be used via the -qopenmp compiler option together with -fopenmp-target-do-concurrent and -fopenmp-targets=spir64."><a href="#desc-intelstandardfortran">41</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="No direct support by Intel for Kokkos is available, but Kokkos supports Intel GPUs through an experimental SYCL backend."><a href="#desc-intelkokkosc">42</a></sup></td>
<td class="status">
<svg height="16.17" overflow="visible" version="1.1" width="17"><g transform="translate(0,16.17) matrix(1 0 0 -1 0 0) translate(8.5,0) translate(0,7.23)" fill="#F38966" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -2.45 3.37 L -8.5 2.76 L -3.97 -1.29 L -5.25 -7.23 L 0 -4.17 L 5.25 -7.23 L 3.97 -1.29 L 8.5 2.76 L 2.45 3.37 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Kokkos is a C++ programming model, but an official compatibility layer for Fortran (Fortran Language Compatibility Layer, FLCL) is available. Through this layer, GPUs can be used as supported by Kokkos C++."><a href="#desc-nvidiakokkosfortran">14</a></sup></td>
<td class="status">
<svg height="13.4" overflow="visible" version="1.1" width="15.48"><g transform="translate(0,13.4) matrix(1 0 0 -1 0 0) translate(7.74,0) translate(0,4.47)" fill="#C7DB7F" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 0 8.94 L -7.74 -4.47 L 7.74 -4.47 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Since v.0.9.0, Alpaka contains experimental SYCL support with which Intel GPUs can be targeted. Also, Alpaka can fall back to an OpenMP backend."><a href="#desc-intelalpakac">43</a></sup></td>
<td class="status">
<svg height="9.45" overflow="visible" version="1.1" width="9.45"><g transform="translate(0,9.45) matrix(1 0 0 -1 0 0) translate(0.55,0) translate(0,0.55)" fill="#000000" stroke="#EB5F73" stroke-width="0.8pt" color="#000000"><path d="M 0 0 L 8.34 8.34" style="fill:none"></path></g></svg><sup class="footnote" title="Alpaka is a C++ programming model and no ready-made Fortran support exists."><a href="#desc-nvidiaalpakafortran">16</a></sup></td>
<td class="status">
<svg height="12.64" overflow="visible" version="1.1" width="12.64"><g transform="translate(0,12.64) matrix(1 0 0 -1 0 0) translate(6.32,0) translate(0,6.32)" fill="#FBBC6A" stroke="#000000" stroke-width="0.4pt" color="#000000"><path d="M 6.32 6.32 L -6.32 6.32 L -6.32 -6.32 L 6.32 -6.32 Z" style="stroke:none"></path></g></svg><sup class="footnote" title="Intel GPUs can be used from Python through three notable packages. First, Intel's Data Parallel Control (dpctl) implements low-level Python bindings to SYCL functionality. It is available on PyPI as dpctl. Second, at a higher level, Intel's Data-parallel Extension to Numba (numba-dpex) supplies an extension to the JIT functionality of Numba to support Intel GPUs. It is available from Anaconda as numba-dpex. Finally, and arguably at the highest level, Intel's Data Parallel Extension for NumPy (dpnp) builds on the NumPy API and extends some functions with Intel GPU support. It is available on PyPI as dpnp, although the latest versions appear to be available only on GitHub."><a href="#desc-intelpython">44</a></sup></td>
</tr>
</tbody>
</table>
<ul>
<li id="desc-cudac"><span class="number">1:</span> <span class="description">CUDA C/C++ is supported on NVIDIA GPUs through the <a href='https://developer.nvidia.com/cuda-toolkit'>CUDA Toolkit</a>. First released in 2007, the toolkit covers nearly all aspects of the NVIDIA platform: an API for programming (incl. language extensions), libraries, tools for profiling and debugging, compiler, management tools, and more. The current version is CUDA 12.2. Usually, when referring to <em>CUDA</em> without any additional context, the CUDA API is meant. While incorporating some Open Source components, the CUDA platform in its entirety is proprietary and closed source. The low-level CUDA instruction set architecture is PTX, to which higher-level languages like CUDA C/C++ are translated. PTX is compiled to SASS, the binary code executed on the device. As it is the reference for the platform, the support for NVIDIA GPUs through CUDA C/C++ is very comprehensive. In addition to support through the CUDA Toolkit, NVIDIA GPUs can also be <a href='https://llvm.org/docs/CompileCudaWithLLVM.html'>used by Clang</a>, utilizing the LLVM toolchain to emit PTX code and compile it subsequently.</span><span class="citation" data-cites="CUDA">[1]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-cudafortran"><span class="number">2:</span> <span class="description">CUDA Fortran, a proprietary Fortran extension by NVIDIA, is supported on NVIDIA GPUs via the <a href='https://developer.nvidia.com/hpc-sdk'>NVIDIA HPC SDK</a> (<em>NVHPC</em>). NVHPC implements most features of the CUDA API in Fortran and is activated through the <code>-cuda</code> switch in the <code>nvfortran</code> compiler. The CUDA extensions for Fortran are modeled closely after the CUDA C/C++ definitions. In addition to creating explicit kernels in Fortran, CUDA Fortran also supports <em>cuf kernels</em>, a way to let the compiler generate GPU parallel code automatically. Very recently, <a href='https://reviews.llvm.org/D150159'>CUDA Fortran support was also merged into Flang</a>, the LLVM-based Fortran compiler.</span><span class="citation" data-cites="CUDAFortran">[2]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiahip"><span class="number">3:</span> <span class="description"><a href='https://github.com/ROCm-Developer-Tools/HIP'>HIP</a> programs can directly use NVIDIA GPUs via a CUDA backend. As HIP is strongly inspired by CUDA, the mapping is relatively straightforward; API calls are named similarly (for example: <code>hipMalloc()</code> instead of <code>cudaMalloc()</code>) and keywords of the kernel syntax are identical. HIP also supports some CUDA libraries and creates interfaces to them (like <code>hipblasSaxpy()</code> instead of <code>cublasSaxpy()</code>). To target NVIDIA GPUs through the HIP compiler (<code>hipcc</code>), <code>HIP_PLATFORM=nvidia</code> needs to be set in the environment. To initially create HIP code from CUDA, AMD offers the <a href='https://github.com/ROCm-Developer-Tools/HIPIFY'>HIPIFY</a> conversion tool.</span><span class="citation" data-cites="HIP">[3]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiahipfortran"><span class="number">4:</span> <span class="description">No Fortran version of HIP exists; HIP is solely a C/C++ model. But AMD offers an extensive set of ready-made interfaces to the HIP API and HIP and ROCm libraries with <a href='https://github.com/ROCmSoftwarePlatform/hipfort'>hipfort</a> (MIT-licensed). All interfaces implement C functionality; CUDA-like Fortran extensions, for example to write kernels, are not available.</span><span class="citation" data-cites="hipfort">[4]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiasycl"><span class="number">5:</span> <span class="description">No direct support for <a href="https://www.khronos.org/sycl/">SYCL</a> is available by NVIDIA, but SYCL can be used on NVIDIA GPUs through multiple avenues. First, SYCL can be <a href="https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md#build-dpc-toolchain-with-support-for-nvidia-cuda">used through DPC++</a>, an Open-Source LLVM-based compiler project <a href="https://github.com/intel/llvm">led by Intel</a>. The DPC++ infrastructure is also available through Intel's commercial <a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html">oneAPI toolkit</a> (<em>Intel oneAPI DPC++/C++</em>) as <a href="https://developer.codeplay.com/products/oneapi/nvidia/2023.2.1/guides/get-started-guide-nvidia">a dedicated plugin</a>. Upstreaming SYCL support directly into LLVM is an <a href="https://github.com/intel/llvm/issues/49">ongoing effort</a>, which started <a href="https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html">in 2019</a>. Further, SYCL can be used via <a href="https://github.com/OpenSYCL/OpenSYCL/">Open SYCL</a> (previously called hipSYCL), an independently developed SYCL implementation, using NVIDIA GPUs either through the CUDA support of LLVM or the <code>nvc++</code> compiler of NVHPC. A third popular possibility was the NVIDIA GPU support in <a href="https://github.com/codeplaysoftware/sycl-for-cuda/tree/cuda">ComputeCpp of CodePlay</a>, though <a href="https://developer.codeplay.com/products/computecpp/ce/home/">the product became unsupported in September 2023</a>. In case LLVM is involved, SYCL implementations can rely on CUDA support in LLVM, which needs the CUDA toolkit available for the final compilation steps beyond PTX. 
To translate CUDA code to SYCL, Intel offers the <a href="https://github.com/oneapi-src/SYCLomatic">SYCLomatic</a> conversion tool.</span><span class="citation" data-cites="intelllvm opensyclproceedings">[5,6]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-syclfortran"><span class="number">6:</span> <span class="description">SYCL is a C++-based programming model (C++17) and by its nature does not support Fortran. Also, no pre-made bindings are available.</span><span class="citation" data-cites="khronossycl">[7]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-openaccc"><span class="number">7:</span> <span class="description">OpenACC C/C++ on NVIDIA GPUs is supported most extensively through the <a href="https://developer.nvidia.com/hpc-sdk">NVIDIA HPC SDK</a>. Beyond the bundled libraries, frameworks, and other models, the NVIDIA HPC SDK also features the <code>nvc</code>/<code>nvc++</code> compilers, in which <a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#acc-use">OpenACC support</a> can be enabled with the <code>-acc -gpu</code> options. The support of OpenACC in this vendor-delivered compiler is very comprehensive; it conforms to version 2.7 of the specification. A variety of compile options are available to modify the compilation process. In addition to NVIDIA HPC SDK, good support is also available in GCC since GCC 5.0, <a href="https://gcc.gnu.org/wiki/OpenACC">supporting OpenACC 2.6</a> through the <code>nvptx</code> architecture. The compiler switch to enable OpenACC in <code>gcc</code>/<code>g++</code> is <code>-fopenacc</code>; further options are available. Further, the <a href="https://csmd.ornl.gov/project/clacc">Clacc compiler</a> implements OpenACC support in the LLVM toolchain, adapting the Clang frontend. As a central design aspect, it translates OpenACC to OpenMP as part of the compilation process. OpenACC can be activated in a Clacc-<code>clang</code> via <code>-fopenacc</code>, and further compiler options exist, mostly leveraging OpenMP options. A recent study by <a href="https://ieeexplore.ieee.org/document/10029456">Jarmusch et al.</a> compared these compilers for coverage of the OpenACC 3.0 specification.</span><span class="citation" data-cites="nvhpc gccopenacc claccieee jarmusch22">[8–11]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-openaccfortran"><span class="number">8:</span> <span class="description">Support of OpenACC Fortran on NVIDIA GPUs is similar to OpenACC C/C++, albeit not identical. First, <a href="https://developer.nvidia.com/hpc-sdk">NVIDIA HPC SDK</a> supports OpenACC in Fortran through the included <code>nvfortran</code> compiler, with options like those for the C/C++ compilers. In addition, <a href="https://gcc.gnu.org/wiki/OpenACC">GCC also supports OpenACC</a> through the <code>gfortran</code> compiler with identical compiler options to the C/C++ compilers. Further, similar to OpenACC support in LLVM for C/C++ through <em>Clacc</em> contributions, the LLVM frontend for Fortran, <a href="https://flang.llvm.org/docs/">Flang</a> (the successor of <em>F18</em>, not <em>classic Flang</em>), <a href="https://flang.llvm.org/docs/OpenACC.html">supports OpenACC</a> as well. Support was initially contributed through the <a href="https://ieeexplore.ieee.org/document/9651310">Flacc project</a> and now resides in the main LLVM project. Finally, the <a href="https://www.hpe.com/psnow/doc/a50002303enw">HPE Cray Programming Environment</a> supports <a href="https://cpe.ext.hpe.com/docs/cce/man7/intro_openacc.7.html">OpenACC Fortran</a>; in <code>ftn</code>, OpenACC can be enabled through <code>-hacc</code>.</span><span class="citation" data-cites="nvhpc gccopenacc flaccieee">[8,9,12]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiaopenmpc"><span class="number">9:</span> <span class="description">OpenMP in C/C++ is supported on NVIDIA GPUs (<em>Offloading</em>) through multiple avenues, similarly to OpenACC. First, the NVIDIA HPC SDK supports <a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#openmp-use">OpenMP GPU offloading</a> in both <code>nvc</code> and <code>nvc++</code>, albeit only a subset of the entire OpenMP 5.0 standard (see <a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#openmp-subset">the documentation for supported/unsupported features</a>). The key compiler option is <code>-mp</code>. GCC also supports <a href="https://gcc.gnu.org/wiki/Offloading">OpenMP offloading</a> to NVIDIA GPUs; the compiler switch is <code>-fopenmp</code>, with options delivered through <a href="https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload"><code>-foffload</code> and <code>-foffload-options</code></a>. GCC <a href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/libgomp/OpenMP-Implementation-Status.html">currently supports OpenMP 4.5 entirely</a>, while OpenMP features of 5.0, 5.1, and 5.2 are currently being implemented. Similarly, Clang supports <a href="https://clang.llvm.org/docs/OffloadingDesign.html">OpenMP offloading to NVIDIA GPUs</a>, enabled through <code>-fopenmp -fopenmp-targets=nvptx64</code>, with offload architectures selected via <code>--offload-arch=native</code> (or similar). Clang implements <a href="https://clang.llvm.org/docs/OpenMPSupport.html#openmp-implementation-details">nearly all OpenMP 5.0 features and most of OpenMP 5.1/5.2</a>. In the HPE Cray Programming Environment, a <a href="https://cpe.ext.hpe.com/docs/cce/man7/intro_openmp.7.html">subset of OpenMP 5.0/5.1 is supported</a> for NVIDIA GPUs. It can be activated through <code>-fopenmp</code>. 
<a href="https://github.com/ROCm-Developer-Tools/aomp/">AOMP</a>, AMD's Clang/LLVM-based compiler, also supports NVIDIA GPUs. Support of OpenMP features in the compilers was recently discussed in the <a href="https://www.openmp.org/wp-content/uploads/2022_ECP_Community_BoF_Days-OpenMP_RoadMap_BoF.pdf">OpenMP ECP BoF 2022</a>.</span><span class="citation" data-cites="nvhpc gccopenmp clangopenmp hpepe">[8,13–15]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiaopenmpfortran"><span class="number">10:</span> <span class="description">OpenMP in Fortran is supported on NVIDIA GPUs nearly identically to C/C++. Support is implemented in <a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#openmp-use">NVIDIA HPC SDK's <code>nvfortran</code></a>, <a href="https://gcc.gnu.org/wiki/openmp">GCC's <code>gfortran</code></a>, <a href="https://flang.llvm.org/docs/">LLVM's Flang</a> (through <code>-mp</code>, and <a href="https://flang.llvm.org/docs/GettingStarted.html#openmp-target-offload-build">only when Flang is compiled via Clang</a>), and the <a href="https://cpe.ext.hpe.com/docs/cce/man7/intro_openmp.7.html">HPE Cray Programming Environment</a>.</span><span class="citation" data-cites="nvhpc gccopenmp hpepe flang">[8,13,15,16]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiastandardc"><span class="number">11:</span> <span class="description">Standard language parallelism of C++, namely algorithms and data structures of the <em>parallel STL</em>, is supported on NVIDIA GPUs <a href="https://docs.nvidia.com/hpc-sdk/compilers/c++-parallel-algorithms/index.html">through the <code>nvc++</code> compiler of the NVIDIA HPC SDK</a>. The key compiler option is <code>-stdpar=gpu</code>, which enables offloading of parallel algorithms to the GPU. Also, currently Open SYCL <a href="https://github.com/OpenSYCL/OpenSYCL/pull/1088">is in the process of implementing support for pSTL algorithms</a>, enabled via <code>--hipsycl-stdpar</code>. Further, <a href="https://intel.github.io/llvm-docs/GetStartedGuide.html#build-dpc-toolchain-with-support-for-nvidia-cuda">NVIDIA GPUs can be targeted from Intel's DPC++ compiler</a>, enabling usage of pSTL algorithms implemented in Intel's Open Source <a href="https://github.com/oneapi-src/oneDPL">oneDPL</a> (<em>oneAPI DPC++ Library</em>) on NVIDIA GPUs. Finally, a <a href="https://discourse.llvm.org/t/rfc-openmp-offloading-backend-for-c-parallel-algorithms/73468">current proposal in the LLVM community</a> aims at implementing pSTL support through an OpenMP backend.</span><span class="citation" data-cites="nvhpc opensyclproceedings onedpl">[6,8,17]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiastandardfortran"><span class="number">12:</span> <span class="description">Standard language parallelism of Fortran, mainly <code>do concurrent</code>, is supported on NVIDIA GPUs <a href="https://developer.nvidia.com/blog/accelerating-fortran-do-concurrent-with-gpus-and-the-nvidia-hpc-sdk/">through the <code>nvfortran</code> compiler of the NVIDIA HPC SDK</a>. As for the C++ case, it is enabled through the <code>-stdpar=gpu</code> compiler option.</span><span class="citation" data-cites="nvhpc">[8]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiakokkosc"><span class="number">13:</span> <span class="description"><a href="https://github.com/kokkos/kokkos">Kokkos</a> supports NVIDIA GPUs in C++. Kokkos has <a href="https://kokkos.github.io/kokkos-core-wiki/requirements.html">multiple backends</a> available with NVIDIA GPU support: a native CUDA C/C++ backend (using <code>nvcc</code>), an NVIDIA HPC SDK backend (using CUDA support in <code>nvc++</code>), and a Clang backend, using either Clang's CUDA support directly or <a href="https://docs.nersc.gov/development/programming-models/kokkos/">via the OpenMP offloading facilities</a> (via <code>clang++</code>).</span><span class="citation" data-cites="kokkos">[18]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiakokkosfortran"><span class="number">14:</span> <span class="description">Kokkos is a C++ programming model, but an official compatibility layer for Fortran (<a href="https://github.com/kokkos/kokkos-fortran-interop"><em>Fortran Language Compatibility Layer</em>, FLCL</a>) is available. Through this layer, GPUs can be used as supported by Kokkos C++.</span><span class="citation" data-cites="kokkos">[18]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiaalpakac"><span class="number">15:</span> <span class="description"><a href="https://github.com/alpaka-group/alpaka">Alpaka</a> supports NVIDIA GPUs in C++ (C++17), either through the NVIDIA CUDA C/C++ compiler <code>nvcc</code> or LLVM/Clang's support of CUDA in <code>clang++</code>.</span><span class="citation" data-cites="alpaka">[19]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiaalpakafortran"><span class="number">16:</span> <span class="description">Alpaka is a C++ programming model and no ready-made Fortran support exists.</span><span class="citation" data-cites="alpaka">[19]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-nvidiapython"><span class="number">17:</span> <span class="description">Using NVIDIA GPUs from Python code can be achieved through multiple avenues. NVIDIA itself offers <a href="https://github.com/NVIDIA/cuda-python">CUDA Python</a>, a package delivering low-level interfaces to CUDA C/C++. Typically, code is not directly written using CUDA Python, but rather CUDA Python functions as a backend for higher-level models. CUDA Python is available on PyPI as <a href="https://pypi.org/project/cuda-python/"><code>cuda-python</code></a>. An alternative to CUDA Python from the community is <a href="https://github.com/inducer/pycuda">PyCUDA</a>, which adds some higher-level features and functionality and comes with its own C++ base layer. PyCUDA is available on PyPI as <a href="https://pypi.org/project/pycuda/"><code>pycuda</code></a>. The most well-known, higher-level abstraction is <a href="https://cupy.dev/">CuPy</a>, which implements primitives known from NumPy with GPU support, offers functionality for defining custom kernels, and provides bindings to libraries. CuPy is available on PyPI as <a href="https://pypi.org/project/cupy-cuda12x/"><code>cupy-cuda12x</code></a> (for CUDA 12.x). Two packages arguably providing even higher abstractions are Numba and cuNumeric. <a href="http://numba.pydata.org/">Numba</a> offers access to NVIDIA GPUs and features acceleration of functions through Python decorators (<em>functions wrapping functions</em>); it is available as <a href="https://pypi.org/project/numba/"><code>numba</code></a> on PyPI. 
<a href="https://github.com/nv-legate/cunumeric">cuNumeric</a>, a project by NVIDIA, gives access to the GPU via NumPy-inspired functions (like CuPy), but utilizes the <a href="https://github.com/nv-legate/legate.core">Legate library</a> to transparently scale to multiple GPUs.</span><span class="citation" data-cites="cudapython pycuda cupy numba cunumeric">[20–24]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdcudac"><span class="number">18:</span> <span class="description">While CUDA is not directly supported on AMD GPUs, it can be translated to HIP through AMD's <a href="https://github.com/ROCm-Developer-Tools/HIPIFY">HIPIFY</a>. Using <code>hipcc</code> and <code>HIP_PLATFORM=amd</code> in the environment, CUDA-to-HIP-translated code can be executed.</span><span class="citation" data-cites="HIP">[3]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdcudafortran"><span class="number">19:</span> <span class="description">No direct support for CUDA Fortran on AMD GPUs is available, but AMD offers a source-to-source translator, <a href="https://github.com/ROCmSoftwarePlatform/gpufort">GPUFORT</a>, to convert some CUDA Fortran to either Fortran with OpenMP (via <a href="https://github.com/ROCm-Developer-Tools/aomp">AOMP</a>) or Fortran with HIP bindings and extracted C kernels (via <a href="https://github.com/ROCmSoftwarePlatform/hipfort">hipfort</a>). As stated in the project repository, the covered functionality is <a href="https://github.com/ROCmSoftwarePlatform/gpufort#limitations">driven by use-case requirements</a>; the last commit is two years old.</span><span class="citation" data-cites="gpufort">[25]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdhipc"><span class="number">20:</span> <span class="description"><a href="https://github.com/ROCm-Developer-Tools/HIP">HIP</a> C++ is the <em>native</em> programming model for AMD GPUs and, as such, fully supports the devices. It is part of AMD's GPU-targeted <a href="https://rocm.docs.amd.com/en/latest/">ROCm platform</a>, which includes compilers, libraries, tools, and drivers and mostly consists of Open Source Software. HIP code can be compiled with <a href="https://github.com/ROCm-Developer-Tools/HIPCC"><code>hipcc</code></a>, utilizing the correct environment variables (like <code>HIP_PLATFORM=amd</code>) and compiler options (like <code>--offload-arch=gfx90a</code>). <code>hipcc</code> is a <em>compiler driver</em> (wrapper script) which assembles the correct compilation string, finally calling <a href="https://github.com/RadeonOpenCompute/llvm-project">AMD's Clang compiler</a> to generate host/device code (using the <a href="https://llvm.org/docs/AMDGPUUsage.html">AMDGPU backend</a>).</span><span class="citation" data-cites="HIP">[3]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdsyclc"><span class="number">21:</span> <span class="description">AMD provides no direct support for SYCL on its GPU devices. But as in the NVIDIA ecosystem, SYCL C++ can be used on AMD GPUs through third-party software. First, <a href="https://github.com/OpenSYCL/OpenSYCL">Open SYCL</a> (previously <em>hipSYCL</em>) supports AMD GPUs, relying on HIP/ROCm support in Clang. All available <a href="https://github.com/OpenSYCL/OpenSYCL/blob/develop/doc/compilation.md">internal compilation models</a> can target AMD GPUs. Second, AMD GPUs can also be targeted through both <a href="https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md#build-dpc-toolchain-with-support-for-hip-amd">DPC++</a>, Intel's LLVM-based Open Source compiler, and the commercial version included in the <a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html">oneAPI toolkit</a> (via an <a href="https://developer.codeplay.com/products/oneapi/amd/2023.2.1/guides/get-started-guide-amd">AMD ROCm plugin</a>). In contrast to SYCL support for CUDA, no conversion tool like SYCLomatic exists.</span><span class="citation" data-cites="opensyclproceedings intelllvm">[5,6]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdopenaccc"><span class="number">22:</span> <span class="description">OpenACC C/C++ is not supported by AMD itself, but third-party support is available for AMD GPUs through GCC or Clacc (similarly to their support of OpenACC C/C++ for NVIDIA GPUs). In <a href="https://gcc.gnu.org/wiki/Offloading">GCC, OpenACC support</a> can be activated through <code>-fopenacc</code>, and further specified for AMD GPUs with, for example, <code>-foffload=amdgcn-amdhsa="-march=gfx906"</code>. <a href="https://csmd.ornl.gov/project/clacc">Clacc also supports OpenACC C/C++ on AMD GPUs</a> by translating OpenACC to OpenMP and using LLVM's AMD support. The enabling compiler switch is <code>-fopenacc</code>, and AMD GPU targets can be further specified by, for example, <code>-fopenmp-targets=amdgcn-amd-amdhsa</code>. <a href="https://github.com/intel/intel-application-migration-tool-for-openacc-to-openmp">Intel's OpenACC to OpenMP source-to-source translator</a> can also be used for AMD's platform.</span><span class="citation" data-cites="gccopenacc claccieee">[9,10]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdopenaccfortran"><span class="number">23:</span> <span class="description">No native support for OpenACC on AMD GPUs for Fortran is available, but AMD supplies <a href="https://github.com/ROCmSoftwarePlatform/gpufort">GPUFORT</a>, a research project for source-to-source translation of OpenACC Fortran to either Fortran with added OpenMP or Fortran with HIP bindings and extracted C kernels (using <a href="https://github.com/ROCmSoftwarePlatform/hipfort">hipfort</a>). The covered functionality of GPUFORT is driven by use-case requirements; the last commit is two years old. Support for OpenACC Fortran is also available from the community through <a href="https://gcc.gnu.org/onlinedocs/gfortran/OpenACC.html">GCC (<code>gfortran</code>)</a> and upcoming in <a href="https://ieeexplore.ieee.org/document/9651310">LLVM (Flacc)</a>. The <a href="https://cpe.ext.hpe.com/docs/cce/man7/intro_openacc.7.html">HPE Cray Programming Environment also supports OpenACC Fortran</a> on AMD GPUs. In addition, the <a href="https://github.com/intel/intel-application-migration-tool-for-openacc-to-openmp">translator tool to convert OpenACC source to OpenMP source by Intel</a> can be used.</span><span class="citation" data-cites="gpufort gccopenacc flaccieee">[9,12,25]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdopenmpc"><span class="number">24:</span> <span class="description">AMD offers <a href="https://github.com/ROCm-Developer-Tools/aomp">AOMP</a>, a dedicated, Clang-based compiler for using OpenMP C/C++ on AMD GPUs (<em>offloading</em>). AOMP is usually shipped with ROCm. The compiler <a href="https://www.exascaleproject.org/wp-content/uploads/2022/02/Elwasif-ECP-sollve_vv_final.pdf">supports most OpenMP 4.5 and some OpenMP 5.0 features</a>. Since the compiler is Clang-based, the usual Clang compiler options apply (<code>-fopenmp</code> to enable OpenMP parsing, and others). Also in the upstream Clang compiler, <a href="https://clang.llvm.org/docs/OffloadingDesign.html">AMD GPUs can be targeted through OpenMP</a>; as outlined for NVIDIA GPUs, the support for OpenMP 5.0 is nearly complete, and support for OpenMP 5.1/5.2 is comprehensive. In addition, the <a href="https://cpe.ext.hpe.com/docs/cce/man7/intro_openmp.7.html">HPE Cray Programming Environment</a> supports OpenMP on AMD GPUs.</span><span class="citation" data-cites="aomp ecpopenmpbof hpepe">[15,26,27]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdopenmpfortran"><span class="number">25:</span> <span class="description">Through <a href="https://github.com/ROCm-Developer-Tools/aomp">AOMP</a>, AMD supports OpenMP offloading to AMD GPUs in Fortran, using the <code>flang</code> executable and Clang-typical compiler options (foremost <code>-fopenmp</code>). Support for AMD GPUs is also available through the <a href="https://cpe.ext.hpe.com/docs/cce/man7/intro_openmp.7.html">HPE Cray Programming Environment</a>.</span><span class="citation" data-cites="aomp hpepe">[15,26]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdstandardc"><span class="number">26:</span> <span class="description">AMD does not yet provide production-grade support for Standard-language parallelism in C++ for their GPUs. Currently under development is <a href="https://github.com/ROCmSoftwarePlatform/roc-stdpar"><em>roc-stdpar</em></a> (ROCm Standard Parallelism Runtime Implementation), which aims to supply pSTL algorithms on the GPU and <a href="https://discourse.llvm.org/t/rfc-adding-c-parallel-algorithm-offload-support-to-clang-llvm/72159">merge the implementation with upstream LLVM</a>. Support for GPU-parallel algorithms is enabled with <code>-stdpar</code>. An <a href="https://discourse.llvm.org/t/rfc-openmp-offloading-backend-for-c-parallel-algorithms/73468">alternative proposal in the LLVM</a> community aims to support the pSTL via an OpenMP backend. Open SYCL <a href="https://github.com/OpenSYCL/OpenSYCL/pull/1088">is also in the process of creating support for C++ parallel algorithms</a> via a <code>--hipsycl-stdpar</code> switch. Through Open SYCL's backends, AMD GPUs are supported as well. Intel provides the Open Source <a href="https://github.com/oneapi-src/oneDPL">oneDPL</a> (<em>oneAPI DPC++ Library</em>) which <a href="https://oneapi-src.github.io/oneDPL/parallel_api_main.html">implements pSTL algorithms</a> through the DPC++ compiler (see also <em>C++ Standard Parallelism for Intel GPUs</em>). DPC++ has <a href="https://intel.github.io/llvm-docs/GetStartedGuide.html#build-dpc-toolchain-with-support-for-hip-amd">experimental support for AMD GPUs</a>.</span><span class="citation" data-cites="rocstdpar opensyclproceedings onedpl">[6,17,28]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdstandardfortran"><span class="number">27:</span> <span class="description">There is no (known) way to launch Standard-based parallel algorithms in Fortran on AMD GPUs.</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdkokkosc"><span class="number">28:</span> <span class="description"><a href="https://github.com/kokkos/kokkos">Kokkos</a> supports AMD GPUs in C++ mainly through the HIP/ROCm backend. Also, an OpenMP offloading backend is available.</span><span class="citation" data-cites="kokkos">[18]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdalpakac"><span class="number">29:</span> <span class="description"><a href="https://github.com/alpaka-group/alpaka">Alpaka</a> supports AMD GPUs in C++ through HIP or through an OpenMP backend.</span><span class="citation" data-cites="alpaka">[19]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-amdpython"><span class="number">30:</span> <span class="description">AMD does not officially support GPU programming with Python, but third-party solutions are available. <a href="https://docs.cupy.dev/en/latest/install.html#using-cupy-on-amd-gpu-experimental">CuPy</a> experimentally supports AMD GPUs/ROCm. The package can be found on PyPI as <code>cupy-rocm-5-0</code>. Numba once had <a href="https://numba.pydata.org/numba-doc/latest/roc/index.html">support for AMD GPUs</a>, but it is <a href="https://numba.readthedocs.io/en/stable/release-notes.html#version-0-54-0-19-august-2021">not maintained anymore</a>. Low-level bindings from Python to HIP exist, for example <a href="https://github.com/jatinx/PyHIP">PyHIP</a> (available as <code>pyhip-interface</code> on PyPI). Bindings to OpenCL also exist (<a href="https://documen.tician.de/pyopencl/">PyOpenCL</a>).</span><span class="citation" data-cites="cudapython">[20]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelcudac"><span class="number">31:</span> <span class="description">Intel itself does not support CUDA C/C++ on their GPUs. They offer <a href='https://github.com/oneapi-src/SYCLomatic'>SYCLomatic</a>, though, an Open Source tool to translate CUDA code to SYCL code, allowing it to run on Intel GPUs. The commercial variant of SYCLomatic is called the <a href='https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html'>DPC++ Compatibility Tool</a> and is bundled with the oneAPI toolkit. The community project <a href='https://github.com/CHIP-SPV/chipStar'>chipStar</a> (previously called CHIP-SPV; a 1.0 version was recently released) allows targeting Intel GPUs from CUDA C/C++ code by using the CUDA support in Clang. chipStar delivers a <a href='https://github.com/CHIP-SPV/chipStar/blob/main/docs/Using.md#compiling-cuda-application-directly-with-chipstar'>Clang-wrapper, <code>cuspv</code></a>, which replaces calls to <code>nvcc</code>. <a href='https://github.com/vosen/ZLUDA'>ZLUDA</a> also exists, which implements CUDA support for Intel GPUs; it is not maintained anymore, though.</span><span class="citation" data-cites="syclomatic chipstar oneapi">[29–31]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelcudafortran"><span class="number">32:</span> <span class="description">No direct support exists for CUDA Fortran on Intel GPUs. A simple example to bind SYCL to a (CUDA) Fortran program (via ISO C BINDING) can be <a href='https://github.com/codeplaysoftware/SYCL-For-CUDA-Examples/tree/master/examples/fortran_interface'>found on GitHub</a>.</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelhipc"><span class="number">33:</span> <span class="description">No native support for HIP C++ on Intel GPUs exists. The Open Source third-party project <a href="https://github.com/CHIP-SPV/chipStar">chipStar</a> (previously called CHIP-SPV), though, supports <a href="https://github.com/CHIP-SPV/chipStar/blob/main/docs/Using.md#compiling-a-hip-application-using-chipstar">HIP on Intel GPUs</a> by mapping it to OpenCL or Intel's Level Zero runtime. The compiler uses an LLVM-based toolchain and relies on its HIP and SPIR-V functionality.</span><span class="citation" data-cites="chipstar">[30]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelhipfortran"><span class="number">34:</span> <span class="description">HIP for Fortran does not exist, and neither do translation efforts for Intel GPUs.</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelsyclc"><span class="number">35:</span> <span class="description"><a href="https://www.khronos.org/sycl/">SYCL</a> is a C++17-based standard and was selected by Intel as the primary programming model for Intel GPUs. Intel implements SYCL support for their GPUs <a href="https://github.com/intel/llvm">via DPC++</a>, an LLVM-based compiler toolchain. Currently, Intel maintains its own fork of LLVM, but <a href="https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html">plans to upstream the changes</a> to the main LLVM repository. Based on DPC++, Intel releases a <a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html">commercial <em>Intel oneAPI DPC++</em> compiler</a> as part of the <a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html">oneAPI toolkit</a>. The third-party project Open SYCL also supports Intel GPUs, by leveraging/creating LLVM support (either SPIR-V or Level Zero). A previous solution for targeting Intel GPUs from SYCL was <a href="https://developer.codeplay.com/products/computecpp/ce/home/">Codeplay's ComputeCpp</a>. The project became unsupported in September 2023 (in favor of contributions to the DPC++ project).</span><span class="citation" data-cites="intelllvm oneapi opensyclproceedings">[5,6,31]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelopenaccc"><span class="number">36:</span> <span class="description">No direct support for OpenACC C/C++ is available for Intel GPUs. Intel offers a Python-based tool to translate source files with OpenACC C/C++ to OpenMP C/C++, the <a href="https://github.com/intel/intel-application-migration-tool-for-openacc-to-openmp"><em>Application Migration Tool for OpenACC to OpenMP API</em></a>.</span><span class="citation" data-cites="acc2mp">[32]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelopenaccfortran"><span class="number">37:</span> <span class="description">Also for OpenACC Fortran, no direct support is available for Intel GPUs. Intel's <a href="https://github.com/intel/intel-application-migration-tool-for-openacc-to-openmp">source-to-source translation tool from OpenACC to OpenMP</a> also supports Fortran, though.</span><span class="citation" data-cites="acc2mp">[32]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelopenmpc"><span class="number">38:</span> <span class="description">OpenMP is a second key programming model for Intel GPUs and <a href="https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html">well-supported by Intel</a>. For C++, the support is built into the commercial version of DPC++/C++, <em>Intel oneAPI DPC++/C++</em>. All <a href="https://www.intel.com/content/www/us/en/developer/articles/technical/openmp-features-and-extensions-supported-in-icx.html">OpenMP 4.5 and most OpenMP 5.0 and 5.1 features are supported</a>. OpenMP can be enabled through the <code>-qopenmp</code> compiler option of <code>icpx</code>; a suitable offloading target can be given via <code>-fopenmp-targets=spir64</code>.</span><span class="citation" data-cites="oneapi">[31]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelopenmpfortran"><span class="number">39:</span> <span class="description">OpenMP in Fortran is Intel's main selected route to bring Fortran applications to their GPUs. OpenMP offloading in Fortran is supported through <a href="https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-2/overview.html">Intel's Fortran Compiler <code>ifx</code></a> (the new LLVM-based version, not the <em>Fortran Compiler Classic</em>), part of the oneAPI HPC Toolkit. Similarly to C++, OpenMP offloading can be enabled through a combination of <code>-qopenmp</code> and <code>-fopenmp-targets=spir64</code>.</span><span class="citation" data-cites="oneapi">[31]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelstandardc"><span class="number">40:</span> <span class="description">Intel supports C++ standard parallelism (<em>pSTL</em>) through the Open Source <a href="https://oneapi-src.github.io/oneDPL/index.html">oneDPL</a> (oneAPI DPC++ Library), also available as part of the oneAPI toolkit. It <a href="https://oneapi-src.github.io/oneDPL/parallel_api_main.html">implements the pSTL</a> on top of the DPC++ compiler; algorithms, data structures, and policies live in the <code>oneapi::dpl::</code> namespace. In addition, <a href="https://github.com/OpenSYCL/OpenSYCL/pull/1088">Open SYCL is currently adding support for C++ parallel algorithms</a>, to be enabled via the <code>--hipsycl-stdpar</code> compiler option.</span><span class="citation" data-cites="onedpl">[17]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelstandardfortran"><span class="number">41:</span> <span class="description">Standard language parallelism of Fortran is supported by Intel on their GPUs through the Intel Fortran Compiler <code>ifx</code> (the new, LLVM-based compiler, not the <em>Classic</em> version), part of the oneAPI HPC toolkit. In the <a href="https://www.intel.com/content/www/us/en/developer/articles/release-notes/fortran-compiler-release-notes.html">oneAPI update 2022.1</a>, the <a href="https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-2/do-concurrent.html"><code>do concurrent</code> support</a> was added and extended in further releases. It can be used via the <code>-qopenmp</code> compiler option together with <code>-fopenmp-target-do-concurrent</code> and <code>-fopenmp-targets=spir64</code>.</span><span class="citation" data-cites="oneapi">[31]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelkokkosc"><span class="number">42:</span> <span class="description">No direct support by Intel for Kokkos is available, but <a href="https://kokkos.github.io/kokkos-core-wiki/">Kokkos</a> supports Intel GPUs through an experimental SYCL backend.</span><span class="citation" data-cites="kokkos">[18]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelalpakac"><span class="number">43:</span> <span class="description">Since <a href="https://github.com/alpaka-group/alpaka/releases/tag/0.9.0">v.0.9.0</a>, <a href="https://github.com/alpaka-group/alpaka">Alpaka</a> contains experimental SYCL support with which Intel GPUs can be targeted. Also, Alpaka can fall back to an OpenMP backend.</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
<li id="desc-intelpython"><span class="number">44:</span> <span class="description">Intel GPUs can be used from Python through three notable packages. First, Intel's <a href="https://github.com/IntelPython/dpctl"><em>Data Parallel Control</em> (dpctl)</a> implements low-level Python bindings to SYCL functionality. It is available on PyPI as <a href="https://pypi.org/project/dpctl/"><code>dpctl</code></a>. Second, at a higher level, Intel's <a href="https://github.com/IntelPython/numba-dpex"><em>Data-parallel Extension to Numba</em> (numba-dpex)</a> supplies an extension to the JIT functionality of Numba to support Intel GPUs. It is available from Anaconda as <a href="https://anaconda.org/intel/numba-dpex"><code>numba-dpex</code></a>. Finally, and arguably at the highest level, Intel's <a href="https://github.com/IntelPython/dpnp"><em>Data Parallel Extension for NumPy</em> (dpnp)</a> builds on the NumPy API and extends some functions with Intel GPU support. It is available on PyPI as <a href="https://pypi.org/project/dpnp/"><code>dpnp</code></a>, although the latest versions appear to be available <a href="https://github.com/IntelPython/dpnp/releases">only on GitHub</a>.</span><span class="citation" data-cites="dpctl numba-dpex dpnp">[33–35]</span><a href="#compat-table" class="back" title="Back to table">↺</a></li>
</ul>
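<p>The OpenMP offloading switches quoted in the descriptions above can be collected in a small lookup table. The following Python sketch is only an illustration under stated assumptions: the names <code>OPENMP_OFFLOAD_FLAGS</code>, <code>offload_command</code>, and <code>as_shell</code> as well as the file name <code>saxpy.cpp</code> are hypothetical, the flags for Clang on NVIDIA GPUs and for <code>icpx</code>/<code>ifx</code> are taken from the descriptions, and the Clang entry for AMD GPUs assumes that the <code>amdgcn-amd-amdhsa</code> target mentioned for Clacc is used in the same way for plain OpenMP offloading. The code only assembles command lines; it does not call any compiler.</p>

```python
import shlex

# Flags as quoted in the descriptions above. The (compiler, vendor) keys are
# hypothetical labels chosen for this sketch, not an official naming scheme.
OPENMP_OFFLOAD_FLAGS = {
    ("clang++", "nvidia"): ["-fopenmp", "-fopenmp-targets=nvptx64", "--offload-arch=native"],
    # Assumption: same target triple as given for Clacc on AMD GPUs.
    ("clang++", "amd"): ["-fopenmp", "-fopenmp-targets=amdgcn-amd-amdhsa"],
    ("icpx", "intel"): ["-qopenmp", "-fopenmp-targets=spir64"],
    ("ifx", "intel"): ["-qopenmp", "-fopenmp-targets=spir64"],
}


def offload_command(compiler, vendor, source, output="a.out"):
    """Return the argument list for an OpenMP offloading build."""
    try:
        flags = OPENMP_OFFLOAD_FLAGS[(compiler, vendor)]
    except KeyError:
        raise ValueError(f"no flags recorded for {compiler} on {vendor} GPUs") from None
    return [compiler, *flags, source, "-o", output]


def as_shell(command):
    """Render an argument list as a single shell-quoted string."""
    return shlex.join(command)


if __name__ == "__main__":
    # Print one command line per recorded (compiler, vendor) combination.
    for compiler, vendor in OPENMP_OFFLOAD_FLAGS:
        suffix = "f90" if compiler == "ifx" else "cpp"
        print(as_shell(offload_command(compiler, vendor, f"saxpy.{suffix}")))
```

<p>Keeping the flags in one table makes it easy to compare the vendors side by side; combinations that are not recorded raise a <code>ValueError</code> instead of silently producing a host-only build.</p>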
</section>
<section id="references">
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" data-line-spacing="2" role="list">
<div id="ref-CUDA" class="csl-entry" role="listitem">
1. NVIDIA. (2023). <em>CUDA toolkit</em>. <a href="https://developer.nvidia.com/cuda-toolkit">https://developer.nvidia.com/cuda-toolkit</a>
</div>
<div id="ref-CUDAFortran" class="csl-entry" role="listitem">
2. NVIDIA. (2023). <em>CUDA Fortran</em>. <a href="https://developer.nvidia.com/cuda-fortran">https://developer.nvidia.com/cuda-fortran</a>
</div>
<div id="ref-HIP" class="csl-entry" role="listitem">
3. AMD. (2023). <em>HIP</em>. <a href="https://rocm.docs.amd.com/projects/HIP/en/latest/">https://rocm.docs.amd.com/projects/HIP/en/latest/</a>
</div>
<div id="ref-hipfort" class="csl-entry" role="listitem">
4. AMD. (2023). <em>Hipfort</em>. <a href="https://rocm.docs.amd.com/projects/hipfort/en/latest/">https://rocm.docs.amd.com/projects/hipfort/en/latest/</a>
</div>
<div id="ref-intelllvm" class="csl-entry" role="listitem">
5. Intel, & Contributors. (2023). <em>oneAPI DPC++ compiler</em>. <a href="https://github.com/intel/llvm">https://github.com/intel/llvm</a>
</div>
<div id="ref-opensyclproceedings" class="csl-entry" role="listitem">
6. Alpay, A., Soproni, B., Wünsche, H., & Heuveline, V. (2022, May). Exploring the possibility of a <span class="nocase">hipSYCL</span>-based implementation of <span class="nocase">oneAPI</span>. <em>International Workshop on <span>OpenCL</span></em>. <a href="https://doi.org/10.1145/3529538.3530005">https://doi.org/10.1145/3529538.3530005</a>
</div>
<div id="ref-khronossycl" class="csl-entry" role="listitem">
7. Khronos Group. (2023). <em>SYCL</em>. <a href="https://www.khronos.org/sycl/">https://www.khronos.org/sycl/</a>
</div>
<div id="ref-nvhpc" class="csl-entry" role="listitem">
8. NVIDIA. (2023). <em>NVIDIA HPC SDK</em>. <a href="https://developer.nvidia.com/hpc-sdk">https://developer.nvidia.com/hpc-sdk</a>
</div>
<div id="ref-gccopenacc" class="csl-entry" role="listitem">
9. GCC. (2023). <em>GCC OpenACC</em>. <a href="https://gcc.gnu.org/wiki/OpenACC">https://gcc.gnu.org/wiki/OpenACC</a>
</div>
<div id="ref-claccieee" class="csl-entry" role="listitem">
10. Denny, J. E., Lee, S., & Vetter, J. S. (2018). CLACC: Translating OpenACC to OpenMP in Clang. <em>2018 IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)</em>, 18–29. <a href="https://doi.org/10.1109/LLVM-HPC.2018.8639349">https://doi.org/10.1109/LLVM-HPC.2018.8639349</a>
</div>
<div id="ref-jarmusch22" class="csl-entry" role="listitem">
11. Jarmusch, A., Liu, A., Munley, C., Horta, D., Ravichandran, V., Denny, J., Friedline, K., & Chandrasekaran, S. (2022). Analysis of validating and verifying OpenACC compilers 3.0 and above. <em>2022 Workshop on Accelerator Programming Using Directives (WACCPD)</em>, 1–10. <a href="https://doi.org/10.1109/WACCPD56842.2022.00006">https://doi.org/10.1109/WACCPD56842.2022.00006</a>
</div>
<div id="ref-flaccieee" class="csl-entry" role="listitem">
12. Clement, V., & Vetter, J. S. (2021). Flacc: Towards OpenACC support for Fortran in the LLVM ecosystem. <em>2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)</em>, 12–19. <a href="https://doi.org/10.1109/LLVMHPC54804.2021.00007">https://doi.org/10.1109/LLVMHPC54804.2021.00007</a>
</div>
<div id="ref-gccopenmp" class="csl-entry" role="listitem">
13. GCC Developers. (2023). <em>GCC OpenMP</em>. <a href="https://gcc.gnu.org/wiki/openmp">https://gcc.gnu.org/wiki/openmp</a>
</div>
<div id="ref-clangopenmp" class="csl-entry" role="listitem">
14. LLVM Developers. (2023). <em>Clang OpenMP</em>. <a href="https://clang.llvm.org/docs/OpenMPSupport.html">https://clang.llvm.org/docs/OpenMPSupport.html</a>
</div>
<div id="ref-hpepe" class="csl-entry" role="listitem">
15. HPE. (2023). <em>HPE Cray Programming Environment</em>. <a href="https://www.hpe.com/psnow/doc/a50002303enw">https://www.hpe.com/psnow/doc/a50002303enw</a>
</div>
<div id="ref-flang" class="csl-entry" role="listitem">
16. LLVM/Flang. (2023). <em>Flang</em>. <a href="https://flang.llvm.org/">https://flang.llvm.org/</a>
</div>
<div id="ref-onedpl" class="csl-entry" role="listitem">
17. Intel. (2023). <em>oneDPL</em>. <a href="https://oneapi-src.github.io/oneDPL/index.html">https://oneapi-src.github.io/oneDPL/index.html</a>
</div>
<div id="ref-kokkos" class="csl-entry" role="listitem">
18. Trott, C. R., Lebrun-Grandié, D., Arndt, D., Ciesko, J., Dang, V., Ellingwood, N., Gayatri, R., Harvey, E., Hollman, D. S., Ibanez, D., Liber, N., Madsen, J., Miles, J., Poliakoff, D., Powell, A., Rajamanickam, S., Simberg, M., Sunderland, D., Turcksin, B., & Wilke, J. (2022). Kokkos 3: Programming model extensions for the exascale era. <em>IEEE Transactions on Parallel and Distributed Systems</em>, <em>33</em>(4), 805–817. <a href="https://doi.org/10.1109/TPDS.2021.3097283">https://doi.org/10.1109/TPDS.2021.3097283</a>
</div>
<div id="ref-alpaka" class="csl-entry" role="listitem">
19. Matthes, A., Widera, R., Zenker, E., Worpitz, B., Huebl, A., & Bussmann, M. (2017, June 30). <em>Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the alpaka library</em>. <a href="https://arxiv.org/abs/1706.10086">https://arxiv.org/abs/1706.10086</a>
</div>
<div id="ref-cudapython" class="csl-entry" role="listitem">
20. NVIDIA. (2023). <em>CUDA Python</em>. <a href="https://nvidia.github.io/cuda-python/index.html">https://nvidia.github.io/cuda-python/index.html</a>
</div>
<div id="ref-pycuda" class="csl-entry" role="listitem">
21. Kloeckner, A., Wohlgemuth, G., Lee, G., Rybak, T., Nitz, A., Chiang, D., Seibert, S., Bergtholdt, M., Unterthiner, T., Markall, G., Kotak, M., Favre-Nicolin, V., Opanchuk, B., Merry, B., Pinto, N., Milo, F., Collignon, T., Rathgeber, F., Perkins, S., … Gohlke, C. (2023). <em>PyCUDA</em> (Version v2022.2.2) [Computer software]. Zenodo. <a href="https://doi.org/10.5281/zenodo.8121901">https://doi.org/10.5281/zenodo.8121901</a>
</div>
<div id="ref-cupy" class="csl-entry" role="listitem">
22. Okuta, R., Unno, Y., Nishino, D., Hido, S., & Loomis, C. (2017). CuPy: A NumPy-compatible library for NVIDIA GPU calculations. <em>Proceedings of Workshop on Machine Learning Systems (LearningSys) in the Thirty-First Annual Conference on Neural Information Processing Systems (NIPS)</em>. <a href="http://learningsys.org/nips17/assets/papers/paper_16.pdf">http://learningsys.org/nips17/assets/papers/paper_16.pdf</a>
</div>
<div id="ref-numba" class="csl-entry" role="listitem">
23. Lam, S. K., stuartarchibald, Pitrou, A., Florisson, M., Seibert, S., Markall, G., esc, Anderson, T. A., Leobas, G., rjenc29, Collison, M., luk-f-a, Bourque, J., Meurer, A., Kaustubh, Oliphant, T. E., Riasanovsky, N., Wang, M., densmirn, … Turner-Trauring, I. (2023). <em>Numba/numba: Version 0.57.1</em> (Version 0.57.1) [Computer software]. Zenodo. <a href="https://doi.org/10.5281/zenodo.8087361">https://doi.org/10.5281/zenodo.8087361</a>
</div>
<div id="ref-cunumeric" class="csl-entry" role="listitem">
24. NVIDIA. (2023). <em>cuNumeric</em>. <a href="https://developer.nvidia.com/cunumeric">https://developer.nvidia.com/cunumeric</a>
</div>
<div id="ref-gpufort" class="csl-entry" role="listitem">
25. AMD. (2023). <em>GPUFORT</em>. <a href="https://github.com/ROCmSoftwarePlatform/gpufort">https://github.com/ROCmSoftwarePlatform/gpufort</a>
</div>
<div id="ref-aomp" class="csl-entry" role="listitem">
26. AMD. (2023). <em>AOMP</em>. <a href="https://github.com/ROCm-Developer-Tools/aomp">https://github.com/ROCm-Developer-Tools/aomp</a>
</div>
<div id="ref-ecpopenmpbof" class="csl-entry" role="listitem">
27. Exascale Computing Project. (2022). <em>OpenMP roadmap for accelerators across DOE pre-exascale/exascale machines</em>. <a href="https://www.openmp.org/wp-content/uploads/2022_ECP_Community_BoF_Days-OpenMP_RoadMap_BoF.pdf">https://www.openmp.org/wp-content/uploads/2022_ECP_Community_BoF_Days-OpenMP_RoadMap_BoF.pdf</a>
</div>
<div id="ref-rocstdpar" class="csl-entry" role="listitem">
28. AMD. (2023). <em>roc-stdpar</em>. <a href="https://github.com/ROCmSoftwarePlatform/roc-stdpar">https://github.com/ROCmSoftwarePlatform/roc-stdpar</a>
</div>
<div id="ref-syclomatic" class="csl-entry" role="listitem">
29. Intel. (2023). <em>SYCLomatic</em>. <a href="https://github.com/oneapi-src/SYCLomatic">https://github.com/oneapi-src/SYCLomatic</a>
</div>
<div id="ref-chipstar" class="csl-entry" role="listitem">
30. Zhao, J., Bertoni, C., Young, J., Harms, K., Sarkar, V., & Videau, B. (2023). HIPLZ: Enabling performance portability for exascale systems. In J. Singer, Y. Elkhatib, D. Blanco Heras, P. Diehl, N. Brown, & A. Ilic (Eds.), <em>Euro-Par 2022: Parallel processing workshops</em> (pp. 197–210). Springer Nature Switzerland.
</div>
<div id="ref-oneapi" class="csl-entry" role="listitem">
31. Intel. (2023). <em>oneAPI</em>. <a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html">https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html</a>
</div>
<div id="ref-acc2mp" class="csl-entry" role="listitem">
32. Intel. (2023). <em>Intel application migration tool for OpenACC to OpenMP API</em>. <a href="https://github.com/intel/intel-application-migration-tool-for-openacc-to-openmp">https://github.com/intel/intel-application-migration-tool-for-openacc-to-openmp</a>
</div>
<div id="ref-dpctl" class="csl-entry" role="listitem">
33. Intel. (2023). <em>Data parallel control</em>. <a href="https://github.com/IntelPython/dpctl">https://github.com/IntelPython/dpctl</a>
</div>
<div id="ref-numba-dpex" class="csl-entry" role="listitem">
34. Intel. (2023). <em>Data-parallel extension to Numba</em>. <a href="https://github.com/IntelPython/numba-dpex">https://github.com/IntelPython/numba-dpex</a>
</div>
<div id="ref-dpnp" class="csl-entry" role="listitem">
35. Intel. (2023). <em>Data parallel extension for NumPy</em>. <a href="https://github.com/IntelPython/dpnp">https://github.com/IntelPython/dpnp</a>
</div>
</div>
</section> <!-- insert_here -->
</div>
</body>
</html>