Support for the AMD backend is implemented via the SYCL HIP backend. The feature is
disabled by default. Users must enable it at build time with the CMake option
`DNNL_GPU_VENDOR=AMD`. AMD GPUs can then be used via the oneDNN engine abstraction.
The engine should be created using the `dnnl::engine::kind::gpu` engine kind, or the
user can provide a `sycl::device` object that corresponds to an AMD GPU.
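For instance, a minimal sketch (assuming a SYCL 2020 compiler and the oneDNN v3 C++ API) showing both ways to create the engine:

```cpp
#include <sycl/sycl.hpp>
#include <oneapi/dnnl/dnnl.hpp>
#include <oneapi/dnnl/dnnl_sycl.hpp>

int main() {
    // Option 1: let oneDNN pick the GPU device for the gpu engine kind;
    // with a DNNL_GPU_VENDOR=AMD build this selects an AMD GPU.
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);

    // Option 2: build the engine from an explicit SYCL device and context.
    sycl::device dev{sycl::gpu_selector_v};
    sycl::context ctx{dev};
    dnnl::engine eng2 = dnnl::sycl_interop::make_engine(dev, ctx);
    return 0;
}
```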
The following dependencies are required:

- oneAPI DPC++ Compiler with support for HIP AMD, version 2022-12
- AMD ROCm, version 5.3 or newer
- MIOpen, version 2.18 or newer (optional if AMD ROCm includes the required version of MIOpen)
- rocBLAS, version 2.45.0 or newer (optional if AMD ROCm includes the required version of rocBLAS)
A typical build looks like:

```sh
export CC=/path/to/dpcpp/install/bin/clang
export CXX=/path/to/dpcpp/install/bin/clang++
mkdir build
cd build
cmake -DDNNL_CPU_RUNTIME=SYCL -DDNNL_GPU_RUNTIME=SYCL \
      -DDNNL_GPU_VENDOR=AMD -G Ninja ..
```
If AMD ROCm, MIOpen, or rocBLAS is installed in a non-standard location, or if you want to use a MIOpen or rocBLAS build that is not part of the AMD ROCm package, the following CMake and environment variables can be used to specify their location:

- `MIOPENROOT`
- `HIPROOT`
- `ROCBLASROOT`
Both buffer-based and USM-based oneDNN APIs are supported for the AMD backend.

The following table documents the data types supported in general; the data types supported by each primitive are listed in that primitive's section.
| Data Type | Computation Mode |
|---|---|
| f32 | Training, Inference |
| f16 | Inference |
| s8 | Inference (when applicable) |
| bf16 | Training, Inference (when applicable) |
The AMD backend cannot provide all functionality supported by oneDNN primitives because MIOpen and rocBLAS lack some features. The limitations of each MIOpen- and rocBLAS-based primitive are detailed below.
`miopenOpTensor` is the equivalent of the oneDNN binary primitive.

- Supported data types are `f32`, `f16`, `s32`.
- The data types of `SRC0`, `SRC1`, and `DST` should be the same.
- Supported formats are `NCDHW`, `NCHW`, `NCW`, `NC`, `N`.
- Blocked formats are not supported.
- Only the `scales` attribute is supported.
- Post-ops are not supported.
- Supported algorithms are `binary_add`, `binary_mul`, `binary_min`, `binary_max`.
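As an illustration, a minimal sketch (hypothetical shapes; oneDNN v3 C++ API assumed) of a `binary_add` that stays within these limits: f32 data, plain `NCHW` format, identical types for `SRC0`, `SRC1`, and `DST`, no post-ops:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);

    // Hypothetical NCHW tensor; all three tensors share type and format.
    auto md = dnnl::memory::desc({2, 16, 32, 32},
            dnnl::memory::data_type::f32, dnnl::memory::format_tag::nchw);

    auto pd = dnnl::binary::primitive_desc(
            eng, dnnl::algorithm::binary_add, md, md, md);

    dnnl::memory src0(md, eng), src1(md, eng), dst(md, eng);
    dnnl::binary(pd).execute(strm,
            {{DNNL_ARG_SRC_0, src0}, {DNNL_ARG_SRC_1, src1},
                    {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```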
`miopenConvolutionForwardImmediate` is used to compute the forward direction. `miopenConvolutionBackwardDataImmediate` and `miopenConvolutionBackwardWeightsImmediate` are used to compute backward by data and backward by weights, respectively.
The implementation supports both forward and backward directions.

Forward direction:

- Supported data type combinations:

  | Source | Weights | Destination | Bias |
  |---|---|---|---|
  | f32 | f32 | f32 | f32 |
  | f16 | f16 | f16 | f16 |
  | s8 | s8 | s8 | Not supported |
  | bf16 | bf16 | bf16 | Not supported |

- Supported formats: `NCDHW`, `NCHW`, `NCW` (with bias) and `NDHWC`, `NHWC`, `NWC` (without bias)
- Supported post-ops: `eltwise` (`eltwise_relu`, `eltwise_tanh`, `eltwise_elu`, `eltwise_logistic`) and `sum`
- Supported attributes: `scales`
- Supported algorithms: `winograd`, `direct`
Backward direction:

- Supported data type combinations:

  | Source | Weights | Destination | Bias |
  |---|---|---|---|
  | f32 | f32 | f32 | f32 |
  | bf16 | bf16 | bf16 | bf16 |

- Supported formats: `NCDHW`, `NCHW`, `NCW` (with bias) and `NDHWC`, `NHWC`, `NWC` (without bias)
- Supported algorithms: `winograd`, `direct`
General convolution limitations:

- Source, weights, and destination tensors must have the same format.
- A post-op sum scale with a non-zero fractional part can lead to incorrect results.
- Zero points are not supported.
- Post-ops are implemented via separate operations.
- Bias addition is implemented with `miopenOpTensor`.
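To illustrate, a minimal forward sketch (hypothetical shapes; oneDNN v3 C++ API assumed) within these limits, using f32 data, the `direct` algorithm, and `NCHW`/`OIHW` formats with bias:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);
    using tag = dnnl::memory::format_tag;
    using dt = dnnl::memory::data_type;

    // Hypothetical 2D convolution: N=1, IC=16, OC=32, 3x3 kernel, padding 1.
    auto src_md = dnnl::memory::desc({1, 16, 28, 28}, dt::f32, tag::nchw);
    auto wei_md = dnnl::memory::desc({32, 16, 3, 3}, dt::f32, tag::oihw);
    auto bia_md = dnnl::memory::desc({32}, dt::f32, tag::x);
    auto dst_md = dnnl::memory::desc({1, 32, 28, 28}, dt::f32, tag::nchw);

    auto pd = dnnl::convolution_forward::primitive_desc(eng,
            dnnl::prop_kind::forward_inference,
            dnnl::algorithm::convolution_direct, src_md, wei_md, bia_md,
            dst_md, /*strides=*/{1, 1}, /*padding_l=*/{1, 1},
            /*padding_r=*/{1, 1});

    dnnl::memory src(src_md, eng), wei(wei_md, eng), bia(bia_md, eng),
            dst(dst_md, eng);
    dnnl::convolution_forward(pd).execute(strm,
            {{DNNL_ARG_SRC, src}, {DNNL_ARG_WEIGHTS, wei},
                    {DNNL_ARG_BIAS, bia}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```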
The deconvolution primitive is implemented through the convolution with swapped input and output channels.

- Post-ops are not supported.
The implementation supports both forward and backward directions. `miopenCreateActivationDescriptor` and `miopenSetActivationDescriptor` are used to create the activation descriptor, and `miopenActivationForward` and `miopenActivationBackward` are used for the execution.
Forward direction:

- Supported formats: `NCDHW`, `NDHWC`, `NCHW`, `NHWC`, `NCW`, `NWC`, `NC`, `N`
- Supported algorithms: `relu`, `tanh`, `elu`, `soft_relu`, `abs`, `logistic`.
- `soft_relu` is only supported with `alpha = 1`.
- Supported data types are `f32` and `f16`.
- Post-ops are not supported.
Backward direction:

- Supported algorithms: `relu` and `soft_relu`.
- `soft_relu` is only supported with `alpha = 1`.
- Supported data types are `f32`.
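A minimal forward sketch (hypothetical shape; oneDNN v3 C++ API assumed) of an `eltwise_relu` on an f32 `NCHW` tensor:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);

    // Hypothetical 4D activation tensor.
    auto md = dnnl::memory::desc({1, 8, 14, 14},
            dnnl::memory::data_type::f32, dnnl::memory::format_tag::nchw);

    auto pd = dnnl::eltwise_forward::primitive_desc(eng,
            dnnl::prop_kind::forward_inference, dnnl::algorithm::eltwise_relu,
            md, md, /*alpha=*/0.f, /*beta=*/0.f);

    dnnl::memory src(md, eng), dst(md, eng);
    dnnl::eltwise_forward(pd).execute(
            strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```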
The implementation supports both forward and backward directions. The primitive is implemented using `miopenSoftmaxForward_V2` and `miopenSoftmaxBackward_V2`.
- Supported formats: `NCDHW`, `NDHWC`, `NCHW`, `NHWC`, `NCW`, `NWC`, `NC`.
- Only `axis = 1` is supported.
Forward direction:

- Supported data types: `f32` and `f16`.
- Post-ops are not supported.
Backward direction:

- Supported data types: `f32`.
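A minimal forward sketch (hypothetical shape; oneDNN v3 C++ API assumed) of a softmax over `axis = 1`, the only supported axis:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);

    // Hypothetical f32 NC tensor: 16 rows, 1000 classes.
    auto md = dnnl::memory::desc({16, 1000},
            dnnl::memory::data_type::f32, dnnl::memory::format_tag::nc);

    auto pd = dnnl::softmax_forward::primitive_desc(eng,
            dnnl::prop_kind::forward_inference,
            dnnl::algorithm::softmax_accurate, md, md, /*axis=*/1);

    dnnl::memory src(md, eng), dst(md, eng);
    dnnl::softmax_forward(pd).execute(
            strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```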
The implementation supports both forward and backward directions. `miopenCreateLRNDescriptor` and `miopenSetLRNDescriptor` are used to set the LRN descriptor, and `miopenLRNForward` and `miopenLRNBackward` are used for the execution.
- Supported formats: `NCHW`, `NHWC`, `NCW`, `NWC`, `NC`.
Forward direction:

- Supported data types: `f32`, `f16`.
- Supported algorithms: `lrn_across_channels`, `lrn_within_channel`. `lrn_within_channel` supports only 2D spatial cases.
- Post-ops are not supported.
Backward direction:

- Supported data types: `f32`.
- Supported algorithms: `lrn_across_channels`, `lrn_within_channel`. `lrn_within_channel` supports only 2D spatial cases.
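A minimal forward sketch (hypothetical shape and hyper-parameters; oneDNN v3 C++ API assumed) of a cross-channel LRN:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);

    // Hypothetical f32 NCHW tensor, cross-channel LRN with local size 5.
    auto md = dnnl::memory::desc({1, 16, 28, 28},
            dnnl::memory::data_type::f32, dnnl::memory::format_tag::nchw);

    auto pd = dnnl::lrn_forward::primitive_desc(eng,
            dnnl::prop_kind::forward_inference,
            dnnl::algorithm::lrn_across_channels, md, md,
            /*local_size=*/5, /*alpha=*/1e-4f, /*beta=*/0.75f, /*k=*/1.f);

    dnnl::memory src(md, eng), dst(md, eng);
    dnnl::lrn_forward(pd).execute(
            strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```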
The pooling primitive in the AMD backend is implemented with the following APIs:

- `miopenCreatePoolingDescriptor`, `miopenSetNdPoolingDescriptor`, `miopenSetPoolingIndexType`, and `miopenSetPoolingWorkSpaceIndexMode` are used to set the pooling descriptor.
- `miopenPoolingGetWorkSpaceSizeV2` is used for getting the workspace size.
- `miopenPoolingForward` and `miopenPoolingBackward` are used for the execution.
Forward direction:

- Supported data type for forward training: `f32`.
- Supported data types for forward inference: `f32`, `f16`.
- Only 1D, 2D, and 3D pooling is supported.
- Only `NCDHW`, `NCHW`, `NCW` formats are supported.
- Supported algorithms are `pooling_max`, `pooling_avg_include_padding`, `pooling_avg_exclude_padding`.
- Post-ops are not supported.
Backward direction:

- Supported data types are `f32`.
- Only 1D, 2D, and 3D pooling is supported.
- Only `NCDHW`, `NCHW`, `NCW` formats are supported.
- Supported algorithms are `pooling_max`, `pooling_avg_include_padding`, `pooling_avg_exclude_padding`.
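A minimal forward sketch (hypothetical shapes; oneDNN v3 C++ API assumed) of a 2x2 max pooling within these limits:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);
    using dt = dnnl::memory::data_type;
    using tag = dnnl::memory::format_tag;

    // Hypothetical 2x2 max pooling with stride 2 on an f32 NCHW tensor.
    auto src_md = dnnl::memory::desc({1, 16, 28, 28}, dt::f32, tag::nchw);
    auto dst_md = dnnl::memory::desc({1, 16, 14, 14}, dt::f32, tag::nchw);

    auto pd = dnnl::pooling_forward::primitive_desc(eng,
            dnnl::prop_kind::forward_inference, dnnl::algorithm::pooling_max,
            src_md, dst_md, /*strides=*/{2, 2}, /*kernel=*/{2, 2},
            /*dilation=*/{0, 0}, /*padding_l=*/{0, 0}, /*padding_r=*/{0, 0});

    dnnl::memory src(src_md, eng), dst(dst_md, eng);
    dnnl::pooling_forward(pd).execute(
            strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```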
The reduction primitive is implemented with the following APIs:

- `miopenCreateReduceTensorDescriptor` and `miopenSetReduceTensorDescriptor` are used to set the reduction tensor descriptor.
- `miopenGetReductionWorkspaceSize` is used for getting the workspace size.
- `miopenReduceTensor` is used for execution.

Limitations:

- Supported data types are `f32`, `f16`.
- Only `NCDHW`, `NCHW`, `NCW`, `NC`, `N` formats are supported.
- Supported algorithms are `reduction_max`, `reduction_min`, `reduction_sum`, `reduction_mul`, `reduction_mean`, `reduction_norm_lp_sum`, `reduction_norm_lp_power_p_sum`.
- The `reduction_norm_lp_sum` algorithm is supported only for the `p` value 2.
- `reduction_norm_lp_power_p_sum` is supported only for the `p` value 1.
- Only `eps = 0` is supported.
- Post-ops are not supported.
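A minimal sketch (hypothetical shapes; oneDNN v3 C++ API assumed) of a mean reduction over the spatial dimensions, with the required `eps = 0`:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);
    using dt = dnnl::memory::data_type;
    using tag = dnnl::memory::format_tag;

    // Hypothetical mean reduction over H and W of an f32 NCHW tensor;
    // reduced dimensions have size 1 in the destination.
    auto src_md = dnnl::memory::desc({1, 16, 28, 28}, dt::f32, tag::nchw);
    auto dst_md = dnnl::memory::desc({1, 16, 1, 1}, dt::f32, tag::nchw);

    auto pd = dnnl::reduction::primitive_desc(eng,
            dnnl::algorithm::reduction_mean, src_md, dst_md,
            /*p=*/0.f, /*eps=*/0.f); // only eps = 0 is supported

    dnnl::memory src(src_md, eng), dst(dst_md, eng);
    dnnl::reduction(pd).execute(
            strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```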
The matrix multiplication primitive is implemented with the `rocblas_gemm_ex` and `rocblas_gemm_strided_batched_ex` functions.

- Supported data types are `f32`, `f16`, `bf16`, and `s8/s32`.
- Currently, only the five combinations below are supported:

  | Source | Weights | Destination | Bias |
  |---|---|---|---|
  | f32 | f32 | f32 | f32 |
  | f16 | f16 | f16 | f16 |
  | s8 | s8 | s32 | s32 |
  | bf16 | bf16 | bf16 | Not supported |
  | bf16 | bf16 | f32 | f32 |

- Blocked formats are not supported.
- Zero points are not supported.
- Scales are not supported.
- The `eltwise` post-op with `eltwise_relu`, `eltwise_tanh`, `eltwise_elu`, `eltwise_logistic` is supported.
- The `sum` post-op is supported. For the s8 case (the third combination in the table above), only scales without a fractional part are supported.
- Source and weights broadcasting is supported in the batched case.
- Only 1D, 2D, and 3D tensors are supported.
- Supported formats are `NCW`, `NWC`, `NC`, `CN`, `N`.
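A minimal sketch (hypothetical shapes; oneDNN v3 C++ API assumed) of an f32 matrix multiplication within these limits:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);
    using dt = dnnl::memory::data_type;
    using tag = dnnl::memory::format_tag;

    // Hypothetical f32 GEMM: (64 x 128) * (128 x 256) -> (64 x 256).
    auto src_md = dnnl::memory::desc({64, 128}, dt::f32, tag::nc);
    auto wei_md = dnnl::memory::desc({128, 256}, dt::f32, tag::nc);
    auto dst_md = dnnl::memory::desc({64, 256}, dt::f32, tag::nc);

    auto pd = dnnl::matmul::primitive_desc(eng, src_md, wei_md, dst_md);

    dnnl::memory src(src_md, eng), wei(wei_md, eng), dst(dst_md, eng);
    dnnl::matmul(pd).execute(strm,
            {{DNNL_ARG_SRC, src}, {DNNL_ARG_WEIGHTS, wei},
                    {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```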
The inner product primitive is implemented with the `rocblas_gemm_ex` and `rocblas_gemm_strided_batched_ex` functions for forward, backward data, and backward weights, and with `miopenReduceTensor` for backward bias. The functions `gemm_consistency_check()` and `dense_check()` are used to check whether the backend can be used for the inner product; `reorder_check()` is used when a reorder is required. The `miopenActivationForward` operation is used for the eltwise operation, and `miopenOpTensor` is used for the bias operation. The `beta` parameter in GEMM is used for the sum scale, and the `alpha` parameter is used for the output scale.
Forward direction:

- Supported formats: `NCW`, `NC`, `CN`, `N`
- Supported data types are `f32`, `f16`, `bf16`, and `s8/s32`.
- Currently, only the combinations below are supported:

  | Source | Weights | Destination | Bias |
  |---|---|---|---|
  | f32 | f32 | f32 | f32 |
  | f16 | f16 | f16 | f16 |
  | s8 | s8 | s32 | s32 |
  | bf16 | bf16 | bf16 | Not supported |
  | bf16 | bf16 | f32 | f32 |

- Zero points are not supported.
- The eltwise post-op with `eltwise_relu`, `eltwise_tanh`, `eltwise_elu`, `eltwise_logistic` is supported.
- The sum post-op is supported. For the s8 case (the third combination in the table above), only integer sum scale values are supported.
- Blocked formats are not supported.
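A minimal forward sketch (hypothetical shapes; oneDNN v3 C++ API assumed) of an f32 inner product with bias:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);
    using dt = dnnl::memory::data_type;
    using tag = dnnl::memory::format_tag;

    // Hypothetical fully connected layer: 32 samples, 512 -> 256 features.
    auto src_md = dnnl::memory::desc({32, 512}, dt::f32, tag::nc);
    auto wei_md = dnnl::memory::desc({256, 512}, dt::f32, tag::oi);
    auto bia_md = dnnl::memory::desc({256}, dt::f32, tag::x);
    auto dst_md = dnnl::memory::desc({32, 256}, dt::f32, tag::nc);

    auto pd = dnnl::inner_product_forward::primitive_desc(eng,
            dnnl::prop_kind::forward_inference, src_md, wei_md, bia_md,
            dst_md);

    dnnl::memory src(src_md, eng), wei(wei_md, eng), bia(bia_md, eng),
            dst(dst_md, eng);
    dnnl::inner_product_forward(pd).execute(strm,
            {{DNNL_ARG_SRC, src}, {DNNL_ARG_WEIGHTS, wei},
                    {DNNL_ARG_BIAS, bia}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```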
Backward direction:

- Supported data types are `f32`, `bf16`.
- Currently, only the combinations below are supported:

  | Propagation | Source | Weights | Destination | Bias |
  |---|---|---|---|---|
  | Backward Data | f32 | f32 | f32 | Not supported |
  | Backward Data | bf16 | bf16 | bf16 | Not supported |
  | Backward Weights | f32 | f32 | f32 | f32 |
  | Backward Weights | bf16 | bf16 | bf16 | Not supported |

- Zero points are not supported.
- Blocked formats are not supported.
The closest equivalents to oneDNN batch normalization are the `miopenBatchNormalizationForwardTraining`, `miopenBatchNormalizationForwardInference`, and `miopenBatchNormalizationBackward` operations.
Forward direction:

- When the `global_stats` flag is set for batch normalization, the mean and variance are input-only parameters. However, MIOpen does not have an option to accept the mean and variance as inputs in the forward training operation, so `miopenBatchNormalizationForwardInference` is used to match the oneDNN feature. Note, however, that inference is not supported without the `global_stats` flag set.
- The MIOpen precision differs from that of oneDNN for batch normalization (e.g., expected from oneDNN: 0.000629427, got from MIOpen: 0.000629831, diff: 4.04136e-07, rdiff: 0.000642069).
- Forward training with no flags accepts the mean and variance as outputs. However, in MIOpen the mean and variance are the running mean and variance respectively, so they are both input and output variables. Therefore, they are required to have a sensible value (cannot be NaN). Since oneDNN will not set a value for the mean and variance when no flag is passed, NaN can be propagated as a result. To avoid NaN propagation, the `hipMemsetD32Async` function is used to initialize the mean and variance with zero.
- MIOpen requires values for scale and shift. When shift and scale are not defined in oneDNN, `hipMemsetD32Async` is used to initialize the scale to 1 and the shift to 0.
- For performance reasons in the backward pass, MIOpen requires the mean and inverse variance to be saved in the forward pass. Therefore, when the AMD backend is used for batch normalization, the workspace must be provided to save the mean and inverse variance.
- When the `dnnl_fuse_norm_relu` flag is set for batch normalization, the `miopenActivationForward` operation is called immediately after the batch normalization, since MIOpen does not have a batch normalization fused with `RELU`. The implementation of the elementwise post operations is the same.
- When `dnnl_fuse_norm_relu` is used, the intermediate output of the batch normalization, which is used as an input to the activation function, is saved in the workspace as well. This is required to compute the backward pass for the `dnnl_fuse_norm_relu` flag.
- The forward pass supports `f32`, `f16`.
- Blocked formats are not supported.
- Only `NCDHW`, `NCHW`, `NCW`, `NC` formats are supported.
- The elementwise post-op is supported only for `eltwise_relu`.
Backward direction:

- MIOpen uses `alpha` and `beta` parameters to blend the `dy`, `shift`, and `scale`. Since oneDNN does not have this feature, the `alpha` and `beta` values in the backward direction are set to 1 and 0 respectively to avoid blending.
- The AMD backend for the backward direction requires the workspace as an input containing the mean and inverse variance computed in the forward pass (a forward sketch providing this workspace follows this list).
- The AMD backend for oneDNN does not support the backward direction for batch normalization when the `global_stats` flag is set.
- When the `dnnl_fuse_norm_relu` flag is set, the AMD backend requires the intermediate result of the batch normalization saved in the forward pass. This is used to compute the backward direction of the activation function used for `RELU`.
- The backward pass supports only the `f32` data type.
- Blocked formats are not supported.
- Only `NCDHW`, `NCHW`, `NCW`, `NC` formats are supported.
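A minimal forward-training sketch (hypothetical shapes; oneDNN v3 C++ API assumed) that enables scale/shift and provides the workspace required by the AMD backend:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);
    using dt = dnnl::memory::data_type;
    using tag = dnnl::memory::format_tag;

    // Hypothetical f32 NCHW tensor with C = 16 channels.
    auto src_md = dnnl::memory::desc({4, 16, 28, 28}, dt::f32, tag::nchw);
    auto sc_md = dnnl::memory::desc({16}, dt::f32, tag::x);

    auto pd = dnnl::batch_normalization_forward::primitive_desc(eng,
            dnnl::prop_kind::forward_training, src_md, src_md,
            /*epsilon=*/1e-5f,
            dnnl::normalization_flags::use_scale
                    | dnnl::normalization_flags::use_shift);

    dnnl::memory src(src_md, eng), dst(src_md, eng);
    dnnl::memory scale(sc_md, eng), shift(sc_md, eng);
    dnnl::memory mean(pd.mean_desc(), eng), var(pd.variance_desc(), eng);
    // The AMD backend saves the mean and inverse variance in the workspace
    // for reuse in the backward pass.
    dnnl::memory ws(pd.workspace_desc(), eng);

    dnnl::batch_normalization_forward(pd).execute(strm,
            {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst},
                    {DNNL_ARG_SCALE, scale}, {DNNL_ARG_SHIFT, shift},
                    {DNNL_ARG_MEAN, mean}, {DNNL_ARG_VARIANCE, var},
                    {DNNL_ARG_WORKSPACE, ws}});
    strm.wait();
    return 0;
}
```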
The `miopenTransform` function is the equivalent of the oneDNN reorder function.

- Per-dimension scaling is not supported (a single alpha and beta value is accepted by the transform tensor function).
- Supported data types: `f32`
Some missing primitives/features are supported through generic SYCL kernels.