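This example illustrates the use of the hipBLAS Level 3 strided batched general matrix multiplication (GEMM). The hipBLAS strided batched GEMM performs a matrix-matrix operation for a batch of matrices as:

$C_i = \alpha \cdot f(A_i) \cdot f(B_i) + \beta \cdot C_i$

for each matrix index $i$ in the batch, where $\alpha$ and $\beta$ are scalars and $f(X)$ is one of the following:

- $f(X) = X$ or
- $f(X) = X^T$ (transpose $X$: $X_{ij}^T = X_{ji}$) or
- $f(X) = X^H$ (Hermitian $X$: $X_{ij}^H = \bar X_{ji}$).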
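
The operation above can be sketched on the CPU as follows. This is a minimal reference under stated assumptions, not the code of the example itself: the names `Op`, `op_elem` and `gemm_strided_batched_ref` are hypothetical, the data is real-valued (so the Hermitian case is omitted), and matrices are stored in column-major order as in BLAS.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for hipblasOperation_t (real-valued data only, so no conjugate case).
enum class Op { N, T };

// Element (row, col) of f(X) for a column-major matrix X with leading dimension ld.
inline float op_elem(const float* x, int ld, Op op, int row, int col)
{
    return op == Op::N ? x[row + static_cast<std::size_t>(col) * ld]
                       : x[col + static_cast<std::size_t>(row) * ld];
}

// CPU reference: C_i = alpha * f(A_i) * f(B_i) + beta * C_i for i in [0, batch_count).
// f(A_i) is m x k, f(B_i) is k x n and C_i is m x n.
void gemm_strided_batched_ref(Op trans_a, Op trans_b, int m, int n, int k,
                              float alpha, const float* a, int lda, long long stride_a,
                              const float* b, int ldb, long long stride_b,
                              float beta, float* c, int ldc, long long stride_c,
                              int batch_count)
{
    assert(m > 0 && n > 0 && k > 0 && batch_count >= 0);
    for(int i = 0; i < batch_count; ++i)
    {
        // The i-th matrix of each batch starts i * stride elements after the first one.
        const float* a_i = a + i * stride_a;
        const float* b_i = b + i * stride_b;
        float*       c_i = c + i * stride_c;
        for(int col = 0; col < n; ++col)
        {
            for(int row = 0; row < m; ++row)
            {
                float acc = 0.0f;
                for(int p = 0; p < k; ++p)
                {
                    acc += op_elem(a_i, lda, trans_a, row, p) * op_elem(b_i, ldb, trans_b, p, col);
                }
                float& out = c_i[row + static_cast<std::size_t>(col) * ldc];
                out        = alpha * acc + beta * out;
            }
        }
    }
}
```

With $B_i$ set to the identity, the result reduces to $C_i = \alpha \cdot f(A_i) + \beta \cdot C_i$, which makes the output easy to check by hand.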

The application performs the following steps:

- Read in command-line parameters.
- Set the $f$ operation, set the sizes of the matrices and get the batch count.
- Allocate and initialize the host matrices. Set up the $B$ matrix as an identity matrix.
- Initialize the gold standard matrix.
- Compute the CPU reference result over the strided batched matrices.
- Allocate device memory.
- Copy data from host to device.
- Create a hipBLAS handle.
- Invoke the hipBLAS GEMM STRIDED BATCHED function.
- Copy the result from device to host.
- Destroy the hipBLAS handle, release device memory.
- Validate the output by comparing it to the CPU reference result.
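
Two of the host-side steps above, setting up $B$ as a batch of identity matrices and validating the result against the CPU reference, could look roughly like the sketch below. The helper names `make_identity_batch` and `count_mismatches`, the dense back-to-back storage, and the relative-error criterion are assumptions made for illustration; the example may implement these steps differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a batch of column-major rows x cols matrices with ones on the main diagonal.
// Assumes dense storage: leading dimension = rows, stride = rows * cols.
std::vector<float> make_identity_batch(int rows, int cols, int batch_count)
{
    assert(rows > 0 && cols > 0 && batch_count > 0);
    const std::size_t  stride = static_cast<std::size_t>(rows) * cols;
    std::vector<float> b(stride * batch_count, 0.0f);
    for(int i = 0; i < batch_count; ++i)
    {
        for(int d = 0; d < std::min(rows, cols); ++d)
        {
            b[i * stride + d + static_cast<std::size_t>(d) * rows] = 1.0f;
        }
    }
    return b;
}

// Count the entries whose error relative to the reference exceeds eps.
std::size_t count_mismatches(const std::vector<float>& result,
                             const std::vector<float>& reference,
                             float                     eps)
{
    assert(result.size() == reference.size());
    std::size_t errors = 0;
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        // Scale the tolerance by the magnitude of the reference value (at least 1).
        const float scale = std::max(1.0f, std::fabs(reference[i]));
        if(std::fabs(result[i] - reference[i]) > eps * scale)
        {
            ++errors;
        }
    }
    return errors;
}
```

A validation step would then treat a mismatch count of zero as success.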

The application provides the following optional command line arguments:

- `-a` or `--alpha`. The scalar value $\alpha$ used in the GEMM operation. Its default value is 1.
- `-b` or `--beta`. The scalar value $\beta$ used in the GEMM operation. Its default value is 1.
- `-c` or `--count`. Batch count. Its default value is 3.
- `-m` or `--m`. The number of rows of matrices $f(A)$ and $C$, which must be greater than 0. Its default value is 5.
- `-n` or `--n`. The number of columns of matrices $f(B)$ and $C$, which must be greater than 0. Its default value is 5.
- `-k` or `--k`. The number of columns of matrix $f(A)$ and rows of matrix $f(B)$, which must be greater than 0. Its default value is 5.

The example relies on the following hipBLAS APIs and concepts:

- The performance of numerical multi-linear algebra code can be improved considerably by using tensor contractions [ Y. Shi et al., HiPC, pp 193, 2016. ], which is why most of the hipBLAS functions have `_batched` and `_strided_batched` [ C. Jhurani and P. Mullowney, JPDP Vol 75, pp 133, 2015. ] extensions.

  The same multiplication operator can be applied to several matrices by combining them into a batch. Batched matrix multiplication improves performance for a large number of small matrices. When the stride between matrices is constant, strided batched GEMM provides further acceleration.
- hipBLAS is initialized by calling `hipblasCreate(hipblasHandle_t*)` and it is terminated by calling `hipblasDestroy(hipblasHandle_t)`.
- The pointer mode controls whether scalar parameters must be allocated on the host (`HIPBLAS_POINTER_MODE_HOST`) or on the device (`HIPBLAS_POINTER_MODE_DEVICE`). It is controlled by `hipblasSetPointerMode`.
- The $f$ operator, defined in the description above, can be
  - `HIPBLAS_OP_N`: identity operator ($f(X) = X$),
  - `HIPBLAS_OP_T`: transpose operator ($f(X) = X^T$) or
  - `HIPBLAS_OP_C`: Hermitian (conjugate transpose) operator ($f(X) = X^H$).
- `hipblasStride`: stride between matrices or vectors in `_strided_batched` functions.
- `hipblas[HSDCZ]gemmStridedBatched`

  Depending on the character matched in `[HSDCZ]`, the GEMM can be performed with different precisions:

  - `H` (half-precision: `hipblasHalf`)
  - `S` (single-precision: `float`)
  - `D` (double-precision: `double`)
  - `C` (single-precision complex: `hipblasComplex`)
  - `Z` (double-precision complex: `hipblasDoubleComplex`).

  Input parameters for `hipblasSgemmStridedBatched`:

  - `hipblasHandle_t handle`
  - `hipblasOperation_t trans_a`: transformation operator on each $A_i$ matrix
  - `hipblasOperation_t trans_b`: transformation operator on each $B_i$ matrix
  - `int m`: number of rows in each $f(A_i)$ and $C_i$ matrix
  - `int n`: number of columns in each $f(B_i)$ and $C_i$ matrix
  - `int k`: number of columns in each $f(A_i)$ matrix and number of rows in each $f(B_i)$ matrix
  - `const float *alpha`: scalar multiplier of each $f(A_i) \cdot f(B_i)$ matrix product
  - `const float *A`: pointer to the first $A_i$ matrix of the batch
  - `int lda`: leading dimension of each $A_i$ matrix
  - `long long stride_a`: stride, in elements, between consecutive $A_i$ matrices
  - `const float *B`: pointer to the first $B_i$ matrix of the batch
  - `int ldb`: leading dimension of each $B_i$ matrix
  - `long long stride_b`: stride, in elements, between consecutive $B_i$ matrices
  - `const float *beta`: scalar multiplier of each $C_i$ matrix in the addition
  - `float *C`: pointer to the first $C_i$ matrix of the batch
  - `int ldc`: leading dimension of each $C_i$ matrix
  - `long long stride_c`: stride, in elements, between consecutive $C_i$ matrices
  - `int batch_count`: number of matrices in the batch

  Return value: `hipblasStatus_t`
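
The relation between the operator, the matrix sizes, the leading dimension and the stride can be illustrated with a small host-side helper. This is only a sketch: `BatchLayout` and `make_layout` are hypothetical names, and it assumes column-major matrices that are densely packed and stored back to back, which is the smallest valid choice. hipBLAS also accepts larger leading dimensions and strides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of how one batch (A, B or C) is laid out in memory.
struct BatchLayout
{
    int         rows;   // rows of the stored (untransformed) matrix
    int         cols;   // columns of the stored matrix
    int         ld;     // leading dimension (column-major, densely packed)
    long long   stride; // elements between the starts of consecutive matrices
    std::size_t size;   // total number of elements in the batch
};

// op_rows x op_cols is the shape of f(X); `transposed` is true for the
// transpose and Hermitian operators, in which case X is stored as op_cols x op_rows.
BatchLayout make_layout(bool transposed, int op_rows, int op_cols, int batch_count)
{
    assert(op_rows > 0 && op_cols > 0 && batch_count > 0);
    BatchLayout layout{};
    layout.rows   = transposed ? op_cols : op_rows;
    layout.cols   = transposed ? op_rows : op_cols;
    layout.ld     = layout.rows;
    layout.stride = static_cast<long long>(layout.ld) * layout.cols;
    layout.size   = static_cast<std::size_t>(layout.stride) * batch_count;
    return layout;
}
```

For example, for $f(A_i)$ of size $m \times k$ one would call `make_layout(trans_a != HIPBLAS_OP_N, m, k, batch_count)` and pass the resulting `ld` and `stride` as `lda` and `stride_a`; `size` is the number of elements to allocate and copy for the whole batch.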
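
The example demonstrates the following API calls and types.

hipBLAS:

- `hipblasCreate`
- `hipblasDestroy`
- `hipblasHandle_t`
- `hipblasSgemmStridedBatched`
- `hipblasOperation_t`
- `hipblasStride`
- `hipblasSetPointerMode`
- `HIPBLAS_OP_N`
- `HIPBLAS_POINTER_MODE_HOST`

HIP runtime:

- `hipFree`
- `hipMalloc`
- `hipMemcpy`
- `hipMemcpyDeviceToHost`
- `hipMemcpyHostToDevice`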