This example showcases the usage of the rocBLAS Level 2 Hermitian rank-1 update functionality. Additionally, it demonstrates the compatible memory layout of three different complex float types (`hipFloatComplex`, `std::complex<float>`, and `rocblas_float_complex`). Vectors of complex numbers can be passed to rocBLAS simply by performing a call to `hipMemcpy` and reinterpreting the respective pointers.
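The layout compatibility mentioned above can be illustrated without a GPU. The following is a minimal host-only sketch, not code from this example: `float_complex_like` is a hypothetical stand-in for `hipFloatComplex`, assumed here to consist of two packed floats (real part, then imaginary part), and `copy_as_pairs` is an illustrative helper name.

```cpp
#include <complex>
#include <cstring>
#include <vector>

// Hypothetical stand-in for hipFloatComplex (assumption: two packed floats,
// real part first, imaginary part second).
struct float_complex_like
{
    float x; // real part
    float y; // imaginary part
};

// std::complex<float> is array-compatible with float[2], so both types
// are expected to occupy exactly two floats.
static_assert(sizeof(float_complex_like) == sizeof(std::complex<float>),
              "both complex types must occupy two floats");

// Copies a vector of std::complex<float> into the stand-in type byte for byte,
// the same way a plain memory copy between compatible layouts would.
inline std::vector<float_complex_like>
    copy_as_pairs(const std::vector<std::complex<float>>& in)
{
    std::vector<float_complex_like> out(in.size());
    if(!in.empty())
    {
        std::memcpy(out.data(), in.data(), in.size() * sizeof(std::complex<float>));
    }
    return out;
}
```

After the copy, the `x` and `y` members of each output element hold the real and imaginary parts of the corresponding input element.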
- Read in command-line parameters.
- Allocate and initialize the host vector and matrix.
- Compute CPU reference result.
- Create a rocBLAS handle.
- Allocate and initialize the device vector and matrix.
- Invoke the rocBLAS HER function.
- Copy the result from device to host.
- Destroy the rocBLAS handle and release device memory.
- Validate the output by comparing it to the CPU reference result.
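The CPU reference step in the list above could look roughly like the following sketch. It is an illustration under assumptions, not the example's actual reference code: `her_reference` and its `upper` flag are hypothetical names, the matrix is assumed to be stored in column-major order with leading dimension `lda`, and only the selected triangle is updated.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for A = A + alpha * x * x^H.
// A is column-major with leading dimension lda; only the triangle selected
// by `upper` (diagonal included) is modified.
inline void her_reference(bool                                    upper,
                          std::size_t                             n,
                          float                                   alpha,
                          const std::vector<std::complex<float>>& x,
                          std::size_t                             incx,
                          std::vector<std::complex<float>>&       a,
                          std::size_t                             lda)
{
    assert(n > 0 && incx > 0 && lda >= n);
    for(std::size_t col = 0; col < n; ++col)
    {
        // Rows 0..col for the upper triangle, rows col..n-1 for the lower one.
        const std::size_t row_begin = upper ? 0 : col;
        const std::size_t row_end   = upper ? col + 1 : n;

        const std::complex<float> xj_conj = std::conj(x[col * incx]);
        for(std::size_t row = row_begin; row < row_end; ++row)
        {
            a[row + col * lda] += alpha * x[row * incx] * xj_conj;
        }
    }
}
```

Validation can then compare the triangle written by rocBLAS against the same triangle of this reference result, within a small tolerance.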
The application provides the following optional command line arguments:
- `-a` or `--alpha`. The scalar value $\alpha$ used in the HER operation. Its default value is 1.
- `-x` or `--incx`. The stride between consecutive values in the data array that makes up vector $x$, which must be greater than 0. Its default value is 1.
- `-n` or `--n`. The number of elements in vector $x$, which is also the number of rows and columns of matrix $A$, and which must be greater than 0. Its default value is 5.
- rocBLAS is initialized by calling `rocblas_create_handle(rocblas_handle*)` and it is terminated by calling `rocblas_destroy_handle(rocblas_handle)`.
- The pointer mode controls whether scalar parameters must be allocated on the host (`rocblas_pointer_mode_host`) or on the device (`rocblas_pointer_mode_device`). It is controlled by `rocblas_set_pointer_mode`.
- `rocblas_[cz]her(handle, uplo, n, *alpha, *x, incx, *A, lda)` computes a Hermitian rank-1 update, defined as $A = A + \alpha \cdot x \cdot x ^ H$, where $A$ is an $n \times n$ Hermitian matrix, $x$ is a complex vector of $n$ elements, and $\alpha$ is a real scalar. The character matched in `[cz]` denotes the data type of the operation, and can either be `c` (complex float: `rocblas_float_complex`), or `z` (complex double: `rocblas_double_complex`). Because the values in the upper triangle of a Hermitian matrix are the complex conjugates of the values in the lower triangle, the required work is reduced by only updating a single half of the matrix. The part of the matrix to update is given by `uplo`: `rocblas_fill_upper` indicates that the upper triangle of $A$ should be updated, and `rocblas_fill_lower` indicates that the lower triangle should be updated. Values in the other triangle are not altered. `n` gives the dimensions of $x$ and $A$, and `incx` the increment in elements between items of $x$. `lda` is the leading dimension of $A$: the number of elements between the starts of columns of $A$. The elements of each column of $A$ are packed in memory. Note that rocBLAS matrices are laid out in column major ordering. See the following figure, which illustrates the memory layout of a matrix with 3 rows and 2 columns:
- `hipFloatComplex`, `std::complex<float>`, and `rocblas_float_complex` have compatible memory layout, so a memory copy between values of these types produces the expected value.
- `hipCaddf(a, b)` adds the `hipFloatComplex` values `a` and `b` component-wise (real and imaginary parts separately). This function is from a family of host/device HIP functions which operate on complex values.
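The column-major layout and the role of `lda` described above can be made concrete with a small indexing sketch. `column_major_index` is a hypothetical helper introduced only for illustration; it is not part of rocBLAS or of this example.

```cpp
#include <cstddef>

// Hypothetical helper: linear offset of element (row, col) in a column-major
// matrix whose consecutive columns start `lda` elements apart.
constexpr std::size_t
    column_major_index(std::size_t row, std::size_t col, std::size_t lda)
{
    return row + col * lda;
}

// For a matrix with 3 rows and 2 columns stored with lda = 3, the second
// column starts right after the first one.
static_assert(column_major_index(0, 1, 3) == 3, "column 1 starts at offset 3");
static_assert(column_major_index(2, 1, 3) == 5, "last element is at offset 5");
```

Choosing `lda` larger than the number of rows leaves unused padding elements between columns, which is why `lda` is passed separately from `n`.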
rocblas_cher
rocblas_create_handle
rocblas_destroy_handle
rocblas_fill
rocblas_fill_lower
rocblas_fill_upper
rocblas_float
rocblas_float_complex
rocblas_handle
rocblas_int
rocblas_pointer_mode_host
rocblas_set_pointer_mode
rocblas_status
rocblas_status_success
rocblas_status_to_string
hipCaddf
hipFloatComplex
hipFree
hipMalloc
hipMemcpy
hipMemcpyDeviceToHost
hipMemcpyHostToDevice