AMD Optimized BLIS Version 2.2
Highlights of improvements on AMD EPYC™ processor family CPUs:
- Improved performance for Level-1 BLAS routines for single and double precision.
- Improved performance of SGEMV and DGEMV for large sizes.
- Enabled small unpacked (SUP) GEMM kernels for single- and double-precision complex (C, Z) GEMM.
- Enabled multi-threaded small unpacked (SUP) GEMM kernels for (S, D, C, Z) GEMM, improving performance for small and skinny matrices.
- The GEMM selective packing feature is now multi-thread enabled. Selective packing packs matrix A, matrix B, or both, and can be enabled by setting an environment variable. Refer to the AOCL User Guide at https://developer.amd.com/amd-aocl/ for details.
- Improved single-threaded and multi-threaded TRSM performance for large and skinny matrices.
- Debug trace and log feature enabled for debug purposes.
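
For readers unfamiliar with the term "packing" used in the GEMM items above, the following is a minimal sketch of the general idea: copying an operand into contiguous panels before multiplying, so the inner loop reads memory sequentially. It is a toy model in pure Python, not BLIS code. The function names, the panel width `nr`, and the panel layout are all assumptions made for illustration and do not reflect the library's actual kernels or block sizes.

```python
# Toy illustration of packing matrix B for GEMM. Not BLIS code: the names,
# the panel width "nr", and the panel layout are hypothetical.

def pack_b_panels(b, nr):
    """Copy B (k x n, given as a list of rows) into column panels of width nr.

    Each panel is one flat list stored row by row. The last panel is
    zero-padded when n is not a multiple of nr.
    """
    k = len(b)
    n = len(b[0])
    panels = []
    for j0 in range(0, n, nr):
        panel = []
        for p in range(k):
            for j in range(j0, j0 + nr):
                panel.append(b[p][j] if j < n else 0.0)
        panels.append(panel)
    return panels


def gemm_packed_b(a, b, nr=4):
    """Compute C = A * B, reading B only through its packed panels."""
    m, k, n = len(a), len(b), len(b[0])
    panels = pack_b_panels(b, nr)
    c = [[0.0] * n for _ in range(m)]
    for panel_index, panel in enumerate(panels):
        j0 = panel_index * nr
        width = min(nr, n - j0)  # skip the zero padding in the last panel
        for i in range(m):
            for p in range(k):
                a_ip = a[i][p]
                base = p * nr  # start of row p inside the flat panel
                for jj in range(width):
                    c[i][j0 + jj] += a_ip * panel[base + jj]
    return c


def gemm_reference(a, b):
    """Plain triple-loop GEMM that reads B in place (no packing)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]
```

Both functions return the same product; only the memory access pattern for B differs. Packing costs an extra copy, which is why skipping it can pay off for small or skinny matrices and why making it selectable per operand can be useful.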