
Releases: ROCm/Tensile

Tensile 4.32.0 for ROCm 5.1.1

08 Apr 20:52

Tensile code for ROCm 5.1.1 is unchanged from Tensile for ROCm 5.1.0. The library was rebuilt for the updated ROCm 5.1.1 stack.

Tensile 4.32.0 for ROCm 5.1.0

30 Mar 17:26

Added

  • Better control of parallelism to limit memory usage
  • Support for multiprocessing on Windows for TensileCreateLibrary
  • New JSD metric and metric selection functionality
  • Initial changes to support two-tier solution selection

Optimized

  • Optimized TensileCreateLibrary runtime by reducing peak RAM usage
  • Additional StoreCInUnroll optimizations, plus adaptive K support
  • DGEMM NN optimizations with PrefetchGlobalRead (PGR) = 2 support

Changed

  • Update GoogleTest to 1.11.0

Removed

  • Remove benchmarking steps that are no longer supported

Tensile 4.31.0 for ROCm 5.0.2

04 Mar 17:54

Tensile code for ROCm 5.0.2 is unchanged from Tensile for ROCm 5.0.1. The library was rebuilt for the updated ROCm 5.0.2 stack.

Tensile 4.31.0 for ROCm 5.0.1

16 Feb 22:17

Tensile code for ROCm 5.0.1 is unchanged from Tensile for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.

Tensile 4.31.0 for ROCm 5.0.0

09 Feb 20:34

Added

  • DirectToLds support (x2/x4)
  • DirectToVgpr support for DGEMM
  • Parameter to control the number of files that kernels are merged into, to better parallelize kernel compilation
  • FP16 alternate implementation for HPA HGEMM on aldebaran

Optimized

  • Add DGEMM NN custom kernel for HPL on aldebaran

Changed

  • Update tensile_client executable to std=c++14

Removed

  • Remove unused old Tensile client code

Fixed

  • Fix hipErrorInvalidHandle during benchmarks
  • Fix addrVgpr for atomic GSU
  • Fix for Python 3.8: add case for Constant nodeType
  • Fix architecture mapping for gfx1011 and gfx1012
  • Fix PrintSolutionRejectionReason verbiage in KernelWriter.py
  • Fix VGPR alignment problem when enabling flat buffer load

Tensile 4.30.0 for ROCm 4.5.2

10 Dec 19:20
bb19eec

Tensile code for ROCm 4.5.2 is unchanged from Tensile for ROCm 4.5.0. The library was rebuilt for the updated ROCm 4.5.2 stack.

Tensile 4.30.0 for ROCm 4.5.0

27 Oct 21:30
bb19eec

Added

  • Custom Kernel mechanism for adding custom assembly kernels to Tensile
  • New assertions for problem sizes, alpha/beta values, and C equals D
  • Support setting VectorWidth in the M dimension in MFMA SourceSwap configuration

Fixed

  • Fix merge.py keeping duplicate solutions
  • Fix ScheduleIterAlg 2,3 cases for aldebaran

Tensile 4.28.0 for ROCm 4.3.1

27 Aug 17:41
9cbabb0

No changes made for ROCm 4.3.1.

Tensile 4.28.0 for ROCm 4.3.0

30 Jul 22:53
9cbabb0

Added

  • TensileRetuneLibrary for updating existing library logic files
  • Support GFX1030
  • Support NHWC

Fixed

  • TensileCreateLibrary crash with relative output and --merge-files

Changed

  • Change cmake_minimum_required to VERSION 3.13

Tensile 4.27.0 for ROCm 4.2.0

10 May 23:17
3438af2

Added

  • Benchmarking and library support for CU efficiency vs. overall speed
  • Support general batched GEMM
  • Support offset for each input/output buffer in Tensile
  • Support ldc != ldd for all GEMM kernels

Optimizations

  • Refactor ConvolutionVsContraction

Fixed

  • Fixed MasterSolutionLibrary having duplicated hardware rows
  • Fixed incorrect channel stride when converting a convolution problem into a tensor contraction problem