Releases: ROCm/Tensile

Tensile 4.35.0 for ROCm 5.4.3

07 Feb 17:32
5aec089

Tensile code for ROCm 5.4.3 did not change. The library was rebuilt for the updated ROCm 5.4.3 stack.

Tensile 4.35.0 for ROCm 5.4.2

13 Jan 16:40
5aec089

Tensile code for ROCm 5.4.2 did not change. The library was rebuilt for the updated ROCm 5.4.2 stack.

Tensile 4.35.0 for ROCm 5.4.1

15 Dec 18:38
5aec089

Tensile code for ROCm 5.4.1 did not change. The library was rebuilt for the updated ROCm 5.4.1 stack.

Tensile 4.35.0 for ROCm 5.4.0

30 Nov 17:32
5aec089

Added

  • Async DMA support for Transpose Data Layout (ThreadSeparateGlobalReadA/B)
  • Option to output library logic in dictionary format
  • "No solution found" error message for the benchmarking client
  • Exact K check for StoreCInUnrollExact
  • Support for CGEMM + MIArchVgpr
  • client-path parameter for using a prebuilt client
  • CleanUpBuildFiles global parameter
  • Debug flag for printing library logic index of winning solution
  • NumWarmups global parameter for benchmarking
  • Windows support for benchmarking client
  • DirectToVgpr support for CGEMM
  • TensileLibLogicToYaml for creating tuning configs from library logic solutions

Optimizations

  • Emit beta code and store code separately when StoreCInUnroll is used with x4 store
  • Improved performance for StoreCInUnroll + b128 store

Changed

  • Re-enable HardwareMonitor for gfx90a
  • Decision trees use MLFeatures instead of Properties

Fixed

  • Reject DirectToVgpr + MatrixInstBM/BN > 1
  • Fix benchmark timings when using warmups and/or validation
  • Fix mismatch issue with DirectToVgprB + VectorWidth > 1
  • Fix mismatch issue with DirectToLds + NumLoadsCoalesced > 1 + TailLoop
  • Fix incorrect reject condition for DirectToVgpr
  • Fix reject condition for DirectToVgpr + MIWaveTile < VectorWidth
  • Fix incorrect instruction generation with StoreCInUnroll

Tensile 4.34.0 for ROCm 5.3.1

28 Oct 16:57
b33ca97

Tensile code for ROCm 5.3.1 did not change. The library was rebuilt for the updated ROCm 5.3.1 stack.

Tensile 4.34.0 for ROCm 5.3.0

30 Sep 19:24
b33ca97

Added

  • Lazy loading of solution libraries and code object files
  • Support for dictionary-style logic files
  • Support for decision-tree-based logic files using dictionary format
  • DecisionTreeLibrary for solution selection
  • DirectToLDS support for HGEMM
  • DirectToVgpr support for SGEMM
  • Grid-based distance metric for solution selection
  • Support for gfx11xx
  • Support for DirectToVgprA/B + TLU=False
  • ForkParameters Groups as a way of specifying solution parameters
  • Support for a new Tensile yaml config format
  • TensileClientConfig for generating Tensile client config files
  • Options for TensileCreateLibrary to build client and create client config file

Optimizations

  • Solution generation is now cached and is not repeated if solution parameters are unchanged

Changed

  • Default MACInstruction to FMA

Fixed

  • Accept StaggerUStride=0 as valid
  • Reject invalid data types for UnrollLoopEfficiencyEnable
  • Fix invalid code generation issues related to DirectToVgpr
  • Return hipErrorNotFound if no modules are loaded
  • Fix performance drop for NN ZGEMM with 96x64 macro tile
  • Fix memory violation for general batched kernels when alpha/beta/K = 0

Tensile 4.33.0 for ROCm 5.2.3

18 Aug 16:59
da90ed3

Tensile code for ROCm 5.2.3 did not change. The library was rebuilt for the updated ROCm 5.2.3 stack.

Tensile 4.33.0 for ROCm 5.2.1

21 Jul 20:23
da90ed3

Tensile code for ROCm 5.2.1 did not change. The library was rebuilt for the updated ROCm 5.2.1 stack.

Tensile 4.33.0 for ROCm 5.2.0

28 Jun 18:42
da90ed3

Added

  • TensileUpdateLibrary for updating old library logic files
  • Support for TensileRetuneLibrary to use sizes from separate file
  • ZGEMM DirectToVgpr/DirectToLds/StoreCInUnroll/MIArchVgpr support
  • Tests for denorm correctness
  • Option to write different architectures to different TensileLibrary files

Optimizations

  • Optimize MessagePackLoadLibraryFile by switching to fread
  • DGEMM tail loop optimization for PrefetchAcrossPersistentMode=1/DirectToVgpr

Changed

  • Alpha/beta datatype remains as F32 for HPA HGEMM
  • Force assembly kernels to not flush denorms
  • Use hipDeviceAttributePhysicalMultiProcessorCount as multiProcessorCount

Fixed

  • Fix segmentation fault when running the i8 datatype with TENSILE_DB=0x80

Tensile 4.32.0 for ROCm 5.1.3

20 May 17:05

Tensile code for ROCm 5.1.3 did not change. The library was rebuilt for the updated ROCm 5.1.3 stack.