Releases: ROCm/Tensile
Tensile 4.35.0 for ROCm 5.4.3
Tensile code for ROCm 5.4.3 did not change. The library was rebuilt for the updated ROCm 5.4.3 stack.
Tensile 4.35.0 for ROCm 5.4.2
Tensile code for ROCm 5.4.2 did not change. The library was rebuilt for the updated ROCm 5.4.2 stack.
Tensile 4.35.0 for ROCm 5.4.1
Tensile code for ROCm 5.4.1 did not change. The library was rebuilt for the updated ROCm 5.4.1 stack.
Tensile 4.35.0 for ROCm 5.4.0
Added
- Async DMA support for Transpose Data Layout (ThreadSeparateGlobalReadA/B)
- Option to output library logic in dictionary format
- "No solution found" error message for benchmarking client
- Exact K check for StoreCInUnrollExact
- Support for CGEMM + MIArchVgpr
- client-path parameter for using prebuilt client
- CleanUpBuildFiles global parameter
- Debug flag for printing library logic index of winning solution
- NumWarmups global parameter for benchmarking
- Windows support for benchmarking client
- DirectToVgpr support for CGEMM
- TensileLibLogicToYaml for creating tuning configs from library logic solutions
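
As a rough illustration of what a warmup count such as NumWarmups controls, the sketch below runs a number of untimed iterations before the timed ones. It is a generic Python example and not Tensile's benchmarking client; `benchmark`, `num_warmups`, and `num_timed` are hypothetical names chosen for this sketch.

```python
import time


def benchmark(fn, num_warmups=2, num_timed=5):
    """Return (mean seconds per timed call, total number of calls made)."""
    calls = 0
    # Warmup iterations execute the workload but are excluded from timing.
    for _ in range(num_warmups):
        fn()
        calls += 1
    # Only the iterations below contribute to the reported time.
    start = time.perf_counter()
    for _ in range(num_timed):
        fn()
        calls += 1
    elapsed = time.perf_counter() - start
    return elapsed / num_timed, calls
```

Dividing by `num_timed` rather than by the total call count keeps warmup runs from skewing the reported mean.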
Optimizations
- Put beta code and store separately if StoreCInUnroll = x4 store
- Improved performance for StoreCInUnroll + b128 store
Changed
- Re-enable HardwareMonitor for gfx90a
- Decision trees use MLFeatures instead of Properties
Fixed
- Reject DirectToVgpr + MatrixInstBM/BN > 1
- Fix benchmark timings when using warmups and/or validation
- Fix mismatch issue with DirectToVgprB + VectorWidth > 1
- Fix mismatch issue with DirectToLds + NumLoadsCoalesced > 1 + TailLoop
- Fix incorrect reject condition for DirectToVgpr
- Fix reject condition for DirectToVgpr + MIWaveTile < VectorWidth
- Fix incorrect instruction generation with StoreCInUnroll
Tensile 4.34.0 for ROCm 5.3.1
Tensile code for ROCm 5.3.1 did not change. The library was rebuilt for the updated ROCm 5.3.1 stack.
Tensile 4.34.0 for ROCm 5.3.0
Added
- Lazy loading of solution libraries and code object files
- Support for dictionary style logic files
- Support for decision tree based logic files using dictionary format
- DecisionTreeLibrary for solution selection
- DirectToLDS support for HGEMM
- DirectToVgpr support for SGEMM
- Grid based distance metric for solution selection
- Support for gfx11xx
- Support for DirectToVgprA/B + TLU=False
- ForkParameters Groups as a way of specifying solution parameters
- Support for a new Tensile yaml config format
- TensileClientConfig for generating Tensile client config files
- Options for TensileCreateLibrary to build client and create client config file
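
The lazy loading listed above defers reading a library until it is first needed. The general pattern can be sketched as follows, assuming a caller-supplied `loader` callable; `LazyLibrary` is a hypothetical name for illustration and is not part of Tensile.

```python
class LazyLibrary:
    """Defer an expensive load until the data is first requested."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that does the real load
        self._loaded = False
        self._data = None
        self.load_count = 0    # number of times the loader actually ran

    def get(self):
        # Load on first access only; later calls reuse the stored result.
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
            self.load_count += 1
        return self._data
```

Constructing the object is cheap; the cost of loading is paid only if `get()` is ever called.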
Optimizations
- Solution generation is now cached and is not repeated if solution parameters are unchanged
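
One common way to skip repeated work when parameters are unchanged is to key a cache on a stable hash of the parameters. The sketch below shows that idea in Python; it is an assumption about the general technique, not Tensile's actual caching code, and `params_key` and `generate_solutions` are hypothetical names.

```python
import hashlib
import json

_cache = {}


def params_key(params):
    """Stable hash of a parameter dict; key order does not affect the result."""
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def generate_solutions(params, generator):
    """Call generator(params) only if these parameters were not seen before."""
    key = params_key(params)
    if key not in _cache:
        _cache[key] = generator(params)
    return _cache[key]
```

Sorting keys before hashing means two dictionaries with the same contents map to the same cache entry regardless of insertion order.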
Changed
- Default MACInstruction to FMA
Fixed
- Accept StaggerUStride=0 as valid
- Reject invalid data types for UnrollLoopEfficiencyEnable
- Fix invalid code generation issues related to DirectToVgpr
- Return hipErrorNotFound if no modules are loaded
- Fix performance drop for NN ZGEMM with 96x64 macro tile
- Fix memory violation for general batched kernels when alpha/beta/K = 0
Tensile 4.33.0 for ROCm 5.2.3
Tensile code for ROCm 5.2.3 did not change. The library was rebuilt for the updated ROCm 5.2.3 stack.
Tensile 4.33.0 for ROCm 5.2.1
Tensile code for ROCm 5.2.1 did not change. The library was rebuilt for the updated ROCm 5.2.1 stack.
Tensile 4.33.0 for ROCm 5.2.0
Added
- TensileUpdateLibrary for updating old library logic files
- Support for TensileRetuneLibrary to use sizes from separate file
- ZGEMM DirectToVgpr/DirectToLds/StoreCInUnroll/MIArchVgpr support
- Tests for denorm correctness
- Option to write different architectures to different TensileLibrary files
Optimizations
- Optimize MessagePackLoadLibraryFile by switching to fread
- DGEMM tail loop optimization for PrefetchAcrossPersistentMode=1/DirectToVgpr
Changed
- Alpha/beta datatype remains as F32 for HPA HGEMM
- Force assembly kernels to not flush denorms
- Use hipDeviceAttributePhysicalMultiProcessorCount as multiProcessorCount
Fixed
- Fix segmentation fault when running i8 datatype with TENSILE_DB=0x80
Tensile 4.32.0 for ROCm 5.1.3
Tensile code for ROCm 5.1.3 did not change. The library was rebuilt for the updated ROCm 5.1.3 stack.