Releases: ROCm/Tensile

Tensile 4.32.0 for ROCm 5.1.1

08 Apr 20:52

lawruble13

Tensile code for ROCm 5.1.1 did not change. The library was rebuilt for the updated ROCm 5.1.1 stack.

Tensile 4.32.0 for ROCm 5.1.0

30 Mar 17:26

lawruble13

Added

Finer control of parallelism to limit memory usage
Support for multiprocessing on Windows for TensileCreateLibrary
New JSD metric and metric selection functionality
Initial changes to support two-tier solution selection

Optimized

Optimized runtime of TensileCreateLibraries by reducing max RAM usage
StoreCInUnroll additional optimizations plus adaptive K support
DGEMM NN optimizations with PrefetchGlobalRead(PGR)=2 support

Changed

Update Googletest to 1.11.0

Removed

Remove no longer supported benchmarking steps

Tensile 4.31.0 for ROCm 5.0.2

04 Mar 17:54

lawruble13

Tensile code for ROCm 5.0.2 is unchanged from Tensile for ROCm 5.0.1. The library was rebuilt for the updated ROCm 5.0.2 stack.

Tensile 4.31.0 for ROCm 5.0.1

16 Feb 22:17

lawruble13

Tensile code for ROCm 5.0.1 is unchanged from Tensile for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.

Tensile 4.31.0 for ROCm 5.0.0

09 Feb 20:34

lawruble13

Added

DirectToLds support (x2/x4)
DirectToVgpr support for DGEMM
Parameter to control the number of files into which kernels are merged, to better parallelize kernel compilation
FP16 alternate implementation for HPA HGEMM on aldebaran

Optimized

Add DGEMM NN custom kernel for HPL on aldebaran

Changed

Update tensile_client executable to std=c++14

Removed

Remove unused old Tensile client code

Fixed

Fix hipErrorInvalidHandle during benchmarks
Fix addrVgpr for atomic GSU
Fix for Python 3.8: add case for Constant nodeType
Fix architecture mapping for gfx1011 and gfx1012
Fix PrintSolutionRejectionReason verbiage in KernelWriter.py
Fix vgpr alignment problem when enabling flat buffer load

Tensile 4.30.0 for ROCm 4.5.2

10 Dec 19:20

lawruble13

Tensile code for ROCm 4.5.2 is unchanged from Tensile for ROCm 4.5.0. The library was rebuilt for the updated ROCm 4.5.2 stack.

Tensile 4.30.0 for ROCm 4.5.0

27 Oct 21:30

lawruble13

Added

Custom Kernel mechanism for adding custom assembly kernels to Tensile
New assertions for problem sizes, alpha/beta values, and C equals D
Support setting VectorWidth in M dimension in MFMA SourceSwap configuration

Fixed

Fix merge.py keeping duplicate solutions
Fix ScheduleIterAlg 2,3 cases for aldebaran

Tensile 4.28.0 for ROCm 4.3.1

27 Aug 17:41

lawruble13

No changes made for ROCm 4.3.1.

Tensile 4.28.0 for ROCm 4.3.0

30 Jul 22:53

saadrahim

Added

TensileRetuneLibrary for updating existing library logic files
Support GFX1030
Support NHWC

Fixed

TensileCreateLibrary crash with relative output and --merge-files

Changed

Change cmake_minimum_required to VERSION 3.13

Tensile-4.27.0 for ROCm 4.2.0

10 May 23:17

saadrahim

Added

Benchmarking and library support for CU efficiency vs. overall speed
Support general batch GEMM
Support offset for each input/output buffer in Tensile
Support ldc != ldd for all GEMM kernels

Optimized

Refactor ConvolutionVsContraction

Fixed

Fixed MasterSolutionLibrary having duplicated hardware rows
Fixed incorrect channel stride when converting a convolution problem into a tensor contraction problem
