Releases: ROCm/aotriton
AOTriton 0.7.1 Beta
This is a point release.
0.7.1b can be used as a drop-in replacement for the 0.7b shared object file.
What's Changed
- Ignore colon suffixes in gcnArchName by @xinyazhang in #44 (see the sketch after this list)
- FA Kernel Update for Accuracy and Performance by @xinyazhang in #45
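As background for the gcnArchName fix: on ROCm the reported architecture string can carry feature-flag suffixes after a colon (for example gfx90a:sramecc+:xnack-), so matching against the bare architecture name means stripping everything from the first colon. Below is a minimal sketch of that idea using PyTorch device properties; the gcnArchName attribute access and the fallback string are only illustrative, not how AOTriton performs the lookup internally.

```python
import torch

# On ROCm, the reported arch string may include feature flags after a colon,
# e.g. "gfx90a:sramecc+:xnack-". Only the part before the first colon names
# the architecture that kernels are built for.
props = torch.cuda.get_device_properties(0)
raw_arch = getattr(props, "gcnArchName", "gfx90a:sramecc+:xnack-")  # fallback is illustrative
base_arch = raw_arch.split(":", 1)[0]
print(base_arch)  # e.g. "gfx90a"
```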
Full Changelog: 0.7b...0.7.1b
AOTriton 0.7 Beta
What's Changed
- Default to Shared Object by @jithunnair-amd in #33
- Add varlen support to AOTriton's Flash Attention by @xinyazhang in #31 (see the packing sketch after this list)
- Switch to upstream Triton compiler, and related changes by @xinyazhang in #36
- Improve Backward Performance and Experimental Navi31 Support by @xinyazhang in #39
  - Introduce new tuning system based on pre-compiled GPU kernels
  - Navi31 support is still experimental
- Support hipGraph usage in PyTorch by @xinyazhang in #40
  - This changes the RNG API used by FA kernels.
- Switch to new testing scheme to match PyTorch 2.5's changes
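Regarding the varlen addition above: variable-length attention packs sequences of different lengths into a single tensor and describes the batch with cumulative sequence offsets instead of padding. The sketch below only illustrates that packing convention; the tensor names are hypothetical and do not reflect AOTriton's actual API.

```python
import torch

# Three sequences of different lengths packed along one "total tokens"
# dimension instead of being padded to the longest sequence.
seq_lens = torch.tensor([5, 2, 7], dtype=torch.int32)

# cu_seqlens holds cumulative offsets: sequence i occupies rows
# cu_seqlens[i]:cu_seqlens[i+1] of the packed tensor.
cu_seqlens = torch.zeros(len(seq_lens) + 1, dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(seq_lens, dim=0)
print(cu_seqlens.tolist())  # [0, 5, 7, 14]

total_tokens, num_heads, head_dim = int(cu_seqlens[-1]), 8, 64
q_packed = torch.randn(total_tokens, num_heads, head_dim)  # packed queries, no padding
```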
New Contributors
- @jithunnair-amd made their first contribution in #33
Full Changelog: 0.6b...0.7b
Preview 2 of 0.7b
The tuning database for Preview 1 was generated with a newer Triton kernel that no longer uses block pointers; however, Preview 1 does not include those kernel changes.
Preview 2 was created to fix this.
Preview 1 of 0.7b
What's Changed
- Switch to Triton upstream compiler
- Improved backward kernel performance with a better tuning database
  - Note: this wasn't fully accomplished; check Preview 2 for this feature
- Add Navi31 support
- Default to AOTRITON_COMPRESS_KERNEL=ON
  - Requires zstd as a runtime dependency
Known problems
- No Navi32 support
- Lacks the changes, notably ABI-breaking changes to the library, that enable generating the tuning_database.sqlite3 shipped in this preview version.
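Since the shipped tuning_database.sqlite3 is an ordinary SQLite file, it can be inspected with standard tooling. A small sketch follows; the path is an example and nothing is assumed about the table schema.

```python
import sqlite3

# Inspect the tuning database shipped with a build. The path is an example;
# no particular table layout is assumed.
con = sqlite3.connect("tuning_database.sqlite3")
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
for (name,) in tables:
    rows = con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    print(f"{name}: {rows} rows")
con.close()
```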
AOTriton 0.4.2 Beta
manylinux_2_28 updates to 0.4.1b.
AOTriton 0.6 Beta
What's Changed
- Resolve cmake conflicts when adding aotriton into TE via add_subdirectory by @wangye805 in #23
- [mGPU] Run hipModuleLoadDataEx for each GPU device. by @xinyazhang in #24
- Adding mutex.h for TE pytorch extension compilation by @wangye805 in #26
- Refactor the build system by @xinyazhang in #29
New Contributors
- @wangye805 made their first contribution in #23
Full Changelog: 0.5b...0.6b
AOTriton 0.5 Beta
What's Changed
- Switch Tuning database to SQLite3 for Incremental Tuning
- Add matrix bias to forward/backward kernel (see the sketch after this list)
- Fix build failures due to missing
- Add new Triton kernel debug_fill_dropout_rng for debugging dropout
- Add FP32 support to fulfill the functionality required by torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION
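To illustrate the bias and FP32 items above: in PyTorch, a floating-point attn_mask acts as an additive bias on the scaled QK^T scores before the softmax, and FP32 inputs can be routed through the efficient-attention backend. This is a minimal sketch, assuming a ROCm build of PyTorch where that backend dispatches to AOTriton.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# FP32 tensors on a ROCm device; shape is (batch, heads, seq_len, head_dim).
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float32)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float32)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float32)

# A floating-point attn_mask is an additive bias:
# out = softmax(q @ k^T / sqrt(head_dim) + bias) @ v
bias = torch.randn(2, 8, 128, 128, device="cuda", dtype=torch.float32)

with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)
```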
Notes about binary delivery
Starting with 0.5 Beta, we are not delivering binary builds of AOTriton alongside software releases for now, due to software supply chain security considerations.
Full Changelog: 0.4.1b...0.5b
AOTriton 0.4.1 Beta
Summary
This is an emergency fix for the build process. It delivers the same functionality and performance as the 0.4b version. AOTriton users who do not need to build the library from source can keep using the 0.4b binary release packages.
Changes
- Triton's setup.py downloads CUDA packages during the build, but this does not always succeed. AOTriton does not need those packages for now, so the downloads were commented out.
AOTriton 0.4 Beta
Summary
This is the first release considered sufficiently stable for production use.
Features
- Implement the Flash Attention v2 algorithm on MI200/MI300
- Implement most features required by PyTorch's mha_fwd and mha_bwd (see the usage sketch after this list)
- Missing features: window_size_left and window_size_right
- API can be found at include/aotriton/flash.h
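For context on how these kernels are typically reached from PyTorch: the flash-attention backend of scaled_dot_product_attention exercises the mha_fwd-style forward pass and, through autograd, the mha_bwd-style backward pass. This is a minimal sketch, assuming a ROCm build of PyTorch that dispatches this backend to AOTriton; it does not call the C++ API from include/aotriton/flash.h directly.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# FP16 causal attention with dropout; shape is (batch, heads, seq_len, head_dim).
q = torch.randn(4, 16, 512, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True, dropout_p=0.1)

out.sum().backward()  # exercises the backward kernel as well
```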
AOTriton GA Preview for Legal Scan
This release was created for legal review before public release.
Compiled on ROCm 6.0, Ubuntu 20.04, Python 3.9.