mkdir build
cd build
export PKG_CONFIG_PATH="${PKG_CONFIG_PATH}:${CONDA_PREFIX}/lib/pkgconfig"
cmake .. -DCMAKE_INSTALL_PREFIX=./install_dir -DCMAKE_BUILD_TYPE=Release -DAOTRITON_GPU_BUILD_TIMEOUT=0 -G Ninja
# Use ccmake to tweak options
ninja install
The library and the header file can be found under build/install_dir afterwards.
You may ignore the export PKG_CONFIG_PATH part if you're not building with conda.
Note: do not run ninja separately. Due to a limitation of the current build system, ninja install will run the whole build process unconditionally.
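As a quick sanity check after ninja install, the install directory can be scanned for the build outputs. The following is a minimal sketch, not part of AOTriton: the helper name find_build_outputs is hypothetical, and it assumes the archive uses a .a suffix and the headers a .h suffix, neither of which is stated above.

```python
from pathlib import Path


def find_build_outputs(install_dir):
    """List archive and header files below install_dir (hypothetical helper).

    Assumption: archives end in .a and headers end in .h.
    """
    root = Path(install_dir)
    return {
        # Paths are reported relative to install_dir, with forward slashes.
        "archives": sorted(p.relative_to(root).as_posix() for p in root.rglob("*.a")),
        "headers": sorted(p.relative_to(root).as_posix() for p in root.rglob("*.h")),
    }


# Example usage, run from the source tree after the build:
#   outputs = find_build_outputs("build/install_dir")
#   print(outputs["archives"], outputs["headers"])
```

If both lists come back empty, the install step most likely did not complete or a different CMAKE_INSTALL_PREFIX was used.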
- hipcc in /opt/rocm/bin, as a part of ROCm
- cmake
- ninja
- libzstd
  - Common names are libzstd-dev or libzstd-devel.
The kernel definition for generation is done in rules.py. Edits to this file are needed for each new kernel, but it is extensible and generic.
Include files can be added in this directory.
The final build output is an archive object file any new project may link against.
The archive file and header files are installed in the path specified by CMAKE_INSTALL_PREFIX.
Currently the first supported kernel is FlashAttention, based on the algorithm from Tri Dao.
PyTorch recently expanded AOTriton support for FlashAttention. AOTriton is consumed in PyTorch through the SDPA kernels. The Triton kernels and bundled archive are built at PyTorch build time.
CAVEAT: As a fast-moving target, AOTriton's FlashAttention API changes over time. Hence, a specific PyTorch release is only compatible with a few versions of AOTriton. The compatibility matrix is shown below.
PyTorch | AOTriton Release |
---|---|
2.2 and earlier | N/A, no support |
2.3 | 0.4b, 0.4.1b |
2.4 | 0.6b |
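For scripted checks, the table above can be expressed as a small lookup. This is only a sketch: the names COMPATIBLE_AOTRITON and compatible_aotriton_releases are hypothetical, and the data covers only the releases listed in the table, so an empty result means "not listed", not necessarily "unsupported".

```python
from __future__ import annotations

# Data copied from the compatibility table; PyTorch 2.2 and earlier have no
# AOTriton support, so they are simply absent from the mapping.
COMPATIBLE_AOTRITON = {
    "2.3": ("0.4b", "0.4.1b"),
    "2.4": ("0.6b",),
}


def compatible_aotriton_releases(pytorch_version: str) -> tuple[str, ...]:
    """Return the AOTriton releases listed for a PyTorch release.

    Only the major.minor part of the version is used. An empty tuple means
    the table lists no compatible AOTriton release for that version.
    """
    major_minor = ".".join(pytorch_version.split(".")[:2])
    return COMPATIBLE_AOTRITON.get(major_minor, ())


# Example usage:
#   compatible_aotriton_releases("2.3.1")  # the releases listed for PyTorch 2.3
```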
For PyTorch main branch, check aotriton_version.txt. The first line is the tag name, and the 4th line is the SHA-1 commit of AOTriton.
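Reading those two fields can be automated. The sketch below relies only on the layout described here (tag on line 1, commit on line 4); the names AOTritonPin and parse_aotriton_version are hypothetical, and the meaning of lines 2 and 3 is deliberately left uninterpreted.

```python
from typing import NamedTuple


class AOTritonPin(NamedTuple):
    """The AOTriton version pinned by a PyTorch checkout (hypothetical type)."""
    tag: str
    commit: str


def parse_aotriton_version(text: str) -> AOTritonPin:
    """Extract the tag (line 1) and SHA-1 commit (line 4) from the file contents."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected at least 4 lines in aotriton_version.txt")
    return AOTritonPin(tag=lines[0].strip(), commit=lines[3].strip())


# Example usage, with the path adjusted to where the file lives in your
# PyTorch checkout:
#   from pathlib import Path
#   pin = parse_aotriton_version(Path("aotriton_version.txt").read_text())
#   print(pin.tag, pin.commit)
```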