Merge improvements of 0.7.1b release into main #46

Open
wants to merge 3 commits into
base: main

xinyazhang
Collaborator

No description provided.

There are multiple compatible variants of gfx942, e.g., gfx942:sramecc-:xnack- and gfx942:sramecc+:xnack-
Major changes:

1. Fix numerical errors caused by scaling input tensors by log_2(e) as a
preprocessing step. Fudge factors are adjusted accordingly.
2. Adopt techniques from the forward kernel to specialize the inner loops
of the bwd kernel as well.
3. Update the tuning database for MI200/300 accordingly.
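
To illustrate major change 1, here is a minimal NumPy sketch of why folding log_2(e) into an fp16 input can cost accuracy. It is not the kernel code from this PR; the function names, shapes, and the choice of `sm_scale = 1/sqrt(d)` are assumptions made for illustration. The idea is that `exp(x) == exp2(x * log2(e))`, so a kernel can use `exp2` if the logits are multiplied by log_2(e). Applying that factor to the fp16 input rounds every element, whereas applying it to the fp32 dot product does not.

```python
import numpy as np

LOG2_E = float(np.log2(np.e))  # 1.4426950408889634


def logits_prescaled(q16, k16, sm_scale):
    # Assumed "preprocessing" variant: scale q, round back to fp16, then dot.
    scale = np.float32(sm_scale * LOG2_E)
    q_scaled = (q16.astype(np.float32) * scale).astype(np.float16)
    return q_scaled.astype(np.float32) @ k16.astype(np.float32).T


def logits_postscaled(q16, k16, sm_scale):
    # Alternative: dot the unmodified fp16 inputs in fp32, then scale in fp32.
    qk = q16.astype(np.float32) @ k16.astype(np.float32).T
    return qk * np.float32(sm_scale * LOG2_E)


def softmax_from_log2_logits(z):
    # Logits are in the log2 domain, so exp2 replaces exp.
    z = z - z.max(axis=-1, keepdims=True)
    p = np.exp2(z)
    return p / p.sum(axis=-1, keepdims=True)


def reference_softmax(q16, k16, sm_scale):
    # float64 reference computed with the natural exponential.
    s = (q16.astype(np.float64) @ k16.astype(np.float64).T) * sm_scale
    s = s - s.max(axis=-1, keepdims=True)
    p = np.exp(s)
    return p / p.sum(axis=-1, keepdims=True)


def max_abs_errors(seed=0, n=128, d=64):
    # Returns (error with fp16 pre-scaling, error with fp32 post-scaling).
    rng = np.random.default_rng(seed)
    q16 = rng.standard_normal((n, d)).astype(np.float16)
    k16 = rng.standard_normal((n, d)).astype(np.float16)
    sm_scale = 1.0 / np.sqrt(d)
    ref = reference_softmax(q16, k16, sm_scale)
    pre = softmax_from_log2_logits(logits_prescaled(q16, k16, sm_scale))
    post = softmax_from_log2_logits(logits_postscaled(q16, k16, sm_scale))
    return float(np.abs(pre - ref).max()), float(np.abs(post - ref).max())


if __name__ == "__main__":
    err_pre, err_post = max_abs_errors()
    print(f"pre-scaled fp16 input: {err_pre:.3e}")
    print(f"post-scaled fp32 dot:  {err_post:.3e}")
```

Because log_2(e) is not a power of two, the pre-scaled variant rounds each element of `q` to fp16 a second time, which is one plausible source of the kind of error the fudge factors have to absorb.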

Minor changes:

1. `pyaotriton` now includes `$ORIGIN` in its `DT_RUNPATH`
2. `install` target now installs `pyaotriton` to
`$CMAKE_INSTALL_PREFIX/lib`
3. `mptune` now stores the batch size along with the testing results,
making the timing results more informative
4. `performance_*.py` scripts now read the `USE_TFLOPS`, `D_HEADS`, and
`N_CTX` env vars, so the testing size can be changed without editing the
code
5. `test/test_backward.py` now displays target fudge factors for fudge
factor adjustment
6. `tune_flash.py` now shrinks the batch size to 2 when both sequence
lengths are > 4096, so as not to exceed the VRAM limit.
7. Fix a problem in `sancheck_lut_tensor` of
`class FlashKernel(KernelDescription)`, which did not handle
single-element LUT tensors correctly.
8. `v2python/table_tool.py` now ignores `inputs$BATCH` column
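
The batch-size rule in minor change 6 can be sketched as follows. This is a hypothetical helper, not the code in `tune_flash.py`: the function name, parameter names, and the decision to never increase a batch size that is already below 2 are assumptions.

```python
def effective_batch_size(batch, seqlen_q, seqlen_k, seqlen_limit=4096, shrunk_batch=2):
    """Hypothetical helper: cap the batch size for very long sequences.

    When both sequence lengths exceed seqlen_limit, the batch size is reduced
    to shrunk_batch to keep memory use down. Using min() so that a smaller
    batch is left untouched is an assumption of this sketch.
    """
    if seqlen_q > seqlen_limit and seqlen_k > seqlen_limit:
        return min(batch, shrunk_batch)
    return batch


if __name__ == "__main__":
    # Both sequence lengths above the limit: batch is shrunk.
    print(effective_batch_size(8, 8192, 8192))
    # Only one sequence length above the limit: batch is unchanged.
    print(effective_batch_size(8, 8192, 4096))
```

The comparison is strict, matching the "> 4096" wording above, so a 4096 x 4096 configuration keeps its original batch size.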

Notes:

1. The fudge factors in use assume PyTorch <= 2.4. See
pytorch/pytorch#135590 for a detailed discussion of
why PyTorch 2.5 cannot be used for testing. PyTorch 2.6 will include a
new interface to fix the problem.
xinyazhang changed the title from "Merge improvements from 0.7.1b release" to "Merge improvements of 0.7.1b release into main" on Oct 7, 2024