(WIP) Multi backend refactor -> main (full diff of all already merged PRs) #1220

Open
wants to merge 248 commits into
base: main

Titus-von-Koeller (Collaborator) commented May 25, 2024

This PR to main provides an overview of all the extensive changes that have been introduced to multi-backend-refactor through the iterative PRs on this topic.

We will eventually merge this into main. Before that, we will do a thorough final review and get Tim's final sign-off on this extensive refactor.

For now, it mainly serves as a public diff of the entirety of the changes. That said, feel free to leave constructive feedback and review comments already.

jiqing-feng and others added 21 commits August 2, 2024 11:21
* Add build job for rocm

* Add rocm build script

* Copy shared obj file into output_dir

* upload build artifacts and enable wheels build

* Remove cuda build temporarily

* Add ROCm version to .so filename

* Add rocm_version to whls build

* Revert "Remove cuda build temporarily"

This reverts commit 1413c5f.

* Add rocm_version env var

* Remove thrust header files

* Print node info

* print cuda node info

* Revert "print cuda node info"

This reverts commit cdb209a.

* Revert "Print node info"

This reverts commit 7e9a65c.

* Add rocm arch to compile command

* Rename .so files to rocm

* Update default gpu arch

* Skip cpu based igemmlt int tests on ROCm

* Update Documentation

* Update upstream repo name

* Update docs

* Update string format

Co-authored-by: Aarni Koskela <[email protected]>

* Remove pre-release option for torch install

* Update pytorch install path

Co-authored-by: Titus <[email protected]>

---------

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Titus <[email protected]>
* fix 4bit dtype

* fix nf4 save
* fix nf4 memory issue by init op_context in forward

* disable repack in init

* fix code style
* Add build job for rocm

* Add rocm build script

* Copy shared obj file into output_dir

* upload build artifacts and enable wheels build

* Remove cuda build temporarily

* Add ROCm version to .so filename

* Add rocm_version to whls build

* Revert "Remove cuda build temporarily"

This reverts commit 1413c5f.

* Add rocm_version env var

* Remove thrust header files

* Print node info

* print cuda node info

* Revert "print cuda node info"

This reverts commit cdb209a.

* Revert "Print node info"

This reverts commit 7e9a65c.

* Add rocm arch to compile command

* Rename .so files to rocm

* Update default gpu arch

* Skip cpu based igemmlt int tests on ROCm

* Update Documentation

* Update upstream repo name

* Update docs

* Update string format

Co-authored-by: Aarni Koskela <[email protected]>

* Remove pre-release option for torch install

* Update pytorch install path

Co-authored-by: Titus <[email protected]>

* Add messages for Heuristics error

* Remove toolcache for disk space

* print disk usage

* Clean disk space for linux

* Fix for ubuntu

* Add sudo for apt clean

* Update clean up disk list

* remove disk usage print

* Add BNB_BACKEND variable

* Update diagnostic functions for ROCm

* Fix tuple error

* Fix library detection bug for recursive and symlink cases

* fix pre-commit errors

* Remove recursive path lib search

* Create function for runtime lib patterns

* Update logger format

Co-authored-by: Aarni Koskela <[email protected]>

* Update error reporting

Co-authored-by: Aarni Koskela <[email protected]>

* Remove commented code

Co-authored-by: Aarni Koskela <[email protected]>

* Update error reporting

Co-authored-by: Aarni Koskela <[email protected]>

* Update error reporting

* Create hip diagnostics functions

* Fix Typo

* Fix pre-commit checks

---------

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Titus <[email protected]>
* Enable 6.2 build

* Update documentation for 6.2.0 pip install
* Update for VS2022 17.11 compatibility with CUDA < 12.4

* Try again
Titus-von-Koeller force-pushed the multi-backend-refactor branch 2 times, most recently from 0585a6a to fedd94e, on September 27, 2024 23:54
* refine docs for multi-backend alpha release

* docs: further tweaks to multi-backend alpha docs

* docs: further tweaks to multi-backend alpha docs

* docs: further tweaks to multi-backend alpha docs

* docs: add multi-backend feedback links

* docs: add request for contributions

* docs: small fixes

* docs: small fixes

* docs: add info about `main` continuous build

* docs: further tweaks to multi-backend alpha docs

* docs: further tweaks to multi-backend alpha docs
* Add build job for rocm

* Add rocm build script

* Copy shared obj file into output_dir

* upload build artifacts and enable wheels build

* Remove cuda build temporarily

* Add ROCm version to .so filename

* Add rocm_version to whls build

* Revert "Remove cuda build temporarily"

This reverts commit 1413c5f.

* Add rocm_version env var

* Remove thrust header files

* Print node info

* print cuda node info

* Revert "print cuda node info"

This reverts commit cdb209a.

* Revert "Print node info"

This reverts commit 7e9a65c.

* Add rocm arch to compile command

* Rename .so files to rocm

* Update default gpu arch

* Skip cpu based igemmlt int tests on ROCm

* Update Documentation

* Update upstream repo name

* Update docs

* Update string format

Co-authored-by: Aarni Koskela <[email protected]>

* Remove pre-release option for torch install

* Update pytorch install path

Co-authored-by: Titus <[email protected]>

* Add messages for Heuristics error

* Remove toolcache for disk space

* print disk usage

* Clean disk space for linux

* Fix for ubuntu

* Add sudo for apt clean

* Update clean up disk list

* remove disk usage print

* Add BNB_BACKEND variable

* Update diagnostic functions for ROCm

* Fix tuple error

* Fix library detection bug for recursive and symlink cases

* fix pre-commit errors

* Remove recursive path lib search

* Create function for runtime lib patterns

* Update logger format

Co-authored-by: Aarni Koskela <[email protected]>

* Update error reporting

Co-authored-by: Aarni Koskela <[email protected]>

* Remove commented code

Co-authored-by: Aarni Koskela <[email protected]>

* Update error reporting

Co-authored-by: Aarni Koskela <[email protected]>

* Update error reporting

* Create hip diagnostics functions

* Fix Typo

* Fix pre-commit checks

* Enable 6.2 build

* Skip gemv 4 bit cpu test

* Update documentation for 6.2.0 pip install

* Update README for default branch change

* Fix typo

* Sync README with upstream

* Remove depth

---------

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Titus <[email protected]>
Co-authored-by: Aswin John Mathews <[email protected]>
Co-authored-by: root <[email protected]>