[pull] main from NVIDIA:main #32

Merged
merged 6 commits into from
Dec 4, 2024

@pull pull bot commented Nov 25, 2024

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

* Align RNG tracker with megatron

Signed-off-by: Robin Zhang <[email protected]>
Co-authored-by: Yifei Song <[email protected]>

* Fix module_params order and warmup bug in cudagraph

Signed-off-by: Robin Zhang <[email protected]>
Co-authored-by: Yifei Song <[email protected]>

* Add fp8_group argument and fix fp8 accuracy issue for cudagraph

Signed-off-by: Robin Zhang <[email protected]>
Co-authored-by: Yifei Song <[email protected]>

* Add TE modules and weights filters to support MoE models

Signed-off-by: Robin Zhang <[email protected]>
Co-authored-by: Yifei Song <[email protected]>

* Revert self.fp8

Signed-off-by: Robin Zhang <[email protected]>

* Use hooks to filter module params

Signed-off-by: Robin Zhang <[email protected]>

* Filter all TE modules in hooks

Signed-off-by: Robin Zhang <[email protected]>
Co-authored-by: Yifei Song <[email protected]>

* Format code

Signed-off-by: Robin Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph.py

Signed-off-by: Xin Yao <[email protected]>

* Revert CudaRNGStatesTracker

Signed-off-by: Robin Zhang <[email protected]>

* Format Update

Signed-off-by: Yifei Song <[email protected]>

* Revert "Use hooks to filter module params"

This reverts commit 73a22e2.

Signed-off-by: Yifei Song <[email protected]>

* Remove filtering module params

Signed-off-by: Robin Zhang <[email protected]>

---------

Signed-off-by: Robin Zhang <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Signed-off-by: Yifei Song <[email protected]>
Co-authored-by: Yifei Song <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
@pull pull bot added the ⤵️ pull label Nov 25, 2024
mgoldfarb-nvidia and others added 5 commits November 25, 2024 08:43
Moved framework-agnostic THD kernels to common.

---------

Signed-off-by: Michael Goldfarb <[email protected]>
* retain_graph=True for grouped gemm

Signed-off-by: Xiaowei Ren <[email protected]>

* remove an unnecessary retain_graph=True

Signed-off-by: Xiaowei Ren <[email protected]>

* make retain_graph in graph capture configurable

Signed-off-by: Xiaowei Ren <[email protected]>

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

---------

Signed-off-by: Xiaowei Ren <[email protected]>
* Update list of CI users

Signed-off-by: Tim Moon <[email protected]>

* Update list of CI users

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
…age (#1308)

* draft implementation

Signed-off-by: Youngeun Kwon <[email protected]>

* compile error fix

Signed-off-by: Youngeun Kwon <[email protected]>

* fix compile error

Signed-off-by: Youngeun Kwon <[email protected]>

* remove print

Signed-off-by: Youngeun Kwon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Edit comments

Signed-off-by: Youngeun Kwon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* edit the bulk-overlap test case

Signed-off-by: Youngeun Kwon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add version guard

Signed-off-by: Youngeun Kwon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add runtime version guard

Signed-off-by: Youngeun Kwon <[email protected]>

* fix the version guard

Signed-off-by: Youngeun Kwon <[email protected]>

---------

Signed-off-by: Youngeun Kwon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Scale sequence length in CP tests to avoid tiny sizes.

Signed-off-by: Michael Goldfarb <[email protected]>
@phu0ngng phu0ngng merged commit d3cbccd into phu0ngng:main Dec 4, 2024

6 participants