
support gpu send/recv thunk #1

Merged: 1 commit, Dec 4, 2023
Conversation


@wbmc wbmc commented Dec 4, 2023

No description provided.

@wbmc wbmc merged commit 5b044b1 into main Dec 4, 2023
3 checks passed
wbmc pushed a commit that referenced this pull request Dec 5, 2023
Imported from GitHub PR openxla#6599

FP8 cublasLt matmul uses fast accumulation when both operands' precisions are DEFAULT; otherwise it falls back to high-precision accumulation. Issue openxla#6168

This PR is closely related to Flax PR [3416](google/flax#3416).
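A minimal sketch of the cublasLt descriptor attribute this change toggles (not XLA's actual GemmConfig/BlasLt plumbing; the helper name and error handling are simplified for illustration): CUBLASLT_MATMUL_DESC_FAST_ACCUM is requested only when both FP8 operands use DEFAULT precision, otherwise accumulation stays in high precision.

```cpp
#include <cublasLt.h>

// Hypothetical helper illustrating the precision check described above; the
// real code builds and owns the matmul descriptor elsewhere.
cublasLtMatmulDesc_t MakeFp8MatmulDesc(bool lhs_default_precision,
                                       bool rhs_default_precision) {
  cublasLtMatmulDesc_t desc = nullptr;
  // FP8 GEMMs accumulate in FP32; the scale type is FP32 as well.
  cublasLtMatmulDescCreate(&desc, CUBLAS_COMPUTE_32F, CUDA_R_32F);

  // Fast accumulation trades accuracy for speed, so it is requested only when
  // neither operand asked for higher-than-DEFAULT precision.
  const int8_t fast_accum =
      (lhs_default_precision && rhs_default_precision) ? 1 : 0;
  cublasLtMatmulDescSetAttribute(desc, CUBLASLT_MATMUL_DESC_FAST_ACCUM,
                                 &fast_accum, sizeof(fast_accum));
  return desc;
}
```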
Copybara import of the project:

--
a4140da by shuw <[email protected]>:

Add FP8 fast accumulation support for cublasLt.

--
9684568 by shuw <[email protected]>:

Improve based on review #1

--
e906d76 by shuw <[email protected]>:

Improve based on review #2

Merging this change closes openxla#6599

COPYBARA_INTEGRATE_REVIEW=openxla#6599 from wenscarl:fp8_fast_accumulation e906d76
PiperOrigin-RevId: 578948593
mars1248 pushed a commit to mars1248/xla that referenced this pull request Dec 22, 2023
Imported from GitHub PR openxla#7751

Due to fast accumulation being turned on in the forward mode, the cublasLt FP8 gemm with gelu epilogue can run as a single fused kernel. Compared against the XLA-generated gelu kernel on H100, performance shows some improvement for a [8192, 4096] x [4096, 16384] matmul + gelu:

Execution time for matmul using cublasLt and gelu (XLA): 1.28ms
Execution time for matmul_gelu using cublasLt: 1.25ms
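A minimal sketch of the cublasLt descriptor configuration behind the fused path (not the PR's actual runtime code; error handling omitted and the helper name is made up): forward-mode fast accumulation plus the approximate-GELU epilogue, so gelu(A * B) runs as one cublasLt kernel instead of a matmul followed by a separate XLA gelu kernel.

```cpp
#include <cublasLt.h>

// Hypothetical helper showing the two descriptor attributes involved in the
// fused FP8 matmul + GELU path.
cublasLtMatmulDesc_t MakeFp8MatmulGeluDesc() {
  cublasLtMatmulDesc_t desc = nullptr;
  cublasLtMatmulDescCreate(&desc, CUBLAS_COMPUTE_32F, CUDA_R_32F);

  // Forward-mode fast accumulation (both operands at DEFAULT precision).
  const int8_t fast_accum = 1;
  cublasLtMatmulDescSetAttribute(desc, CUBLASLT_MATMUL_DESC_FAST_ACCUM,
                                 &fast_accum, sizeof(fast_accum));

  // Fuse the approximate (tanh) GELU into the matmul epilogue.
  cublasLtEpilogue_t epilogue = CUBLASLT_EPILOGUE_GELU;
  cublasLtMatmulDescSetAttribute(desc, CUBLASLT_MATMUL_DESC_EPILOGUE,
                                 &epilogue, sizeof(epilogue));
  return desc;
}
```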
Copybara import of the project:

--
e8abce3 by Shu Wang <[email protected]>:

Support cublasLt Fp8 Approx Gelu epilogue fusion.

--
818127c by shuw <[email protected]>:

Remove F32 check

--
5ce3108 by shuw <[email protected]>:

Improve based on review intelligent-machine-learning#1

Merging this change closes openxla#7751

COPYBARA_INTEGRATE_REVIEW=openxla#7751 from wenscarl:cublaslt_fp8_gelu 5ce3108
PiperOrigin-RevId: 591236441
ApsarasX pushed a commit that referenced this pull request Mar 28, 2024
…execution scope

Instead of always constructing the while operation conditional in the default scope, use the scope of the while operation itself.

This generates correct CUDA graph: https://gist.github.com/ezhulenev/a84192fe8b46a4bf1a934a8baa08ea60

A memset operation launched in scope #1 is not synchronized with the initial condition handle update.
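A minimal host-side sketch, using CUDA 12.3+ conditional graph nodes rather than XLA's actual command-buffer code (the predicate buffer and node layout are made up for illustration), of the kind of dependency at stake: the memset that initializes the loop predicate must be ordered before the while-conditional node, which is what building the conditional in the while operation's own execution scope is meant to guarantee.

```cpp
#include <cuda_runtime.h>

int main() {
  cudaGraph_t graph;
  cudaGraphCreate(&graph, 0);

  // Handle the loop body later updates on device via cudaGraphSetConditional().
  cudaGraphConditionalHandle handle;
  cudaGraphConditionalHandleCreate(&handle, graph, /*defaultLaunchValue=*/1,
                                   cudaGraphCondAssignDefault);

  // Memset node standing in for the initial condition update.
  int* pred = nullptr;
  cudaMalloc(&pred, sizeof(int));
  cudaMemsetParams ms = {};
  ms.dst = pred;
  ms.elementSize = sizeof(int);
  ms.width = 1;
  ms.height = 1;
  ms.value = 1;
  cudaGraphNode_t init;
  cudaGraphAddMemsetNode(&init, graph, nullptr, 0, &ms);

  // The while-conditional node depends on `init`; constructing it in a
  // different execution scope would drop this edge and leave the first
  // iteration racing against the predicate initialization.
  cudaGraphNodeParams cp = {};
  cp.type = cudaGraphNodeTypeConditional;
  cp.conditional.handle = handle;
  cp.conditional.type = cudaGraphCondTypeWhile;
  cp.conditional.size = 1;
  cudaGraphNode_t while_node;
  cudaGraphAddNode(&while_node, graph, &init, 1, &cp);

  // cp.conditional.phGraph_out[0] is the body graph; the body would launch a
  // kernel that recomputes the predicate and calls
  // cudaGraphSetConditional(handle, value).
  cudaGraphDestroy(graph);
  cudaFree(pred);
  return 0;
}
```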

PiperOrigin-RevId: 609742672