Remove AMD workarounds as they do more harm than good on recent ROCm releases. #8289
base: master
Conversation
PyTorch ROCm 6.4 is still nightly, so I can't merge some of these changes as is, or it's going to break people using the stable PyTorch, which is on 6.3.
Ok, fair, it can wait. I was using the PyTorch wheels from AMD (https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/) to validate those changes.
It causes crashes for big sizes even without PyTorch attention, and for reasonable sizes it is significantly faster. This reverts commit 1cd6cd6.
It is significantly faster: 8 it/s vs 12 it/s on a 9070 XT with ROCm 6.4.1.
It works fine on ROCm 6.4.1. It is also faster and avoids OOMs in VAE Decode. Fixes: comfyanonymous#7400 Fixes: ROCm/ROCm#4742
I've rebased. I think we should merge this. We could add some version checks, but it's not clear to me on which versions it is broken. (Also, upstream ROCm 6.4 will probably only land in the next PyTorch release, which can take months.) The main issue currently is that those …
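For illustration, a version gate could look roughly like this. This is a sketch, not the PR's actual code: `torch.version.hip` is the ROCm/HIP version string on ROCm builds of PyTorch and `None` otherwise, and the 6.4 cutoff is an assumption based on this thread, since nobody has pinned down exactly which versions are broken.

```python
import torch

def rocm_version():
    # torch.version.hip looks like "6.4.xxxxx-..." on ROCm builds
    # of PyTorch and is None on CUDA/CPU builds.
    hip = getattr(torch.version, "hip", None)
    if hip is None:
        return None
    major, minor = hip.split(".")[:2]
    return (int(major), int(minor))

def amd_workarounds_needed():
    # Keep the old workarounds only on ROCm builds older than 6.4
    # (assumed cutoff; adjust once the broken versions are known).
    ver = rocm_version()
    return ver is not None and ver < (6, 4)
```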
IMO, it is much safer if we have a flag to enable the new is_amd checking logic until we actually benchmark and see that there are no performance regressions on older AMD cards.
It's already possible to force the VAE type, so there is no need for hardcoded is_amd cases. Also, the workaround was reported to cause OOMs on both new and old GPUs.
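As a sketch of what that flag-driven approach looks like, here is a minimal example of selecting the VAE dtype from command-line flags instead of hardcoded is_amd checks. The flag names mirror ComfyUI's existing --fp32-vae / --bf16-vae options; the selection logic itself is an illustration, not the PR's code.

```python
import argparse
import torch

# Let the user force the VAE dtype explicitly; no device-vendor
# special-casing is involved in the decision.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("--fp32-vae", action="store_true")
group.add_argument("--bf16-vae", action="store_true")
args = parser.parse_args()

def vae_dtype():
    if args.fp32_vae:
        return torch.float32
    if args.bf16_vae:
        return torch.bfloat16
    # Default; a real implementation would pick this per device.
    return torch.float16
```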
@kasper93 sorry to bother you. I tried your fork with this commit and the massive memory usage during VAE Decode persists. Also, adding the following flag: …
Well, yes. It still uses lots of memory, but it's less than before, and previously the fallback to tiled VAE would take ages because of the conversions; this seems snappier too.
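For context, the fallback pattern under discussion is roughly the following. This is an illustration, not ComfyUI's actual code, and `decode_tiled` stands in for the tiled decode path (ComfyUI exposes it as the "VAE Decode (Tiled)" node):

```python
import torch

def decode_with_fallback(vae, latent):
    # Try the regular full-resolution decode first.
    try:
        return vae.decode(latent)
    except torch.cuda.OutOfMemoryError:
        # Free the failed allocation, then retry with the slower
        # tiled decode (hypothetical decode_tiled helper).
        torch.cuda.empty_cache()
        return vae.decode_tiled(latent)
```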
Maybe, but it's a lot faster. The main goal of this PR is to remove the hardcoded conditions and make it possible to evaluate the other components in the pipeline.
Thanks for the fast reply and the explanations. I forgot to mention that I haven't noticed any problems using BF16 so far, if that helps.
Tested on gfx1201, ROCm 6.4.1. It fixes the VAE Decode issues, and performance is generally better with PyTorch flash attention. bfloat16 is working fine.
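A quick way to sanity-check that bf16 flash attention actually works on a given ROCm build (assuming PyTorch >= 2.3, which provides torch.nn.attention.sdpa_kernel):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Run scaled_dot_product_attention in bf16 with only the flash
# backend allowed; this raises a RuntimeError if no flash kernel
# is available for this dtype/device combination.
q = torch.randn(1, 8, 128, 64, dtype=torch.bfloat16, device="cuda")
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, q, q)
print(out.shape, out.dtype)  # torch.Size([1, 8, 128, 64]) torch.bfloat16
```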