[regression][GPU]: 'func.func' op uses 81920 bytes of shared memory; exceeded the limit of 65536 bytes post 6ff00a8a008d06b604d4ca4e0ae6e601ae810b4f #19511
Comments
Which 100 models? Please be specific in bug reports.
Here's the list of test failures: https://github.com/nod-ai/e2eshark-reports/blob/main/2024-12-17/ci_reports_onnx/rocm/combined-reports/yesterday_comparison.md#317-regressions-found. Lots are in the "vit" family (vision transformers).
Good news: with the existing tests in https://github.com/iree-org/iree-test-suites/tree/main/onnx_models, I can reproduce a regression in a few models like mobilenet and resnet50 using:
Repro using resnet from iree-org/iree-test-suites#65:

```shell
# Setup
cd onnx_models
.\.venv\Scripts\activate.bat
pip install --upgrade -r requirements-iree.txt

# Run test that passes
pytest --log-cli-level=info -rA --durations=0 -k resnet --test-config-file=./configs/onnx_models_gpu_rocm_rdna3.json

# Switch to broken release, run again to see failure
pip install --find-links https://iree.dev/pip-release-links.html iree-base-compiler==3.1.0rc20241217
pytest --log-cli-level=info -rA --durations=0 -k resnet --test-config-file=./configs/onnx_models_gpu_rocm_rdna3.json
```

Once that PR and #19524 land we'll have that coverage on all IREE PRs.
I also noticed that this error spews a lot of logs (multiple thousands of lines): iree/compiler/src/iree/compiler/Codegen/Common/GPU/GPUCheckResourceUsage.cpp, lines 85 to 87 at fb4d094.
Maybe we don't need the entire context? |
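One way to act on that suggestion, sketched here in Python purely as an illustration (this is not IREE's C++ diagnostic code, and `summarize_op` is a hypothetical helper), is to cap how much of the offending op's textual dump gets attached to the error:

```python
def summarize_op(op_text: str, max_lines: int = 5) -> str:
    """Truncate a multi-line op dump, keeping only the first few lines.

    Attaching the full dump of a large func.func to a diagnostic is what
    produces thousands of log lines; a capped summary keeps the error legible.
    """
    lines = op_text.splitlines()
    if len(lines) <= max_lines:
        return op_text
    elided = len(lines) - max_lines
    return "\n".join(lines[:max_lines]) + f"\n... ({elided} more lines elided)"
```

A short dump passes through unchanged, while a thousand-line dump collapses to five lines plus an elision marker.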
This issue has been fixed somewhere in the commit range iree-3.1.0rc20241218...iree-3.1.0rc20241219, possibly by #19508. Still working on getting those tests landed and enabled on presubmits.
What happened?
We have 100+ models failing on GPU that were previously passing numerics; the failures started after 6ff00a8.
command:
model.torch_onnx.mlir.txt
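For scale, the numbers in the error message put the failing dispatch 25% over budget (a small worked check, using only the figures quoted in the error):

```python
# Figures from the error message: reported usage vs. the shared memory limit.
SHARED_MEMORY_LIMIT = 65536  # bytes (64 KiB), the limit named in the error
REPORTED_USAGE = 81920       # bytes (80 KiB) used by the failing func.func

print(REPORTED_USAGE - SHARED_MEMORY_LIMIT)  # 16384 bytes (16 KiB) over
print(REPORTED_USAGE / SHARED_MEMORY_LIMIT)  # 1.25, i.e. 25% over the limit
```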
Steps to reproduce your issue
What component(s) does this issue relate to?
Compiler
Version information
No response
Additional context
No response