[PERF] [FLUX] Flux.1 Dev Transformer perf tracker #19751
Labels: codegen/rocm, codegen, performance ⚡
What happened?
Flux.1 Dev Transformer MLIR and weights (real model)
Artefacts: Tracy trace, *dispatch.mlir, *benchmark.mlir
Top 6 dispatches (01/21):
Attention: dispatch_535, dispatch_37
MatVec_like: dispatch_19, dispatch_526
matmul_transpose_b: dispatch_528, dispatch_538
Compile:
../iree-build-trace/tools/iree-compile \
  black-forest-labs--FLUX.1-dev--black-forest-labs-transformer-bf16.mlir \
  -o black-forest-labs--FLUX.1-dev--black-forest-labs-transformer-bf16-trace.vmfb \
  --iree-hal-executable-debug-level=3 \
  --iree-hal-dump-executable-files-to=dump_real \
  --iree-hal-target-device=hip \
  --iree-hip-target=gfx942 \
  --iree-opt-const-eval=false \
  --iree-opt-strip-assertions=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-dispatch-creation-enable-fuse-horizontal-contractions=true \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-outer-dim-concat=true \
  --iree-vm-target-truncate-unsupported-floats \
  --iree-llvmgpu-enable-prefetch=true \
  --iree-opt-data-tiling=false \
  --iree-codegen-gpu-native-math-precision=true \
  --iree-codegen-llvmgpu-use-vector-distribution \
  --iree-hip-waves-per-eu=2 \
  --iree-execution-model=async-external \
  "--iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline,iree-preprocessing-pad-to-intrinsics)"
Run (with all 1s input):
Benchmark (01/21):
Compile:
../iree-build/tools/iree-compile \
  flux1-dev-data/black-forest-labs--FLUX.1-dev--black-forest-labs-transformer-bf16.mlir \
  -o black-forest-labs--FLUX.1-dev--black-forest-labs-transformer-bf16-benchmark.vmfb \
  --iree-hal-target-device=hip \
  --iree-hip-target=gfx942 \
  --iree-opt-const-eval=false \
  --iree-opt-strip-assertions=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-dispatch-creation-enable-fuse-horizontal-contractions=true \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-outer-dim-concat=true \
  --iree-vm-target-truncate-unsupported-floats \
  --iree-llvmgpu-enable-prefetch=true \
  --iree-opt-data-tiling=false \
  --iree-codegen-gpu-native-math-precision=true \
  --iree-codegen-llvmgpu-use-vector-distribution \
  --iree-hip-waves-per-eu=2 \
  --iree-execution-model=async-external \
  "--iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline,iree-preprocessing-pad-to-intrinsics)"
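The trace and benchmark compile invocations above appear to share every flag except the two executable debug/dump options passed to the trace build. A minimal sketch of a wrapper that keeps the shared flags in one place might look like the following. The variable names (`MODEL`, `COMMON_FLAGS`, `TRACE_FLAGS`) and the `DRY_RUN` switch are hypothetical helpers introduced here for illustration and are not part of the issue; all flags and paths are copied from the commands above. By default the script only prints the commands; setting `DRY_RUN=0` would execute them.

```shell
#!/bin/sh
# Sketch only: factors out the flags common to both compiles in this issue.
set -eu

MODEL=black-forest-labs--FLUX.1-dev--black-forest-labs-transformer-bf16

# Flags that appear in both the trace and the benchmark compile.
COMMON_FLAGS="
  --iree-hal-target-device=hip
  --iree-hip-target=gfx942
  --iree-opt-const-eval=false
  --iree-opt-strip-assertions=true
  --iree-global-opt-propagate-transposes=true
  --iree-dispatch-creation-enable-fuse-horizontal-contractions=true
  --iree-dispatch-creation-enable-aggressive-fusion=true
  --iree-opt-aggressively-propagate-transposes=true
  --iree-opt-outer-dim-concat=true
  --iree-vm-target-truncate-unsupported-floats
  --iree-llvmgpu-enable-prefetch=true
  --iree-opt-data-tiling=false
  --iree-codegen-gpu-native-math-precision=true
  --iree-codegen-llvmgpu-use-vector-distribution
  --iree-hip-waves-per-eu=2
  --iree-execution-model=async-external
  --iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline,iree-preprocessing-pad-to-intrinsics)
"

# Flags that appear only in the trace compile.
TRACE_FLAGS="
  --iree-hal-executable-debug-level=3
  --iree-hal-dump-executable-files-to=dump_real
"

# Hypothetical helper: print the command when DRY_RUN=1 (default), else run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# The flag lists are intentionally unquoted so the shell splits them into words.
# Trace build.
run ../iree-build-trace/tools/iree-compile "${MODEL}.mlir" \
  -o "${MODEL}-trace.vmfb" $TRACE_FLAGS $COMMON_FLAGS

# Benchmark build.
run ../iree-build/tools/iree-compile "flux1-dev-data/${MODEL}.mlir" \
  -o "${MODEL}-benchmark.vmfb" $COMMON_FLAGS
```

Keeping one shared list makes it harder for the two builds to drift apart, so a timing difference between the trace and benchmark artefacts is less likely to come from an accidental flag mismatch.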
Run: