Description
Running Vello on an AMD 5700 XT triggers a shader miscompilation at the driver level, which causes incorrect behavior (resulting in device lost).
Repro steps
git clone -b oh_eighteen https://github.com/DJMcNab/vello.git
cd vello
cargo run -p with_winit
Note: this is PR #398 of linebender/vello. The same thing happens on main, but this branch brings us to wgpu 0.18, and I figured it would be more helpful to work on the most recent versions.
Expected vs observed behavior
The example typically displays a couple frames, sometimes correctly and sometimes corrupted, then exits with a device lost error. Expected behavior is to display a tiger test image and performance statistics.
Extra materials
We tracked this down to a very buggy implementation of ZeroInitializeWorkgroupMemory in the AMD driver; the core problem is that it's zeroing the workgroup-shared memory and then proceeding to user code without a barrier. A secondary problem is that it's doing so extremely inefficiently; it appears all threads are zeroing the entire array.
One of the offending shaders is draw_reduce. The post-processed WGSL is attached, as is the SPIR-V output. Note that the SPIR-V does not contain any zeroing logic, as spv::ZeroInitializeWorkgroupMemoryMode::Native was selected in adapter.rs.
I captured the ISA using Radeon Developer Panel, doing ctrl-A, ctrl-C (and choosing inputs so it would run without crashing so I could capture a trace). Maybe there's a better way to do it; if so, please let me know. In any case, three things are wrong:
- There is no s_barrier between the zeroing logic and the user code.
- It appears that all invocations in the workgroup zero the entire array. If this were at the SPIR-V level, the conflicting writes would be considered a data race and thus UB, but maybe at the ISA level the behavior is defined. Either way, this is certainly a performance problem if nothing else.
- Speaking of performance problems, almost a thousand lines of ISA to zero an array is clearly not a good idea. The code is just bad; among other things, it repeatedly zeroes v[4:7] using the v_lshlrev_b64 instruction.
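For contrast, here is a rough CPU-side sketch of the pattern one would expect from a correct zero-initialization: each invocation clears only a strided slice of the shared array, and a barrier separates the zeroing from user code. This is an analogy only. The function name `zero_then_barrier` is made up for illustration, OS threads stand in for invocations, and `std::sync::Barrier` stands in for `s_barrier`; it is not what the driver or naga actually emits.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Simulates one workgroup on the CPU (hypothetical helper, for illustration).
/// Returns true if every "invocation" saw fully zeroed shared memory
/// once it got past the barrier.
fn zero_then_barrier(workgroup_size: usize, shared_len: usize) -> bool {
    // Shared memory starts out holding garbage, as uninitialized LDS might.
    let shared: Arc<Vec<AtomicU32>> = Arc::new(
        (0..shared_len)
            .map(|_| AtomicU32::new(0xDEAD_BEEF))
            .collect(),
    );
    let barrier = Arc::new(Barrier::new(workgroup_size));

    let handles: Vec<_> = (0..workgroup_size)
        .map(|local_id| {
            let shared = Arc::clone(&shared);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Each invocation zeroes only indices local_id,
                // local_id + workgroup_size, ... so no element is
                // written by more than one invocation.
                for i in (local_id..shared.len()).step_by(workgroup_size) {
                    shared[i].store(0, Ordering::Relaxed);
                }
                // The step missing from the captured ISA: wait for all
                // invocations to finish zeroing before user code runs.
                barrier.wait();
                // "User code": check that the whole array reads as zero.
                shared.iter().all(|w| w.load(Ordering::Relaxed) == 0)
            })
        })
        .collect();

    // Join every thread, then combine the per-invocation results.
    handles
        .into_iter()
        .map(|h| h.join().expect("invocation panicked"))
        .fold(true, |acc, ok| acc && ok)
}
```

Calling `zero_then_barrier(64, 256)` models a 64-invocation workgroup clearing a 256-word array. Removing the `barrier.wait()` call would let an invocation read elements that another invocation has not yet cleared, which is the kind of failure described above.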
It makes sense to work around the broken driver by disabling ZeroInitializeWorkgroupMemoryMode::Native and also to escalate the bug to AMD.
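One possible shape for such a workaround is sketched below. It is only a sketch under stated assumptions: `ZeroInitMode` is a local stand-in that mirrors the variants I believe naga's `spv::ZeroInitializeWorkgroupMemoryMode` has, `pick_zero_init_mode` is a hypothetical helper rather than an existing wgpu function, and matching on the AMD PCI vendor ID (0x1002) is deliberately coarse. This report only confirms the problem on one GPU and driver version, so a real fix might want a narrower check.

```rust
/// Local stand-in for naga's zero-initialization mode (assumed to mirror
/// its variants; defined here so the sketch is self-contained).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZeroInitMode {
    /// Let the driver zero workgroup memory.
    Native,
    /// Emit explicit stores plus a barrier in the generated shader.
    Polyfill,
    /// Do not zero workgroup memory at all.
    None,
}

/// PCI vendor ID assigned to AMD.
const AMD_VENDOR_ID: u32 = 0x1002;

/// Hypothetical selection logic: use the driver's native zeroing only when
/// it is supported and the adapter is not from the affected vendor.
fn pick_zero_init_mode(vendor_id: u32, supports_native: bool) -> ZeroInitMode {
    if supports_native && vendor_id != AMD_VENDOR_ID {
        ZeroInitMode::Native
    } else {
        // Fall back to shader-side zeroing, which includes its own barrier.
        ZeroInitMode::Polyfill
    }
}
```

With this logic, an AMD adapter that advertises native support would still get `ZeroInitMode::Polyfill`, while other vendors keep the native path.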
Platform
Windows 10. AMD Radeon 5700 XT running driver 2.0.233, API version 1.3.217. This is running in Vulkan through the PRIMARY default. With DX12 selected, the example runs but with pathologically slow shader compile times.