WebGPURenderer Performance Significantly Lower Than WebGLRenderer #30560

Open
tenkkov opened this issue Feb 19, 2025 · 6 comments

Comments

@tenkkov

tenkkov commented Feb 19, 2025

Description

Summary

When switching from WebGLRenderer to WebGPURenderer, I experience a significant drop in performance. The same scene, containing thousands of non-instanced meshes, runs smoothly at 60 FPS on WebGL but drops to 15 FPS on WebGPU, a 4x decrease in performance.

Expected Behavior

WebGPURenderer should provide comparable or better performance than WebGLRenderer, given its modern API and intended improvements over WebGL.

Current Behavior

Rendering 20,000 non-instanced basic cube meshes:

WebGLRenderer: ~60 FPS on Mac (Apple Silicon M1 Pro)

WebGPURenderer: ~15 FPS (4x slower)

No errors or warnings appear in the Chrome console.

Reproduction steps

  1. Create a Three.js scene with 20,000 Mesh instances.
  2. Use WebGLRenderer and observe smooth 60 FPS performance.
  3. Switch to WebGPURenderer by uncommenting the renderer swap.
  4. Observe FPS dropping significantly (down to 15 FPS).

Code

see live example below

Live example

https://jsfiddle.net/15zfestk/1/

Screenshots

No response

Version

r.0.173.0

Device

Desktop

Browser

Chrome

OS

macOS

@Mugen87
Collaborator

Mugen87 commented Feb 19, 2025

The live example uses more or less the worst-case setup for the renderer: many objects that update their transformation every frame. Since the example uses no instancing or batching, it exposes existing performance issues in WebGPURenderer.

Because every object has its own UBO for managing its object-scope uniforms, all UBOs must be bound and updated each frame, which seems to cause a considerable amount of overhead. The WebGL backend spends most of its time in the bindBufferBase(), bindBuffer() and bufferData() calls.

Image
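To make the per-object cost concrete, here is a minimal sketch of the call pattern described above. It is not WebGPURenderer's actual code: `createCountingGL` is a hypothetical stand-in for a WebGL 2 context that only counts calls, and `renderWithPerObjectUBOs` is an illustrative name. It assumes one UBO per object that is re-uploaded and re-bound before every draw.

```javascript
// Hypothetical stand-in for a WebGL 2 context: it only counts calls per method.
function createCountingGL() {
  const counts = {};
  const count = (name) => () => {
    counts[name] = (counts[name] || 0) + 1;
  };
  return {
    // Real WebGL 2 enum values, included so the calls below read naturally.
    UNIFORM_BUFFER: 0x8a11,
    DYNAMIC_DRAW: 0x88e8,
    TRIANGLES: 0x0004,
    UNSIGNED_SHORT: 0x1403,
    counts,
    bindBuffer: count('bindBuffer'),
    bufferData: count('bufferData'),
    bindBufferBase: count('bindBufferBase'),
    drawElements: count('drawElements'),
  };
}

// Assumed pattern: every object owns a UBO that is bound, re-uploaded and
// attached to binding point 0 before its draw call.
function renderWithPerObjectUBOs(gl, objects) {
  for (const object of objects) {
    gl.bindBuffer(gl.UNIFORM_BUFFER, object.ubo);
    gl.bufferData(gl.UNIFORM_BUFFER, object.matrix, gl.DYNAMIC_DRAW);
    gl.bindBufferBase(gl.UNIFORM_BUFFER, 0, object.ubo);
    gl.drawElements(gl.TRIANGLES, object.indexCount, gl.UNSIGNED_SHORT, 0);
  }
}

const countingGl = createCountingGL();
const cubes = Array.from({ length: 20000 }, () => ({
  ubo: {}, // placeholder for a WebGLBuffer
  matrix: new Float32Array(16),
  indexCount: 36,
}));

renderWithPerObjectUBOs(countingGl, cubes);
console.log(countingGl.counts);
```

With 20,000 objects, each of the three buffer calls runs 20,000 times per frame in addition to the draws themselves, which is consistent with the renderer becoming CPU limited.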

@Mugen87
Collaborator

Mugen87 commented Feb 19, 2025

To further explain the major performance gap: This is how WebGLRenderer renders the scene when four cubes are configured:

Image

There are no major state changes between the draw calls (except for some single uniform updates which are not displayed in the list). Compared to that, WebGPURenderer does the following:

Image

As you can see, there is a considerable number of state changes between each drawElements() command. Many scenes won't have an issue with this because the number of render objects is low. But the more render objects a scene has, the sooner WebGPURenderer becomes CPU limited.

#30562 fixes the VAO related issues but they are unfortunately negligible compared to the UBO related overhead. I guess we need a different approach in the renderer to minimize these state changes.

@sunag @RenaudRohlinger @aardgoose Would a single UBO for all object-scope uniforms be a potential solution?

@RenaudRohlinger
Collaborator

Nice catch with the VAO!

Related (I like the CommonUniformBuffer interface):
#27388

Unless we implement a pool system I don't think we can use a single UBO for all object-scope uniforms as a potential solution since we'd be very limited by the number of meshes. With a typical 16KB max block size and each mat4 taking 64 bytes in std140, that limits us to about 256 meshes.
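The arithmetic behind that estimate can be sketched as follows. `maxObjectsPerBlock` is a hypothetical helper, and the second call assumes a layout of two mat4 values per object purely for illustration.

```javascript
// A mat4 is 16 floats of 4 bytes each; std140 adds no extra padding to it.
const MAT4_BYTES = 16 * 4; // 64

// Hypothetical helper: how many per-object entries fit into one uniform block.
function maxObjectsPerBlock(maxBlockBytes, bytesPerObject) {
  return Math.floor(maxBlockBytes / bytesPerObject);
}

// 16 KB block, one mat4 per object.
console.log(maxObjectsPerBlock(16 * 1024, MAT4_BYTES)); // 256

// Assumed layout with two mat4 values per object: the budget halves.
console.log(maxObjectsPerBlock(16 * 1024, 2 * MAT4_BYTES)); // 128
```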

@Mugen87
Collaborator

Mugen87 commented Feb 20, 2025

Good to know that. I hope we can revisit #27388 soon.

@CodyJasonBennett
Contributor

> With a typical 16KB max block size

This is the limit per draw call, as guaranteed by the WebGL 2 specification. You can have a larger buffer bound and adjust the offset dynamically. I've shared many words on this and on scheduling in general, but reading this, I'm not sure they've been heard.
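A rough sketch of that idea, assuming a WebGL 2 context: pack every object's uniforms into one large buffer at aligned offsets, upload it once per frame, then bind a small range before each draw with bindBufferRange(). The helper names (`alignUp`, `packObjectUniforms`, `drawAll`) are hypothetical, and the alignment of 256 is an assumed value; the real one should be queried via `gl.getParameter(gl.UNIFORM_BUFFER_OFFSET_ALIGNMENT)`.

```javascript
// Round value up to the next multiple of alignment.
function alignUp(value, alignment) {
  return Math.ceil(value / alignment) * alignment;
}

// Pack one mat4 per object into a single Float32Array. Each slot starts at a
// multiple of offsetAlignment (assumed to be a multiple of 4 bytes).
function packObjectUniforms(matrices, offsetAlignment) {
  const bytesPerObject = 16 * 4;
  const stride = alignUp(bytesPerObject, offsetAlignment);
  const data = new Float32Array((stride * matrices.length) / 4);
  matrices.forEach((matrix, i) => data.set(matrix, (i * stride) / 4));
  return { data, stride, bytesPerObject };
}

// One bind and one upload for all objects, then only a range rebind per draw.
function drawAll(gl, sharedUbo, packed, objectCount, indexCount) {
  gl.bindBuffer(gl.UNIFORM_BUFFER, sharedUbo);
  gl.bufferData(gl.UNIFORM_BUFFER, packed.data, gl.DYNAMIC_DRAW);
  for (let i = 0; i < objectCount; i++) {
    gl.bindBufferRange(
      gl.UNIFORM_BUFFER,
      0, // binding point used by the per-object uniform block (assumed)
      sharedUbo,
      i * packed.stride,
      packed.bytesPerObject
    );
    gl.drawElements(gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0);
  }
}

// Packing two placeholder matrices with an assumed alignment of 256 bytes.
const demoMatrices = [new Float32Array(16).fill(1), new Float32Array(16).fill(2)];
const packedDemo = packObjectUniforms(demoMatrices, 256);
console.log(packedDemo.stride, packedDemo.data.byteLength); // 256 512
```

Each bound range stays well below the per-block limit, so the object count is bounded by buffer size rather than by the 16 KB block size. This still issues one bindBufferRange() per draw, so it reduces the per-object uploads and binds rather than removing all state changes.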
