MultithreadedExecutor bottlenecking at 1000+ Systems #11378
Comments
Using a headless application is where this becomes really obvious, FWIW; I noticed the same issues as soon as schedule v3 was merged. See: https://discord.com/channels/691052431525675048/692572690833473578/1115422818012762274
Here are FPS comparisons between Bevy versions 0.9.1 and 0.12.1. I was told in the Discord that I should test with LTO enabled, but I cannot test 0.12.1 with LTO enabled due to an issue.
At 500 groups (= 1500 systems), all 3 are at 60+ FPS. Interestingly, enabling LTO reduces FPS for 0.9.1; not sure why.
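For context, a minimal sketch of how LTO is typically enabled for such a comparison — this is the standard Cargo release-profile setting, not anything specific to the repository in question:

```toml
# Cargo.toml — standard Cargo release-profile settings, not Bevy-specific
[profile.release]
lto = "fat"        # full cross-crate LTO; "thin" is a cheaper alternative
codegen-units = 1  # commonly paired with LTO to maximize optimization
```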
I'm very curious what use case you have where you need that many systems, but this makes plenty of sense given that the executor cannot schedule systems fast enough when they all terminate quickly. There are options like #8304 that have been thrown around, but I'm pretty sure the contention it introduces would be on par with, if not worse than, what we see here.
He was running a reinforcement learning simulation and used const generic systems as group markers. His use case would be solved by bevyengine/rfcs#16.
#12990 should reduce the overhead by a large amount. Could you test out that PR and see if it works out for you? |
With that said, I just opened the provided trace and noticed that the bottleneck may actually be running the run conditions, which are all run inline in the multithreaded executor, plus the cost of creating new spans for them while profiling. In this particular case, where the cost of running a system and its run condition are both very small, it may actually be better to embed an early return in the system than to add a run condition.
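A minimal sketch of the trade-off being described, assuming a hypothetical `GroupActive` resource standing in for whatever the real run condition checks (Bevy 0.12-era API):

```rust
use bevy::prelude::*;

// Hypothetical resource standing in for the real condition being checked.
#[derive(Resource)]
struct GroupActive(bool);

fn group_is_active(active: Res<GroupActive>) -> bool {
    active.0
}

// Variant A: gated by a run condition. The multithreaded executor
// evaluates `group_is_active` inline on its coordinating thread.
fn gated_system() {
    // ... actual work ...
}

// Variant B: the same check embedded as an early return, so the cost
// moves onto the worker thread that runs the system.
fn early_return_system(active: Res<GroupActive>) {
    if !active.0 {
        return;
    }
    // ... actual work ...
}

fn main() {
    App::new()
        .add_plugins(MinimalPlugins)
        .insert_resource(GroupActive(true))
        .add_systems(Update, gated_system.run_if(group_is_active))
        .add_systems(Update, early_return_system)
        .run();
}
```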
I tested removing the run conditions. I also made a fork, upgraded to the latest Bevy, and tried to use #12990, but …
Bevy version
0.12.1
Relevant system information
What you did
Hello! I have a use case that essentially involves separating identical groups of entities. Since Bevy's subworld support is not complete and Bevy does not have shared components (like Unity DOTS), I opted for a solution where I use Rust generics to "duplicate" my systems for every group, with a `SparseSet` marker component. So there are `Marker::<0>, Marker::<1>, ...` components and `SystemA::<0>, SystemA::<1>, ...` systems. The idea was that the separate systems/marker components would allow Bevy to properly parallelize logic across groups, since there are no cross-group dependencies. The sketch after this paragraph illustrates the pattern.
What went wrong
It seems Bevy is bottlenecked by the number of systems for my use case. Attempting 6000 systems (2000 groups, 3 systems/group) results in 7% CPU utilization at 12 FPS. A Tracy capture indicates that 80+% of the CPU time is spent in the `multithreaded executor` before tasks are sent to my thread pool. I have created a GitHub repository with the capture and code: https://github.com/UsaidPro/BevyLotsOfSystems
I was hoping Bevy would distribute the systems across the full thread pool provided by my 32-core CPU. Instead, 1 core gets consumed by the `multithreaded executor`, which does distribute the tasks across all threads (I see 55+ thread pools in Tracy), but only after taking ~60+ ms (80+% of compute time). The multithreaded executor has a mean time per call (MTPC) of 470 µs, but it is called 17k times compared to 129 Update calls, resulting in 83% of the time being spent on that single thread.
Here is a table of systems vs. FPS. All of these runs used only 7% of my CPU, with the same bottleneck. I have 3 systems, 1 of which only runs if `run_if()` returned true.
Additional information
Tracy screenshot:
