Commit 1c96d9b

hymm authored and ItsDoot committed
Pipelined Rendering (bevyengine#6503)
# Objective

- Implement pipelined rendering
- Fixes bevyengine#5082
- Fixes bevyengine#4718

## User Facing Description

Bevy now implements pipelined rendering! Pipelined rendering allows the app logic and rendering logic to run on different threads, leading to large gains in performance.

![image](https://user-images.githubusercontent.com/2180432/202049871-3c00b801-58ab-448f-93fd-471e30aba55f.png)
*tracy capture of many_foxes example*

To use pipelined rendering, you just need to add the `PipelinedRenderingPlugin`. If you're using `DefaultPlugins`, it is added automatically on all platforms except wasm. Bevy does not currently support multithreading on wasm, which this feature needs in order to work. If you aren't using `DefaultPlugins`, you can add the plugin manually.

```rust
use bevy::prelude::*;
use bevy::render::pipelined_rendering::PipelinedRenderingPlugin;
use bevy::render::RenderPlugin;

fn main() {
    App::new()
        // whatever other plugins you need
        .add_plugin(RenderPlugin)
        // needs to be added after RenderPlugin
        .add_plugin(PipelinedRenderingPlugin)
        .run();
}
```

If pipelined rendering needs to be removed for some reason, you can disable the plugin the normal way.

```rust
use bevy::prelude::*;
use bevy::render::pipelined_rendering::PipelinedRenderingPlugin;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins.build().disable::<PipelinedRenderingPlugin>())
        .run();
}
```

### A setup function was added to plugins

An optional plugin lifecycle function was added to the `Plugin` trait. This function is called after all plugins have been built, but before the app runner is called. This allows for some final setup to be done. In the case of pipelined rendering, the function removes the sub app from the main app and sends it to the render thread.
```rust
struct MyPlugin;
impl Plugin for MyPlugin {
    fn build(&self, app: &mut App) {

    }

    // optional function
    fn setup(&self, app: &mut App) {
        // do some final setup before runner is called
    }
}
```

### A Stage for Frame Pacing

In the `RenderExtractApp` there is a stage labelled `BeforeIoAfterRenderStart` that systems can be added to. The specific use case for this stage is a frame pacing system that delays the start of main app processing in render bound apps to reduce input latency, i.e. "frame pacing". This is not currently built into bevy, but exists as `bevy`

```text
|-------------------------------------------------------------------|
|         | BeforeIoAfterRenderStart | winit events | main schedule |
| extract |---------------------------------------------------------|
|         | extract commands | rendering schedule                   |
|-------------------------------------------------------------------|
```

### Small API additions

* `Schedule::remove_stage`
* `App::insert_sub_app`
* `App::remove_sub_app`
* `TaskPool::scope_with_executor`

## Problems and Solutions

### Moving render app to another thread

Most of the hard bits for this were done with the render redo. This PR just sends the render app back and forth through channels, which seems to work ok. I originally experimented with using a scope to run the render task. It was cuter, but that approach didn't allow render to start before i/o processing, so I switched to using channels. There is much complexity in the coordination that needs to be done, but it's worth it. By running rendering during i/o processing, the frame times should be much more consistent in render bound apps. See bevyengine#4691.

### Unsoundness with Sending World with NonSend resources

Dropping !Send things on threads other than the thread they were spawned on is considered unsound. The render world doesn't have any nonsend resources. So if we tell the users to "pretty please don't spawn nonsend resource on the render world", we can avoid this problem.
More seriously, bevyengine#6534 patches the unsoundness by aborting the app if a nonsend resource is dropped on the wrong thread. ~~That PR should probably be merged before this one.~~ For a longer-term solution, see the discussion in bevyengine#6552.

### NonSend Systems in render world

The render world doesn't have any !Send resources, but it does have a non-send system. While Window is Send, winit has some APIs that can only be accessed on the main thread. `prepare_windows` in the render schedule thus needs to be scheduled on the main thread. Currently we run nonsend systems on the thread that `TaskPool::scope` runs on. When we move render to another thread this no longer works.

To fix this, a new `scope_with_executor` method was added that takes an optional `ThreadExecutor` that can only be ticked on the thread it was initialized on. The render world then holds a `MainThreadExecutor` resource, which can be passed to the scope in the parallel executor that it uses to spawn its non-send systems on.

### Scopes executors between render and main should not share tasks

The render world and the app world share the `ComputeTaskPool`. Because `scope` has executors for the ComputeTaskPool, a system from the main world could run on the render thread, or a render system could run on the main thread. This can cause performance problems because it can delay a stage from finishing. See bevyengine#6503 (comment) for more details.

To avoid this problem, `TaskPool::scope` has been changed to not tick the ComputeTaskPool when it's used by the parallel executor. In the future, when we move closer to the 1 thread to 1 logical core model, we may want to overprovide threads, because the render and main app threads don't do much when executing the schedule.
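The channel handoff described under "Moving render app to another thread" can be sketched with the standard library alone. This is not the PR's implementation: `RenderApp`, `run_pipelined`, and the frame counter are hypothetical stand-ins, and `std::sync::mpsc` is swapped in for the `async-channel` crate that the PR adds as a dependency.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the render sub app; it only records which frames it saw.
struct RenderApp {
    extracted_frame: u64,
    rendered: Vec<u64>,
}

/// Runs `frames` iterations of a pipelined loop and returns the frames that were rendered.
fn run_pipelined(frames: u64) -> Vec<u64> {
    // One channel per direction: main -> render thread, render thread -> main.
    let (to_render, render_rx) = mpsc::channel::<RenderApp>();
    let (to_main, main_rx) = mpsc::channel::<RenderApp>();

    let render_thread = thread::spawn(move || {
        // The loop ends once the main thread drops its sender.
        for mut app in render_rx {
            app.rendered.push(app.extracted_frame); // "render" the extracted frame
            if to_main.send(app).is_err() {
                break;
            }
        }
    });

    let mut render_app = RenderApp { extracted_frame: 0, rendered: Vec::new() };
    for frame in 1..=frames {
        // Extract: copy main-world data into the render app while this thread owns it.
        render_app.extracted_frame = frame;
        // Hand the render app to the render thread.
        to_render.send(render_app).expect("render thread alive");
        // The main schedule for the next frame would run here, in parallel with rendering.
        // Take the render app back before the next extract.
        render_app = main_rx.recv().expect("render thread alive");
    }

    drop(to_render); // closes the channel so the render thread exits
    render_thread.join().expect("render thread panicked");
    render_app.rendered
}

fn main() {
    println!("{:?}", run_pipelined(3));
}
```

Ownership of the render app moves through the channels, so at any moment only one thread can touch it. That is the property the sketch shares with the approach described above.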
## Performance

My machine is Windows 11, AMD Ryzen 5600x, RX 6600

### Examples

#### This PR with pipelining vs Main

> Note that these were run on an older version of main and the performance profile has probably changed due to optimizations

Seeing a perf gain from 29% on many lights to 7% on many sprites.

| tracy frame time | percent mean | percent median | percent sigma | Diff mean | Diff median | Diff sigma | Main mean | Main median | Main sigma | PR mean | PR median | PR sigma |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| many foxes | 27.01% | 27.34% | -47.09% | 1.58 | 1.55 | -1.78 | 5.85 | 5.67 | 3.78 | 4.27 | 4.12 | 5.56 |
| many lights | 29.35% | 29.94% | -10.84% | 3.02 | 3.03 | -0.57 | 10.29 | 10.12 | 5.26 | 7.27 | 7.09 | 5.83 |
| many animated sprites | 13.97% | 15.69% | 14.20% | 3.79 | 4.17 | 1.41 | 27.12 | 26.57 | 9.93 | 23.33 | 22.4 | 8.52 |
| 3d scene | 25.79% | 26.78% | 7.46% | 0.49 | 0.49 | 0.15 | 1.9 | 1.83 | 2.01 | 1.41 | 1.34 | 1.86 |
| many cubes | 11.97% | 11.28% | 14.51% | 1.93 | 1.78 | 1.31 | 16.13 | 15.78 | 9.03 | 14.2 | 14 | 7.72 |
| many sprites | 7.14% | 9.42% | -85.42% | 1.72 | 2.23 | -6.15 | 24.09 | 23.68 | 7.2 | 22.37 | 21.45 | 13.35 |

#### This PR with pipelining disabled vs Main

Mostly regressions here. I don't think this should be a problem as users that are disabling pipelined rendering are probably running single threaded and not using the parallel executor. The regression is probably mostly due to the switch to use `async_executor::run` instead of `try_tick` and also having one less thread to run systems on. I'll do a writeup on why switching to `run` causes regressions, so we can try to eventually fix it.
Using `try_tick` causes issues when pipelined rendering is enabled, as seen [here](bevyengine#6503 (comment)).

| tracy frame time | percent mean | percent median | percent sigma | Diff mean | Diff median | Diff sigma | Main mean | Main median | Main sigma | PR no pipelining mean | PR no pipelining median | PR no pipelining sigma |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| many foxes | -3.72% | -4.42% | -1.07% | -0.21 | -0.24 | -0.04 | 5.64 | 5.43 | 3.74 | 5.85 | 5.67 | 3.78 |
| many lights | 0.29% | -0.30% | 4.75% | 0.03 | -0.03 | 0.25 | 10.29 | 10.12 | 5.26 | 10.26 | 10.15 | 5.01 |
| many animated sprites | 0.22% | 1.81% | -2.72% | 0.06 | 0.48 | -0.27 | 27.12 | 26.57 | 9.93 | 27.06 | 26.09 | 10.2 |
| 3d scene | -15.79% | -14.75% | -31.34% | -0.3 | -0.27 | -0.63 | 1.9 | 1.83 | 2.01 | 2.2 | 2.1 | 2.64 |
| many cubes | -2.85% | -3.30% | 0.00% | -0.46 | -0.52 | 0 | 16.13 | 15.78 | 9.03 | 16.59 | 16.3 | 9.03 |
| many sprites | 2.49% | 2.41% | 0.69% | 0.6 | 0.57 | 0.05 | 24.09 | 23.68 | 7.2 | 23.49 | 23.11 | 7.15 |

### Benchmarks

Mostly the same, except empty_systems has got a touch slower. The `maybe-pipelining+1` column has the compute task pool with one extra thread over the default. This is because pipelining loses one thread over main to execute systems on, since the main thread no longer runs normal systems.

<details>
<summary>Click Me</summary>

```text
group                                 main                         maybe-pipelining+1
-----                                 -------------------------    ------------------
busy_systems/01x_entities_03_systems  1.07  30.7±1.32µs ? ?/sec    1.00  28.6±1.35µs ? ?/sec
busy_systems/01x_entities_06_systems  1.10  52.1±1.10µs ? ?/sec    1.00  47.2±1.08µs ? ?/sec
busy_systems/01x_entities_09_systems  1.00  74.6±1.36µs ? ?/sec    1.00  75.0±1.93µs ? ?/sec
busy_systems/01x_entities_12_systems  1.03  100.6±6.68µs ? ?/sec    1.00  98.0±1.46µs ? ?/sec
busy_systems/01x_entities_15_systems  1.11  128.5±3.53µs ? ?/sec    1.00  115.5±1.02µs ? ?/sec
busy_systems/02x_entities_03_systems  1.16  50.4±2.56µs ? ?/sec    1.00  43.5±3.00µs ? ?/sec
busy_systems/02x_entities_06_systems  1.00  87.1±1.27µs ? ?/sec    1.05  91.5±7.15µs ? ?/sec
busy_systems/02x_entities_09_systems  1.04  139.9±6.37µs ? ?/sec    1.00  134.0±1.06µs ? ?/sec
busy_systems/02x_entities_12_systems  1.05  179.2±3.47µs ? ?/sec    1.00  170.1±3.17µs ? ?/sec
busy_systems/02x_entities_15_systems  1.01  219.6±3.75µs ? ?/sec    1.00  218.1±2.55µs ? ?/sec
busy_systems/03x_entities_03_systems  1.10  70.6±2.33µs ? ?/sec    1.00  64.3±0.69µs ? ?/sec
busy_systems/03x_entities_06_systems  1.02  130.2±3.11µs ? ?/sec    1.00  128.0±1.34µs ? ?/sec
busy_systems/03x_entities_09_systems  1.00  195.0±10.11µs ? ?/sec    1.00  194.8±1.41µs ? ?/sec
busy_systems/03x_entities_12_systems  1.01  261.7±4.05µs ? ?/sec    1.00  259.8±4.11µs ? ?/sec
busy_systems/03x_entities_15_systems  1.00  318.0±3.04µs ? ?/sec    1.06  338.3±20.25µs ? ?/sec
busy_systems/04x_entities_03_systems  1.00  82.9±0.63µs ? ?/sec    1.02  84.3±0.63µs ? ?/sec
busy_systems/04x_entities_06_systems  1.01  181.7±3.65µs ? ?/sec    1.00  179.8±1.76µs ? ?/sec
busy_systems/04x_entities_09_systems  1.04  265.0±4.68µs ? ?/sec    1.00  255.3±1.98µs ? ?/sec
busy_systems/04x_entities_12_systems  1.00  335.9±3.00µs ? ?/sec    1.05  352.6±15.84µs ? ?/sec
busy_systems/04x_entities_15_systems  1.00  418.6±10.26µs ? ?/sec    1.08  450.2±39.58µs ? ?/sec
busy_systems/05x_entities_03_systems  1.07  114.3±0.95µs ? ?/sec    1.00  106.9±1.52µs ? ?/sec
busy_systems/05x_entities_06_systems  1.08  229.8±2.90µs ? ?/sec    1.00  212.3±4.18µs ? ?/sec
busy_systems/05x_entities_09_systems  1.03  329.3±1.99µs ? ?/sec    1.00  319.2±2.43µs ? ?/sec
busy_systems/05x_entities_12_systems  1.06  454.7±6.77µs ? ?/sec    1.00  430.1±3.58µs ? ?/sec
busy_systems/05x_entities_15_systems  1.03  554.6±6.15µs ? ?/sec    1.00  538.4±23.87µs ? ?/sec
contrived/01x_entities_03_systems     1.00  14.0±0.15µs ? ?/sec    1.08  15.1±0.21µs ? ?/sec
contrived/01x_entities_06_systems     1.04  28.5±0.37µs ? ?/sec    1.00  27.4±0.44µs ? ?/sec
contrived/01x_entities_09_systems     1.00  41.5±4.38µs ? ?/sec    1.02  42.2±2.24µs ? ?/sec
contrived/01x_entities_12_systems     1.06  55.9±1.49µs ? ?/sec    1.00  52.6±1.36µs ? ?/sec
contrived/01x_entities_15_systems     1.02  68.0±2.00µs ? ?/sec    1.00  66.5±0.78µs ? ?/sec
contrived/02x_entities_03_systems     1.03  25.2±0.38µs ? ?/sec    1.00  24.6±0.52µs ? ?/sec
contrived/02x_entities_06_systems     1.00  46.3±0.49µs ? ?/sec    1.04  48.1±4.13µs ? ?/sec
contrived/02x_entities_09_systems     1.02  70.4±0.99µs ? ?/sec    1.00  68.8±1.04µs ? ?/sec
contrived/02x_entities_12_systems     1.06  96.8±1.49µs ? ?/sec    1.00  91.5±0.93µs ? ?/sec
contrived/02x_entities_15_systems     1.02  116.2±0.95µs ? ?/sec    1.00  114.2±1.42µs ? ?/sec
contrived/03x_entities_03_systems     1.00  33.2±0.38µs ? ?/sec    1.01  33.6±0.45µs ? ?/sec
contrived/03x_entities_06_systems     1.00  62.4±0.73µs ? ?/sec    1.01  63.3±1.05µs ? ?/sec
contrived/03x_entities_09_systems     1.02  96.4±0.85µs ? ?/sec    1.00  94.8±3.02µs ? ?/sec
contrived/03x_entities_12_systems     1.01  126.3±4.67µs ? ?/sec    1.00  125.6±2.27µs ? ?/sec
contrived/03x_entities_15_systems     1.03  160.2±9.37µs ? ?/sec    1.00  156.0±1.53µs ? ?/sec
contrived/04x_entities_03_systems     1.02  41.4±3.39µs ? ?/sec    1.00  40.5±0.52µs ? ?/sec
contrived/04x_entities_06_systems     1.00  78.9±1.61µs ? ?/sec    1.02  80.3±1.06µs ? ?/sec
contrived/04x_entities_09_systems     1.02  121.8±3.97µs ? ?/sec    1.00  119.2±1.46µs ? ?/sec
contrived/04x_entities_12_systems     1.00  157.8±1.48µs ? ?/sec    1.01  160.1±1.72µs ? ?/sec
contrived/04x_entities_15_systems     1.00  197.9±1.47µs ? ?/sec    1.08  214.2±34.61µs ? ?/sec
contrived/05x_entities_03_systems     1.00  49.1±0.33µs ? ?/sec    1.01  49.7±0.75µs ? ?/sec
contrived/05x_entities_06_systems     1.00  95.0±0.93µs ? ?/sec    1.00  94.6±0.94µs ? ?/sec
contrived/05x_entities_09_systems     1.01  143.2±1.68µs ? ?/sec    1.00  142.2±2.00µs ? ?/sec
contrived/05x_entities_12_systems     1.00  191.8±2.03µs ? ?/sec    1.01  192.7±7.88µs ? ?/sec
contrived/05x_entities_15_systems     1.02  239.7±3.71µs ? ?/sec    1.00  235.8±4.11µs ? ?/sec
empty_systems/000_systems             1.01  47.8±0.67ns ? ?/sec    1.00  47.5±2.02ns ? ?/sec
empty_systems/001_systems             1.00  1743.2±126.14ns ? ?/sec    1.01  1761.1±70.10ns ? ?/sec
empty_systems/002_systems             1.01  2.2±0.04µs ? ?/sec    1.00  2.2±0.02µs ? ?/sec
empty_systems/003_systems             1.02  2.7±0.09µs ? ?/sec    1.00  2.7±0.16µs ? ?/sec
empty_systems/004_systems             1.00  3.1±0.11µs ? ?/sec    1.00  3.1±0.24µs ? ?/sec
empty_systems/005_systems             1.00  3.5±0.05µs ? ?/sec    1.11  3.9±0.70µs ? ?/sec
empty_systems/010_systems             1.00  5.5±0.12µs ? ?/sec    1.03  5.7±0.17µs ? ?/sec
empty_systems/015_systems             1.00  7.9±0.19µs ? ?/sec    1.06  8.4±0.16µs ? ?/sec
empty_systems/020_systems             1.00  10.4±1.25µs ? ?/sec    1.02  10.6±0.18µs ? ?/sec
empty_systems/025_systems             1.00  12.4±0.39µs ? ?/sec    1.14  14.1±1.07µs ? ?/sec
empty_systems/030_systems             1.00  15.1±0.39µs ? ?/sec    1.05  15.8±0.62µs ? ?/sec
empty_systems/035_systems             1.00  16.9±0.47µs ? ?/sec    1.07  18.0±0.37µs ? ?/sec
empty_systems/040_systems             1.00  19.3±0.41µs ? ?/sec    1.05  20.3±0.39µs ? ?/sec
empty_systems/045_systems             1.00  22.4±1.67µs ? ?/sec    1.02  22.9±0.51µs ? ?/sec
empty_systems/050_systems             1.00  24.4±1.67µs ? ?/sec    1.01  24.7±0.40µs ? ?/sec
empty_systems/055_systems             1.05  28.6±5.27µs ? ?/sec    1.00  27.2±0.70µs ? ?/sec
empty_systems/060_systems             1.02  29.9±1.64µs ? ?/sec    1.00  29.3±0.66µs ? ?/sec
empty_systems/065_systems             1.02  32.7±3.15µs ? ?/sec    1.00  32.1±0.98µs ? ?/sec
empty_systems/070_systems             1.00  33.0±1.42µs ? ?/sec    1.03  34.1±1.44µs ? ?/sec
empty_systems/075_systems             1.00  34.8±0.89µs ? ?/sec    1.04  36.2±0.70µs ? ?/sec
empty_systems/080_systems             1.00  37.0±1.82µs ? ?/sec    1.05  38.7±1.37µs ? ?/sec
empty_systems/085_systems             1.00  38.7±0.76µs ? ?/sec    1.05  40.8±0.83µs ? ?/sec
empty_systems/090_systems             1.00  41.5±1.09µs ? ?/sec    1.04  43.2±0.82µs ? ?/sec
empty_systems/095_systems             1.00  43.6±1.10µs ? ?/sec    1.04  45.2±0.99µs ? ?/sec
empty_systems/100_systems             1.00  46.7±2.27µs ? ?/sec    1.03  48.1±1.25µs ? ?/sec
```

</details>

## Migration Guide

### App `runner` and SubApp `extract` functions are now required to be Send

This was changed to enable pipelined rendering. If this breaks your use case, please report it, as these new bounds might be able to be relaxed.

## ToDo

* [x] redo benchmarking
* [x] reinvestigate the perf of the try_tick -> run change for task pool scope
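The `Send` requirement from the migration guide follows from moving the whole app to another thread: a struct that owns a boxed closure is only `Send` if the trait object itself carries the `Send` bound. A minimal sketch, assuming a hypothetical `MiniApp` in place of Bevy's `App` and a hypothetical `run_on_other_thread` helper:

```rust
use std::thread;

/// Hypothetical miniature of an app that owns a boxed runner closure.
struct MiniApp {
    value: i32,
    // Without `+ Send` here, `MiniApp` would not be `Send`, and the
    // `thread::spawn` call below would fail to compile.
    runner: Box<dyn Fn(i32) -> i32 + Send>,
}

/// Moves the app to another thread, runs it there, and returns the result.
fn run_on_other_thread(app: MiniApp) -> i32 {
    thread::spawn(move || (app.runner)(app.value))
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let app = MiniApp {
        value: 20,
        runner: Box::new(|v: i32| v * 2 + 2),
    };
    println!("{}", run_on_other_thread(app));
}
```

A runner or extract closure that captures something non-`Send`, such as an `Rc`, is rejected by the new bounds at the call site.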
1 parent e030b24 commit 1c96d9b

9 files changed

+488
-99
lines changed

crates/bevy_app/src/app.rs

Lines changed: 72 additions & 13 deletions
@@ -67,7 +67,7 @@ pub struct App {
     /// the application's event loop and advancing the [`Schedule`].
     /// Typically, it is not configured manually, but set by one of Bevy's built-in plugins.
     /// See `bevy::winit::WinitPlugin` and [`ScheduleRunnerPlugin`](crate::schedule_runner::ScheduleRunnerPlugin).
-    pub runner: Box<dyn Fn(App)>,
+    pub runner: Box<dyn Fn(App) + Send>, // Send bound is required to make App Send
     /// A container of [`Stage`]s set to be run in a linear order.
     pub schedule: Schedule,
     sub_apps: HashMap<AppLabelId, SubApp>,
@@ -87,10 +87,55 @@ impl Debug for App {
     }
 }

-/// Each `SubApp` has its own [`Schedule`] and [`World`], enabling a separation of concerns.
-struct SubApp {
-    app: App,
-    extract: Box<dyn Fn(&mut World, &mut App)>,
+/// A [`SubApp`] contains its own [`Schedule`] and [`World`] separate from the main [`App`].
+/// This is useful for situations where data and data processing should be kept completely separate
+/// from the main application. The primary use of this feature in bevy is to enable pipelined rendering.
+///
+/// # Example
+///
+/// ```rust
+/// # use bevy_app::{App, AppLabel};
+/// # use bevy_ecs::prelude::*;
+///
+/// #[derive(Resource, Default)]
+/// struct Val(pub i32);
+///
+/// #[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, AppLabel)]
+/// struct ExampleApp;
+///
+/// #[derive(Debug, Hash, PartialEq, Eq, Clone, StageLabel)]
+/// struct ExampleStage;
+///
+/// let mut app = App::empty();
+/// // initialize the main app with a value of 0;
+/// app.insert_resource(Val(10));
+///
+/// // create a app with a resource and a single stage
+/// let mut sub_app = App::empty();
+/// sub_app.insert_resource(Val(100));
+/// let mut example_stage = SystemStage::single_threaded();
+/// example_stage.add_system(|counter: Res<Val>| {
+///     // since we assigned the value from the main world in extract
+///     // we see that value instead of 100
+///     assert_eq!(counter.0, 10);
+/// });
+/// sub_app.add_stage(ExampleStage, example_stage);
+///
+/// // add the sub_app to the app
+/// app.add_sub_app(ExampleApp, sub_app, |main_world, sub_app| {
+///     sub_app.world.resource_mut::<Val>().0 = main_world.resource::<Val>().0;
+/// });
+///
+/// // This will run the schedules once, since we're using the default runner
+/// app.run();
+/// ```
+pub struct SubApp {
+    /// The [`SubApp`]'s instance of [`App`]
+    pub app: App,
+
+    /// A function that allows access to both the [`SubApp`] [`World`] and the main [`App`]. This is
+    /// useful for moving data between the sub app and the main app.
+    pub extract: Box<dyn Fn(&mut World, &mut App) + Send>,
 }

 impl SubApp {
@@ -161,11 +206,14 @@ impl App {
     ///
     /// See [`add_sub_app`](Self::add_sub_app) and [`run_once`](Schedule::run_once) for more details.
     pub fn update(&mut self) {
-        #[cfg(feature = "trace")]
-        let _bevy_frame_update_span = info_span!("frame").entered();
-        self.schedule.run(&mut self.world);
-
-        for sub_app in self.sub_apps.values_mut() {
+        {
+            #[cfg(feature = "trace")]
+            let _bevy_frame_update_span = info_span!("main app").entered();
+            self.schedule.run(&mut self.world);
+        }
+        for (_label, sub_app) in self.sub_apps.iter_mut() {
+            #[cfg(feature = "trace")]
+            let _sub_app_span = info_span!("sub app", name = ?_label).entered();
             sub_app.extract(&mut self.world);
             sub_app.run();
         }
@@ -850,7 +898,7 @@ impl App {
     /// App::new()
     ///     .set_runner(my_runner);
     /// ```
-    pub fn set_runner(&mut self, run_fn: impl Fn(App) + 'static) -> &mut Self {
+    pub fn set_runner(&mut self, run_fn: impl Fn(App) + 'static + Send) -> &mut Self {
         self.runner = Box::new(run_fn);
         self
     }
@@ -1035,14 +1083,15 @@ impl App {

     /// Adds an [`App`] as a child of the current one.
     ///
-    /// The provided function `sub_app_runner` is called by the [`update`](Self::update) method. The [`World`]
+    /// The provided function `extract` is normally called by the [`update`](Self::update) method.
+    /// After extract is called, the [`Schedule`] of the sub app is run. The [`World`]
     /// parameter represents the main app world, while the [`App`] parameter is just a mutable
     /// reference to the `SubApp` itself.
     pub fn add_sub_app(
         &mut self,
         label: impl AppLabel,
         app: App,
-        extract: impl Fn(&mut World, &mut App) + 'static,
+        extract: impl Fn(&mut World, &mut App) + 'static + Send,
     ) -> &mut Self {
         self.sub_apps.insert(
             label.as_label(),
@@ -1088,6 +1137,16 @@ impl App {
         }
     }

+    /// Inserts an existing sub app into the app
+    pub fn insert_sub_app(&mut self, label: impl AppLabel, sub_app: SubApp) {
+        self.sub_apps.insert(label.as_label(), sub_app);
+    }
+
+    /// Removes a sub app from the app. Returns [`None`] if the label doesn't exist.
+    pub fn remove_sub_app(&mut self, label: impl AppLabel) -> Option<SubApp> {
+        self.sub_apps.remove(&label.as_label())
+    }
+
     /// Retrieves a `SubApp` inside this [`App`] with the given label, if it exists. Otherwise returns
     /// an [`Err`] containing the given label.
     pub fn get_sub_app(&self, label: impl AppLabel) -> Result<&App, impl AppLabel> {
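
The extract-then-run loop in `update`, together with the new `insert_sub_app` and `remove_sub_app` methods, can be modelled without Bevy. In this sketch `World`, `SubApp`, and `App` are hypothetical miniatures (a single integer stands in for all resources, and string keys stand in for `AppLabel`), so it shows only the ownership pattern, not the real types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `World`: a single integer resource.
struct World {
    val: i32,
}

/// Hypothetical miniature sub app: its own world plus an extract function.
struct SubApp {
    world: World,
    // `+ Send` mirrors the bound this commit adds so the sub app can change threads.
    extract: Box<dyn Fn(&mut World, &mut World) + Send>,
}

struct App {
    world: World,
    sub_apps: HashMap<&'static str, SubApp>,
}

impl App {
    /// Runs extract for every sub app the app currently owns.
    fn update(&mut self) {
        for sub_app in self.sub_apps.values_mut() {
            (sub_app.extract)(&mut self.world, &mut sub_app.world);
        }
    }
}

fn demo() -> (i32, i32, i32) {
    let mut app = App { world: World { val: 10 }, sub_apps: HashMap::new() };
    app.sub_apps.insert(
        "example",
        SubApp {
            world: World { val: 100 },
            extract: Box::new(|main: &mut World, sub: &mut World| sub.val = main.val),
        },
    );

    app.update(); // extract overwrites 100 with the main world's 10
    let after_extract = app.sub_apps["example"].world.val;

    // Detach the sub app, as `remove_sub_app` allows; updates no longer reach it.
    let detached = app.sub_apps.remove("example").expect("sub app exists");
    app.world.val = 11;
    app.update();
    let while_detached = detached.world.val;

    // Reattach it, as `insert_sub_app` allows; the next update extracts again.
    app.sub_apps.insert("example", detached);
    app.update();
    let after_reinsert = app.sub_apps["example"].world.val;

    (after_extract, while_detached, after_reinsert)
}

fn main() {
    println!("{:?}", demo());
}
```

While the sub app is detached, the main app's `update` has nothing to extract into, which is why a plugin that removes the render sub app has to drive extraction itself.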

crates/bevy_ecs/src/schedule/executor_parallel.rs

Lines changed: 52 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,15 @@
1+
use std::sync::Arc;
2+
3+
use crate as bevy_ecs;
14
use crate::{
25
archetype::ArchetypeComponentId,
36
query::Access,
47
schedule::{ParallelSystemExecutor, SystemContainer},
8+
system::Resource,
59
world::World,
610
};
711
use async_channel::{Receiver, Sender};
8-
use bevy_tasks::{ComputeTaskPool, Scope, TaskPool};
12+
use bevy_tasks::{ComputeTaskPool, Scope, TaskPool, ThreadExecutor};
913
#[cfg(feature = "trace")]
1014
use bevy_utils::tracing::Instrument;
1115
use event_listener::Event;
@@ -14,6 +18,16 @@ use fixedbitset::FixedBitSet;
1418
#[cfg(test)]
1519
use scheduling_event::*;
1620

21+
/// New-typed [`ThreadExecutor`] [`Resource`] that is used to run systems on the main thread
22+
#[derive(Resource, Default, Clone)]
23+
pub struct MainThreadExecutor(pub Arc<ThreadExecutor<'static>>);
24+
25+
impl MainThreadExecutor {
26+
pub fn new() -> Self {
27+
MainThreadExecutor(Arc::new(ThreadExecutor::new()))
28+
}
29+
}
30+
1731
struct SystemSchedulingMetadata {
1832
/// Used to signal the system's task to start the system.
1933
start: Event,
@@ -124,40 +138,46 @@ impl ParallelSystemExecutor for ParallelExecutor {
124138
}
125139
}
126140

127-
ComputeTaskPool::init(TaskPool::default).scope(|scope| {
128-
self.prepare_systems(scope, systems, world);
129-
if self.should_run.count_ones(..) == 0 {
130-
return;
131-
}
132-
let parallel_executor = async {
133-
// All systems have been ran if there are no queued or running systems.
134-
while 0 != self.queued.count_ones(..) + self.running.count_ones(..) {
135-
self.process_queued_systems();
136-
// Avoid deadlocking if no systems were actually started.
137-
if self.running.count_ones(..) != 0 {
138-
// Wait until at least one system has finished.
139-
let index = self
140-
.finish_receiver
141-
.recv()
142-
.await
143-
.unwrap_or_else(|error| unreachable!("{}", error));
144-
self.process_finished_system(index);
145-
// Gather other systems than may have finished.
146-
while let Ok(index) = self.finish_receiver.try_recv() {
141+
let thread_executor = world.get_resource::<MainThreadExecutor>().map(|e| &*e.0);
142+
143+
ComputeTaskPool::init(TaskPool::default).scope_with_executor(
144+
false,
145+
thread_executor,
146+
|scope| {
147+
self.prepare_systems(scope, systems, world);
148+
if self.should_run.count_ones(..) == 0 {
149+
return;
150+
}
151+
let parallel_executor = async {
152+
// All systems have been ran if there are no queued or running systems.
153+
while 0 != self.queued.count_ones(..) + self.running.count_ones(..) {
154+
self.process_queued_systems();
155+
// Avoid deadlocking if no systems were actually started.
156+
if self.running.count_ones(..) != 0 {
157+
// Wait until at least one system has finished.
158+
let index = self
159+
.finish_receiver
160+
.recv()
161+
.await
162+
.unwrap_or_else(|error| unreachable!("{}", error));
147163
self.process_finished_system(index);
164+
// Gather other systems than may have finished.
165+
while let Ok(index) = self.finish_receiver.try_recv() {
166+
self.process_finished_system(index);
167+
}
168+
// At least one system has finished, so active access is outdated.
169+
self.rebuild_active_access();
148170
}
149-
// At least one system has finished, so active access is outdated.
150-
self.rebuild_active_access();
171+
self.update_counters_and_queue_systems();
151172
}
152-
self.update_counters_and_queue_systems();
153-
}
154-
};
155-
#[cfg(feature = "trace")]
156-
let span = bevy_utils::tracing::info_span!("parallel executor");
157-
#[cfg(feature = "trace")]
158-
let parallel_executor = parallel_executor.instrument(span);
159-
scope.spawn(parallel_executor);
160-
});
173+
};
174+
#[cfg(feature = "trace")]
175+
let span = bevy_utils::tracing::info_span!("parallel executor");
176+
#[cfg(feature = "trace")]
177+
let parallel_executor = parallel_executor.instrument(span);
178+
scope.spawn(parallel_executor);
179+
},
180+
);
161181
}
162182
}
163183
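The idea behind `MainThreadExecutor`, an executor that any thread may hand work to but that only runs that work on the thread that created it, can be sketched with the standard library. `ThreadBoundExecutor`, `spawn`, and `run_queued` below are hypothetical names for illustration; this is not the `bevy_tasks::ThreadExecutor` API, which is async and is not reproduced here.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread::{self, ThreadId};

type Task = Box<dyn FnOnce() + Send>;

/// Hypothetical, simplified thread-bound executor: any thread may queue work,
/// but only the creating thread may run it.
struct ThreadBoundExecutor {
    owner: ThreadId,
    queue: Mutex<VecDeque<Task>>,
}

impl ThreadBoundExecutor {
    fn new() -> Self {
        Self {
            owner: thread::current().id(),
            queue: Mutex::new(VecDeque::new()),
        }
    }

    /// Queues a task from any thread.
    fn spawn(&self, task: impl FnOnce() + Send + 'static) {
        self.queue.lock().unwrap().push_back(Box::new(task));
    }

    /// Runs all queued tasks when called on the owning thread and returns how
    /// many ran; returns `None` without running anything on any other thread.
    fn run_queued(&self) -> Option<usize> {
        if thread::current().id() != self.owner {
            return None;
        }
        let mut ran = 0;
        loop {
            // Pop in its own statement so the lock is released before the task runs.
            let task = self.queue.lock().unwrap().pop_front();
            match task {
                Some(task) => {
                    task();
                    ran += 1;
                }
                None => return Some(ran),
            }
        }
    }
}

fn demo() -> (Option<usize>, Option<usize>) {
    let executor = Arc::new(ThreadBoundExecutor::new());
    let worker = {
        let executor = Arc::clone(&executor);
        thread::spawn(move || {
            executor.spawn(|| println!("running on the owning thread"));
            executor.run_queued() // refused: this is not the owning thread
        })
    };
    let on_worker = worker.join().expect("worker thread panicked");
    let on_owner = executor.run_queued();
    (on_worker, on_owner)
}

fn main() {
    println!("{:?}", demo());
}
```

In the sketch the worker thread plays the role of the render thread: it can schedule a task that must run on the main thread, but the task only executes once the owning thread drives the executor.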

crates/bevy_internal/src/default_plugins.rs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -68,6 +68,12 @@ impl PluginGroup for DefaultPlugins {
6868
// NOTE: Load this after renderer initialization so that it knows about the supported
6969
// compressed texture formats
7070
.add(bevy_render::texture::ImagePlugin::default());
71+
72+
#[cfg(not(target_arch = "wasm32"))]
73+
{
74+
group = group
75+
.add(bevy_render::pipelined_rendering::PipelinedRenderingPlugin::default());
76+
}
7177
}
7278

7379
#[cfg(feature = "bevy_core_pipeline")]

crates/bevy_render/Cargo.toml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@ bevy_time = { path = "../bevy_time", version = "0.9.0" }
4444
bevy_transform = { path = "../bevy_transform", version = "0.9.0" }
4545
bevy_window = { path = "../bevy_window", version = "0.9.0" }
4646
bevy_utils = { path = "../bevy_utils", version = "0.9.0" }
47+
bevy_tasks = { path = "../bevy_tasks", version = "0.9.0" }
4748

4849
# rendering
4950
image = { version = "0.24", default-features = false }
@@ -75,3 +76,4 @@ basis-universal = { version = "0.2.0", optional = true }
7576
encase = { version = "0.4", features = ["glam"] }
7677
# For wgpu profiling using tracing. Use `RUST_LOG=info` to also capture the wgpu spans.
7778
profiling = { version = "1", features = ["profile-with-tracing"], optional = true }
79+
async-channel = "1.4"

crates/bevy_render/src/lib.rs

Lines changed: 18 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ mod extract_param;
1010
pub mod extract_resource;
1111
pub mod globals;
1212
pub mod mesh;
13+
pub mod pipelined_rendering;
1314
pub mod primitives;
1415
pub mod render_asset;
1516
pub mod render_graph;
@@ -72,6 +73,9 @@ pub enum RenderStage {
7273
/// running the next frame while rendering the current frame.
7374
Extract,
7475

76+
/// A stage for applying the commands from the [`Extract`] stage
77+
ExtractCommands,
78+
7579
/// Prepare render resources from the extracted data for the GPU.
7680
Prepare,
7781

@@ -191,8 +195,14 @@ impl Plugin for RenderPlugin {
191195
// after access to the main world is removed
192196
// See also https://github.com/bevyengine/bevy/issues/5082
193197
extract_stage.set_apply_buffers(false);
198+
199+
// This stage applies the commands from the extract stage while the render schedule
200+
// is running in parallel with the main app.
201+
let mut extract_commands_stage = SystemStage::parallel();
202+
extract_commands_stage.add_system(apply_extract_commands.at_start());
194203
render_app
195204
.add_stage(RenderStage::Extract, extract_stage)
205+
.add_stage(RenderStage::ExtractCommands, extract_commands_stage)
196206
.add_stage(RenderStage::Prepare, SystemStage::parallel())
197207
.add_stage(RenderStage::Queue, SystemStage::parallel())
198208
.add_stage(RenderStage::PhaseSort, SystemStage::parallel())
@@ -223,7 +233,7 @@ impl Plugin for RenderPlugin {
223233

224234
app.add_sub_app(RenderApp, render_app, move |app_world, render_app| {
225235
#[cfg(feature = "trace")]
226-
let _render_span = bevy_utils::tracing::info_span!("renderer subapp").entered();
236+
let _render_span = bevy_utils::tracing::info_span!("extract main app to render subapp").entered();
227237
{
228238
#[cfg(feature = "trace")]
229239
let _stage_span =
@@ -309,10 +319,12 @@ fn extract(app_world: &mut World, render_app: &mut App) {
309319
let inserted_world = render_world.remove_resource::<MainWorld>().unwrap();
310320
let scratch_world = std::mem::replace(app_world, inserted_world.0);
311321
app_world.insert_resource(ScratchMainWorld(scratch_world));
312-
313-
// Note: We apply buffers (read, Commands) after the `MainWorld` has been removed from the render app's world
314-
// so that in future, pipelining will be able to do this too without any code relying on it.
315-
// see <https://github.com/bevyengine/bevy/issues/5082>
316-
extract_stage.0.apply_buffers(render_world);
317322
});
318323
}
324+
325+
// system for render app to apply the extract commands
326+
fn apply_extract_commands(world: &mut World) {
327+
world.resource_scope(|world, mut extract_stage: Mut<ExtractStage>| {
328+
extract_stage.0.apply_buffers(world);
329+
});
330+
}
