diff --git a/implementation_details.md b/implementation_details.md new file mode 100644 index 00000000..7a508a61 --- /dev/null +++ b/implementation_details.md @@ -0,0 +1,474 @@ + +# Implementation Details + +## Requirements + +Type identifiers (in some form) need to match between all connected machines. Options so far: + +- `TypeId` + - [debatable stability][14] + - no additional mapping needed (just reuse the world's) +- ["`StableTypeId`"][13] + - currently unavailable + - no additional mapping needed (just reuse the world's) +- `ComponentId` + - fragile, requires networked components and resources registered first and in a fixed order (for all relevant worlds) + - no mapping needed +- `unique_type_id` + - uses ordering described in a `types.toml` file + - needs mapping between these and `ComponentId` +- `type_uuid` + - not an index + - needs mapping between these and `ComponentId` + +## Wants + +- Ability to split-off a range of entities into a [sub-world][12] with its own, separate component storages. This would remove a layer of indirection, bringing back the convenience of the `World` interface. +- Scheduler support for arbitrary cycles in the stage graph (or "stageless" equivalent). I believe this boils down to arranging stages (or labels) in a hierarchical FSM (state chart) or behavior tree. + +## Practices users should follow or they'll have UB + +- Entities must be spawned (non-)networked. They cannot become (non-)networked. +- Networked entities must be spawned with a "network ID" component at minimum. +- (Non-)networked components and resources should only hold or reference (non-)networked data. +- Networked components should only be mutated inside the fixed update. + +## `Connection` != `Player` + +I know I've been using the terms "client" and "player" somewhat interchangeably, but `Connection` and `Player` should be separate tokens. There's no reason the engine should limit things to one player per connection. Having `Player` be its own thing makes it easier to do stuff like online splitscreen, temporarily substituting vacancies with bots, etc. Likewise, a `Connection` should be a platform-agnostic handle. + +## Storage + +IMO a fast, data-agnostic networking solution is impossible without the ability to handle things on the bit-level. Memcpy and integer compression are orders of magnitude faster than deep serialization and DEFLATE. To that end, each snapshot should be a pre-allocated memory arena. The core storage resource would then basically amount to a ring buffer of these arenas. + +On the server, this would translate to a ring buffer of deltas (for the last `N` snapshots) along with a full copy of the latest networked state. On the client this would hold a bunch of snapshots. + +```plaintext + delta ringbuf copy of latest + v v +[(0^8), (1^8), (2^8), (3^8), (4^8), (5^8), (6^8), (7^8)] [8] + ^ + newest delta +``` + +This architecture has a lot of advantages. It can be pre-allocated when the resource is initialized and it's the same for all replication modes. I.e. no storage differences between input determinism, full state transfer, or interest-managed state transfer. + +These storages can be lazily updated using Bevy's built-in change detection. At the end of every tick, the server zeroes the space for the newest delta, then iterates `Changed` and `Removed`: + +- Generating the newest delta by xor'ing the changes with the stored copy. +- Updating the rest of the ring buffer by xor'ing the older deltas with the newest. +- Writing the changes to the stored copy. 
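To make that update rule concrete, here's a minimal sketch with the arenas simplified to flat byte buffers. The `SnapshotRing` type and the `dirty` input are illustrative, not a proposed API:

```rust
// A sketch of the storage described above. `deltas` is the ring buffer
// of per-snapshot deltas; `latest` is the full copy of networked state.
struct SnapshotRing {
    deltas: Vec<Vec<u8>>, // deltas for the last N snapshots
    latest: Vec<u8>,      // full copy of the latest networked state
    newest: usize,        // ring index of the newest delta
}

impl SnapshotRing {
    /// `dirty` holds (byte offset, new bytes) pairs gathered from change detection.
    fn end_of_tick(&mut self, dirty: &[(usize, Vec<u8>)]) {
        // Advance the ring and zero the space for the newest delta.
        self.newest = (self.newest + 1) % self.deltas.len();
        let newest = self.newest;
        self.deltas[newest].fill(0);

        for &(offset, ref bytes) in dirty {
            for (i, &new) in bytes.iter().enumerate() {
                let diff = new ^ self.latest[offset + i];
                for (d, delta) in self.deltas.iter_mut().enumerate() {
                    if d == newest {
                        // The newest delta is the change itself.
                        delta[offset + i] = diff;
                    } else {
                        // Older deltas must now diff against the new copy,
                        // so xor them with the newest delta.
                        delta[offset + i] ^= diff;
                    }
                }
                // Write the change into the stored copy.
                self.latest[offset + i] = new;
            }
        }
    }
}
```

Everything here can be allocated up front when the resource is initialized, which is the point of the arena design.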
Even if change detection became optional, I don't think much speed would be lost if we had to scan two snapshots for bitwise differences.

TODO

- Store/serialize networked data without struct padding so we're not wasting bandwidth.
- Components and resources that allocate on the heap (backed by the arena) may have some issues with the interest management send strategy. First, finding all of an entity's heap allocations is its own problem. Then, writing partial heap information could invalidate existing data on the client.

### Input Determinism

In deterministic games, the server bundles received inputs and re-distributes them back to the clients. Clients generate their own snapshots locally whenever they have a full set of inputs for a tick. Only one snapshot is needed. Clients also send checksum values to the server, which it can use to detect desyncs.

### Full State Transfer

(aka. delta-compressed snapshots)

For delta-compression, the server just compresses whichever deltas clients need using some variant of run-length encoding (currently looking at [Simple8b + RLE][2]). If the compressed payload is too large, the server will split it into fragments. Overall, this is a very lightweight replication method because the server only needs to compress deltas that are going to be sent and the same compressed payload can be sent to any number of clients.

### Interest-Managed State Transfer

(aka. eventual consistency)

Eventual consistency isn't inherently reliant on prioritization and filtering, but they're essential for an optimal player experience.

If we can't send everything, we should prioritize what players want to know. They want live updates on objects that are close or occupy a big chunk of their FOV. They want to know about their teammates or projectiles they've fired, even if those are far away. The server has to make the most of each packet.

Similarly, game designers often want to hide certain information from certain players. Limiting the amount of hidden information that gets leaked and exploited by cheaters is often crucial to a game's long-term health. Battle-royale players, for example, don't need and probably shouldn't even have their opponents' inventory data. In practice, these barriers are never perfect (e.g. *Valorant's* Fog of War not preventing wallhacks), but something is better than nothing.

Anyway, to do all this interest management, the server needs to track some extra metadata.

```rust
// (The generic parameters were lost to formatting; the element types
// below are my reconstruction from the field descriptions that follow.)
struct InterestMetadata {
    changed: Vec<Tick>,                    // change tick per tracked word
    relevant: Vec<bool>,                   // per component, per client
    lost: Vec<BitVec>,                     // one bit per change tick, per client
    priority: Vec<Option<(Entity, Tick)>>, // send order, per client
}
```

Essentially, the server wants to send clients all the data that:

- belongs to entities they're interested in AND
- has changed since they were last received AND
- is currently relevant to them (i.e. they're allowed to know)

I'm gonna gloss over how to check if an entity is inside someone's area of interest (AOI). It's just an application of collision detection. You'll create some interest regions for each client and write the entities that fall within them.

I don't see a need for ray, distance, and shape queries where BVH or spatial partitioning structures excel, so I'm looking into doing something like a [sweep-and-prune][9] (SAP) with Morton-encoding. Since SAP is essentially sorting an array, I imagine it might be faster. Alternatives like grids and potentially visible sets (PVS) can be explored and added later.

Those entities get written in the order of their oldest relevant changes. That way, the most pertinent information gets through when not everything can fit. Entities that the server always wants clients to know about, or those that clients themselves always want to know about, can be written without going through these checks or bandwidth constraints.
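To make that write loop concrete, here's a minimal sketch. The `Tick` alias, the priority layout, and the `serialize_changes` helper are my assumptions, not settled API:

```rust
type Tick = u32;

// Hypothetical: serialize an entity's undelivered changes into bytes.
fn serialize_changes(_entity: u64) -> Vec<u8> {
    Vec::new()
}

// Write the highest-priority entities until the packet budget runs out.
// Oldest undelivered change = highest priority, so sort ascending by tick.
fn fill_packet(priority: &mut [Option<(u64, Tick)>], budget: usize, out: &mut Vec<u8>) {
    priority.sort_by_key(|slot| slot.map_or(Tick::MAX, |(_, tick)| tick));
    for slot in priority.iter_mut() {
        let Some((entity, _)) = *slot else { continue };
        let payload = serialize_changes(entity);
        if out.len() + payload.len() > budget {
            break; // packet is full; unsent entities keep their place in line
        }
        out.extend_from_slice(&payload);
        *slot = None; // sent: cleared until the entity changes again
    }
}
```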
So how is that metadata tracked?

The **changed** field is simply an array of change ticks. We could reuse Bevy's built-in change tracking, but ideally each *word* in the arena would be tracked separately (would enable much better compression). These change ticks serve as the basis for send priority.

The **relevant** field tracks whether or not a component should be sent to a certain client. This is how information is filtered at the component level. By default, new changes would mark components as relevant for everybody, and then some form of filter rules (maybe [entity relations][10]) could selectively modify those. If a component value is sent, its relevance is reset to `false`.

The **lost** field has one bit per change tick, per client. Whenever the server sends a packet, it jots down somewhere which entities were in it and their priorities. Later, if the server is notified that a packet was probably lost, it can pull this info and set the lost bits. If the delta matching the stored priority still exists, the server can use that as a reference to only set a minimal amount of lost bits for an entity. Otherwise, all its lost bits would be set. Similarly, on send the lost bit would be cleared.

Honestly, I believe this is pretty good, but I'm still looking for something that's more accurate while using fewer bits (if possible).

The **priority** field just stores the end result of combining the other metadata. This array gets sorted and the `Some(entity)` are written in that order, until the client's packet is full or they're all written. I think having the server only send undelivered changes and prioritizing the oldest ones is better than assigning entities arbitrary update frequencies.

### Interest Management Edge Cases

Unfortunately, the most generalized strategy comes with its own headaches.

- What should a client do when it misses the first update for an entity? Is it OK to spawn an entity with incomplete information? If not, how does the client know when it's safe?

AFAIK this is only a problem for "kinded" entities that have archetype invariants (aka spawn info). I'm thinking two potential solutions:

1. Have the client spawn new remote entities with any missing components in their invariants set to a default value.
2. Have the server redundantly send the full invariants for a new interesting entity until that information has been delivered once.

I think #2 is the better solution.

TBD, I'm sure there are more of these.

## Replicate Trait

TBD

```rust
pub unsafe trait Replicate {
    fn quantize(&mut self);
}

// (The impl's generic parameter was lost to formatting; presumably a
// blanket impl over plain-old-data types, e.g. `bytemuck::Pod`.)
unsafe impl<T: Pod> Replicate for T { /* ... */ }
```

## How to rollback?

There are two loops over the same chain of logic.

```plaintext
The first loop is for re-simulating older ticks.
The second loop is for executing the newly-accumulated ticks.
```

I think a nice solution would be to queue stages/labels using a hierarchical FSM (state chart) or behavior tree. Looping would stop being a special edge case and `ShouldRun` could reduce to a `bool`. The main thing is that transitions cannot be completely determined in the middle of a stage/label. The final decision has to be deferred to the end or there will be conflicts.
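To pin down the control flow, here's a rough sketch of those two loops. The types and helpers are placeholders, not a proposed API:

```rust
// Placeholder types standing in for the real world and snapshot storage.
struct World;
struct Snapshots;

fn restore_snapshot(_world: &mut World, _snapshots: &Snapshots, _tick: u32) {}
fn simulate_tick(_world: &mut World, _tick: u32) {}

fn network_fixed_update(
    world: &mut World,
    snapshots: &Snapshots,
    rollback_to: Option<u32>,
    current_tick: &mut u32,
    new_ticks: u32,
) {
    // First loop: restore an older authoritative state, then re-simulate
    // every tick between it and the present.
    if let Some(tick) = rollback_to {
        restore_snapshot(world, snapshots, tick);
        for t in tick..*current_tick {
            simulate_tick(world, t);
        }
    }
    // Second loop: execute the newly-accumulated ticks.
    for _ in 0..new_ticks {
        simulate_tick(world, *current_tick);
        *current_tick += 1;
    }
}
```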
## Unconditional Rollbacks

Every article on "rollback netcode" and "client-side prediction and server reconciliation" encourages having clients compare their predicted state to the authoritative state and reconciling *if* they mispredicted, but well... How do you actually detect a mispredict?

AFAIK, there are two ways to do it:

- Iterate both copies and look for the first difference.
- Have both server and client compute a checksum for their copy and have the client compare them.

The first option has an unpredictable speed. The second option requires an ordered scan of the entire networked game state. Checksums *are* worth having for deterministic desync detection, but that can be deferred. The point I'm trying to make is that detecting state differences isn't cheap (especially once DSTs are involved).

Let's consider a simpler default:

- Always rollback and re-simulate when you receive a new update.

This might seem wasteful, but think about it. If-then is really an anti-pattern that just hides performance problems from you. Mispredictions will exist regardless of this choice, and they're *especially* likely during heavier computations like physics. Having clients always rollback and re-sim makes it easier to profile and optimize your worst case. It's also more memory-efficient, since clients never need to store old predicted states.

## Time Synchronization

Networked applications have to deal with relativity. Clocks will drift. Some router between you and the game server will randomly go offline. Someone in your house will start streaming Netflix. Et cetera. The slightest change in latency (i.e. distance) between two clocks will cause them to shift out of phase.

So how do two computers even agree on *when* something happened?

It'd be really easy to answer that question if there was an *absolute* time reference. Luckily, we can make one. See, there are [two kinds of time][15]—plain ol' **wall-clock time** and **game time**—and we have complete control over the latter. The basic idea is pretty simple: Use a fixed timestep simulation and number the ticks in order. Doing that gives us a timeline of discrete moments that everyone can share (i.e. Tick 742 is the same in-game moment for everyone).

With this shared timeline strategy, clients have two mutually exclusive options:

- Try to simulate ticks at the same wall-clock time.
- Try to have their inputs reach the server at the same wall-clock time.

When using state transfer, I'd recommend against having clients try to simulate ticks at the same time. To accommodate inputs arriving at different times, the server itself would have to rollback and resimulate or you'd have to change the strategy. For example, [Source engine][3] games (AFAIK) simulate the movement of each player at their individual send rates and *then* simulate the world at the regular tick rate. However, doing things their way makes having lower ping a technical advantage (search "lag compensation" in [this article][4]), which I assume is the reason why ~~melee is bad~~ trading kills is so rare in Source engine games.

### A Relatively Fixed Timestep

Fixed timesteps are typically implemented as a kind of currency exchange. The time that elapsed since the previous frame is deposited in an accumulator and converted into simulation steps according to the exchange rate (the tick rate).
```rust
pub struct Accumulator {
    accum: f64,
    ticks: usize,
}

impl Accumulator {
    pub fn add_time(&mut self, time: f64, timestep: f64) {
        self.accum += time;
        while self.accum >= timestep {
            self.accum -= timestep;
            self.ticks += 1;
        }
    }

    pub fn ticks(&self) -> usize {
        self.ticks
    }

    pub fn overtick_percentage(&self, timestep: f64) -> f64 {
        self.accum / timestep
    }

    pub fn consume_tick(&mut self) -> Option<usize> {
        // Take one tick (if any) and return how many remain.
        let remaining = self.ticks.checked_sub(1)?;
        self.ticks = remaining;
        Some(remaining)
    }

    pub fn consume_ticks(&mut self) -> Option<usize> {
        let ticks = if self.ticks > 0 { Some(self.ticks) } else { None };
        self.ticks = 0;
        ticks
    }
}
```

Here's how it's typically used. Notice the time dilation. It's changing the time->tick exchange rate to produce more or fewer simulation steps per unit time. Just so you know, this time dilation should only affect the tick rate. Inside the systems running in the fixed update, you should always use the normal fixed timestep for the value of dt.

```rust
// Determine the exchange rate. (As in the controller below, the
// dilation is an offset around zero, so 0.0 means no adjustment.)
let x = 1.0 + time_dilation;

// Accrue all the time that has elapsed since last frame.
accumulator.add_time(time.delta_seconds(), x * timestep);

for _step in 0..accumulator.consume_ticks().unwrap_or(0) {
    /* ... */
}

// Calculate the blend alpha for rendering simulated objects.
let alpha = accumulator.overtick_percentage(x * timestep);
```

Ideally, clients simulate any given tick ahead by just enough so their inputs reach the server right before it does.

One way I've seen people try to do this is to have clients estimate the wall-clock time on the server (using an SNTP handshake or similar) and from that schedule their next tick. That does work, but IMO it's too inaccurate. What we really care about is how much time passes between the server receiving an input and consuming it. That's what we want to control. The server can measure these wait times exactly and include them in the corresponding snapshot headers. Then clients can use those measurements to modify their tick rate and adjust their lead. E.g. if its inputs are arriving too late (too early), a client can briefly simulate more (less) frequently to converge on the correct lead.

```rust
if received_newer_server_update {
    /* ... updates packet statistics ... */
    // Measurements of the input wait time and input arrival delta come from the server.
    target_input_wait_time = max(timestep, avg_input_arrival_delta + safety_factor * input_arrival_dispersion);

    // I'm negating here because I'm scaling the timestep and not the tick rate.
    // i.e. 110% tick rate => 90% timestep
    error = -(target_input_wait_time - avg_input_wait_time);
}

// This logic executes every tick.

// Anything we hear back from the server is always a round-trip old.
// We want to drive this feedback loop with a more up-to-date estimate
// to avoid overshoot / oscillation.
// Obviously, it's impossible for the client to know the current wait time.
// But it can fancy a guess by assuming every adjustment it made since
// the latest received update succeeded.
predicted_error = error;
for tick in recv_tick..curr_tick {
    predicted_error += ringbuf[tick % ringbuf.len()];
}

// This is basically just a proportional controller.
time_dilation = (predicted_error - min_error) * (max_dilation - min_dilation) / (max_error - min_error);
time_dilation = time_dilation.clamp(min_dilation, max_dilation);

// Store the new adjustment in the ring buffer.
ringbuf[curr_tick % ringbuf.len()] = time_dilation * timestep;
```

### Snapshot Interpolation

Interpolating received snapshots is very similar. What we're interested in is the remaining time left in the snapshot buffer. You want to always have at least one snapshot ahead of the current "playback" time (so the client always has something to interpolate to).

```rust
if received_newer_server_update {
    /* ... updates packet statistics ... */
    target_interpolation_delay = max(server_send_interval, avg_update_arrival_delta + safety_factor * update_arrival_dispersion);
    time_since_last_update = 0.0;
}

// This logic executes every frame.

// Calculate the current interpolation delay.
// Network conditions are assumed to be constant between updates.
current_interpolation_delay = (latest_snapshot_tick * timestep) + time_since_last_update - playback_time;

// I'm negating here because I'm scaling time and not frequency.
// i.e. 110% freq => 90% time
error = -(target_interpolation_delay - current_interpolation_delay);
time_dilation = (error - min_error) * (max_dilation - min_dilation) / (max_error - min_error);
time_dilation = time_dilation.clamp(min_dilation, max_dilation);

playback_time += time.delta_seconds() * (1.0 + time_dilation);
time_since_last_update += time.delta_seconds();

// Determine the two snapshots and blend alpha.
let i = buf.partition_point(|&snapshot| (snapshot.tick as f32 * timestep) < playback_time);
let (from, to, blend) = if i == 0 {
    // Current playback time is behind all buffered snapshots.
    (buf[0].tick, buf[0].tick, 0.0)
} else if i == buf.len() {
    // Current playback time is ahead of all buffered snapshots.
    // Here, I'm just clamping to the latest, but you could extrapolate instead.
    (buf[buf.len() - 1].tick, buf[buf.len() - 1].tick, 0.0)
} else {
    let a = buf[i - 1].tick;
    let b = buf[i].tick;
    let blend = (playback_time - (a as f32 * timestep)) / ((b - a) as f32 * timestep);
    (a, b, blend)
};

// Go forth and (s)lerp.
```

## Predict or Delay?

The higher a client's ping, the more ticks they'll need to resim. Depending on the game, that might be too expensive to support all the clients in your target audience. In those cases, we can trade more input delay for fewer resim ticks. Essentially, there are three meaningful moments in the round-trip of an input:

1. When the inputs are sent.
2. When the true simulation tick happens (conceptually).
3. When the resulting update is received.

```plaintext
0 <-+-+-+-+-+-+-+-+-+-+-+-> t
          sim
         /   \
        /     \
       /       \
      /         \
     /           \
  send           recv
    |<-- RTT -->|
```

This gives us a few options for time sync:

1. **No rollback and adaptive input delay**, preferred for games with prediction disabled
2. **Bounded rollback and adaptive input delay**, preferred for games using input determinism with prediction enabled
3. **Unbounded rollback and no/fixed input delay**, preferred for games using state transfer with prediction enabled

"Adaptive input delay" here means "fixed input delay with more as needed."

Method #1 basically tries to ensure packets are always received before they're needed by the simulation. The client will add as much input delay as needed to avoid stalling.

Method #2 is best explained as a sequence of fallbacks. Clients first add a fixed amount of input delay. If that's more than their current RTT, they won't need to rollback. If that isn't enough, the client will rollback, but only up to a limit. If even the combination of fixed input delay and maximum rollback doesn't cover the RTT, more input delay will be added to fill the remainder.
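Here's a small sketch of that fallback chain, in whole ticks. The function and its signature are mine, not a settled design:

```rust
/// Returns (rollback_ticks, extra_input_delay_ticks) for method #2.
fn plan_latency_budget(rtt: u32, fixed_delay: u32, max_rollback: u32) -> (u32, u32) {
    // Cover the round-trip with the fixed input delay first, then with
    // rollback, then with extra input delay to fill the remainder.
    let uncovered = rtt.saturating_sub(fixed_delay);
    let rollback = uncovered.min(max_rollback);
    let extra_delay = uncovered - rollback;
    (rollback, extra_delay)
}
```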
Method #3 is preferred for games that use state transfer. Adding input delay would negatively impact the accuracy of server-side lag compensation, so it should almost always be set to zero in those cases. Unlike method #2, games that use input determinism might prefer a constant input delay, even if that means the game might stutter.

## Predicted <-> Interpolated

When the server has full authority, clients cannot directly write persistent changes to the authoritative state. However, it's perfectly okay for them to do whatever they want locally. That's all client-side prediction really is—local changes. Clients can just copy the latest authoritative state as a starting point.

We can also shift components between being predicted (extrapolated) and being interpolated. Either could be default. If interpolation is default, entities would reset to interpolated when modified by a server update. Users could then use specialized `Predicted` and `Confirmed` query filters to address the two separately. These can piggyback off of Bevy's built-in reliable change detection.

This means systems predict by default, but users can opt-out with the `Predicted` filter to only process components that have already been mutated by an earlier system. Clients will naturally predict any entities driven by their input and any spawned by their input (until confirmed by the server).

## Predicted FX Events

Sounds and particles need special consideration, since they can be predicted but are also typically handled outside of the fixed update.

We'll need events that can be confirmed or cancelled. The main requirement is tagging them with a unique identifier. Maybe hashing together the tick number, system ID, and entity ID would suffice.

TBD

## Predicted Spawns

This too requires special consideration.

The naive solution is to have clients spawn dummy entities so that when an update that confirms the result arrives, they'll simply destroy the dummy and spawn the true entity. IMO this is a poor solution because it prevents clients from smoothly blending errors in the predicted spawn's rendered transform. Snapping its visuals wouldn't look right.

A better solution is for the server to assign each networked entity a global ID that the spawning client can predict and map to its local instance. There are 3 variants that I know of:

1. Use an incrementing generational index (reuse `Entity`) and fix its upper bits to match the ID of the spawning player (see the sketch after this list).

2. Use PRNGs to generate shared keys (I've seen these dubbed "prediction keys") for pairing local and global IDs. Rather than predict the global ID directly, clients predict the shared keys. Server updates that confirm a predicted entity would include both its global ID and the shared key. Once acknowledged, later updates can include just the global ID. This method is more complicated but does not share the previous method's implicit entity limit.

3. Bake it into the memory layout. If the layout and order of the snapshot storage is identical on all machines, array indexes and relative pointers can double as global IDs. They wouldn't need to be explicitly written into packets, potentially reducing packet size by 4-8 bytes per entity (before compression). However, we'd probably end up wanting generations anyway to not confuse destroyed entities with new ones.
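A minimal sketch of variant 1; the bit split and widths are my assumptions, not a settled format:

```rust
// Assumed layout: upper bits identify the spawning player (player_id < 256
// here), lower bits are that player's own incrementing index. Generation
// bits are omitted for brevity.
const INDEX_BITS: u32 = 24;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

// Each player (and the server) allocates from its own reserved range,
// so locally predicted IDs can never collide with anyone else's.
fn next_network_id(player_id: u32, counter: &mut u32) -> u32 {
    let index = *counter & INDEX_MASK;
    *counter += 1;
    (player_id << INDEX_BITS) | index
}
```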
I recommend 1 as it's the simplest method. Bandwidth and CPU resources would run out long before the reduced entity ranges do. My current strategy is a mix of 1 and 3.

## Smooth Rendering

Rendering should happen later in the frame, sometime after the fixed update.

Whenever clients receive an update with new remote entities, those entities shouldn't be rendered until that update is interpolated. We can do this through a marker component or with a field in the render transform.

Cameras need some special treatment. Look inputs need to be accumulated at the render rate and re-applied to the predicted camera rotation just before rendering.

We'll also need some way for developers to declare their intent that a motion should be instant instead of smoothly interpolated. Since it needs to work for remote entities as well, maybe this just has to be a bool on the networked transform.

While most visual interpolation is linear, we'll want another blend for quickly but smoothly correcting visual misprediction errors, which can occur for entities that are predicted or have just stopped being predicted. [Projective velocity blending][11] seems like the de facto standard method for these, but I've also seen simple exponential decays used. There may be better error correction methods.

## Lag Compensation

Lag compensation deals with colliders and needs to run after all motion and physics systems. All positions have to be settled or you'll get unexpected results.

Similar to inputs, I've seen people try to have the server estimate which snapshots each client was interpolating based on their ping, but we can easily do better than that. Clients can just tell the server directly by sending their interpolation parameters along with their inputs. With this information, the server knows *exactly* what each client was seeing. No guesswork necessary.

```plaintext
tick number (predicted)
tick number (interpolated from)
tick number (interpolated to)
interpolation blend value
```

So there are two ways to do the actual compensation:

- Compensate upfront by bringing new projectiles into the present (similar to a rollback).
- Compensate over time ("amortized"), constantly testing projectiles against a history buffer.

There's a lot to learn from *Overwatch* here.

*Overwatch* shows that [we can treat time as another spatial dimension][5], so we can put the entire collider history in something like a BVH and test it all at once (the amortized method). Essentially, you'd generate a bounding box for each collider that surrounds all of its historical poses and then test projectiles for hits against those first (broad-phase), then test those hits against bounding boxes blended between two snapshots (optional mid-phase), then the precise geometry blended between two snapshots (narrow-phase).

For clients with too-high ping, their interpolation will lag far behind their prediction. If you only compensate up to a limit (e.g. 200ms), [those clients will have to extrapolate the difference][6]. Doing nothing is also valid, but lagging clients would abruptly have to start leading their targets.

You'd constrain the playback time like below and then run some extrapolation logic pre-update.

```rust
playback_time = playback_time.max((curr_tick * timestep) - max_lag_compensation);
```

*Overwatch* [allows defensive abilities to mitigate compensated projectiles][7]. AFAIK this is simple to do. If a player activates any defensive bonus, just apply it to all their buffered colliders.
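For that broad-phase, each collider's buffered poses can be merged into a single bound. A minimal sketch, with an illustrative `Aabb` type:

```rust
#[derive(Clone, Copy)]
struct Aabb {
    min: [f32; 3],
    max: [f32; 3],
}

// Merge a collider's buffered poses into one box that bounds its entire
// recent history, so a single test can reject all of it at once.
fn history_bounds(history: &[Aabb]) -> Option<Aabb> {
    history.iter().copied().reduce(|a, b| Aabb {
        min: [
            a.min[0].min(b.min[0]),
            a.min[1].min(b.min[1]),
            a.min[2].min(b.min[2]),
        ],
        max: [
            a.max[0].max(b.max[0]),
            a.max[1].max(b.max[1]),
            a.max[2].max(b.max[2]),
        ],
    })
}
```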
When a player is the child of another, uncontrolled entity (e.g. the player is a passenger in a vehicle), the non-predicted movement of that parent entity must be rewound during lag compensation, so that any projectiles fired by the player spawn in the correct location. [See here.][8]

## Messages (RPCs and events you can send!)

Sometimes raw inputs aren't expressive enough. Examples include choosing a loadout and buying items from an in-game menu. Mispredicts aren't acceptable in these cases; however, servers don't typically simulate UI.

So there's a need for a dedicated type of optionally reliable message for text/UI-based and "send once" gameplay interactions. Similarly, global alerts from the server shouldn't clutter the game state.

These messages can optionally be postmarked to be processed on a certain tick like inputs, but that can only be best effort (i.e. tick N or the earliest tick after it).

And while I gave examples of "requests" using these messages, those don't have to receive explicit replies. If the server confirms your purchased items, those would just appear in your inventory in a later snapshot.

IDK what these should look like yet. A macro might be the most ergonomic choice, if it means a message can be defined in its relevant system.

TBD

[1]: https://github.com/bevyengine/rfcs/pull/16
[2]: https://github.com/lemire/FastPFor/blob/master/headers/simple8b_rle.h
[3]: https://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking
[4]: https://www.ea.com/games/apex-legends/news/servers-netcode-developer-deep-dive
[5]: https://youtu.be/W3aieHjyNvw?t=2226 "Tim Ford explains Overwatch's hit registration"
[6]: https://youtu.be/W3aieHjyNvw?t=2347 "Tim Ford explains Overwatch's lag comp. limits"
[7]: https://youtu.be/W3aieHjyNvw?t=2492 "Tim Ford explains Overwatch's lag comp. mitigation"
[8]: https://alontavor.github.io/AdvancedLatencyCompensation/
[9]: https://github.com/mattleibow/jitterphysics/wiki/Sweep-and-Prune
[10]: https://github.com/bevyengine/rfcs/pull/18
[11]: https://www.researchgate.net/publication/293809946_Believable_Dead_Reckoning_for_Networked_Games
[12]: https://github.com/bevyengine/rfcs/pull/16#issuecomment-849878777
[13]: https://github.com/bevyengine/bevy/issues/32
[14]: https://github.com/bevyengine/bevy/issues/32#issuecomment-821510244
[15]: https://johnaustin.io/articles/2019/fix-your-unity-timestep

diff --git a/networked_replication.md b/networked_replication.md
new file mode 100644
index 00000000..d51720b2
--- /dev/null
+++ b/networked_replication.md
@@ -0,0 +1,163 @@

# Feature Name: `networked-replication`

## Summary

This RFC proposes an implementation of engine features for developing networked games. It abstracts away the (mostly irrelevant) low-level transport details to focus on high-level *replication* features, with key interest in providing them transparently (i.e. minimal, if any, networking boilerplate).

## Motivation

Networking is unequivocally the most lacking feature in all general-purpose game engines.

While most engines provide low-level connectivity—virtual connections, optionally reliable UDP channels, rooms—almost none of them ([except][1] [Unreal][2]) provide high-level *replication* features like prediction, interest management, or lag compensation, which are necessary for most networked multiplayer games.
+ +This broad absence of first-class replication features stifles creative ambition and feeds into an idea that every multiplayer game needs its own unique implementation. Certainly, there are idiomatic "strategies" for different genres, but all of them—lockstep, rollback, client-side prediction with server reconciliation—pull from the same bag of tricks. Their differences can be captured in a short list of configuration options. Really, only *massive* multiplayer games require custom solutions. + +Bevy's ECS opens up the possibility of providing a near-seamless, generalized networking API. + +What I hope to explore in this RFC is: + +- What game design choices and constraints does networking add? +- How does ECS make networking easier to implement? +- What should developing a networked multiplayer game in Bevy look like? + +## User-facing Explanation + +[Recommended reading on replication concepts.](../main/replication_concepts.md) + +Bevy's aim here is to make writing local and networked multiplayer games indistinguishable, with minimal added boilerplate. Having an exact simulation timeline simplifies this problem, thus the core of this unified approach is a fixed timestep—`NetworkFixedUpdate`. + +As a user, you only have to annotate your gameplay-related components and systems, add those systems to `NetworkFixedUpdate`, and configure a few simulation settings to get up and running. That's it! Bevy will transparently handle separating, reconciling, serializing, and compressing the networked state for you. (Those systems can be exposed for advanced users, but non-interested users need not concern themselves.) + +> Game design should (mostly) drive networking choices. Future documentation could feature a questionnaire to guide users to the correct configuration options for their game. Genre and player count are generally enough to decide. + +The core primitive here is the `Replicate` trait. All instances of components and resources that implement this trait will be automatically registered and synchronized over the network. Simply adding a `#[derive(Replicate)]` should be enough in most cases. + +```rust +#[derive(Replicate)] +struct Transform { + #[replicate(precision=0.001)] + translation: Vec3, + #[replicate(precision=0.01)] + rotation: Quat, + #[replicate(precision=0.1)] + scale: Vec3, +} + +#[derive(Replicate)] +struct Health { + hp: u32, +} +``` + +By default, both client and server will run every system you add to `NetworkFixedUpdate`. If you want systems or code snippets to run exclusively on one or the other, you can annotate them with `#[client]` or `#[server]` for the compiler. + +```rust +#[server] +fn ball_movement_system( + mut ball_query: Query<(&Ball, &mut Transform)>) +{ + for (ball, mut transform) in ball_query.iter_mut() { + transform.translation += ball.velocity * FIXED_TIMESTEP; + } +} +``` + +For more nuanced runtime cases—say, an expensive movement system that should only process the local player entity on clients—you can use the `Predicted` query filter. If you need an explicit request or notification, you can use `Message` variants. + +```rust +fn update_player_velocity( + mut q: Query<(&Player, &mut Rigidbody)>) +{ + for (player, mut rigidbody) in q.iter_mut() { + // DerefMut flags these rigidbodies as predicted on the client. 
        rigidbody.velocity = player.aim_direction * player.movement_speed * FIXED_TIMESTEP;
    }
}

fn expensive_physics_calculation(
    mut q: Query<&mut Rigidbody, Predicted>)
{
    for rigidbody in q.iter_mut() {
        // Do stuff with only the predicted rigidbodies...
    }
}
```

```plaintext
TODO: Message Example
```

Bevy can configure an `App` to operate in several different network modes.

| Mode | Playable? | Authoritative? | Open to connections? |
| :--- | :---: | :---: | :---: |
| Client | ✓ | ✗ | ✗ |
| Standalone | ✓ | ✓ | ✗ |
| Listen Server | ✓ | ✓ | ✓ |
| Dedicated Server | ✗ | ✓ | ✓ |
| Relay | ✗ | ✗ | ✓ |

We'll also need a mode similar to listen server for deterministic peers.

```plaintext
TODO: Example App configuration.
```

## Implementation Strategy

[See here for a big idea dump.](../main/implementation_details.md) (Hopefully I can clean this up later.)

## Drawbacks

- Serialization strategy is `unsafe` (might be possible to do it entirely with safe Rust, idk).
- Macros might be gnarly.
- At first, only POD components and resources will be supported. DST support will come later.

## Rationale and Alternatives

### Why *this* design?

Networking is a widely misunderstood problem domain. The proposed implementation should suffice for most games while minimizing design friction—users need only annotate gameplay-related components and systems, put those systems in `NetworkFixedUpdate`, and configure some settings.

Polluting the API with "networked" variants of structs and systems (aside from `Transform`, `Rigidbody`, etc.) would just make life harder for everybody, both game developers and Bevy maintainers. IMO the ease of macro annotations is worth any increase in compile times when networking features are enabled.

### Why should Bevy provide this?

People who want to make multiplayer games want to focus on designing their game and not worry about how to implement prediction, how to serialize their game, how to keep packets under MTU, etc. Having these come built-in would be a huge selling point.

### Why not wait until Bevy is more mature?

It'll only grow more difficult to add these features as time goes on. Take Unity for example. Its built-in features are too non-deterministic and its only working solutions for state transfer are paid third-party assets. Thus far, said assets cannot integrate deeply enough to be transparent (at least not without substituting parts of the engine).

### Why does this need to involve `bevy_ecs`?

For better encapsulation, I'd prefer if multiple world functionality and nested loops were standard ECS features. Nesting an outer fixed timestep loop and an inner rollback loop doesn't seem possible without a custom stage or scheduler right now.

## Unresolved Questions

- Can we provide lints for undefined behavior like mutating networked state outside of `NetworkFixedUpdate`?
- ~~Will rollbacks break change detection?~~ As long as we're careful to update the appropriate change ticks, it should be okay.
- Will rollbacks break events?
- ~~When sending interest-managed updates, how should we deal with weird stuff like there being references to entities that haven't been spawned or have been destroyed?~~ I believe this is solved by using generational indexes for the network IDs.
- How should UI widgets interact with networked state? React to events? Exclusively poll verified data?
- How should we handle correcting mispredicted events and FX?
+- Can we replicate animations exactly without explicitly sending animation data? + +## Future Possibilities + +- With some tools to visualize game state diffs, these replication systems could help detect non-determinism in other parts of the engine. +- Much like how Unreal has Fortnite, Bevy could have an official (or curated) collection of multiplayer samples to dogfood these features. +- Bevy's future editor could automate most of the configuration and annotation. +- Replication addresses all the underlying ECS interop, so it should be settled first. But beyond replication, Bevy need only provide one good default for protocol and I/O for the sake of completeness. I recommend dividing crates at least to the extent shown below to make it easy for developers to swap the low-level transport with [whatever][3] [alternatives][4] [they][5] [want][7]. + +| `bevy::net::replication` | `bevy::net::protocol` | `bevy::net::io` | +| -- | -- | -- | +|
• save and restore<br>• prediction<br>• serialization<br>• delta compression<br>• interest management<br>• visual error correction<br>• lag compensation<br>• statistics (high-level) | • (N)ACKs<br>• reliability<br>• virtual connections<br>• channels<br>• encryption<br>• statistics (low-level) | • send<br>• recv<br>• poll
| + +[1]: https://youtu.be/JOJP0CvpB8w "Unreal Networking Features" +[2]: https://www.unrealengine.com/en-US/tech-blog/replication-graph-overview-and-proper-replication-methods "Unreal Replication Graph Plugin" +[3]: https://github.com/quinn-rs/quinn +[4]: https://partner.steamgames.com/doc/features/multiplayer +[5]: https://developer.microsoft.com/en-us/games/solutions/multiplayer/ +[6]: https://dev.epicgames.com/docs/services/en-US/Overview/index.html +[7]: https://docs.aws.amazon.com/gamelift/latest/developerguide/gamelift-intro.html diff --git a/replication_concepts.md b/replication_concepts.md new file mode 100644 index 00000000..a69e9340 --- /dev/null +++ b/replication_concepts.md @@ -0,0 +1,160 @@ +# Replication + +> The goal of replication is to ensure that all of the players in the game have a consistent model of the game state. Replication is the absolute minimum problem which all networked games have to solve in order to be functional, and all other problems in networked games ultimately follow from it. - [Mikola Lysenko][1] + +## Simulation Behavior + +Abstractly, you can think of a game as a pure function that accepts an initial state and player inputs and generates a new state. + +```rust +let new_state = simulate(&state, &inputs); +``` + +If several players want to perform a synchronized simulation over a network, they have basically two options: + +- Send their inputs to each other and independently and deterministically simulate the game. + -
*also known as:* active replication, lockstep, state-machine synchronization, determinism
- Send their inputs to a single machine (the server), which simulates the game and broadcasts updates back.
  -
*also known as:* passive replication, client-server, primary-backup, state transfer
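In toy form, using the `simulate` function from above (the networking helpers here are hypothetical stand-ins):

```rust
// Option 1, determinism: exchange inputs, then everyone runs the full game.
let inputs = exchange_inputs(&my_input); // hypothetical
let new_state = simulate(&state, &inputs);

// Option 2, state transfer: only the server runs the "real" game.
// server:
let new_state = simulate(&state, &inputs_received_from_clients);
broadcast(&new_state); // hypothetical
// client:
let new_state = receive_update(); // hypothetical
```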
In other words, players can either run the "real" game or follow it.

For the rest of this RFC, I'll refer to them as determinism and state transfer, respectively. I just think they're the most literal terminology.

### Why determinism?

Deterministic multiplayer is basically local multiplayer but with *really* long controller cables. The netcode simply supplies the gameplay code with inputs. They're basically decoupled.

Determinism has low infrastructure costs, both in terms of bandwidth and server hardware. All steady-state network traffic is input, which is not only small but also compresses well. (Note that as player count increases, there *is* a crossover point where state transfer becomes more efficient.) Likewise, as the game runs completely on the clients, there's no need to rent powerful servers. Relays are still handy for efficiently managing rooms and scaling to higher player counts, but those could be cheap VPS instances.

Determinism is also tamperproof. It's impossible to do anything like speedhack or teleport, as running these exploits would simply cause cheaters to desync. On the other hand, determinism inherently suffers from total information leakage.

That every client must run the *entire* world is also determinism's biggest limit. While this works well for games with thousands of micro-managed entities like *Starcraft 2*, you won't be seeing games with expansive worlds like *Genshin Impact* networked this way anytime soon.

### Why state transfer?

Determinism is awesome when it fits but it's generally unavailable. Neither Godot nor Unity nor Unreal can make this guarantee for large parts of their engines, particularly physics.

Whenever you can't have or don't want determinism, you should use state transfer.

Its main underlying idea is **authority**, which is just like ownership in Rust. Those who own state are responsible for broadcasting up-to-date information about it. I sometimes see authority divided into *input* authority (control permission) and *state* authority (write permission), but usually authority means state authority.

The server usually owns everything, but authority is very flexible. In games like *Destiny* and *Fall Guys*, clients own their movement state. Other games even trust clients to confirm hits. Distributing authority like this adds complexity and obviously leaves the door wide open for cheaters, but sometimes it's necessary. In VR, it makes sense to let clients claim and relinquish authority over interactable objects.

### Why not messaging patterns?

The only other strategy you really see used for replication is messaging. RPCs. I actually see these most often in the free asset space. (I guess it's the go-to pattern outside of games?)

Take chess for example. Instead of sending polled player inputs or the state of the chessboard, you could just send the moves like "white, e2 to e4," etc.

Here's the issue. Messages are tightly coupled to their game's logic. They can't be generalized. Chess is simple—one turn, one event—but what about an FPS? What messages would it need? How many? When and where would those messages need to be sent and received?

If those messages have cascading effects, they can only be sent reliably and in order.

```rust
let mut s = state[n];
for message in queue.iter() {
    s.apply(message);
}

// The key thing to note is that state[n+1]
// cannot be correct unless all messages were
// applied and applied in the right order.
state[n+1] = s;
```

Messages are great for when you want explicit request-reply interactions and global alerts like players joining or leaving. They just don't cut it as a replication mechanism for real-time games. Even if you avoided send and receive calls everywhere (i.e., collect and send in batches), messages don't compress as well as inputs or state.

## Latency

Networking is hard because we want to let players who live in different countries play together *at the same time*, something that special relativity tells us is [strictly impossible][2]... unless we cheat.

### Lockstep

The simplest solution is to concede to the universe with grace and have players stall until they've received whatever data they need to execute the next simulation step. Blocking is fine for most turn-based games but simply doesn't cut it for real-time games.

### Adding Local Input Delay

The first trick we can pull is to have each player delay their own input for a bit, trading responsiveness for more time to receive the incoming data.

Our brains are pretty lenient about this, so we can actually *reduce* the latency between players. Two players in a 1v1 match actually could experience simultaneity if each delayed their input by half the round-trip time.

This trick has powered the RTS genre for decades. With a large enough input delay and a stable connection, the game will run smoothly. However, there's still a problem because the game stutters whenever the window is missed. This leads to the next trick.

> determinism + lockstep + local input delay = "delay-based netcode"

### Predict-Rollback

Instead of blocking, what if players just guess the missing data and keep going? Doing that would let us avoid stuttering, but then we'd have to deal with guessing incorrectly.

Well, when the player finally has that missing remote data, what they can do is restore their simulation to the previous verified state, update it with the received data, and then re-predict the remaining steps.

This retroactive correction is called **rollback** or **reconciliation**, and it ensures that players never desync *too much*. Honestly, it's practically invisible with a high tick rate and good visual smoothing. (Apparently it's been around since [1996][3].)

With prediction, input delay is no longer needed, but it's still useful. Reducing latency reduces how many steps players need to re-simulate.

> determinism + predict-rollback + local input delay (optional) = "rollback netcode"

### Selective Prediction

Determinism is an all or nothing deal. If you predict, you predict everything.

State transfer has the flexibility to predict only *some* things, letting you offload expensive computations onto the server. There *are* client-server games like *Rocket League* that still predict everything (FWIW deterministic predict-rollback would have been a better fit), including other clients—the server redistributes inputs along with game state to reduce error. However, most often clients only predict what they control directly.

## Visual Consistency

Real quick, always hard snap the simulation state. If clients do any blending, it's entirely visual. Yes, this does mean that entities may appear in different positions from where they should be. On the other hand, we have to honor this inaccurate view to keep players happy.

### Smooth Rendering and Lag Compensation

Predicting only *some* things adds implementation complexity.
When clients predict everything, they produce renderable state at a fixed pace. Now, anything that isn't predicted must be rendered using data received from the server. The problem is that server updates are sent over a lossy, unreliable internet that disrupts any consistent spacing between packets. This means clients need to buffer incoming server updates long enough to have two authoritative updates to interpolate most of the time.

Gameplay-wise, not predicting everything also divides entities between two points in time: a predicted time and an interpolated time. Clients see themselves in the future and everything else in the past. Because players demand a WYSIWYG experience, the server must compensate for this "remote lag" by allowing certain things, mainly projectiles, to interact with the past.

Visually, we'll often have to blend between extrapolated and authoritative data. Simply interpolating between two authoritative updates is incorrect. The visual state can and will accrue errors, but that's what we want. Those can be tracked and smoothly reduced (to some near-zero threshold, then cleared).

## Bandwidth

### How much can we fit into each packet?

Not a lot.

You can't send arbitrarily large packets over the internet. The information superhighway has load limits. The conservative, almost universally supported "maximum transmission unit" (MTU) is 1280 bytes. Accounting for IP and UDP headers and some connection metadata, you realistically can send ~1200 bytes of game data per packet.

If you significantly exceed this, some random stop along the way will delay the packet and break it up into fragments.

[Fragmentation](https://packetpushers.net/ip-fragmentation-in-detail/) [sucks](https://blog.cloudflare.com/ip-fragmentation-is-broken) because it multiplies the likelihood of the overall packet being lost (all fragments have to arrive to read the full packet). Getting fragmented along the way is even worse because of the added delay. It's okay if the sender manually fragments their packet into a few pieces (say, 2 or 3) *upfront*, although the higher loss chance does limit the simulation rate; just don't rely on the internet to do it.

### Okay, but that doesn't seem like much?

Well, there are two more reasons not to yeet giant 100kB packets across the network:

- Bandwidth costs are the lion's share of hosting expenses.
- Many players still have limited bandwidth.

So unless we limit everyone to <20Hz tick rates, our only options are:

- Send smaller things.
- Send fewer things.

### Snapshots

Alright then, state transfer. The most obvious strategy is to send full **snapshots**. All we can do with these is make them smaller (i.e. quantize floats, then compress everything).

Fortunately, snapshots are very compressible. An extremely popular idea called **delta compression** is to send each client a diff (often with further compression on top) of the current snapshot and the latest one they acknowledged receiving. Clients can then use these to patch their existing snapshots into the current one.

The server can fragment payloads as a last resort.

### Eventual Consistency

When snapshots fail or hidden information is needed, the best alternative is to prioritize sending each client the state most relevant to them. This technique is commonly called **eventual consistency**.

Determining relevance is often called **interest management** or **area of interest**. Each granular piece of state is given a "send priority" that accumulates over time and resets when sent. How quickly priority accumulates for different things is up to the developer, though physical proximity and visual salience usually have the most influence.
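A sketch of that bookkeeping, with illustrative names and no claim about where the rates come from:

```rust
// Illustrative only: each replicated item accrues priority over time at a
// developer-chosen rate and resets when its data actually gets sent.
struct SendPriority {
    value: f32,
    rate: f32, // e.g. higher for nearby or visually salient entities
}

impl SendPriority {
    fn tick(&mut self, dt: f32) {
        self.value += self.rate * dt;
    }

    fn on_sent(&mut self) {
        self.value = 0.0;
    }
}
```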
Eventual consistency can be combined with delta compression, but I wouldn't recommend it. Many AAA games have done it, but IMO it's just too much bookkeeping. Unlike snapshots, the server would have to track the latest received state for each *item* on each client separately and create diffs for each client separately.

[1]: https://0fps.net/2014/02/10/replication-in-networked-games-overview-part-1/
[2]: https://en.wikipedia.org/wiki/Relativity_of_simultaneity
[3]: https://en.wikipedia.org/wiki/Client-side_prediction

diff --git a/rfcs/DELETEME.md b/rfcs/DELETEME.md
deleted file mode 100644
index 2b210d2f..00000000
--- a/rfcs/DELETEME.md
+++ /dev/null
@@ -1 +0,0 @@

Dummy file for git, please delete once the first RFC is merged.

diff --git a/template.md b/template.md
deleted file mode 100644
index 8e7bc765..00000000
--- a/template.md
+++ /dev/null
@@ -1,73 +0,0 @@

# Feature Name: (fill me in with a unique ident, `my_awesome_feature`)

## Summary

One paragraph explanation of the feature.

## Motivation

Why are we doing this? What use cases does it support?

## Guide-level explanation

Explain the proposal as if it was already included in the engine and you were teaching it to another Bevy user. That generally means:

- Introducing new named concepts.
- Explaining the feature, ideally through simple examples of solutions to concrete problems.
- Explaining how Bevy users should *think* about the feature, and how it should impact the way they use Bevy. It should explain the impact as concretely as possible.
- If applicable, provide sample error messages, deprecation warnings, or migration guidance.
- If applicable, explain how this feature compares to similar existing features, and in what situations the user would use each one.

## Reference-level explanation

This is the technical portion of the RFC. Explain the design in sufficient detail that:

- Its interaction with other features is clear.
- It is reasonably clear how the feature would be implemented.
- Corner cases are dissected by example.

The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.

## Drawbacks

Why should we *not* do this?

## Rationale and alternatives

- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
- Why is this important to implement as a feature of Bevy itself, rather than an ecosystem crate?

## \[Optional\] Prior art

Discuss prior art, both the good and the bad, in relation to this proposal.
This can include:

- Does this feature exist in other libraries and what experiences have their community had?
- Papers: Are there any published papers or great posts that discuss this?

This section is intended to encourage you as an author to think about the lessons from other tools and provide readers of your RFC with a fuller picture.

Note that while precedent set by other engines is some motivation, it does not on its own motivate an RFC.

## Unresolved questions

- What parts of the design do you expect to resolve through the RFC process before this gets merged?
-- What parts of the design do you expect to resolve through the implementation of this feature before the feature PR is merged? -- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? - -## \[Optional\] Future possibilities - -Think about what the natural extension and evolution of your proposal would -be and how it would affect Bevy as a whole in a holistic way. -Try to use this section as a tool to more fully consider other possible -interactions with the engine in your proposal. - -This is also a good place to "dump ideas", if they are out of scope for the -RFC you are writing but otherwise related. - -Note that having something written down in the future-possibilities section -is not a reason to accept the current or a future RFC; such notes should be -in the section on motivation or rationale in this or subsequent RFCs. -If a feature or change has no direct value on its own, expand your RFC to include the first valuable feature that would build on it.