| 1 | +# Joint Consensus in OpenRaft |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +Joint consensus is the mechanism in the Raft protocol for changing cluster membership safely. When membership changes (for example, when nodes are added or removed), the cluster passes through a joint configuration phase in which both the old and the new configurations are in effect, and decisions need a majority in each of them. This keeps the cluster safe and available throughout the transition. |
| 6 | + |
| 7 | +## Membership Change Process |
| 8 | + |
| 9 | +OpenRaft implements membership changes using a two-phase joint consensus approach: |
| 10 | + |
| 11 | +1. **First Phase:** Apply a membership change request (e.g., `AddVoters({x})`) to the current configuration, forming a joint configuration that contains both the old and new configurations. This joint configuration is then committed. |
| 12 | + |
| 13 | +2. **Second Phase:** Apply an **empty change request** (`AddVoterIds({})`) to the last configuration in the current joint config. The result is a uniform configuration consisting of only that last configuration, which is then committed. |
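The two phases above can be sketched with a small standalone model. This is an illustrative simplification, not OpenRaft's implementation: `Change`, `Membership`, `next`, and `cfg` are hypothetical names, node ids are single characters, and a membership is just a list of voter sets (one entry is uniform, two are joint).

```rust
use std::collections::BTreeSet;

type Config = BTreeSet<char>;

/// Simplified stand-in for a change request (not OpenRaft's actual enum).
enum Change {
    AddVoterIds(Config),
    RemoveVoters(Config),
}

impl Change {
    /// Compute the target config obtained by applying this change to `base`.
    fn apply(&self, base: &Config) -> Config {
        match self {
            Change::AddVoterIds(ids) => base.union(ids).cloned().collect(),
            Change::RemoveVoters(ids) => base.difference(ids).cloned().collect(),
        }
    }
}

/// One config means uniform; two configs mean joint.
#[derive(Debug, Clone, PartialEq)]
struct Membership(Vec<Config>);

impl Membership {
    /// Apply `change` to the last config. If the target differs, the result
    /// is the joint config [last, target]; otherwise it is uniform.
    fn next(&self, change: &Change) -> Membership {
        let last = self.0.last().expect("membership is never empty").clone();
        let target = change.apply(&last);
        if target == last {
            Membership(vec![last])
        } else {
            Membership(vec![last, target])
        }
    }

    fn is_joint(&self) -> bool {
        self.0.len() > 1
    }
}

/// Hypothetical helper: build a config from one-letter node ids.
fn cfg(ids: &str) -> Config {
    ids.chars().collect()
}

fn main() {
    let start = Membership(vec![cfg("abc")]);

    // First phase: the requested change produces a joint config.
    let joint = start.next(&Change::AddVoterIds(cfg("x")));
    assert_eq!(joint, Membership(vec![cfg("abc"), cfg("abcx")]));

    // Second phase: an empty change collapses it to the last config.
    let uniform = joint.next(&Change::AddVoterIds(Config::new()));
    assert_eq!(uniform, Membership(vec![cfg("abcx")]));
    assert!(!uniform.is_joint());

    // Removing a voter goes through the same first phase.
    let joint = uniform.next(&Change::RemoveVoters(cfg("x")));
    assert!(joint.is_joint());
    println!("{:?}", joint);
}
```

In this model the empty change is what makes the second phase independent of the original request: it only ever looks at the last configuration.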
| 14 | + |
| 15 | +## Problem in Previous Versions of OpenRaft (prior to 2025-04-03) |
| 16 | + |
| 17 | +In earlier implementations, the second phase of a membership change would reapply the original change operation to the current membership state. When multiple membership change requests were processed concurrently, this approach could leave the system in a joint configuration state rather than transitioning to a uniform configuration. Consider this example: |
| 18 | + |
| 19 | +- Initial configuration: `{a, b, c}` |
| 20 | +- Task 1: `AddVoterIds({x})` |
| 21 | + - After first phase: `[{a, b, c}, {a, b, c, x}]` |
| 22 | +- Task 2: `RemoveVoters({x})` (runs concurrently) |
| 23 | + - After first phase: `[{a, b, c, x}, {a, b, c}]` (applied to the last configuration `{a, b, c, x}`) |
| 24 | +- Task 1 proceeds to its second phase and reapplies `AddVoterIds({x})` to the current state, whose last configuration is now `{a, b, c}` |
| 25 | + - Result: `[{a, b, c}, {a, b, c, x}]` (still a joint configuration) |
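The interleaving above can be replayed with a minimal sketch. It assumes a simplified model in which a membership is a list of voter sets and every proposal is applied to the last set; `propose`, `add_x`, `remove_x`, and `cfg` are hypothetical helpers, not OpenRaft APIs.

```rust
use std::collections::BTreeSet;

type Config = BTreeSet<char>;

// Hypothetical helper: build a config from one-letter node ids.
fn cfg(ids: &str) -> Config {
    ids.chars().collect()
}

// Stand-in for `AddVoterIds({x})`.
fn add_x(c: &Config) -> Config {
    let mut next = c.clone();
    next.insert('x');
    next
}

// Stand-in for `RemoveVoters({x})`.
fn remove_x(c: &Config) -> Config {
    let mut next = c.clone();
    next.remove(&'x');
    next
}

// Apply `change` to the last config of `membership`. The result is joint
// when the target differs from the last config, uniform otherwise.
fn propose(membership: &[Config], change: impl Fn(&Config) -> Config) -> Vec<Config> {
    let last = membership.last().expect("membership is never empty").clone();
    let target = change(&last);
    if target == last {
        vec![last]
    } else {
        vec![last, target]
    }
}

fn main() {
    let m0 = vec![cfg("abc")];
    let m1 = propose(&m0, add_x); // Task 1, first phase
    let m2 = propose(&m1, remove_x); // Task 2, first phase
    let m3 = propose(&m2, add_x); // Task 1, old second phase: reapply the change

    assert_eq!(m1, vec![cfg("abc"), cfg("abcx")]);
    assert_eq!(m2, vec![cfg("abcx"), cfg("abc")]);
    // The old second phase produced another joint config instead of a uniform one.
    assert_eq!(m3, vec![cfg("abc"), cfg("abcx")]);
}
```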
| 26 | + |
| 27 | +This behavior was problematic because: |
| 28 | +1. The system remained in a joint configuration state indefinitely |
| 29 | +2. This contradicted the standard Raft expectation that membership changes should eventually result in a uniform configuration |
| 30 | +3. It created confusion for users who expected membership changes to be fully applied |
| 31 | + |
| 32 | +## Solution |
| 33 | + |
| 34 | +The second phase now applies an **empty change request** (`AddVoterIds({})`) to the last configuration in the current joint config. Because an empty change leaves that configuration unchanged, the second phase always produces a uniform configuration, regardless of concurrent membership operations. |
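A minimal sketch of the fixed second phase, replaying the same concurrent interleaving as in the problem description. It uses a simplified model (a membership is a list of voter sets); `first_phase`, `second_phase`, `add_x`, `remove_x`, and `cfg` are hypothetical names for illustration, not OpenRaft APIs.

```rust
use std::collections::BTreeSet;

type Config = BTreeSet<char>;

// Hypothetical helper: build a config from one-letter node ids.
fn cfg(ids: &str) -> Config {
    ids.chars().collect()
}

// Stand-in for `AddVoterIds({x})`.
fn add_x(c: &Config) -> Config {
    let mut next = c.clone();
    next.insert('x');
    next
}

// Stand-in for `RemoveVoters({x})`.
fn remove_x(c: &Config) -> Config {
    let mut next = c.clone();
    next.remove(&'x');
    next
}

// First phase: apply the requested change to the last config and keep the
// previous last config alongside the target.
fn first_phase(membership: &[Config], change: fn(&Config) -> Config) -> Vec<Config> {
    let last = membership.last().expect("membership is never empty").clone();
    let target = change(&last);
    if target == last {
        vec![last]
    } else {
        vec![last, target]
    }
}

// Second phase after the fix: an empty change leaves the last config
// untouched, so the result is always that single config.
fn second_phase(membership: &[Config]) -> Vec<Config> {
    let last = membership.last().expect("membership is never empty").clone();
    vec![last]
}

fn main() {
    let m0 = vec![cfg("abc")];
    let m1 = first_phase(&m0, add_x); // Task 1, first phase
    let m2 = first_phase(&m1, remove_x); // Task 2, first phase
    let m3 = second_phase(&m2); // Task 1, second phase with the empty change

    assert_eq!(m2, vec![cfg("abcx"), cfg("abc")]);
    // The cluster ends in a uniform config even though Task 2 ran in between.
    assert_eq!(m3, vec![cfg("abc")]);
}
```

In this interleaving the final configuration is `{a, b, c}`, without `x`. The cluster is uniform, but the outcome still depends on how the requests interleaved.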
| 35 | + |
| 36 | +## Impact |
| 37 | + |
| 38 | +- **Single Change Request:** No behavior changes occur if only one membership change request is in progress. |
| 39 | +- **Concurrent Requests:** If multiple requests are processed concurrently, the cluster now always ends in a uniform configuration. Which configuration that is depends on how the requests interleave, so the application must still verify the result. |
| 40 | + |
| 41 | +## Acknowledgments |
| 42 | + |
| 43 | +Thanks to @tvsfx for providing feedback on this issue and offering a detailed explanation of the solution. |