
Added logic to update the slot map based on MOVED errors #186

Merged

Conversation

@barshaul commented Aug 28, 2024

Added logic to update the slot map based on MOVED errors. Now, when a MOVED error is received, the client pushes a new future before retrying the request, attempting to update the slot map based on the new primary node, while still spawning a background task to refresh slots. This optimization allows quicker rerouting of requests: the client no longer has to wait for the background slot refresh, which may be skipped entirely by the rate limiter.

The updated logic handles several scenarios:

  1. No Change: If the new primary is already the current slot owner, no changes are required.
  2. Failover: If the new primary is a replica in the same shard, the client promotes the replica to primary by updating the shard's addresses.
  3. Slot Migration: If the new primary is the existing primary of a different shard, the client updates the slot map to point to the new shard's addresses.
  4. Replica Moved to a Different Shard: If the new primary is a replica in a different shard, it is either promoted to primary of its shard or moved to a new shard. The replica is removed from its original shard, and the slot map is updated accordingly.
  5. New Node: If the new primary is an unknown node, the client adds it as a new primary node in a new shard, possibly indicating a scale-out.

If the slot map update fails, the request is retried with the node from the MOVED redirect, and the background slot refresh task will correct the map asynchronously.
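For illustration, the five scenarios can be condensed into a minimal sketch. The names below (SlotMap, ShardAddrs, update_on_moved) are simplified stand-ins rather than the crate's actual types, and a real slot map tracks slot ranges behind shared pointers rather than single slots owned by value:

use std::collections::HashMap;

#[derive(Debug, Clone)]
struct ShardAddrs {
    primary: String,
    replicas: Vec<String>,
}

#[derive(Debug, Default)]
struct SlotMap {
    // slot -> addresses of the shard that owns it
    shards: HashMap<u16, ShardAddrs>,
}

impl SlotMap {
    fn update_on_moved(&mut self, slot: u16, new_primary: &str) {
        // Scenario 1: No change - the new primary already owns the slot.
        if let Some(shard) = self.shards.get(&slot) {
            if shard.primary == new_primary {
                return;
            }
        }

        // Scenario 2: Failover - promote a replica of the same shard.
        if let Some(shard) = self.shards.get_mut(&slot) {
            if let Some(pos) = shard.replicas.iter().position(|r| r == new_primary) {
                let promoted = shard.replicas.remove(pos);
                let demoted = std::mem::replace(&mut shard.primary, promoted);
                shard.replicas.push(demoted);
                return;
            }
        }

        // Scenario 3: Slot migration - the new primary already heads another
        // shard; point the slot at that shard's addresses.
        if let Some(other) = self.shards.values().find(|s| s.primary == new_primary).cloned() {
            self.shards.insert(slot, other);
            return;
        }

        // Scenario 4: Replica moved to a different shard - remove the node
        // from any replica list it appears in before promoting it below.
        for shard in self.shards.values_mut() {
            shard.replicas.retain(|r| r != new_primary);
        }

        // Scenario 5: New node - install it as the primary of a new shard
        // (possibly indicating a scale-out).
        self.shards.insert(
            slot,
            ShardAddrs { primary: new_primary.to_string(), replicas: Vec::new() },
        );
    }
}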

@barshaul force-pushed the update_on_moves branch 4 times, most recently from f41533b to b0a3e5c on September 8, 2024
@barshaul force-pushed the update_on_moves branch 2 times, most recently from a374c59 to e17a802 on September 11, 2024
@barshaul marked this pull request as ready for review on September 11, 2024
@barshaul changed the title from "WORK IN PROGRESS - Update on moves" to "Added logic to update the slot map based on MOVED errors" on Sep 11, 2024
@eifrah-aws left a comment

Please consider changing Arc<RwLock<ShardAddr>> into a clean API, so that ShardAddr manages the locks internally and we can pass Arc<ShardAddr>.
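For illustration, a minimal sketch of the suggested shape, assuming a simplified ShardAddr that holds just a primary and a replica list (the real fields and methods may differ):

use std::sync::{Arc, RwLock};

// Illustrative only: the lock lives inside the struct, so callers share a
// plain Arc<ShardAddr> and never handle the lock directly.
#[derive(Debug, Default)]
pub struct ShardAddr {
    inner: RwLock<Addrs>,
}

#[derive(Debug, Default)]
struct Addrs {
    primary: String,
    replicas: Vec<String>,
}

impl ShardAddr {
    pub fn primary(&self) -> String {
        self.inner.read().unwrap().primary.clone()
    }

    pub fn replicas(&self) -> Vec<String> {
        self.inner.read().unwrap().replicas.clone()
    }

    // Promote `new_primary` from the replica list, demoting the old primary,
    // all under a single internally-held write lock.
    pub fn promote_replica(&self, new_primary: &str) {
        let mut addrs = self.inner.write().unwrap();
        if let Some(pos) = addrs.replicas.iter().position(|r| r == new_primary) {
            let promoted = addrs.replicas.remove(pos);
            let demoted = std::mem::replace(&mut addrs.primary, promoted);
            addrs.replicas.push(demoted);
        }
    }
}

// Callers now pass Arc<ShardAddr> instead of Arc<RwLock<ShardAddr>>.
fn _example(shard: Arc<ShardAddr>) {
    shard.promote_replica("10.0.0.2:6379");
    println!("primary is now {}", shard.primary());
}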

redis/src/cluster_async/mod.rs (outdated; resolved)
// If it fails, proceed by retrying the request with the redirected node,
// and allow the slot refresh task to correct the slot map.
warn!(
    "Failed to update the slot map based on the received MOVED error.\n

@eifrah-aws:
  1. Avoid using \n in the log, as it creates a line without its prefix. Please use two warn! calls.
  2. If it is OK to fail, why use warn!? (This is a good place for a telemetry call, once it's available.)

@barshaul (Author):

ack, changed to info!
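For illustration, the adjusted logging might look like this, assuming the log crate's macros and a hypothetical free-standing helper (not the PR's actual code):

use log::info;

// Two separate calls instead of one message with an embedded \n, so each
// line carries the logger's prefix; downgraded from warn! to info! since
// this failure path is recoverable.
fn log_slot_map_update_failure(err: &str) {
    info!("Failed to update the slot map based on the received MOVED error: {err}");
    info!("Retrying with the redirected node; the slot refresh task will correct the map.");
}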

redis/src/cluster_async/mod.rs (outdated; resolved)

if curr_shard_addrs_read.replicas().contains(&new_primary) {
    // Scenario 2: Failover - The new primary is a replica within the same shard
    drop(curr_shard_addrs_read);

@eifrah-aws:

Question: we are dropping the lock and then re-acquiring it, meaning we lose atomicity. Could this be a problem for us?

If it could, it is better to obtain the write lock to begin with.

@barshaul (Author):

Yes, you're right, I found some potential issues. I've updated it to acquire the write lock at the start.
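For illustration, a minimal before/after sketch of that change, with made-up names standing in for the PR's types:

use std::sync::{Arc, RwLock};

struct Shard {
    primary: String,
    replicas: Vec<String>,
}

fn promote(shard: &Arc<RwLock<Shard>>, new_primary: &str) {
    // Racy version: check under a read guard, drop it, then re-acquire a
    // write guard - another task may change the shard in between.
    //
    // let read = shard.read().unwrap();
    // let is_replica = read.replicas.iter().any(|r| r == new_primary);
    // drop(read); // <-- window for a concurrent update
    // let mut write = shard.write().unwrap();

    // Safer version (what the PR switched to): take the write lock up front
    // so the check and the mutation happen atomically under one guard.
    let mut guard = shard.write().unwrap();
    if let Some(pos) = guard.replicas.iter().position(|r| r == new_primary) {
        let promoted = guard.replicas.remove(pos);
        let demoted = std::mem::replace(&mut guard.primary, promoted);
        guard.replicas.push(demoted);
    }
}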

redis/src/cluster_async/mod.rs (outdated; resolved)
redis/src/cluster_async/mod.rs (outdated; resolved)
}

// Scenario 5: New Node - The new primary is not present in the current slots map, add it as a primary of a new shard.
drop(nodes_iter);

@eifrah-aws:

Why the explicit drop here? Won't it be dropped when leaving the function?

@barshaul (Author):

I have to drop it because it borrows the connection container immutably, and we need to do some mutable borrowing on the next line.
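For illustration, a minimal, self-contained version of that borrow-checker constraint (the names here are made up, not the PR's code): the iterator borrows the container immutably, and that borrow must end before the mutable borrow below.

fn main() {
    let mut connections: Vec<String> = vec!["node-a:6379".into()];

    let mut nodes_iter = connections.iter(); // immutable borrow starts here
    let known = nodes_iter.any(|a| a == "node-b:6379");
    // For a plain Vec iterator, non-lexical lifetimes would end the borrow
    // after the last use on their own; for iterator types that hold a guard
    // (and therefore implement Drop), the explicit drop is what releases
    // the borrow before the mutation.
    drop(nodes_iter);

    if !known {
        connections.push("node-b:6379".into()); // mutable borrow, now allowed
    }
}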

redis/src/cluster_routing.rs (outdated; resolved)
redis/src/cluster_routing.rs (outdated; resolved)
redis/src/cluster_routing.rs (outdated; resolved)