Added logic to update the slot map based on MOVED errors #186
Conversation
…hard between shard nodes and slot map values
Please consider changing Arc<RwLock<ShardAddr>> into a clean API so ShardAddr will manage the locks internally and we can pass Arc<ShardAddr>
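The suggested API could look roughly like this. This is a hypothetical sketch with illustrative names, not the crate's actual types: ShardAddr owns its RwLock internally, so callers share a plain Arc<ShardAddr> and never handle lock guards themselves.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical sketch of the suggested API (illustrative names, not the
// crate's actual types): ShardAddr owns its RwLock internally.
pub struct ShardAddr {
    inner: RwLock<Inner>,
}

struct Inner {
    primary: String,
    replicas: Vec<String>,
}

impl ShardAddr {
    pub fn new(primary: String, replicas: Vec<String>) -> Self {
        ShardAddr {
            inner: RwLock::new(Inner { primary, replicas }),
        }
    }

    // Read accessors acquire and release the read lock internally.
    pub fn primary(&self) -> String {
        self.inner.read().unwrap().primary.clone()
    }

    pub fn contains_replica(&self, addr: &str) -> bool {
        self.inner.read().unwrap().replicas.iter().any(|r| r.as_str() == addr)
    }

    // Mutations go through a method that holds the write lock for the whole
    // operation, instead of callers locking an external RwLock<ShardAddr>.
    pub fn promote_replica(&self, new_primary: &str) -> bool {
        let mut inner = self.inner.write().unwrap();
        if let Some(pos) = inner.replicas.iter().position(|r| r.as_str() == new_primary) {
            let promoted = inner.replicas.remove(pos);
            let demoted = std::mem::replace(&mut inner.primary, promoted);
            inner.replicas.push(demoted);
            true
        } else {
            false
        }
    }
}

fn main() {
    let shard = Arc::new(ShardAddr::new(
        "node-a:6379".to_string(),
        vec!["node-b:6379".to_string()],
    ));
    assert!(shard.promote_replica("node-b:6379"));
    assert_eq!(shard.primary(), "node-b:6379");
    assert!(shard.contains_replica("node-a:6379"));
}
```

The design benefit is that lock scope is decided in one place (inside ShardAddr), so callers cannot accidentally hold a guard across an await point or forget to drop it before mutating.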
redis/src/cluster_async/mod.rs
Outdated
// If it fails, proceed by retrying the request with the redirected node,
// and allow the slot refresh task to correct the slot map.
warn!(
    "Failed to update the slot map based on the received MOVED error.\n
- Avoid using \n in the log, as it creates a line without its prefix. Please use two warn! calls instead.
- If it is OK to fail, why use warn!? (This is a good place to use a telemetry call, when it's available.)
ack, changed to info!
redis/src/cluster_async/mod.rs
Outdated
if curr_shard_addrs_read.replicas().contains(&new_primary) {
    // Scenario 2: Failover - The new primary is a replica within the same shard
    drop(curr_shard_addrs_read);
Question: we are dropping the lock and then re-acquiring it, meaning we lose atomicity - could this be a problem for us? If it could, it is better to obtain the write lock to begin with.
Yes, you're right - I found some potential issues. I've updated it to acquire the write lock at the start.
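The concern and its fix can be shown in miniature. This is a simplified sketch, not the crate's actual code: by taking the write lock before the replica-membership check, the check and the promotion happen under one guard, so no other task can change the shard between the decision and the update.

```rust
use std::sync::RwLock;

// Simplified sketch (not the crate's actual code): acquiring the write lock
// up front keeps the replica check and the promotion atomic - there is no
// gap between dropping a read lock and re-acquiring a write lock.
struct Shard {
    primary: String,
    replicas: Vec<String>,
}

fn promote_if_replica(shard: &RwLock<Shard>, new_primary: &str) -> bool {
    // Write lock from the start: the decision below cannot go stale.
    let mut guard = shard.write().unwrap();
    if let Some(pos) = guard.replicas.iter().position(|r| r.as_str() == new_primary) {
        let promoted = guard.replicas.remove(pos);
        let demoted = std::mem::replace(&mut guard.primary, promoted);
        guard.replicas.push(demoted);
        true
    } else {
        false
    }
}

fn main() {
    let shard = RwLock::new(Shard {
        primary: "node-a:6379".to_string(),
        replicas: vec!["node-b:6379".to_string()],
    });
    assert!(promote_if_replica(&shard, "node-b:6379"));
    assert_eq!(shard.read().unwrap().primary, "node-b:6379");
}
```

The trade-off is reduced read concurrency while the write lock is held, which is usually acceptable for a rare event like a failover-triggered MOVED redirect.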
}
// Scenario 5: New Node - The new primary is not present in the current slots map, add it as a primary of a new shard.
drop(nodes_iter);
Why the explicit drop here? Won't it be dropped when leaving the function?
I have to drop it because it borrows the connection container as immutable, and we need to do some mutable borrowing on the next line.
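The borrow-checker situation described here can be reproduced in miniature. All names below are hypothetical (the real code iterates the connection container): an iterator that is still live inside a loop holds an immutable borrow of the map, so it must be explicitly dropped before the mutable insert compiles.

```rust
use std::collections::HashMap;

// Miniature reproduction of the borrow-checker constraint (hypothetical
// names; the real code iterates the connection container). Returns true if
// the primary was already known, false if it had to be inserted.
fn ensure_primary(nodes: &mut HashMap<String, String>, new_primary: &str) -> bool {
    let mut nodes_iter = nodes.iter();
    loop {
        match nodes_iter.next() {
            Some((addr, _)) if addr.as_str() == new_primary => return true,
            Some(_) => continue,
            None => {
                // `nodes_iter` is still live here (as far as the compiler
                // knows, the loop could iterate again), so it still borrows
                // `nodes` immutably. Without this drop, the mutable
                // `nodes.insert` below is rejected by the borrow checker.
                drop(nodes_iter);
                nodes.insert(new_primary.to_string(), "primary".to_string());
                return false;
            }
        }
    }
}

fn main() {
    let mut nodes = HashMap::new();
    nodes.insert("node-a:6379".to_string(), "primary".to_string());
    assert!(!ensure_primary(&mut nodes, "node-b:6379"));
    assert!(ensure_primary(&mut nodes, "node-b:6379"));
}
```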
c48287a into amazon-contributing:update_slotmap_moved
Added logic to update the slot map based on MOVED errors. Now, when a MOVED error is received, the client pushes a new future before retrying the request, attempting to update the slot map based on the new primary node, while still spawning a background task to refresh slots. This optimization ensures quicker rerouting of requests without waiting for the background slot refresh, which can be skipped due to the rate limiter.
The updated logic handles several scenarios:
If the slot map update fails, the request is retried with the moved redirect node, and the background slot refresh task will correct the map asynchronously.
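The overall flow can be sketched roughly as follows. All types and names here are hypothetical simplifications (the real client's slot map covers slot ranges and also tracks replicas): on a MOVED error, the client first attempts an in-place slot map update, and retries against the redirected node whether or not the update succeeded.

```rust
use std::collections::HashMap;

// Rough sketch of the described flow (hypothetical simplified types; the
// real slot map covers slot ranges and tracks replicas).
struct Moved {
    slot: u16,
    node: String,
}

struct SlotMap {
    slots: HashMap<u16, String>, // slot -> primary address (simplified)
}

impl SlotMap {
    // Attempt the optimistic in-place update. Err means the background slot
    // refresh task is left to correct the map asynchronously.
    fn update_from_moved(&mut self, moved: &Moved) -> Result<(), ()> {
        self.slots.insert(moved.slot, moved.node.clone());
        Ok(())
    }
}

// Returns the address the retried request should target: the redirect node,
// regardless of whether the slot map update succeeded.
fn handle_moved(map: &mut SlotMap, moved: Moved) -> String {
    if map.update_from_moved(&moved).is_err() {
        // Failure path: the background slot refresh will correct the map.
    }
    moved.node
}

fn main() {
    let mut map = SlotMap { slots: HashMap::new() };
    let target = handle_moved(
        &mut map,
        Moved { slot: 42, node: "node-c:6379".to_string() },
    );
    assert_eq!(target, "node-c:6379");
    assert_eq!(map.slots.get(&42).map(String::as_str), Some("node-c:6379"));
}
```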