Nearest dc balancer #206
base: master
Conversation
Hello, thanks for the draft :)
I suggest a few ideas for this state.
…loop if continue from error. Added tests
ydb/src/load_balancer.rs
Outdated
}

const NODES_PER_DC: usize = 5;
const PING_TIMEOUT_SECS: u64 = 60;
What about using the discovery/ping interval for the ping timeout?
60 sec is the interval for TimerDiscovery renewal (there are also no predefined constants for it):
https://github.com/ydb-platform/ydb-rs-sdk/blob/master/ydb/src/client_builder.rs#L283
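A minimal sketch of the idea, assuming a hypothetical ping_timeout field on BalancerConfig and a constructor that receives the discovery interval from the client builder (names are illustrative, not the SDK's actual API):

use std::time::Duration;

// Hypothetical: default the ping timeout to the discovery interval instead of
// a separate hard-coded PING_TIMEOUT_SECS constant.
struct BalancerConfig {
    ping_timeout: Duration,
}

impl BalancerConfig {
    // discovery_interval would come from the client builder (60 s by default today).
    fn from_discovery_interval(discovery_interval: Duration) -> Self {
        BalancerConfig {
            ping_timeout: discovery_interval,
        }
    }
}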
Thanks for the great fork. I have only a few small comments.
impl NearestDCBalancer {
    fn get_endpoint(&self, service: Service) -> YdbResult<Uri> {
        match self.balancer_state.try_lock() {
Use lock instead of try_lock: we don't need a fast fail on this path. If the code doesn't see an endpoint, it returns to the retrier; the retrier will see the custom error and fail the transaction. It's better to wait until the state update finishes: it is a fast operation.
The method should return an error if the balancer is empty.
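A rough sketch of what the suggested change could look like, assuming a std::sync::Mutex around the state and an illustrative preferred-endpoint list (the field names and the error variant are placeholders, not the PR's actual code):

fn get_endpoint(&self, _service: Service) -> YdbResult<Uri> {
    // Block on the lock instead of failing fast: a state update is a short operation.
    let state = self.balancer_state.lock().unwrap();

    // If the balancer has no endpoints yet, return a dedicated error so the
    // retrier sees it instead of an empty result.
    state
        .preferred_endpoints
        .first()
        .map(|endpoint| endpoint.uri.clone())
        .ok_or_else(|| YdbError::Custom("no endpoints in balancer".to_string()))
}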
});
match Self::find_local_dc(&to_check).await {
    Ok(dc) => {
        info!("found new local dc:{}", dc);
This is a frequent message and not interesting when everything is OK. Better to use debug level for it.
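For example (assuming the same logging macros already used in the crate):

// Frequent, routine event: demote it from info to debug.
debug!("found new local dc: {}", dc);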
ping_token: CancellationToken,
waiter: Arc<WaiterImpl>,
config: BalancerConfig,
balancer_state: Arc<Mutex<BalancerState>>,
What about using RwLock instead of Mutex? It would allow multiple reads in parallel and block only for state updates.
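A small self-contained sketch of the idea with std::sync::RwLock (field and type names are illustrative, not the PR's actual definitions):

use std::sync::{Arc, RwLock};

struct BalancerState; // stub for the PR's actual state type

struct NearestDCBalancer {
    // Readers (endpoint lookups) share the lock; only state updates take it exclusively.
    balancer_state: Arc<RwLock<BalancerState>>,
}

impl NearestDCBalancer {
    fn read_state(&self) -> std::sync::RwLockReadGuard<'_, BalancerState> {
        // Many concurrent readers are allowed.
        self.balancer_state.read().unwrap()
    }

    fn update_state(&self, new_state: BalancerState) {
        // The write lock blocks readers only for the duration of the update.
        *self.balancer_state.write().unwrap() = new_state;
    }
}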
new_nodes: &Vec<NodeInfo>,
local_dc: String,
) {
    info!("adjusting endpoints");
This should be at debug or trace log level.
tokio::select! {
    biased; // check timeout first
    _ = interrupt_collector_future.cancelled() => {
        Self::join_all(&mut nursery).await; // Children will be cancelled due to tokens chaining, see (*)
What does the comment "see (*)" mean?