Nearest dc balancer #206
base: master
Conversation
Hello, thanks for the draft :)
I suggest a few ideas for this state.
…loop if continue from error. Added tests
ydb/src/load_balancer.rs
Outdated
}

const NODES_PER_DC: usize = 5;
const PING_TIMEOUT_SECS: u64 = 60;
What about using the discovery/ping interval for the ping timeout?
60 sec is the interval for TimerDiscovery renewal (there are also no predefined constants for it):
https://github.com/ydb-platform/ydb-rs-sdk/blob/master/ydb/src/client_builder.rs#L283
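A minimal sketch of the idea, assuming a hypothetical ping_timeout field on BalancerConfig and a constructor that receives the discovery interval from the client builder (names are illustrative, not the SDK's actual API):

use std::time::Duration;

// Hypothetical: default the ping timeout to the discovery interval instead of
// a separate hard-coded PING_TIMEOUT_SECS constant.
struct BalancerConfig {
    ping_timeout: Duration,
}

impl BalancerConfig {
    // discovery_interval would come from the client builder (60 s by default today).
    fn from_discovery_interval(discovery_interval: Duration) -> Self {
        BalancerConfig {
            ping_timeout: discovery_interval,
        }
    }
}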
Thanks for the great fork. I have only a few small comments.
impl NearestDCBalancer {
    fn get_endpoint(&self, service: Service) -> YdbResult<Uri> {
        match self.balancer_state.try_lock() {
Use lock instead of try_lock: we don't need a fast fail on this path. If the code doesn't see an endpoint, it returns to the retrier; the retrier will see the custom error and fail the transaction. It's better to wait until the state update finishes: it is a fast operation.
The method should return an error if the balancer is empty.
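A rough sketch of what the suggested change could look like, assuming a std::sync::Mutex around the state and an illustrative preferred-endpoint list (the field names and the error variant are placeholders, not the PR's actual code):

fn get_endpoint(&self, _service: Service) -> YdbResult<Uri> {
    // Block on the lock instead of failing fast: a state update is a short operation.
    let state = self.balancer_state.lock().unwrap();

    // If the balancer has no endpoints yet, return a dedicated error so the
    // retrier sees it instead of an empty result.
    state
        .preferred_endpoints
        .first()
        .map(|endpoint| endpoint.uri.clone())
        .ok_or_else(|| YdbError::Custom("no endpoints in balancer".to_string()))
}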
});
match Self::find_local_dc(&to_check).await {
    Ok(dc) => {
        info!("found new local dc:{}", dc);
This is a frequent message and not interesting when everything is OK. Better to use debug level for it.
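For example (assuming the same logging macros already used in the crate):

// Frequent, routine event: demote it from info to debug.
debug!("found new local dc: {}", dc);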
ping_token: CancellationToken,
waiter: Arc<WaiterImpl>,
config: BalancerConfig,
balancer_state: Arc<Mutex<BalancerState>>,
What about using RwLock instead of Mutex? It would allow multiple reads in parallel and block only for state updates.
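A small self-contained sketch of the idea with std::sync::RwLock (field and type names are illustrative, not the PR's actual definitions):

use std::sync::{Arc, RwLock};

struct BalancerState; // stub for the PR's actual state type

struct NearestDCBalancer {
    // Readers (endpoint lookups) share the lock; only state updates take it exclusively.
    balancer_state: Arc<RwLock<BalancerState>>,
}

impl NearestDCBalancer {
    fn read_state(&self) -> std::sync::RwLockReadGuard<'_, BalancerState> {
        // Many concurrent readers are allowed.
        self.balancer_state.read().unwrap()
    }

    fn update_state(&self, new_state: BalancerState) {
        // The write lock blocks readers only for the duration of the update.
        *self.balancer_state.write().unwrap() = new_state;
    }
}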
new_nodes: &Vec<NodeInfo>,
local_dc: String,
) {
    info!("adjusting endpoints");
This should be at debug or trace log level.
tokio::select! {
    biased; // check timeout first
    _ = interrupt_collector_future.cancelled() => {
        Self::join_all(&mut nursery).await; // Children will be cancelled due to tokens chaining, see (*)
What does the comment "see (*)" mean?