Switch to ReplicaOrdering.RANDOM for select LBPs #32

Merged: 1 commit, Nov 19, 2024

Bouncheck
Contributor

ReplicaOrdering.RANDOM distributes the load evenly across replicas. Using round-robin policies with NEUTRAL ordering can easily cause load spikes on individual nodes while the cluster is growing, and an uneven workload afterwards, when tablets are in use.

The rack-aware LBP is not switched to RANDOM for now because it is slightly broken in that configuration.
See java-driver/369.

@Bouncheck
Contributor Author

The previous switch to ReplicaOrdering.NEUTRAL was made in response to scylladb/java-driver#255.
While it resolved that issue, it silently introduced another. I think the underlying cause of #255 could be the problem I describe in scylladb/java-driver#369. That would explain why the driver distributed the load across the local DC instead of the local rack.


if (settings.node.rack != null) {
    RackAwareRoundRobinPolicy.Builder policyBuilder = RackAwareRoundRobinPolicy.builder();
    if (settings.node.datacenter != null)
        policyBuilder.withLocalDc(settings.node.datacenter);
    policyBuilder = policyBuilder.withLocalRack(settings.node.rack);
    ret = policyBuilder.build();
    replicaOrdering = ReplicaOrdering.NEUTRAL;

So does this mean that combining the rack-aware policy with tablets would be imbalanced?

And would that need to be fixed on the driver end?

Contributor Author

@Bouncheck Bouncheck Nov 19, 2024

All round-robin policies (rack, DC) used with TokenAwarePolicy can be imbalanced with tablets when using neutral ordering. I think this combination should not be used if we want the load to be as balanced as possible. Long story short: say we have RF=3 and 6 nodes [A,B,C,D,E,F], and tablets are spread evenly but only on A, B, and C (this can happen while growing the cluster). If round robin happens to point to D, E, F, or A, the request will hit replica A first. As a result, A gets 4/6 of the load, B 1/6, and C 1/6. However, if RF=3 and the cluster has only A, B, and C, everything will be nearly perfectly balanced.
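The 4/6, 1/6, 1/6 split in the example above can be reproduced with a small simulation. This is only a toy model, not driver code: the class `ReplicaOrderingSketch` and its methods are hypothetical names, and it assumes that NEUTRAL keeps the child round-robin order (so the first replica met from the current start node gets the request) while RANDOM picks the first replica uniformly among the replicas.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Toy model of the RF=3, six-node scenario; hypothetical names, not driver code.
final class ReplicaOrderingSketch {
    // Six nodes in the cluster; tablets currently live only on A, B, and C.
    static final List<String> NODES = Arrays.asList("A", "B", "C", "D", "E", "F");
    static final List<String> REPLICAS = Arrays.asList("A", "B", "C");

    // Assumed NEUTRAL behavior: keep the child round-robin order, so the first
    // replica met when walking the node list from `start` receives the request.
    static String firstReplicaNeutral(int start) {
        for (int i = 0; i < NODES.size(); i++) {
            String node = NODES.get((start + i) % NODES.size());
            if (REPLICAS.contains(node)) {
                return node;
            }
        }
        throw new IllegalStateException("no replica in plan");
    }

    // Round robin advances the start index by one node per request.
    static Map<String, Integer> neutralCounts(int requests) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int r = 0; r < requests; r++) {
            counts.merge(firstReplicaNeutral(r % NODES.size()), 1, Integer::sum);
        }
        return counts;
    }

    // Assumed RANDOM behavior: replicas are shuffled per request, so the first
    // one is a uniform pick among the replicas.
    static Map<String, Integer> randomCounts(int requests, long seed) {
        Random rng = new Random(seed);
        Map<String, Integer> counts = new TreeMap<>();
        for (int r = 0; r < requests; r++) {
            counts.merge(REPLICAS.get(rng.nextInt(REPLICAS.size())), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints "NEUTRAL: {A=4000, B=1000, C=1000}"
        System.out.println("NEUTRAL: " + neutralCounts(6000));
        // Roughly 2000 requests per replica
        System.out.println("RANDOM:  " + randomCounts(6000, 42L));
    }
}
```

Start nodes D, E, F, and A all resolve to A in the NEUTRAL model, which is where the 4/6 share comes from; in the RANDOM model the start node has no influence on which replica is tried first.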

I'll try to write a comment with a broader explanation of what happens in scylladb/scylladb#19107, to better illustrate this issue with neutral ordering.

The rack-aware policy just does not work correctly with random ordering (it ignores rack awareness in favor of the local DC), so it has to stay neutral for now. This is the part that needs fixing on the driver's end.

@fruch fruch left a comment

LGTM

@CodeLieutenant CodeLieutenant merged commit 58ff646 into scylladb:master Nov 19, 2024
2 checks passed
@Bouncheck Bouncheck self-assigned this Nov 20, 2024
CodeLieutenant added a commit to CodeLieutenant/scylla-cluster-tests that referenced this pull request Nov 21, 2024
Main reason for version change:

Using cassandra-stress 3.17 to mitigate
- Switch to ReplicaOrdering.RANDOM for select LBPs
  [32](scylladb/cassandra-stress#32)

Other noticeable changes since the last version used in SCT:

- Add support for hostname verification
  [31](scylladb/cassandra-stress#31)
- Print thread dump on specific signals
  [27](scylladb/cassandra-stress#27)
- Replace uninterruptible wait
  [26](scylladb/cassandra-stress#26)
- Make it use DCAwareRoundRobinPolicy unless rack is provided
  [21](scylladb/cassandra-stress#21)
- feature(docker): adding support for dependabot
  [19](scylladb/cassandra-stress#19)

Signed-off-by: Dusan Malusev <[email protected]>
CodeLieutenant added a commit to CodeLieutenant/scylla-cluster-tests that referenced this pull request Nov 23, 2024
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Nov 26, 2024
mergify bot pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Nov 26, 2024
(cherry picked from commit 02997a6)

# Conflicts:
#	defaults/docker_images/cassandra-stress/values_cassandra-stress.yaml
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Nov 27, 2024
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Nov 28, 2024
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Dec 1, 2024