Increase replication factor for some blocks on store-gateways #9944

Open
56quarters opened this issue Nov 18, 2024 · 0 comments · May be fixed by #10382
@56quarters
Contributor

Each block owned by store-gateways is replicated to three store-gateways. When many queries touch a particular block, this can result in unbalanced CPU usage between store-gateways, leading to higher costs. Recent blocks are queried far more often than older blocks.

From an internal cluster, we see that most queries only touch the most recent data:

  • ~92% of Select() calls that hit store-gateways touch data from the last 25h
  • ~50% of Select() calls that hit store-gateways touch data from the last 73h
  • Less than 1% of Select() calls that hit store-gateways touch data older than 28d
  • Less than 0.1% of Select() calls that hit store-gateways touch data older than 30d

To spread load for more recent blocks across more store-gateways, we should introduce the ability to override the configured replication factor (three by default) with something higher. The mechanism for picking the overridden replication factor may be configurable or may be based on a variety of factors.

This issue proposes adding the ability to override the replication factor, with a default behavior of doing this based on the age or duration of blocks, and iterating on the exact behavior in further PRs.

copied from an internal issue and discussion

56quarters added a commit to grafana/dskit that referenced this issue Nov 18, 2024
This change adds a new method that accepts 0 or more `Option` instances
that modify the behavior of the call. These options can (currently) be
used to adjust the replication factor for a particular key or use buffers
to avoid excessive allocation.

Part of grafana/mimir#9944
56quarters added a commit to grafana/dskit that referenced this issue Jan 3, 2025
This change adds a new method that accepts 0 or more `Option` instances
that modify the behavior of the call. These options can (currently) be
used to adjust the replication factor for a particular key or use buffers
to avoid excessive allocation.

The most notable changes are in the `Ring.findInstancesForKey` method
which is the core of the `Ring.Get` method. Instead of keeping track
of distinct zones and assuming that only a single instance per zone
would ever be returned, we keep a map of the number of instances
found in each zone.

Part of grafana/mimir#9944
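
The two ideas in this commit message, per-call options and counting instances per zone instead of tracking distinct zones, can be sketched with a toy ring. This is not the dskit code: the `ring` type, its `get` method, and the `WithReplicationFactor` helper below are hypothetical stand-ins, and the ring is simplified to one token per instance.

```go
package main

import "sort"

// instance is a toy ring member with a single token.
type instance struct {
	ID    string
	Zone  string
	Token uint32
}

// options holds per-call overrides (hypothetical).
type options struct {
	replicationFactor int
}

// Option modifies the behavior of a single lookup.
type Option func(*options)

// WithReplicationFactor overrides the replication factor for one call.
func WithReplicationFactor(rf int) Option {
	return func(o *options) { o.replicationFactor = rf }
}

type ring struct {
	defaultRF int
	zones     int
	instances []instance // sorted by Token
}

func newRing(defaultRF int, instances []instance) *ring {
	sorted := append([]instance(nil), instances...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Token < sorted[j].Token })
	zones := map[string]struct{}{}
	for _, in := range sorted {
		zones[in.Zone] = struct{}{}
	}
	return &ring{defaultRF: defaultRF, zones: len(zones), instances: sorted}
}

// get walks the ring clockwise from key and returns up to RF instance IDs.
// It keeps a count of instances found in each zone, so a zone may contribute
// more than one instance once the replication factor exceeds the zone count.
func (r *ring) get(key uint32, opts ...Option) []string {
	o := options{replicationFactor: r.defaultRF}
	for _, opt := range opts {
		opt(&o)
	}
	if len(r.instances) == 0 || o.replicationFactor <= 0 {
		return nil
	}

	// Cap per zone at ceil(RF / zones) to keep replicas spread evenly.
	maxPerZone := (o.replicationFactor + r.zones - 1) / r.zones
	foundPerZone := make(map[string]int, r.zones)

	start := sort.Search(len(r.instances), func(i int) bool {
		return r.instances[i].Token >= key
	})
	var out []string
	for i := 0; i < len(r.instances) && len(out) < o.replicationFactor; i++ {
		in := r.instances[(start+i)%len(r.instances)]
		if foundPerZone[in.Zone] >= maxPerZone {
			continue // this zone already holds its share of replicas
		}
		foundPerZone[in.Zone]++
		out = append(out, in.ID)
	}
	return out
}
```

Calling `r.get(key)` uses the ring's default, while `r.get(key, WithReplicationFactor(6))` asks for six owners for that key only. The variadic options keep existing callers unchanged, which is presumably why the commit adds a new method rather than changing the existing one.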
56quarters added a commit to grafana/dskit that referenced this issue Jan 6, 2025
56quarters added a commit to grafana/dskit that referenced this issue Jan 13, 2025