Add zone config troubleshooting guide v1
Fixes DOC-9210

Summary of changes:

- Add a new page, 'Troubleshoot Replication Zones', to _The
  ZoneConfigonomicon (tm)_

- Update the 'Replication controls' page with more detailed info re:
  zone config inheritance hierarchy and behavior

- Fix incorrect statements on the `ALTER RANGE` page since they're
  needed to map from range IDs returned by the critical nodes
  endpoint (mentioned in 'Troubleshoot Replication Zones') to actual
  schema objects

- Add moar links (tm) from various zone config-related pages to the new
  troubleshooting guide and amongst themselves

- Add a note to various zone config-related docs saying "most users
  should not do manual zone config changes, see Multi-region SQL and
  Zone Config Extensions instead"
rmloveland committed Jan 27, 2025
1 parent 9c326ff commit d460824
Showing 32 changed files with 545 additions and 50 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
[Zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) present on the destination cluster prior to a restore will be **overwritten** during a [cluster restore]({% link {{ page.version.version }}/restore.md %}#full-cluster) with the zone configurations from the [backed up cluster]({% link {{ page.version.version }}/backup.md %}#back-up-a-cluster). If there were no customized zone configurations on the cluster when the backup was taken, then after the restore the destination cluster will use the zone configuration from the [`RANGE DEFAULT` configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone).
@@ -0,0 +1 @@
For instructions showing how to troubleshoot replication zones, see [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %}).
6 changes: 6 additions & 0 deletions src/current/_includes/v24.3/sidebar-data/troubleshooting.json
@@ -56,6 +56,12 @@
"/${VERSION}/query-replication-reports.html"
]
},
{
"title": "Troubleshoot Replication Zones",
"urls": [
"/${VERSION}/troubleshoot-replication-zones.html"
]
},
{
"title": "Benchmarking",
"items": [
@@ -0,0 +1,3 @@
Cockroach Labs {% if page.name != "configure-replication-zones.md" %} [does not recommend modifying zone configurations manually]({% link {{ page.version.version }}/configure-replication-zones.md %}#why-manual-zone-config-management-is-not-recommended) {% else %} [does not recommend modifying zone configurations manually](#why-manual-zone-config-management-is-not-recommended) {% endif %}.

Most users should use [Multi-region SQL statements]({% link {{ page.version.version }}/multiregion-overview.md %}) instead; if additional control is needed, [Zone config extensions]({% link {{ page.version.version }}/zone-config-extensions.md %}) can be used to augment the multi-region SQL statements.
19 changes: 19 additions & 0 deletions src/current/v24.3/alter-database.md
@@ -169,6 +169,10 @@ For usage, see [Synopsis](#synopsis).
If you directly change a database's zone configuration with `ALTER DATABASE ... CONFIGURE ZONE`, CockroachDB will block all [`ALTER DATABASE ... SET PRIMARY REGION`](#set-primary-region) statements on the database.
{{site.data.alerts.end}}

{{site.data.alerts.callout_danger}}
{% include {{ page.version.version }}/zone-configs/avoid-manual-zone-configs.md %}
{{site.data.alerts.end}}

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.

For examples, see [Replication Controls](#configure-replication-zones).
@@ -689,6 +693,10 @@ HINT: you must first drop super region usa before you can drop the region us-wes

### Configure replication zones

{{site.data.alerts.callout_danger}}
{% include {{ page.version.version }}/zone-configs/avoid-manual-zone-configs.md %}
{{site.data.alerts.end}}

{% include {{ page.version.version }}/sql/movr-statements-geo-partitioned-replicas.md %}

#### Create a replication zone for a database
@@ -715,6 +723,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
ALTER DATABASE movr CONFIGURE ZONE DISCARD;
~~~

### Troubleshoot replication zones

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Use Zone Config Extensions

The following examples show:
@@ -1078,6 +1090,12 @@ When you discard a zone configuration, the objects it was applied to will then i
However, this statement will not remove any configuration created by the [multi-region abstractions]({% link {{ page.version.version }}/multiregion-overview.md %}).
{{site.data.alerts.end}}

#### Troubleshoot Zone Config Extensions

The process for troubleshooting Zone Config Extensions is the same as troubleshooting any other changes to zone configs.

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Change database owner

{% include {{page.version.version}}/sql/movr-statements.md %}
@@ -1283,3 +1301,4 @@ For more information about the region survival goal, see [Surviving region failu
- [`ALTER TABLE`]({% link {{ page.version.version }}/alter-table.md %})
- [Online Schema Changes]({% link {{ page.version.version }}/online-schema-changes.md %})
- [SQL Statements]({% link {{ page.version.version }}/sql-statements.md %})
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
8 changes: 6 additions & 2 deletions src/current/v24.3/alter-index.md
@@ -47,12 +47,12 @@ Subcommand | Description |

`ALTER INDEX ... CONFIGURE ZONE` is used to add, modify, reset, or remove replication zones for an index. To view details about existing replication zones, use [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}). For more information about replication zones, see [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}).



You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.

For examples, see [Replication Controls](#configure-replication-zones).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

#### Required privileges

The user must be a member of the [`admin` role]({% link {{ page.version.version }}/security-reference/authorization.md %}#admin-role) or have been granted [`CREATE`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) or [`ZONECONFIG`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) privileges. To configure [`system` objects]({% link {{ page.version.version }}/configure-replication-zones.md %}#for-system-data), the user must be a member of the `admin` role.
@@ -225,6 +225,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
ALTER INDEX vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE DISCARD;
~~~

#### Troubleshoot replication zones

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Define partitions

#### Define a list partition on an index
10 changes: 10 additions & 0 deletions src/current/v24.3/alter-partition.md
@@ -9,6 +9,8 @@ docs_area: reference.sql

To view details about existing replication zones, use [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}). For more information about replication zones, see [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.


@@ -44,3 +46,11 @@ The user must have the [`CREATE`]({% link {{ page.version.version }}/grant.md %}
### Create a replication zone for a partition

{% include {{ page.version.version }}/zone-configs/create-a-replication-zone-for-a-table-partition.md hide-enterprise-warning="true" %}

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

## See also

- [Table partitioning]({% link {{page.version.version}}/partitioning.md %})
- [`SHOW PARTITIONS`]({% link {{page.version.version}}/show-partitions.md %})
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
16 changes: 9 additions & 7 deletions src/current/v24.3/alter-range.md
@@ -34,9 +34,11 @@ Additional parameters are documented for the respective [subcommands](#subcomman

### `CONFIGURE ZONE`

-`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove replication zones for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).
+`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove [replication zones]({% link {{ page.version.version }}/configure-replication-zones.md %}) for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).

-You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
+You can use replication zones to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

#### Required privileges

@@ -121,7 +123,7 @@ For example, to get all range IDs, leaseholder store IDs, and leaseholder locali

{% include_cached copy-clipboard.html %}
~~~ sql
-WITH user_info AS (SHOW RANGES FROM TABLE users) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
+WITH user_info AS (SHOW RANGES FROM TABLE users WITH DETAILS) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
~~~

~~~
@@ -163,7 +165,7 @@ To move the leases for all data in the [`movr.users`]({% link {{ page.version.ve

{% include_cached copy-clipboard.html %}
~~~ sql
-ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users'
+ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -205,7 +207,7 @@ To move the replicas for all data in the [`movr.users`]({% link {{ page.version.

{% include_cached copy-clipboard.html %}
~~~ sql
-ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
+ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -231,7 +233,7 @@ To move all of a range's voting replicas from one store to another store:

{% include_cached copy-clipboard.html %}
~~~ sql
-ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
+ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -261,7 +263,7 @@ This statement will only have an effect on clusters that have non-voting replica

{% include_cached copy-clipboard.html %}
~~~ sql
-ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
+ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
4 changes: 2 additions & 2 deletions src/current/v24.3/alter-table.md
@@ -223,6 +223,8 @@ You can use *replication zones* to control the number and location of replicas f

For examples, see [Replication Controls](#configure-replication-zones).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

#### Required privileges

The user must be a member of the [`admin` role]({% link {{ page.version.version }}/security-reference/authorization.md %}#admin-role) or have been granted [`CREATE`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) or [`ZONECONFIG`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) privileges. To configure [`system` objects]({% link {{ page.version.version }}/configure-replication-zones.md %}#for-system-data), the user must be a member of the `admin` role.
@@ -358,8 +360,6 @@ For usage, see [Synopsis](#synopsis).

`ALTER TABLE ... PARTITION BY` is used to partition, re-partition, or un-partition a table. After defining partitions, [`CONFIGURE ZONE`](#configure-zone) is used to control the replication and placement of partitions.



For examples, see [Define partitions](#define-partitions).

#### Parameters
3 changes: 2 additions & 1 deletion src/current/v24.3/backup.md
@@ -33,10 +33,10 @@ To view the contents of an backup created with the `BACKUP` statement, use [`SHO
## Considerations

- [Full cluster backups](#back-up-a-cluster) include [license keys]({% link {{ page.version.version }}/licensing-faqs.md %}#set-a-license). When you [restore]({% link {{ page.version.version }}/restore.md %}) a full cluster backup that includes a license, the license is also restored.
- [Zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) present on the destination cluster prior to a restore will be **overwritten** during a [cluster restore]({% link {{ page.version.version }}/restore.md %}#full-cluster) with the zone configurations from the [backed up cluster](#back-up-a-cluster). If there were no customized zone configurations on the cluster when the backup was taken, then after the restore the destination cluster will use the zone configuration from the [`RANGE DEFAULT` configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone).
- You cannot restore a backup of a multi-region database into a single-region database.
- Exclude a table's row data from a backup using the [`exclude_data_from_backup`]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#exclude-a-tables-data-from-backups) parameter.
- `BACKUP` is a blocking statement. To run a backup job asynchronously, use the `DETACHED` option. See the [options](#options) below.
- {% include {{ page.version.version }}/backups/zone-configs-overwritten-during-restore.md %}

### Storage considerations

@@ -378,3 +378,4 @@ To use an external connection URI to back up to cloud storage with an associated
- [`CREATE SCHEDULE FOR BACKUP`]({% link {{ page.version.version }}/create-schedule-for-backup.md %})
- [`RESTORE`]({% link {{ page.version.version }}/restore.md %})
- [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %})
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
2 changes: 1 addition & 1 deletion src/current/v24.3/cluster-api.md
@@ -21,7 +21,7 @@ Endpoint | Name | Description | Support
[`/databases/{database}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseDetails) | Get database details | Get the descriptor ID of a specified database. | Stable
[`/databases/{database}/grants`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseGrants) | List database grants | List all [privileges](security-reference/authorization.html#managing-privileges) granted to users for a specified database. | Stable
[`/databases/{database}/tables`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseTables) | List database tables | List all tables in a specified database. | Stable
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and zone configuration. | Stable
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}). | Stable
[`/events`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listEvents) | List events | List the latest [events](eventlog.html) on the cluster, in descending order. | Unstable
[`/health`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/health) | Check node health | Determine if the node is running and ready to accept SQL connections. | Stable
[`/nodes`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listNodes) | List nodes | Get details on all nodes in the cluster, including node IDs, software versions, and hardware. | Stable
24 changes: 12 additions & 12 deletions src/current/v24.3/cluster-setup-troubleshooting.md
@@ -587,6 +587,18 @@ If you still see under-replicated/unavailable ranges on the Cluster Overview pag
1. To view the **Range Report** for a range, click on the range number in the **Under-replicated (or slow)** table or **Unavailable** table.
1. On the Range Report page, scroll down to the **Simulated Allocator Output** section. The table contains an error message which explains the reason for the under-replicated range. Follow the guidance in the message to resolve the issue. If you need help understanding the error or the guidance, [file an issue]({% link {{ page.version.version }}/file-an-issue.md %}). Please be sure to include the full Range Report and error message when you submit the issue.
#### Check for under-replicated or unavailable data
To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Critical nodes endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint).
#### Check for replication zone constraint violations
To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %}).
#### Check for critical localities
To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in the [Critical nodes endpoint documentation]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint). A locality is "critical" for a range if all of the nodes in that locality becoming [unreachable](#node-liveness-issues) would cause the range to become unavailable. In other words, the locality contains a majority of the range's replicas.
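The definition of a critical locality above can be illustrated with a minimal sketch. This is not CockroachDB code and does not call the critical nodes endpoint; the function name `critical_localities` and its input (one locality string per voting replica of a single range) are hypothetical, and the sketch assumes a range stays available only while a strict majority of its voting replicas is reachable.

```python
from collections import Counter


def critical_localities(replica_localities):
    """Return the localities whose loss would leave the range without quorum.

    replica_localities: one locality string per voting replica of a range
    (hypothetical input shape, chosen for illustration only).
    """
    total = len(replica_localities)
    quorum = total // 2 + 1  # strict majority of voting replicas
    counts = Counter(replica_localities)
    # A locality is critical if the replicas outside it cannot form a quorum.
    return sorted(loc for loc, n in counts.items() if total - n < quorum)


# Two of three replicas share a locality, so losing it loses quorum.
print(critical_localities(["us-east", "us-east", "us-west"]))
# One replica per locality: no single locality is critical.
print(critical_localities(["us-east", "us-west", "eu-west"]))
```

With three replicas, a locality holding two of them is critical, while spreading one replica per locality leaves no critical locality.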
## Node liveness issues
"Node liveness" refers to whether a node in your cluster has been determined to be "dead" or "alive" by the rest of the cluster. This is achieved using checks that ensure that each node connected to the cluster is updating its liveness record. This information is shared with the rest of the cluster using an internal gossip protocol.
@@ -633,18 +645,6 @@ If your cluster is in a partially-available state due to a recent node or networ
Even with `server.eventlog.enabled` set to `false`, notable log events are still sent to configured [log sinks]({% link {{ page.version.version }}/configure-logs.md %}#configure-log-sinks) as usual.
## Check for under-replicated or unavailable data
To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
## Check for replication zone constraint violations
To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
## Check for critical localities
To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}). A locality is "critical" for a range if all of the nodes in that locality becoming [unreachable](#node-liveness-issues) would cause the range to become unavailable. In other words, the locality contains a majority of the range's replicas.
## Something else?
If we do not have a solution here, you can try using our other [support resources]({% link {{ page.version.version }}/support-resources.md %}), including: