Add zone config troubleshooting guide v1
Fixes DOC-9210

Summary of changes:

- Add Zone Config troubleshooting guide

  - a.k.a. Chapter 3 of _The ZoneConfigonomicon (tm)_ (in this version
    of events, the existing 'Replication Controls' page is Chapter 1,
    and 'Zone Config Extensions' is Chapter 2)

- Update 'Replication controls' page with more detailed info re: zone
  config inheritance hierarchy and behavior

- Fix incorrect ALTER RANGE statements, since they're needed to map range
  IDs from the critical nodes endpoint (mentioned in the troubleshooting
  guide) to actual schema objects

- Add links from various zone config-related pages to the new
  troubleshooting guide

- Add a note to various zone config-related pages saying "most users
  should not do manual zone config changes, see Multi-region SQL and
  Zone Config Extensions instead"
rmloveland committed Jan 15, 2025
1 parent 6fcf900 commit 674915d
Showing 30 changed files with 340 additions and 42 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
[Zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) present on the destination cluster prior to a restore will be **overwritten** during a [cluster restore]({% link {{ page.version.version }}/restore.md %}#full-cluster) with the zone configurations from the [backed up cluster]({% link {{ page.version.version }}/backup.md %}#back-up-a-cluster). If there were no customized zone configurations on the cluster when the backup was taken, then after the restore the destination cluster will use the zone configuration from the [`RANGE DEFAULT` configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone).
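
Because a cluster restore replaces the destination cluster's zone configurations, it can help to record them beforehand so you can compare the state before and after the restore. A minimal sketch, assuming a SQL shell connected to the destination cluster:

{% include_cached copy-clipboard.html %}
~~~ sql
-- List every replication zone currently configured on the cluster.
SHOW ZONE CONFIGURATIONS;

-- Show the default zone that objects without their own zone configuration inherit from.
SHOW ZONE CONFIGURATION FROM RANGE default;
~~~

Running the same statements after the restore shows which configurations were replaced.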
@@ -0,0 +1 @@
For instructions showing how to troubleshoot replication zones, see [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %}).
6 changes: 6 additions & 0 deletions src/current/_includes/v24.3/sidebar-data/troubleshooting.json
@@ -56,6 +56,12 @@
"/${VERSION}/query-replication-reports.html"
]
},
{
"title": "Troubleshoot Replication Zone Configurations",
"urls": [
"/${VERSION}/troubleshoot-replication-zones.html"
]
},
{
"title": "Benchmarking",
"items": [
@@ -0,0 +1,7 @@
Cockroach Labs does not recommend adding zone configurations manually, for the following reasons:

- It is easy to introduce logic errors and end up in a state where replication does not behave as you intended.
- It is difficult to do proper change management and auditing of manually altered zone configurations.
- Manual zone configuration changes are managed entirely by the user, with no help from the system, and must be fully overwritten on each configuration change to take effect. This is another source of error.

For these reasons, most users should use [Multi-region SQL statements]({% link {{ page.version.version }}/multiregion-overview.md %}) instead; if additional control is needed, [Zone config extensions]({% link {{ page.version.version }}/zone-config-extensions.md %}) can be used to augment the multi-region SQL statements.
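
As a rough illustration of the recommended path, the following sketch declares regions with multi-region SQL and then layers a Zone Config Extension on top. It assumes a database named `movr` and a cluster whose nodes were started with the localities `us-east1` and `us-west1`; these region names and the GC TTL value are placeholders, so substitute your own.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Declare the database's regions; CockroachDB derives the underlying zone configs.
ALTER DATABASE movr SET PRIMARY REGION "us-east1";
ALTER DATABASE movr ADD REGION "us-west1";

-- Optional Zone Config Extension (placeholder value): adjust the GC TTL
-- for REGIONAL tables homed in us-east1 without touching zone configs directly.
ALTER DATABASE movr ALTER LOCALITY REGIONAL IN "us-east1" CONFIGURE ZONE USING gc.ttlseconds = 3600;
~~~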
19 changes: 19 additions & 0 deletions src/current/v24.3/alter-database.md
@@ -169,6 +169,10 @@ For usage, see [Synopsis](#synopsis).
If you directly change a database's zone configuration with `ALTER DATABASE ... CONFIGURE ZONE`, CockroachDB will block all [`ALTER DATABASE ... SET PRIMARY REGION`](#set-primary-region) statements on the database.
{{site.data.alerts.end}}

{{site.data.alerts.callout_danger}}
{% include {{ page.version.version }}/zone-configs/avoid-manual-zone-configs.md %}
{{site.data.alerts.end}}

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.

For examples, see [Replication Controls](#configure-replication-zones).
@@ -689,6 +693,10 @@ HINT: you must first drop super region usa before you can drop the region us-wes

### Configure replication zones

{{site.data.alerts.callout_danger}}
{% include {{ page.version.version }}/zone-configs/avoid-manual-zone-configs.md %}
{{site.data.alerts.end}}

{% include {{ page.version.version }}/sql/movr-statements-geo-partitioned-replicas.md %}

#### Create a replication zone for a database
@@ -715,6 +723,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
ALTER DATABASE movr CONFIGURE ZONE DISCARD;
~~~

### Troubleshoot replication zones

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
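
A useful first troubleshooting step is to compare the zone configuration an object actually uses with the one you expect. A minimal sketch, assuming the `movr` database and its `users` table used in the examples on this page:

{% include_cached copy-clipboard.html %}
~~~ sql
-- Zone configuration that applies to the database.
SHOW ZONE CONFIGURATION FROM DATABASE movr;

-- Zone configuration that applies to one table in that database.
SHOW ZONE CONFIGURATION FROM TABLE movr.users;
~~~

The `target` column in the output typically names the object the configuration comes from; if it names a parent object such as the database or `RANGE default`, the table has no zone configuration of its own.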

### Use Zone Config Extensions

The following examples show:
@@ -1078,6 +1090,12 @@ When you discard a zone configuration, the objects it was applied to will then i
However, this statement will not remove any configuration created by the [multi-region abstractions]({% link {{ page.version.version }}/multiregion-overview.md %}).
{{site.data.alerts.end}}

#### Troubleshoot Zone Config Extensions

The process for troubleshooting Zone Config Extensions is the same as troubleshooting any other changes to zone configs.

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Change database owner

{% include {{page.version.version}}/sql/movr-statements.md %}
@@ -1283,3 +1301,4 @@ For more information about the region survival goal, see [Surviving region failu
- [`ALTER TABLE`]({% link {{ page.version.version }}/alter-table.md %})
- [Online Schema Changes]({% link {{ page.version.version }}/online-schema-changes.md %})
- [SQL Statements]({% link {{ page.version.version }}/sql-statements.md %})
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
4 changes: 4 additions & 0 deletions src/current/v24.3/alter-index.md
@@ -225,6 +225,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
ALTER INDEX vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE DISCARD;
~~~

#### Troubleshoot replication zones

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
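
To check which zone configuration applies to an index, for example after running the `DISCARD` statement shown earlier, you can use `SHOW ZONE CONFIGURATION`. A sketch reusing the index name from that example:

{% include_cached copy-clipboard.html %}
~~~ sql
-- Show the zone configuration that applies to this secondary index.
SHOW ZONE CONFIGURATION FROM INDEX vehicles@vehicles_auto_index_fk_city_ref_users;
~~~

After a `DISCARD`, the output should reflect a configuration inherited from the table or a higher level rather than one set on the index itself.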

### Define partitions

#### Define a list partition on an index
8 changes: 8 additions & 0 deletions src/current/v24.3/alter-partition.md
@@ -9,6 +9,8 @@ docs_area: reference.sql

To view details about existing replication zones, use [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}). For more information about replication zones, see [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
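
To see which partitions exist and which zone configuration one of them uses, a sketch like the following may help. The table `users` and the partition name `us_west` are hypothetical placeholders; use the names returned by `SHOW PARTITIONS` on your own cluster.

{% include_cached copy-clipboard.html %}
~~~ sql
-- List the partitions defined on the table, including their zone configurations.
SHOW PARTITIONS FROM TABLE users;

-- Show the zone configuration that applies to one (hypothetical) partition.
SHOW ZONE CONFIGURATION FROM PARTITION us_west OF TABLE users;
~~~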


@@ -44,3 +46,9 @@ The user must have the [`CREATE`]({% link {{ page.version.version }}/grant.md %}
### Create a replication zone for a partition

{% include {{ page.version.version }}/zone-configs/create-a-replication-zone-for-a-table-partition.md hide-enterprise-warning="true" %}

## See also

- [Table partitioning]({% link {{page.version.version}}/partitioning.md %})
- [`SHOW PARTITIONS`]({% link {{page.version.version}}/show-partitions.md %})
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
14 changes: 7 additions & 7 deletions src/current/v24.3/alter-range.md
@@ -34,9 +34,9 @@ Additional parameters are documented for the respective [subcommands](#subcomman

### `CONFIGURE ZONE`

`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove replication zones for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).
`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove [replication zones]({% link {{ page.version.version }}/configure-replication-zones.md %}) for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
You can use replication zones to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
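
For instance, the following sketch adjusts and then inspects the `default` range's replication zone, which objects without their own zone configuration inherit from. The replica count of 5 is an arbitrary example value and assumes the cluster has at least five nodes.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Raise the replication factor of the default replication zone (example value).
ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;

-- Confirm the change.
SHOW ZONE CONFIGURATION FROM RANGE default;
~~~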

#### Required privileges

@@ -121,7 +121,7 @@ For example, to get all range IDs, leaseholder store IDs, and leaseholder locali

{% include_cached copy-clipboard.html %}
~~~ sql
WITH user_info AS (SHOW RANGES FROM TABLE users) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
WITH user_info AS (SHOW RANGES FROM TABLE users WITH DETAILS) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
~~~

~~~
@@ -163,7 +163,7 @@ To move the leases for all data in the [`movr.users`]({% link {{ page.version.ve

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users'
ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -205,7 +205,7 @@ To move the replicas for all data in the [`movr.users`]({% link {{ page.version.

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -231,7 +231,7 @@ To move all of a range's voting replicas from one store to another store:

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -261,7 +261,7 @@ This statement will only have an effect on clusters that have non-voting replica

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
2 changes: 2 additions & 0 deletions src/current/v24.3/alter-table.md
@@ -223,6 +223,8 @@ You can use *replication zones* to control the number and location of replicas f

For examples, see [Replication Controls](#configure-replication-zones).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

#### Required privileges

The user must be a member of the [`admin` role]({% link {{ page.version.version }}/security-reference/authorization.md %}#admin-role) or have been granted [`CREATE`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) or [`ZONECONFIG`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) privileges. To configure [`system` objects]({% link {{ page.version.version }}/configure-replication-zones.md %}#for-system-data), the user must be a member of the `admin` role.
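
As a sketch of granting only the privilege needed to change a table's zone configuration, assuming a hypothetical non-admin SQL user named `max` and the `movr.users` table:

{% include_cached copy-clipboard.html %}
~~~ sql
-- Allow the hypothetical user max to run ALTER TABLE ... CONFIGURE ZONE on this table.
GRANT ZONECONFIG ON TABLE movr.users TO max;

-- Verify the grant.
SHOW GRANTS ON TABLE movr.users;
~~~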
3 changes: 2 additions & 1 deletion src/current/v24.3/backup.md
@@ -33,10 +33,10 @@ To view the contents of an backup created with the `BACKUP` statement, use [`SHO
## Considerations

- [Full cluster backups](#back-up-a-cluster) include [license keys]({% link {{ page.version.version }}/licensing-faqs.md %}#set-a-license). When you [restore]({% link {{ page.version.version }}/restore.md %}) a full cluster backup that includes a license, the license is also restored.
- [Zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) present on the destination cluster prior to a restore will be **overwritten** during a [cluster restore]({% link {{ page.version.version }}/restore.md %}#full-cluster) with the zone configurations from the [backed up cluster](#back-up-a-cluster). If there were no customized zone configurations on the cluster when the backup was taken, then after the restore the destination cluster will use the zone configuration from the [`RANGE DEFAULT` configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone).
- You cannot restore a backup of a multi-region database into a single-region database.
- Exclude a table's row data from a backup using the [`exclude_data_from_backup`]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#exclude-a-tables-data-from-backups) parameter.
- `BACKUP` is a blocking statement. To run a backup job asynchronously, use the `DETACHED` option. See the [options](#options) below.
- {% include {{ page.version.version }}/backups/zone-configs-overwritten-during-restore.md %}

### Storage considerations

@@ -378,3 +378,4 @@ To use an external connection URI to back up to cloud storage with an associated
- [`CREATE SCHEDULE FOR BACKUP`]({% link {{ page.version.version }}/create-schedule-for-backup.md %})
- [`RESTORE`]({% link {{ page.version.version }}/restore.md %})
- [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %})
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
2 changes: 1 addition & 1 deletion src/current/v24.3/cluster-api.md
@@ -21,7 +21,7 @@ Endpoint | Name | Description | Support
[`/databases/{database}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseDetails) | Get database details | Get the descriptor ID of a specified database. | Stable
[`/databases/{database}/grants`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseGrants) | List database grants | List all [privileges](security-reference/authorization.html#managing-privileges) granted to users for a specified database. | Stable
[`/databases/{database}/tables`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseTables) | List database tables | List all tables in a specified database. | Stable
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and zone configuration. | Stable
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}). | Stable
[`/events`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listEvents) | List events | List the latest [events](eventlog.html) on the cluster, in descending order. | Unstable
[`/health`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/health) | Check node health | Determine if the node is running and ready to accept SQL connections. | Stable
[`/nodes`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listNodes) | List nodes | Get details on all nodes in the cluster, including node IDs, software versions, and hardware. | Stable
24 changes: 12 additions & 12 deletions src/current/v24.3/cluster-setup-troubleshooting.md
@@ -587,6 +587,18 @@ If you still see under-replicated/unavailable ranges on the Cluster Overview pag
1. To view the **Range Report** for a range, click on the range number in the **Under-replicated (or slow)** table or **Unavailable** table.
1. On the Range Report page, scroll down to the **Simulated Allocator Output** section. The table contains an error message which explains the reason for the under-replicated range. Follow the guidance in the message to resolve the issue. If you need help understanding the error or the guidance, [file an issue]({% link {{ page.version.version }}/file-an-issue.md %}). Please be sure to include the full Range Report and error message when you submit the issue.
#### Check for under-replicated or unavailable data
To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Critical nodes endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint).
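
The critical nodes endpoint reports problems by range ID. To find out which table a reported range belongs to, a query along these lines may help. It assumes the range belongs to the `movr` database and uses `42` as a hypothetical range ID.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Map a range ID (hypothetical: 42) to the table or tables whose data it holds.
SELECT range_id, table_name
FROM [SHOW RANGES FROM DATABASE movr WITH TABLES]
WHERE range_id = 42;
~~~

Because a range can hold data from more than one table, the query may return several rows.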
#### Check for replication zone constraint violations
To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %}).
#### Check for critical localities
To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in the [Critical nodes endpoint documentation]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint). A locality is "critical" for a range if the range would become unavailable should all of the nodes in that locality become [unreachable](#node-liveness-issues). In other words, the locality contains a majority of the range's replicas.
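
Whether a locality is critical depends on where each range's replicas live. To inspect that placement directly, a sketch assuming the `users` table:

{% include_cached copy-clipboard.html %}
~~~ sql
-- One row per range: its replicas and the localities those replicas are in.
SELECT range_id, replicas, replica_localities
FROM [SHOW RANGES FROM TABLE users];
~~~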
## Node liveness issues
"Node liveness" refers to whether a node in your cluster has been determined to be "dead" or "alive" by the rest of the cluster. This is achieved using checks that ensure that each node connected to the cluster is updating its liveness record. This information is shared with the rest of the cluster using an internal gossip protocol.
@@ -633,18 +645,6 @@ If your cluster is in a partially-available state due to a recent node or networ
Even with `server.eventlog.enabled` set to `false`, notable log events are still sent to configured [log sinks]({% link {{ page.version.version }}/configure-logs.md %}#configure-log-sinks) as usual.
## Check for under-replicated or unavailable data
To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
## Check for replication zone constraint violations
To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
## Check for critical localities
To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}). A locality is "critical" for a range if all of the nodes in that locality becoming [unreachable](#node-liveness-issues) would cause the range to become unavailable. In other words, the locality contains a majority of the range's replicas.
## Something else?
If we do not have a solution here, you can try using our other [support resources]({% link {{ page.version.version }}/support-resources.md %}), including:
3 changes: 2 additions & 1 deletion src/current/v24.3/common-errors.md
@@ -106,7 +106,7 @@ When running a single-node CockroachDB cluster, an error about replicas failing
E160407 09:53:50.337328 storage/queue.go:511 [replicate] 7 replicas failing with "0 of 1 store with an attribute matching []; likely not enough nodes in cluster"
~~~
This happens because CockroachDB expects three nodes by default. If you do not intend to add additional nodes, you can stop this error by using [`ALTER RANGE ... CONFIGURE ZONE`]({% link {{ page.version.version }}/alter-range.md %}#configure-zone) to update your default zone configuration to expect only one node:
This happens because CockroachDB expects three nodes by default. If you do not intend to add additional nodes, you can stop this error by using [`ALTER RANGE ... CONFIGURE ZONE`]({% link {{ page.version.version }}/alter-range.md %}#configure-zone) to update your default [zone configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}) to expect only one node:
{% include_cached copy-clipboard.html %}
~~~ shell
@@ -222,3 +222,4 @@ Try searching the rest of our docs for answers or using our other [support resou
- [StackOverflow](http://stackoverflow.com/questions/tagged/cockroachdb)
- [CockroachDB Support Portal](https://support.cockroachlabs.com)
- [Transaction retry error reference]({% link {{ page.version.version }}/transaction-retry-error-reference.md %})
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})