From 4f6c1f8601d8dafd0562c8f4b3e298d4f01c00f4 Mon Sep 17 00:00:00 2001
From: Ti Chi Robot
Date: Tue, 21 Nov 2023 16:38:41 +0800
Subject: [PATCH] upgrade using tiup: add FAQ for concurrent DDL (#15348) (#15398)

---
 upgrade-tidb-using-tiup.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/upgrade-tidb-using-tiup.md b/upgrade-tidb-using-tiup.md
index 20400d0fc8036..ab4ca53bcf429 100644
--- a/upgrade-tidb-using-tiup.md
+++ b/upgrade-tidb-using-tiup.md
@@ -23,6 +23,7 @@ This document is targeted for the following upgrade paths:
 > **Note:**
 >
 > - If your cluster to be upgraded is v3.1 or an earlier version (v3.0 or v2.1), the direct upgrade to v7.3.0 is not supported. You need to upgrade your cluster first to v4.0 and then to v7.3.0.
+> - If your cluster to be upgraded is earlier than v6.2, the upgrade might get stuck when you upgrade the cluster to v6.2 or later versions in some scenarios. You can refer to [How to fix the issue](#how-to-fix-the-issue-that-the-upgrade-gets-stuck-when-upgrading-to-v620-or-later-versions).
 > - TiDB nodes use the value of the [`server-version`](/tidb-configuration-file.md#server-version) configuration item to verify the current TiDB version. Therefore, to avoid unexpected behaviors, before upgrading the TiDB cluster, you need to set the value of `server-version` to empty or the real version of the current TiDB cluster.
 
 ## Upgrade caveat
@@ -271,6 +272,30 @@ Re-execute the `tiup cluster upgrade` command to resume the upgrade. The upgrade
 tiup cluster replay
 ```
 
+### How to fix the issue that the upgrade gets stuck when upgrading to v6.2.0 or later versions?
+
+Starting from v6.2.0, TiDB enables the [concurrent DDL framework](/ddl-introduction.md#how-the-online-ddl-asynchronous-change-works-in-tidb) by default to execute concurrent DDLs. This framework changes the DDL job storage from a KV queue to a table queue. This change might cause the upgrade to get stuck in some scenarios. The following are some scenarios that might trigger this issue and the corresponding solutions:
+
+- Upgrade gets stuck due to plugin loading
+
+    During the upgrade, loading certain plugins that require executing DDL statements might cause the upgrade to get stuck.
+
+    **Solution**: avoid loading plugins during the upgrade. Instead, load plugins only after the upgrade is completed.
+
+- Upgrade gets stuck due to using the `kill -9` command for offline upgrade
+
+    - Precautions: avoid using the `kill -9` command to perform the offline upgrade. If it is necessary, restart the new version TiDB node after 2 minutes.
+    - If the upgrade is already stuck, restart the affected TiDB node. If the issue has just occurred, it is recommended to restart the node after 2 minutes.
+
+- Upgrade gets stuck due to DDL Owner change
+
+    In multi-instance scenarios, network or hardware failures might cause DDL Owner change. If there are unfinished DDL statements in the upgrade phase, the upgrade might get stuck.
+
+    **Solution**:
+
+    1. Terminate the stuck TiDB node (avoid using `kill -9`).
+    2. Restart the new version TiDB node.
+
 ### The evict leader has waited too long during the upgrade. How to skip this step for a quick upgrade?
 
 You can specify `--force`. Then the processes of transferring PD leader and evicting TiKV leader are skipped during the upgrade. The cluster is directly restarted to update the version, which has a great impact on the cluster that runs online. In the following command, `` is the version to upgrade to, such as `v7.3.0`.