From 40c5d5726183342257bffa98198768861e0c8011 Mon Sep 17 00:00:00 2001
From: Ti Chi Robot
Date: Tue, 21 Nov 2023 16:38:41 +0800
Subject: [PATCH] upgrade using tiup: add FAQ for concurrent DDL (#15348) (#15399)

---
 upgrade-tidb-using-tiup.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/upgrade-tidb-using-tiup.md b/upgrade-tidb-using-tiup.md
index daef2740e33bf..d42461ee0bd6c 100644
--- a/upgrade-tidb-using-tiup.md
+++ b/upgrade-tidb-using-tiup.md
@@ -23,6 +23,7 @@ This document is targeted for the following upgrade paths:
 > **Note:**
 >
 > - If your cluster to be upgraded is v3.1 or an earlier version (v3.0 or v2.1), the direct upgrade to v7.4.0 is not supported. You need to upgrade your cluster first to v4.0 and then to v7.4.0.
+> - If your cluster to be upgraded is earlier than v6.2, the upgrade might get stuck when you upgrade the cluster to v6.2 or later versions in some scenarios. You can refer to [How to fix the issue](#how-to-fix-the-issue-that-the-upgrade-gets-stuck-when-upgrading-to-v620-or-later-versions).
 > - TiDB nodes use the value of the [`server-version`](/tidb-configuration-file.md#server-version) configuration item to verify the current TiDB version. Therefore, to avoid unexpected behaviors, before upgrading the TiDB cluster, you need to set the value of `server-version` to empty or the real version of the current TiDB cluster.
 
 ## Upgrade caveat
@@ -271,6 +272,30 @@ Re-execute the `tiup cluster upgrade` command to resume the upgrade. The upgrade
 tiup cluster replay
 ```
 
+### How to fix the issue that the upgrade gets stuck when upgrading to v6.2.0 or later versions?
+
+Starting from v6.2.0, TiDB enables the [concurrent DDL framework](/ddl-introduction.md#how-the-online-ddl-asynchronous-change-works-in-tidb) by default to execute concurrent DDLs. This framework changes the DDL job storage from a KV queue to a table queue. This change might cause the upgrade to get stuck in some scenarios. The following are some scenarios that might trigger this issue and the corresponding solutions:
+
+- Upgrade gets stuck due to plugin loading
+
+    During the upgrade, loading certain plugins that require executing DDL statements might cause the upgrade to get stuck.
+
+    **Solution**: avoid loading plugins during the upgrade. Instead, load plugins only after the upgrade is completed.
+
+- Upgrade gets stuck due to using the `kill -9` command for offline upgrade
+
+    - Precautions: avoid using the `kill -9` command to perform the offline upgrade. If it is necessary, restart the new version TiDB node after 2 minutes.
+    - If the upgrade is already stuck, restart the affected TiDB node. If the issue has just occurred, it is recommended to restart the node after 2 minutes.
+
+- Upgrade gets stuck due to DDL Owner change
+
+    In multi-instance scenarios, network or hardware failures might cause DDL Owner change. If there are unfinished DDL statements in the upgrade phase, the upgrade might get stuck.
+
+    **Solution**:
+
+    1. Terminate the stuck TiDB node (avoid using `kill -9`).
+    2. Restart the new version TiDB node.
+
 ### The evict leader has waited too long during the upgrade. How to skip this step for a quick upgrade?
 
 You can specify `--force`. Then the processes of transferring PD leader and evicting TiKV leader are skipped during the upgrade. The cluster is directly restarted to update the version, which has a great impact on the cluster that runs online. In the following command, `` is the version to upgrade to, such as `v7.4.0`.
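
Review note on the `kill -9` scenario in the new FAQ entry: the text recommends restarting the new version TiDB node "after 2 minutes". A minimal sketch of how an operator script might compute the remaining wait is shown below. The function name `remaining_wait` and its arguments are hypothetical and are not part of TiUP or TiDB; the 120-second default is only a reading of the "2 minutes" in the FAQ.

```shell
#!/bin/sh
# Hypothetical helper (not part of TiUP or TiDB): print how many seconds are
# left before the 2-minute wait recommended in the FAQ has elapsed.
#   $1: time the old tidb-server process was killed, in epoch seconds
#   $2: current time, in epoch seconds
#   $3: wait window in seconds (assumed default: 120, the "2 minutes" above)
remaining_wait() {
    killed_at=$1
    now=$2
    window=${3:-120}
    elapsed=$((now - killed_at))
    if [ "$elapsed" -ge "$window" ]; then
        # The window has already passed, so no further waiting is needed.
        echo 0
    else
        echo $((window - elapsed))
    fi
}
```

One possible use, assuming `killed_at` was recorded when the process was killed, is `sleep "$(remaining_wait "$killed_at" "$(date +%s)")"` before restarting the node with the procedure you normally use.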