diff --git a/TOC.md b/TOC.md
index 91bea57eb5fc7..954bab93fb513 100644
--- a/TOC.md
+++ b/TOC.md
@@ -262,9 +262,9 @@
     - [SQL Plan Management](/sql-plan-management.md)
     - [The Blocklist of Optimization Rules and Expression Pushdown](/blocklist-control-plan.md)
   - Tutorials
-    - [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md)
-    - [Three Data Centers in Two Cities Deployment](/three-data-centers-in-two-cities-deployment.md)
-    - [Two Data Centers in One City Deployment](/two-data-centers-in-one-city-deployment.md)
+    - [Multiple Availability Zones in One Region Deployment](/multi-data-centers-in-one-city-deployment.md)
+    - [Three Availability Zones in Two Regions Deployment](/three-data-centers-in-two-cities-deployment.md)
+    - [Two Availability Zones in One Region Deployment](/two-data-centers-in-one-city-deployment.md)
   - Read Historical Data
     - Use Stale Read (Recommended)
       - [Usage Scenarios of Stale Read](/stale-read.md)
diff --git a/media/multi-data-centers-in-one-city-deployment-sample.png b/media/multi-data-centers-in-one-city-deployment-sample.png
index 27c68ffe93343..909995da75ae5 100644
Binary files a/media/multi-data-centers-in-one-city-deployment-sample.png and b/media/multi-data-centers-in-one-city-deployment-sample.png differ
diff --git a/media/three-data-centers-in-two-cities-deployment-01.png b/media/three-data-centers-in-two-cities-deployment-01.png
index 5e4e420284dbc..fce59a054e38a 100644
Binary files a/media/three-data-centers-in-two-cities-deployment-01.png and b/media/three-data-centers-in-two-cities-deployment-01.png differ
diff --git a/media/three-data-centers-in-two-cities-deployment-02.png b/media/three-data-centers-in-two-cities-deployment-02.png
index af2fd0bd04ff0..ad41a00cd313b 100644
Binary files a/media/three-data-centers-in-two-cities-deployment-02.png and b/media/three-data-centers-in-two-cities-deployment-02.png differ
diff --git a/media/three-data-centers-in-two-cities-deployment-03.png b/media/three-data-centers-in-two-cities-deployment-03.png
index 2f60f035573e6..5af72de7f721b 100644
Binary files a/media/three-data-centers-in-two-cities-deployment-03.png and b/media/three-data-centers-in-two-cities-deployment-03.png differ
diff --git a/multi-data-centers-in-one-city-deployment.md b/multi-data-centers-in-one-city-deployment.md
index 637663015ecbc..5f67a00b6040f 100644
--- a/multi-data-centers-in-one-city-deployment.md
+++ b/multi-data-centers-in-one-city-deployment.md
@@ -1,10 +1,10 @@
 ---
-title: Multiple Data Centers in One City Deployment
-summary: Learn the deployment solution to multi-data centers in one city.
+title: Multiple Availability Zones in One Region Deployment
+summary: Learn the deployment solution to multiple availability zones in one region.
 aliases: ['/docs/dev/how-to/deploy/geographic-redundancy/overview/','/docs/dev/geo-redundancy-deployment/','/tidb/dev/geo-redundancy-deployment']
 ---
 
-# Multiple Data Centers in One City Deployment
+# Multiple Availability Zones in One Region Deployment
 
-As a distributed SQL database, TiDB combines the best features of the traditional relational database and the scalability of the NoSQL database, and is highly available across data centers (DC). This document introduces the deployment of multiple DCs in one city.
+As a distributed SQL database, TiDB combines the best features of traditional relational databases with the scalability of NoSQL databases, and is highly available across availability zones (AZs). This document introduces the deployment of multiple AZs in one region.
+
+The term "region" in this document refers to a geographic area, while the capitalized "Region" refers to a basic unit of data storage in TiKV. "AZ" refers to an isolated location within a region, and each region has multiple AZs. The solution described in this document also applies to the scenario where multiple data centers are located in a single city.
 
 ## Raft protocol
 
@@ -27,49 +29,49 @@ To take advantage of Raft's reliability, the following conditions must be met in
 
 - Use at least three servers in case one server fails.
 - Use at least three racks in case one rack fails.
-- Use at least three DCs in case one DC fails.
-- Deploy TiDB in at least three cities in case data safety issue occurs in one city.
+- Use at least three AZs in case one AZ fails.
+- Deploy TiDB in at least three regions in case a data safety issue occurs in one region.
 
-The native Raft protocol does not have a good support for an even number of replicas. Considering the impact of cross-city network latency, three DCs in the same city might be the most suitable solution to a highly available and disaster tolerant Raft deployment.
+The native Raft protocol does not have good support for an even number of replicas. Considering the impact of cross-region network latency, three AZs in the same region might be the most suitable solution to a highly available and disaster-tolerant Raft deployment.
 
-## Three DCs in one city deployment
+## Three AZs in one region deployment
 
-TiDB clusters can be deployed in three DCs in the same city. In this solution, data replication across the three DCs is implemented using the Raft protocol within the cluster. These three DCs can provide read and write services at the same time. Data consistency is not affected even if one DC fails.
+TiDB clusters can be deployed in three AZs in the same region. In this solution, data replication across the three AZs is implemented using the Raft protocol within the cluster. These three AZs can provide read and write services at the same time. Data consistency is not affected even if one AZ fails.
 
 ### Simple architecture
 
-TiDB, TiKV and PD are distributed among three DCs, which is the most common deployment with the highest availability.
+TiDB, TiKV, and PD are distributed among three AZs, which is the most common deployment with the highest availability.
 
-![3-DC Deployment Architecture](/media/deploy-3dc.png)
+![3-AZ Deployment Architecture](/media/deploy-3dc.png)
 
 **Advantages:**
 
-- All replicas are distributed among three DCs, with high availability and disaster recovery capability.
-- No data will be lost if one DC is down (RPO = 0).
-- Even if one DC is down, the other two DCs will automatically start leader election and automatically resume services within a reasonable amount of time (within 20 seconds in most cases). See the following diagram for more information:
+- All replicas are distributed among three AZs, with high availability and disaster recovery capability.
+- No data will be lost if one AZ is down (RPO = 0).
+- Even if one AZ is down, the other two AZs will automatically start leader election and automatically resume services within a certain period (within 20 seconds in most cases). See the following diagram for more information:
 
-![Disaster Recovery for 3-DC Deployment](/media/deploy-3dc-dr.png)
+![Disaster Recovery for 3-AZ Deployment](/media/deploy-3dc-dr.png)
 
 **Disadvantages:**
 
 The performance can be affected by the network latency.
 
-- For writes, all the data has to be replicated to at least 2 DCs. Because TiDB uses 2-phase commit for writes, the write latency is at least twice the latency of the network between two DCs.
-- The read performance will also be affected by the network latency if the leader is not in the same DC with the TiDB node that sends the read request.
-- Each TiDB transaction needs to obtain TimeStamp Oracle (TSO) from the PD leader. So if the TiDB and PD leaders are not in the same DC, the performance of the transactions will also be affected by the network latency because each transaction with the write request has to obtain TSO twice.
+- For writes, all the data has to be replicated to at least two AZs. Because TiDB uses two-phase commit for writes, the write latency is at least twice the latency of the network between two AZs.
+- The read performance will also be affected by the network latency if the leader is not in the same AZ as the TiDB node that sends the read request.
+- Each TiDB transaction needs to obtain TimeStamp Oracle (TSO) from the PD leader. So if the TiDB and PD leaders are not in the same AZ, the performance of the transactions will also be affected by the network latency because each transaction with a write request has to obtain TSO twice.
 
 ### Optimized architecture
 
-If not all of the three DCs need to provide services to the applications, you can dispatch all the requests to one DC and configure the scheduling policy to migrate all the TiKV Region leader and PD leader to the same DC. In this way, neither obtaining TSO nor reading TiKV Regions will be impacted by the network latency across DCs. If this DC is down, the PD leader and TiKV Region leader will be automatically elected in other surviving DCs, and you just need to switch the requests to the DCs that are still alive.
+If not all of the three AZs need to provide services to the applications, you can dispatch all the requests to one AZ and configure the scheduling policy to migrate the TiKV Region leader and PD leader to the same AZ. In this way, neither obtaining TSO nor reading TiKV Regions will be impacted by the network latency across AZs. If this AZ is down, the PD leader and TiKV Region leader will be automatically elected in other surviving AZs, and you just need to switch the requests to the AZs that are still alive.
 
-![Read Performance Optimized 3-DC Deployment](/media/deploy-3dc-optimize.png)
+![Read Performance Optimized 3-AZ Deployment](/media/deploy-3dc-optimize.png)
 
 **Advantages:**
 
 The cluster's read performance and the capability to get TSO are improved. A configuration template of scheduling policy is as follows:
 
 ```shell
--- Evicts all leaders of other DCs to the DC that provides services to the application.
+-- Evicts all leaders of other AZs to the AZ that provides services to the application.
 config set label-property reject-leader LabelName labelValue
 
 -- Migrates PD leaders and sets priority.
@@ -81,14 +83,14 @@ member leader_priority pdName3 3
 
 > **Note:**
 >
-> Since TiDB 5.2, the `label-property` configuration is not supported by default. To set the replica policy, use the [placement rules](/configure-placement-rules.md).
+> Starting from TiDB v5.2, the `label-property` configuration is not supported by default. To set the replica policy, use the [placement rules](/configure-placement-rules.md).
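+
+As a rough placement-rules counterpart of the `label-property reject-leader` template above, a rule with the `leader` role can pin Region leaders to the AZ that serves the application. The following is a sketch only: the rule IDs and the `az`/`az1` label key and value are illustrative, and the two rules assume that they replace the default rule of a three-replica cluster (one leader plus two voters), so adjust the counts to your own replica settings:
+
+```json
+[
+  {
+    "group_id": "pd",
+    "id": "az1-leader",
+    "start_key": "",
+    "end_key": "",
+    "role": "leader",
+    "count": 1,
+    "label_constraints": [
+      { "key": "az", "op": "in", "values": ["az1"] }
+    ]
+  },
+  {
+    "group_id": "pd",
+    "id": "other-voters",
+    "start_key": "",
+    "end_key": "",
+    "role": "voter",
+    "count": 2
+  }
+]
+```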
 
 **Disadvantages:**
 
-- Write scenarios are still affected by network latency across DCs. This is because Raft follows the majority protocol and all written data must be replicated to at least two DCs.
-- The TiDB server that provides services is only in one DC.
-- All application traffic is processed by one DC and the performance is limited by the network bandwidth pressure of that DC.
-- The capability to get TSO and the read performance are affected by whether the PD server and TiKV server are up in the DC that processes application traffic. If these servers are down, the application is still affected by the cross-center network latency.
+- Write scenarios are still affected by network latency across AZs. This is because Raft follows the majority protocol and all written data must be replicated to at least two AZs.
+- The TiDB server that provides services is only in one AZ.
+- All application traffic is processed by one AZ and the performance is limited by the network bandwidth pressure of that AZ.
+- The capability to get TSO and the read performance are affected by whether the PD server and TiKV server are up in the AZ that processes application traffic. If these servers are down, the application is still affected by the cross-AZ network latency.
 
 ### Deployment example
 
@@ -96,9 +98,9 @@ This section provides a topology example, and introduces TiKV labels and TiKV la
 
 #### Topology example
 
-The following example assumes that three DCs (IDC1, IDC2, and IDC3) are located in one city; each IDC has two sets of racks and each rack has three servers. The example ignores the hybrid deployment or the scenario where multiple instances are deployed on one machine. The deployment of a TiDB cluster (three replicas) on three DCs in one city is as follows:
+The following example assumes that three AZs (AZ1, AZ2, and AZ3) are located in one region; each AZ has two sets of racks and each rack has three servers. The example ignores hybrid deployment and the scenario where multiple instances are deployed on one machine. The deployment of a TiDB cluster (three replicas) on three AZs in one region is as follows:
 
-![3-DC in One City](/media/multi-data-centers-in-one-city-deployment-sample.png)
+![3-AZ in One Region](/media/multi-data-centers-in-one-city-deployment-sample.png)
 
 #### TiKV labels
 
@@ -106,62 +108,64 @@ TiKV is a Multi-Raft system where data is divided into Regions and the size of e
 
 Because a Raft group of three replicas tolerates only one replica failure, even if the cluster is scaled out to have N TiKV instances, this cluster still tolerates only one replica failure. Two failed TiKV instances might cause some Regions to lose replicas and the data in this cluster is no longer complete. SQL requests that access data from these Regions will fail. The probability of two simultaneous failures among N TiKV instances is much higher than the probability of two simultaneous failures among three TiKV instances. This means that the more TiKV instances the Multi-Raft system is scaled out to have, the less the availability of the system.
 
-Because of the limitation described above, `label` is used to describe the location information of TiKV. The label information is refreshed to the TiKV startup configuration file with deployment or rolling upgrade operations. The started TiKV reports its latest label information to PD. Based on the user-registered label name (the label metadata) and the TiKV topology, PD optimally schedules Region replicas and improves the system availability.
+Because of the preceding limitation, `label` is used to describe the location information of TiKV. The label information is refreshed to the TiKV startup configuration file with deployment or rolling upgrade operations. The started TiKV reports its latest label information to PD. Based on the user-registered label name (the label metadata) and the TiKV topology, PD optimally schedules Region replicas and improves the system availability.
 
 #### TiKV labels planning example
 
-To improve the availability and disaster recovery of the system, you need to design and plan TiKV labels according to your existing physical resources and the disaster recovery capability. You also need to configure in the cluster initialization configuration file according to the planned topology:
+To improve the availability and disaster recovery of the system, you need to design and plan TiKV labels according to your existing physical resources and the disaster recovery capability. You also need to edit the cluster initialization configuration file according to the planned topology:
 
 ```ini
 server_configs:
   pd:
-    replication.location-labels: ["zone","dc","rack","host"]
+    replication.location-labels: ["zone","az","rack","host"]
 
 tikv_servers:
   - host: 10.63.10.30
     config:
-      server.labels: { zone: "z1", dc: "d1", rack: "r1", host: "30" }
+      server.labels: { zone: "z1", az: "az1", rack: "r1", host: "30" }
  - host: 10.63.10.31
    config:
-      server.labels: { zone: "z1", dc: "d1", rack: "r1", host: "31" }
+      server.labels: { zone: "z1", az: "az1", rack: "r1", host: "31" }
  - host: 10.63.10.32
    config:
-      server.labels: { zone: "z1", dc: "d1", rack: "r2", host: "32" }
+      server.labels: { zone: "z1", az: "az1", rack: "r2", host: "32" }
  - host: 10.63.10.33
    config:
-      server.labels: { zone: "z1", dc: "d1", rack: "r2", host: "33" }
+      server.labels: { zone: "z1", az: "az1", rack: "r2", host: "33" }
+
  - host: 10.63.10.34
    config:
-      server.labels: { zone: "z2", dc: "d1", rack: "r1", host: "34" }
+      server.labels: { zone: "z2", az: "az2", rack: "r1", host: "34" }
  - host: 10.63.10.35
    config:
-      server.labels: { zone: "z2", dc: "d1", rack: "r1", host: "35" }
+      server.labels: { zone: "z2", az: "az2", rack: "r1", host: "35" }
  - host: 10.63.10.36
    config:
-      server.labels: { zone: "z2", dc: "d1", rack: "r2", host: "36" }
+      server.labels: { zone: "z2", az: "az2", rack: "r2", host: "36" }
  - host: 10.63.10.37
    config:
-      server.labels: { zone: "z2", dc: "d1", rack: "r2", host: "37" }
+      server.labels: { zone: "z2", az: "az2", rack: "r2", host: "37" }
+
  - host: 10.63.10.38
    config:
-      server.labels: { zone: "z3", dc: "d1", rack: "r1", host: "38" }
+      server.labels: { zone: "z3", az: "az3", rack: "r1", host: "38" }
  - host: 10.63.10.39
    config:
-      server.labels: { zone: "z3", dc: "d1", rack: "r1", host: "39" }
+      server.labels: { zone: "z3", az: "az3", rack: "r1", host: "39" }
  - host: 10.63.10.40
    config:
-      server.labels: { zone: "z3", dc: "d1", rack: "r2", host: "40" }
+      server.labels: { zone: "z3", az: "az3", rack: "r2", host: "40" }
  - host: 10.63.10.41
    config:
-      server.labels: { zone: "z3", dc: "d1", rack: "r2", host: "41" }
+      server.labels: { zone: "z3", az: "az3", rack: "r2", host: "41" }
 ```
 
-In the example above, `zone` is the logical availability zone layer that controls the isolation of replicas (three replicas in the example cluster).
+In the preceding example, `zone` is the logical availability zone layer that controls the isolation of replicas (three replicas in the example cluster).
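+
+If the cluster is already running, the same label hierarchy can also be registered in PD online through pd-ctl. The following one-liner is a sketch: `<pd-address>` and `<CLUSTER_VERSION>` are placeholders for one of your PD endpoints and your cluster version:
+
+```bash
+# Register the location label levels, from the outermost to the innermost.
+tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd-address>:2379 config set location-labels zone,az,rack,host
+```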
 
-Considering that the DC might be scaled out in the future, the three-layer label structure (`dc`, `rack`, `host`) is not directly adopted. Assuming that `d2`, `d3`, and `d4` are to be scaled out, you only need to scale out the DCs in the corresponding availability zone and scale out the racks in the corresponding DC.
+Considering that the AZs might be scaled out in the future, the three-layer label structure (`az`, `rack`, and `host`) is not directly adopted. Assuming that `az4`, `az5`, and `az6` are to be scaled out, you only need to scale out the AZs in the corresponding availability zone and scale out the racks in the corresponding AZ.
 
-If this three-layer label structure is directly adopted, after scaling out a DC, you might need to apply new labels and the data in TiKV needs to be rebalanced.
+If this three-layer label structure is directly adopted, after scaling out an AZ, you might need to apply new labels and the data in TiKV needs to be rebalanced.
 
 ### High availability and disaster recovery analysis
 
-The multiple DCs in one city deployment can guarantee that if one DC fails, the cluster can automatically recover services without manual intervention. Data consistency is also guaranteed. Note that scheduling policies are used to optimize performance, but when failure occurs, these policies prioritize availability over performance.
+The multiple AZs in one region deployment can guarantee that if one AZ fails, the cluster can automatically recover services without manual intervention. Data consistency is also guaranteed. Note that scheduling policies are used to optimize performance, but when a failure occurs, these policies prioritize availability over performance.
diff --git a/three-data-centers-in-two-cities-deployment.md b/three-data-centers-in-two-cities-deployment.md
index 8ca85d4e90db2..27352220f4174 100644
--- a/three-data-centers-in-two-cities-deployment.md
+++ b/three-data-centers-in-two-cities-deployment.md
@@ -1,57 +1,62 @@
 ---
-title: Three Data Centers in Two Cities Deployment
-summary: Learn the deployment solution to three data centers in two cities.
+title: Three Availability Zones in Two Regions Deployment
+summary: Learn the deployment solution to three availability zones in two regions.
 aliases: ['/docs/dev/three-data-centers-in-two-cities-deployment/']
 ---
 
-# Three Data Centers in Two Cities Deployment
+# Three Availability Zones in Two Regions Deployment
 
-This document introduces the architecture and configuration of the three data centers (DC) in two cities deployment.
+This document introduces the architecture and configuration of the three availability zones (AZs) in two regions deployment.
+
+The term "region" in this document refers to a geographic area, while the capitalized "Region" refers to a basic unit of data storage in TiKV. "AZ" refers to an isolated location within a region, and each region has multiple AZs. The solution described in this document also applies to the scenario where multiple data centers are located in a single city.
 
 ## Overview
 
-The architecture of three DCs in two cities is a highly available and disaster tolerant deployment solution that provides a production data center, a disaster recovery center in the same city, and a disaster recovery center in another city. In this mode, the three DCs in two cities are interconnected. If one DC fails or suffers from a disaster, other DCs can still operate well and take over the the key applications or all applications. Compared with the the multi-DC in one city deployment, this solution has the advantage of cross-city high availability and can survive city-level natural disasters.
+The architecture of three AZs in two regions is a highly available and disaster-tolerant deployment solution that provides a production AZ, a disaster recovery AZ in the same region, and a disaster recovery AZ in another region. In this mode, the three AZs in two regions are interconnected. If one AZ fails or suffers from a disaster, other AZs can still operate well and take over the key applications or all applications. Compared with the multi-AZ in one region deployment, this solution has the advantage of cross-region high availability and can survive region-level natural disasters.
 
-The distributed database TiDB natively supports the three-DC-in-two-city architecture by using the Raft algorithm, and guarantees the consistency and high availability of data within a database cluster. Because the network latency across DCs in the same city is relatively low, the application traffic can be dispatched to two DCs in the same city, and the traffic load can be shared by these two DCs by controlling the distribution of TiKV Region leaders and PD leaders.
+The distributed database TiDB natively supports the three-AZ-in-two-region architecture by using the Raft algorithm, and guarantees the consistency and high availability of data within a database cluster. Because the network latency across AZs in the same region is relatively low, the application traffic can be dispatched to two AZs in the same region, and the traffic load can be shared by these two AZs by controlling the distribution of TiKV Region leaders and PD leaders.
 
 ## Deployment architecture
 
-This section takes the example of Seattle and San Francisco to explain the deployment mode of three DCs in two cities for the distributed database of TiDB.
+This section takes the example of Seattle and San Francisco to explain the deployment mode of three AZs in two regions for TiDB.
 
-In this example, two DCs (IDC1 and IDC2) are located in Seattle and another DC (IDC3) is located in San Francisco. The network latency between IDC1 and IDC2 is lower than 3 milliseconds. The network latency between IDC3 and IDC1/IDC2 in Seattle is about 20 milliseconds (ISP dedicated network is used).
+In this example, two AZs (AZ1 and AZ2) are located in Seattle and another AZ (AZ3) is located in San Francisco. The network latency between AZ1 and AZ2 is lower than 3 milliseconds. The network latency between AZ3 and AZ1/AZ2 in Seattle is about 20 milliseconds (an ISP dedicated network is used).
 
 The architecture of the cluster deployment is as follows:
 
-- The TiDB cluster is deployed to three DCs in two cities: IDC1 in Seattle, IDC2 in Seattle, and IDC3 in San Francisco.
-- The cluster has five replicas, two in IDC1, two in IDC2, and one in IDC3. For the TiKV component, each rack has a label, which means that each rack has a replica.
-- The Raft protocol is adopted to ensure consistency and high availability of data, which is transparent to users.
+- The TiDB cluster is deployed to three AZs in two regions: AZ1 in Seattle, AZ2 in Seattle, and AZ3 in San Francisco.
+- The cluster has five replicas, two in AZ1, two in AZ2, and one in AZ3. For the TiKV component, each rack has a label, which means that each rack has a replica.
+- The Raft protocol is adopted to ensure consistency and high availability of data, which is transparent to users.
 
-![3-DC-in-2-city architecture](/media/three-data-centers-in-two-cities-deployment-01.png)
+![3-AZ-in-2-region architecture](/media/three-data-centers-in-two-cities-deployment-01.png)
 
-This architecture is highly available. The distribution of Region leaders is restricted to the two DCs (IDC1 and IDC2) that are in the same city (Seattle). Compared with the three-DC solution in which the distribution of Region leaders is not restricted, this architecture has the following advantages and disadvantages:
+This architecture is highly available. The distribution of Region leaders is restricted to the two AZs (AZ1 and AZ2) that are in the same region (Seattle). Compared with the three-AZ solution in which the distribution of Region leaders is not restricted, this architecture has the following advantages and disadvantages:
 
 - **Advantages**
 
-    - Region leaders are in DCs of the same city with low latency, so the write is faster.
-    - The two DCs can provide services at the same time, so the resources usage rate is higher.
-    - If one DC fails, services are still available and data safety is ensured.
+    - Region leaders are in AZs of the same region with low latency, so writes are faster.
+    - The two AZs can provide services at the same time, so the resource usage rate is higher.
+    - If one AZ fails, services are still available and data safety is ensured.
 
 - **Disadvantages**
 
-    - Because the data consistency is achieved by the Raft algorithm, when two DCs in the same city fail at the same time, only one surviving replica remains in the disaster recovery DC in another city (San Francisco). This cannot meet the requirement of the Raft algorithm that most replicas survive. As a result, the cluster can be temporarily unavailable. Maintenance staff needs to recover the cluster from the one surviving replica and a small amount of hot data that has not been replicated will be lost. But this case is a rare occurrence.
+    - Because the data consistency is achieved by the Raft algorithm, when two AZs in the same region fail at the same time, only one surviving replica remains in the disaster recovery AZ in another region (San Francisco). This cannot meet the requirement of the Raft algorithm that most replicas survive. As a result, the cluster can be temporarily unavailable. Maintenance staff needs to recover the cluster from the one surviving replica, and a small amount of hot data that has not been replicated will be lost. However, this case rarely occurs.
    - Because the ISP dedicated network is used, the network infrastructure of this architecture has a high cost.
-    - Five replicas are configured in three DCs in two cities, data redundancy increases, which brings a higher storage cost.
+    - Because five replicas are configured in three AZs in two regions, data redundancy increases, which brings a higher storage cost.
 
 ### Deployment details
 
-The configuration of the three DCs in two cities (Seattle and San Francisco) deployment plan is illustrated as follows:
+The configuration of the three AZs in two regions (Seattle and San Francisco) deployment plan is illustrated as follows:
+
+![3-AZ-2-region](/media/three-data-centers-in-two-cities-deployment-02.png)
+
+From the preceding illustration, you can see that Seattle has two AZs: AZ1 and AZ2. AZ1 has three sets of racks: rac1, rac2, and rac3. AZ2 has two racks: rac4 and rac5. AZ3 in San Francisco has one rack: rac6.
+
+In rac1 of AZ1, one server is deployed with TiDB and PD services, and the other two servers are deployed with TiKV services. Each TiKV server is deployed with two TiKV instances (tikv-server). This is similar to rac2, rac4, rac5, and rac6.
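+
+As a minimal TiUP topology sketch of one such server (the host `10.63.10.100`, the ports, and the directories are hypothetical; the label values follow the [Labels design](#labels-design) section below), two tikv-server instances on the same machine need distinct ports and directories, and they share the same `host` label so that PD does not place two replicas of one Region on that machine:
+
+```yaml
+tikv_servers:
+  # First tikv-server instance on the machine.
+  - host: 10.63.10.100
+    port: 20160
+    status_port: 20180
+    deploy_dir: "/data1/tidb_cluster/tidb-deploy"
+    data_dir: "/data1/tidb_cluster/tidb-data"
+    config:
+      server.labels: { az: "1", replication zone: "1", rack: "1", host: "100" }
+  # Second tikv-server instance: different ports and directories, same host label.
+  - host: 10.63.10.100
+    port: 20161
+    status_port: 20181
+    deploy_dir: "/data2/tidb_cluster/tidb-deploy"
+    data_dir: "/data2/tidb_cluster/tidb-data"
+    config:
+      server.labels: { az: "1", replication zone: "1", rack: "1", host: "100" }
+```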
-![3-DC-2-city](/media/three-data-centers-in-two-cities-deployment-02.png) +The TiDB server, the control machine, and the monitoring server are on rac3. The TiDB server is deployed for regular maintenance and backup. Prometheus, Grafana, and the restore tools are deployed on the control machine and monitoring machine. -- From the illustration above, you can see that Seattle has two DCs: IDC1 and IDC2. IDC1 has three sets of racks: RAC1, RAC2, and RAC3. IDC2 has two racks: RAC4 and RAC5. The IDC3 DC in San Francisco has the RAC6 rack. -- From the RAC1 rack illustrated above, TiDB and PD services are deployed on the same server. Each of the two TiKV servers are deployed with two TiKV instances (tikv-server). This is similar to RAC2, RAC4, RAC5, and RAC6. -- The TiDB server, the control machine, and the monitoring server are on RAC3. The TiDB server is deployed for regular maintenance and backup. Prometheus, Grafana, and the restore tools are deployed on the control machine and monitoring machine. -- Another backup server can be added to deploy Drainer. Drainer saves binlog data to a specified location by outputting files, to achieve incremental backup. +Another backup server can be added to deploy Drainer. Drainer saves binlog data to a specified location by outputting files, to achieve incremental backup. ## Configuration @@ -62,7 +67,7 @@ See the following `tiup topology.yaml` yaml file for example: ```yaml # # Global variables are applied to all deployments and used as the default value of # # the deployments if a specific deployment value is missing. -global +global: user: "tidb" ssh_port: 22 deploy_dir: "/data/tidb_cluster/tidb-deploy" @@ -72,7 +77,7 @@ server_configs: tikv: server.grpc-compression-type: gzip pd: - replication.location-labels: ["dc","zone","rack","host"] + replication.location-labels: ["az","replication zone","rack","host"] pd_servers: - host: 10.63.10.10 @@ -96,19 +101,19 @@ tidb_servers: tikv_servers: - host: 10.63.10.30 config: - server.labels: { dc: "1", zone: "1", rack: "1", host: "30" } + server.labels: { az: "1", replication zone: "1", rack: "1", host: "30" } - host: 10.63.10.31 config: - server.labels: { dc: "1", zone: "2", rack: "2", host: "31" } + server.labels: { az: "1", replication zone: "2", rack: "2", host: "31" } - host: 10.63.10.32 config: - server.labels: { dc: "2", zone: "3", rack: "3", host: "32" } + server.labels: { az: "2", replication zone: "3", rack: "3", host: "32" } - host: 10.63.10.33 config: - server.labels: { dc: "2", zone: "4", rack: "4", host: "33" } + server.labels: { az: "2", replication zone: "4", rack: "4", host: "33" } - host: 10.63.10.34 config: - server.labels: { dc: "3", zone: "5", rack: "5", host: "34" } + server.labels: { az: "3", replication zone: "5", rack: "5", host: "34" } raftstore.raft-min-election-timeout-ticks: 1000 raftstore.raft-max-election-timeout-ticks: 1200 @@ -124,7 +129,7 @@ alertmanager_servers: ### Labels design -In the deployment of three DCs in two cities, the label design requires taking availability and disaster recovery into account. It is recommended that you define the four levels (`dc`, `zone`, `rack`, `host`) based on the physical structure of the deployment. +In the deployment of three AZs in two regions, the label design requires taking availability and disaster recovery into account. It is recommended that you define the four levels (`az`, `replication zone`, `rack`, and `host`) based on the physical structure of the deployment. 
 
 ![Label logical definition](/media/three-data-centers-in-two-cities-deployment-03.png)
 
 In the PD configuration, add level information of TiKV labels:
 
 ```yaml
 server_configs:
  pd:
-    replication.location-labels: ["dc","zone","rack","host"]
+    replication.location-labels: ["az","replication zone","rack","host"]
 ```
 
 The configuration of `tikv_servers` is based on the label information of the real physical deployment location of TiKV, which makes it easier for PD to perform global management and scheduling.
 
@@ -142,24 +147,24 @@ The configuration of `tikv_servers` is based on the label information of the rea
 tikv_servers:
  - host: 10.63.10.30
    config:
-      server.labels: { dc: "1", zone: "1", rack: "1", host: "30" }
+      server.labels: { az: "1", replication zone: "1", rack: "1", host: "30" }
  - host: 10.63.10.31
    config:
-      server.labels: { dc: "1", zone: "2", rack: "2", host: "31" }
+      server.labels: { az: "1", replication zone: "2", rack: "2", host: "31" }
  - host: 10.63.10.32
    config:
-      server.labels: { dc: "2", zone: "3", rack: "3", host: "32" }
+      server.labels: { az: "2", replication zone: "3", rack: "3", host: "32" }
  - host: 10.63.10.33
    config:
-      server.labels: { dc: "2", zone: "4", rack: "4", host: "33" }
+      server.labels: { az: "2", replication zone: "4", rack: "4", host: "33" }
  - host: 10.63.10.34
    config:
-      server.labels: { dc: "3", zone: "5", rack: "5", host: "34" }
+      server.labels: { az: "3", replication zone: "5", rack: "5", host: "34" }
 ```
 
 ### Optimize parameter configuration
 
-In the deployment of three DCs in two cities, to optimize performance, you need to not only configure regular parameters, but also adjust component parameters.
+In the deployment of three AZs in two regions, to optimize performance, you need to not only configure regular parameters but also adjust component parameters.
 
 - Enable gRPC message compression in TiKV. Because data of the cluster is transmitted in the network, you can enable the gRPC message compression to lower the network traffic.
 
@@ -167,7 +172,7 @@ In the deployment of three DCs in two cities, to optimize performance, you need
    server.grpc-compression-type: gzip
    ```
 
-- Optimize the network configuration of the TiKV node in another city (San Francisco). Modify the following TiKV parameters for IDC3 (alone) in San Francisco and try to prevent the replica in this TiKV node from participating in the Raft election.
+- Optimize the network configuration of the TiKV node in another region (San Francisco). Modify the following TiKV parameters for AZ3 in San Francisco and try to prevent the replica in this TiKV node from participating in the Raft election.
 
    ```yaml
    raftstore.raft-min-election-timeout-ticks: 1000
    raftstore.raft-max-election-timeout-ticks: 1200
    ```
 
 - Configure scheduling. After the cluster is enabled, use the `tiup ctl:v pd` tool to modify the scheduling policy. Modify the number of TiKV Raft replicas. Configure this number as planned. In this example, the number of replicas is five.
 
-   ```yaml
+   ```bash
    config set max-replicas 5
    ```
 
-- Forbid scheduling the Raft leader to IDC3. Scheduling the Raft leader to in another city (IDC3) causes unnecessary network overhead between IDC1/IDC2 in Seattle and IDC3 in San Francisco. The network bandwidth and latency also affect performance of the TiDB cluster.
+- Forbid scheduling the Raft leader to AZ3. Scheduling the Raft leader to another region (AZ3) causes unnecessary network overhead between AZ1/AZ2 in Seattle and AZ3 in San Francisco. The network bandwidth and latency also affect the performance of the TiDB cluster.
 
-   ```yaml
+   ```bash
-   config set label-property reject-leader dc 3
+   config set label-property reject-leader az 3
    ```
 
    > **Note:**
    >
-   > Since TiDB 5.2, the `label-property` configuration is not supported by default. To set the replica policy, use the [placement rules](/configure-placement-rules.md).
+   > Starting from TiDB v5.2, the `label-property` configuration is not supported by default. To set the replica policy, use the [placement rules](/configure-placement-rules.md).
 
-- Configure the priority of PD. To avoid the situation where the PD leader is in another city (IDC3), you can increase the priority of local PD (in Seattle) and decrease the priority of PD in another city (San Francisco). The larger the number, the higher the priority.
+- Configure the priority of PD. To avoid the situation where the PD leader is in another region (AZ3), you can increase the priority of local PD (in Seattle) and decrease the priority of PD in another region (San Francisco). The larger the number, the higher the priority.
 
-   ```yaml
+   ```bash
    member leader_priority PD-10 5
    member leader_priority PD-11 5
    member leader_priority PD-12 5
diff --git a/two-data-centers-in-one-city-deployment.md b/two-data-centers-in-one-city-deployment.md
index 4fff5f1603e96..18b240452dec7 100644
--- a/two-data-centers-in-one-city-deployment.md
+++ b/two-data-centers-in-one-city-deployment.md
@@ -1,42 +1,45 @@
 ---
-title: Two Data Centers in One City Deployment
-summary: Learn the deployment solution of two data centers in one city.
+title: Two Availability Zones in One Region Deployment
+summary: Learn the deployment solution of two availability zones in one region.
 aliases: ['/tidb/dev/synchronous-replication']
 ---
 
-# Two Data Centers in One City Deployment
+# Two Availability Zones in One Region Deployment
 
-This document introduces the deployment mode of two data centers (DCs) in one city, including the architecture, configuration, how to enable this deployment mode, and how to use replicas in this mode.
+This document introduces the deployment mode of two availability zones (AZs) in one region, including the architecture, configuration, how to enable this deployment mode, and how to use replicas in this mode.
 
-In an on-premises environment, TiDB usually adopts the multi-data-center deployment solution to ensure high availability and disaster recovery capability. The multi-data-center deployment solution includes multiple deployment modes, such as three data centers in two cities and three data centers in one city. This document introduces the deployment mode of two data centers in one city. Deployed in this mode, TiDB can also meet the requirements of high availability and disaster recovery, with a lower cost. This deployment solution adopts Data Replication Auto Synchronous mode, or the DR Auto-Sync mode.
+The term "region" in this document refers to a geographic area, while the capitalized "Region" refers to a basic unit of data storage in TiKV. "AZ" refers to an isolated location within a region, and each region has multiple AZs. The solution described in this document also applies to the scenario where multiple data centers are located in a single city.
 
-Under the mode of two data centers in one city, the two data centers are less than 50 kilometers apart. They are usually located in the same city or in two adjacent cities. The network latency between the two data centers is lower than 1.5 milliseconds and the bandwidth is higher than 10 Gbps.
+## Introduction
+
+TiDB usually adopts the multi-AZ deployment solution to ensure high availability and disaster recovery capability. The multi-AZ deployment solution includes multiple deployment modes, such as multiple AZs in one region and multiple AZs in two regions. This document introduces the deployment mode of two AZs in one region. Deployed in this mode, TiDB can also meet the requirements of high availability and disaster recovery, with a lower cost. This deployment solution adopts the Data Replication Auto Synchronous mode, or the DR Auto-Sync mode.
+
+Under the mode of two AZs in one region, the two AZs are less than 50 kilometers apart. They are usually located in the same region or in two adjacent regions. The network latency between the two AZs is lower than 1.5 milliseconds and the bandwidth is higher than 10 Gbps.
 
 ## Deployment architecture
 
-This section takes the example of a city where two data centers IDC1 and IDC2 are located respectively in the east and west.
+This section takes the example of a region where two availability zones, AZ1 and AZ2, are located in the east and west, respectively. AZ1 is the primary AZ and AZ2 is the disaster recovery (DR) AZ.
 
 The architecture of the cluster deployment is as follows:
 
-- The TiDB cluster is deployed to two DCs in one city: the primary IDC1 in the east, and the disaster recovery (DR) IDC2 in the west.
-- The cluster has 4 replicas: 2 Voter replicas in IDC1, 1 Voter replica and 1 Learner replica in IDC2. For the TiKV component, each rack has a proper label.
+- The cluster has six replicas: three Voter replicas in AZ1, and two Follower replicas and one Learner replica in AZ2. For the TiKV component, each rack has a proper label.
 - The Raft protocol is adopted to ensure consistency and high availability of data, which is transparent to users.
 
-![2-DC-in-1-city architecture](/media/two-dc-replication-1.png)
+![2-AZ-in-1-region architecture](/media/two-dc-replication-1.png)
 
 This deployment solution defines three statuses to control and identify the replication status of the cluster, which restricts the replication mode of TiKV. The replication mode of the cluster can automatically and adaptively switch between the three statuses. For details, see the [Status switch](#status-switch) section.
 
-- **sync**: Synchronous replication mode. In this mode, at least one replica in the disaster recovery (DR) data center synchronizes with the primary data center. The Raft algorithm ensures that each log is replicated to the DR based on the label.
-- **async**: Asynchronous replication mode. In this mode, the DR data center is not fully synchronized with the primary data center. The Raft algorithm follows the majority protocol to replicate logs.
-- **sync-recover**: Synchronous recovery mode. In this mode, the DR data center is not fully synchronized with the primary data center. Raft gradually switches to the label replication mode and then reports the label information to PD.
+- **sync**: Synchronous replication mode. In this mode, at least one replica in the disaster recovery AZ synchronizes with the primary AZ. The Raft algorithm ensures that each log is replicated to the disaster recovery AZ based on the label.
+- **async**: Asynchronous replication mode. In this mode, the disaster recovery AZ is not fully synchronized with the primary AZ. The Raft algorithm follows the majority protocol to replicate logs.
+- **sync-recover**: Synchronous recovery mode. In this mode, the disaster recovery AZ is not fully synchronized with the primary AZ. Raft gradually switches to the label replication mode and then reports the label information to PD.
 
 ## Configuration
 
 ### Example
 
-The following `tiup topology.yaml` example file is a typical topology configuration for the two data centers in one city deployment mode:
+The following `tiup topology.yaml` example file is a typical topology configuration for the two availability zones in one region deployment mode:
 
-```
+```yaml
 # # Global variables are applied to all deployments and used as the default value of
 # # the deployments if a specific deployment value is missing.
 global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/data/tidb_cluster/tidb-deploy"
  data_dir: "/data/tidb_cluster/tidb-data"
 server_configs:
  pd:
-    replication.location-labels: ["zone","rack","host"]
+    replication.location-labels: ["az","rack","host"]
 pd_servers:
  - host: 10.63.10.10
    name: "pd-10"
@@ -61,16 +64,22 @@ tidb_servers:
 tikv_servers:
  - host: 10.63.10.30
    config:
-      server.labels: { zone: "east", rack: "east-1", host: "30" }
+      server.labels: { az: "east", rack: "east-1", host: "30" }
  - host: 10.63.10.31
    config:
-      server.labels: { zone: "east", rack: "east-2", host: "31" }
+      server.labels: { az: "east", rack: "east-2", host: "31" }
  - host: 10.63.10.32
    config:
-      server.labels: { zone: "west", rack: "west-1", host: "32" }
+      server.labels: { az: "east", rack: "east-3", host: "32" }
  - host: 10.63.10.33
    config:
-      server.labels: { zone: "west", rack: "west-2", host: "33" }
+      server.labels: { az: "west", rack: "west-1", host: "33" }
+  - host: 10.63.10.34
+    config:
+      server.labels: { az: "west", rack: "west-2", host: "34" }
+  - host: 10.63.10.35
+    config:
+      server.labels: { az: "west", rack: "west-3", host: "35" }
 monitoring_servers:
  - host: 10.63.10.60
 grafana_servers:
@@ -81,7 +90,7 @@ alertmanager_servers:
 
 ### Placement Rules
 
-To deploy a cluster based on the planned topology, you need to use [Placement Rules](/configure-placement-rules.md) to determine the locations of the cluster replicas. Taking the deployment of 4 replicas (2 Voter replicas are at the primary center, 1 Voter replica and 1 Learner replica are at the DR center) as an example, you can use the Placement Rules to configure the replicas as follows:
+To deploy a cluster based on the planned topology, you need to use [Placement Rules](/configure-placement-rules.md) to determine the locations of the cluster replicas. Taking the deployment of six replicas (three Voter replicas at the primary AZ, and two Follower replicas and one Learner replica at the disaster recovery AZ) as an example, you can use the Placement Rules to configure the replicas as follows:
 
 ```
 cat rule.json
 {
   "group_id": "pd",
   "group_index": 0,
   "group_override": false,
   "rules": [
     {
       "group_id": "pd",
-      "id": "zone-east",
+      "id": "az-east",
       "start_key": "",
       "end_key": "",
       "role": "voter",
-      "count": 2,
+      "count": 3,
       "label_constraints": [
         {
-          "key": "zone",
+          "key": "az",
           "op": "in",
           "values": [
             "east"
           ]
         }
       ],
       "location_labels": [
-        "zone",
+        "az",
         "rack",
         "host"
       ]
     },
     {
       "group_id": "pd",
-      "id": "zone-west",
+      "id": "az-west",
       "start_key": "",
       "end_key": "",
-      "role": "voter",
-      "count": 1,
+      "role": "follower",
+      "count": 2,
       "label_constraints": [
         {
-          "key": "zone",
+          "key": "az",
           "op": "in",
           "values": [
             "west"
           ]
         }
       ],
       "location_labels": [
-        "zone",
+        "az",
         "rack",
         "host"
       ]
     },
     {
       "group_id": "pd",
-      "id": "zone-west",
+      "id": "az-west-learner",
       "start_key": "",
       "end_key": "",
       "role": "learner",
       "count": 1,
       "label_constraints": [
         {
-          "key": "zone",
+          "key": "az",
           "op": "in",
           "values": [
             "west"
           ]
         }
       ],
       "location_labels": [
-        "zone",
+        "az",
         "rack",
         "host"
       ]
@@ -187,7 +196,7 @@ cat default.json
       "start_key": "",
       "end_key": "",
       "role": "voter",
-      "count": 3
+      "count": 5
     }
   ]
 }
@@ -206,13 +215,12 @@ The replication mode is controlled by PD. You can configure the replication mode
    [replication-mode]
    replication-mode = "dr-auto-sync"
    [replication-mode.dr-auto-sync]
-    label-key = "zone"
+    label-key = "az"
    primary = "east"
    dr = "west"
-    primary-replicas = 2
-    dr-replicas = 1
+    primary-replicas = 3
+    dr-replicas = 2
    wait-store-timeout = "1m"
-    wait-sync-timeout = "1m"
    ```
 
 - Method 2: If you have deployed a cluster, use pd-ctl commands to modify the configurations of PD.
 
    ```shell
    config set replication-mode dr-auto-sync
-    config set replication-mode dr-auto-sync label-key zone
+    config set replication-mode dr-auto-sync label-key az
    config set replication-mode dr-auto-sync primary east
    config set replication-mode dr-auto-sync dr west
-    config set replication-mode dr-auto-sync primary-replicas 2
-    config set replication-mode dr-auto-sync dr-replicas 1
+    config set replication-mode dr-auto-sync primary-replicas 3
+    config set replication-mode dr-auto-sync dr-replicas 2
    ```
 
 Descriptions of configuration items:
 
-+ `replication-mode` is the replication mode to be enabled. In the above example, it is set to `dr-auto-sync`. By default, the majority protocol is used.
-+ `label-key` is used to distinguish different data centers and needs to match Placement Rules. In this example, the primary data center is "east" and the DR data center is "west".
-+ `primary-replicas` is the number of Voter replicas in the primary data center.
-+ `dr-replicas` is the number of Voter replicas in the DR data center.
++ `replication-mode` is the replication mode to be enabled. In the preceding example, it is set to `dr-auto-sync`. By default, the majority protocol is used.
++ `label-key` is used to distinguish different AZs and needs to match Placement Rules. In this example, the primary AZ is "east" and the disaster recovery AZ is "west".
++ `primary-replicas` is the number of Voter replicas in the primary AZ.
++ `dr-replicas` is the number of Voter replicas in the disaster recovery AZ.
 + `wait-store-timeout` is the waiting time for switching to asynchronous replication mode when network isolation or failure occurs. If the time of network failure exceeds the waiting time, asynchronous replication mode is enabled. The default waiting time is 60 seconds.
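+
+Following the same pd-ctl pattern as Method 2, the protective interval can presumably be adjusted online as well; the `2m` value below is only an illustration:
+
+```shell
+config set replication-mode dr-auto-sync wait-store-timeout 2m
+```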
 
 To check the current replication status of the cluster, use the following API:
 
 ```
 curl http://pd_ip:pd_port/pd/api/v1/replication_mode/status
 ```
 
@@ -250,7 +258,7 @@ curl http://pd_ip:pd_port/pd/api/v1/replication_mode/status
 {
     "mode": "dr-auto-sync",
     "dr-auto-sync": {
-        "label-key": "zone",
+        "label-key": "az",
         "state": "sync"
     }
 }
@@ -260,28 +268,28 @@
 
 The replication mode of a cluster can automatically and adaptively switch between three statuses:
 
-- When the cluster is normal, the synchronous replication mode is enabled to maximize the data integrity of the disaster recovery data center.
-- When the network connection between the two data centers fails or the DR data center breaks down, after a pre-set protective interval, the cluster enables the asynchronous replication mode to ensure the availability of the application.
-- When the network reconnects or the DR data center recovers, the TiKV node joins the cluster again and gradually replicates the data. Finally, the cluster switches to the synchronous replication mode.
+- When the cluster is normal, the synchronous replication mode is enabled to maximize the data integrity of the disaster recovery AZ.
+- When the network connection between the two AZs fails or the disaster recovery AZ breaks down, after a pre-set protective interval, the cluster enables the asynchronous replication mode to ensure the availability of the application.
+- When the network reconnects or the disaster recovery AZ recovers, the TiKV node joins the cluster again and gradually replicates the data. Finally, the cluster switches to the synchronous replication mode.
 
 The details for the status switch are as follows:
 
 1. **Initialization**: At the initialization stage, the cluster is in the synchronous replication mode. PD sends the status information to TiKV, and all TiKV nodes strictly follow the synchronous replication mode to work.
 
-2. **Switch from sync to async**: PD regularly checks the heartbeat information of TiKV to judge whether the TiKV node fails or is disconnected. If the number of failed nodes exceeds the number of replicas of the primary data center (`primary-replicas`) and the DR data center (`dr-replicas`), the synchronous replication mode can no longer serve the data replication and it is necessary to switch the status. When the failure or disconnect time exceeds the time set by `wait-store-timeout`, PD switches the status of the cluster to the async mode. Then PD sends the status of async to all TiKV nodes, and the replication mode for TiKV switches from two-center replication to the native Raft majority.
+2. **Switch from sync to async**: PD regularly checks the heartbeat information of TiKV to judge whether the TiKV node fails or is disconnected. If the number of failed nodes exceeds the number of replicas of the primary AZ (`primary-replicas`) and the disaster recovery AZ (`dr-replicas`), the synchronous replication mode can no longer serve the data replication and it is necessary to switch the status. When the failure or disconnect time exceeds the time set by `wait-store-timeout`, PD switches the status of the cluster to the async mode. Then PD sends the status of async to all TiKV nodes, and the replication mode for TiKV switches from two-availability-zone replication to the native Raft majority.
 
-3. **Switch from async to sync**: PD regularly checks the heartbeat information of TiKV to judge whether the TiKV node is reconnected. If the number of failed nodes is less than the number of replicas of the primary data center (`primary-replicas`) and the DR data center (`dr-replicas`), the synchronous replication mode can be enabled again. PD first switches the status of the cluster to sync-recover and sends the status information to all TiKV nodes. All Regions of TiKV gradually switch to the two-data-center synchronous replication mode and then report the heartbeat information to PD. PD records the status of TiKV Regions and calculates the recovery progress. When all TiKV Regions finish the switching, PD switches the replication mode to sync.
+3. **Switch from async to sync**: PD regularly checks the heartbeat information of TiKV to judge whether the TiKV node is reconnected. If the number of failed nodes is less than the number of replicas of the primary AZ (`primary-replicas`) and the disaster recovery AZ (`dr-replicas`), the synchronous replication mode can be enabled again. PD first switches the status of the cluster to sync-recover and sends the status information to all TiKV nodes. All Regions of TiKV gradually switch to the two-availability-zone synchronous replication mode and then report the heartbeat information to PD. PD records the status of TiKV Regions and calculates the recovery progress. When all TiKV Regions finish the switching, PD switches the replication mode to sync.
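+
+To observe these transitions, you can poll the status API introduced earlier. The loop below is only a sketch (the `pd_ip:pd_port` placeholder is the same as in the preceding example, and the 10-second interval is arbitrary); during a failover and recovery, the reported state typically moves through `async` and `sync-recover` before returning to `sync`:
+
+```bash
+# Poll the DR Auto-Sync state reported by PD every 10 seconds.
+while true; do
+  curl -s http://pd_ip:pd_port/pd/api/v1/replication_mode/status
+  echo ""
+  sleep 10
+done
+```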
 
 ### Disaster recovery
 
-This section introduces the disaster recovery solution of the two data centers in one city deployment.
+This section introduces the disaster recovery solution of the two AZs in one region deployment.
 
 When a disaster occurs to a cluster in the synchronous replication mode, you can perform data recovery with `RPO = 0`:
 
-- If the primary data center fails and most of the Voter replicas are lost, but complete data exists in the DR data center, the lost data can be recovered from the DR data center. At this time, manual intervention is required with professional tools. You can [get support](/support.md) from PingCAP or the community for a recovery solution.
+- If the primary AZ fails and most of the Voter replicas are lost, but complete data exists in the disaster recovery AZ, the lost data can be recovered from the disaster recovery AZ. At this time, manual intervention is required with professional tools. You can [get support](/support.md) from PingCAP or the community for a recovery solution.
 
-- If the DR center fails and a few Voter replicas are lost, the cluster automatically switches to the asynchronous replication mode.
+- If the disaster recovery AZ fails and a few Voter replicas are lost, the cluster automatically switches to the asynchronous replication mode.
 
 When a disaster occurs to a cluster that is not in the synchronous replication mode and you cannot perform data recovery with `RPO = 0`:
 
-- If most of the Voter replicas are lost, manual intervention is required with professional tools. You can [get support](/support.md) from PingCAP or the community for a recovery solution.
\ No newline at end of file
+- If most of the Voter replicas are lost, manual intervention is required with professional tools. You can [get support](/support.md) from PingCAP or the community for a recovery solution.