add-balance-partition-within-zones-steps- (#2534)
* add-balance-partition-within-zones-steps-

* Update docs-2.0/4.deployment-and-installation/5.zone.md

Co-authored-by: Chris Chen <[email protected]>

* Update docs-2.0/8.service-tuning/load-balance.md

Co-authored-by: Chris Chen <[email protected]>

---------

Co-authored-by: Chris Chen <[email protected]>
abby-cyber and ChrisChen2023 authored Apr 11, 2024
1 parent d905e75 commit 939487c
Showing 3 changed files with 101 additions and 44 deletions.
4 changes: 2 additions & 2 deletions docs-2.0/3.ngql-guide/9.space-statements/1.create-space.md
Original file line number Diff line number Diff line change
Expand Up @@ -127,8 +127,8 @@ To balance the request loads, use the following command.
nebula> BALANCE LEADER;
nebula> SHOW HOSTS;
+-------------+------+----------+--------------+--------------------------------+--------------------------------+---------+
| Host | Port | HTTP port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-------------+------+-----------+----------+--------------+--------------------------------+--------------------------------+---------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-------------+------+----------+--------------+--------------------------------+--------------------------------+---------+
| "storaged0" | 9779 | "ONLINE" | 7 | "basketballplayer:3, test:4" | "basketballplayer:10, test:10" | "{{nebula.release}}" |
| "storaged1" | 9779 | "ONLINE" | 7 | "basketballplayer:4, test:3" | "basketballplayer:10, test:10" | "{{nebula.release}}" |
| "storaged2" | 9779 | "ONLINE" | 6 | "basketballplayer:3, test:3" | "basketballplayer:10, test:10" | "{{nebula.release}}" |
Expand Down
17 changes: 5 additions & 12 deletions docs-2.0/4.deployment-and-installation/5.zone.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ In cases where data retrieval fails due to a Storage node failure in a specific
- Adjusting the number of Zones isn't allowed.
- Zone name modifications are unsupported.

## Enable Zone

1. In the configuration file `nebula-metad.conf` of the Meta service, set `--zone_list` to the Zone names to be added, such as `--zone_list=zone1, zone2, zone3`. A minimal configuration sketch is shown below.
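
   A minimal sketch of the corresponding entry in `nebula-metad.conf` follows. The Zone names are placeholders, and all other flags in the file are omitted here; keep the rest of your existing configuration unchanged.

   ```bash
   # nebula-metad.conf (Meta service): minimal sketch showing only the Zone-related flag.
   # The Zone names below are placeholders; list every Zone you plan to use.
   --zone_list=zone1, zone2, zone3
   ```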

Expand Down Expand Up @@ -158,7 +158,7 @@ nebula> DESC SPACE my_space_1
ADD HOSTS <ip>:<port> [,<ip>:<port> ...] INTO ZONE <zone_name>;
```

- After enabling the Zone feature, you must include the `INTO ZONE` clause when executing the `ADD HOSTS` command; otherwise, adding a Storage node will fail.
- After enabling the Zone feature, you must include the `INTO ZONE` keywords when executing the `ADD HOSTS` command; otherwise, adding a Storage node fails.
- A Storage node can belong to only one Zone, but a single Zone can encompass multiple different Storage nodes.


Expand All @@ -168,7 +168,7 @@ For example:
nebula> ADD HOSTS 192.168.8.111:9779,192.168.8.112:9779 INTO ZONE az1;
```

### Balance the Zone replicas
### Balance the partitions within each Zone

```ngql
BALANCE DATA IN ZONE;
Expand All @@ -178,14 +178,7 @@ BALANCE DATA IN ZONE;

Specify a space before executing this command.

After enabling the Zone feature, run `BALANCE DATA IN ZONE` to balance the partition replicas within each Zone.

For example:

```ngql
nebula> USE my_space_1;
nebula> BALANCE DATA IN ZONE;
```
After enabling the Zone feature, run `BALANCE DATA IN ZONE` to balance the partitions within each Zone. For more information, see [Storage load balance](../8.service-tuning/load-balance.md).

### Migrate partitions from the Storage nodes in the specified Zones to other Storage nodes
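
Before the `SHOW JOBS` output shown further below, the migration itself can be started with a command of the following form. This is a minimal sketch only: it assumes the Zone feature is enabled, the graph space `my_space_1` is in use, and `192.168.8.111:9779` is a hypothetical Storage node in the Zone being scaled down.

```ngql
# Minimal sketch; the space name and the host address are illustrative only.
nebula> USE my_space_1;
nebula> SUBMIT JOB BALANCE DATA IN ZONE REMOVE 192.168.8.111:9779;
```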

Expand Down Expand Up @@ -219,7 +212,7 @@ nebula> SHOW JOBS 34
+--------+----------------+------------+----------------------------+----------------------------+
```

### Drop Storage nodes from the specified Zone

```ngql
DROP HOSTS <ip>:<port> [,<ip>:<port> ...];
Expand Down
124 changes: 94 additions & 30 deletions docs-2.0/8.service-tuning/load-balance.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,38 +4,36 @@ You can use the `SUBMIT JOB BALANCE` statement to balance the distribution of pa

!!! danger

The `BALANCE` commands migrate data and balance the distribution of partitions by creating and executing a set of subtasks. **DO NOT** stop any machine in the cluster or change its IP address until all the subtasks finish. Otherwise, the follow-up subtasks fail.
The `BALANCE` commands migrate data and balance the distribution of partitions by creating and executing a set of subtasks. **DO NOT** stop any machine in the cluster or change its IP address until all the subtasks are finished. Otherwise, the follow-up subtasks fail.

{{ ent.ent_begin }}

## Balance partition distribution

The `SUBMIT JOB BALANCE DATA` command starts a job to balance the distribution of storage partitions in the current graph space by creating and executing a set of subtasks.
The `SUBMIT JOB BALANCE DATA` command starts a job to balance the distribution of storage partitions in the current graph space by creating and executing a set of subtasks. If the [Zone](../4.deployment-and-installation/5.zone.md) feature is enabled, you can balance the partitions within each Zone by adding the `IN ZONE` keywords to the command. For example, `SUBMIT JOB BALANCE DATA IN ZONE`.

!!! enterpriseonly

Only available for the NebulaGraph Enterprise Edition.

!!! note

- If the current graph space already has a `SUBMIT JOB BALANCE DATA` job in the `FAILED` status, you can restore the `FAILED` job, but cannot start a new `SUBMIT JOB BALANCE DATA` job. If the job continues to fail, manually stop it, and then you can start a new one.
- The following example introduces the methods of balanced partition distribution for storage nodes with the Zone feature disabled. When the Zone feature is enabled, balanced partition distribution is performed across zones by specifying the `IN ZONE` clause. For details, see [Manage Zones](../4.deployment-and-installation/5.zone.md).

If the current graph space already has a `SUBMIT JOB BALANCE DATA` job in the `FAILED` status, you can restore the `FAILED` job, but cannot start a new `SUBMIT JOB BALANCE DATA` job. If the job continues to fail, manually stop it, and then you can start a new one.

### Balance partitions with Zone disabled

### Examples

After you add new storage hosts into the cluster, no partition is deployed on the new hosts.
After you add new storage hosts to the cluster, no partition is deployed on the new hosts. The following steps show how to balance the partition distribution when the Zone feature is disabled.

1. Run `SHOW HOSTS` to check the partition distribution.

```ngql
nebula> SHOW HOSTS;
+-----------------+------+----------+--------------+-----------------------+------------------------+----------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+----------+--------------+-----------------------+------------------------+----------------------+
+-----------------+------+----------+--------------+-----------------------+------------------------+-------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+----------+--------------+-----------------------+------------------------+-------------+
| "192.168.8.101" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | "{{nebula.release}}" |
| "192.168.8.100" | 9779 | "ONLINE" | 15 | "basketballplayer:15" | "basketballplayer:15" | "{{nebula.release}}" |
+-----------------+------+----------+--------------+-----------------------+------------------------+----------------------+
+-----------------+------+----------+--------------+-----------------------+------------------------+-------------+
```
2. Enter the graph space `basketballplayer`, and execute the command `SUBMIT JOB BALANCE DATA` to balance the distribution of storage partitions.
Expand Down Expand Up @@ -70,16 +68,82 @@ After you add new storage hosts into the cluster, no partition is deployed on th
```ngql
nebula> SHOW HOSTS;
+-----------------+------+----------+--------------+----------------------+------------------------+----------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+----------+--------------+----------------------+------------------------+----------------------+
+-----------------+------+----------+--------------+----------------------+------------------------+-------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+----------+--------------+----------------------+------------------------+-------------+
| "192.168.8.101" | 9779 | "ONLINE" | 7 | "basketballplayer:7" | "basketballplayer:7" | "{{nebula.release}}" |
| "192.168.8.100" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:8" | "{{nebula.release}}" |
+-----------------+------+----------+--------------+----------------------+------------------------+----------------------+
+-----------------+------+----------+--------------+----------------------+------------------------+-------------+
```

If any subtask fails, run `RECOVER JOB <job_id>` to recover the failed job. If redoing load balancing does not solve the problem, ask for help in the [NebulaGraph community](https://github.com/vesoft-inc/nebula/discussions).
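
As a minimal sketch, assuming the failed balance job has ID `25` (a hypothetical value), recovering it looks like this:

```ngql
# The job ID 25 is hypothetical; replace it with the ID of your failed balance job.
nebula> RECOVER JOB 25;
```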


### Balance partitions with Zone enabled

For Zone-enabled clusters, you can balance the partitions within each Zone by adding the `IN ZONE` keywords to the `SUBMIT JOB BALANCE DATA` command. After you add a new storage host to the cluster, no partition is deployed on the new host. The following example adds a new storage host `192.168.8.158`, assigns it to `zone1`, and shows how to balance the partition distribution within the Zone `zone1`.

1. Run `SHOW HOSTS` to check the partition distribution.

```ngql
nebula> SHOW HOSTS;
+-----------------+------+----------+--------------+----------------------+------------------------+---------+-------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Zone | Version |
+-----------------+------+----------+--------------+----------------------+------------------------+---------+-------------+
| "192.168.8.111" | 7779 | "ONLINE" | 5 | "my_space:5" | "my_space:10" | "zone1" | "{{nebula.release}}" |
| "192.168.8.113" | 7779 | "ONLINE" | 5 | "my_space:5" | "my_space:10" | "zone3" | "{{nebula.release}}" |
| "192.168.8.129" | 7779 | "ONLINE" | 0 | "No valid partition" | "my_space:10" | "zone2" | "{{nebula.release}}" |
| "192.168.8.158" | 7779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | "zone1" | "{{nebula.release}}" |
+-----------------+------+----------+--------------+----------------------+------------------------+---------+-------------+
```

2. Enter the graph space `my_space`, and execute the command `SUBMIT JOB BALANCE DATA IN ZONE` to balance the distribution of storage partitions within each Zone.

```ngql
nebula> USE my_space;
nebula> SUBMIT JOB BALANCE DATA IN ZONE;
+------------+
| New Job Id |
+------------+
| 2 |
+------------+
```

3. Run `SHOW JOB <job_id>` to check the status of the data balancing job. You can obtain the job ID after running `SUBMIT JOB BALANCE DATA IN ZONE` in the previous step.

```ngql
nebula> SHOW JOB 2;
+------------------------+------------------------------------------+-------------+----------------------------+----------------------------+-------------+
| Job Id(spaceId:partId) | Command(src->dst) | Status | Start Time | Stop Time | State |
+------------------------+------------------------------------------+-------------+----------------------------+----------------------------+-------------+
| 2 | "DATA_BALANCE" | "FINISHED" | 2024-04-11T02:41:27.000000 | 2024-04-11T02:41:32.000000 | "SUCCEEDED" |
| "2, 1:1" | "192.168.8.111:7779->192.168.8.158:7779" | "SUCCEEDED" | 2024-04-11T02:41:27.000000 | 2024-04-11T02:41:27.000000 | "SUCCEEDED" |
| "2, 1:2" | "192.168.8.111:7779->192.168.8.158:7779" | "SUCCEEDED" | 2024-04-11T02:41:27.000000 | 2024-04-11T02:41:32.000000 | "SUCCEEDED" |
| "2, 1:3" | "192.168.8.111:7779->192.168.8.158:7779" | "SUCCEEDED" | 2024-04-11T02:41:27.000000 | 2024-04-11T02:41:27.000000 | "SUCCEEDED" |
| "2, 1:4" | "192.168.8.111:7779->192.168.8.158:7779" | "SUCCEEDED" | 2024-04-11T02:41:27.000000 | 2024-04-11T02:41:27.000000 | "SUCCEEDED" |
| "2, 1:5" | "192.168.8.111:7779->192.168.8.158:7779" | "SUCCEEDED" | 2024-04-11T02:41:27.000000 | 2024-04-11T02:41:32.000000 | "SUCCEEDED" |
| "Total:5" | "Succeeded:5" | "Failed:0" | "In Progress:0" | "Invalid:0" | "" |
+------------------------+------------------------------------------+-------------+----------------------------+----------------------------+-------------+
```

The above result shows the process of balancing the partitions within the Zone `zone1`. When the job succeeds, the load balancing process finishes.

4. Run `SHOW HOSTS` again to make sure the partition distribution is balanced.

```ngql
nebula> SHOW HOSTS;
+-----------------+------+----------+--------------+----------------------+------------------------+---------+-------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Zone | Version |
+-----------------+------+----------+--------------+----------------------+------------------------+---------+-------------+
| "192.168.8.111" | 7779 | "ONLINE" | 3 | "my_space:3" | "my_space:5" | "zone1" | "{{nebula.release}}" |
| "192.168.8.113" | 7779 | "ONLINE" | 7 | "my_space:7" | "my_space:10" | "zone3" | "{{nebula.release}}" |
| "192.168.8.129" | 7779 | "ONLINE" | 0 | "No valid partition" | "my_space:10" | "zone2" | "{{nebula.release}}" |
| "192.168.8.158" | 7779 | "ONLINE" | 0 | "No valid partition" | "my_space:5" | "zone1" | "{{nebula.release}}" |
+-----------------+------+----------+--------------+----------------------+------------------------+---------+-------------+
```

From the result, you can see that the partition distribution is balanced on all the storage hosts within Zone `zone1`.


### Stop data balancing

To stop a balance job, run `STOP JOB <job_id>`.
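
A minimal sketch, assuming the running balance job has ID `25` (a hypothetical value):

```ngql
# The job ID 25 is hypothetical; replace it with the ID of the balance job you want to stop.
nebula> STOP JOB 25;
```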
Expand All @@ -102,23 +166,23 @@ To restore a balance job in the `FAILED` or `STOPPED` status, run `RECOVER JOB <

For a `STOPPED` `SUBMIT JOB BALANCE DATA` job, NebulaGraph detects whether the same type of `FAILED` jobs or `FINISHED` jobs have been created since the start time of the job. If so, the `STOPPED` job cannot be restored. For example, if chronologically there are STOPPED job1, FINISHED job2, and STOPPED job3, only job3 can be restored; job1 cannot.

### Migrate partition
### Migrate partitions

To migrate specified partitions and scale in the cluster, you can run `SUBMIT JOB BALANCE DATA REMOVE <ip:port> [,<ip>:<port> ...]`.

To migrate specified partitions for Zone-enabled clusters, you need to add the `IN ZONE` clause. For example, `SUBMIT JOB BALANCE DATA IN ZONE REMOVE <ip:port> [,<ip>:<port> ...]`. For details, see [Manage Zones](../4.deployment-and-installation/5.zone.md).
To migrate specified partitions for Zone-enabled clusters, you need to add the `IN ZONE` keywords. For example, `SUBMIT JOB BALANCE DATA IN ZONE REMOVE <ip:port> [,<ip>:<port> ...]`. For details, see [Manage Zones](../4.deployment-and-installation/5.zone.md).

For example, to migrate the partitions in server `192.168.8.100:9779`, the command as following:
For example, to migrate the partitions on server `192.168.8.100:9779`, run the following command:

```ngql
nebula> SUBMIT JOB BALANCE DATA REMOVE 192.168.8.100:9779;
nebula> SHOW HOSTS;
+-----------------+------+----------+--------------+-----------------------+------------------------+----------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+----------+--------------+-----------------------+------------------------+----------------------+
+-----------------+------+----------+--------------+-----------------------+------------------------+-------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+----------+--------------+-----------------------+------------------------+-------------+
| "192.168.8.101" | 9779 | "ONLINE" | 15 | "basketballplayer:15" | "basketballplayer:15" | "{{nebula.release}}" |
| "192.168.8.100" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | "{{nebula.release}}" |
+-----------------+------+----------+--------------+-----------------------+------------------------+----------------------+
+-----------------+------+----------+--------------+-----------------------+------------------------+-------------+
```

!!! note
Expand All @@ -127,12 +191,11 @@ nebula> SHOW HOSTS;

{{ ent.ent_end }}


## Balance leader distribution

To balance the raft leaders, run `SUBMIT JOB BALANCE LEADER`. It will start a job to balance the distribution of all the storage leaders in all graph spaces.
To balance the raft leaders, run `SUBMIT JOB BALANCE LEADER`. It starts a job to balance the distribution of all the storage leaders in all graph spaces.

### Example
For example, to balance the leader distribution, run the following command.

```ngql
nebula> SUBMIT JOB BALANCE LEADER;
Expand All @@ -142,15 +205,16 @@ Run `SHOW HOSTS` to check the balance result.

```ngql
nebula> SHOW HOSTS;
+------------------+------+----------+--------------+-----------------------------------+------------------------+----------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+------------------+------+----------+--------------+-----------------------------------+------------------------+----------------------+
+------------------+------+----------+--------------+-----------------------------------+------------------------+-------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+------------------+------+----------+--------------+-----------------------------------+------------------------+-------------+
| "192.168.10.100" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:8" | "{{nebula.release}}" |
| "192.168.10.101" | 9779 | "ONLINE" | 8 | "basketballplayer:3" | "basketballplayer:8" | "{{nebula.release}}" |
| "192.168.10.102" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:8" | "{{nebula.release}}" |
| "192.168.10.103" | 9779 | "ONLINE" | 0 | "basketballplayer:2" | "basketballplayer:7" | "{{nebula.release}}" |
| "192.168.10.104" | 9779 | "ONLINE" | 0 | "basketballplayer:2" | "basketballplayer:7" | "{{nebula.release}}" |
| "192.168.10.105" | 9779 | "ONLINE" | 0 | "basketballplayer:2" | "basketballplayer:7" | "{{nebula.release}}" |
+------------------+------+----------+--------------+-----------------------------------+------------------------+----------------------+
+------------------+------+----------+--------------+-----------------------------------+------------------------+-------------+
```

!!! caution
Expand Down
