dumpling: add URI formats #15165

Merged
merged 16 commits into from
Oct 27, 2023
1 change: 1 addition & 0 deletions TOC.md
@@ -995,6 +995,7 @@
- [Error Codes](/error-codes.md)
- [Table Filter](/table-filter.md)
- [Schedule Replicas by Topology Labels](/schedule-replicas-by-topology-labels.md)
- [URI Formats of External Storage Services](/external-storage-uri.md)
- Internal Components
- [TiDB Backend Task Distributed Execution Framework](/tidb-distributed-execution-framework.md)
- [TiDB Global Sort](/tidb-global-sort.md)
2 changes: 1 addition & 1 deletion best-practices/pd-scheduling-best-practices.md
@@ -229,7 +229,7 @@ This scenario requires examining the generation and execution of operators throu

If operators are successfully generated but the scheduling process is slow, possible reasons are:

- The scheduling speed is limited by default. You can adjust `leader-schedule-limit` or `replica-schedule-limit` to larger value.s Similarly, you can consider loosening the limits on `max-pending-peer-count` and `max-snapshot-count`.
- The scheduling speed is limited by default. You can adjust `leader-schedule-limit` or `replica-schedule-limit` to larger values. Similarly, you can consider loosening the limits on `max-pending-peer-count` and `max-snapshot-count`.
- Other scheduling tasks are running concurrently and racing for resources in the system. You can refer to the solution in [Leaders/regions are not evenly distributed](#leadersregions-are-not-evenly-distributed).
- When you take a single node offline, a number of region leaders to be processed (around 1/3 under the configuration of 3 replicas) are distributed on the node to remove. Therefore, the speed is limited by the speed at which snapshots are generated by this single node. You can speed it up by manually adding an `evict-leader-scheduler` to migrate leaders.

2 changes: 1 addition & 1 deletion br/backup-and-restore-overview.md
@@ -100,7 +100,7 @@ Corresponding to the backup features, you can perform two types of restore: full

TiDB supports backing up data to Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, NFS, and other S3-compatible file storage services. For details, see the following documents:

- [Specify backup storage in URI](/br/backup-and-restore-storages.md#uri-format)
- [Specify backup storage in URI](/external-storage-uri.md)
- [Configure access privileges to backup storages](/br/backup-and-restore-storages.md#authentication)

## Compatibility
2 changes: 1 addition & 1 deletion br/br-incremental-guide.md
@@ -30,7 +30,7 @@ tiup br backup full --pd "${PD_IP}:2379" \

- `--lastbackupts`: The last backup timestamp.
- `--ratelimit`: The maximum speed **per TiKV** performing backup tasks (in MiB/s).
- `storage`: The storage path of backup data. You need to save the incremental backup data under a different path from the previous snapshot backup. In the preceding example, incremental backup data is saved in the `incr` directory under the full backup data. For details, see [Backup storage URI configuration](/br/backup-and-restore-storages.md#uri-format).
- `storage`: The storage path of backup data. You need to save the incremental backup data under a different path from the previous snapshot backup. In the preceding example, incremental backup data is saved in the `incr` directory under the full backup data. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).

## Restore incremental data

10 changes: 5 additions & 5 deletions br/br-pitr-manual.md
@@ -78,7 +78,7 @@ The example output only shows the common parameters. These parameters are descri
- `task-name`: specifies the task name for the log backup. This name is also used to query, pause, and resume the backup task.
- `--ca`, `--cert`, `--key`: specifies the mTLS encryption method to communicate with TiKV and PD.
- `--pd`: specifies the PD address for the backup cluster. BR needs to access PD to start the log backup task.
- `--storage`: specifies the backup storage address. Currently, BR supports Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage as the storage for log backup. The preceding command uses Amazon S3 as an example. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
- `--storage`: specifies the backup storage address. Currently, BR supports Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage as the storage for log backup. The preceding command uses Amazon S3 as an example. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).

Usage example:

@@ -284,7 +284,7 @@ This command only accesses the backup storage and does not access the TiDB clust

- `--dry-run`: run the command but do not really delete the files.
- `--until`: delete all log backup data before the specified timestamp.
- `--storage`: the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
- `--storage`: the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).

Usage example:

@@ -325,7 +325,7 @@ Global Flags:

This command only accesses the backup storage and does not access the TiDB cluster.

The `--storage` parameter is used to specify the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
The `--storage` parameter is used to specify the backup storage address. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).

Usage example:

@@ -369,12 +369,12 @@ Global Flags:

The example output only shows the common parameters. These parameters are described as follows:

- `--full-backup-storage`: the storage address for the snapshot (full) backup. To use PITR, specify this parameter and choose the latest snapshot backup before the restore timestamp. To restore only log backup data, you can omit this parameter. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
- `--full-backup-storage`: the storage address for the snapshot (full) backup. To use PITR, specify this parameter and choose the latest snapshot backup before the restore timestamp. To restore only log backup data, you can omit this parameter. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).
- `--restored-ts`: the timestamp that you want to restore data to. If this parameter is not specified, BR restores data to the latest timestamp available in the log backup, that is, the checkpoint of the backup data.
- `--start-ts`: the start timestamp that you want to restore log backup data from. If you only need to restore log backup data, you must specify this parameter.
- `--pd`: the PD address of the restore cluster.
- `--ca`, `--cert`, `--key`: specify the mTLS encryption method to communicate with TiKV and PD.
- `--storage`: the storage address for the log backup. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
- `--storage`: the storage address for the log backup. Currently, BR supports Amazon S3, GCS, or Azure Blob Storage as the storage for log backup. For details, see [URI Formats of External Storage Services](/external-storage-uri.md).

Usage example:

2 changes: 1 addition & 1 deletion br/br-snapshot-guide.md
@@ -34,7 +34,7 @@ tiup br backup full --pd "${PD_IP}:2379" \
In the preceding command:

- `--backupts`: The time point of the snapshot. The format can be [TSO](/glossary.md#tso) or timestamp, such as `400036290571534337` or `2018-05-11 01:42:23`. If the data of this snapshot is garbage collected, the `br backup` command returns an error and `br` exits. If you leave this parameter unspecified, `br` picks the snapshot corresponding to the backup start time.
- `--storage`: The storage address of the backup data. Snapshot backup supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as backup storage. The preceding command uses Amazon S3 as an example. For more details, see [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
- `--storage`: The storage address of the backup data. Snapshot backup supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as backup storage. The preceding command uses Amazon S3 as an example. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md).
- `--ratelimit`: The maximum speed **per TiKV** performing backup tasks. The unit is in MiB/s.

During backup, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the backup task is completed and statistics such as total backup time, average backup speed, and backup data size are displayed.
2 changes: 1 addition & 1 deletion br/use-br-command-line-tool.md
@@ -42,7 +42,7 @@ A `br` command consists of multiple layers of sub-commands. Currently, br comman
### Common options

* `--pd`: specifies the PD service address. For example, `"${PD_IP}:2379"`.
* `-s` (or `--storage`): specifies the path where the backup files are stored. Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, and NFS are supported to store backup data. For more details, refer to [URI format of backup storages](/br/backup-and-restore-storages.md#uri-format).
* `-s` (or `--storage`): specifies the path where the backup files are stored. Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, and NFS are supported to store backup data. For more details, refer to [URI Formats of External Storage Services](/external-storage-uri.md).
* `--ca`: specifies the path to the trusted CA certificate in the PEM format.
* `--cert`: specifies the path to the SSL certificate in the PEM format.
* `--key`: specifies the path to the SSL certificate key in the PEM format.
42 changes: 13 additions & 29 deletions dumpling-overview.md
@@ -95,19 +95,7 @@ dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 8 -o /tmp/test -r 200000
In the command above:

+ The `-h`, `-P`, and `-u` option respectively mean the address, the port, and the user. If a password is required for authentication, you can use `-p $YOUR_SECRET_PASSWORD` to pass the password to Dumpling.

<CustomContent platform="tidb">

+ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](/br/backup-and-restore-storages.md#uri-format).

</CustomContent>

<CustomContent platform="tidb-cloud">

+ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format).

</CustomContent>

+ The `-o` (or `--output`) option specifies the export directory of the storage, which supports an absolute local file path or an [external storage URI](/external-storage-uri.md).
+ The `-t` option specifies the number of threads for the export. Increasing the number of threads improves the concurrency of Dumpling and the export speed, and also increases the database's memory consumption. Therefore, it is not recommended to set the number too large. Usually, it's less than 64.
+ The `-r` option enables the in-table concurrency to speed up the export. The default value is `0`, which means disabled. A value greater than 0 means it is enabled, and the value is of `INT` type. When the source database is TiDB, a `-r` value greater than 0 indicates that the TiDB region information is used for splitting, and reduces the memory usage. The specific `-r` value does not affect the split algorithm. When the source database is MySQL and the primary key is of the `INT` type, specifying `-r` can also enable the in-table concurrency.
+ The `-F` option is used to specify the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable). It is recommended to keep its value to 256 MiB or less if you plan to use TiDB Lightning to load this file into a TiDB instance.
@@ -116,6 +104,16 @@ In the command above:
>
> If the size of a single exported table exceeds 10 GB, it is **strongly recommended to use** the `-r` and `-F` options.

#### URI formats of the storage services

This section describes the URI formats of the storage services, including Amazon S3, GCS, and Azure Blob Storage. The URI format is as follows:

```shell
[scheme]://[host]/[path]?[parameters]
```

For more information, see [URI Formats of External Storage Services](/external-storage-uri.md).
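
Purely to illustrate how the four parts of `[scheme]://[host]/[path]?[parameters]` line up, the following Bash sketch splits a URI using shell parameter expansion. It is an assumption-based example, not the way Dumpling or BR parse URIs internally, and the bucket name, path, and `region` value are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: split a URI of the form
# [scheme]://[host]/[path]?[parameters] into its parts.
# The bucket, path, and parameter values below are hypothetical.
uri='s3://my-bucket/dumpling/export?region=us-west-2'

scheme="${uri%%://*}"        # everything before "://"
rest="${uri#*://}"           # everything after "://"

query=""
if [[ "$rest" == *\?* ]]; then
  query="${rest#*\?}"        # everything after the first "?"
  rest="${rest%%\?*}"        # everything before the first "?"
fi

host="${rest%%/*}"           # up to the first "/" (the bucket for S3)
path=""
if [[ "$rest" == */* ]]; then
  path="${rest#*/}"          # everything after the first "/"
fi

echo "scheme=$scheme"
echo "host=$host"
echo "path=$path"
echo "parameters=$query"
```

For Amazon S3, the `host` part is the bucket and the `path` part is the prefix under which the exported files are written.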

### Export to CSV files

You can export data to CSV files by adding the `--filetype csv` argument.
@@ -233,19 +231,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey}
export AWS_SECRET_ACCESS_KEY=${SecretKey}
```

<CustomContent platform="tidb">

Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](/br/backup-and-restore-storages.md#uri-format).

</CustomContent>

<CustomContent platform="tidb-cloud">

Dumpling also supports reading credential files from `~/.aws/credentials`. Parameters for exporting data to Amazon S3 using Dumpling are the same as the parameters used in BR. For more parameter descriptions, see [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format).

</CustomContent>

{{< copyable "shell-regular" >}}
Dumpling also supports reading credential files from `~/.aws/credentials`. For more information about URI parameter descriptions, see [URI Formats of External Storage Services](/external-storage-uri.md).

```shell
./dumpling -u root -P 4000 -h 127.0.0.1 -r 200000 -o "s3://${Bucket}/${Folder}"
@@ -257,8 +243,6 @@ Dumpling also supports reading credential files from `~/.aws/credentials`. Param

By default, Dumpling exports all databases except system databases (including `mysql`, `sys`, `INFORMATION_SCHEMA`, `PERFORMANCE_SCHEMA`, `METRICS_SCHEMA`, and `INSPECTION_SCHEMA`). You can use `--where <SQL where expression>` to select the records to be exported.

{{< copyable "shell-regular" >}}

```shell
./dumpling -u root -P 4000 -h 127.0.0.1 -o /tmp/test --where "id < 100"
```
@@ -402,7 +386,7 @@ SET GLOBAL tidb_gc_life_time = '10m';
| `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes |
| `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. |
| `--filetype` | Exported file type (csv/sql) | "sql" |
| `-o` or `--output` | Specify the absolute local file path or [external storage URI](https://docs.pingcap.com/tidb/stable/backup-and-restore-storages#uri-format) for exporting the data. | "./export-${time}" |
| `-o` or `--output` | Specify the absolute local file path or [external storage URI](/external-storage-uri.md) for exporting the data. | "./export-${time}" |
| `-S` or `--sql` | Export data according to the specified SQL statement. This command does not support concurrent export. |
| `--consistency` | flush: use FTWRL before the dump <br/> snapshot: dump the TiDB data of a specific snapshot of a TSO <br/> lock: execute `lock tables read` on all tables to be dumped <br/> none: dump without adding locks, which cannot guarantee consistency <br/> auto: use --consistency flush for MySQL; use --consistency snapshot for TiDB | "auto" |
| `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` |
2 changes: 1 addition & 1 deletion error-codes.md
@@ -372,7 +372,7 @@ TiDB is compatible with the error codes in MySQL, and in most cases returns the

* Error Number: 8158

The provided path is invalid. Refer to the specific error message for actions. For Amazon S3 or GCS path settings, see [External storage](/br/backup-and-restore-storages.md#uri-format).
The provided path is invalid. Refer to the specific error message for actions. For Amazon S3 or GCS path settings, see [URI Formats of External Storage Services](/external-storage-uri.md).

* Error Number: 8159
