Commit

Merge remote-tracking branch 'upstream/master'
qiancai committed Aug 15, 2024
2 parents 05c9465 + 7e070f2 commit 2672c37
Showing 12 changed files with 110 additions and 65 deletions.
20 changes: 2 additions & 18 deletions accelerated-table-creation.md
@@ -8,9 +8,9 @@ aliases: ['/tidb/dev/ddl-v2/']

TiDB v7.6.0 introduces the system variable [`tidb_ddl_version`](https://docs.pingcap.com/tidb/v7.6/system-variables#tidb_enable_fast_create_table-new-in-v800) to support accelerating table creation, which improves the efficiency of bulk table creation. Starting from v8.0.0, this system variable is renamed to [`tidb_enable_fast_create_table`](/system-variables.md#tidb_enable_fast_create_table-new-in-v800).

TiDB uses the online asynchronous schema change algorithm to change the metadata. All DDL jobs are submitted to the `mysql.tidb_ddl_job` table, and the owner node pulls the DDL job to execute. After executing each phase of the online DDL algorithm, the DDL job is marked as completed and moved to the `mysql.tidb_ddl_history` table. Therefore, DDL statements can only be executed on the owner node and cannot be linearly extended.
When accelerated table creation is enabled via [`tidb_enable_fast_create_table`](/system-variables.md#tidb_enable_fast_create_table-new-in-v800), table creation statements with the same schema committed to the same TiDB node at the same time are merged into batch table creation statements to improve table creation performance. Therefore, to improve the table creation performance, try to connect to the same TiDB node, create tables with the same schema concurrently, and increase the concurrency appropriately.

However, for some DDL statements, it is not necessary to strictly follow the online DDL algorithm. For example, the `CREATE TABLE` statement only has two states for the job: `none` and `public`. Therefore, TiDB can simplify the execution process of DDL, and executes the `CREATE TABLE` statement on a non-owner node to accelerate table creation.
The merged batch table creation statements are executed within the same transaction, so if one statement of them fails, all of them will fail.
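
The batching rule above can be pictured with a toy model. The following Python sketch is an illustration only and is not TiDB's implementation. The helper names `merge_batches` and `run_batch` are hypothetical. The sketch also assumes that "the same schema" means the same database and that "at the same time" means the requests fall into one batching window, which is modeled here as one input list.

```python
from collections import defaultdict


def merge_batches(requests):
    """Group (node, database, table) CREATE TABLE requests that arrive in the
    same window into one batch per (node, database) pair.

    Toy model only: TiDB's real batching logic is not reproduced here.
    """
    batches = defaultdict(list)
    for node, database, table in requests:
        batches[(node, database)].append(table)
    return dict(batches)


def run_batch(existing_tables, batch):
    """Create every table in `batch` in one all-or-nothing step.

    If any table already exists (or the batch repeats a name), nothing is
    created and False is returned. This mirrors the rule that one failing
    statement makes the whole merged batch fail.
    """
    if len(set(batch)) != len(batch) or any(t in existing_tables for t in batch):
        return False
    existing_tables.update(batch)
    return True


window = [("tidb-0", "db1", "t1"), ("tidb-0", "db1", "t2"), ("tidb-1", "db1", "t3")]
print(merge_batches(window))
# {('tidb-0', 'db1'): ['t1', 't2'], ('tidb-1', 'db1'): ['t3']}
```

In this model, sending all three requests to `tidb-0` would produce one larger batch. This is why the text recommends connecting to the same TiDB node.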

> **Warning:**
>
@@ -39,19 +39,3 @@ To disable performance optimization for creating tables, set the value of this v
```sql
SET GLOBAL tidb_enable_fast_create_table = OFF;
```

## Implementation principle

The detailed implementation principle of performance optimization for table creation is as follows:

1. Create a `CREATE TABLE` Job.

The corresponding DDL Job is generated by parsing the `CREATE TABLE` statement.

2. Execute the `CREATE TABLE` job.

The TiDB node that receives the `CREATE TABLE` statement executes it directly, and then persists the table structure to TiKV. At the same time, the `CREATE TABLE` job is marked as completed and inserted into the `mysql.tidb_ddl_history` table.

3. Synchronize the table information.

TiDB notifies other nodes to synchronize the newly created table structure.
26 changes: 25 additions & 1 deletion auto-random.md
@@ -47,7 +47,7 @@ When you execute an `INSERT` statement:
- If you do not explicitly specify the value of the `AUTO_RANDOM` column, TiDB generates a random value and inserts it into the table.

```sql
tidb> CREATE TABLE t (a BIGINT PRIMARY KEY AUTO_RANDOM, b VARCHAR(255));
tidb> CREATE TABLE t (a BIGINT PRIMARY KEY AUTO_RANDOM, b VARCHAR(255)) /*T! PRE_SPLIT_REGIONS=2 */ ;
Query OK, 0 rows affected, 1 warning (0.01 sec)

tidb> INSERT INTO t(a, b) VALUES (1, 'string');
@@ -76,6 +76,29 @@ tidb> SELECT * FROM t;
| 4899916394579099651 | string3 |
+---------------------+---------+
3 rows in set (0.00 sec)

tidb> SHOW CREATE TABLE t;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t | CREATE TABLE `t` (
`a` bigint(20) NOT NULL /*T![auto_rand] AUTO_RANDOM(5) */,
`b` varchar(255) DEFAULT NULL,
PRIMARY KEY (`a`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T! PRE_SPLIT_REGIONS=2 */ |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

tidb> SHOW TABLE t REGIONS;
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS | SCHEDULING_CONSTRAINTS | SCHEDULING_STATE |
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
| 62798 | t_158_ | t_158_r_2305843009213693952 | 62810 | 28 | 62811, 62812, 62810 | 0 | 151 | 0 | 1 | 0 | | |
| 62802 | t_158_r_2305843009213693952 | t_158_r_4611686018427387904 | 62803 | 1 | 62803, 62804, 62805 | 0 | 39 | 0 | 1 | 0 | | |
| 62806 | t_158_r_4611686018427387904 | t_158_r_6917529027641081856 | 62813 | 4 | 62813, 62814, 62815 | 0 | 160 | 0 | 1 | 0 | | |
| 9289 | t_158_r_6917529027641081856 | 78000000 | 48268 | 1 | 48268, 58951, 62791 | 0 | 10628 | 43639 | 2 | 7999 | | |
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
4 rows in set (0.00 sec)
```
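
The Region boundaries in the `SHOW TABLE t REGIONS` output above follow from the bit layout. With `PRE_SPLIT_REGIONS=2`, the table is split into 2^2 = 4 Regions, and their start keys are evenly spaced over the non-negative signed 64-bit range. The following Python sketch assumes a signed `BIGINT` column whose top bit is the sign bit and reproduces the three split points. `pre_split_boundaries` is a hypothetical helper, not a TiDB API.

```python
def pre_split_boundaries(pre_split_regions, total_bits=64):
    """Return the handle values at which a signed AUTO_RANDOM table is pre-split.

    Assumes the sign bit occupies the top bit, so the pre-split bits sit
    directly below it and each Region covers 2^shift consecutive handles.
    """
    shift = total_bits - 1 - pre_split_regions  # bits below the pre-split bits
    region_count = 1 << pre_split_regions       # 2^(PRE_SPLIT_REGIONS) Regions
    # The first Region starts at the table prefix itself, so only
    # region_count - 1 split points are needed.
    return [k << shift for k in range(1, region_count)]


print(pre_split_boundaries(2))
# [2305843009213693952, 4611686018427387904, 6917529027641081856]
```

These values match the `t_158_r_...` keys that appear as `START_KEY` and `END_KEY` in the output.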

The `AUTO_RANDOM(S, R)` column value automatically assigned by TiDB has a total of 64 bits:
@@ -101,6 +124,7 @@ The structure of an `AUTO_RANDOM` value without a signed bit is as follows:
- The content of the shard bits is obtained by calculating the hash value of the starting time of the current transaction. To use a different length of shard bits (such as 10), you can specify `AUTO_RANDOM(10)` when creating the table.
- The value of the auto-increment bits is stored in the storage engine and allocated sequentially. Each time a new value is allocated, the value is incremented by 1. The auto-increment bits ensure that the values of `AUTO_RANDOM` are unique globally. When the auto-increment bits are exhausted, an error `Failed to read auto-increment value from storage engine` is reported when the value is allocated again.
- Value range: the maximum number of bits for the final generated value = shard bits + auto-increment bits. The range of a signed column is `[-(2^(R-1))+1, (2^(R-1))-1]`, and the range of an unsigned column is `[0, (2^R)-1]`.
- You can use `AUTO_RANDOM` with `PRE_SPLIT_REGIONS`. When a table is created successfully, `PRE_SPLIT_REGIONS` pre-splits data in the table into the number of Regions as specified by `2^(PRE_SPLIT_REGIONS)`.
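
The following Python sketch makes the layout concrete by packing and unpacking a signed `AUTO_RANDOM(S, R)` value. It assumes that the bits run, from high to low, as one sign bit, then 64 - R reserved bits, then S shard bits, then R - 1 - S auto-increment bits, with the defaults S = 5 and R = 64. The shard value is passed in directly instead of being hashed from the transaction start time, and all helper names are hypothetical. Under these defaults, the value `4899916394579099651` from the example above decodes to shard 17 and auto-increment value 3.

```python
def pack_auto_random(shard, increment, shard_bits=5, range_bits=64):
    """Pack a signed AUTO_RANDOM(S, R) value.

    Assumed layout (high to low): sign bit (0), reserved bits (64 - R, all 0),
    shard bits (S), auto-increment bits (R - 1 - S).
    """
    inc_bits = range_bits - 1 - shard_bits
    if not 0 <= increment < (1 << inc_bits):
        # TiDB reports "Failed to read auto-increment value from storage engine"
        # when the auto-increment bits are exhausted; this sketch just raises.
        raise OverflowError("auto-increment bits exhausted")
    shard &= (1 << shard_bits) - 1  # keep only the low S bits of the shard value
    return (shard << inc_bits) | increment


def unpack_auto_random(value, shard_bits=5, range_bits=64):
    """Split a signed AUTO_RANDOM(S, R) value back into (shard, increment)."""
    inc_bits = range_bits - 1 - shard_bits
    return (value >> inc_bits) & ((1 << shard_bits) - 1), value & ((1 << inc_bits) - 1)


def signed_range(range_bits=64):
    """Value range of a signed column: [-(2^(R-1)) + 1, 2^(R-1) - 1]."""
    return -(2 ** (range_bits - 1)) + 1, 2 ** (range_bits - 1) - 1


print(unpack_auto_random(4899916394579099651))  # (17, 3)
```

Setting every shard bit and every auto-increment bit yields 2^(R-1) - 1, which is the upper bound of the signed range given in the list above.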

> **Note:**
>
40 changes: 31 additions & 9 deletions log-redaction.md
@@ -9,14 +9,12 @@ When TiDB provides detailed log information, it might print sensitive data (for

## Log redaction in TiDB side

To enable log redaction in the TiDB side, set the value of [`global.tidb_redact_log`](/system-variables.md#tidb_redact_log) to `1`. This configuration value defaults to `0`, which means that log redaction is disabled.
To enable log redaction in the TiDB side, set the value of [`global.tidb_redact_log`](/system-variables.md#tidb_redact_log) to `ON` or `MARKER`. This configuration value defaults to `OFF`, which means that log redaction is disabled.

You can use the `set` syntax to set the global variable `tidb_redact_log`:

{{< copyable "sql" >}}

```sql
set @@global.tidb_redact_log=1;
set @@global.tidb_redact_log = ON;
```

After the setting, all logs generated in new sessions are redacted:
@@ -32,19 +30,43 @@ ERROR 1062 (23000): Duplicate entry '1' for key 't.a'
The error log for the `INSERT` statement above is printed as follows:

```
[2020/10/20 11:45:49.539 +08:00] [INFO] [conn.go:800] ["command dispatched failed"] [conn=5] [connInfo="id:5, addr:127.0.0.1:57222 status:10, collation:utf8_general_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="insert into t values ( ? ) , ( ? )"] [txn_mode=OPTIMISTIC] [err="[kv:1062]Duplicate entry '?' for key 't.a'"]
[2024/07/02 11:35:32.686 +08:00] [INFO] [conn.go:1146] ["command dispatched failed"] [conn=1482686470] [session_alias=] [connInfo="id:1482686470, addr:127.0.0.1:52258 status:10, collation:utf8mb4_0900_ai_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="insert into `t` values ( ... )"] [txn_mode=PESSIMISTIC] [timestamp=450859193514065921] [err="[kv:1062]Duplicate entry '?' for key 't.a'"]
```

From the preceding error log, you can see that when the value of `tidb_redact_log` is set to `ON`, sensitive information is replaced by the `?` mark in the TiDB log to avoid data security risks.
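
The settings of `tidb_redact_log` differ in how a single sensitive value appears in the log, such as the duplicate key in the error message. The following Python sketch models only the observable output described in this section: `OFF` leaves the value unchanged, `ON` replaces it with `?`, and `MARKER` wraps it in `‹›`. `redact_value` is a hypothetical helper, not a TiDB function. TiDB might render other parts of a log line differently, as the `( ... )` in the SQL text of the log above shows.

```python
def redact_value(mode, value):
    """Model how one sensitive value appears in the log under each mode."""
    mode = mode.upper()
    if mode == "OFF":
        return value             # redaction disabled: value logged as-is
    if mode == "ON":
        return "?"               # value replaced by the ? mark
    if mode == "MARKER":
        return "‹" + value + "›"  # value kept, but wrapped in markers
    raise ValueError(f"unsupported tidb_redact_log value: {mode!r}")


for mode in ("OFF", "ON", "MARKER"):
    print(mode, "->", f"Duplicate entry '{redact_value(mode, '1')}' for key 't.a'")
```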

In addition, TiDB provides the `MARKER` option. When the value of `tidb_redact_log` is set to `MARKER`, TiDB marks sensitive information in the log with `‹›` instead of replacing it directly, so you can customize the redaction rules.

```sql
set @@global.tidb_redact_log = MARKER;
```

After the preceding configuration, the sensitive information is marked rather than replaced in all logs generated by new sessions:

```sql
create table t (a int, unique key (a));
Query OK, 0 rows affected (0.07 sec)

insert into t values (1),(1);
ERROR 1062 (23000): Duplicate entry '‹1›' for key 't.a'
```

The error log is as follows:

```
[2024/07/02 11:35:01.426 +08:00] [INFO] [conn.go:1146] ["command dispatched failed"] [conn=1482686470] [session_alias=] [connInfo="id:1482686470, addr:127.0.0.1:52258 status:10, collation:utf8mb4_0900_ai_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="insert into `t` values ( ‹1› ) , ( ‹1› )"] [txn_mode=PESSIMISTIC] [timestamp=450859185309483010] [err="[kv:1062]Duplicate entry '‹1›' for key 't.a'"]
```

From the error log above, you can see that all sensitive information is shielded using `?` after `tidb_redact_log` is enabled. In this way, data security risks are avoided.
As you can see from the preceding error log, after you set `tidb_redact_log` to `MARKER`, TiDB marks sensitive information using `‹ ›` in the log. You can customize redaction rules to handle sensitive information in the log as needed.
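
As an example of a custom rule, the following Python sketch post-processes a log line written with `MARKER` and rewrites every `‹...›` span. It assumes that the marked values do not themselves contain `‹` or `›`. The regular expression and the `apply_rule` helper are illustrative and are not part of TiDB.

```python
import re

# Non-greedy match of one marked span; group 1 is the sensitive value.
MARKED_VALUE = re.compile(r"‹(.*?)›")


def apply_rule(line, rule):
    """Replace every ‹value› in `line` with rule(value)."""
    return MARKED_VALUE.sub(lambda m: rule(m.group(1)), line)


sql = "insert into `t` values ( ‹1› ) , ( ‹1› )"
print(apply_rule(sql, lambda v: "?"))           # insert into `t` values ( ? ) , ( ? )
print(apply_rule(sql, lambda v: "*" * len(v)))  # insert into `t` values ( * ) , ( * )
print(apply_rule(sql, lambda v: v))             # insert into `t` values ( 1 ) , ( 1 )
```

The first rule reproduces the `ON` output. The second masks each value but keeps its length. The third strips the markers, which might suit logs that never leave a trusted environment.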

## Log redaction in TiKV side

To enable log redaction in the TiKV side, set the value of [`security.redact-info-log`](/tikv-configuration-file.md#redact-info-log-new-in-v408) to `true`. This configuration value defaults to `false`, which means that log redaction is disabled.
To enable log redaction in the TiKV side, set the value of [`security.redact-info-log`](/tikv-configuration-file.md#redact-info-log-new-in-v408) to `true` or `"marker"`. This configuration value defaults to `false`, which means that log redaction is disabled.

## Log redaction in PD side

To enable log redaction in the PD side, set the value of [`security.redact-info-log`](/pd-configuration-file.md#redact-info-log-new-in-v50) to `true`. This configuration value defaults to `false`, which means that log redaction is disabled.
To enable log redaction in the PD side, set the value of [`security.redact-info-log`](/pd-configuration-file.md#redact-info-log-new-in-v50) to `true` or `"marker"`. This configuration value defaults to `false`, which means that log redaction is disabled.

## Log redaction in TiFlash side

To enable log redaction in the TiFlash side, set both the [`security.redact_info_log`](/tiflash/tiflash-configuration.md#configure-the-tiflashtoml-file) value in tiflash-server and the [`security.redact-info-log`](/tiflash/tiflash-configuration.md#configure-the-tiflash-learnertoml-file) value in tiflash-learner to `true`. Both configuration values default to `false`, which means that log redaction is disabled.
To enable log redaction in the TiFlash side, set both the [`security.redact_info_log`](/tiflash/tiflash-configuration.md#configure-the-tiflashtoml-file) value in tiflash-server and the [`security.redact-info-log`](/tiflash/tiflash-configuration.md#configure-the-tiflash-learnertoml-file) value in tiflash-learner to `true` or `"marker"`. Both configuration values default to `false`, which means that log redaction is disabled.
3 changes: 2 additions & 1 deletion pd-configuration-file.md
@@ -198,8 +198,9 @@ Configuration items related to security
### `redact-info-log` <span class="version-mark">New in v5.0</span>
+ Controls whether to enable log redaction in the PD log
+ When you set the configuration value to `true`, user data is redacted in the PD log.
+ Optional value: `false`, `true`, `"marker"`
+ Default value: `false`
+ For details on how to use it, see [Log redaction in PD side](/log-redaction.md#log-redaction-in-pd-side).
## `log`
35 changes: 23 additions & 12 deletions scripts/release_notes_update_pr_author_info_add_dup.py
@@ -70,7 +70,7 @@ def store_exst_rn(ext_path, version):
else:
return 0

def get_pr_info_from_github(cp_pr_link,cp_pr_title, current_pr_author):
def get_pr_info_from_github(row_number, cp_pr_link,cp_pr_title, current_pr_author):

g = Github(access_token, timeout=30)# Create a Github object with the access token
target_pr_number_existence = 1
@@ -103,9 +103,10 @@ def get_pr_info_from_github(cp_pr_link,cp_pr_title, current_pr_author):
pr_obj = repo_obj.get_pull(int(target_pr_number))# Get the pull request object
pr_author = pr_obj.user.login # Get the author of the pull request
except:
print("Failed to get the original PR information for this PR: " + cp_pr_link)
print(f"Row {row_number}: failed to find the non-bot author for this PR ({cp_pr_link}) created by {current_pr_author}.\n")
else:
pr_author = current_pr_author # Use the current author if the cherry-pick PR cannot be found
print(f"Row {row_number}: failed to find the non-bot author for this PR ({cp_pr_link}) created by {current_pr_author}.\n")

return(pr_author)

@@ -135,14 +136,24 @@ def update_pr_author_and_release_notes(excel_path):
# If pr_author is ti-chi-bot or ti-srebot
current_pr_author = row[pr_author_index]
current_formated_rn= row[pr_formated_rn_index]
pr_response = requests.get(row[pr_link_index])
if (current_pr_author in ['ti-chi-bot', 'ti-srebot']) and (pr_response.status_code == 200):
print ("Replacing the author info for row " + str(row_index) + ".")
actual_pr_author = get_pr_info_from_github(row[pr_link_index], row[pr_title_index], current_pr_author) # Get the PR author according to the cherry-pick PR
pr_author_cell = sheet.cell(row=row_index, column=pr_author_index+1, value = actual_pr_author)#Fill in the pr_author_cell
updated_formated_rn = current_formated_rn.replace("[{}](https://github.com/{}".format(current_pr_author, current_pr_author),"[{}](https://github.com/{}".format(actual_pr_author, actual_pr_author))
formated_release_note_cell = sheet.cell(row=row_index, column=pr_formated_rn_index+1, value = updated_formated_rn) # Fill in the formated_release_note_cell
current_pr_author = actual_pr_author

if (current_pr_author in ['ti-chi-bot', 'ti-srebot']):
try:
actual_pr_author = get_pr_info_from_github(str(row_index), row[pr_link_index], row[pr_title_index], current_pr_author) # Get the PR author according to the cherry-pick PR
if actual_pr_author != current_pr_author:
print ("Replacing the author info for row " + str(row_index) + ".")
pr_author_cell = sheet.cell(row=row_index, column=pr_author_index+1, value = actual_pr_author)#Fill in the pr_author_cell
updated_formated_rn = current_formated_rn.replace("[{}](https://github.com/{}".format(current_pr_author, current_pr_author),"[{}](https://github.com/{}".format(actual_pr_author, actual_pr_author))
formated_release_note_cell = sheet.cell(row=row_index, column=pr_formated_rn_index+1, value = updated_formated_rn) # Fill in the formated_release_note_cell
current_pr_author = actual_pr_author
else: # Do nothing if non-bot author is not found.
pass
except:
pr_response = requests.get(row[pr_link_index])
if pr_response.status_code != 200:
print (f"\nRow {str(row_index)}: failed to find the non-bot author for this PR ({row[pr_link_index]}) because this link cannot be accessed now.")
else:
print (f"\nRow {str(row_index)}: failed to find the non-bot author for this PR ({row[pr_link_index]}).")
else:
pass
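
The per-row logic in this hunk can be sketched in isolation, in simplified form. For bot-created PRs only, it looks up the real author. If the lookup fails, it keeps the bot author and prints a row-specific message. Otherwise, it rewrites the author link in the formatted release note. In the following Python sketch, `lookup` is a hypothetical callable that stands in for `get_pr_info_from_github`. The sketch catches `Exception` rather than using a bare `except`, so that `KeyboardInterrupt` can still stop the script.

```python
BOT_AUTHORS = {"ti-chi-bot", "ti-srebot"}


def replace_author_link(formatted_note, old_author, new_author):
    """Swap the '[author](https://github.com/author' link in a release note."""
    old = "[{0}](https://github.com/{0}".format(old_author)
    new = "[{0}](https://github.com/{0}".format(new_author)
    return formatted_note.replace(old, new)


def resolve_row(row_number, pr_link, author, formatted_note, lookup):
    """Return (author, formatted_note) for one spreadsheet row.

    Non-bot authors are kept as-is. For bot authors, `lookup(pr_link)` is
    expected to return the original (non-bot) author or raise on failure.
    """
    if author not in BOT_AUTHORS:
        return author, formatted_note
    try:
        actual = lookup(pr_link)
    except Exception as exc:  # keep the bot author and report the row instead
        print(f"Row {row_number}: failed to find the non-bot author for this PR ({pr_link}): {exc}")
        return author, formatted_note
    if actual == author:
        return author, formatted_note
    print(f"Replacing the author info for row {row_number}.")
    return actual, replace_author_link(formatted_note, author, actual)
```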

@@ -232,12 +243,12 @@ def create_release_file(version, dup_notes_levels, dup_notes):
file.seek(0)
file.write(content)
file.truncate()
print(f'The v{version} release note is now created in the following directory: \n {release_file}')
print(f'\nThe v{version} release note is now created in the following directory: \n {release_file}')

if __name__ == '__main__':
note_pairs = store_exst_rn(ext_path, version)
dup_notes, dup_notes_levels = update_pr_author_and_release_notes(release_note_excel)
print ("The bot author info in the excel is now replaced with the actual authors.")
print ("\nThe bot author info in the excel is now replaced with the actual authors.")
version_parts = version.split('.')
if len(version_parts) >= 2:
create_release_file(version, dup_notes_levels, dup_notes)
4 changes: 0 additions & 4 deletions sql-statements/sql-statement-admin-bdr-role.md
@@ -9,10 +9,6 @@ summary: An overview of the usage of ADMIN [SET|SHOW|UNSET] BDR ROLE for the TiD
- Use `ADMIN SHOW BDR ROLE` to show the BDR role of the cluster.
- Use `ADMIN UNSET BDR ROLE` to unset the BDR role of the cluster.

> **Warning:**
>
> This feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.
## Synopsis

```ebnf+diagram