Enrich instructions for string functions INSTR() and LCASE() #16144

Merged Feb 6, 2024 (14 commits)
92 changes: 89 additions & 3 deletions functions-and-operators/string-functions.md
@@ -163,11 +163,71 @@ Insert a substring at the specified position up to the specified number of chara

### [`INSTR()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_instr)

Return the index of the first occurrence of substring.
The `INSTR(str, substr)` function returns the 1-based index of the first occurrence of `substr` in `str`. Each argument can be either a string or a number. This function is the same as the two-argument version of [`LOCATE(substr, str)`](/functions-and-operators/string-functions.md#locate), but with the order of the arguments reversed.

> **Note:**
>
> `INSTR(str, substr)` is case-sensitive by default, since [TiDB defaults to using binary collations](/character-set-and-collation.md), which differs from MySQL.

- If either argument is a number, the function treats the number as a string.
- If `substr` is not in `str`, the function returns 0; otherwise, it returns the index of the first occurrence of `substr` in `str`.
- If either argument is NULL, the function returns NULL.

Examples:

```sql
SELECT INSTR("pingcap.com", "tidb");

+------------------------------+
| INSTR("pingcap.com", "tidb") |
+------------------------------+
| 0 |
+------------------------------+
```

```sql
SELECT INSTR("pingcap.com/tidb", "tidb");

+-----------------------------------+
| INSTR("pingcap.com/tidb", "tidb") |
+-----------------------------------+
| 13 |
+-----------------------------------+
```

```sql
SELECT INSTR("pingcap.com/tidb", "TiDB");
Contributor

@xzhangxian1008 Feb 1, 2024

Which TiDB version do you use? I tried it on my TiDB cluster at commit cac449b3370c5 and got a different result:

mysql> SELECT INSTR("pingcap.com/tidb", "TiDB");
+-----------------------------------+
| INSTR("pingcap.com/tidb", "TiDB") |
+-----------------------------------+
|                                13 |
+-----------------------------------+

We can get 0 with this SQL statement:

mysql> SELECT INSTR(_utf8mb4 "pingcap.com/tidb", _utf8mb4 "TiDB");
+-----------------------------------------------------+
| INSTR(_utf8mb4 "pingcap.com/tidb", _utf8mb4 "TiDB") |
+-----------------------------------------------------+
|                                                   0 |
+-----------------------------------------------------+

Contributor Author

mysql> SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v7.1.1-serverless
Edition: Community
Git Commit Hash: 14db49a2968b68ac4cf3e4a59db87ba30a97996d
Git Branch: release-7.1-serverless
UTC Build Time: 2024-01-29 10:05:53
GoVersion: go1.21.6
Race Enabled: false
TiKV Min Version: 6.1.0
Check Table Before Drop: false
Store: tikv
1 row in set (0.24 sec)

mysql> SHOW VARIABLES LIKE 'collation\_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| collation_connection | utf8mb4_bin |
| collation_database   | utf8mb4_bin |
| collation_server     | utf8mb4_bin |
+----------------------+-------------+
3 rows in set (0.25 sec)

mysql> SELECT INSTR("pingcap.com/tidb", "TiDB");
+-----------------------------------+
| INSTR("pingcap.com/tidb", "TiDB") |
+-----------------------------------+
|                                 0 |
+-----------------------------------+
1 row in set (0.24 sec)

Collaborator

@qiancai Feb 1, 2024

@jyf111, thanks for sharing the version info. Currently, the "master" branch of TiDB docs is for TiDB v7.6.0. Would you please try installing TiDB v7.6 according to this quick start guide? Thanks.

Contributor Author

mysql> SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v7.6.0
Edition: Community
Git Commit Hash: 52794d985ba6325d75a714d4eaa0838d59425eb6
Git Branch: heads/refs/tags/v7.6.0
UTC Build Time: 2024-01-22 14:20:42
GoVersion: go1.21.5
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.01 sec)

mysql> SHOW VARIABLES LIKE 'collation\_%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database   | utf8mb4_bin        |
| collation_server     | utf8mb4_bin        |
+----------------------+--------------------+
3 rows in set (0.01 sec)

mysql> SELECT INSTR("pingcap.com/tidb", "TiDB");
+-----------------------------------+
| INSTR("pingcap.com/tidb", "TiDB") |
+-----------------------------------+
|                                13 |
+-----------------------------------+
1 row in set (0.00 sec)

Um.. But I don't know why collation_connection becomes utf8mb4_0900_ai_ci.

The latest doc still shows that the default value of collation_connection is utf8mb4_bin.

Collaborator

Hi @jyf111, thanks for checking the version.

According to @YangKeao (a TiDB developer), utf8mb4_0900_ai_ci was introduced in v7.4 to be compatible with MySQL 8.0 (pingcap/tidb#45650).

Starting from v7.4.0, the default collation of utf8mb4 on TiDB depends on the version of MySQL client, and @YangKeao will create a PR to clarify this issue in the doc.

Contributor Author

Great. I have updated the sentences related to case sensitivity and collations. PTAL


+-----------------------------------+
| INSTR("pingcap.com/tidb", "TiDB") |
+-----------------------------------+
| 0 |
+-----------------------------------+
```

```sql
SELECT INSTR("pingcap.com/tidb" COLLATE utf8mb4_general_ci, "TiDB");

+--------------------------------------------------------------+
| INSTR("pingcap.com/tidb" COLLATE utf8mb4_general_ci, "TiDB") |
+--------------------------------------------------------------+
| 13 |
+--------------------------------------------------------------+
```
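
For a case-sensitive match that does not depend on the session's `collation_connection`, a binary collation can be specified explicitly. The following is a minimal sketch that assumes the connection character set is `utf8mb4`. Because `utf8mb4_bin` compares code points rather than applying case folding, this query is expected to return `0`:

```sql
-- Assumes the connection character set is utf8mb4.
-- The explicit binary collation forces a case-sensitive comparison.
SELECT INSTR("pingcap.com/tidb" COLLATE utf8mb4_bin, "TiDB");
```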

```sql
SELECT INSTR(0123, "12");

+-------------------+
| INSTR(0123, "12") |
+-------------------+
| 1 |
+-------------------+
```
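
The following sketch illustrates two of the behaviors described above: the equivalence with the two-argument `LOCATE()` (arguments reversed) and the `NULL` handling. The column aliases are illustrative only.

```sql
-- INSTR(str, substr) and LOCATE(substr, str) return the same position.
SELECT INSTR("pingcap.com/tidb", "tidb") AS instr_pos, LOCATE("tidb", "pingcap.com/tidb") AS locate_pos;

+-----------+------------+
| instr_pos | locate_pos |
+-----------+------------+
|        13 |         13 |
+-----------+------------+
```

```sql
-- A NULL in either position makes the result NULL.
SELECT INSTR(NULL, "tidb") AS null_str, INSTR("pingcap.com/tidb", NULL) AS null_substr;

+----------+-------------+
| null_str | null_substr |
+----------+-------------+
|     NULL |        NULL |
+----------+-------------+
```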

### [`LCASE()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_lcase)

Synonym for `LOWER()`.
The `LCASE(str)` function is a synonym for [`LOWER(str)`](/functions-and-operators/string-functions.md#lower), which returns the given argument `str` with all characters changed to lowercase.
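
As a quick illustration of the synonym relationship, the two functions can be compared side by side. The column aliases are illustrative only.

```sql
-- LCASE() and LOWER() produce the same result for the same input.
SELECT LCASE("TiDB") AS lcase_result, LOWER("TiDB") AS lower_result;

+--------------+--------------+
| lcase_result | lower_result |
+--------------+--------------+
| tidb         | tidb         |
+--------------+--------------+
```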

### [`LEFT()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_left)

@@ -187,7 +247,33 @@ Return the position of the first occurrence of substring.

### [`LOWER()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_lower)

Return the argument in lowercase.
The `LOWER(str)` function converts all characters in the given argument `str` to lowercase. The argument can be either a string or a number.

- If the argument is a string, the function returns the string in lowercase.
- If the argument is a number, the function returns the number without leading zeros.
- If the argument is NULL, the function returns NULL.

Examples:

```sql
SELECT LOWER("TiDB");

+---------------+
| LOWER("TiDB") |
+---------------+
| tidb |
+---------------+
```

```sql
SELECT LOWER(-012);

+-------------+
| LOWER(-012) |
+-------------+
| -12 |
+-------------+
```
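
The `NULL` case listed above can be illustrated the same way:

```sql
-- A NULL argument yields NULL.
SELECT LOWER(NULL);

+-------------+
| LOWER(NULL) |
+-------------+
| NULL        |
+-------------+
```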

### [`LPAD()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_lpad)
