diff --git a/docs/sql/functions/char.md b/docs/sql/functions/char.md index fa4f5f52af3..367f31ba373 100644 --- a/docs/sql/functions/char.md +++ b/docs/sql/functions/char.md @@ -4,82 +4,82 @@ title: Text Functions --- This section describes functions and operators for examining and manipulating string values. `␣` denotes a space character. -| Function | Description | Example | Result | -|:--|:--|:---|:--| -| *`string`* `^@` *`search_string`* | Alias for `starts_with`. | `'abc' ^@ 'a'` | `true` | -| *`string`* `||` *`string`* | String concatenation | `'Duck' || 'DB'` | `DuckDB` | -| *`string`*`[`*`index`*`]` | Alias for `array_extract`. | `'DuckDB'[4]` | `'k'` | -| *`string`*`[`*`begin`*`:`*`end`*`]` | Alias for `array_slice`. Missing arguments are interpreted as `NULL`s. | `'DuckDB'[:4]` | `'Duck'` | -| `array_extract(`*`list`*`, `*`index`*`)` | Extract a single character using a (1-based) index. | `array_extract('DuckDB', 2)` | `'u'` | -| `array_slice(`*`list`*`, `*`begin`*`, `*`end`*`)` | Extract a string using slice conventions. `NULL`s are interpreted as the bounds of the string. Negative values are accepted. | `array_slice('DuckDB', 5, NULL)` | `'DB'` | -| `ascii(`*`string`*`)`| Returns an integer that represents the Unicode code point of the first character of the *string* | `ascii('Ω')` | `937` | -| `bar(`*`x`*`, `*`min`*`, `*`max`*`[, `*`width`*`])` | Draw a band whose width is proportional to (*x* - *min*) and equal to *width* characters when *x* = *max*. *width* defaults to 80. | `bar(5, 0, 20, 10)` | `██▌` | -| `base64(`*`blob`*`)`| Convert a blob to a base64 encoded string. Alias of to_base64. | `base64('A'::blob)` | `'QQ=='` | -| `bit_length(`*`string`*`)`| Number of bits in a string. 
| `bit_length('abc')` | `24` | -| `chr(`*`x`*`)` | returns a character which is corresponding the ASCII code value or Unicode code point | `chr(65)` | A | -| `concat(`*`string`*`, ...)` | Concatenate many strings together | `concat('Hello', ' ', 'World')` | `Hello World` | -| `concat_ws(`*`separator`*`, `*`string`*`, ...)` | Concatenate strings together separated by the specified separator | `concat_ws(', ', 'Banana', 'Apple', 'Melon')` | `Banana,Apple,Melon` | -| `contains(`*`string`*`, `*`search_string`*`)` | Return true if *search_string* is found within *string* | `contains('abc', 'a')` | `true` | -| `format(`*`format`*`, `*`parameters`*`...)` | Formats a string using fmt syntax | `format('Benchmark "{}" took {} seconds', 'CSV', 42)` | `Benchmark "CSV" took 42 seconds` | -| `from_base64(`*`string`*`)`| Convert a base64 encoded string to a character string. | `from_base64('QQ==')` | `'A'` | -| `hash(`*`value`*`)` | Returns an integer with the hash of the *value* | `hash('🦆')` | `2595805878642663834` | -| `instr(`*`string`*`, `*`search_string`*`)`| Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | `instr('test test', 'es')` | 2 | -| `lcase(`*`string`*`)` | Alias of `lower`. 
Convert *string* to lower case | `lcase('Hello')` | `hello` | -| `left(`*`string`*`, `*`count`*`)`| Extract the left-most count characters | `left('Hello🦆', 2)` | `He` | -| `left_grapheme(`*`string`*`, `*`count`*`)`| Extract the left-most grapheme clusters | `left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` | `🤦🏼‍♂️` | -| `length(`*`string`*`)` | Number of characters in *string* | `length('Hello🦆')` | `6` | -| `length_grapheme(`*`string`*`)` | Number of grapheme clusters in *string* | `length_grapheme('🤦🏼‍♂️🤦🏽‍♀️')` | `2` | -| *`string`*` LIKE `*`target`* | Returns true if the *string* matches the like specifier (see [Pattern Matching](../../sql/functions/patternmatching)) | `'hello' LIKE '%lo'` | `true` | -| `like_escape(`*`string`*`, `*`like_specifier`*`, `*`escape_character`*`)` | Returns true if the *string* matches the *like_specifier* (see [Pattern Matching](../../sql/functions/patternmatching)). *escape_character* is used to search for wildcard characters in the *string*. | `like_escape('a%c', 'a$%c', '$')` | `true` | -| `list_element(`*`string`*`, `*`index`*`)` | An alias for `array_extract`. | `list_element('DuckDB', 2)` | `'u'` | -| `list_extract(`*`string`*`, `*`index`*`)` | An alias for `array_extract`. 
| `list_extract('DuckDB', 2)` | `'u'` | -| `lower(`*`string`*`)` | Convert *string* to lower case | `lower('Hello')` | `hello` | -| `lpad(`*`string`*`, `*`count`*`, `*`character`*`)`| Pads the *string* with the character from the left until it has count characters | `lpad('hello', 10, '>')` | `>>>>>hello` | -| `ltrim(`*`string`*`)`| Removes any spaces from the left side of the *string* | `ltrim('␣␣␣␣test␣␣')` | `test␣␣` | -| `ltrim(`*`string`*`, `*`characters`*`)`| Removes any occurrences of any of the *characters* from the left side of the *string* | `ltrim('>>>>test<<', '><')` | `test<<` | -| `md5(`*`value`*`)` | Returns the [MD5 hash](https://en.wikipedia.org/wiki/MD5) of the *value* | `md5('123')` | `'202cb962ac59075b964b07152d234b70'` | -| `nfc_normalize(`*`string`*`)`| Convert string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. | `nfc_normalize('ardèch')` | ``arde`ch`` | -| `not_like_escape(`*`string`*`, `*`like_specifier`*`, `*`escape_character`*`)` | Returns false if the *string* matches the *like_specifier* (see [Pattern Matching](../../sql/functions/patternmatching)). *escape_character* is used to search for wildcard characters in the *string*. | `like_escape('a%c', 'a$%c', '$')` | `true` | -| `ord(`*`string`*`)`| Return ASCII character code of the leftmost character in a string. | `ord('ü')` | `252` | -| `position(`*`search_string`*` in `*`string`*`)` | Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | `position('b' in 'abc')` | `2` | -| `prefix(`*`string`*`, `*`search_string`*`)` | Return true if *string* starts with *search_string*. 
| `prefix('abc', 'ab')` | `true` | -| `printf(`*`format`*`, `*`parameters`*`...)` | Formats a *string* using printf syntax | `printf('Benchmark "%s" took %d seconds', 'CSV', 42)` | `Benchmark "CSV" took 42 seconds` | -| `regexp_full_match(`*`string`*`, `*`regex`*`)`| Returns true if the entire *string* matches the *regex* (see [Pattern Matching](../../sql/functions/patternmatching)) | `regexp_full_match('anabanana', '(an)*')` | `false` | -| `regexp_matches(`*`string`*`, `*`regex`*`)`| Returns true if a part of *string* matches the *regex* (see [Pattern Matching](../../sql/functions/patternmatching)) | `regexp_matches('anabanana', '(an)*')` | `true` | -| `regexp_replace(`*`string`*`, `*`regex`*`, `*`replacement`*`, `*`modifiers`*`)`| Replaces the first occurrence of *regex* with the *replacement*, use `'g'` modifier to replace all occurrences instead (see [Pattern Matching](../../sql/functions/patternmatching)) | `select regexp_replace('hello', '[lo]', '-')` | `he-lo` | -| `regexp_split_to_array(`*`string`*`, `*`regex`*`)` | Alias of `string_split_regex`. 
Splits the *string* along the *regex* | `regexp_split_to_array('hello␣world; 42', ';?␣')` | `['hello', 'world', '42']` | -| `regexp_extract(`*`string`*`, `*`regex`*`[, `*`group`*` = 0])` | Split the *string* along the *regex* and extract first occurrence of *group* | `regexp_extract('hello_world', '([a-z ]+)_?', 1)` | `hello` | -| `regexp_extract_all(`*`string`*`, `*`regex`*`[, `*`group`*` = 0])` | Split the *string* along the *regex* and extract all occurrences of *group* | `regexp_extract_all('hello_world', '([a-z ]+)_?', 1)` | `[hello, world]` | -| `repeat(`*`string`*`, `*`count`*`)`| Repeats the *string* *count* number of times | `repeat('A', 5)` | `AAAAA` | -| `replace(`*`string`*`, `*`source`*`, `*`target`*`)`| Replaces any occurrences of the *source* with *target* in *string* | `replace('hello', 'l', '-')` | `he--o` | -| `reverse(`*`string`*`)`| Reverses the *string* | `reverse('hello')` | `olleh` | -| `right(`*`string`*`, `*`count`*`)`| Extract the right-most *count* characters | `right('Hello🦆', 3)` | `lo🦆` | -| `right_grapheme(`*`string`*`, `*`count`*`)`| Extract the right-most *count* grapheme clusters | `right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` | `🤦🏽‍♀️` | -| `rpad(`*`string`*`, `*`count`*`, `*`character`*`)`| Pads the *string* with the character from the right until it has *count* characters | `rpad('hello', 10, '<')` | `hello<<<<<` | -| `rtrim(`*`string`*`)`| Removes any spaces from the right side of the *string* | `rtrim('␣␣␣␣test␣␣')` | `␣␣␣␣test` | -| `rtrim(`*`string`*`, `*`characters`*`)`| Removes any occurrences of any of the *characters* from the right side of the *string* | `rtrim('>>>>test<<', '><')` | `>>>>test` | -| `split_part(`*`string`*`, `*`separator`*`, `*`index`*`)` | Split the *string* along the *separator* and return the data at the (1-based) *index* of the list. If the *index* is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). 
| `split_part('a|b|c', '|', 2)` | `b` | -| `starts_with(`*`string`*`, `*`search_string`*`)`| Return true if *string* begins with *search_string* | `starts_with('abc', 'a')` | `true` | -| *`string`*` SIMILAR TO `*`regex`* | Returns `true` if the *string* matches the *regex*; identical to `regexp_full_match` (see [Pattern Matching](../../sql/functions/patternmatching)) | `'hello' SIMILAR TO 'l+'` | `false` | -| `strlen(`*`string`*`)` | Number of bytes in *string* | `strlen('🦆')` | `4` | -| `strpos(`*`string`*`, `*`search_string`*`)`| Alias of `instr`. Return location of first occurrence of *search_string* in *string*, counting from 1. Returns 0 if no match found. | `strpos('test test', 'es')` | 2 | -| `strip_accents(`*`string`*`)`| Strips accents from *string* | `strip_accents('mühleisen')` | `muhleisen` | -| `str_split(`*`string`*`, `*`separator`*`)` | Alias of `string_split`. Splits the *string* along the *separator* | `str_split('hello␣world', '␣')` | `['hello', 'world']` | -| `str_split_regex(`*`string`*`, `*`regex`*`)` | Alias of `string_split_regex`. Splits the *string* along the *regex* | `str_split_regex('hello␣world; 42', ';?␣')` | `['hello', 'world', '42']` | -| `string_split(`*`string`*`, `*`separator`*`)` | Splits the *string* along the *separator* | `string_split('hello␣world', '␣')` | `['hello', 'world']` | -| `string_split_regex(`*`string`*`, `*`regex`*`)` | Splits the *string* along the *regex* | `string_split_regex('hello␣world; 42', ';?␣')` | `['hello', 'world', '42']` | -| `string_to_array(`*`string`*`, `*`separator`*`)` | Alias of `string_split`. Splits the *string* along the *separator* | `string_to_array('hello␣world', '␣')` | `['hello', 'world']` | -| `substr(`*`string`*`, `*`start`*`, `*`length`*`)` | Alias of `substring`. Extract substring of *length* characters starting from character *start*. Note that a *start* value of `1` refers to the *first* character of the string. 
| `substr('Hello', 2, 2)` | `el` | -| `substring(`*`string`*`, `*`start`*`, `*`length`*`)` | Extract substring of *length* characters starting from character *start*. Note that a *start* value of `1` refers to the *first* character of the string. | `substring('Hello', 2, 2)` | `el` | -| `substring_grapheme(`*`string`*`, `*`start`*`, `*`length`*`)` | Extract substring of *length* grapheme clusters starting from character *start*. Note that a *start* value of `1` refers to the *first* character of the string. | `substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3, 2)` | `🤦🏽‍♀️🦆` | -| `suffix(`*`string`*`, `*`search_string`*`)` | Return true if *string* ends with *search_string*. | `suffix('abc', 'bc')` | `true` | -| `strpos(`*`string`*`, `*`characters`*`)`| Alias of `instr`. Return location of first occurrence of *characters* in *string*, counting from 1. Returns 0 if no match found. | `strpos('test test', 'es')` | 2 | -| `to_base64(`*`blob`*`)`| Convert a blob to a base64 encoded string. Alias of base64. | `to_base64('A'::blob)` | `QQ==` | -| `trim(`*`string`*`)`| Removes any spaces from either side of the *string* | `trim('␣␣␣␣test␣␣')` | `test` | -| `trim(`*`string`*`, `*`characters`*`)`| Removes any occurrences of any of the *characters* from either side of the *string* | `trim('>>>>test<<', '><')` | `test` | -| `ucase(`*`string`*`)`| Alias of `upper`. 
Convert *string* to upper case | `ucase('Hello')` | `HELLO` | -| `unicode(`*`string`*`)`| Returns the unicode code of the first character of the *string* | `unicode('ü')` | `252` | -| `upper(`*`string`*`)`| Convert *string* to upper case | `upper('Hello')` | `HELLO` | +| Function | Description | Example | Result | +|:--------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------|:-------------------------------------| +| *`string`* `^@` *`search_string`* | Alias for `starts_with`. | `'abc' ^@ 'a'` | `true` | +| *`string`* ` | | ` *`string`* | String concatenation | `'Duck' || 'DB'` | `DuckDB` | +| *`string`*`[`*`index`*`]` | Alias for `array_extract`. | `'DuckDB'[4]` | `'k'` | +| *`string`*`[`*`begin`*`:`*`end`*`]` | Alias for `array_slice`. Missing `begin` or `end` arguments are interpreted as the beginning or end of the list respectively. | `'DuckDB'[:4]` | `'Duck'` | +| `array_extract(`*`list`*`, `*`index`*`)` | Extract a single character using a (1-based) index. | `array_extract('DuckDB', 2)` | `'u'` | +| `array_slice(`*`list`*`, `*`begin`*`, `*`end`*`)` | Extract a string using slice conventions. Negative values are accepted. | `array_slice('DuckDB', 5, NULL)` | `'DB'` | +| `ascii(`*`string`*`)` | Returns an integer that represents the Unicode code point of the first character of the *string* | `ascii('Ω')` | `937` | +| `bar(`*`x`*`, `*`min`*`, `*`max`*`[, `*`width`*`])` | Draw a band whose width is proportional to (*x* - *min*) and equal to *width* characters when *x* = *max*. *width* defaults to 80. | `bar(5, 0, 20, 10)` | `██▌` | +| `base64(`*`blob`*`)` | Convert a blob to a base64 encoded string. Alias of to_base64. 
| `base64('A'::blob)` | `'QQ=='` | +| `bit_length(`*`string`*`)` | Number of bits in a string. | `bit_length('abc')` | `24` | +| `chr(`*`x`*`)` | Returns the character corresponding to the given ASCII code value or Unicode code point | `chr(65)` | `A` | +| `concat(`*`string`*`, ...)` | Concatenate many strings together | `concat('Hello', ' ', 'World')` | `Hello World` | +| `concat_ws(`*`separator`*`, `*`string`*`, ...)` | Concatenate strings together separated by the specified separator | `concat_ws(', ', 'Banana', 'Apple', 'Melon')` | `Banana, Apple, Melon` | +| `contains(`*`string`*`, `*`search_string`*`)` | Return true if *search_string* is found within *string* | `contains('abc', 'a')` | `true` | +| `format(`*`format`*`, `*`parameters`*`...)` | Formats a string using fmt syntax | `format('Benchmark "{}" took {} seconds', 'CSV', 42)` | `Benchmark "CSV" took 42 seconds` | +| `from_base64(`*`string`*`)` | Convert a base64 encoded string to a character string. | `from_base64('QQ==')` | `'A'` | +| `hash(`*`value`*`)` | Returns an integer with the hash of the *value* | `hash('🦆')` | `2595805878642663834` | +| `instr(`*`string`*`, `*`search_string`*`)` | Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | `instr('test test', 'es')` | `2` | +| `lcase(`*`string`*`)` | Alias of `lower`. 
Convert *string* to lower case | `lcase('Hello')` | `hello` | +| `left(`*`string`*`, `*`count`*`)` | Extract the left-most count characters | `left('Hello🦆', 2)` | `He` | +| `left_grapheme(`*`string`*`, `*`count`*`)` | Extract the left-most grapheme clusters | `left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` | `🤦🏼‍♂️` | +| `length(`*`string`*`)` | Number of characters in *string* | `length('Hello🦆')` | `6` | +| `length_grapheme(`*`string`*`)` | Number of grapheme clusters in *string* | `length_grapheme('🤦🏼‍♂️🤦🏽‍♀️')` | `2` | +| *`string`*` LIKE `*`target`* | Returns true if the *string* matches the like specifier (see [Pattern Matching](../../sql/functions/patternmatching)) | `'hello' LIKE '%lo'` | `true` | +| `like_escape(`*`string`*`, `*`like_specifier`*`, `*`escape_character`*`)` | Returns true if the *string* matches the *like_specifier* (see [Pattern Matching](../../sql/functions/patternmatching)). *escape_character* is used to search for wildcard characters in the *string*. | `like_escape('a%c', 'a$%c', '$')` | `true` | +| `list_element(`*`string`*`, `*`index`*`)` | An alias for `array_extract`. | `list_element('DuckDB', 2)` | `'u'` | +| `list_extract(`*`string`*`, `*`index`*`)` | An alias for `array_extract`. 
| `list_extract('DuckDB', 2)` | `'u'` | +| `lower(`*`string`*`)` | Convert *string* to lower case | `lower('Hello')` | `hello` | +| `lpad(`*`string`*`, `*`count`*`, `*`character`*`)` | Pads the *string* with the *character* from the left until it has *count* characters | `lpad('hello', 10, '>')` | `>>>>>hello` | +| `ltrim(`*`string`*`)` | Removes any spaces from the left side of the *string* | `ltrim('␣␣␣␣test␣␣')` | `test␣␣` | +| `ltrim(`*`string`*`, `*`characters`*`)` | Removes any occurrences of any of the *characters* from the left side of the *string* | `ltrim('>>>>test<<', '><')` | `test<<` | +| `md5(`*`value`*`)` | Returns the [MD5 hash](https://en.wikipedia.org/wiki/MD5) of the *value* | `md5('123')` | `'202cb962ac59075b964b07152d234b70'` | +| `nfc_normalize(`*`string`*`)` | Convert string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. | `nfc_normalize('ardèch')` | ``arde`ch`` | +| `not_like_escape(`*`string`*`, `*`like_specifier`*`, `*`escape_character`*`)` | Returns false if the *string* matches the *like_specifier* (see [Pattern Matching](../../sql/functions/patternmatching)). *escape_character* is used to search for wildcard characters in the *string*. | `not_like_escape('a%c', 'a$%c', '$')` | `false` | +| `ord(`*`string`*`)` | Return the Unicode code point of the leftmost character in a string. | `ord('ü')` | `252` | +| `position(`*`search_string`*` in `*`string`*`)` | Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | `position('b' in 'abc')` | `2` | +| `prefix(`*`string`*`, `*`search_string`*`)` | Return true if *string* starts with *search_string*. 
| `prefix('abc', 'ab')` | `true` | +| `printf(`*`format`*`, `*`parameters`*`...)` | Formats a *string* using printf syntax | `printf('Benchmark "%s" took %d seconds', 'CSV', 42)` | `Benchmark "CSV" took 42 seconds` | +| `regexp_full_match(`*`string`*`, `*`regex`*`)` | Returns true if the entire *string* matches the *regex* (see [Pattern Matching](../../sql/functions/patternmatching)) | `regexp_full_match('anabanana', '(an)*')` | `false` | +| `regexp_matches(`*`string`*`, `*`regex`*`)` | Returns true if a part of *string* matches the *regex* (see [Pattern Matching](../../sql/functions/patternmatching)) | `regexp_matches('anabanana', '(an)*')` | `true` | +| `regexp_replace(`*`string`*`, `*`regex`*`, `*`replacement`*`, `*`modifiers`*`)` | Replaces the first occurrence of *regex* with the *replacement*, use `'g'` modifier to replace all occurrences instead (see [Pattern Matching](../../sql/functions/patternmatching)) | `select regexp_replace('hello', '[lo]', '-')` | `he-lo` | +| `regexp_split_to_array(`*`string`*`, `*`regex`*`)` | Alias of `string_split_regex`. 
Splits the *string* along the *regex* | `regexp_split_to_array('hello␣world; 42', ';?␣')` | `['hello', 'world', '42']` | +| `regexp_extract(`*`string`*`, `*`regex`*`[, `*`group`*` = 0])` | Split the *string* along the *regex* and extract first occurrence of *group* | `regexp_extract('hello_world', '([a-z ]+)_?', 1)` | `hello` | +| `regexp_extract_all(`*`string`*`, `*`regex`*`[, `*`group`*` = 0])` | Split the *string* along the *regex* and extract all occurrences of *group* | `regexp_extract_all('hello_world', '([a-z ]+)_?', 1)` | `[hello, world]` | +| `repeat(`*`string`*`, `*`count`*`)` | Repeats the *string* *count* number of times | `repeat('A', 5)` | `AAAAA` | +| `replace(`*`string`*`, `*`source`*`, `*`target`*`)` | Replaces any occurrences of the *source* with *target* in *string* | `replace('hello', 'l', '-')` | `he--o` | +| `reverse(`*`string`*`)` | Reverses the *string* | `reverse('hello')` | `olleh` | +| `right(`*`string`*`, `*`count`*`)` | Extract the right-most *count* characters | `right('Hello🦆', 3)` | `lo🦆` | +| `right_grapheme(`*`string`*`, `*`count`*`)` | Extract the right-most *count* grapheme clusters | `right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` | `🤦🏽‍♀️` | +| `rpad(`*`string`*`, `*`count`*`, `*`character`*`)` | Pads the *string* with the character from the right until it has *count* characters | `rpad('hello', 10, '<')` | `hello<<<<<` | +| `rtrim(`*`string`*`)` | Removes any spaces from the right side of the *string* | `rtrim('␣␣␣␣test␣␣')` | `␣␣␣␣test` | +| `rtrim(`*`string`*`, `*`characters`*`)` | Removes any occurrences of any of the *characters* from the right side of the *string* | `rtrim('>>>>test<<', '><')` | `>>>>test` | +| `split_part(`*`string`*`, `*`separator`*`, `*`index`*`)` | Split the *string* along the *separator* and return the data at the (1-based) *index* of the list. If the *index* is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). 
| `split_part('a|b|c', '|', 2)` | `b` | +| `starts_with(`*`string`*`, `*`search_string`*`)` | Return true if *string* begins with *search_string* | `starts_with('abc', 'a')` | `true` | +| *`string`*` SIMILAR TO `*`regex`* | Returns `true` if the *string* matches the *regex*; identical to `regexp_full_match` (see [Pattern Matching](../../sql/functions/patternmatching)) | `'hello' SIMILAR TO 'l+'` | `false` | +| `strlen(`*`string`*`)` | Number of bytes in *string* | `strlen('🦆')` | `4` | +| `strpos(`*`string`*`, `*`search_string`*`)` | Alias of `instr`. Return location of first occurrence of *search_string* in *string*, counting from 1. Returns 0 if no match found. | `strpos('test test', 'es')` | `2` | +| `strip_accents(`*`string`*`)` | Strips accents from *string* | `strip_accents('mühleisen')` | `muhleisen` | +| `str_split(`*`string`*`, `*`separator`*`)` | Alias of `string_split`. Splits the *string* along the *separator* | `str_split('hello␣world', '␣')` | `['hello', 'world']` | +| `str_split_regex(`*`string`*`, `*`regex`*`)` | Alias of `string_split_regex`. Splits the *string* along the *regex* | `str_split_regex('hello␣world; 42', ';?␣')` | `['hello', 'world', '42']` | +| `string_split(`*`string`*`, `*`separator`*`)` | Splits the *string* along the *separator* | `string_split('hello␣world', '␣')` | `['hello', 'world']` | +| `string_split_regex(`*`string`*`, `*`regex`*`)` | Splits the *string* along the *regex* | `string_split_regex('hello␣world; 42', ';?␣')` | `['hello', 'world', '42']` | +| `string_to_array(`*`string`*`, `*`separator`*`)` | Alias of `string_split`. Splits the *string* along the *separator* | `string_to_array('hello␣world', '␣')` | `['hello', 'world']` | +| `substr(`*`string`*`, `*`start`*`, `*`length`*`)` | Alias of `substring`. Extract substring of *length* characters starting from character *start*. Note that a *start* value of `1` refers to the *first* character of the string. 
| `substr('Hello', 2, 2)` | `el` | +| `substring(`*`string`*`, `*`start`*`, `*`length`*`)` | Extract substring of *length* characters starting from character *start*. Note that a *start* value of `1` refers to the *first* character of the string. | `substring('Hello', 2, 2)` | `el` | +| `substring_grapheme(`*`string`*`, `*`start`*`, `*`length`*`)` | Extract substring of *length* grapheme clusters starting from character *start*. Note that a *start* value of `1` refers to the *first* character of the string. | `substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3, 2)` | `🤦🏽‍♀️🦆` | +| `suffix(`*`string`*`, `*`search_string`*`)` | Return true if *string* ends with *search_string*. | `suffix('abc', 'bc')` | `true` | +| `strpos(`*`string`*`, `*`characters`*`)` | Alias of `instr`. Return location of first occurrence of *characters* in *string*, counting from 1. Returns 0 if no match found. | `strpos('test test', 'es')` | 2 | +| `to_base64(`*`blob`*`)` | Convert a blob to a base64 encoded string. Alias of base64. | `to_base64('A'::blob)` | `QQ==` | +| `trim(`*`string`*`)` | Removes any spaces from either side of the *string* | `trim('␣␣␣␣test␣␣')` | `test` | +| `trim(`*`string`*`, `*`characters`*`)` | Removes any occurrences of any of the *characters* from either side of the *string* | `trim('>>>>test<<', '><')` | `test` | +| `ucase(`*`string`*`)` | Alias of `upper`. Convert *string* to upper case | `ucase('Hello')` | `HELLO` | +| `unicode(`*`string`*`)` | Returns the unicode code of the first character of the *string* | `unicode('ü')` | `252` | +| `upper(`*`string`*`)` | Convert *string* to upper case | `upper('Hello')` | `HELLO` | ## Text Similarity Functions diff --git a/docs/sql/functions/nested.md b/docs/sql/functions/nested.md index 44f344a7204..90557b65c16 100644 --- a/docs/sql/functions/nested.md +++ b/docs/sql/functions/nested.md @@ -10,62 +10,35 @@ In the descriptions, `l` is the three element list `[4, 5, 6]`. 
-| Function | Description | Example | Result | -|:---|:--|:---|:-| -| *`list`*`[`*`index`*`]` | Bracket notation serves as an alias for `list_extract`. | `l[3]` | `6` | -| `list_extract(`*`list`*`, `*`index`*`)` | Extract the `index`th (1-based) value from the list. | `list_extract(l, 3)` | `6` | -| `list_element(`*`list`*`, `*`index`*`)` | Alias for `list_extract`. | `list_element(l, 3)` | `6` | -| `array_extract(`*`list`*`, `*`index`*`)` | Alias for `list_extract`. | `array_extract(l, 3)` | `6` | -| *`list`*`[`*`begin`*`:`*`end`*`]` | Bracket notation with colon is an alias for `list_slice`. Missing arguments are interpreted as `NULL`s. | `l[2:3]` | `[5, 6]` | -| `list_slice(`*`list`*`, `*`begin`*`, `*`end`*`)` | Extract a sublist using slice conventions. `NULL`s are interpreted as the bounds of the `LIST`. Negative values are accepted. | `list_slice(l, 2, NULL)` | `[5, 6]` | -| `array_slice(`*`list`*`, `*`begin`*`, `*`end`*`)` | Alias for list_slice. | `array_slice(l, 2, NULL)` | `[5, 6]` | -| `array_pop_front(`*`list`*`)` | Returns the list without the first element. | `array_pop_front(l)` | `[5, 6]` | -| `array_pop_back(`*`list`*`)` | Returns the list without the last element. | `array_pop_back(l)` | `[4, 5]` | -| `list_value(`*`any`*`, ...)` | Create a `LIST` containing the argument values. | `list_value(4, 5, 6)` | `[4, 5, 6]` | -| `list_pack(`*`any`*`, ...)` | Alias for `list_value`. | `list_pack(4, 5, 6)` | `[4, 5, 6]` | -| `len(`*`list`*`)` | Return the length of the list. | `len([1, 2, 3])` | `3` | -| `array_length(`*`list`*`)` | Alias for `len`. | `array_length([1, 2, 3])` | `3` | -| `unnest(`*`list`*`)` | Unnests a list by one level. Note that this is a special function that alters the cardinality of the result. See the [UNNEST page](../query_syntax/unnest) for more details. | `unnest([1, 2, 3])` | `1`, `2`, `3` | -| `flatten(`*`list_of_lists`*`)` | Concatenate a list of lists into a single list. 
This only flattens one level of the list (see [examples](nested#flatten)). | `flatten([[1, 2], [3, 4]])` | `[1, 2, 3, 4]` | -| `list_concat(`*`list1`*`, `*`list2`*`)` | Concatenates two lists. | `list_concat([2, 3], [4, 5, 6])` | `[2, 3, 4, 5, 6]` | -| `list_cat(`*`list1`*`, `*`list2`*`)` | Alias for `list_concat`. | `list_cat([2, 3], [4, 5, 6])` | `[2, 3, 4, 5, 6]` | -| `array_concat(`*`list1`*`, `*`list2`*`)` | Alias for `list_concat`. | `array_concat([2, 3], [4, 5, 6])` | `[2, 3, 4, 5, 6]` | -| `array_cat(`*`list1`*`, `*`list2`*`)` | Alias for `list_concat`. | `array_cat([2, 3], [4, 5, 6])` | `[2, 3, 4, 5, 6]` | -| `list_prepend(`*`element`*`, `*`list`*`)` | Prepends `element` to `list`. | `list_prepend(3, [4, 5, 6])` | `[3, 4, 5, 6]` | -| `array_prepend(`*`element`*`, `*`list`*`)` | Alias for `list_prepend`. | `array_prepend(3, [4, 5, 6])` | `[3, 4, 5, 6]` | -| `array_push_front(`*`list`*`, `*`element`*`)` | Alias for `list_prepend`. | `array_push_front(l, 3)` | `[3, 4, 5,6]` | -| `list_append(`*`list`*`, `*`element`*`)` | Appends `element` to `list`. | `list_append([2, 3], 4)` | `[2, 3, 4]` | -| `array_append(`*`list`*`, `*`element`*`)` | Alias for `list_append`. | `array_append([2, 3], 4)` | `[2, 3, 4]` | -| `array_push_back(`*`list`*`, `*`element`*`)` | Alias for `list_append`. | `array_push_back(l, 7)` | `[4, 5, 6, 7]` | -| `list_contains(`*`list`*`, `*`element`*`)` | Returns true if the list contains the element. | `list_contains([1, 2, NULL], 1)` | `true` | -| `list_has(`*`list`*`, `*`element`*`)` | Alias for `list_contains`. | `list_has([1, 2, NULL], 1)` | `true` | -| `array_contains(`*`list`*`, `*`element`*`)` | Alias for `list_contains`. | `array_contains([1, 2, NULL], 1)` | `true` | -| `array_has(`*`list`*`, `*`element`*`)` | Alias for `list_contains`. | `array_has([1, 2, NULL], 1)` | `true` | -| `list_position(`*`list`*`, `*`element`*`)` | Returns the index of the element if the list contains the element. 
| `list_contains([1, 2, NULL], 2)` | `2` | -| `list_indexof(`*`list`*`, `*`element`*`)` | Alias for `list_position`. | `list_indexof([1, 2, NULL], 2)` | `2` | -| `array_position(`*`list`*`, `*`element`*`)` | Alias for `list_position`. | `array_position([1, 2, NULL], 2)` | `2` | -| `array_indexof(`*`list`*`, `*`element`*`)` | Alias for `list_position`. | `array_indexof([1, 2, NULL], 2)` | `2` | -| `list_aggregate(`*`list`*`, `*`name`*`)` | Executes the aggregate function `name` on the elements of `list`. See the [List Aggregates](nested#list-aggregates) section for more details. | `list_aggregate([1, 2, NULL], 'min')` | `1` | -| `list_aggr(`*`list`*`, `*`name`*`)` | Alias for `list_aggregate`. | `list_aggr([1, 2, NULL], 'min')` | `1` | -| `array_aggregate(`*`list`*`, `*`name`*`)` | Alias for `list_aggregate`. | `array_aggregate([1, 2, NULL], 'min')` | `1` | -| `array_aggr(`*`list`*`, `*`name`*`)` | Alias for `list_aggregate`. | `array_aggr([1, 2, NULL], 'min')` | `1` | -| `list_sort(`*`list`*`)` | Sorts the elements of the list. See the [Sorting Lists](nested#sorting-lists) section for more details about the sorting order and the null sorting order. | `list_sort([3, 6, 1, 2])` | `[1, 2, 3, 6]` | -| `array_sort(`*`list`*`)` | Alias for `list_sort`. | `array_sort([3, 6, 1, 2])` | `[1, 2, 3, 6]` | -| `list_reverse_sort(`*`list`*`)` | Sorts the elements of the list in reverse order. See the [Sorting Lists](nested#sorting-lists) section for more details about the null sorting order. | `list_reverse_sort([3, 6, 1, 2])` | `[6, 3, 2, 1]` | -| `array_reverse_sort(`*`list`*`)` | Alias for `list_reverse_sort`. | `array_reverse_sort([3, 6, 1, 2])` | `[6, 3, 2, 1]` | -| `list_transform(`*`list`*`, `*`lambda`*`)` | Returns a list that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions](nested#lambda-functions) section for more details. 
| `list_transform(l, x -> x + 1)` | `[5, 6, 7]` | -| `array_transform(`*`list`*`, `*`lambda`*`)` | Alias for `list_transform`. | `array_transform(l, x -> x + 1)` | `[5, 6, 7]` | -| `list_apply(`*`list`*`, `*`lambda`*`)` | Alias for `list_transform`. | `list_apply(l, x -> x + 1)` | `[5, 6, 7]` | -| `array_apply(`*`list`*`, `*`lambda`*`)` | Alias for `list_transform`. | `array_apply(l, x -> x + 1)` | `[5, 6, 7]` | -| `list_filter(`*`list`*`, `*`lambda`*`)` | Constructs a list from those elements of the input list for which the lambda function returns true. See the [Lambda Functions](nested#lambda-functions) section for more details. | `list_filter(l, x -> x > 4)` | `[5, 6]` | -| `array_filter(`*`list`*`, `*`lambda`*`)` | Alias for `list_filter`. | `array_filter(l, x -> x > 4)` | `[5, 6]` | -| `list_distinct(`*`list`*`)` | Removes all duplicates and NULLs from a list. Does not preserve the original order. | `list_distinct([1, 1, NULL, -3, 1, 5])` | `[1, 5, -3]` | -| `array_distinct(`*`list`*`)` | Alias for `list_distinct`. | `array_distinct([1, 1, NULL, -3, 1, 5])` | `[1, 5, -3]` | -| `list_unique(`*`list`*`)` | Counts the unique elements of a list. | `list_unique([1, 1, NULL, -3, 1, 5])` | `3` | -| `array_unique(`*`list`*`)` | Alias for `list_unique`. | `array_unique([1, 1, NULL, -3, 1, 5])` | `3` | -| `list_any_value(`*`list`*`)` | Returns the first non-null value in the list | `list_any_value([NULL, -3])` | `-3` | -| `list_resize(`*`list`*`, `*`size`*`[, `*`value`*`])` | Resizes the list to contain `size` elements. Initializes new elements with `value` or `NULL` if `value` is not set. | `list_resize([1,2,3], 5, 0)` | `[1, 2, 3, 0, 0]` | -| `array_resize(`*`list`*`, `*`size`*`[, `*`value`*`])` | Alias for `list_resize`. 
| `array_resize([1,2,3], 5, 0)` | `[1, 2, 3, 0, 0]` |
+| Function | Aliases | Description | Example | Result |
+|:-------------------------------------------------------------|:--------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------|:------------------|
+| *`list`*`[`*`index`*`]` | | Bracket notation serves as an alias for `list_extract`. | `l[3]` | `6` |
+| `list_extract(`*`list`*`, `*`index`*`)` | `list_element`, `array_extract` | Extract the `index`th (1-based) value from the list. | `list_extract(l, 3)` | `6` |
+| *`list`*`[`*`begin`*`:`*`end`*`]` | | Bracket notation with colon is an alias for `list_slice`. | `l[2:3]` | `[5, 6]` |
+| *`list`*`[`*`begin`*`:`*`end`*`: `*`step`*`]` | | `list_slice` in bracket notation with an added `step` feature. | `l[:-:2]` | `[4, 6]` |
+| `list_slice(`*`list`*`, `*`begin`*`, `*`end`*`)` | `array_slice` | Extract a sublist using slice conventions. Negative values are accepted. See [slicing](nested#slicing). | `list_slice(l, 2, 3)` | `[5, 6]` |
+| `list_slice(`*`list`*`, `*`begin`*`, `*`end`*`, `*`step`*`)` | `array_slice` | `list_slice` with added `step` feature. | `list_slice(l, 1, 3, 2)` | `[4, 6]` |
+| `list_reverse(`*`list`*`)` | `array_reverse` | Reverses the list. | `list_reverse(l)` | `[6, 5, 4]` |
+| `array_pop_front(`*`list`*`)` | | Returns the list without the first element. | `array_pop_front(l)` | `[5, 6]` |
+| `array_pop_back(`*`list`*`)` | | Returns the list without the last element. | `array_pop_back(l)` | `[4, 5]` |
+| `list_value(`*`any`*`, ...)` | `list_pack` | Create a `LIST` containing the argument values. | `list_value(4, 5, 6)` | `[4, 5, 6]` |
+| `len(`*`list`*`)` | `array_length` | Return the length of the list. | `len([1, 2, 3])` | `3` |
+| `unnest(`*`list`*`)` | | Unnests a list by one level. Note that this is a special function that alters the cardinality of the result. See the [UNNEST page](../query_syntax/unnest) for more details. | `unnest([1, 2, 3])` | `1`, `2`, `3` |
+| `flatten(`*`list_of_lists`*`)` | | Concatenate a list of lists into a single list. This only flattens one level of the list (see [examples](nested#flatten)). | `flatten([[1, 2], [3, 4]])` | `[1, 2, 3, 4]` |
+| `list_concat(`*`list1`*`, `*`list2`*`)` | `list_cat`, `array_concat`, `array_cat` | Concatenates two lists. | `list_concat([2, 3], [4, 5, 6])` | `[2, 3, 4, 5, 6]` |
+| `list_prepend(`*`element`*`, `*`list`*`)` | `array_prepend`, `array_push_front` | Prepends `element` to `list`. | `list_prepend(3, [4, 5, 6])` | `[3, 4, 5, 6]` |
+| `list_append(`*`list`*`, `*`element`*`)` | `array_append`, `array_push_back` | Appends `element` to `list`. | `list_append([2, 3], 4)` | `[2, 3, 4]` |
+| `list_contains(`*`list`*`, `*`element`*`)` | `list_has`, `array_contains`, `array_has` | Returns true if the list contains the element. | `list_contains([1, 2, NULL], 1)` | `true` |
+| `list_position(`*`list`*`, `*`element`*`)` | `list_indexof`, `array_position`, `array_indexof` | Returns the index of the element if the list contains the element. | `list_position([1, 2, NULL], 2)` | `2` |
+| `list_aggregate(`*`list`*`, `*`name`*`)` | `list_aggr`, `array_aggregate`, `array_aggr` | Executes the aggregate function `name` on the elements of `list`. See the [List Aggregates](nested#list-aggregates) section for more details. | `list_aggregate([1, 2, NULL], 'min')` | `1` |
+| `list_sort(`*`list`*`)` | `array_sort` | Sorts the elements of the list. See the [Sorting Lists](nested#sorting-lists) section for more details about the sorting order and the null sorting order. | `list_sort([3, 6, 1, 2])` | `[1, 2, 3, 6]` |
+| `list_reverse_sort(`*`list`*`)` | `array_reverse_sort` | Sorts the elements of the list in reverse order. See the [Sorting Lists](nested#sorting-lists) section for more details about the null sorting order. | `list_reverse_sort([3, 6, 1, 2])` | `[6, 3, 2, 1]` |
+| `list_transform(`*`list`*`, `*`lambda`*`)` | `array_transform`, `list_apply`, `array_apply` | Returns a list that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions](nested#lambda-functions) section for more details. | `list_transform(l, x -> x + 1)` | `[5, 6, 7]` |
+| `list_filter(`*`list`*`, `*`lambda`*`)` | `array_filter` | Constructs a list from those elements of the input list for which the lambda function returns true. See the [Lambda Functions](nested#lambda-functions) section for more details. | `list_filter(l, x -> x > 4)` | `[5, 6]` |
+| `list_distinct(`*`list`*`)` | `array_distinct` | Removes all duplicates and NULLs from a list. Does not preserve the original order. | `list_distinct([1, 1, NULL, -3, 1, 5])` | `[1, 5, -3]` |
+| `list_unique(`*`list`*`)` | `array_unique` | Counts the unique elements of a list. | `list_unique([1, 1, NULL, -3, 1, 5])` | `3` |
+| `list_any_value(`*`list`*`)` | | Returns the first non-null value in the list. | `list_any_value([NULL, -3])` | `-3` |
+| `list_resize(`*`list`*`, `*`size`*`[, `*`value`*`])` | `array_resize` | Resizes the list to contain `size` elements. Initializes new elements with `value` or `NULL` if `value` is not set. | `list_resize([1,2,3], 5, 0)` | `[1, 2, 3, 0, 0]` |
 ## List Operators
@@ -175,6 +148,57 @@ SELECT * FROM range(date '1992-01-01', date '1992-03-01', interval '1' month);
 └─────────────────────┘
 ```
+## Slicing
+The function `list_slice` can be used to extract a sublist from a list.
The following variants exist:
+- `list_slice(`*`list`*`, `*`begin`*`, `*`end`*`)`
+- `list_slice(`*`list`*`, `*`begin`*`, `*`end`*`, `*`step`*`)`
+- `array_slice(`*`list`*`, `*`begin`*`, `*`end`*`)`
+- `array_slice(`*`list`*`, `*`begin`*`, `*`end`*`, `*`step`*`)`
+- `list[`*`begin`*`:`*`end`*`]`
+- `list[`*`begin`*`:`*`end`*`:`*`step`*`]`
+
+**`list`**
+- Is the list to be sliced
+
+**`begin`**
+- Is the index of the first element to be included in the slice
+- When `begin < 0` the index is counted from the end of the list
+- When `begin < 0` and `-begin > length`, `begin` is clamped to the beginning of the list
+- When `begin > length`, the result is an empty list
+- **Bracket Notation:** When `begin` is omitted, it defaults to the beginning of the list
+
+**`end`**
+- Is the index of the last element to be included in the slice
+- When `end < 0` the index is counted from the end of the list
+- When `end > length`, `end` is clamped to `length`
+- When `end < begin`, the result is an empty list
+- **Bracket Notation:** When `end` is omitted, it defaults to the end of the list. When `end` is omitted and a `step` is provided, `end` must be replaced with a `-`.
+
+**`step`** *(optional)*
+- Is the step size between elements in the slice
+- When `step < 0` the slice is reversed, and `begin` and `end` are swapped
+- Must be non-zero
+
+```sql
+SELECT list_slice([1, 2, 3, 4, 5], 2, 4);
+-- [2, 3, 4]
+
+SELECT ([1, 2, 3, 4, 5])[2:4:2];
+-- [2, 4]
+
+SELECT ([1, 2, 3, 4, 5])[4:2:-2];
+-- [4, 2]
+
+SELECT ([1, 2, 3, 4, 5])[:];
+-- [1, 2, 3, 4, 5]
+
+SELECT ([1, 2, 3, 4, 5])[:-:2];
+-- [1, 3, 5]
+
+SELECT ([1, 2, 3, 4, 5])[:-:-2];
+-- [5, 3, 1]
+```
+
 ## List Aggregates
 The function `list_aggregate` allows the execution of arbitrary existing aggregate functions on the elements of a list. Its first argument is the list (column), its second argument is the aggregate function name, e.g., `min`, `histogram` or `sum`.