Eliminate trailing spaces and various other small fixes #4051

Merged
merged 4 commits, Nov 12, 2024
6 changes: 3 additions & 3 deletions _posts/2023-02-13-announcing-duckdb-070.md
@@ -69,7 +69,7 @@ orders

Note that parallel writing is currently limited to non-insertion-order-preserving writes – this can be toggled by setting the `preserve_insertion_order` setting to false. In a future release we aim to alleviate this restriction and parallelize insertion-order-preserving writes as well.
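For reference, the toggle mentioned above is an ordinary setting; a minimal sketch of flipping it:

```sql
-- Opt out of insertion-order preservation so that writes can be parallelized.
SET preserve_insertion_order = false;
```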

#### Multi-Database Support
#### Multi-Database Support

**Attach Functionality.** This release adds support for [attaching multiple databases](https://github.com/duckdb/duckdb/pull/5764) to the same DuckDB instance. This easily allows data to be transferred between separate DuckDB database files, and also allows data from separate database files to be combined together in individual queries. Remote DuckDB instances (stored on a network accessible location like GitHub, for example) may also be attached.
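A minimal sketch of the attach workflow described above, assuming an existing `orders` table in the current database (the file and table names here are illustrative, not taken from the post):

```sql
-- Attach a second DuckDB database file and move data between the two databases.
ATTACH 'other.duckdb' AS other_db;
CREATE TABLE other_db.orders AS
    SELECT * FROM orders;
SELECT count(*) FROM other_db.orders;
DETACH other_db;
```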

@@ -82,7 +82,7 @@ DETACH new_db;

See the [documentation for more information]({% link docs/sql/statements/attach.md %}).

**SQLite Storage Back-end.** In addition to adding support for attaching DuckDB databases – this release also adds support for [*pluggable database engines*](https://github.com/duckdb/duckdb/pull/6066). This allows extensions to define their own database and catalog engines that can be attached to the system. Once attached, an engine can support both reads and writes. The [SQLite extension](https://github.com/duckdb/sqlite_scanner) makes use of this to add native read/write support for SQLite database files to DuckDB.
**SQLite Storage Back-End.** In addition to adding support for attaching DuckDB databases – this release also adds support for [*pluggable database engines*](https://github.com/duckdb/duckdb/pull/6066). This allows extensions to define their own database and catalog engines that can be attached to the system. Once attached, an engine can support both reads and writes. The [SQLite extension](https://github.com/duckdb/sqlite_scanner) makes use of this to add native read/write support for SQLite database files to DuckDB.

```sql
ATTACH 'sqlite_file.db' AS sqlite (TYPE sqlite);
@@ -118,7 +118,7 @@ FROM movies;

See the [documentation for more information]({% link docs/sql/statements/insert.md %}#on-conflict-clause).

**Lateral Joins.** Support for [lateral joins](https://github.com/duckdb/duckdb/pull/5393) is added in this release. Lateral joins are a more flexible variant of correlated subqueries that make working with nested data easier, as they allow [easier unnesting](https://github.com/duckdb/duckdb/pull/5485) of nested data.
**Lateral Joins.** Support for [lateral joins](https://github.com/duckdb/duckdb/pull/5393) is added in this release. Lateral joins are a more flexible variant of correlated subqueries that make working with nested data easier, as they allow [easier unnesting](https://github.com/duckdb/duckdb/pull/5485) of nested data.
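A minimal sketch of a lateral join, assuming a small example table (names and values are illustrative):

```sql
-- The right-hand subquery can reference columns of the table on its left.
CREATE TABLE t (a INTEGER);
INSERT INTO t VALUES (1), (2), (3);
SELECT *
FROM t, LATERAL (SELECT a + 1 AS a_plus_one);
```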

**Positional Joins.** While SQL formally models unordered sets, in practice the order of datasets does frequently have a meaning. DuckDB offers guarantees around maintaining the order of rows when loading data into tables or when exporting data back out to a file – as well as when executing queries such as `LIMIT` without a corresponding `ORDER BY` clause.
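A minimal sketch of a positional join under that ordering guarantee (table and column names are illustrative):

```sql
-- Combine two equally long, equally ordered relations row by row, by position.
CREATE TABLE labels (name VARCHAR);
CREATE TABLE measurements (value DOUBLE);
INSERT INTO labels VALUES ('a'), ('b');
INSERT INTO measurements VALUES (1.5), (2.5);
SELECT name, value
FROM labels
POSITIONAL JOIN measurements;
```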

26 changes: 13 additions & 13 deletions _posts/2024-08-08-friendly-lists-and-their-buddies-the-lambdas.md
@@ -92,10 +92,10 @@ In SQL, it would look like this:

```sql
WITH flattened_tbl AS (
SELECT unnest(l) AS elements, n, rowid
SELECT unnest(l) AS elements, n, rowid
FROM my_lists
)
SELECT array_agg(elements + n) AS result
SELECT array_agg(elements + n) AS result
FROM flattened_tbl
GROUP BY rowid
ORDER BY rowid;
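-- Hedged aside, not part of the original query: the lambda-based alternative this post
-- builds toward would express the same transformation roughly as
--   SELECT list_transform(l, x -> x + n) AS result FROM my_lists;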
@@ -168,7 +168,7 @@ Firstly, we added 1M rows to our table `my_lists`, each containing five elements

```sql
INSERT INTO my_lists
SELECT [r, r % 10, r + 5, r + 11, r % 2], r
SELECT [r, r % 10, r + 5, r + 11, r % 2], r
FROM range(1_000_000) AS tbl(r);
```

@@ -253,24 +253,24 @@ For our example, we assume that input BSNs are of type `INTEGER[]`.

```sql
CREATE OR REPLACE TABLE bsn_tbl AS
FROM VALUES
([2, 4, 6, 7, 4, 7, 5, 9, 6]),
([1, 2, 3, 4, 5, 6, 7, 8, 9]),
([7, 6, 7, 4, 4, 5, 2, 1, 1]),
([8, 7, 9, 0, 2, 3, 4, 1, 7]),
([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])
tbl(bsn);
FROM VALUES
([2, 4, 6, 7, 4, 7, 5, 9, 6]),
([1, 2, 3, 4, 5, 6, 7, 8, 9]),
([7, 6, 7, 4, 4, 5, 2, 1, 1]),
([8, 7, 9, 0, 2, 3, 4, 1, 7]),
([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])
tbl(bsn);
```

#### Solution

When this problem was initially proposed, DuckDB didn't have support for `list_reduce`.
When this problem was initially proposed, DuckDB didn't have support for `list_reduce`.
Instead, the user came up with the following:

```sql
CREATE OR REPLACE MACRO valid_bsn(bsn) AS (
list_sum(
[array_extract(bsn, x)::INTEGER * (IF (x = 9, -1, 10 - x))
[array_extract(bsn, x)::INTEGER * (IF (x = 9, -1, 10 - x))
FOR x IN range(1, 10, 1)]
) % 11 = 0
);
@@ -281,7 +281,7 @@ We also added a check validating that the length is always nine digits.

```sql
CREATE OR REPLACE MACRO valid_bsn(bsn) AS (
list_reduce(list_reverse(bsn),
list_reduce(list_reverse(bsn),
(x, y, i) -> IF (i = 1, -x, x) + y * (i + 1)) % 11 = 0
AND len(bsn) = 9
);
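-- Hedged usage sketch, not part of the original snippet: applying the macro to the table above,
-- e.g., SELECT bsn, valid_bsn(bsn) AS valid FROM bsn_tbl;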
6 changes: 3 additions & 3 deletions docs/api/julia.md
@@ -42,7 +42,7 @@ results = DBInterface.execute(con, "SELECT 42 a")
print(results)
```

Some SQL statements, such as PIVOT and IMPORT DATABASE are executed as multiple prepared statements and will error when using `DuckDB.execute()`. Instead they can be run with `DuckDB.query()` instead of `DuckDB.execute()` and will always return a materialized result.
Some SQL statements, such as `PIVOT` and `IMPORT DATABASE`, are executed as multiple prepared statements and will error when using `DuckDB.execute()`. They can instead be run with `DuckDB.query()`, which always returns a materialized result.

## Scanning DataFrames

@@ -94,7 +94,7 @@ for i in eachrow(df)
end
DuckDB.end_row(appender)
end
# close the appender after all rows
# close the appender after all rows
DuckDB.close(appender)
```

@@ -145,7 +145,7 @@ function run_appender(db, id)
for j in row
DuckDB.append(appender, j);
end
DuckDB.end_row(appender);
DuckDB.end_row(appender);
end
DuckDB.close(appender);
end
2 changes: 1 addition & 1 deletion docs/api/python/overview.md
@@ -226,7 +226,7 @@ con.load_extension("spatial")

### Community Extensions

To load [community extensions]({% link docs/extensions/community_extensions.md %}), use `repository="community"` argument to the `install_extension` method.
To load [community extensions]({% link docs/extensions/community_extensions.md %}), pass the `repository="community"` argument to the `install_extension` method.

For example, install and load the `h3` community extension as follows:

4 changes: 2 additions & 2 deletions docs/configuration/pragmas.md
@@ -342,7 +342,7 @@ SET enable_profiling = 'query_tree_optimizer';
Database drivers and other applications can also access profiling information through API calls, in which case users can disable any other output.
Even though the parameter reads `no_output`, it is essential to note that this **only** affects printing to the configurable output.
When accessing profiling information through API calls, it is still crucial to enable profiling:

```sql
SET enable_profiling = 'no_output';
```
@@ -383,7 +383,7 @@ Using the `custom_profiling_settings` `PRAGMA`, each metric, including those fro
This `PRAGMA` accepts a JSON object with metric names as keys and boolean values to toggle them on or off.
Settings specified by this `PRAGMA` override the default behavior.

> Note This only affects the metrics when the `enable_profiling` is set to `json` or `no_output`.
> Note This only affects the metrics when the `enable_profiling` is set to `json` or `no_output`.
> The `query_tree` and `query_tree_optimizer` always use a default set of metrics.

In the following example, the `CPU_TIME` metric is disabled.
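The example itself is collapsed in this diff; a minimal sketch of such a call, assuming the JSON-string form of the setting (the exact metric list is illustrative):

```sql
-- Turn CPU_TIME off while explicitly keeping two operator metrics on.
SET custom_profiling_settings = '{"CPU_TIME": "false", "OPERATOR_TIMING": "true", "OPERATOR_CARDINALITY": "true"}';
```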
4 changes: 2 additions & 2 deletions docs/data/csv/auto_detection.md
@@ -110,7 +110,7 @@ The type detection works by attempting to convert the values in each column to t

Note that everything can be cast to `VARCHAR`. This type has the lowest priority – i.e., columns are converted to `VARCHAR` if they cannot be cast to anything else. In [`flights.csv`](/data/flights.csv) the `FlightDate` column will be cast to a `DATE`, while the other columns will be cast to `VARCHAR`.

The set of candidate types that should be considered by the CSV reader can be explicitly specified using the [`auto_type_candidates`]({% link docs/data/csv/overview.md %}#auto_type_candidates-details) option.
The set of candidate types that should be considered by the CSV reader can be explicitly specified using the [`auto_type_candidates`]({% link docs/data/csv/overview.md %}#auto_type_candidates-details) option.

In addition to the default set of candidate types, other types that may be specified using the `auto_type_candidates` option are:

@@ -135,7 +135,7 @@ The detected types can be individually overridden using the `types` option. This
* A list of type definitions (e.g., `types = ['INTEGER', 'VARCHAR', 'DATE']`). This overrides the types of the columns in order of occurrence in the CSV file.
* Alternatively, `types` takes a `name` → `type` map which overrides the types of individual columns (e.g., `types = {'quarter': 'INTEGER'}`).

The set of column types that may be specified using the `types` option is not as limited as the types available for the `auto_type_candidates` option: any valid type definition is acceptable to the `types`-option. (To get a valid type definition, use the [`typeof()`]({% link docs/sql/functions/utility.md %}#typeofexpression) function, or use the `column_type` column of the [`DESCRIBE`]({% link docs/guides/meta/describe.md %}) result.)
The set of column types that may be specified using the `types` option is not as limited as the types available for the `auto_type_candidates` option: any valid type definition is acceptable for the `types` option. (To get a valid type definition, use the [`typeof()`]({% link docs/sql/functions/utility.md %}#typeofexpression) function, or use the `column_type` column of the [`DESCRIBE`]({% link docs/guides/meta/describe.md %}) result.)
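A hedged sketch combining both options, following the `flights.csv` example above (the candidate list and the `FlightDate` override are illustrative):

```sql
-- Limit the sniffer's candidate types, then pin one column's type explicitly.
SELECT *
FROM read_csv('flights.csv',
    auto_type_candidates = ['BIGINT', 'DATE', 'VARCHAR'],
    types = {'FlightDate': 'DATE'});
```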

The `sniff_csv()` function's `Column` field returns a struct with column names and types that can be used as a basis for overriding types.

5 changes: 3 additions & 2 deletions docs/dev/building/build_configuration.md
@@ -44,8 +44,9 @@ This doesn't actually create a build, but uses the following format checkers to

The CI will also run this check and will fail if the check fails.

## Extension selection
[Core DuckDB extensions]({% link docs/extensions/core_extensions.md %}) are that are the one maintaned by the DuckDB team, that are hosted in the duckdb GitHub repository, and are served by the `core` extension repository.
## Extension Selection

[Core DuckDB extensions]({% link docs/extensions/core_extensions.md %}) are the ones maintained by the DuckDB team. These are hosted in the `duckdb` GitHub organization and are served by the `core` extension repository.

Core extensions can be built as part of DuckDB via the `CORE_EXTENSION` flag, which lists the names of the extensions to be built.

2 changes: 1 addition & 1 deletion docs/dev/building/build_instructions.md
@@ -119,7 +119,7 @@ pacman -Syu git mingw-w64-x86_64-toolchain mingw-w64-x86_64-cmake mingw-w64-x86_
git clone https://github.com/duckdb/duckdb
cd duckdb
cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DBUILD_EXTENSIONS="icu;parquet;json"
cmake --build . --config Release
cmake --build . --config Release
```

Once the build finishes successfully, you can find the `duckdb.exe` binary in the repository's directory:
26 changes: 13 additions & 13 deletions docs/dev/profiling.md
@@ -41,23 +41,23 @@ For more information, see the [“Profiling”]({% link docs/configuration/pragm
The query tree has two types of nodes: the `QUERY_ROOT` and `OPERATOR` nodes.
The `QUERY_ROOT` refers exclusively to the top-level node, and the metrics it contains are measured over the entire query.
The `OPERATOR` nodes refer to the individual operators in the query plan.
Some metrics are only available for `QUERY_ROOT` nodes, while others are only for `OPERATOR` nodes.
Some metrics are only available for `QUERY_ROOT` nodes, while others are only for `OPERATOR` nodes.
The table below describes each metric and which nodes they are available for.

Other than `QUERY_NAME` and `OPERATOR_TYPE`, it is possible to turn all metrics on or off.

| Metric | Return type | Unit | Query | Operator | Description |
|-------------------------|-------------|----------|:-----:|:--------:|-------------------------------------------------------------------------------------------------------------------------------|
| `BLOCKED_THREAD_TIME`   | `double` | seconds  | ✓ |   | The total time threads are blocked.                                                                                           |
| `EXTRA_INFO`            | `string` |          | ✓ | ✓ | Unique operator metrics.                                                                                                      |
| `LATENCY`               | `double` | seconds  | ✓ |   | The total elapsed query execution time.                                                                                       |
| `OPERATOR_CARDINALITY`  | `uint64` | absolute |   | ✓ | The cardinality of each operator, i.e., the number of rows it returns to its parent. Operator equivalent of `ROWS_RETURNED`.  |
| `OPERATOR_ROWS_SCANNED` | `uint64` | absolute |   | ✓ | The total rows scanned by each operator.                                                                                      |
| `OPERATOR_TIMING`       | `double` | seconds  |   | ✓ | The time taken by each operator. Operator equivalent of `LATENCY`.                                                            |
| `OPERATOR_TYPE`         | `string` |          |   | ✓ | The name of each operator.                                                                                                    |
| `QUERY_NAME`            | `string` |          | ✓ |   | The query string.                                                                                                             |
| `RESULT_SET_SIZE`       | `uint64` | bytes    | ✓ | ✓ | The size of the result.                                                                                                       |
| `ROWS_RETURNED`         | `uint64` | absolute | ✓ |   | The number of rows returned by the query.                                                                                     |

### Cumulative Metrics

@@ -120,7 +120,7 @@ The following are the metrics supported in the physical planner:
## Custom Metrics Examples

The following examples demonstrate how to enable custom profiling and set the output format to `json`.
In the first example, we enable profiling and set the output to a file.
In the first example, we enable profiling and set the output to a file.
We only enable `EXTRA_INFO`, `OPERATOR_CARDINALITY`, and `OPERATOR_TIMING`.

```sql
@@ -224,7 +224,7 @@ The contents of the outputted file:
"result_set_size": 32,
"cpu_time": 0.000095,
"children": [
...
...
```

## Query Graphs
2 changes: 1 addition & 1 deletion docs/extensions/community_extensions.md
@@ -1,6 +1,6 @@
---
layout: docu
title: Community Extensions
title: Community Extensions
---

DuckDB recently launched a [Community Extensions repository](https://github.com/duckdb/community-extensions).
2 changes: 1 addition & 1 deletion docs/extensions/core_extensions.md
@@ -2,7 +2,7 @@
layout: docu
title: Core Extensions
redirect_from:
- docs/extensions/official_extensions
- docs/extensions/official_extensions
---

## List of Core Extensions