Eliminate trailing spaces and various other small fixes #4051

Merged
merged 4 commits, Nov 12, 2024
6 changes: 3 additions & 3 deletions _posts/2023-02-13-announcing-duckdb-070.md
@@ -69,7 +69,7 @@ orders

Note that parallel writing is currently limited to non-insertion-order-preserving writes – this can be toggled by setting the `preserve_insertion_order` setting to false. In a future release we aim to alleviate this restriction and parallelize insertion-order-preserving writes as well.
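For reference, the toggle mentioned above is an ordinary setting; a minimal sketch of flipping it:

```sql
-- Opt out of insertion-order preservation so that writes can be parallelized.
SET preserve_insertion_order = false;
```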

#### Multi-Database Support
#### Multi-Database Support

**Attach Functionality.** This release adds support for [attaching multiple databases](https://github.com/duckdb/duckdb/pull/5764) to the same DuckDB instance. This easily allows data to be transferred between separate DuckDB database files, and also allows data from separate database files to be combined together in individual queries. Remote DuckDB instances (stored on a network accessible location like GitHub, for example) may also be attached.
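A minimal sketch of the attach workflow described above, assuming an existing `orders` table in the current database (the file and table names here are illustrative, not taken from the post):

```sql
-- Attach a second DuckDB database file and move data between the two databases.
ATTACH 'other.duckdb' AS other_db;
CREATE TABLE other_db.orders AS
    SELECT * FROM orders;
SELECT count(*) FROM other_db.orders;
DETACH other_db;
```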

@@ -82,7 +82,7 @@ DETACH new_db;

See the [documentation for more information]({% link docs/sql/statements/attach.md %}).

**SQLite Storage Back-end.** In addition to adding support for attaching DuckDB databases – this release also adds support for [*pluggable database engines*](https://github.com/duckdb/duckdb/pull/6066). This allows extensions to define their own database and catalog engines that can be attached to the system. Once attached, an engine can support both reads and writes. The [SQLite extension](https://github.com/duckdb/sqlite_scanner) makes use of this to add native read/write support for SQLite database files to DuckDB.
**SQLite Storage Back-End.** In addition to adding support for attaching DuckDB databases – this release also adds support for [*pluggable database engines*](https://github.com/duckdb/duckdb/pull/6066). This allows extensions to define their own database and catalog engines that can be attached to the system. Once attached, an engine can support both reads and writes. The [SQLite extension](https://github.com/duckdb/sqlite_scanner) makes use of this to add native read/write support for SQLite database files to DuckDB.

```sql
ATTACH 'sqlite_file.db' AS sqlite (TYPE sqlite);
@@ -118,7 +118,7 @@ FROM movies;

See the [documentation for more information]({% link docs/sql/statements/insert.md %}#on-conflict-clause).

**Lateral Joins.** Support for [lateral joins](https://github.com/duckdb/duckdb/pull/5393) is added in this release. Lateral joins are a more flexible variant of correlated subqueries that make working with nested data easier, as they allow [easier unnesting](https://github.com/duckdb/duckdb/pull/5485) of nested data.
**Lateral Joins.** Support for [lateral joins](https://github.com/duckdb/duckdb/pull/5393) is added in this release. Lateral joins are a more flexible variant of correlated subqueries that make working with nested data easier, as they allow [easier unnesting](https://github.com/duckdb/duckdb/pull/5485) of nested data.
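A minimal sketch of a lateral join, assuming a small example table (names and values are illustrative):

```sql
-- The right-hand subquery can reference columns of the table on its left.
CREATE TABLE t (a INTEGER);
INSERT INTO t VALUES (1), (2), (3);
SELECT *
FROM t, LATERAL (SELECT a + 1 AS a_plus_one);
```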

**Positional Joins.** While SQL formally models unordered sets, in practice the order of datasets does frequently have a meaning. DuckDB offers guarantees around maintaining the order of rows when loading data into tables or when exporting data back out to a file – as well as when executing queries such as `LIMIT` without a corresponding `ORDER BY` clause.
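A minimal sketch of a positional join under that ordering guarantee (table and column names are illustrative):

```sql
-- Combine two equally long, equally ordered relations row by row, by position.
CREATE TABLE labels (name VARCHAR);
CREATE TABLE measurements (value DOUBLE);
INSERT INTO labels VALUES ('a'), ('b');
INSERT INTO measurements VALUES (1.5), (2.5);
SELECT name, value
FROM labels
POSITIONAL JOIN measurements;
```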

26 changes: 13 additions & 13 deletions _posts/2024-08-08-friendly-lists-and-their-buddies-the-lambdas.md
@@ -92,10 +92,10 @@ In SQL, it would look like this:

```sql
WITH flattened_tbl AS (
SELECT unnest(l) AS elements, n, rowid
SELECT unnest(l) AS elements, n, rowid
FROM my_lists
)
SELECT array_agg(elements + n) AS result
SELECT array_agg(elements + n) AS result
FROM flattened_tbl
GROUP BY rowid
ORDER BY rowid;
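-- Hedged aside, not part of the original query: the lambda-based alternative this post
-- builds toward would express the same transformation roughly as
--   SELECT list_transform(l, x -> x + n) AS result FROM my_lists;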
@@ -168,7 +168,7 @@ Firstly, we added 1M rows to our table `my_lists`, each containing five elements

```sql
INSERT INTO my_lists
SELECT [r, r % 10, r + 5, r + 11, r % 2], r
SELECT [r, r % 10, r + 5, r + 11, r % 2], r
FROM range(1_000_000) AS tbl(r);
```

@@ -253,24 +253,24 @@ For our example, we assume that input BSNs are of type `INTEGER[]`.

```sql
CREATE OR REPLACE TABLE bsn_tbl AS
FROM VALUES
([2, 4, 6, 7, 4, 7, 5, 9, 6]),
([1, 2, 3, 4, 5, 6, 7, 8, 9]),
([7, 6, 7, 4, 4, 5, 2, 1, 1]),
([8, 7, 9, 0, 2, 3, 4, 1, 7]),
([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])
tbl(bsn);
FROM VALUES
([2, 4, 6, 7, 4, 7, 5, 9, 6]),
([1, 2, 3, 4, 5, 6, 7, 8, 9]),
([7, 6, 7, 4, 4, 5, 2, 1, 1]),
([8, 7, 9, 0, 2, 3, 4, 1, 7]),
([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])
tbl(bsn);
```

#### Solution

When this problem was initially proposed, DuckDB didn't have support for `list_reduce`.
When this problem was initially proposed, DuckDB didn't have support for `list_reduce`.
Instead, the user came up with the following:

```sql
CREATE OR REPLACE MACRO valid_bsn(bsn) AS (
list_sum(
[array_extract(bsn, x)::INTEGER * (IF (x = 9, -1, 10 - x))
[array_extract(bsn, x)::INTEGER * (IF (x = 9, -1, 10 - x))
FOR x IN range(1, 10, 1)]
) % 11 = 0
);
@@ -281,7 +281,7 @@ We also added a check validating that the length is always nine digits.

```sql
CREATE OR REPLACE MACRO valid_bsn(bsn) AS (
list_reduce(list_reverse(bsn),
list_reduce(list_reverse(bsn),
(x, y, i) -> IF (i = 1, -x, x) + y * (i + 1)) % 11 = 0
AND len(bsn) = 9
);
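-- Hedged usage sketch, not part of the original snippet: applying the macro to the table above,
-- e.g., SELECT bsn, valid_bsn(bsn) AS valid FROM bsn_tbl;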
6 changes: 3 additions & 3 deletions docs/api/julia.md
@@ -42,7 +42,7 @@ results = DBInterface.execute(con, "SELECT 42 a")
print(results)
```

Some SQL statements, such as PIVOT and IMPORT DATABASE are executed as multiple prepared statements and will error when using `DuckDB.execute()`. Instead they can be run with `DuckDB.query()` instead of `DuckDB.execute()` and will always return a materialized result.
Some SQL statements, such as `PIVOT` and `IMPORT DATABASE`, are executed as multiple prepared statements and will error when using `DuckDB.execute()`. They can instead be run with `DuckDB.query()`, which always returns a materialized result.

## Scanning DataFrames

@@ -94,7 +94,7 @@ for i in eachrow(df)
end
DuckDB.end_row(appender)
end
# close the appender after all rows
# close the appender after all rows
DuckDB.close(appender)
```

@@ -145,7 +145,7 @@ function run_appender(db, id)
for j in row
DuckDB.append(appender, j);
end
DuckDB.end_row(appender);
DuckDB.end_row(appender);
end
DuckDB.close(appender);
end
2 changes: 1 addition & 1 deletion docs/api/python/overview.md
@@ -226,7 +226,7 @@ con.load_extension("spatial")

### Community Extensions

To load [community extensions]({% link docs/extensions/community_extensions.md %}), use `repository="community"` argument to the `install_extension` method.
To load [community extensions]({% link docs/extensions/community_extensions.md %}), pass the `repository="community"` argument to the `install_extension` method.

For example, install and load the `h3` community extension as follows:

4 changes: 2 additions & 2 deletions docs/configuration/pragmas.md
@@ -342,7 +342,7 @@ SET enable_profiling = 'query_tree_optimizer';
Database drivers and other applications can also access profiling information through API calls, in which case users can disable any other output.
Even though the parameter reads `no_output`, it is essential to note that this **only** affects printing to the configurable output.
When accessing profiling information through API calls, it is still crucial to enable profiling:

```sql
SET enable_profiling = 'no_output';
```
@@ -383,7 +383,7 @@ Using the `custom_profiling_settings` `PRAGMA`, each metric, including those fro
This `PRAGMA` accepts a JSON object with metric names as keys and boolean values to toggle them on or off.
Settings specified by this `PRAGMA` override the default behavior.

> Note This only affects the metrics when the `enable_profiling` is set to `json` or `no_output`.
> Note This only affects the metrics when the `enable_profiling` is set to `json` or `no_output`.
> The `query_tree` and `query_tree_optimizer` always use a default set of metrics.

In the following example, the `CPU_TIME` metric is disabled.
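The example itself is collapsed in this diff; a minimal sketch of such a call, assuming the JSON-string form of the setting (the exact metric list is illustrative):

```sql
-- Turn CPU_TIME off while explicitly keeping two operator metrics on.
SET custom_profiling_settings = '{"CPU_TIME": "false", "OPERATOR_TIMING": "true", "OPERATOR_CARDINALITY": "true"}';
```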
4 changes: 2 additions & 2 deletions docs/data/csv/auto_detection.md
@@ -110,7 +110,7 @@ The type detection works by attempting to convert the values in each column to t

Note that everything can be cast to `VARCHAR`. This type has the lowest priority – i.e., columns are converted to `VARCHAR` if they cannot be cast to anything else. In [`flights.csv`](/data/flights.csv) the `FlightDate` column will be cast to a `DATE`, while the other columns will be cast to `VARCHAR`.

The set of candidate types that should be considered by the CSV reader can be explicitly specified using the [`auto_type_candidates`]({% link docs/data/csv/overview.md %}#auto_type_candidates-details) option.
The set of candidate types that should be considered by the CSV reader can be explicitly specified using the [`auto_type_candidates`]({% link docs/data/csv/overview.md %}#auto_type_candidates-details) option.

In addition to the default set of candidate types, other types that may be specified using the `auto_type_candidates` option are:

@@ -135,7 +135,7 @@ The detected types can be individually overridden using the `types` option. This
* A list of type definitions (e.g., `types = ['INTEGER', 'VARCHAR', 'DATE']`). This overrides the types of the columns in order of occurrence in the CSV file.
* Alternatively, `types` takes a `name` → `type` map which overrides the types of individual columns (e.g., `types = {'quarter': 'INTEGER'}`).

The set of column types that may be specified using the `types` option is not as limited as the types available for the `auto_type_candidates` option: any valid type definition is acceptable to the `types`-option. (To get a valid type definition, use the [`typeof()`]({% link docs/sql/functions/utility.md %}#typeofexpression) function, or use the `column_type` column of the [`DESCRIBE`]({% link docs/guides/meta/describe.md %}) result.)
The set of column types that may be specified using the `types` option is not as limited as the types available for the `auto_type_candidates` option: any valid type definition is acceptable for the `types` option. (To get a valid type definition, use the [`typeof()`]({% link docs/sql/functions/utility.md %}#typeofexpression) function, or use the `column_type` column of the [`DESCRIBE`]({% link docs/guides/meta/describe.md %}) result.)
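A hedged sketch combining both options, following the `flights.csv` example above (the candidate list and the `FlightDate` override are illustrative):

```sql
-- Limit the sniffer's candidate types, then pin one column's type explicitly.
SELECT *
FROM read_csv('flights.csv',
    auto_type_candidates = ['BIGINT', 'DATE', 'VARCHAR'],
    types = {'FlightDate': 'DATE'});
```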

The `sniff_csv()` function's `Column` field returns a struct with column names and types that can be used as a basis for overriding types.

5 changes: 3 additions & 2 deletions docs/dev/building/build_configuration.md
@@ -44,8 +44,9 @@ This doesn't actually create a build, but uses the following format checkers to

The CI will also run this check and will fail if the check fails.

## Extension selection
[Core DuckDB extensions]({% link docs/extensions/core_extensions.md %}) are that are the one maintaned by the DuckDB team, that are hosted in the duckdb GitHub repository, and are served by the `core` extension repository.
## Extension Selection

[Core DuckDB extensions]({% link docs/extensions/core_extensions.md %}) are the ones maintained by the DuckDB team. These are hosted in the `duckdb` GitHub organization and are served by the `core` extension repository.

Core extensions can be built as part of DuckDB via the `CORE_EXTENSION` flag, which lists the names of the extensions to be built.

2 changes: 1 addition & 1 deletion docs/dev/building/build_instructions.md
@@ -119,7 +119,7 @@ pacman -Syu git mingw-w64-x86_64-toolchain mingw-w64-x86_64-cmake mingw-w64-x86_
git clone https://github.com/duckdb/duckdb
cd duckdb
cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DBUILD_EXTENSIONS="icu;parquet;json"
cmake --build . --config Release
cmake --build . --config Release
```

Once the build finishes successfully, you can find the `duckdb.exe` binary in the repository's directory:
26 changes: 13 additions & 13 deletions docs/dev/profiling.md
@@ -41,23 +41,23 @@ For more information, see the [“Profiling”]({% link docs/configuration/pragm
The query tree has two types of nodes: the `QUERY_ROOT` and `OPERATOR` nodes.
The `QUERY_ROOT` refers exclusively to the top-level node, and the metrics it contains are measured over the entire query.
The `OPERATOR` nodes refer to the individual operators in the query plan.
Some metrics are only available for `QUERY_ROOT` nodes, while others are only for `OPERATOR` nodes.
Some metrics are only available for `QUERY_ROOT` nodes, while others are only for `OPERATOR` nodes.
The table below describes each metric and which nodes they are available for.

Other than `QUERY_NAME` and `OPERATOR_TYPE`, it is possible to turn all metrics on or off.

| Metric | Return type | Unit | Query | Operator | Description |
|-------------------------|-------------|----------|:-----:|:--------:|-------------------------------------------------------------------------------------------------------------------------------|
| `BLOCKED_THREAD_TIME`   | `double` | seconds  | ✓ |   | The total time threads are blocked.                                                                                           |
| `EXTRA_INFO`            | `string` |          | ✓ | ✓ | Unique operator metrics.                                                                                                      |
| `LATENCY`               | `double` | seconds  | ✓ |   | The total elapsed query execution time.                                                                                       |
| `OPERATOR_CARDINALITY`  | `uint64` | absolute |   | ✓ | The cardinality of each operator, i.e., the number of rows it returns to its parent. Operator equivalent of `ROWS_RETURNED`.  |
| `OPERATOR_ROWS_SCANNED` | `uint64` | absolute |   | ✓ | The total rows scanned by each operator.                                                                                      |
| `OPERATOR_TIMING`       | `double` | seconds  |   | ✓ | The time taken by each operator. Operator equivalent of `LATENCY`.                                                            |
| `OPERATOR_TYPE`         | `string` |          |   | ✓ | The name of each operator.                                                                                                    |
| `QUERY_NAME`            | `string` |          | ✓ |   | The query string.                                                                                                             |
| `RESULT_SET_SIZE`       | `uint64` | bytes    | ✓ | ✓ | The size of the result.                                                                                                       |
| `ROWS_RETURNED`         | `uint64` | absolute | ✓ |   | The number of rows returned by the query.                                                                                     |

### Cumulative Metrics

@@ -120,7 +120,7 @@ The following are the metrics supported in the physical planner:
## Custom Metrics Examples

The following examples demonstrate how to enable custom profiling and set the output format to `json`.
In the first example, we enable profiling and set the output to a file.
In the first example, we enable profiling and set the output to a file.
We only enable `EXTRA_INFO`, `OPERATOR_CARDINALITY`, and `OPERATOR_TIMING`.

```sql
@@ -224,7 +224,7 @@ The contents of the outputted file:
"result_set_size": 32,
"cpu_time": 0.000095,
"children": [
...
...
```

## Query Graphs
2 changes: 1 addition & 1 deletion docs/extensions/community_extensions.md
@@ -1,6 +1,6 @@
---
layout: docu
title: Community Extensions
title: Community Extensions
---

DuckDB recently launched a [Community Extensions repository](https://github.com/duckdb/community-extensions).
2 changes: 1 addition & 1 deletion docs/extensions/core_extensions.md
@@ -2,7 +2,7 @@
layout: docu
title: Core Extensions
redirect_from:
- docs/extensions/official_extensions
- docs/extensions/official_extensions
---

## List of Core Extensions