Commit: Minor fixes

szarnyasg committed Jun 8, 2024
1 parent 9a25a46 commit a75d698

Showing 8 changed files with 81 additions and 51 deletions.
88 changes: 53 additions & 35 deletions docs/api/python/conversion.md

This is a mapping of Python object types to DuckDB [Logical Types](../../sql/data_types/overview):

* `None` → `NULL`
* `bool` → `BOOLEAN`
* `datetime.timedelta` → `INTERVAL`
* `str` → `VARCHAR`
* `bytearray` → `BLOB`
* `memoryview` → `BLOB`
* `decimal.Decimal` → `DECIMAL` / `DOUBLE`
* `uuid.UUID` → `UUID`
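
As a quick sanity check of this mapping, values can be passed as prepared-statement parameters and inspected with `typeof`. Below is a minimal sketch; the output shown in the comment is what one would expect from the table above.

```python
import datetime
import uuid

import duckdb

con = duckdb.connect()

# Each bound Python value is converted according to the mapping above;
# typeof() reports the resulting DuckDB type.
row = con.execute(
    "SELECT typeof(?), typeof(?), typeof(?), typeof(?)",
    [True, datetime.timedelta(days=1), "hello", uuid.uuid4()],
).fetchone()
print(row)
# Expected: ('BOOLEAN', 'INTERVAL', 'VARCHAR', 'UUID')
```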

The rest of the conversion rules are as follows.

DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.

* `pl()` fetches the data as a Polars DataFrame

### Examples

Below are some examples using this functionality. See the [Python guides](../../guides/index#python-client) for more examples.

Fetch as Pandas DataFrame:

```python
df = con.execute("SELECT * FROM items").fetchdf()
print(df)
```

```text
       item   value  count
0     jeans    20.0      1
1    hammer    42.2      2
2    laptop  2000.0      1
3  chainsaw   500.0     10
4    iphone   300.0      2
```

Fetch as dictionary of NumPy arrays:

```python
arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)
```

```text
{'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
             mask=[False, False, False, False, False],
       fill_value='?',
            dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
             mask=[False, False, False, False, False],
       fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
             mask=[False, False, False, False, False],
       fill_value=999999,
            dtype=int32)}
```

Fetch as an Arrow table, converting to Pandas afterwards just for pretty printing:

```python
tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())
```

```text
       item    value  count
0     jeans    20.00      1
1    hammer    42.20      2
2    laptop  2000.00      1
3  chainsaw   500.00     10
4    iphone   300.00      2
```
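
The `pl()` method listed above follows the same pattern; a minimal sketch, assuming the `polars` package is installed:

```python
# Fetch as a Polars DataFrame (requires the polars package)
pl_df = con.execute("SELECT * FROM items").pl()
print(pl_df)
```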
9 changes: 6 additions & 3 deletions docs/api/python/data_ingestion.md
```python
import duckdb
import pandas as pd
test_df = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
print(duckdb.sql("SELECT * FROM test_df").fetchall())
```

```text
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
```

DuckDB also supports "registering" a DataFrame or Arrow object as a virtual table, comparable to a SQL `VIEW`. This is useful when querying a DataFrame/Arrow object that is stored in another way (as a class variable, or a value in a dictionary). Below is a Pandas example:
```python
import duckdb
import pandas as pd
my_dictionary = {}
my_dictionary["test_df"] = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
duckdb.register("test_df_view", my_dictionary["test_df"])
print(duckdb.sql("SELECT * FROM test_df_view").fetchall())
```

```text
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
```
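
The same replacement-scan mechanism works for Apache Arrow objects. A minimal sketch, assuming `pyarrow` is installed:

```python
import duckdb
import pyarrow as pa

# An Arrow table in local scope is found by its variable name,
# just like the Pandas DataFrame above.
arrow_table = pa.Table.from_pydict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
print(duckdb.sql("SELECT * FROM arrow_table").fetchall())
# Expected: [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
```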
3 changes: 2 additions & 1 deletion docs/data/csv/auto_detection.md
```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA
```

In this file, the dialect detection works as follows:

* If we split by a `|`, every row is split into `4` columns
* If we split by a `,`, rows 2-4 are split into `3` columns, while the first row is split into `1` column
* If we split by `;`, every row is split into `1` column
The type detection works by attempting to convert the values in each column to the candidate types.

Note that everything can be cast to `VARCHAR`. This type has the lowest priority – i.e., columns are converted to `VARCHAR` if they cannot be cast to anything else. In [`flights.csv`](/data/flights.csv) the `FlightDate` column will be cast to a `DATE`, while the other columns will be cast to `VARCHAR`.
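
To inspect the detected types for a file, one way is to `DESCRIBE` a query over it – a sketch, assuming `flights.csv` is in the working directory:

```sql
DESCRIBE SELECT * FROM 'flights.csv';
```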

The detected types can be individually overridden using the `types` option. This option takes either a list of types (e.g., `types = [INTEGER, VARCHAR, DATE]`), which overrides the types of the columns in order of occurrence in the CSV file. Alternatively, `types` takes a `name` → `type` map which overrides options of individual columns (e.g., `types = {'quarter': INTEGER}`).

The type detection can be entirely disabled by using the `all_varchar` option. If this is set, all columns will remain as `VARCHAR` (as they originally occur in the CSV file).
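
A sketch of disabling detection this way:

```sql
SELECT * FROM read_csv('flights.csv', all_varchar = true);
```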

4 changes: 2 additions & 2 deletions docs/data/csv/overview.md
Write the result of a query to a CSV file.

```sql
COPY (SELECT * FROM ontime) TO 'flights.csv' WITH (HEADER, DELIMITER '|');
```

If we serialize the entire table, we can simply refer to it with its name.

```sql
COPY ontime TO 'flights.csv' WITH (HEADER, DELIMITER '|');
```

## CSV Loading
2 changes: 1 addition & 1 deletion docs/data/csv/tips.md
```sql
SELECT * FROM read_csv('flights.csv', names = ['DateOfFlight', 'CarrierName']);
```

## Override the Types of Specific Columns

The `types` flag can be used to override types of only certain columns by providing a struct of `name` → `type` mappings.

```sql
SELECT * FROM read_csv('flights.csv', types = {'FlightDate': 'DATE'});
```
12 changes: 9 additions & 3 deletions docs/guides/python/execute_sql.md
By default this will create a relation object. The result can be converted to various formats; for example, `fetchall` returns the rows as a list of Python tuples.
```python
results = duckdb.sql("SELECT 42").fetchall()
print(results)
```

```text
[(42,)]
```

Several other result objects exist. For example, you can use `df` to convert the result to a Pandas DataFrame.

```python
results = duckdb.sql("SELECT 42").df()
print(results)
```

```text
   42
0  42
```

By default, a global in-memory connection will be used. Any data stored in files will be lost after shutting down the program. A connection to a persistent database can be created using the `connect` function.
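
A minimal sketch (the file name `file.db` is illustrative):

```python
import duckdb

# Connect to (or create) a persistent database file; data survives restarts.
con = duckdb.connect("file.db")
con.sql("CREATE TABLE IF NOT EXISTS integers (i INTEGER)")
con.sql("INSERT INTO integers VALUES (42)")
print(con.sql("SELECT * FROM integers").fetchall())  # [(42,)]
con.close()
```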
10 changes: 6 additions & 4 deletions docs/internals/overview.md
The SQLStatement represents a complete SQL statement.
## Binder

The binder converts all nodes into their **bound** equivalents. In the binder phase:

* The tables and columns are resolved using the catalog
* Types are resolved
* Aggregate/window functions are extracted

The following conversions happen:

* SQLStatement → [`BoundStatement`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_statement.hpp)
* QueryNode → [`BoundQueryNode`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_query_node.hpp)
* TableRef → [`BoundTableRef`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_tableref.hpp)
* ParsedExpression → [`Expression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/expression.hpp)

## Logical Planner

4 changes: 2 additions & 2 deletions docs/sql/data_types/union.md
The only exception to this is when casting a `UNION` to `VARCHAR`, in which case the members are all cast to `VARCHAR`.

A type can always be implicitly cast to a `UNION` if it can be implicitly cast to one of the `UNION` member types.

* If there are multiple candidates, the built-in implicit casting priority rules determine the target type. For example, a `FLOAT` → `UNION(i INTEGER, v VARCHAR)` cast will always cast the `FLOAT` to the `INTEGER` member before `VARCHAR`.
* If the cast is still ambiguous, i.e., there are multiple candidates with the same implicit casting priority, an error is raised. This usually happens when the `UNION` contains multiple members of the same type, e.g., a `FLOAT` → `UNION(i INTEGER, num INTEGER)` is always ambiguous.

So how do we disambiguate if we want to create a `UNION` with multiple members of the same type? By using the `union_value` function, which takes a keyword argument specifying the tag. For example, `union_value(num := 2::INTEGER)` will create a `UNION` with a single member of type `INTEGER` with the tag `num`. This can then be used to disambiguate in an explicit (or implicit, read on below!) `UNION` to `UNION` cast, like `CAST(union_value(b := 2) AS UNION(a INTEGER, b INTEGER))`.
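
For example, the disambiguating cast described above can be written as:

```sql
SELECT CAST(union_value(b := 2) AS UNION(a INTEGER, b INTEGER)) AS u;
```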
