diff --git a/data/search_data.json b/data/search_data.json
index 892da640bf2..09e6db2ff2c 100644
--- a/data/search_data.json
+++ b/data/search_data.json
@@ -630,13 +630,6 @@
         "url": "/docs/sql/expressions/overview",
         "blurb": "An expression is a combination of values, operators and functions. Expressions are highly composable, and range from..."
     },
-    {
-        "title": "Extensions",
-        "text": "duckdb has a number of extensions available for use not all of them are included by default in every distribution but duckdb has a mechanism that allows for remote installation remote installation if a given extensions is not available with your distribution you can do the following to make it available install fts load fts if you are using the python api client you can install and load them with the load_extension name str and install_extension name str methods unsigned extensions all verified extensions are signed if you wish to load your own extensions or extensions from untrusted third-parties you ll need to enable the allow_unsigned_extensions flag to load unsigned extensions using the cli you ll need to pass the -unsigned flag to it on startup listing extensions you can check the list of core and installed extensions with the following query select from duckdb_extensions all available extensions extension name description aliases ------------------- ---------------------------------------------------------------------- ----------------- autocomplete adds supports for autocomplete in the shell excel fts adds support for full-text search indexes httpfs adds support for reading and writing files over a http s connection http https s3 icu adds support for time zones and collations using the icu library inet adds support for ip-related data types and functions jemalloc overwrites system allocator with jemalloc json adds support for json operations parquet adds support for reading and writing parquet files postgres_scanner adds support for reading from a postgres database postgres spatial adds support for geospatial data processing sqlite_scanner adds support for reading sqlite database files sqlite sqlite3 substrait adds support for the substrait integration tpcds adds tpc-ds data generation and query support tpch adds tpc-h data generation and query support visualizer downloading extensions directly from s3 downloading an extension directly could be helpful when building a lambda or container that uses duckdb duckdb extensions are stored in public s3 buckets but the directory structure of those buckets is not searchable as a result a direct url to the file must be used to directly download an extension file use the following format https extensions duckdb org v release_version_number platform_name extension_name duckdb_extension gz for example https extensions duckdb org v site currentduckdbversion windows_amd64 json duckdb_extension gz the list of supported platforms may increase over time but the current list of platforms includes linux_amd64_gcc4 linux_amd64 linux_arm64 osx_amd64 osx_arm64 windows_amd64 windows_amd64_rtools see above for a list of extension names and how to pull the latest list of extensions loading an extension from local storage extensions are stored in gzip format so they must be unzipped prior to use there are many methods to decompress gzip here is a python example import gzip import shutil with gzip open httpfs duckdb_extension gz rb as f_in with open httpfs duckdb_extension wb as f_out shutil copyfileobj f_in f_out after unzipping the install and load commands can be used with the path to the duckdb_extension file for example if the file was unzipped into the same directory as where duckdb is being executed install httpfs duckdb_extension load httpfs duckdb_extension pages in this section",
-        "category": "Extensions",
-        "url": "/docs/extensions/overview",
-        "blurb": "DuckDB has a number of extensions available for use. Not all of them are included by default in every distribution,..."
-    },
     {
         "title": "Extensions",
         "text": "default extensions currently enabled in duckdb-wasm are parquet and fts httpfs is a specific re-implementation that comes bundled by default json and excel are build-time opt-in dynamic runtime extension loading dynamic extension loading is currently experimental participate in the tracking issue or try it on the experimental deployment at https shellwip duckdb org include iframe html src https shellwip duckdb org",
@@ -644,6 +637,13 @@
         "url": "/docs/api/wasm/extensions",
         "blurb": "Default extensions currently enabled in DuckDB-Wasm are Parquet and FTS. HTTPFS is a specific re-implementation that..."
     },
+    {
+        "title": "Extensions",
+        "text": "duckdb has a number of extensions available for use not all of them are included by default in every distribution but duckdb has a mechanism that allows for remote installation remote installation if a given extensions is not available with your distribution you can do the following to make it available install fts load fts if you are using the python api client you can install and load them with the load_extension name str and install_extension name str methods unsigned extensions all verified extensions are signed if you wish to load your own extensions or extensions from untrusted third-parties you ll need to enable the allow_unsigned_extensions flag to load unsigned extensions using the cli you ll need to pass the -unsigned flag to it on startup listing extensions you can check the list of core and installed extensions with the following query select from duckdb_extensions all available extensions extension name description aliases ------------------- ---------------------------------------------------------------------- ----------------- autocomplete adds supports for autocomplete in the shell excel fts adds support for full-text search indexes httpfs adds support for reading and writing files over a http s connection http https s3 icu adds support for time zones and collations using the icu library inet adds support for ip-related data types and functions jemalloc overwrites system allocator with jemalloc json adds support for json operations parquet adds support for reading and writing parquet files postgres_scanner adds support for reading from a postgres database postgres spatial adds support for geospatial data processing sqlite_scanner adds support for reading sqlite database files sqlite sqlite3 substrait adds support for the substrait integration tpcds adds tpc-ds data generation and query support tpch adds tpc-h data generation and query support visualizer downloading extensions directly from s3 downloading an extension directly could be helpful when building a lambda or container that uses duckdb duckdb extensions are stored in public s3 buckets but the directory structure of those buckets is not searchable as a result a direct url to the file must be used to directly download an extension file use the following format https extensions duckdb org v release_version_number platform_name extension_name duckdb_extension gz for example https extensions duckdb org v site currentduckdbversion windows_amd64 json duckdb_extension gz the list of supported platforms may increase over time but the current list of platforms includes linux_amd64_gcc4 linux_amd64 linux_arm64 osx_amd64 osx_arm64 windows_amd64 windows_amd64_rtools see above for a list of extension names and how to pull the latest list of extensions loading an extension from local storage extensions are stored in gzip format so they must be unzipped prior to use there are many methods to decompress gzip here is a python example import gzip import shutil with gzip open httpfs duckdb_extension gz rb as f_in with open httpfs duckdb_extension wb as f_out shutil copyfileobj f_in f_out after unzipping the install and load commands can be used with the path to the duckdb_extension file for example if the file was unzipped into the same directory as where duckdb is being executed install httpfs duckdb_extension load httpfs duckdb_extension pages in this section",
+        "category": "Extensions",
+        "url": "/docs/extensions/overview",
+        "blurb": "DuckDB has a number of extensions available for use. Not all of them are included by default in every distribution,..."
+    },
     {
         "title": "FILTER Clause",
         "text": "the filter clause may optionally follow an aggregate function in a select statement this will filter the rows of data that are fed into the aggregate function in the same way that a where clause filters rows but localized to the specific aggregate function filter s are not currently able to be used when the aggregate function is in a windowing context there are multiple types of situations where this is useful including when evaluating multiple aggregates with different filters and when creating a pivoted view of a dataset filter provides a cleaner syntax for pivoting data when compared with the more traditional case when approach discussed below some aggregate functions also do not filter out null values so using a filter clause will return valid results when at times the case when approach will not this occurs with the functions first and last which are desirable in a non-aggregating pivot operation where the goal is to simply re-orient the data into columns rather than re-aggregate it filter also improves null handling when using the list and array_agg functions as the case when approach will include null values in the list result while the filter clause will remove them examples -- compare total row count to -- the number of rows where i 5 -- the number of rows where i is odd select count as total_rows count filter where i 5 as lte_five count filter where i 2 1 as odds from generate_series 1 10 tbl i total_rows lte_five odds --- --- --- 10 5 5 -- different aggregate functions may be used and multiple where expressions are also permitted -- the sum of i for rows where i 5 -- the median of i where i is odd select sum i filter where i 5 as lte_five_sum median i filter where i 2 1 as odds_median median i filter where i 2 1 and i 5 as odds_lte_five_median from generate_series 1 10 tbl i lte_five_sum odds_median odds_lte_five_median --- --- --- 15 5 0 3 0 the filter clause can also be used to pivot data from rows into columns this is a static pivot as columns must be defined prior to runtime in sql however this kind of statement can be dynamically generated in a host programming language to leverage duckdb s sql engine for rapid larger than memory pivoting --first generate an example dataset create temp table stacked_data as select i case when i rows 0 25 then 2022 when i rows 0 5 then 2023 when i rows 0 75 then 2024 when i rows 0 875 then 2025 else null end as year from select i count over as rows from generate_series 1 100000000 tbl i tbl -- pivot the data out by year move each year out to a separate column select count i filter where year 2022 as 2022 count i filter where year 2023 as 2023 count i filter where year 2024 as 2024 count i filter where year 2025 as 2025 count i filter where year is null as nulls from stacked_data --this syntax produces the same results as the the filter clauses above select count case when year 2022 then i end as 2022 count case when year 2023 then i end as 2023 count case when year 2024 then i end as 2024 count case when year 2025 then i end as 2025 count case when year is null then i end as nulls from stacked_data 2022 2023 2024 2025 nulls --- --- --- --- --- 25000000 25000000 25000000 12500000 12500000 however the case when approach will not work as expected when using an aggregate function that does not ignore null values the first function falls into this category so filter is preferred in this case -- pivot the data out by year move each year out to a separate column select first i filter where year 2022 as 2022 first i filter where year 2023 as 2023 first i filter where year 2024 as 2024 first i filter where year 2025 as 2025 first i filter where year is null as nulls from stacked_data 2022 2023 2024 2025 nulls --- --- --- --- --- 1474561 25804801 50749441 76431361 87500001 --this will produce null values whenever the first evaluation of the case when clause returns a null select first case when year 2022 then i end as 2022 first case when year 2023 then i end as 2023 first case when year 2024 then i end as 2024 first case when year 2025 then i end as 2025 first case when year is null then i end as nulls from stacked_data 2022 2023 2024 2025 nulls --- --- --- --- --- 1228801 null null null null aggregate function syntax including filter clause",