From 860b1a048c87e8a2171ddb811852d5f87df93c42 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Sat, 14 Sep 2024 10:48:27 -0700 Subject: [PATCH 01/24] SQL Only extensions initial commit --- _posts/2024-09-14-sql-only-extensions.md | 11 +++++++++++ 1 file changed, 11 insertions(+) create mode 100644 _posts/2024-09-14-sql-only-extensions.md diff --git a/_posts/2024-09-14-sql-only-extensions.md b/_posts/2024-09-14-sql-only-extensions.md new file mode 100644 index 00000000000..3e69835b877 --- /dev/null +++ b/_posts/2024-09-14-sql-only-extensions.md @@ -0,0 +1,11 @@ +--- +layout: post +title: "SQL-Only Extensions in DuckDB" +author: "Alex Monahan" +excerpt: "Easily create sharable extensions using just SQL MACROs that can apply to any table" +--- + From 8af2ed9c2efbb8e1212a0127ac3a7518e183d7b1 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Sat, 14 Sep 2024 11:14:18 -0700 Subject: [PATCH 02/24] Outline --- _posts/2024-09-14-sql-only-extensions.md | 45 ++++++++++++++++++++++-- 1 file changed, 43 insertions(+), 2 deletions(-) diff --git a/_posts/2024-09-14-sql-only-extensions.md b/_posts/2024-09-14-sql-only-extensions.md index 3e69835b877..6ef2d04554a 100644 --- a/_posts/2024-09-14-sql-only-extensions.md +++ b/_posts/2024-09-14-sql-only-extensions.md @@ -1,11 +1,52 @@ --- layout: post -title: "SQL-Only Extensions in DuckDB" +title: "Using a SQL-Only Extension for Excel-Style Pivoting in DuckDB" author: "Alex Monahan" -excerpt: "Easily create sharable extensions using just SQL MACROs that can apply to any table" +excerpt: "Now you can easily create sharable extensions using only SQL MACROs that can apply to any table and any columns. We demonstrate the power of this capability with the pivot_table extension that provides Excel-style pivoting" --- From 1eea5e515dec9ac8f28ab2a8886e4037f81b6ce7 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Sat, 14 Sep 2024 15:02:41 -0700 Subject: [PATCH 03/24] WIP first draft. Still need pivot_table details. --- _posts/2024-09-14-sql-only-extensions.md | 349 ++++++++++++++++++++++- 1 file changed, 347 insertions(+), 2 deletions(-) diff --git a/_posts/2024-09-14-sql-only-extensions.md b/_posts/2024-09-14-sql-only-extensions.md index 6ef2d04554a..d80d242d129 100644 --- a/_posts/2024-09-14-sql-only-extensions.md +++ b/_posts/2024-09-14-sql-only-extensions.md @@ -1,10 +1,11 @@ --- layout: post -title: "Using a SQL-Only Extension for Excel-Style Pivoting in DuckDB" +title: "Creating a SQL-Only Extension for Excel-Style Pivoting in DuckDB" author: "Alex Monahan" excerpt: "Now you can easily create sharable extensions using only SQL MACROs that can apply to any table and any columns. We demonstrate the power of this capability with the pivot_table extension that provides Excel-style pivoting" --- ---> +## The Power of SQL-Only Extensions + +SQL is not a new language. +As a result, it has historically been missing some of the modern luxuries we take for granted. +With version 1.1, DuckDB has launched community extensions, bringing the incredible power of a package manager to the SQL language. +One goal for these extensions is to enable C++ libraries to be accessible through SQL across all of the languages with a DuckDB library. +For extension builders, compilation and distribution are much easier. +For the user community, installation is as simple as a single command: + +```sql +INSTALL pivot_table FROM community; +``` + +However, not all of us are C++ developers! +Can we, as a SQL community, build up a set of SQL helper functions? 
+What would it take to build these extensions with *just SQL*? + +### Reusability + +Traditionally, SQL is highly customized to the schema of the database on which it was written. +Can we make it reusable? +Some techniques for reusability were discussed in the SQL Gymnasics post, but now we can go even further. + +With version 1.1, DuckDB's world-class friendly SQL dialect makes it possible to create MACROs that can be applied: +* To any tables +* On any columns +* Using any functions +The new ability to work on any tables is thanks to the `query` and `query_table` functions! + +### Community Extensions as a Central Repository + +Traditionally, there has been no central repository for SQL functions across databases, let alone across companies! +DuckDB's community extensions can be that knowledge base. +If you are a DuckDB fan and a SQL user, you can share your expertise back to the community with an extension. +This post will show you how! +No C++ knowledge is needed - just a little bit of copy/paste and GitHub actions handles all the compilation. +If I can do it, you can do it! + +### Powerful SQL + +All that said, just how valuable can a SQL `MACRO` be? +Can we do more than make small snippets? +I'll make the case that you can do quite complex and powerful operations in DuckDB SQL using the `pivot_table` extension as an example. + +So, we now have all 3 ingredients we will need: a central package manager, reusable `MACRO`s, and enough syntactic flexibility to do valuable work. + +## Capabilities of the `pivot_table` Extension + +The `pivot_table` extension supports advanced pivoting functionality that was previously only available in spreadsheets, dataframe libraries, or custom host language functions. +It uses the Excel pivoting API: `values`, `rows`, `columns`, and `filters`. +It can handle 0 or more of each of those parameters. +However, not only that, but it supports `subtotals` and `grand_totals`. +If multiple `values` are passed in, the `values_axis` parameter allows the user to choose if each value should get its own column or its own row. + +> Note The only missing Excel feature I am aware of is columnar subtotals, but not even Pandas supports that! +> And we are officially open to contributions now... :-) + +Why is this a good example of how DuckDB moves beyond traditional SQL? +The Excel pivoting API requires dramatically different SQL syntax depending on which parameters are in use. +If no `columns` are pivoted outward, a `GROUP BY` is all that is needed. +However, once `columns` are involved, a `PIVOT` is required. + +This function can operate on one or more `table_names` that are passed in as a parameter. +Any set of tables will first be vertically stacked and then pivoted. + +## Example Using `pivot_table` + +
+ + First we will create an example data table. We are a duck product disributor. + + +```sql +CREATE OR REPLACE TABLE business_metrics ( + product_line VARCHAR, product VARCHAR, year INTEGER, quarter VARCHAR, revenue integer, cost integer +); +INSERT INTO business_metrics VALUES + ('Waterfowl watercraft', 'Duck boats', 2022, 'Q1', 100, 100), + ('Waterfowl watercraft', 'Duck boats', 2022, 'Q2', 200, 100), + ('Waterfowl watercraft', 'Duck boats', 2022, 'Q3', 300, 100), + ('Waterfowl watercraft', 'Duck boats', 2022, 'Q4', 400, 100), + ('Waterfowl watercraft', 'Duck boats', 2023, 'Q1', 500, 100), + ('Waterfowl watercraft', 'Duck boats', 2023, 'Q2', 600, 100), + ('Waterfowl watercraft', 'Duck boats', 2023, 'Q3', 700, 100), + ('Waterfowl watercraft', 'Duck boats', 2023, 'Q4', 800, 100), + + ('Duck Duds', 'Duck suits', 2022, 'Q1', 10, 10), + ('Duck Duds', 'Duck suits', 2022, 'Q2', 20, 10), + ('Duck Duds', 'Duck suits', 2022, 'Q3', 30, 10), + ('Duck Duds', 'Duck suits', 2022, 'Q4', 40, 10), + ('Duck Duds', 'Duck suits', 2023, 'Q1', 50, 10), + ('Duck Duds', 'Duck suits', 2023, 'Q2', 60, 10), + ('Duck Duds', 'Duck suits', 2023, 'Q3', 70, 10), + ('Duck Duds', 'Duck suits', 2023, 'Q4', 80, 10), + + ('Duck Duds', 'Duck neckties', 2022, 'Q1', 1, 1), + ('Duck Duds', 'Duck neckties', 2022, 'Q2', 2, 1), + ('Duck Duds', 'Duck neckties', 2022, 'Q3', 3, 1), + ('Duck Duds', 'Duck neckties', 2022, 'Q4', 4, 1), + ('Duck Duds', 'Duck neckties', 2023, 'Q1', 5, 1), + ('Duck Duds', 'Duck neckties', 2023, 'Q2', 6, 1), + ('Duck Duds', 'Duck neckties', 2023, 'Q3', 7, 1), + ('Duck Duds', 'Duck neckties', 2023, 'Q4', 8, 1), +; + +FROM business_metrics; +``` +
+ +| product_line | product | year | quarter | revenue | cost | +|----------------------|---------------|-----:|---------|--------:|-----:| +| Waterfowl watercraft | Duck boats | 2022 | Q1 | 100 | 100 | +| Waterfowl watercraft | Duck boats | 2022 | Q2 | 200 | 100 | +| Waterfowl watercraft | Duck boats | 2022 | Q3 | 300 | 100 | +| Waterfowl watercraft | Duck boats | 2022 | Q4 | 400 | 100 | +| Waterfowl watercraft | Duck boats | 2023 | Q1 | 500 | 100 | +| Waterfowl watercraft | Duck boats | 2023 | Q2 | 600 | 100 | +| Waterfowl watercraft | Duck boats | 2023 | Q3 | 700 | 100 | +| Waterfowl watercraft | Duck boats | 2023 | Q4 | 800 | 100 | +| Duck Duds | Duck suits | 2022 | Q1 | 10 | 10 | +| Duck Duds | Duck suits | 2022 | Q2 | 20 | 10 | +| Duck Duds | Duck suits | 2022 | Q3 | 30 | 10 | +| Duck Duds | Duck suits | 2022 | Q4 | 40 | 10 | +| Duck Duds | Duck suits | 2023 | Q1 | 50 | 10 | +| Duck Duds | Duck suits | 2023 | Q2 | 60 | 10 | +| Duck Duds | Duck suits | 2023 | Q3 | 70 | 10 | +| Duck Duds | Duck suits | 2023 | Q4 | 80 | 10 | +| Duck Duds | Duck neckties | 2022 | Q1 | 1 | 1 | +| Duck Duds | Duck neckties | 2022 | Q2 | 2 | 1 | +| Duck Duds | Duck neckties | 2022 | Q3 | 3 | 1 | +| Duck Duds | Duck neckties | 2022 | Q4 | 4 | 1 | +| Duck Duds | Duck neckties | 2023 | Q1 | 5 | 1 | +| Duck Duds | Duck neckties | 2023 | Q2 | 6 | 1 | +| Duck Duds | Duck neckties | 2023 | Q3 | 7 | 1 | +| Duck Duds | Duck neckties | 2023 | Q4 | 8 | 1 | + +Now we can build pivot tables like the one below. +There is a little bit of boilerplate required, and the details of how this works are explained later in the post. + +```sql +DROP TYPE IF EXISTS columns_parameter_enum; + +CREATE TYPE columns_parameter_enum AS ENUM ( + FROM build_my_enum(['business_metrics'], ['year', 'quarter'], []) +); + +FROM pivot_table(['business_metrics'], -- table_names + ['sum(revenue)', 'sum(cost)'], -- values + ['product_line', 'product'], -- rows + ['year', 'quarter'], -- columns + [], -- filters + subtotals:=1, + grand_totals:=1, + values_axis:='rows' + ); +``` + +| product_line | product | value_names | 2022_Q1 | 2022_Q2 | 2022_Q3 | 2022_Q4 | 2023_Q1 | 2023_Q2 | 2023_Q3 | 2023_Q4 | +|----------------------|---------------|--------------|---------|---------|---------|---------|---------|---------|---------|---------| +| Duck Duds | Duck neckties | sum(cost) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | +| Duck Duds | Duck neckties | sum(revenue) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | +| Duck Duds | Duck suits | sum(cost) | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | +| Duck Duds | Duck suits | sum(revenue) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | +| Duck Duds | Subtotal | sum(cost) | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | +| Duck Duds | Subtotal | sum(revenue) | 11 | 22 | 33 | 44 | 55 | 66 | 77 | 88 | +| Waterfowl watercraft | Duck boats | sum(cost) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | +| Waterfowl watercraft | Duck boats | sum(revenue) | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | +| Waterfowl watercraft | Subtotal | sum(cost) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | +| Waterfowl watercraft | Subtotal | sum(revenue) | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | +| Grand Total | Grand Total | sum(cost) | 111 | 111 | 111 | 111 | 111 | 111 | 111 | 111 | +| Grand Total | Grand Total | sum(revenue) | 111 | 222 | 333 | 444 | 555 | 666 | 777 | 888 | + +## Create Your Own SQL Extension + +Let's walk through the steps to creating your own SQL-only extension. 
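+
+Before diving into the individual steps, here is a rough preview of the end state we are working toward. The names below are placeholders for your own extension and macro, and the `INSTALL ... FROM community` step only applies once the extension has been accepted into the community repository:
+
+```sql
+-- Hypothetical names: substitute your own extension and macro names.
+INSTALL my_sql_extension FROM community;
+LOAD my_sql_extension;
+
+-- The macros the extension ships can then be called like built-in table functions.
+FROM my_table_macro('some_table', ['some_column']);
+```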
+ +### Writing the Extension + +#### Extension Setup + +The first step is to create your own GitHub repo from the [DuckDB Extension Template](https://github.com/duckdb/extension-template) by clicking `Use this template`. + +Then clone your new repository onto your local machine using the terminal: +```sh +git clone --recurse-submodules https://github.com//.git +``` +Note that `--recurse-submodules` will ensure DuckDB is pulled which is required to build the extension. + +Next, replace the name of the example extension with the name of your extension in all the right places by running the Python script below. + +> Note If you don't have Python installed, head to [python.org](https://python.org) and follow those instructions. +> This script doesn't require any libraries, so Python is all you need! (No need to set up any environments) + +```python +python3 ./scripts/bootstrap-template.py +``` + +#### Initial Extension Test + +At this point, you can follow the directions in the README to build and test locally if you would like. +However, even easier, you can simply commit your changes to git and push them to GitHub, and GitHub actions can do the compilation for you! + +> Note The instructions are not written for a Windows audience, so we recommend GitHub Actions in that case! + +```sh +git add -A +git commit -m "Initial commit of my SQL extension!" +git push +``` + + + + +#### Write Your SQL Macros + +It it likely a bit faster to iterate if you test our your macros directly in DuckDB. +The example we will use demonstrates how to pull a dynamic set of columns from a dynamic table name. + +```sql +CREATE OR REPLACE MACRO select_distinct_columns_from_table(table_name, columns_list) AS TABLE ( + SELECT DISTINCT + COLUMNS(column_name -> list_contains(columns_list, column_name)) + FROM query_table(table_name) +); +``` + +#### Add SQL Macros + +Technically, this is the C++ part, but we are going to do some copy/paste and use GitHub actions for compiling so it won't feel that way! + +DuckDB supports both scalar and table macros, and they have slightly different syntax. +The extension template has an example for each (and code comments too!) inside the file named `.cpp`. +Let's add a table macro here since it is the more complex one. +We will copy the example and modify it! + +```cpp +static const DefaultTableMacro _table_macros[] = { + {DEFAULT_SCHEMA, "times_two_table", {"x", nullptr}, {{"two", "2"}, {nullptr, nullptr}}, R"(SELECT x * two as output_column;)"}, + { + DEFAULT_SCHEMA, // Leave the schema as the default + "select_distinct_columns_from_table", // Function name + {"table_name", "columns_list", nullptr}, // Parameters + {{nullptr, nullptr}}, // Optional parameter names and values (we choose not to have any here) + // The SQL text inside of your SQL Macro, wrapped in R"( )", which is a raw string in C++ + R"( + SELECT DISTINCT + COLUMNS(column_name -> list_contains(columns_list, column_name)) + FROM query_table(table_name) + )" + }, + {nullptr, nullptr, {nullptr}, {{nullptr, nullptr}}, nullptr} + }; +``` + +That's it! +All we had to provide were the name of the function, the names of the parameters, and the text of our SQL `MACRO`. + +Now, just add, commit, and push your changes to GitHub like before, and GitHub actions will compile your extension and upload it to AWS S3! + +### Testing the Extension + +For testing purposes, we can use any DuckDB client, but this example uses the CLI. + +> Note We need to run DuckDB with the -unsigned flag since our extension hasn't been signed yet. 
+> It will be signed after we upload it to the community repository + +```shell +duckdb -unsigned +``` + +Next, run the SQL command below to point DuckDB's extension loader to the S3 bucket that was automatically created for you. + +```sql +SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com//latest'; +``` +Note that the `/latest` path will allow you to install the latest extension version available for your current version of +DuckDB. To specify a specific version, you can pass the version instead. + +After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB: +```sql +INSTALL ; +LOAD ; + +SELECT * +FROM select_distinct_columns_from_table('business_metrics', ['product_line', 'product']); +``` + +| product_line | product | +|----------------------|---------------| +| Waterfowl watercraft | Duck boats | +| Duck Duds | Duck neckties | +| Duck Duds | Duck suits | + + +### Uploading to the Community Extensions Repository + +Once you are happy with your extension, it's time to share it with the DuckDB community! +Follow the steps in [the Community Extensions post]({% post_url 2024-07-05-community-extensions %}#developer-experience). +A summary of those steps is: + +1. Send a PR with a metadata file `description.yml` contains the description of the extension. For example: + + ```yaml + extension: + name: h3 + description: Hierarchical hexagonal indexing for geospatial data + version: 1.0.0 + language: C++ + build: cmake + license: Apache-2.0 + maintainers: + - isaacbrodsky + + repo: + github: isaacbrodsky/h3-duckdb + ref: 3c8a5358e42ab8d11e0253c70f7cc7d37781b2ef + ``` + +2. Wait for approval from the maintainers! + + + + + \ No newline at end of file From 844919b7d915e8971eccdf5f3fc16e96180f1b15 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Sat, 14 Sep 2024 15:21:21 -0700 Subject: [PATCH 04/24] Reorder the flow to put pivot_table at the end --- _posts/2024-09-14-sql-only-extensions.md | 262 ++++++++++++----------- 1 file changed, 136 insertions(+), 126 deletions(-) diff --git a/_posts/2024-09-14-sql-only-extensions.md b/_posts/2024-09-14-sql-only-extensions.md index d80d242d129..23575c9f0f4 100644 --- a/_posts/2024-09-14-sql-only-extensions.md +++ b/_posts/2024-09-14-sql-only-extensions.md @@ -97,132 +97,6 @@ I'll make the case that you can do quite complex and powerful operations in Duck So, we now have all 3 ingredients we will need: a central package manager, reusable `MACRO`s, and enough syntactic flexibility to do valuable work. -## Capabilities of the `pivot_table` Extension - -The `pivot_table` extension supports advanced pivoting functionality that was previously only available in spreadsheets, dataframe libraries, or custom host language functions. -It uses the Excel pivoting API: `values`, `rows`, `columns`, and `filters`. -It can handle 0 or more of each of those parameters. -However, not only that, but it supports `subtotals` and `grand_totals`. -If multiple `values` are passed in, the `values_axis` parameter allows the user to choose if each value should get its own column or its own row. - -> Note The only missing Excel feature I am aware of is columnar subtotals, but not even Pandas supports that! -> And we are officially open to contributions now... :-) - -Why is this a good example of how DuckDB moves beyond traditional SQL? -The Excel pivoting API requires dramatically different SQL syntax depending on which parameters are in use. 
-If no `columns` are pivoted outward, a `GROUP BY` is all that is needed. -However, once `columns` are involved, a `PIVOT` is required. - -This function can operate on one or more `table_names` that are passed in as a parameter. -Any set of tables will first be vertically stacked and then pivoted. - -## Example Using `pivot_table` - -
- - First we will create an example data table. We are a duck product disributor. - - -```sql -CREATE OR REPLACE TABLE business_metrics ( - product_line VARCHAR, product VARCHAR, year INTEGER, quarter VARCHAR, revenue integer, cost integer -); -INSERT INTO business_metrics VALUES - ('Waterfowl watercraft', 'Duck boats', 2022, 'Q1', 100, 100), - ('Waterfowl watercraft', 'Duck boats', 2022, 'Q2', 200, 100), - ('Waterfowl watercraft', 'Duck boats', 2022, 'Q3', 300, 100), - ('Waterfowl watercraft', 'Duck boats', 2022, 'Q4', 400, 100), - ('Waterfowl watercraft', 'Duck boats', 2023, 'Q1', 500, 100), - ('Waterfowl watercraft', 'Duck boats', 2023, 'Q2', 600, 100), - ('Waterfowl watercraft', 'Duck boats', 2023, 'Q3', 700, 100), - ('Waterfowl watercraft', 'Duck boats', 2023, 'Q4', 800, 100), - - ('Duck Duds', 'Duck suits', 2022, 'Q1', 10, 10), - ('Duck Duds', 'Duck suits', 2022, 'Q2', 20, 10), - ('Duck Duds', 'Duck suits', 2022, 'Q3', 30, 10), - ('Duck Duds', 'Duck suits', 2022, 'Q4', 40, 10), - ('Duck Duds', 'Duck suits', 2023, 'Q1', 50, 10), - ('Duck Duds', 'Duck suits', 2023, 'Q2', 60, 10), - ('Duck Duds', 'Duck suits', 2023, 'Q3', 70, 10), - ('Duck Duds', 'Duck suits', 2023, 'Q4', 80, 10), - - ('Duck Duds', 'Duck neckties', 2022, 'Q1', 1, 1), - ('Duck Duds', 'Duck neckties', 2022, 'Q2', 2, 1), - ('Duck Duds', 'Duck neckties', 2022, 'Q3', 3, 1), - ('Duck Duds', 'Duck neckties', 2022, 'Q4', 4, 1), - ('Duck Duds', 'Duck neckties', 2023, 'Q1', 5, 1), - ('Duck Duds', 'Duck neckties', 2023, 'Q2', 6, 1), - ('Duck Duds', 'Duck neckties', 2023, 'Q3', 7, 1), - ('Duck Duds', 'Duck neckties', 2023, 'Q4', 8, 1), -; - -FROM business_metrics; -``` -
- -| product_line | product | year | quarter | revenue | cost | -|----------------------|---------------|-----:|---------|--------:|-----:| -| Waterfowl watercraft | Duck boats | 2022 | Q1 | 100 | 100 | -| Waterfowl watercraft | Duck boats | 2022 | Q2 | 200 | 100 | -| Waterfowl watercraft | Duck boats | 2022 | Q3 | 300 | 100 | -| Waterfowl watercraft | Duck boats | 2022 | Q4 | 400 | 100 | -| Waterfowl watercraft | Duck boats | 2023 | Q1 | 500 | 100 | -| Waterfowl watercraft | Duck boats | 2023 | Q2 | 600 | 100 | -| Waterfowl watercraft | Duck boats | 2023 | Q3 | 700 | 100 | -| Waterfowl watercraft | Duck boats | 2023 | Q4 | 800 | 100 | -| Duck Duds | Duck suits | 2022 | Q1 | 10 | 10 | -| Duck Duds | Duck suits | 2022 | Q2 | 20 | 10 | -| Duck Duds | Duck suits | 2022 | Q3 | 30 | 10 | -| Duck Duds | Duck suits | 2022 | Q4 | 40 | 10 | -| Duck Duds | Duck suits | 2023 | Q1 | 50 | 10 | -| Duck Duds | Duck suits | 2023 | Q2 | 60 | 10 | -| Duck Duds | Duck suits | 2023 | Q3 | 70 | 10 | -| Duck Duds | Duck suits | 2023 | Q4 | 80 | 10 | -| Duck Duds | Duck neckties | 2022 | Q1 | 1 | 1 | -| Duck Duds | Duck neckties | 2022 | Q2 | 2 | 1 | -| Duck Duds | Duck neckties | 2022 | Q3 | 3 | 1 | -| Duck Duds | Duck neckties | 2022 | Q4 | 4 | 1 | -| Duck Duds | Duck neckties | 2023 | Q1 | 5 | 1 | -| Duck Duds | Duck neckties | 2023 | Q2 | 6 | 1 | -| Duck Duds | Duck neckties | 2023 | Q3 | 7 | 1 | -| Duck Duds | Duck neckties | 2023 | Q4 | 8 | 1 | - -Now we can build pivot tables like the one below. -There is a little bit of boilerplate required, and the details of how this works are explained later in the post. - -```sql -DROP TYPE IF EXISTS columns_parameter_enum; - -CREATE TYPE columns_parameter_enum AS ENUM ( - FROM build_my_enum(['business_metrics'], ['year', 'quarter'], []) -); - -FROM pivot_table(['business_metrics'], -- table_names - ['sum(revenue)', 'sum(cost)'], -- values - ['product_line', 'product'], -- rows - ['year', 'quarter'], -- columns - [], -- filters - subtotals:=1, - grand_totals:=1, - values_axis:='rows' - ); -``` - -| product_line | product | value_names | 2022_Q1 | 2022_Q2 | 2022_Q3 | 2022_Q4 | 2023_Q1 | 2023_Q2 | 2023_Q3 | 2023_Q4 | -|----------------------|---------------|--------------|---------|---------|---------|---------|---------|---------|---------|---------| -| Duck Duds | Duck neckties | sum(cost) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -| Duck Duds | Duck neckties | sum(revenue) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -| Duck Duds | Duck suits | sum(cost) | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | -| Duck Duds | Duck suits | sum(revenue) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | -| Duck Duds | Subtotal | sum(cost) | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | -| Duck Duds | Subtotal | sum(revenue) | 11 | 22 | 33 | 44 | 55 | 66 | 77 | 88 | -| Waterfowl watercraft | Duck boats | sum(cost) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | -| Waterfowl watercraft | Duck boats | sum(revenue) | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | -| Waterfowl watercraft | Subtotal | sum(cost) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | -| Waterfowl watercraft | Subtotal | sum(revenue) | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | -| Grand Total | Grand Total | sum(cost) | 111 | 111 | 111 | 111 | 111 | 111 | 111 | 111 | -| Grand Total | Grand Total | sum(revenue) | 111 | 222 | 333 | 444 | 555 | 666 | 777 | 888 | - ## Create Your Own SQL Extension Let's walk through the steps to creating your own SQL-only extension. @@ -371,7 +245,143 @@ A summary of those steps is: 2. 
Wait for approval from the maintainers! +And there you have it! +You have created a shareable DuckDB Community Extension. +Now let's have a look at the `pivot_table` extension as an example of just how powerful a SQL-only extension can be. + + +## Capabilities of the `pivot_table` Extension +The `pivot_table` extension supports advanced pivoting functionality that was previously only available in spreadsheets, dataframe libraries, or custom host language functions. +It uses the Excel pivoting API: `values`, `rows`, `columns`, and `filters`. +It can handle 0 or more of each of those parameters. +However, not only that, but it supports `subtotals` and `grand_totals`. +If multiple `values` are passed in, the `values_axis` parameter allows the user to choose if each value should get its own column or its own row. + +> Note The only missing Excel feature I am aware of is columnar subtotals, but not even Pandas supports that! +> And we are officially open to contributions now... :-) + +Why is this a good example of how DuckDB moves beyond traditional SQL? +The Excel pivoting API requires dramatically different SQL syntax depending on which parameters are in use. +If no `columns` are pivoted outward, a `GROUP BY` is all that is needed. +However, once `columns` are involved, a `PIVOT` is required. + +This function can operate on one or more `table_names` that are passed in as a parameter. +Any set of tables will first be vertically stacked and then pivoted. + +## Example Using `pivot_table` + +
+
+    First we will create an example data table. We are a duck product distributor, and we are tracking our fowl finances.
+
+
+```sql
+CREATE OR REPLACE TABLE business_metrics (
+    product_line VARCHAR, product VARCHAR, year INTEGER, quarter VARCHAR, revenue integer, cost integer
+);
+INSERT INTO business_metrics VALUES
+    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q1', 100, 100),
+    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q2', 200, 100),
+    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q3', 300, 100),
+    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q4', 400, 100),
+    ('Waterfowl watercraft', 'Duck boats', 2023, 'Q1', 500, 100),
+    ('Waterfowl watercraft', 'Duck boats', 2023, 'Q2', 600, 100),
+    ('Waterfowl watercraft', 'Duck boats', 2023, 'Q3', 700, 100),
+    ('Waterfowl watercraft', 'Duck boats', 2023, 'Q4', 800, 100),
+
+    ('Duck Duds', 'Duck suits', 2022, 'Q1', 10, 10),
+    ('Duck Duds', 'Duck suits', 2022, 'Q2', 20, 10),
+    ('Duck Duds', 'Duck suits', 2022, 'Q3', 30, 10),
+    ('Duck Duds', 'Duck suits', 2022, 'Q4', 40, 10),
+    ('Duck Duds', 'Duck suits', 2023, 'Q1', 50, 10),
+    ('Duck Duds', 'Duck suits', 2023, 'Q2', 60, 10),
+    ('Duck Duds', 'Duck suits', 2023, 'Q3', 70, 10),
+    ('Duck Duds', 'Duck suits', 2023, 'Q4', 80, 10),
+
+    ('Duck Duds', 'Duck neckties', 2022, 'Q1', 1, 1),
+    ('Duck Duds', 'Duck neckties', 2022, 'Q2', 2, 1),
+    ('Duck Duds', 'Duck neckties', 2022, 'Q3', 3, 1),
+    ('Duck Duds', 'Duck neckties', 2022, 'Q4', 4, 1),
+    ('Duck Duds', 'Duck neckties', 2023, 'Q1', 5, 1),
+    ('Duck Duds', 'Duck neckties', 2023, 'Q2', 6, 1),
+    ('Duck Duds', 'Duck neckties', 2023, 'Q3', 7, 1),
+    ('Duck Duds', 'Duck neckties', 2023, 'Q4', 8, 1),
+;
+
+FROM business_metrics;
+```
+
+ +| product_line | product | year | quarter | revenue | cost | +|----------------------|---------------|-----:|---------|--------:|-----:| +| Waterfowl watercraft | Duck boats | 2022 | Q1 | 100 | 100 | +| Waterfowl watercraft | Duck boats | 2022 | Q2 | 200 | 100 | +| Waterfowl watercraft | Duck boats | 2022 | Q3 | 300 | 100 | +| Waterfowl watercraft | Duck boats | 2022 | Q4 | 400 | 100 | +| Waterfowl watercraft | Duck boats | 2023 | Q1 | 500 | 100 | +| Waterfowl watercraft | Duck boats | 2023 | Q2 | 600 | 100 | +| Waterfowl watercraft | Duck boats | 2023 | Q3 | 700 | 100 | +| Waterfowl watercraft | Duck boats | 2023 | Q4 | 800 | 100 | +| Duck Duds | Duck suits | 2022 | Q1 | 10 | 10 | +| Duck Duds | Duck suits | 2022 | Q2 | 20 | 10 | +| Duck Duds | Duck suits | 2022 | Q3 | 30 | 10 | +| Duck Duds | Duck suits | 2022 | Q4 | 40 | 10 | +| Duck Duds | Duck suits | 2023 | Q1 | 50 | 10 | +| Duck Duds | Duck suits | 2023 | Q2 | 60 | 10 | +| Duck Duds | Duck suits | 2023 | Q3 | 70 | 10 | +| Duck Duds | Duck suits | 2023 | Q4 | 80 | 10 | +| Duck Duds | Duck neckties | 2022 | Q1 | 1 | 1 | +| Duck Duds | Duck neckties | 2022 | Q2 | 2 | 1 | +| Duck Duds | Duck neckties | 2022 | Q3 | 3 | 1 | +| Duck Duds | Duck neckties | 2022 | Q4 | 4 | 1 | +| Duck Duds | Duck neckties | 2023 | Q1 | 5 | 1 | +| Duck Duds | Duck neckties | 2023 | Q2 | 6 | 1 | +| Duck Duds | Duck neckties | 2023 | Q3 | 7 | 1 | +| Duck Duds | Duck neckties | 2023 | Q4 | 8 | 1 | + +Next, we install the extension from the community repository: + +```sql +INSTALL pivot_table FROM community; +LOAD pivot_table; +``` + +Now we can build pivot tables like the one below. +There is a little bit of boilerplate required, and the details of how this works will be explained shortly. + +```sql +DROP TYPE IF EXISTS columns_parameter_enum; + +CREATE TYPE columns_parameter_enum AS ENUM ( + FROM build_my_enum(['business_metrics'], ['year', 'quarter'], []) +); + +FROM pivot_table(['business_metrics'], -- table_names + ['sum(revenue)', 'sum(cost)'], -- values + ['product_line', 'product'], -- rows + ['year', 'quarter'], -- columns + [], -- filters + subtotals:=1, + grand_totals:=1, + values_axis:='rows' + ); +``` + +| product_line | product | value_names | 2022_Q1 | 2022_Q2 | 2022_Q3 | 2022_Q4 | 2023_Q1 | 2023_Q2 | 2023_Q3 | 2023_Q4 | +|----------------------|---------------|--------------|---------|---------|---------|---------|---------|---------|---------|---------| +| Duck Duds | Duck neckties | sum(cost) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | +| Duck Duds | Duck neckties | sum(revenue) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | +| Duck Duds | Duck suits | sum(cost) | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | +| Duck Duds | Duck suits | sum(revenue) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | +| Duck Duds | Subtotal | sum(cost) | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | +| Duck Duds | Subtotal | sum(revenue) | 11 | 22 | 33 | 44 | 55 | 66 | 77 | 88 | +| Waterfowl watercraft | Duck boats | sum(cost) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | +| Waterfowl watercraft | Duck boats | sum(revenue) | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | +| Waterfowl watercraft | Subtotal | sum(cost) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | +| Waterfowl watercraft | Subtotal | sum(revenue) | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | +| Grand Total | Grand Total | sum(cost) | 111 | 111 | 111 | 111 | 111 | 111 | 111 | 111 | +| Grand Total | Grand Total | sum(revenue) | 111 | 222 | 333 | 444 | 555 | 666 | 777 | 888 | With version 1.1, DuckDB's world-class friendly SQL 
dialect makes it possible to create MACROs that can be applied: * To any tables * On any columns @@ -130,6 +130,7 @@ python3 ./scripts/bootstrap-template.py At this point, you can follow the directions in the README to build and test locally if you would like. However, even easier, you can simply commit your changes to git and push them to GitHub, and GitHub actions can do the compilation for you! +GitHub actions will also run tests on your extension to validate it is working properly. > Note The instructions are not written for a Windows audience, so we recommend GitHub Actions in that case! @@ -202,35 +203,30 @@ static const DefaultTableMacro _table_macros[] = { That's it! All we had to provide were the name of the function, the names of the parameters, and the text of our SQL `MACRO`. -Now, just add, commit, and push your changes to GitHub like before, and GitHub actions will compile your extension and upload it to AWS S3! - ### Testing the Extension -For testing purposes, we can use any DuckDB client, but this example uses the CLI. - -> Note We need to run DuckDB with the -unsigned flag since our extension hasn't been signed yet. -> It will be signed after we upload it to the community repository - -```shell -duckdb -unsigned -``` - -Next, run the SQL command below to point DuckDB's extension loader to the S3 bucket that was automatically created for you. +We also recommend adding some tests for your extension to the `.test` file. +This uses [sqllogictest](`{% link docs/dev/sqllogictest/intro.md %}`) to test with just SQL! +Let's add the example from above. -```sql -SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com//latest'; -``` -Note that the `/latest` path will allow you to install the latest extension version available for your current version of -DuckDB. To specify a specific version, you can pass the version instead. +> Note In sqllogictest, `query I` indicates that there will be 1 column in the result. +> We then add `----` and the resultset in tab separated format with no column names. -After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB, and then use it: ```sql -INSTALL ; -LOAD ; - +query I FROM select_distinct_columns_from_table('duckdb_types', ['type_category']); +---- +BOOLEAN +COMPOSITE +DATETIME +NUMERIC +STRING +NULL ``` +Now, just add, commit, and push your changes to GitHub like before, and GitHub actions will compile your extension and test it! + +If you would like to do further ad-hoc testing of your extension, you can download the extension from your GitHub actions run's artifacts and then [install it locally using these steps](`{% link docs/extensions/overview.md %}#unsigned-extensions`). ### Uploading to the Community Extensions Repository From 95638efeac573144e42cec4aed5c17962c81faa4 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Fri, 20 Sep 2024 08:33:37 -0700 Subject: [PATCH 09/24] WIP Friendly SQL used in pivot_table --- _posts/2024-09-14-sql-only-extensions.md | 32 +++++++++++++++++++----- 1 file changed, 26 insertions(+), 6 deletions(-) diff --git a/_posts/2024-09-14-sql-only-extensions.md b/_posts/2024-09-14-sql-only-extensions.md index 6d42c80d8cc..8401f1c8d80 100644 --- a/_posts/2024-09-14-sql-only-extensions.md +++ b/_posts/2024-09-14-sql-only-extensions.md @@ -99,6 +99,27 @@ I'll make the case that you can do quite complex and powerful operations in Duck The `pivot_table` function allows for Excel-style pivots, including `subtotals`, `grand_totals`, and more. 
It is also very similar to the Pandas `pivot_table` function, but with all the scalability and speed benefits of DuckDB! +To achieve this level of flexibility, the `pivot_table` extension uses many friendly and advanced SQL features: +* The [`query` function]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions) to execute a SQL string +* The [`query_table` function]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions) to query a list of tables +* The [`COLUMNS` expression]({% link docs/sql/expressions/star.md %}#columns-expression) to select a dynamic list of columns +* [List lambda functions]({% link docs/sql/functions/lambda.md %}) to build up the SQL statement passed into `query` + * [`list_transform`]({% link docs/sql/functions/lambda.md %}#list_transformlist-lambda) for string manipulation like quoting + * [`list_reduce`]({% link docs/sql/functions/lambda.md %}#list_reducelist-lambda) to concatenate strings together + * [`list_aggregate`]({% link docs/sql/functions/list.md %}#list_aggregatelist-name) to sum multiple columns and identify subtotal and grand total rows +* Bracket notation for string slicing +* `UNION ALL BY NAME` to stack data by column name for subtotals and grand totals +* `SELECT * REPLACE` to dynamically clean up subtotal columns +* `SELECT * EXCLUDE` to remove internally generated columns from the final result +* `GROUPING SETS` and `ROLLUP` to generate subtotals and grand totals +* `UNNEST` to convert lists into separate rows for `values_axis:='rows'` +* `MACRO`s to modularize the code +* `ORDER BY ALL` to order the result dynamically +* `ENUM`s to determine what columns to pivot horizontally +* And of course the `PIVOT` function for horizontal pivoting! + +DuckDB's innovative syntax makes this extension possible! + So, we now have all 3 ingredients we will need: a central package manager, reusable `MACRO`s, and enough syntactic flexibility to do valuable work. ## Create Your Own SQL Extension @@ -206,7 +227,7 @@ All we had to provide were the name of the function, the names of the parameters ### Testing the Extension We also recommend adding some tests for your extension to the `.test` file. -This uses [sqllogictest](`{% link docs/dev/sqllogictest/intro.md %}`) to test with just SQL! +This uses [sqllogictest]({% link docs/dev/sqllogictest/intro.md %}) to test with just SQL! Let's add the example from above. > Note In sqllogictest, `query I` indicates that there will be 1 column in the result. @@ -226,7 +247,7 @@ NULL Now, just add, commit, and push your changes to GitHub like before, and GitHub actions will compile your extension and test it! -If you would like to do further ad-hoc testing of your extension, you can download the extension from your GitHub actions run's artifacts and then [install it locally using these steps](`{% link docs/extensions/overview.md %}#unsigned-extensions`). +If you would like to do further ad-hoc testing of your extension, you can download the extension from your GitHub actions run's artifacts and then [install it locally using these steps]({% link docs/extensions/overview.md %}#unsigned-extensions). ### Uploading to the Community Extensions Repository @@ -234,7 +255,7 @@ Once you are happy with your extension, it's time to share it with the DuckDB co Follow the steps in [the Community Extensions post]({% post_url 2024-07-05-community-extensions %}#developer-experience). A summary of those steps is: -1. 
Send a PR with a metadata file `description.yml` contains the description of the extension. For example: +1. Send a PR with a metadata file `description.yml` that contains the description of the extension. For example: ```yaml extension: @@ -262,8 +283,7 @@ Now let's have a look at the `pivot_table` extension as an example of just how p ## Capabilities of the `pivot_table` Extension The `pivot_table` extension supports advanced pivoting functionality that was previously only available in spreadsheets, dataframe libraries, or custom host language functions. -It uses the Excel pivoting API: `values`, `rows`, `columns`, and `filters`. -It can handle 0 or more of each of those parameters. +It uses the Excel pivoting API: `values`, `rows`, `columns`, and `filters` - handling 0 or more of each of those parameters. However, not only that, but it supports `subtotals` and `grand_totals`. If multiple `values` are passed in, the `values_axis` parameter allows the user to choose if each value should get its own column or its own row. @@ -276,7 +296,7 @@ If no `columns` are pivoted outward, a `GROUP BY` is all that is needed. However, once `columns` are involved, a `PIVOT` is required. This function can operate on one or more `table_names` that are passed in as a parameter. -Any set of tables will first be vertically stacked and then pivoted. +Any set of tables (or views!) will first be vertically stacked and then pivoted. ## Example Using `pivot_table` From 19317fdd8228a881e433287f41147d6d64f6d027 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Thu, 26 Sep 2024 07:14:52 -0700 Subject: [PATCH 10/24] Pivot_table fn explanations, links, remove comments --- _posts/2024-09-14-sql-only-extensions.md | 184 ++++++++++++++--------- 1 file changed, 109 insertions(+), 75 deletions(-) diff --git a/_posts/2024-09-14-sql-only-extensions.md b/_posts/2024-09-14-sql-only-extensions.md index 8401f1c8d80..24c8a0585a2 100644 --- a/_posts/2024-09-14-sql-only-extensions.md +++ b/_posts/2024-09-14-sql-only-extensions.md @@ -2,52 +2,9 @@ layout: post title: "Creating a SQL-Only Extension for Excel-Style Pivoting in DuckDB" author: "Alex Monahan" -excerpt: "Now you can easily create sharable extensions using only SQL MACROs that can apply to any table and any columns. We demonstrate the power of this capability with the pivot_table extension that provides Excel-style pivoting" +excerpt: "Easily create sharable extensions using only SQL MACROs that can apply to any table and any columns. We demonstrate the power of this capability with the pivot_table extension that provides Excel-style pivoting." --- - ## The Power of SQL-Only Extensions @@ -78,7 +35,7 @@ With version 1.1, DuckDB's world-class friendly SQL dialect makes it possible to * On any columns * Using any functions -The new ability to work on any tables is thanks to the [`query` and `query_table` functions]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions)! +The new ability to work **on any tables** is thanks to the [`query` and `query_table` functions]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions)! The `query` function is a safe way to execute `SELECT` statements defined by SQL strings, while `query_table` is a way to make a `FROM` clause pull from multiple tables at once. They are very powerful when used in combination with other friendly SQL features like the `COLUMNS` expression and `LIST` lambda functions. @@ -97,7 +54,8 @@ All that said, just how valuable can a SQL `MACRO` be? 
Can we do more than make small snippets? I'll make the case that you can do quite complex and powerful operations in DuckDB SQL using the `pivot_table` extension as an example. The `pivot_table` function allows for Excel-style pivots, including `subtotals`, `grand_totals`, and more. -It is also very similar to the Pandas `pivot_table` function, but with all the scalability and speed benefits of DuckDB! +It is also very similar to the Pandas `pivot_table` function, but with all the scalability and speed benefits of DuckDB. +It contains over **250 tests**, so it is intended to be useful beyond just an example! To achieve this level of flexibility, the `pivot_table` extension uses many friendly and advanced SQL features: * The [`query` function]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions) to execute a SQL string @@ -107,16 +65,16 @@ To achieve this level of flexibility, the `pivot_table` extension uses many frie * [`list_transform`]({% link docs/sql/functions/lambda.md %}#list_transformlist-lambda) for string manipulation like quoting * [`list_reduce`]({% link docs/sql/functions/lambda.md %}#list_reducelist-lambda) to concatenate strings together * [`list_aggregate`]({% link docs/sql/functions/list.md %}#list_aggregatelist-name) to sum multiple columns and identify subtotal and grand total rows -* Bracket notation for string slicing -* `UNION ALL BY NAME` to stack data by column name for subtotals and grand totals -* `SELECT * REPLACE` to dynamically clean up subtotal columns -* `SELECT * EXCLUDE` to remove internally generated columns from the final result -* `GROUPING SETS` and `ROLLUP` to generate subtotals and grand totals -* `UNNEST` to convert lists into separate rows for `values_axis:='rows'` -* `MACRO`s to modularize the code -* `ORDER BY ALL` to order the result dynamically -* `ENUM`s to determine what columns to pivot horizontally -* And of course the `PIVOT` function for horizontal pivoting! +* [Bracket notation for string slicing]({% link docs/sql/functions/char.md %}#stringbeginend) +* [`UNION ALL BY NAME`]({% link docs/sql/query_syntax/setops.md %}#union-all-by-name) to stack data by column name for subtotals and grand totals +* [`SELECT * REPLACE`]({% link docs/sql/expressions/star.md %}#replace-clause) to dynamically clean up subtotal columns +* [`SELECT * EXCLUDE`]({% link docs/sql/expressions/star.md %}#exclude-clause) to remove internally generated columns from the final result +* [`GROUPING SETS` and `ROLLUP`]({% link docs/sql/query_syntax/grouping_sets.md %}) to generate subtotals and grand totals +* [`UNNEST`]({% link docs/sql/query_syntax/unnest.md %}) to convert lists into separate rows for `values_axis:='rows'` +* [`MACRO`s]({% link docs/sql/statements/create_macro.md %}) to modularize the code +* [`ORDER BY ALL`]({% link docs/sql/query_syntax/orderby.md %}#order-by-all) to order the result dynamically +* [`ENUM`s]({% link docs/sql/statements/create_type.md %}) to determine what columns to pivot horizontally +* And of course the [`PIVOT` function]({% link docs/sql/statements/pivot.md %}) for horizontal pivoting! DuckDB's innovative syntax makes this extension possible! @@ -287,9 +245,6 @@ It uses the Excel pivoting API: `values`, `rows`, `columns`, and `filters` - han However, not only that, but it supports `subtotals` and `grand_totals`. If multiple `values` are passed in, the `values_axis` parameter allows the user to choose if each value should get its own column or its own row. 
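
To make these parameters concrete, a single call that combines several of them looks roughly like the following. The filter value is illustrative, and since no `columns` are pivoted out here, the `columns_parameter_enum` boilerplate from the examples is not needed:

```sql
FROM pivot_table(['business_metrics'],          -- table_names
                 ['sum(revenue)', 'sum(cost)'], -- values
                 ['product_line', 'product'],   -- rows
                 [],                            -- columns
                 ['year = 2023'],               -- filters
                 values_axis := 'rows');
```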
-> Note The only missing Excel feature I am aware of is columnar subtotals, but not even Pandas supports that! -> And this extension is officially open to contributions now... :-) - Why is this a good example of how DuckDB moves beyond traditional SQL? The Excel pivoting API requires dramatically different SQL syntax depending on which parameters are in use. If no `columns` are pivoted outward, a `GROUP BY` is all that is needed. @@ -383,7 +338,9 @@ There is a little bit of boilerplate required, and the details of how this works DROP TYPE IF EXISTS columns_parameter_enum; CREATE TYPE columns_parameter_enum AS ENUM ( - FROM build_my_enum(['business_metrics'], ['year', 'quarter'], []) + FROM build_my_enum(['business_metrics'], -- table_names + ['year', 'quarter'], -- columns + []) -- filters ); FROM pivot_table(['business_metrics'], -- table_names @@ -412,25 +369,102 @@ FROM pivot_table(['business_metrics'], -- table_names | Grand Total | Grand Total | sum(cost) | 111 | 111 | 111 | 111 | 111 | 111 | 111 | 111 | | Grand Total | Grand Total | sum(revenue) | 111 | 222 | 333 | 444 | 555 | 666 | 777 | 888 | +## How the `pivot_table` extension works - \ No newline at end of file +--> \ No newline at end of file From 1b6d6ba89157aa4dfcdba5289edccc3d755bece6 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Thu, 26 Sep 2024 07:48:24 -0700 Subject: [PATCH 11/24] Gabor's PR feedback! --- _posts/2024-09-14-sql-only-extensions.md | 40 +++++++++++------------- 1 file changed, 19 insertions(+), 21 deletions(-) diff --git a/_posts/2024-09-14-sql-only-extensions.md b/_posts/2024-09-14-sql-only-extensions.md index 24c8a0585a2..e8d6dd3ab8f 100644 --- a/_posts/2024-09-14-sql-only-extensions.md +++ b/_posts/2024-09-14-sql-only-extensions.md @@ -2,7 +2,7 @@ layout: post title: "Creating a SQL-Only Extension for Excel-Style Pivoting in DuckDB" author: "Alex Monahan" -excerpt: "Easily create sharable extensions using only SQL MACROs that can apply to any table and any columns. We demonstrate the power of this capability with the pivot_table extension that provides Excel-style pivoting." +excerpt: "Easily create sharable extensions using only SQL macros that can apply to any table and any columns. We demonstrate the power of this capability with the pivot_table extension that provides Excel-style pivoting." --- @@ -14,7 +14,7 @@ As a result, it has historically been missing some of the modern luxuries we tak With version 1.1, DuckDB has launched community extensions, bringing the incredible power of a package manager to the SQL language. One goal for these extensions is to enable C++ libraries to be accessible through SQL across all of the languages with a DuckDB library. For extension builders, compilation and distribution are much easier. -For the user community, installation is as simple as 2 commands: +For the user community, installation is as simple as two commands: ```sql INSTALL pivot_table FROM community; @@ -30,7 +30,7 @@ What would it take to build these extensions with *just SQL*? Traditionally, SQL is highly customized to the schema of the database on which it was written. Can we make it reusable? Some techniques for reusability were discussed in the SQL Gymnasics post, but now we can go even further. 
-With version 1.1, DuckDB's world-class friendly SQL dialect makes it possible to create MACROs that can be applied: +With version 1.1, DuckDB's world-class friendly SQL dialect makes it possible to create macros that can be applied: * To any tables * On any columns * Using any functions @@ -45,7 +45,7 @@ Traditionally, there has been no central repository for SQL functions across dat DuckDB's community extensions can be that knowledge base. If you are a DuckDB fan and a SQL user, you can share your expertise back to the community with an extension. This post will show you how! -No C++ knowledge is needed - just a little bit of copy/paste and GitHub actions handles all the compilation. +No C++ knowledge is needed - just a little bit of copy/paste and GitHub Actions handles all the compilation. If I can do it, you can do it! ### Powerful SQL @@ -70,7 +70,7 @@ To achieve this level of flexibility, the `pivot_table` extension uses many frie * [`SELECT * REPLACE`]({% link docs/sql/expressions/star.md %}#replace-clause) to dynamically clean up subtotal columns * [`SELECT * EXCLUDE`]({% link docs/sql/expressions/star.md %}#exclude-clause) to remove internally generated columns from the final result * [`GROUPING SETS` and `ROLLUP`]({% link docs/sql/query_syntax/grouping_sets.md %}) to generate subtotals and grand totals -* [`UNNEST`]({% link docs/sql/query_syntax/unnest.md %}) to convert lists into separate rows for `values_axis:='rows'` +* [`UNNEST`]({% link docs/sql/query_syntax/unnest.md %}) to convert lists into separate rows for `values_axis := 'rows'` * [`MACRO`s]({% link docs/sql/statements/create_macro.md %}) to modularize the code * [`ORDER BY ALL`]({% link docs/sql/query_syntax/orderby.md %}#order-by-all) to order the result dynamically * [`ENUM`s]({% link docs/sql/statements/create_type.md %}) to determine what columns to pivot horizontally @@ -78,7 +78,7 @@ To achieve this level of flexibility, the `pivot_table` extension uses many frie DuckDB's innovative syntax makes this extension possible! -So, we now have all 3 ingredients we will need: a central package manager, reusable `MACRO`s, and enough syntactic flexibility to do valuable work. +So, we now have all 3 ingredients we will need: a central package manager, reusable macros, and enough syntactic flexibility to do valuable work. ## Create Your Own SQL Extension @@ -99,7 +99,7 @@ Note that `--recurse-submodules` will ensure DuckDB is pulled which is required Next, replace the name of the example extension with the name of your extension in all the right places by running the Python script below. > Note If you don't have Python installed, head to [python.org](https://python.org) and follow those instructions. -> This script doesn't require any libraries, so Python is all you need! (No need to set up any environments) +> This script doesn't require any libraries, so Python is all you need! (No need to set up any environments.) ```python python3 ./scripts/bootstrap-template.py @@ -108,8 +108,8 @@ python3 ./scripts/bootstrap-template.py #### Initial Extension Test At this point, you can follow the directions in the README to build and test locally if you would like. -However, even easier, you can simply commit your changes to git and push them to GitHub, and GitHub actions can do the compilation for you! -GitHub actions will also run tests on your extension to validate it is working properly. +However, even easier, you can simply commit your changes to git and push them to GitHub, and GitHub Actions can do the compilation for you! 
+GitHub Actions will also run tests on your extension to validate it is working properly. > Note The instructions are not written for a Windows audience, so we recommend GitHub Actions in that case! @@ -120,11 +120,9 @@ git push ``` - - #### Write Your SQL Macros -It it likely a bit faster to iterate if you test our your macros directly in DuckDB. +It it likely a bit faster to iterate if you test your macros directly in DuckDB. After you have written your SQL, we will move it into the extension. The example we will use demonstrates how to pull a dynamic set of columns from a dynamic table name (or a view name!). @@ -150,7 +148,7 @@ FROM select_distinct_columns_from_table('duckdb_types', ['type_category']); #### Add SQL Macros -Technically, this is the C++ part, but we are going to do some copy/paste and use GitHub actions for compiling so it won't feel that way! +Technically, this is the C++ part, but we are going to do some copy/paste and use GitHub Actions for compiling so it won't feel that way! DuckDB supports both scalar and table macros, and they have slightly different syntax. The extension template has an example for each (and code comments too!) inside the file named `.cpp`. @@ -160,7 +158,7 @@ We will copy the example and modify it! {% raw %} ```cpp static const DefaultTableMacro _table_macros[] = { - {DEFAULT_SCHEMA, "times_two_table", {"x", nullptr}, {{"two", "2"}, {nullptr, nullptr}}, R"(SELECT x * two as output_column;)"}, + {DEFAULT_SCHEMA, "times_two_table", {"x", nullptr}, {{"two", "2"}, {nullptr, nullptr}}, R"(SELECT x * two as output_column;)"}, { DEFAULT_SCHEMA, // Leave the schema as the default "select_distinct_columns_from_table", // Function name @@ -180,7 +178,7 @@ static const DefaultTableMacro _table_macros[] = { {% endraw %} That's it! -All we had to provide were the name of the function, the names of the parameters, and the text of our SQL `MACRO`. +All we had to provide were the name of the function, the names of the parameters, and the text of our SQL macro. ### Testing the Extension @@ -203,9 +201,9 @@ STRING NULL ``` -Now, just add, commit, and push your changes to GitHub like before, and GitHub actions will compile your extension and test it! +Now, just add, commit, and push your changes to GitHub like before, and GitHub Actions will compile your extension and test it! -If you would like to do further ad-hoc testing of your extension, you can download the extension from your GitHub actions run's artifacts and then [install it locally using these steps]({% link docs/extensions/overview.md %}#unsigned-extensions). +If you would like to do further ad-hoc testing of your extension, you can download the extension from your GitHub Actions run's artifacts and then [install it locally using these steps]({% link docs/extensions/overview.md %}#unsigned-extensions). ### Uploading to the Community Extensions Repository @@ -213,7 +211,7 @@ Once you are happy with your extension, it's time to share it with the DuckDB co Follow the steps in [the Community Extensions post]({% post_url 2024-07-05-community-extensions %}#developer-experience). A summary of those steps is: -1. Send a PR with a metadata file `description.yml` that contains the description of the extension. For example: +1. Send a PR with a metadata file `description.yml` that contains the description of the extension. 
For example, the [`h3` Community Extension](https://community-extensions.duckdb.org/extensions/h3.html) uses the following YAML configuration: ```yaml extension: @@ -348,9 +346,9 @@ FROM pivot_table(['business_metrics'], -- table_names ['product_line', 'product'], -- rows ['year', 'quarter'], -- columns [], -- filters - subtotals:=1, - grand_totals:=1, - values_axis:='rows' + subtotals := 1, + grand_totals := 1, + values_axis := 'rows' ); ``` From 8fa01745c799d4d47517b53db8c47c07e761d94b Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Thu, 26 Sep 2024 08:43:31 -0700 Subject: [PATCH 12/24] Add examples to pivot_table explanation --- _posts/2024-09-14-sql-only-extensions.md | 107 +++++++++++++++++++++-- 1 file changed, 100 insertions(+), 7 deletions(-) diff --git a/_posts/2024-09-14-sql-only-extensions.md b/_posts/2024-09-14-sql-only-extensions.md index e8d6dd3ab8f..514adf6fb8f 100644 --- a/_posts/2024-09-14-sql-only-extensions.md +++ b/_posts/2024-09-14-sql-only-extensions.md @@ -29,7 +29,7 @@ What would it take to build these extensions with *just SQL*? Traditionally, SQL is highly customized to the schema of the database on which it was written. Can we make it reusable? -Some techniques for reusability were discussed in the SQL Gymnasics post, but now we can go even further. +Some techniques for reusability were discussed in the [SQL Gymnasics post]({% post_url 2024-03-01-sql-gymnastics %}), but now we can go even further. With version 1.1, DuckDB's world-class friendly SQL dialect makes it possible to create macros that can be applied: * To any tables * On any columns @@ -369,7 +369,7 @@ FROM pivot_table(['business_metrics'], -- table_names ## How the `pivot_table` extension works -The `pivot_table` extension is a collection of multiple scalar and table SQL `MACRO`s. +The `pivot_table` extension is a collection of multiple scalar and table SQL macros. This allows the logic to be modularized. You can see below that the functions are used as building blocks to create more complex functions. This is typically difficult to do in SQL, but it is easy in DuckDB! @@ -422,18 +422,19 @@ The SQL string is constructed using list lambda functions and the building block At its core, the `pivot_table` function determines the SQL required to generate the desired pivot based on which parameters are in use. -Since this SQL statement is a string at the end of the day, we can use a hierarchy of scalar SQL `MACRO`s rather than a single large `MACRO`. +Since this SQL statement is a string at the end of the day, we can use a hierarchy of scalar SQL macros rather than a single large macro. This is a common traditional issue with SQL - it tends to not be very modular or reusable, but we are able to compartmentalize our logic wth DuckDB's syntax. > Note If a non-optional parameter is not in use, an empty string (`[]`) should be passed in. * `table_names`: A list of table or view names to aggregate or pivot. Multiple tables are combined with `UNION ALL BY NAME` prior to any other processing. -* `values`: A list of aggregation metrics in the format `['aggregate_function_1(column_name_1)', 'aggregate_function_2(column_name_2)', ...]`. +* `values`: A list of aggregation metrics in the format `['agg_fn_1(col_1)', 'agg_fn_2(col_2)', ...]`. * `rows`: A list of column names to `SELECT` and `GROUP BY`. * `columns`: A list of column names to `PIVOT` horizontally into a separate column per value in the original column. 
If multiple column names are passed in, only unique combinations of data that appear in the dataset are pivoted. - * Ex: If passing in a `columns` parameter like `['continent', 'country']`, only valid `continent` / `country` pairs will be included (no `Europe_Canada` column would be generated). -* `filters`: A list of `WHERE` clause expressions to be applied to the raw dataset prior to aggregating in the format `['column_name_1 = 123', 'column_name_2 LIKE ''woot%''', ...]`. - * The `filters` are combined with `AND` so all must evaluate to true for a row to be included. + * Ex: If passing in a `columns` parameter like `['continent', 'country']`, only valid `continent` / `country` pairs will be included. + * (no `Europe_Canada` column would be generated). +* `filters`: A list of `WHERE` clause expressions to be applied to the raw dataset prior to aggregating in the format `['col_1 = 123', 'col_2 LIKE ''woot%''', ...]`. + * The `filters` are combined with `AND`. * `values_axis` (Optional): If multiple `values` are passed in, determine whether to create a separate row or column for each value. Either `rows` or `columns`, defaulting to `columns`. * `subtotals` (Optional): If enabled, calculate the aggregate metric at multiple levels of detail based on the `rows` parameter. Either 0 or 1, defaulting to 0. * `grand_totals` (Optional): If enabled, calculate the aggregate metric across all rows in the raw data in addition to at the granularity defined by `rows`. Either 0 or 1, defaulting to 0. @@ -445,19 +446,111 @@ As a result, a `GROUP BY` statement is used. If `subtotals` are in use, the `ROLLUP` expression is used to calculate the `values` at the different levels of granularity. If `grand_totals` are in use, but not `subtotals`, the `GROUPING SETS` expression is used instead of `ROLLUP` to evaluate across all rows. +In this example, we build a summary of the `revenue` and `cost` of each `product_line` and `product`. + +```sql +FROM pivot_table(['business_metrics'], + ['sum(revenue)', 'sum(cost)'], + ['product_line', 'product'], + [], + [], + subtotals := 1, + grand_totals := 1, + values_axis := 'columns' + ); +``` + +| product_line | product | sum(revenue) | sum("cost") | +|----------------------|---------------|--------------|-------------| +| Duck Duds | Duck neckties | 36 | 8 | +| Duck Duds | Duck suits | 360 | 80 | +| Duck Duds | Subtotal | 396 | 88 | +| Waterfowl watercraft | Duck boats | 3600 | 800 | +| Waterfowl watercraft | Subtotal | 3600 | 800 | +| Grand Total | Grand Total | 3996 | 888 | + #### Pivot horizontally, one column per metric in `values` Build up a `PIVOT` statement that will pivot out all valid combinations of raw data values within the `columns` parameter. If `subtotals` or `grand_totals` are in use, make multiple copies of the input data, but replace appropriate column names in the `rows` parameter with a string constant. Pass all expressions in `values` to the `PIVOT` statement's `USING` clause so they each receive their own column. 
+We enhance our previous example to pivot out a separate column for each `year` / `value` combination: + +```sql +DROP TYPE IF EXISTS columns_parameter_enum; + +CREATE TYPE columns_parameter_enum AS ENUM ( + FROM build_my_enum(['business_metrics'], + ['year'], + []) +); + +FROM pivot_table(['business_metrics'], + ['sum(revenue)', 'sum(cost)'], + ['product_line', 'product'], + ['year'], + [], + subtotals := 1, + grand_totals := 1, + values_axis := 'columns' + ); +``` + +| product_line | product | 2022_sum(revenue) | 2022_sum("cost") | 2023_sum(revenue) | 2023_sum("cost") | +|----------------------|---------------|-------------------|------------------|-------------------|------------------| +| Duck Duds | Duck neckties | 10 | 4 | 26 | 4 | +| Duck Duds | Duck suits | 100 | 40 | 260 | 40 | +| Duck Duds | Subtotal | 110 | 44 | 286 | 44 | +| Waterfowl watercraft | Duck boats | 1000 | 400 | 2600 | 400 | +| Waterfowl watercraft | Subtotal | 1000 | 400 | 2600 | 400 | +| Grand Total | Grand Total | 1110 | 444 | 2886 | 444 | + #### Pivot horizontally, one row per metric in `values` Build up a separate `PIVOT` statement for each metric in `values` and combine them with `UNION ALL BY NAME`. If `subtotals` or `grand_totals` are in use, make multiple copies of the input data, but replace appropriate column names in the `rows` parameter with a string constant. +To simplify the appearance slightly, we adjust one parameter in our previous query and set `values_axis := 'rows'`: + +```sql +DROP TYPE IF EXISTS columns_parameter_enum; + +CREATE TYPE columns_parameter_enum AS ENUM ( + FROM build_my_enum(['business_metrics'], + ['year'], + []) +); +FROM pivot_table(['business_metrics'], + ['sum(revenue)', 'sum(cost)'], + ['product_line', 'product'], + ['year'], + [], + subtotals := 1, + grand_totals := 1, + values_axis := 'rows' + ); +``` +| product_line | product | value_names | 2022 | 2023 | +|----------------------|---------------|--------------|------|------| +| Duck Duds | Duck neckties | sum(cost) | 4 | 4 | +| Duck Duds | Duck neckties | sum(revenue) | 10 | 26 | +| Duck Duds | Duck suits | sum(cost) | 40 | 40 | +| Duck Duds | Duck suits | sum(revenue) | 100 | 260 | +| Duck Duds | Subtotal | sum(cost) | 44 | 44 | +| Duck Duds | Subtotal | sum(revenue) | 110 | 286 | +| Waterfowl watercraft | Duck boats | sum(cost) | 400 | 400 | +| Waterfowl watercraft | Duck boats | sum(revenue) | 1000 | 2600 | +| Waterfowl watercraft | Subtotal | sum(cost) | 400 | 400 | +| Waterfowl watercraft | Subtotal | sum(revenue) | 1000 | 2600 | +| Grand Total | Grand Total | sum(cost) | 444 | 444 | +| Grand Total | Grand Total | sum(revenue) | 1110 | 2886 | + +## Conclusion + + +With DuckDB 1.1, sharing your SQL knowledge with the community has never been easier! +DuckDB's community extension repository is truly a package manager for the SQL language. +Macros in DuckDB are now highly reusable (thanks to `query` and `query_table`), and DuckDB's SQL syntax provides plenty of power to accomplish complex tasks. - \ No newline at end of file From 654c0247ecc5910985f7c0dd453453af335d308b Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Thu, 26 Sep 2024 14:47:25 -0700 Subject: [PATCH 15/24] Carlo feedback. Add Wasm shell link. Script to convert to Wasm link. 
--- _posts/2024-09-27-sql-only-extensions.md | 11 +++- scripts/sql_to_wasm_shell_link.py | 70 ++++++++++++++++++++++++ 2 files changed, 79 insertions(+), 2 deletions(-) create mode 100644 scripts/sql_to_wasm_shell_link.py diff --git a/_posts/2024-09-27-sql-only-extensions.md b/_posts/2024-09-27-sql-only-extensions.md index 6df0e0571ca..b27a794244d 100644 --- a/_posts/2024-09-27-sql-only-extensions.md +++ b/_posts/2024-09-27-sql-only-extensions.md @@ -12,7 +12,8 @@ excerpt: "Easily create sharable extensions using only SQL macros that can apply SQL is not a new language. As a result, it has historically been missing some of the modern luxuries we take for granted. With version 1.1, DuckDB has launched community extensions, bringing the incredible power of a package manager to the SQL language. -One goal for these extensions is to enable C++ libraries to be accessible through SQL across all of the languages with a DuckDB library. +A bold goal of ours is for DuckDB to become a convenient way to wrap any C++ library, much the way that Python does today, but across any language with a DuckDB client. + For extension builders, compilation and distribution are much easier. For the user community, installation is as simple as two commands: @@ -21,7 +22,9 @@ INSTALL pivot_table FROM community; LOAD pivot_table; ``` -However, not all of us are C++ developers! +The extension can then be used in any query through SQL functions. + +However, **not all of us are C++ developers**! Can we, as a SQL community, build up a set of SQL helper functions? What would it take to build these extensions with *just SQL*? @@ -43,6 +46,8 @@ They are very powerful when used in combination with other friendly SQL features Traditionally, there has been no central repository for SQL functions across databases, let alone across companies! DuckDB's community extensions can be that knowledge base. +DuckDB extensions can be used across all languages with a DuckDB client, including Python, NodeJS, Java, Rust, Go, and even Webassembly (Wasm)! + If you are a DuckDB fan and a SQL user, you can share your expertise back to the community with an extension. This post will show you how! No C++ knowledge is needed - just a little bit of copy/paste and GitHub Actions handles all the compilation. @@ -253,6 +258,8 @@ Any set of tables (or views!) will first be vertically stacked and then pivoted. 
## Example Using `pivot_table` +[Check out a live example using the extension in the DuckDB Wasm shell here](https://shell.duckdb.org/#queries=v0,CREATE-OR-REPLACE-TABLE-business_metrics-(-----product_line-VARCHAR%2C-product-VARCHAR%2C-year-INTEGER%2C-quarter-VARCHAR%2C-revenue-integer%2C-cost-integer-)~,INSERT-INTO-business_metrics-VALUES-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q1'%2C-100%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q2'%2C-200%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q3'%2C-300%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q4'%2C-400%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q1'%2C-500%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q2'%2C-600%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q3'%2C-700%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q4'%2C-800%2C-100)%2C------('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q1'%2C-10%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q2'%2C-20%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q3'%2C-30%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q4'%2C-40%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q1'%2C-50%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q2'%2C-60%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q3'%2C-70%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q4'%2C-80%2C-10)%2C------('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q1'%2C-1%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q2'%2C-2%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q3'%2C-3%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q4'%2C-4%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q1'%2C-5%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q2'%2C-6%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q3'%2C-7%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q4'%2C-8%2C-1)%2C~,FROM-business_metrics~,INSTALL-pivot_table-from-community~,LOAD-'https%3A%2F%2Fcommunity extensions.duckdb.org%2Fv1.1.1%2Fwasm_eh%2Fpivot_table.duckdb_extension.wasm'~,DROP-TYPE-IF-EXISTS-columns_parameter_enum~,CREATE-TYPE-columns_parameter_enum-AS-ENUM-(FROM-build_my_enum(['business_metrics']%2C-['year'%2C-'quarter']%2C-[]))~,FROM-pivot_table(['business_metrics']%2C['sum(revenue)'%2C-'sum(cost)']%2C-['product_line'%2C-'product']%2C-['year'%2C-'quarter']%2C-[]%2C-subtotals-%3A%3D-1%2C-grand_totals-%3A%3D-1%2C-values_axis-%3A%3D-'rows')~)! +
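
To make the walkthrough below a little more concrete up front, here is a condensed sketch of the kind of table it uses and the simplest possible `pivot_table` call (an aggregation only, with no pivoted columns). The rows are just a small subset of the full example data built later, and the query assumes the extension has already been installed and loaded as shown earlier:

```sql
-- Sketch only: a few illustrative rows from the larger example table below.
-- Assumes: INSTALL pivot_table FROM community; LOAD pivot_table; has already been run.
CREATE OR REPLACE TABLE business_metrics (
    product_line VARCHAR, product VARCHAR, year INTEGER, quarter VARCHAR, revenue INTEGER, cost INTEGER
);
INSERT INTO business_metrics VALUES
    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q1', 100, 100),
    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q2', 200, 100),
    ('Duck Duds', 'Duck suits', 2022, 'Q1', 10, 10),
    ('Duck Duds', 'Duck neckties', 2022, 'Q1', 1, 1);

-- Minimal call: sum revenue per product_line.
-- Positional parameters: table_names, values, rows, columns, filters.
-- With no columns to pivot, no ENUM setup is needed.
FROM pivot_table(['business_metrics'], ['sum(revenue)'], ['product_line'], [], []);
-- Returns one row per product_line, e.g., Waterfowl watercraft -> 300 and Duck Duds -> 11 for this subset.
```

The full example below builds the complete table and layers on pivoted `columns`, `subtotals`, and `grand_totals`.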
First we will create an example data table. We are a duck product distributor, and we are tracking our fowl finances. diff --git a/scripts/sql_to_wasm_shell_link.py b/scripts/sql_to_wasm_shell_link.py new file mode 100644 index 00000000000..5b99fe6d211 --- /dev/null +++ b/scripts/sql_to_wasm_shell_link.py @@ -0,0 +1,70 @@ + + +# Note, this may not handle all special characters +shell_link_stub = "https://shell.duckdb.org/#queries=v0," +# sql = """ +# install tpch; +# load tpch; +# call dbgen(sf=0.1); +# pragma tpch(7); +# """ + +sql = """ +CREATE OR REPLACE TABLE business_metrics ( + product_line VARCHAR, product VARCHAR, year INTEGER, quarter VARCHAR, revenue integer, cost integer +); +INSERT INTO business_metrics VALUES + ('Waterfowl watercraft', 'Duck boats', 2022, 'Q1', 100, 100), + ('Waterfowl watercraft', 'Duck boats', 2022, 'Q2', 200, 100), + ('Waterfowl watercraft', 'Duck boats', 2022, 'Q3', 300, 100), + ('Waterfowl watercraft', 'Duck boats', 2022, 'Q4', 400, 100), + ('Waterfowl watercraft', 'Duck boats', 2023, 'Q1', 500, 100), + ('Waterfowl watercraft', 'Duck boats', 2023, 'Q2', 600, 100), + ('Waterfowl watercraft', 'Duck boats', 2023, 'Q3', 700, 100), + ('Waterfowl watercraft', 'Duck boats', 2023, 'Q4', 800, 100), + + ('Duck Duds', 'Duck suits', 2022, 'Q1', 10, 10), + ('Duck Duds', 'Duck suits', 2022, 'Q2', 20, 10), + ('Duck Duds', 'Duck suits', 2022, 'Q3', 30, 10), + ('Duck Duds', 'Duck suits', 2022, 'Q4', 40, 10), + ('Duck Duds', 'Duck suits', 2023, 'Q1', 50, 10), + ('Duck Duds', 'Duck suits', 2023, 'Q2', 60, 10), + ('Duck Duds', 'Duck suits', 2023, 'Q3', 70, 10), + ('Duck Duds', 'Duck suits', 2023, 'Q4', 80, 10), + + ('Duck Duds', 'Duck neckties', 2022, 'Q1', 1, 1), + ('Duck Duds', 'Duck neckties', 2022, 'Q2', 2, 1), + ('Duck Duds', 'Duck neckties', 2022, 'Q3', 3, 1), + ('Duck Duds', 'Duck neckties', 2022, 'Q4', 4, 1), + ('Duck Duds', 'Duck neckties', 2023, 'Q1', 5, 1), + ('Duck Duds', 'Duck neckties', 2023, 'Q2', 6, 1), + ('Duck Duds', 'Duck neckties', 2023, 'Q3', 7, 1), + ('Duck Duds', 'Duck neckties', 2023, 'Q4', 8, 1), +; + +FROM business_metrics; + +INSTALL pivot_table from community; +LOAD 'https://community-extensions.duckdb.org/v1.1.1/wasm_eh/pivot_table.duckdb_extension.wasm'; + +DROP TYPE IF EXISTS columns_parameter_enum; + +CREATE TYPE columns_parameter_enum AS ENUM (FROM build_my_enum(['business_metrics'], ['year', 'quarter'], [])); + +FROM pivot_table(['business_metrics'],['sum(revenue)', 'sum(cost)'], ['product_line', 'product'], ['year', 'quarter'], [], subtotals := 1, grand_totals := 1, values_axis := 'rows'); +""" + +statements = sql.strip().split(sep=";") + +encoded_statements = [] + +for statement in statements: + trimmed = statement.strip() + no_hyphens = trimmed.replace('-','%2D') + no_spaces = no_hyphens.replace('\n',' ').replace(' ', '-') + encoded = no_spaces.replace(',','%2C').replace('=','%3D').replace(':','%3A').replace(r'/','%2F').replace('%2D',' ') + encoded_statements.append(encoded) + +combined = shell_link_stub + '~,'.join(encoded_statements) + +print(combined) \ No newline at end of file From 83812a869943fab55510f055e19e9e1fd2f4f163 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Thu, 26 Sep 2024 14:51:37 -0700 Subject: [PATCH 16/24] Format python script --- scripts/sql_to_wasm_shell_link.py | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/scripts/sql_to_wasm_shell_link.py b/scripts/sql_to_wasm_shell_link.py index 5b99fe6d211..f65b9450338 100644 --- a/scripts/sql_to_wasm_shell_link.py +++ 
b/scripts/sql_to_wasm_shell_link.py @@ -60,11 +60,17 @@ for statement in statements: trimmed = statement.strip() - no_hyphens = trimmed.replace('-','%2D') + no_hyphens = trimmed.replace('-', '%2D') no_spaces = no_hyphens.replace('\n',' ').replace(' ', '-') - encoded = no_spaces.replace(',','%2C').replace('=','%3D').replace(':','%3A').replace(r'/','%2F').replace('%2D',' ') + encoded = ( + no_spaces.replace(',','%2C') + .replace('=','%3D') + .replace(':','%3A') + .replace(r'/','%2F') + .replace('%2D',' ') + ) encoded_statements.append(encoded) combined = shell_link_stub + '~,'.join(encoded_statements) -print(combined) \ No newline at end of file +print(combined) From a463f51495ee97ef99f4a1634b49554e1439b340 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Thu, 26 Sep 2024 14:57:45 -0700 Subject: [PATCH 17/24] Another format fix on python script --- scripts/sql_to_wasm_shell_link.py | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/scripts/sql_to_wasm_shell_link.py b/scripts/sql_to_wasm_shell_link.py index f65b9450338..47e4efe39db 100644 --- a/scripts/sql_to_wasm_shell_link.py +++ b/scripts/sql_to_wasm_shell_link.py @@ -63,11 +63,11 @@ no_hyphens = trimmed.replace('-', '%2D') no_spaces = no_hyphens.replace('\n',' ').replace(' ', '-') encoded = ( - no_spaces.replace(',','%2C') - .replace('=','%3D') - .replace(':','%3A') - .replace(r'/','%2F') - .replace('%2D',' ') + no_spaces.replace(',', '%2C') + .replace('=', '%3D') + .replace(':', '%3A') + .replace(r'/', '%2F') + .replace('%2D', ' ') ) encoded_statements.append(encoded) From 707ac4e3a11e8d70d12794df91d5857ac5676422 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Thu, 26 Sep 2024 14:59:15 -0700 Subject: [PATCH 18/24] Another space in python file --- scripts/sql_to_wasm_shell_link.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/sql_to_wasm_shell_link.py b/scripts/sql_to_wasm_shell_link.py index 47e4efe39db..a1658f0354d 100644 --- a/scripts/sql_to_wasm_shell_link.py +++ b/scripts/sql_to_wasm_shell_link.py @@ -61,7 +61,7 @@ for statement in statements: trimmed = statement.strip() no_hyphens = trimmed.replace('-', '%2D') - no_spaces = no_hyphens.replace('\n',' ').replace(' ', '-') + no_spaces = no_hyphens.replace('\n', ' ').replace(' ', '-') encoded = ( no_spaces.replace(',', '%2C') .replace('=', '%3D') From 18265ea576239fe1f00af893ca491792f419bc90 Mon Sep 17 00:00:00 2001 From: Alex-Monahan Date: Thu, 26 Sep 2024 15:01:45 -0700 Subject: [PATCH 19/24] formatting again. --- scripts/sql_to_wasm_shell_link.py | 2 -- 1 file changed, 2 deletions(-) diff --git a/scripts/sql_to_wasm_shell_link.py b/scripts/sql_to_wasm_shell_link.py index a1658f0354d..ebcef07251d 100644 --- a/scripts/sql_to_wasm_shell_link.py +++ b/scripts/sql_to_wasm_shell_link.py @@ -1,5 +1,3 @@ - - # Note, this may not handle all special characters shell_link_stub = "https://shell.duckdb.org/#queries=v0," # sql = """ From a529734bdda0db0e2772ba61502a9106a69f3d7b Mon Sep 17 00:00:00 2001 From: Gabor Szarnyas Date: Fri, 27 Sep 2024 10:27:30 +0200 Subject: [PATCH 20/24] Add markdownlint-disable for bare url --- _posts/2024-09-27-sql-only-extensions.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_posts/2024-09-27-sql-only-extensions.md b/_posts/2024-09-27-sql-only-extensions.md index b27a794244d..1cc9638e4e1 100644 --- a/_posts/2024-09-27-sql-only-extensions.md +++ b/_posts/2024-09-27-sql-only-extensions.md @@ -258,8 +258,12 @@ Any set of tables (or views!) will first be vertically stacked and then pivoted. 
## Example Using `pivot_table` + + [Check out a live example using the extension in the DuckDB Wasm shell here](https://shell.duckdb.org/#queries=v0,CREATE-OR-REPLACE-TABLE-business_metrics-(-----product_line-VARCHAR%2C-product-VARCHAR%2C-year-INTEGER%2C-quarter-VARCHAR%2C-revenue-integer%2C-cost-integer-)~,INSERT-INTO-business_metrics-VALUES-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q1'%2C-100%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q2'%2C-200%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q3'%2C-300%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q4'%2C-400%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q1'%2C-500%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q2'%2C-600%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q3'%2C-700%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q4'%2C-800%2C-100)%2C------('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q1'%2C-10%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q2'%2C-20%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q3'%2C-30%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q4'%2C-40%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q1'%2C-50%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q2'%2C-60%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q3'%2C-70%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q4'%2C-80%2C-10)%2C------('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q1'%2C-1%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q2'%2C-2%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q3'%2C-3%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q4'%2C-4%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q1'%2C-5%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q2'%2C-6%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q3'%2C-7%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q4'%2C-8%2C-1)%2C~,FROM-business_metrics~,INSTALL-pivot_table-from-community~,LOAD-'https%3A%2F%2Fcommunity extensions.duckdb.org%2Fv1.1.1%2Fwasm_eh%2Fpivot_table.duckdb_extension.wasm'~,DROP-TYPE-IF-EXISTS-columns_parameter_enum~,CREATE-TYPE-columns_parameter_enum-AS-ENUM-(FROM-build_my_enum(['business_metrics']%2C-['year'%2C-'quarter']%2C-[]))~,FROM-pivot_table(['business_metrics']%2C['sum(revenue)'%2C-'sum(cost)']%2C-['product_line'%2C-'product']%2C-['year'%2C-'quarter']%2C-[]%2C-subtotals-%3A%3D-1%2C-grand_totals-%3A%3D-1%2C-values_axis-%3A%3D-'rows')~)! + +
First we will create an example data table. We are a duck product distributor, and we are tracking our fowl finances. From d5ab0f0113f83ba741f2608f4a68e098524ccb13 Mon Sep 17 00:00:00 2001 From: Gabor Szarnyas Date: Fri, 27 Sep 2024 10:28:00 +0200 Subject: [PATCH 21/24] Formatting fixes --- _posts/2024-09-27-sql-only-extensions.md | 61 ++++++++++++------------ 1 file changed, 30 insertions(+), 31 deletions(-) diff --git a/_posts/2024-09-27-sql-only-extensions.md b/_posts/2024-09-27-sql-only-extensions.md index 1cc9638e4e1..3b82b24e658 100644 --- a/_posts/2024-09-27-sql-only-extensions.md +++ b/_posts/2024-09-27-sql-only-extensions.md @@ -5,8 +5,6 @@ author: "Alex Monahan" excerpt: "Easily create sharable extensions using only SQL macros that can apply to any table and any columns. We demonstrate the power of this capability with the pivot_table extension that provides Excel-style pivoting." --- - - ## The Power of SQL-Only Extensions SQL is not a new language. @@ -24,16 +22,17 @@ LOAD pivot_table; The extension can then be used in any query through SQL functions. -However, **not all of us are C++ developers**! +However, **not all of us are C++ developers**! Can we, as a SQL community, build up a set of SQL helper functions? -What would it take to build these extensions with *just SQL*? +What would it take to build these extensions with *just SQL*? ### Reusability -Traditionally, SQL is highly customized to the schema of the database on which it was written. +Traditionally, SQL is highly customized to the schema of the database on which it was written. Can we make it reusable? Some techniques for reusability were discussed in the [SQL Gymnasics post]({% post_url 2024-03-01-sql-gymnastics %}), but now we can go even further. With version 1.1, DuckDB's world-class friendly SQL dialect makes it possible to create macros that can be applied: + * To any tables * On any columns * Using any functions @@ -55,7 +54,7 @@ If I can do it, you can do it! ### Powerful SQL -All that said, just how valuable can a SQL `MACRO` be? +All that said, just how valuable can a SQL `MACRO` be? Can we do more than make small snippets? I'll make the case that you can do quite complex and powerful operations in DuckDB SQL using the `pivot_table` extension as an example. The `pivot_table` function allows for Excel-style pivots, including `subtotals`, `grand_totals`, and more. @@ -63,8 +62,9 @@ It is also very similar to the Pandas `pivot_table` function, but with all the s It contains over **250 tests**, so it is intended to be useful beyond just an example! 
To achieve this level of flexibility, the `pivot_table` extension uses many friendly and advanced SQL features: + * The [`query` function]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions) to execute a SQL string -* The [`query_table` function]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions) to query a list of tables +* The [`query_table` function]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions) to query a list of tables * The [`COLUMNS` expression]({% link docs/sql/expressions/star.md %}#columns-expression) to select a dynamic list of columns * [List lambda functions]({% link docs/sql/functions/lambda.md %}) to build up the SQL statement passed into `query` * [`list_transform`]({% link docs/sql/functions/lambda.md %}#list_transformlist-lambda) for string manipulation like quoting @@ -81,7 +81,7 @@ To achieve this level of flexibility, the `pivot_table` extension uses many frie * [`ENUM`s]({% link docs/sql/statements/create_type.md %}) to determine what columns to pivot horizontally * And of course the [`PIVOT` function]({% link docs/sql/statements/pivot.md %}) for horizontal pivoting! -DuckDB's innovative syntax makes this extension possible! +DuckDB's innovative syntax makes this extension possible! So, we now have all 3 ingredients we will need: a central package manager, reusable macros, and enough syntactic flexibility to do valuable work. @@ -93,12 +93,14 @@ Let's walk through the steps to creating your own SQL-only extension. #### Extension Setup -The first step is to create your own GitHub repo from the [DuckDB Extension Template for SQL](https://github.com/duckdb/extension-template-sql) by clicking `Use this template`. +The first step is to create your own GitHub repo from the [DuckDB Extension Template for SQL](https://github.com/duckdb/extension-template-sql) by clicking `Use this template`. Then clone your new repository onto your local machine using the terminal: -```sh + +```batch git clone --recurse-submodules https://github.com//.git ``` + Note that `--recurse-submodules` will ensure DuckDB is pulled which is required to build the extension. Next, replace the name of the example extension with the name of your extension in all the right places by running the Python script below. @@ -118,13 +120,12 @@ GitHub Actions will also run tests on your extension to validate it is working p > Note The instructions are not written for a Windows audience, so we recommend GitHub Actions in that case! -```sh +```batch git add -A git commit -m "Initial commit of my SQL extension!" git push ``` - #### Write Your SQL Macros It it likely a bit faster to iterate if you test your macros directly in DuckDB. @@ -182,12 +183,12 @@ static const DefaultTableMacro _table_macros[] = { ``` {% endraw %} -That's it! +That's it! All we had to provide were the name of the function, the names of the parameters, and the text of our SQL macro. ### Testing the Extension -We also recommend adding some tests for your extension to the `.test` file. +We also recommend adding some tests for your extension to the `.test` file. This uses [sqllogictest]({% link docs/dev/sqllogictest/intro.md %}) to test with just SQL! Let's add the example from above. @@ -240,7 +241,6 @@ And there you have it! You have created a shareable DuckDB Community Extension. Now let's have a look at the `pivot_table` extension as an example of just how powerful a SQL-only extension can be. 
- ## Capabilities of the `pivot_table` Extension The `pivot_table` extension supports advanced pivoting functionality that was previously only available in spreadsheets, dataframe libraries, or custom host language functions. @@ -401,8 +401,8 @@ The functions and a brief description of each follows. ### Functions creating during refactoring for modularity -* `totals_list`: Build up a list as a part of enabling `subtotals` and `grand_totals`. -* `replace_zzz`: Rename `subtotal` and `grand_total` indicators after sorting so they are more friendly. +* `totals_list`: Build up a list as a part of enabling `subtotals` and `grand_totals`. +* `replace_zzz`: Rename `subtotal` and `grand_total` indicators after sorting so they are more friendly. ### Core pivoting logic functions @@ -420,20 +420,20 @@ DuckDB's automatic `PIVOT` syntax can automatically define this, but in our case The reason for this is that automatic pivoting runs 2 statements behind the scenes, but a `MACRO` must only be a single statement. If the `columns` parameter is not in use, this step is essentially a no-op, so it can be omitted or included for consistency (recommended). -The `query` and `query_table` functions only support `SELECT` statements (for security reasons), so the dynamic portion of the `ENUM` creation occurs in the function `build_my_enum`. +The `query` and `query_table` functions only support `SELECT` statements (for security reasons), so the dynamic portion of the `ENUM` creation occurs in the function `build_my_enum`. If this type of usage becomes common, features could be added to DuckDB to enable a `CREATE OR REPLACE` syntax for `ENUM` types, or possibly even temporary enums. -That would reduce this pattern from 3 statements down to 2. +That would reduce this pattern from 3 statements down to 2. Please let us know! The `build_my_enum` function uses a combination of `query_table` to pull from multiple input tables, and the `query` function so that double quotes (and correct character escaping) can be completed prior to passing in the list of table names. -It uses a similar pattern to the core `pivot_table` function: build up a SQL query as a string, then call it with `query`. +It uses a similar pattern to the core `pivot_table` function: build up a SQL query as a string, then call it with `query`. The SQL string is constructed using list lambda functions and the building block functions for quoting. ### The `pivot_table` function At its core, the `pivot_table` function determines the SQL required to generate the desired pivot based on which parameters are in use. -Since this SQL statement is a string at the end of the day, we can use a hierarchy of scalar SQL macros rather than a single large macro. +Since this SQL statement is a string at the end of the day, we can use a hierarchy of scalar SQL macros rather than a single large macro. This is a common traditional issue with SQL - it tends to not be very modular or reusable, but we are able to compartmentalize our logic wth DuckDB's syntax. > Note If a non-optional parameter is not in use, an empty string (`[]`) should be passed in. @@ -441,23 +441,23 @@ This is a common traditional issue with SQL - it tends to not be very modular or * `table_names`: A list of table or view names to aggregate or pivot. Multiple tables are combined with `UNION ALL BY NAME` prior to any other processing. * `values`: A list of aggregation metrics in the format `['agg_fn_1(col_1)', 'agg_fn_2(col_2)', ...]`. * `rows`: A list of column names to `SELECT` and `GROUP BY`. 
-* `columns`: A list of column names to `PIVOT` horizontally into a separate column per value in the original column. If multiple column names are passed in, only unique combinations of data that appear in the dataset are pivoted. +* `columns`: A list of column names to `PIVOT` horizontally into a separate column per value in the original column. If multiple column names are passed in, only unique combinations of data that appear in the dataset are pivoted. * Ex: If passing in a `columns` parameter like `['continent', 'country']`, only valid `continent` / `country` pairs will be included. * (no `Europe_Canada` column would be generated). -* `filters`: A list of `WHERE` clause expressions to be applied to the raw dataset prior to aggregating in the format `['col_1 = 123', 'col_2 LIKE ''woot%''', ...]`. +* `filters`: A list of `WHERE` clause expressions to be applied to the raw dataset prior to aggregating in the format `['col_1 = 123', 'col_2 LIKE ''woot%''', ...]`. * The `filters` are combined with `AND`. * `values_axis` (Optional): If multiple `values` are passed in, determine whether to create a separate row or column for each value. Either `rows` or `columns`, defaulting to `columns`. -* `subtotals` (Optional): If enabled, calculate the aggregate metric at multiple levels of detail based on the `rows` parameter. Either 0 or 1, defaulting to 0. +* `subtotals` (Optional): If enabled, calculate the aggregate metric at multiple levels of detail based on the `rows` parameter. Either 0 or 1, defaulting to 0. * `grand_totals` (Optional): If enabled, calculate the aggregate metric across all rows in the raw data in addition to at the granularity defined by `rows`. Either 0 or 1, defaulting to 0. #### No horizontal pivoting (no `columns` in use) If not using the `columns` parameter, no columns need to be pivoted horizontally. -As a result, a `GROUP BY` statement is used. +As a result, a `GROUP BY` statement is used. If `subtotals` are in use, the `ROLLUP` expression is used to calculate the `values` at the different levels of granularity. If `grand_totals` are in use, but not `subtotals`, the `GROUPING SETS` expression is used instead of `ROLLUP` to evaluate across all rows. -In this example, we build a summary of the `revenue` and `cost` of each `product_line` and `product`. +In this example, we build a summary of the `revenue` and `cost` of each `product_line` and `product`. 
```sql FROM pivot_table(['business_metrics'], @@ -472,7 +472,7 @@ FROM pivot_table(['business_metrics'], ``` | product_line | product | sum(revenue) | sum("cost") | -|----------------------|---------------|--------------|-------------| +|----------------------|---------------|-------------:|------------:| | Duck Duds | Duck neckties | 36 | 8 | | Duck Duds | Duck suits | 360 | 80 | | Duck Duds | Subtotal | 396 | 88 | @@ -509,7 +509,7 @@ FROM pivot_table(['business_metrics'], ``` | product_line | product | 2022_sum(revenue) | 2022_sum("cost") | 2023_sum(revenue) | 2023_sum("cost") | -|----------------------|---------------|-------------------|------------------|-------------------|------------------| +|----------------------|---------------|------------------:|-----------------:|------------------:|-----------------:| | Duck Duds | Duck neckties | 10 | 4 | 26 | 4 | | Duck Duds | Duck suits | 100 | 40 | 260 | 40 | | Duck Duds | Subtotal | 110 | 44 | 286 | 44 | @@ -545,7 +545,7 @@ FROM pivot_table(['business_metrics'], ``` | product_line | product | value_names | 2022 | 2023 | -|----------------------|---------------|--------------|------|------| +|----------------------|---------------|--------------|-----:|-----:| | Duck Duds | Duck neckties | sum(cost) | 4 | 4 | | Duck Duds | Duck neckties | sum(revenue) | 10 | 26 | | Duck Duds | Duck suits | sum(cost) | 40 | 40 | @@ -570,7 +570,6 @@ Together we can write the ultimate pivoting capability just once and use it ever In the future, we have plans to further simplify the creation of SQL extensions. Of course, we would love your feedback! -[Join us on Discord](https://discord.duckdb.org/) in the `community-extensions` channel. +[Join us on Discord](https://discord.duckdb.org/) in the `community-extensions` channel. Happy analyzing! - From bf34e61513cef00a40be169616f5d235003b3c72 Mon Sep 17 00:00:00 2001 From: Gabor Szarnyas Date: Fri, 27 Sep 2024 13:07:47 +0200 Subject: [PATCH 22/24] Update _posts/2024-09-27-sql-only-extensions.md Co-authored-by: Carlo Piovesan --- _posts/2024-09-27-sql-only-extensions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-27-sql-only-extensions.md b/_posts/2024-09-27-sql-only-extensions.md index 3b82b24e658..6d271a72fae 100644 --- a/_posts/2024-09-27-sql-only-extensions.md +++ b/_posts/2024-09-27-sql-only-extensions.md @@ -45,7 +45,7 @@ They are very powerful when used in combination with other friendly SQL features Traditionally, there has been no central repository for SQL functions across databases, let alone across companies! DuckDB's community extensions can be that knowledge base. -DuckDB extensions can be used across all languages with a DuckDB client, including Python, NodeJS, Java, Rust, Go, and even Webassembly (Wasm)! +DuckDB extensions can be used across all languages with a DuckDB client, including Python, NodeJS, Java, Rust, Go, and even WebAssembly (Wasm)! If you are a DuckDB fan and a SQL user, you can share your expertise back to the community with an extension. This post will show you how! 
From 63907fae46722f8d92186500e080d928a314d313 Mon Sep 17 00:00:00 2001 From: Alex-Monahan <52226177+Alex-Monahan@users.noreply.github.com> Date: Fri, 27 Sep 2024 04:41:58 -0700 Subject: [PATCH 23/24] Improve Wasm preview link Co-authored-by: Carlo Piovesan --- _posts/2024-09-27-sql-only-extensions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-27-sql-only-extensions.md b/_posts/2024-09-27-sql-only-extensions.md index 6d271a72fae..4d198a71043 100644 --- a/_posts/2024-09-27-sql-only-extensions.md +++ b/_posts/2024-09-27-sql-only-extensions.md @@ -260,7 +260,7 @@ Any set of tables (or views!) will first be vertically stacked and then pivoted. -[Check out a live example using the extension in the DuckDB Wasm shell here](https://shell.duckdb.org/#queries=v0,CREATE-OR-REPLACE-TABLE-business_metrics-(-----product_line-VARCHAR%2C-product-VARCHAR%2C-year-INTEGER%2C-quarter-VARCHAR%2C-revenue-integer%2C-cost-integer-)~,INSERT-INTO-business_metrics-VALUES-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q1'%2C-100%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q2'%2C-200%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q3'%2C-300%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q4'%2C-400%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q1'%2C-500%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q2'%2C-600%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q3'%2C-700%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q4'%2C-800%2C-100)%2C------('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q1'%2C-10%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q2'%2C-20%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q3'%2C-30%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q4'%2C-40%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q1'%2C-50%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q2'%2C-60%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q3'%2C-70%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q4'%2C-80%2C-10)%2C------('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q1'%2C-1%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q2'%2C-2%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q3'%2C-3%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q4'%2C-4%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q1'%2C-5%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q2'%2C-6%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q3'%2C-7%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q4'%2C-8%2C-1)%2C~,FROM-business_metrics~,INSTALL-pivot_table-from-community~,LOAD-'https%3A%2F%2Fcommunity extensions.duckdb.org%2Fv1.1.1%2Fwasm_eh%2Fpivot_table.duckdb_extension.wasm'~,DROP-TYPE-IF-EXISTS-columns_parameter_enum~,CREATE-TYPE-columns_parameter_enum-AS-ENUM-(FROM-build_my_enum(['business_metrics']%2C-['year'%2C-'quarter']%2C-[]))~,FROM-pivot_table(['business_metrics']%2C['sum(revenue)'%2C-'sum(cost)']%2C-['product_line'%2C-'product']%2C-['year'%2C-'quarter']%2C-[]%2C-subtotals-%3A%3D-1%2C-grand_totals-%3A%3D-1%2C-values_axis-%3A%3D-'rows')~)! 
+[Check out a live example using the extension in the DuckDB Wasm shell here](https://shell.duckdb.org/#queries=v0,CREATE-OR-REPLACE-TABLE-business_metrics-(-----product_line-VARCHAR%2C-product-VARCHAR%2C-year-INTEGER%2C-quarter-VARCHAR%2C-revenue-integer%2C-cost-integer-)~,INSERT-INTO-business_metrics-VALUES-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q1'%2C-100%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q2'%2C-200%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q3'%2C-300%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q4'%2C-400%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q1'%2C-500%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q2'%2C-600%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q3'%2C-700%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q4'%2C-800%2C-100)%2C------('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q1'%2C-10%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q2'%2C-20%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q3'%2C-30%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q4'%2C-40%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q1'%2C-50%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q2'%2C-60%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q3'%2C-70%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q4'%2C-80%2C-10)%2C------('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q1'%2C-1%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q2'%2C-2%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q3'%2C-3%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q4'%2C-4%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q1'%2C-5%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q2'%2C-6%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q3'%2C-7%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q4'%2C-8%2C-1)%2C~,FROM-business_metrics~,INSTALL-pivot_table-from-community~,LOAD-'pivot_table'~,DROP-TYPE-IF-EXISTS-columns_parameter_enum~,CREATE-TYPE-columns_parameter_enum-AS-ENUM-(FROM-build_my_enum(['business_metrics']%2C-['year'%2C-'quarter']%2C-[]))~,FROM-pivot_table(['business_metrics']%2C['sum(revenue)'%2C-'sum(cost)']%2C-['product_line'%2C-'product']%2C-['year'%2C-'quarter']%2C-[]%2C-subtotals-%3A%3D-1%2C-grand_totals-%3A%3D-1%2C-values_axis-%3A%3D-'rows')~)! From f7d86a6aaec4549c6e291ac4f645781cb67c037d Mon Sep 17 00:00:00 2001 From: Alex-Monahan <52226177+Alex-Monahan@users.noreply.github.com> Date: Fri, 27 Sep 2024 04:42:38 -0700 Subject: [PATCH 24/24] Update scripts/sql_to_wasm_shell_link.py Co-authored-by: Carlo Piovesan --- scripts/sql_to_wasm_shell_link.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/sql_to_wasm_shell_link.py b/scripts/sql_to_wasm_shell_link.py index ebcef07251d..f107c77fa07 100644 --- a/scripts/sql_to_wasm_shell_link.py +++ b/scripts/sql_to_wasm_shell_link.py @@ -43,7 +43,7 @@ FROM business_metrics; INSTALL pivot_table from community; -LOAD 'https://community-extensions.duckdb.org/v1.1.1/wasm_eh/pivot_table.duckdb_extension.wasm'; +LOAD 'pivot_table'; DROP TYPE IF EXISTS columns_parameter_enum;