Quack node #25

mkaruza · 2024-05-06T08:06:11Z

Hook into the Postgres planner rather than into execution. The idea is to split DuckDB execution into prepare and execute phases. If the DuckDB prepare is not valid, we fall back to normal Postgres execution.
Another benefit of the planner hook is that custom table-returning functions can be used in PG syntax. Such a function only needs to exist so that the query passes the parsing phase. We can now use SELECT * FROM read_parquet('...'), which reads parquet files through DuckDB.
Created a quack node that is used for DuckDB execution. The quack node is defined as a Custom Scan Node. EXPLAIN works out of the box with this approach: we simply output the explain plan from DuckDB execution.
Added the httpfs extension to be built together with the parquet extension.
Filter for DATE/TIMESTAMP types
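
The prepare-then-fall-back flow from the description can be modeled in plain C++. This is only a sketch and uses no PostgreSQL or DuckDB headers: `PrepareResult`, `DuckPrepare`, `PlanQuery`, and the "pg_only_feature" rejection rule are hypothetical stand-ins chosen for illustration, not the code in this PR.

```cpp
#include <string>

// Hypothetical stand-in for a DuckDB prepared statement.
struct PrepareResult {
    bool has_error;
    std::string error;
};

// Which planner ended up producing the plan.
enum class PlanKind { QuackCustomScan, PostgresStandard };

struct Plan {
    PlanKind kind;
    std::string query;
};

// Hypothetical prepare step: pretend the DuckDB side rejects anything that
// mentions "pg_only_feature". A real prepare would parse and bind the query.
PrepareResult DuckPrepare(const std::string &query) {
    if (query.find("pg_only_feature") != std::string::npos) {
        return {true, "not supported by the DuckDB side"};
    }
    return {false, ""};
}

// Model of the planner hook: try the DuckDB prepare first and hand the
// query to the regular planner when the prepare reports an error.
Plan PlanQuery(const std::string &query) {
    PrepareResult prepared = DuckPrepare(query);
    if (prepared.has_error) {
        return {PlanKind::PostgresStandard, query};
    }
    return {PlanKind::QuackCustomScan, query};
}
```

In this model the fallback decision is made once, at planning time; nothing after `PlanQuery` returns can switch planners.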

Tishj · 2024-05-08T08:13:29Z

src/quack_planner.cpp

+		}
+	}
+
+	quackNode->custom_private = list_make2(db, preparedQuery.release());


Tishj · 2024-05-08

Where are we freeing this?

mkaruza · 2024-05-08

It should be freed in Quack_EndCustomScan.

Tishj · 2024-05-08

But we release the unique_ptr here, so we are now responsible for freeing it.
We place this in custom_private, which is never passed along to anything else, and nothing ever calls free on it.

mkaruza · 2024-05-08

Pointers from custom_private are stored inside QuackScanState (in the Quack_CreateCustomScanState callback). Delete will be called on these pointers.

Tishj · 2024-05-13

That happens implicitly?

mkaruza · 2024-05-13

The End callback is called at the end of query execution.
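
The ownership hand-off discussed in this thread can be sketched without any PostgreSQL or DuckDB headers. All names below (`FakePrepared`, `FakePlanPrivate`, `FakeScanState`, `PlanStep`, `CreateScanState`, `EndScan`) are hypothetical and only mirror the roles of the prepared query, `custom_private`, `QuackScanState`, and the create/end callbacks mentioned above.

```cpp
#include <memory>

// Hypothetical stand-in for the prepared statement object.
struct FakePrepared {
    static int live;  // number of objects currently alive
    FakePrepared() { ++live; }
    ~FakePrepared() { --live; }
};
int FakePrepared::live = 0;

// Plan-time data only holds a raw pointer, like a list of void pointers.
struct FakePlanPrivate {
    FakePrepared *prepared = nullptr;
};

// Execution-time state that takes over the pointer.
struct FakeScanState {
    FakePrepared *prepared = nullptr;
};

// Planner: release() gives up ownership, so nothing deletes the object
// automatically from this point on.
FakePlanPrivate PlanStep() {
    auto prepared = std::make_unique<FakePrepared>();
    FakePlanPrivate priv;
    priv.prepared = prepared.release();
    return priv;
}

// "Create scan state" callback: copy the raw pointer into the scan state.
FakeScanState CreateScanState(const FakePlanPrivate &priv) {
    FakeScanState state;
    state.prepared = priv.prepared;
    return state;
}

// "End scan" callback: the explicit delete that balances release().
void EndScan(FakeScanState &state) {
    delete state.prepared;
    state.prepared = nullptr;
}
```

In this model the object is freed only if `EndScan` actually runs. A plan that is built but never executed would leak it, which is one way the concern raised above could show up.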

Tishj · 2024-05-08T08:28:57Z

Correct me if I'm wrong, but this is what I understand from this PR:

We no longer do the ExecutorRunHook, so Postgres is now in charge of execution.
Instead we create a QuackNode, and we replace the entire Plan (akin to a PhysicalPlan in duckdb I imagine?) with this one node.

Inside this node we run DuckDB with the given query, outputting to the ss_ScanStateSlot for every "exec" call made by postgres.

What happens when our execution fails here? Can we fall back to Postgres or not?
I assume we can't because we are no longer using a plannedStmt that Postgres can execute
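
To make the "one row per exec call" picture concrete, here is a small pull-style model. It is an assumption-laden sketch: `FakeQuackScan`, `Row`, and `ExecNext` are hypothetical names, and the `slot` argument only plays the role that ss_ScanStateSlot plays in the description above.

```cpp
#include <cstddef>
#include <vector>

using Row = std::vector<int>;

// Hypothetical scan state: a buffered chunk of result rows plus a cursor.
struct FakeQuackScan {
    std::vector<Row> chunk;
    std::size_t next = 0;
};

// Model of the per-call "exec" step: copy the next row into the slot and
// return true, or clear the slot and return false once rows are exhausted.
bool ExecNext(FakeQuackScan &scan, Row &slot) {
    if (scan.next >= scan.chunk.size()) {
        slot.clear();  // an empty slot signals end of scan in this model
        return false;
    }
    slot = scan.chunk[scan.next++];
    return true;
}
```

The caller keeps invoking `ExecNext` until it returns false, so any failure inside the call happens after the plan has already been chosen.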


mkaruza · 2024-05-08T09:18:14Z

@Tishj You are right. We now hook in immediately after parsing is done (this way we can use read_parquet, for example, which is declared as a dummy PL/pgSQL function).

Execution is split into Prepare and Execute. Planning performs the Prepare. My assumption is that if DuckDB Prepare fails for whatever reason, then PostgreSQL planning is done instead.

The logic relies on Prepare being responsible for knowing whether the query can be executed - is that true?

Tishj · 2024-05-13T08:15:50Z

> The logic relies on Prepare being responsible for knowing whether the query can be executed - is that true?

That is correct for the most part, but some InvalidInputExceptions or NotImplementedExceptions could still be reached during execution

mkaruza · 2024-05-13T08:30:28Z

> > The logic relies on Prepare being responsible for knowing whether the query can be executed - is that true?

> That is correct for the most part, but some InvalidInputExceptions or NotImplementedExceptions could still be reached during execution

After Prepare is called we check preparedQuery->HasError() for any error. Should the Prepare call perhaps be wrapped in try/catch in case some of these errors are thrown?

Tishj · 2024-05-13T16:31:37Z

> > > The logic relies on Prepare being responsible for knowing whether the query can be executed - is that true?

> > That is correct for the most part, but some InvalidInputExceptions or NotImplementedExceptions could still be reached during execution

> After Prepare is called we check preparedQuery->HasError() for any error. Should the Prepare call perhaps be wrapped in try/catch in case some of these errors are thrown?

No, to document what we discussed during the meeting, exceptions caused by the data can not be caught by preparing alone, and will arise only during execution.

I was merely bringing up the fact that we can still hit an exception even if prepare succeeds, and thus can't fall back later to postgres execution/planning when that happens

Checking preparedQuery->HasError() is great 👍
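
The point that data-dependent errors only surface during execution can be illustrated with a toy model. Everything here is hypothetical (`FakePreparedCast`, `PrepareCast`, `ExecuteCast`), and `std::invalid_argument` merely stands in for the kind of exception mentioned above; no DuckDB API is used.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical prepared query: "cast every input value to an integer".
struct FakePreparedCast {
    bool has_error = false;
};

// Prepare only sees the query, not the data, so it has nothing to reject.
FakePreparedCast PrepareCast() {
    return FakePreparedCast{};
}

// Execute touches the data, so a bad value only surfaces here.
std::vector<int> ExecuteCast(const FakePreparedCast &,
                             const std::vector<std::string> &values) {
    std::vector<int> out;
    for (const std::string &v : values) {
        std::size_t used = 0;
        int parsed = std::stoi(v, &used);  // throws std::invalid_argument on "abc"
        if (used != v.size()) {
            throw std::invalid_argument("trailing characters in: " + v);
        }
        out.push_back(parsed);
    }
    return out;
}
```

Here `PrepareCast` always reports success, yet `ExecuteCast` can still throw on a value such as "abc". By then the planning decision has already been made, which is why a fallback at that stage is not available in this model.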

mkaruza requested a review from Tishj May 6, 2024 08:06

This was referenced May 6, 2024

ability to read parquet files via https/s3 via duckdb #20

Closed

EXPLAIN should show duckdb is being used #13

Closed

Tishj reviewed May 8, 2024

mkaruza added 3 commits May 8, 2024 11:10

Fixed query projection / Added filter for DATE

2ae51dc

Release memory that was allocated in result tuple / TIMESTAMP filter

9e092f5

* Memory allocated for columns needs to be released for each result tuple
* Filter for TIMESTAMP

mkaruza force-pushed the quack-node branch from ce4e16b to 9e092f5 May 8, 2024 09:10

mkaruza added 2 commits May 13, 2024 08:53

Check if catalog table is in RTE_SUBQUERY

c40d805

Rerecord regression test

cd4417a

mkaruza merged commit 6864bd2 into main May 14, 2024
2 checks passed

mkaruza deleted the quack-node branch May 14, 2024 09:02

mkaruza mentioned this pull request May 15, 2024

Interrupted execution should clean all resources in threads #21

Closed
