
Quack node #25

Merged: mkaruza merged 5 commits into main from quack-node on May 14, 2024

Conversation


@mkaruza mkaruza commented May 6, 2024

  • Hook into the Postgres planner rather than into execution. The idea is to split DuckDB execution into a prepare/execute phase. If the DuckDB prepare is not valid, we will fall back to normal Postgres execution. (A rough sketch of the hook follows this list.)

  • Another benefit of having a planner hook is that custom RETURNS TABLE functions can be used in PG syntax. Such a custom function only needs to be created to pass the parsing phase. We can now use SELECT * FROM read_parquet('...'), which will read Parquet files through DuckDB.

  • Created a quack node which will be used for DuckDB execution. This quack node is defined as a Custom Scan Node. EXPLAIN will work out of the box with this approach - we'll just output the explain plan from DuckDB execution.

  • Added the httpfs extension to be built together with the parquet extension.

  • Filter for DATE/TIMESTAMP types
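
A rough sketch of the planner-hook idea described above, assuming the PostgreSQL 13+ planner_hook signature. The names quack_planner, quack_scan_methods and MakeDuckDbPreparedQuery are illustrative placeholders, not the actual symbols in this PR, and most PlannedStmt fields are omitted:

```cpp
extern "C" {
#include "postgres.h"
#include "nodes/extensible.h"
#include "nodes/pg_list.h"
#include "optimizer/planner.h"
}

/* Hypothetical helpers, assumed to exist elsewhere in the extension. */
extern CustomScanMethods quack_scan_methods;
extern void *MakeDuckDbPreparedQuery(const char *queryString);

static planner_hook_type prev_planner_hook = NULL;

static PlannedStmt *
quack_planner(Query *parse, const char *query_string, int cursorOptions,
              ParamListInfo boundParams)
{
	/* Try to prepare the query in DuckDB first. */
	void *preparedQuery = MakeDuckDbPreparedQuery(query_string);
	if (preparedQuery == NULL)
	{
		/* DuckDB prepare failed: fall back to normal Postgres planning. */
		if (prev_planner_hook)
			return prev_planner_hook(parse, query_string, cursorOptions, boundParams);
		return standard_planner(parse, query_string, cursorOptions, boundParams);
	}

	/* Replace the whole plan tree with a single Custom Scan ("quack") node. */
	CustomScan *quackNode = makeNode(CustomScan);
	quackNode->methods = &quack_scan_methods;
	quackNode->custom_private = list_make1(preparedQuery);

	PlannedStmt *result = makeNode(PlannedStmt);
	result->commandType = parse->commandType;
	result->planTree = (Plan *) quackNode;
	/* rtable, relationOids, canSetTag, etc. omitted for brevity. */
	return result;
}
```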

```cpp
}
}

quackNode->custom_private = list_make2(db, preparedQuery.release());
```
Collaborator

Where are we freeing this?

Collaborator Author

Should be in Quack_EndCustomScan

Collaborator

But we release the unique_ptr here, so we are now responsible for freeing it.
We place this in custom_private, which is never passed along to anything else, nor has free called on it.

Collaborator Author

Pointers from custom_private will be stored inside QuackScanState (in the Quack_CreateCustomScanState callback). Delete will be called on these pointers.

Collaborator

That happens implicitly?

Collaborator Author

The End callback is called at the end of query execution.
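
To make the ownership handoff discussed in this thread concrete, here is a minimal sketch of that handoff. The struct layout, the types stored in the two custom_private slots, and the idea that both pointers are deleted in the End callback follow the discussion above, but the exact names and types are assumptions, not the code from this PR:

```cpp
extern "C" {
#include "postgres.h"
#include "nodes/extensible.h"
#include "nodes/pg_list.h"
}
#include "duckdb.hpp"

/* Assumed to be defined elsewhere in the extension. */
extern CustomExecMethods quack_exec_methods;

typedef struct QuackScanState
{
	CustomScanState css;                       /* must be first */
	duckdb::DuckDB *db;                        /* slot 1 of custom_private (assumed type) */
	duckdb::PreparedStatement *preparedQuery;  /* slot 2 of custom_private (released unique_ptr) */
} QuackScanState;

static Node *
Quack_CreateCustomScanState(CustomScan *cscan)
{
	QuackScanState *state =
	    (QuackScanState *) newNode(sizeof(QuackScanState), T_CustomScanState);

	/* Take ownership of the raw pointers the planner stashed in custom_private. */
	state->db = (duckdb::DuckDB *) linitial(cscan->custom_private);
	state->preparedQuery = (duckdb::PreparedStatement *) lsecond(cscan->custom_private);
	state->css.methods = &quack_exec_methods;
	return (Node *) state;
}

static void
Quack_EndCustomScan(CustomScanState *node)
{
	/* Called by the executor at the end of query execution: free what we released. */
	QuackScanState *state = (QuackScanState *) node;
	delete state->preparedQuery;
	delete state->db;
}
```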


Tishj commented May 8, 2024

Correct me if I'm wrong, but this is what I understand from this PR:

We no longer do the ExecutorRunHook, so Postgres is now in charge of execution.
Instead we create a QuackNode, and we replace the entire Plan (akin to a PhysicalPlan in duckdb I imagine?) with this one node.

Inside this node we run DuckDB with the given query, outputting to the ss_ScanTupleSlot for every "exec" call made by postgres.


What happens when our execution fails here? Can we fall back to Postgres or not?
I assume we can't, because we are no longer using a PlannedStmt that Postgres can execute.
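
To illustrate the per-"exec" flow described above, here is a simplified sketch of what such a scan callback could look like. The QuackScanState fields, the chunk bookkeeping, and the Datum conversion are assumptions made for illustration, not the code in this PR:

```cpp
extern "C" {
#include "postgres.h"
#include "executor/tuptable.h"
#include "nodes/extensible.h"
}
#include "duckdb.hpp"

/* Assumed scan-state layout, only what this sketch needs. */
typedef struct QuackScanState
{
	CustomScanState css;
	duckdb::QueryResult *queryResult;                     /* result of the prepared query */
	duckdb::unique_ptr<duckdb::DataChunk> currentChunk;
	duckdb::idx_t currentRow;
} QuackScanState;

static TupleTableSlot *
Quack_ExecCustomScan(CustomScanState *node)
{
	QuackScanState *state = (QuackScanState *) node;
	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;

	ExecClearTuple(slot);

	/* Fetch the next DuckDB chunk when the current one is exhausted. */
	if (!state->currentChunk || state->currentRow >= state->currentChunk->size())
	{
		state->currentChunk = state->queryResult->Fetch();
		state->currentRow = 0;
		if (!state->currentChunk)
			return slot;            /* empty slot tells Postgres the scan is done */
	}

	for (duckdb::idx_t col = 0; col < state->currentChunk->ColumnCount(); col++)
	{
		duckdb::Value value = state->currentChunk->GetValue(col, state->currentRow);
		slot->tts_isnull[col] = value.IsNull();
		/* Conversion of `value` into slot->tts_values[col] (a Datum) omitted here. */
	}
	state->currentRow++;

	return ExecStoreVirtualTuple(slot);
}
```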

* Hook into postgres planner rather than on execution. The idea is to
  split DuckDB execution into a prepare/execute phase. If the DuckDB
  prepare is not valid we will fall back to normal postgres execution.

* Another benefit of having a planner hook is that custom RETURNS TABLE
  functions can be used in PG syntax. Such a custom function only needs
  to be created to pass the parsing phase. We can now use `SELECT * FROM
  read_parquet('...')`, which will read parquet files through DuckDB.

* Created a quack node which will be used for DuckDB execution. This
  quack node is defined as a `Custom Scan Node`. EXPLAIN will work out
  of the box with this approach - we'll just output the explain plan
  from DuckDB execution.

* Added the `httpfs` extension to be built together with the parquet
  extension.
* Memory-allocated columns need to be released for each result tuple
* Filter for TIMESTAMP

mkaruza commented May 8, 2024

@Tishj You are right. We are now hooking in immediately after parsing is done (this way we can use read_parquet, for example, which is declared as a dummy PL/pgSQL function).

Execution is split between Prepare/Execute. Planning will do the Prepare - my assumption is that if DuckDB Prepare fails for whatever reason, then PostgreSQL planning is done instead.

The logic relies on Prepare being responsible for knowing whether a query can be executed or not - is that true?


Tishj commented May 13, 2024

> The logic relies on Prepare being responsible for knowing whether a query can be executed or not - is that true?

That is correct for the most part, but some InvalidInputExceptions or NotImplementedExceptions could still be reached during execution


mkaruza commented May 13, 2024

> The logic relies on Prepare being responsible for knowing whether a query can be executed or not - is that true?

> That is correct for the most part, but some InvalidInputExceptions or NotImplementedExceptions could still be reached during execution

After Prepare is called we check preparedQuery->HasError() for any error - should the Prepare call maybe be in a try/catch, in case some of these errors throw?
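
A minimal sketch of the pattern being asked about here, i.e. wrapping the DuckDB Prepare call in a try/catch while still checking HasError(). The function name and the fallback convention (returning nullptr so the caller falls back to Postgres planning) are illustrative assumptions:

```cpp
#include <exception>
#include "duckdb.hpp"

/* Returns nullptr when DuckDB cannot prepare the query, so the caller can
 * fall back to normal Postgres planning. */
static duckdb::unique_ptr<duckdb::PreparedStatement>
QuackPrepare(duckdb::Connection &connection, const char *queryString)
{
	duckdb::unique_ptr<duckdb::PreparedStatement> preparedQuery;
	try
	{
		preparedQuery = connection.Prepare(queryString);
	}
	catch (const std::exception &)
	{
		/* If Prepare itself throws, treat it the same as a prepare error. */
		return nullptr;
	}

	if (preparedQuery->HasError())
		return nullptr;

	return preparedQuery;
}
```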


Tishj commented May 13, 2024

> The logic relies on Prepare being responsible for knowing whether a query can be executed or not - is that true?

> That is correct for the most part, but some InvalidInputExceptions or NotImplementedExceptions could still be reached during execution

> After Prepare is called we check preparedQuery->HasError() for any error - should the Prepare call maybe be in a try/catch, in case some of these errors throw?

No, to document what we discussed during the meeting: exceptions caused by the data cannot be caught by preparing alone, and will arise only during execution.

I was merely bringing up the fact that we can still hit an exception even if Prepare succeeds, and thus can't fall back to Postgres execution/planning when that happens.

Checking preparedQuery->HasError() is great 👍
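
Since there is no Postgres plan to fall back to once execution has started, one plausible way to surface such execution-time failures is to turn DuckDB errors and exceptions into a Postgres error. This is an assumption about the approach, not code from this PR, and the function name is made up:

```cpp
extern "C" {
#include "postgres.h"
}
#include <exception>
#include "duckdb.hpp"

static duckdb::unique_ptr<duckdb::QueryResult>
QuackExecutePrepared(duckdb::PreparedStatement &preparedQuery)
{
	try
	{
		auto result = preparedQuery.Execute();
		if (result->HasError())
			elog(ERROR, "(quack) %s", result->GetError().c_str());
		return result;
	}
	catch (const std::exception &ex)
	{
		/* e.g. an InvalidInputException caused by the data itself */
		elog(ERROR, "(quack) %s", ex.what());
	}
	return nullptr;   /* unreachable: elog(ERROR) does not return */
}
```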

mkaruza merged commit 6864bd2 into main on May 14, 2024
2 checks passed
mkaruza deleted the quack-node branch on May 14, 2024 09:02