-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add missing iceberg_* functions with parameters #275
Conversation
RETURNS SETOF record LANGUAGE 'plpgsql' AS | ||
$func$ | ||
BEGIN | ||
RAISE EXCEPTION 'Function `iceberg_scan(TEXT)` only works with Duckdb execution.'; | ||
END; | ||
$func$; | ||
|
||
CREATE OR REPLACE FUNCTION iceberg_metadata(path text, allow_moved_paths BOOLEAN DEFAULT FALSE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new functions should be added to the function_names
array in BuildDuckdbOnlyFunctions
.
sql/pg_duckdb--0.0.1.sql
Outdated
skip_schema_inference BOOLEAN DEFAULT FALSE, | ||
version TEXT DEFAULT 'version-hint.text', | ||
version_name_format TEXT DEFAULT 'v%s%s.metadata.json,%s%s.metadata.json') | ||
RETURNS SETOF record LANGUAGE 'plpgsql' AS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two new functions have fixed result columns, so no need to use record
here. We can return actual rows with known column names and types. Then people don't have to manually specify the columns using AS (...)
CREATE OR REPLACE FUNCTION iceberg_scan(path text, allow_moved_paths BOOLEAN DEFAULT FALSE, | ||
mode TEXT DEFAULT '', | ||
metadata_compression_codec TEXT DEFAULT 'none', | ||
skip_schema_inference BOOLEAN DEFAULT FALSE, | ||
version TEXT DEFAULT 'version-hint.text', | ||
version_name_format TEXT DEFAULT 'v%s%s.metadata.json,%s%s.metadata.json') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where did you find all these arguments? I only see allow_moved_paths
in the docs here: https://duckdb.org/docs/extensions/iceberg.html#access-iceberg-metadata
It would be good to add a code comment with a link to the source for these arguments, so we have a reference in case we need to update them in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked source code from iceberg extension
f916871
to
e5d8e6e
Compare
src/pgduckdb_types.cpp
Outdated
@@ -571,6 +571,7 @@ GetPostgresDuckDBType(duckdb::LogicalType type) { | |||
case duckdb::LogicalTypeId::INTEGER: | |||
return INT4OID; | |||
case duckdb::LogicalTypeId::BIGINT: | |||
case duckdb::LogicalTypeId::UBIGINT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. All other unsigned ints use a type that is one size larger than their own type, which makes sense. The largest half of an unsigned int does not fit into a signed int of the same size.
We might want to start depending on pguint
, to provide actual unsigned integer types: https://github.com/petere/pguint
This change definitely seems like a better fit for another PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ubigint is required as manifest_sequence_number
field returns that from duckdb execution - so we don't match it and execution fails. i can add comment for this in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's pretty annoying. Now I understand why you added it to this PR.
The main problem I have is that right now we fail hard for ubigint, which seems nicer than to silently convert too-high ubigints to negative numbers.
How about we just don't return that field for now? i.e. make the type:
CREATE TYPE iceberg_metadata_record AS (
manifest_path TEXT,
-- manifest_sequence_number is not included, because it is a UBIGINT, which we don't support yet.
manifest_content TEXT,
status TEXT,
content TEXT,
file_path TEXT
);
And then in another PR we can support UBIGINT, probably by converting it into a NUMERIC instead of a BIGINT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the UBIGINT change should be removed from this PR, but other than that this now LGTM.
e5d8e6e
to
a91d9a5
Compare
This also adds support for conversion to UBIGINT because one of the Iceberg functions return that type.