Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support extended list types #282

Merged
merged 12 commits into from
Nov 14, 2024
Merged

Conversation

Reminiscent
Copy link
Contributor

@Reminiscent Reminiscent commented Oct 14, 2024

Related to #101

This pull request adds more array types. The changes include:

  1. Extended Array Type Support:
  • Added support for additional array types including CHAR, SMALLINT, VARCHAR, TIMESTAMP, FLOAT4, FLOAT8, NUMERIC, UUID and JSON.
  • Implemented support for both single-dimension and two-dimensional arrays for these types.
  1. Improved Type Conversion:
  • Refactored the type conversion logic to handle the new array types efficiently.
  • Implemented specialized conversion functions for each supported data type to ensure accurate data representation between PostgreSQL and DuckDB.
  1. Enhanced PostgresArrayAppendState:
  • Modified to handle edge cases, such as empty arrays and NULL values, more robustly.
  1. Updated Test Suite:
  • Expanded the test suite to cover the newly supported array types and dimensions.
  1. Code Refactoring:
  • Introduced a more generic approach to array handling, reducing code duplication and improving maintainability.
    Implemented the PostgresOIDMapping structure to streamline type-specific operations.

@Reminiscent Reminiscent force-pushed the Extended-list-type-support branch 2 times, most recently from 80f9f13 to a3b9268 Compare October 14, 2024 14:22
@Reminiscent Reminiscent changed the title [WIP] support extended list types support extended list types Oct 14, 2024
@JelteF
Copy link
Collaborator

JelteF commented Oct 14, 2024

Thanks a lot! This looks great. We're currently preparing for a 0.1.0 release, and given the size of the PR we probably don't want to squeeze it in before that release (we're going to be mostly working on stabilizing what we have soon). But this looks like a good candidate to merge early for the release cycle after that one.

Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick cursory review. I'll look at the actual code later.

Tests are failing on PG17. Seemingly because of a changed error message string. If you cannot change the test to create a consistent error message across postgres versions, then it's probably easiest to create a Python test in test/pycheck directory. That way you can change the expected error message based on the postgres version.

test/regression/schedule Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Show resolved Hide resolved
@JelteF JelteF added the enhancement New feature or request label Oct 15, 2024
@JelteF JelteF added this to the 0.2.0 milestone Oct 15, 2024
Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems like a good change. I quite like the refactoring that is done. The only real problem with this implementation is the current usage of SearchSysCache1, which is not thread-safe.

src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
template <int32_t OID>
struct PostgresIntegerOIDMapping {};
static Datum
ConvertCharDatum(const duckdb::Value &value) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much like these new functions. I think taking that refactor on its own would make the code already a lot better.

numeric_var = ConvertNumeric<int64_t>(value.GetValueUnsafe<int64_t>(), scale);
break;
}
case duckdb::PhysicalType::UINT64: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't write this code, just moved it around. But reading this now it makes me wonder. Why is this the only unsigned integer type that we check for the physical type? @mkaruza do you know?

src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
@@ -236,6 +396,16 @@ struct PostgresArrayAppendState {
auto &values = duckdb::ListValue::GetChildren(value);
idx_t to_append = values.size();

if (to_append == 0 && dimension < number_of_dimensions-1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be new code. Could you explain in a comment what edge case this is handling? Because that is not super clear from me from just the code. Also I'm curious why this was not necessary before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Use another way to solve this issues.
The origin issue is when we insert empty array into multi-dimension array. The dimension[1] to dimension[number_of_dimensions-1] can not be set because of the logic of AppendValueAtDimension function. So we use the dimension to call the function ConstructArray. It will cause overflow. The reason why it not necessary before is lack of the test cases. When you insert empty array into multi-dimension array. It will happen.

Comment on lines 488 to 491
if (append_state.expected_values > 0) {
pfree(datums);
pfree(nulls);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change should be necessary.

Copy link
Contributor Author

@Reminiscent Reminiscent Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the pfree itself can handle this. But when I remove the if condition, and run the tests. I'll get the

CREATE TABLE char_array_2d(a CHAR(1)[][]);
INSERT INTO char_array_2d VALUES
    ('{{"a","b"},{"c","d"}}'),
    ('{{"e","f","g"},{"h","i","j"}}'),
    (NULL),
    ('{{"k","l"},{"m",NULL}}'),
    ('{}');
SELECT * FROM char_array_2d;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
connection to server was lost

. That's why I add this. It is a little bit strange.

src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
@Reminiscent Reminiscent force-pushed the Extended-list-type-support branch 3 times, most recently from 8a99eba to 8445d88 Compare November 7, 2024 00:17
yanchengpeng added 5 commits November 7, 2024 00:41
format
remove the useless change for clean

remove the useless change for clean

remove the useless change for clean
@Reminiscent
Copy link
Contributor Author

Some quick cursory review. I'll look at the actual code later.

Tests are failing on PG17. Seemingly because of a changed error message string. If you cannot change the test to create a consistent error message across postgres versions, then it's probably easiest to create a Python test in test/pycheck directory. That way you can change the expected error message based on the postgres version.

Are there any easy way to run the tests in different postgres version in local? I missed the error message in PG17. Can you help me to trigger the tests again? Thanks!

@Reminiscent
Copy link
Contributor Author

@JelteF Thanks for your review! All the comments(except the test failed in PG17) are address. PTAL

@JelteF
Copy link
Collaborator

JelteF commented Nov 7, 2024

Regarding the test failure in PG17, which happens due to a different error message in that PG version. The approach we take so far for cases like this is to move this test to a Python test in the test/pycheck directory.

@Reminiscent
Copy link
Contributor Author

Regarding the test failure in PG17, which happens due to a different error message in that PG version. The approach we take so far for cases like this is to move this test to a Python test in the test/pycheck directory.

I known your meaning. But find the test cases can pass now. Just wonder will you run the regression tests in different postgres version when you develop some new features?

@JelteF
Copy link
Collaborator

JelteF commented Nov 7, 2024

But find the test cases can pass now.

Ah if it passes now you can disregard my comment.

Comment on lines 407 to 411
template <>
Datum
PostgresOIDMapping<BOOLOID>::ToDatum(const duckdb::Value &val) {
return ConvertBoolDatum(val);
}
Copy link
Collaborator

@JelteF JelteF Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These overrides are unnecessary afaict, the default implementation for PostgresOIDMapping will now do the correct thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JelteF Yes, you are right. Remove the useless part. PTAL.

static constexpr bool typbyval = true;
static constexpr char typalign = 'c';

static Datum
Copy link
Collaborator

@JelteF JelteF Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's mark all these ToDatum functions as static inline. They are so tiny that there's no need for introducing calling overhead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do the same for some of the smaller Convert functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JelteF Updated. Thanks for your review. PTAL

Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. @Y-- do you have some thoughts?

Copy link
Collaborator

@Y-- Y-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great refactoring + added list support.
I can think of a few more cleanup/refactoring we could do in a separated PR like:

  • split functions that have big switch/case to return in each case and handle the type mapping/dimensionality separately
  • use C++ vectors/allocator & raii pattern for allocations rather than palloc

Overall a lot of good stuff, thanks for the PR!

@@ -301,7 +540,6 @@ ConvertDuckToPostgresArray(TupleTableSlot *slot, duckdb::Value &value, idx_t col
D_ASSERT(value.type().id() == duckdb::LogicalTypeId::LIST);
auto &child_type = GetChildTypeRecursive(value.type());
auto child_id = child_type.id();
D_ASSERT(child_id == OP::ExpectedType());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you've removed the D_ASSERT, you can now remove the unused variable child_id?

@@ -192,6 +169,269 @@ ConvertNumeric(T value, idx_t scale) {
return result;
}

static Datum
ConvertNumericDatum(const duckdb::Value &value) {
if (value.type().id() == duckdb::LogicalTypeId::DOUBLE) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could assign value.type().id() to a const here and re-use it a lot in this function

@Reminiscent
Copy link
Contributor Author

@Y-- Updated. Thanks for your review. PTAL.

@Reminiscent
Copy link
Contributor Author

LGTM, great refactoring + added list support. I can think of a few more cleanup/refactoring we could do in a separated PR like:

  • split functions that have big switch/case to return in each case and handle the type mapping/dimensionality separately
  • use C++ vectors/allocator & raii pattern for allocations rather than palloc

Overall a lot of good stuff, thanks for the PR!

I think you are right. After this PR has been merged. I'd like to do this. Thanks!

Copy link
Collaborator

@Y-- Y-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Y-- Y-- enabled auto-merge (squash) November 14, 2024 07:56
@Y-- Y-- merged commit d75ed5d into duckdb:main Nov 14, 2024
5 checks passed
@Reminiscent Reminiscent deleted the Extended-list-type-support branch November 14, 2024 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants