Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly convert datum values before filter operation #194

Merged
merged 1 commit into from
Sep 20, 2024

Conversation

mkaruza
Copy link
Collaborator

@mkaruza mkaruza commented Sep 19, 2024

Datums can contain a pass-by-value or a pass-by-reference type. We were simply casting the Datum for a pass by value type, but that is not always correct. Specifically for floating point numbers this would not always result in the correct floating point number. And int64 values are pass-by-reference on 32bit systems, but pass-by-value on 64bit systems. Postgres provides the DatumGetXyz macros/functions to work around these problems. This starts using these functions in FilterOperationSwitch to avoid these casting problems there.

Fixes #189

@mkaruza mkaruza requested a review from JelteF September 19, 2024 09:15
@mkaruza mkaruza force-pushed the float-filter-op branch 2 times, most recently from 95fb4d9 to 89fc42d Compare September 19, 2024 09:23
@mkaruza mkaruza changed the title Explicilty convert FLOAT4/FLOAT8 values before filter operation Explicitly convert FLOAT4/FLOAT8 values before filter operation Sep 19, 2024
Comment on lines 8 to 10
SELECT COUNT(*) = 1 FROM query_filter_float WHERE a < 1.0;
SELECT COUNT(*) = 2 FROM query_filter_float WHERE a <= 1.0;
SELECT COUNT(*) = 2 FROM query_filter_float WHERE a < 1.1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience you should generally avoid returning booleans from queries in pgregress tests, because it makes it harder to debug the test when it starts failing. With your current code you only know that the test failed, but not why. With the following, you could see how many rows were found, which can be helpful with debugging.

Suggested change
SELECT COUNT(*) = 1 FROM query_filter_float WHERE a < 1.0;
SELECT COUNT(*) = 2 FROM query_filter_float WHERE a <= 1.0;
SELECT COUNT(*) = 2 FROM query_filter_float WHERE a < 1.1;
SELECT COUNT(*) FROM query_filter_float WHERE a < 1.0;
SELECT COUNT(*) FROM query_filter_float WHERE a <= 1.0;
SELECT COUNT(*) FROM query_filter_float WHERE a < 1.1;

@@ -55,9 +55,9 @@ FilterOperationSwitch(Datum &value, duckdb::Value &constant, Oid type_oid) {
case INT8OID:
return TemplatedFilterOperation<int64_t, OP>(value, constant);
case FLOAT4OID:
return TemplatedFilterOperation<float, OP>(value, constant);
return TemplatedFilterOperation<float, OP>(DatumGetFloat4(value), constant);
Copy link
Collaborator

@JelteF JelteF Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should probably do this explicit conversion for all types here, not just for floats.

@mkaruza mkaruza changed the title Explicitly convert FLOAT4/FLOAT8 values before filter operation Explicitly convert datum values before filter operation Sep 20, 2024
@mkaruza mkaruza requested a review from JelteF September 20, 2024 09:04
Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor suggestions, other than that this is good IMO.

case DATEOID: {
Datum date_datum = static_cast<int32_t>(value + pgduckdb::PGDUCKDB_DUCK_DATE_OFFSET);
return TemplatedFilterOperation<int32_t, OP>(date_datum, constant);
int32_t date = static_cast<int32_t>(value + pgduckdb::PGDUCKDB_DUCK_DATE_OFFSET);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int32_t date = static_cast<int32_t>(value + pgduckdb::PGDUCKDB_DUCK_DATE_OFFSET);
int32_t date = static_cast<int32_t>(DatumGetDate(value) + pgduckdb::PGDUCKDB_DUCK_DATE_OFFSET);

}
case TIMESTAMPOID: {
Datum timestamp_datum = static_cast<int64_t>(value + pgduckdb::PGDUCKDB_DUCK_TIMESTAMP_OFFSET);
return TemplatedFilterOperation<int64_t, OP>(timestamp_datum, constant);
int64_t timestamp = static_cast<int64_t>(value + pgduckdb::PGDUCKDB_DUCK_TIMESTAMP_OFFSET);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int64_t timestamp = static_cast<int64_t>(value + pgduckdb::PGDUCKDB_DUCK_TIMESTAMP_OFFSET);
int64_t timestamp = static_cast<int64_t>(DatumGetTimestamp(value) + pgduckdb::PGDUCKDB_DUCK_TIMESTAMP_OFFSET);

@JelteF
Copy link
Collaborator

JelteF commented Sep 20, 2024

I updated the PR description with some additional info/explanation of the problem.

Datums can contain a pass-by-value or a pass-by-reference type. We were
simply casting the Datum for a pass by value type, but that is not always
correct. Specifically for floating point numbers this would not always
result in the correct floating point number. And int64 values are
pass-by-reference on 32bit systems, but pass-by-value on 64bit systems.
Postgres provides the DatumGetXyz macros/functions to work around these problems.
This starts using these functions in FilterOperationSwitch to avoid
these casting problems there.
@mkaruza
Copy link
Collaborator Author

mkaruza commented Sep 20, 2024

Update commit message.

@JelteF
Copy link
Collaborator

JelteF commented Sep 20, 2024

Update commit message.

Afaict the repo is configured to use the PR description as the commit message when merging by default, when doing a squash-merge. That's why I usually like to the PR description to explain a bit what the PR is doing before merging. (it's even better if the PR description is helpful when opening the PR ofcourse, to make it easy for reviewing, but in this case I was able to understand what the issue was from context I had).

@mkaruza mkaruza merged commit e4b914a into main Sep 20, 2024
3 checks passed
@mkaruza mkaruza deleted the float-filter-op branch September 20, 2024 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comparisons with double precision type return wrong results
2 participants