Explicitly convert datum values before filter operation #194

mkaruza · 2024-09-19T09:15:12Z

Datums can contain a pass-by-value or a pass-by-reference type. We were simply casting the Datum to the target type for pass-by-value types, but that is not always correct. Specifically, for floating-point numbers a plain cast would not reliably produce the correct floating-point value. And int64 values are pass-by-reference on 32-bit systems, but pass-by-value on 64-bit systems. Postgres provides the DatumGetXyz macros/functions to handle these cases. This PR starts using these functions in FilterOperationSwitch to avoid these casting problems there.

Fixes #189

JelteF · 2024-09-19T11:14:37Z

test/regression/sql/query_filter.sql

+SELECT COUNT(*) = 1 FROM query_filter_float WHERE a < 1.0;
+SELECT COUNT(*) = 2 FROM query_filter_float WHERE a <= 1.0;
+SELECT COUNT(*) = 2 FROM query_filter_float WHERE a < 1.1;


In my experience you should generally avoid returning booleans from queries in pg_regress tests, because it makes the test harder to debug when it starts failing. With your current code you only know that the test failed, not why. With the following, you can see how many rows were found, which helps with debugging.

Suggested change

-SELECT COUNT(*) = 1 FROM query_filter_float WHERE a < 1.0;
-SELECT COUNT(*) = 2 FROM query_filter_float WHERE a <= 1.0;
-SELECT COUNT(*) = 2 FROM query_filter_float WHERE a < 1.1;
+SELECT COUNT(*) FROM query_filter_float WHERE a < 1.0;
+SELECT COUNT(*) FROM query_filter_float WHERE a <= 1.0;
+SELECT COUNT(*) FROM query_filter_float WHERE a < 1.1;

JelteF · 2024-09-19T11:17:39Z

src/pgduckdb_filter.cpp

@@ -55,9 +55,9 @@ FilterOperationSwitch(Datum &value, duckdb::Value &constant, Oid type_oid) {
 	case INT8OID:
 		return TemplatedFilterOperation<int64_t, OP>(value, constant);
 	case FLOAT4OID:
-		return TemplatedFilterOperation<float, OP>(value, constant);
+		return TemplatedFilterOperation<float, OP>(DatumGetFloat4(value), constant);


It seems like we should probably do this explicit conversion for all types here, not just for floats.

JelteF

Two minor suggestions; other than that, this looks good IMO.

JelteF · 2024-09-20T09:14:58Z

src/pgduckdb_filter.cpp

 	case DATEOID: {
-		Datum date_datum = static_cast<int32_t>(value + pgduckdb::PGDUCKDB_DUCK_DATE_OFFSET);
-		return TemplatedFilterOperation<int32_t, OP>(date_datum, constant);
+		int32_t date = static_cast<int32_t>(value + pgduckdb::PGDUCKDB_DUCK_DATE_OFFSET);


Suggested change

-		int32_t date = static_cast<int32_t>(value + pgduckdb::PGDUCKDB_DUCK_DATE_OFFSET);
+		int32_t date = static_cast<int32_t>(DatumGetDate(value) + pgduckdb::PGDUCKDB_DUCK_DATE_OFFSET);

JelteF · 2024-09-20T09:15:32Z

src/pgduckdb_filter.cpp

 	}
 	case TIMESTAMPOID: {
-		Datum timestamp_datum = static_cast<int64_t>(value + pgduckdb::PGDUCKDB_DUCK_TIMESTAMP_OFFSET);
-		return TemplatedFilterOperation<int64_t, OP>(timestamp_datum, constant);
+		int64_t timestamp = static_cast<int64_t>(value + pgduckdb::PGDUCKDB_DUCK_TIMESTAMP_OFFSET);


Suggested change

-		int64_t timestamp = static_cast<int64_t>(value + pgduckdb::PGDUCKDB_DUCK_TIMESTAMP_OFFSET);
+		int64_t timestamp = static_cast<int64_t>(DatumGetTimestamp(value) + pgduckdb::PGDUCKDB_DUCK_TIMESTAMP_OFFSET);

JelteF · 2024-09-20T09:26:42Z

I updated the PR description with some additional info/explanation of the problem.


mkaruza · 2024-09-20T09:36:20Z

Update commit message.

JelteF · 2024-09-20T09:50:47Z

Update commit message.

Afaict the repo is configured to use the PR description as the commit message by default when squash-merging. That's why I usually like the PR description to explain a bit what the PR is doing before merging. (It's even better if the PR description is already helpful when opening the PR, of course, to make reviewing easy, but in this case I was able to understand the issue from the context I had.)

mkaruza requested a review from JelteF September 19, 2024 09:15

mkaruza force-pushed the float-filter-op branch 2 times, most recently from 95fb4d9 to 89fc42d September 19, 2024 09:23

mkaruza changed the title ~~Explicilty convert FLOAT4/FLOAT8 values before filter operation~~ Explicitly convert FLOAT4/FLOAT8 values before filter operation Sep 19, 2024

JelteF requested changes Sep 19, 2024

JelteF mentioned this pull request Sep 19, 2024

Use correct output column id list for query processing #195

Merged

mkaruza force-pushed the float-filter-op branch from 89fc42d to b1ebe29 September 20, 2024 09:02

mkaruza changed the title ~~Explicitly convert FLOAT4/FLOAT8 values before filter operation~~ Explicitly convert datum values before filter operation Sep 20, 2024

mkaruza requested a review from JelteF September 20, 2024 09:04

JelteF approved these changes Sep 20, 2024

mkaruza force-pushed the float-filter-op branch from b1ebe29 to 49f9c50 September 20, 2024 09:35

JelteF approved these changes Sep 20, 2024

mkaruza merged commit e4b914a into main Sep 20, 2024
3 checks passed

mkaruza deleted the float-filter-op branch September 20, 2024 14:05
