[SPARK-49695][SQL] Postgres fix xor push-down #48144

Open
wants to merge 10 commits into
base: master

andrej-db
Contributor

What changes were proposed in this pull request?

This PR fixes the pushdown of the ^ operator (XOR operator) for Postgres. Postgres uses ^ as the exponentiation operator rather than bitwise XOR.

The fix consists of overriding the SQLExpressionBuilder to replace the '^' character with '#', which is the Postgres bitwise XOR operator.

Why are the changes needed?

Without the fix, Postgres evaluates the pushed-down ^ as exponentiation, so the query result is incorrect.
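
To make the mismatch concrete, here is a minimal, self-contained sketch of the two readings of `^`. The object and method names (`XorVsPower`, `sparkCaret`, `postgresCaret`) are made up for illustration and are not part of Spark or of this PR; the Postgres side is only emulated with `math.pow`.

```scala
object XorVsPower {
  // Spark SQL (like Scala itself) reads ^ as bitwise XOR.
  def sparkCaret(a: Long, b: Long): Long = a ^ b

  // Postgres reads ^ as exponentiation, so a pushed-down "a ^ b" behaves like this.
  def postgresCaret(a: Long, b: Long): Long =
    math.pow(a.toDouble, b.toDouble).toLong

  def main(args: Array[String]): Unit = {
    println(s"Spark:    5 ^ 3 = ${sparkCaret(5, 3)}")    // bitwise: 101 xor 011 = 110
    println(s"Postgres: 5 ^ 3 = ${postgresCaret(5, 3)}") // arithmetic: 5 * 5 * 5
  }
}
```

For the inputs 5 and 3 the two readings give 6 and 125, so a filter such as `id ^ 3 = 6` would match different rows depending on where it is evaluated.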

Does this PR introduce any user-facing change?

Yes. The user will now have a proper translation of the ^ operator.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Sep 18, 2024
@urosstan-db
Contributor

Can we add some tests?

Member

@MaxGekk MaxGekk left a comment

It would be nice if you added an integration test, for instance in *PostgresExpressionPushdownSuite*.

@andrej-db
Contributor Author

I wanted to add tests, but didn't know where to put them... I don't know how OSS tests Postgres.

@urosstan-db
Contributor

I wanted to add tests, but didn't know where to put them... I don't know how OSS tests Postgres.

V2JDBCTest is the base class, and PostgresIntegrationSuite is the derived class for Postgres.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-49695] Postgres fix xor push-down" to "[SPARK-49695][SQL] Postgres fix xor push-down" Sep 18, 2024
Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you, @andrej-db. +1 for adding a test case, definitely.

cc @huaxingao, too.

@andrej-db
Contributor Author

Added the test, let me know if this is in order.

@@ -986,4 +986,13 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu
  test("scan with filter push-down with date time functions") {
    testDatetime(s"$catalogAndNamespace.${caseConvert("datetime")}")
  }

  test("xor operator push-down") {
Contributor

Maybe you can run explain formatted and check whether the output string contains "id" # 3; that will add a little bit of robustness.
Another thing we can do is write a unit test that just invokes compilation of the XOR expression and checks that the result of compilation is col # constant.
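
The second suggestion (compile the XOR expression and inspect the generated SQL) could look roughly like the sketch below. It runs without Spark or a database, so every type in it (`Expr`, `Column`, `IntLiteral`, `BinaryOp`, `ToyPostgresCompiler`) is a hypothetical stand-in rather than a Spark class, and the exact string a real dialect emits (quoting, parentheses) may differ.

```scala
// Hypothetical, simplified expression model; these are not Spark classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class IntLiteral(value: Int) extends Expr
final case class BinaryOp(op: String, left: Expr, right: Expr) extends Expr

object ToyPostgresCompiler {
  // Map a Spark operator symbol to its Postgres spelling.
  private def dialectOperator(op: String): String = op match {
    case "^"   => "#" // bitwise XOR in Postgres
    case other => other
  }

  // Render an expression tree as a SQL fragment.
  def compile(expr: Expr): String = expr match {
    case Column(name)       => "\"" + name + "\""
    case IntLiteral(value)  => value.toString
    case BinaryOp(op, l, r) => s"${compile(l)} ${dialectOperator(op)} ${compile(r)}"
  }
}

object XorCompilationCheck {
  def main(args: Array[String]): Unit = {
    val compiled =
      ToyPostgresCompiler.compile(BinaryOp("^", Column("id"), IntLiteral(3)))
    // The check the review asks for: the compiled SQL uses # instead of ^.
    assert(compiled == "\"id\" # 3", s"unexpected SQL: $compiled")
    println(compiled)
  }
}
```

A check of this kind fails fast and needs no Docker container, while the explain-based check in the integration suite confirms that the expression is actually pushed down.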

Contributor Author

neat, I like it

Contributor

@urosstan-db urosstan-db left a comment

Approving latest iteration (with tests)

PostgresIntegrationSuite: add test
Member

@MaxGekk MaxGekk left a comment

Please, fix your test:

[info] - SPARK-49695: Postgres fix xor push-down *** FAILED *** (10 milliseconds)
[info]   org.apache.spark.sql.catalyst.ExtendedAnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `bar` cannot be found. Verify the spelling and correctness of the schema and catalog.


  override def compileExpression(expr: Expression): Option[String] = {
    val builder = new PostgresSQLBuilder()
    try {
Contributor

Perhaps it's better to only override visitBinaryArithmetics?

We have a similar problem with some of the functions, and there we use dialectFunctionName to translate from Spark to the local dialect.
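
As a rough illustration of this suggestion, the sketch below overrides only the method that renders a binary arithmetic operator, instead of post-processing the whole compiled string. `ToySQLBuilder` and `ToyPostgresSQLBuilder` are hypothetical classes written for this example; the method name and its `(name, l, r)` signature are assumptions and should be checked against the actual builder in Spark.

```scala
// Hypothetical base builder, only loosely modeled on the one discussed here.
class ToySQLBuilder {
  // Last step in the chain: renders "<left> <operator> <right>".
  protected def visitBinaryArithmetic(name: String, l: String, r: String): String =
    s"$l $name $r"

  // Entry point that every binary arithmetic expression goes through.
  def build(name: String, l: String, r: String): String =
    visitBinaryArithmetic(name, l, r)
}

class ToyPostgresSQLBuilder extends ToySQLBuilder {
  // Swap only the operator symbol; operands are passed through untouched.
  override protected def visitBinaryArithmetic(
      name: String, l: String, r: String): String =
    name match {
      case "^" => super.visitBinaryArithmetic("#", l, r)
      case _   => super.visitBinaryArithmetic(name, l, r)
    }
}

object OverrideDemo {
  def main(args: Array[String]): Unit = {
    val builder = new ToyPostgresSQLBuilder
    println(builder.build("^", "\"id\"", "3")) // XOR is rewritten to #
    println(builder.build("+", "\"id\"", "3")) // other operators are unchanged
    println(builder.build("=", "\"name\"", "'a^b'")) // a ^ inside a literal survives
  }
}
```

One reason to prefer this over a plain character replacement on the compiled SQL is that a replacement would also rewrite a '^' that appears inside a string literal or a quoted identifier.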

Contributor

+1 here, let's override the last possible method in the chain of execution.

Contributor Author

done

Contributor

@milastdbx milastdbx left a comment

Consider refactoring this

Contributor

@urosstan-db urosstan-db left a comment

LGTM, but please make the test more assertive.

Member

@MaxGekk MaxGekk left a comment

@andrej-db Could you fix the build and retrigger the integration tests:

[error] /home/runner/work/spark/spark/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala:237:25: not found: type Filter
[error]       plan.isInstanceOf[Filter]
[error]                         ^
[error] one error found

7 participants