
[SPARK-51983][PS] Prepare the test environment for pandas API on Spark with ANSI mode enabled #50779


Closed

@ueshin (Member) commented May 1, 2025

What changes were proposed in this pull request?

Prepares the test environment for pandas API on Spark with ANSI mode enabled.

  • Stop forcibly disabling ANSI mode in tests
  • Add a new option, compute.ansi_mode_support, that defaults to False to keep the current behavior
    • Eventually it should default to True
  • Skip the tests that fail under ANSI mode
  • Run the pyspark-pandas tests in the nightly Non-ANSI test run as well, so that the skipped tests still run with ANSI mode disabled
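
As a rough illustration of the "skip the affected tests" step, the sketch below uses only the standard library's unittest. The environment variable name, the helper, the skip message, and the test class are all hypothetical and are not taken from this PR; the Spark test suite may wire this up differently. The default of "true" only mirrors the statement that ANSI mode is now enabled by default.

```python
import os
import unittest

# Hypothetical variable name, used only for this sketch; the name that
# the Spark test suite actually reads may differ.
ANSI_ENV_VAR = "EXAMPLE_ANSI_SQL_MODE"

# Hypothetical skip message for tests that do not pass under ANSI mode yet.
ANSI_SKIP_MESSAGE = "ANSI mode is not supported yet for this case."


def is_ansi_mode_test(environ=os.environ):
    """Return True when the test run is configured for ANSI mode.

    An unset variable is treated as "true", so ANSI mode is the default
    and only an explicit "false" selects the non-ANSI run.
    """
    return environ.get(ANSI_ENV_VAR, "true").strip().lower() == "true"


class ExampleSeriesTests(unittest.TestCase):
    # Skipped in an ANSI run, executed in a non-ANSI run.
    @unittest.skipIf(is_ansi_mode_test(), ANSI_SKIP_MESSAGE)
    def test_behavior_that_differs_under_ansi(self):
        # Placeholder body standing in for a test that fails under ANSI mode.
        self.assertEqual(1 + 1, 2)

    # Not affected by ANSI mode, so it runs in both configurations.
    def test_behavior_independent_of_ansi(self):
        self.assertEqual(int("3"), 3)
```

With this shape, a run that sets the variable to "false" executes both tests, which is the role the nightly Non-ANSI run plays for the skipped tests.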

Why are the changes needed?

Currently, pandas API on Spark doesn't support ANSI mode and shows a warning if it is enabled.

>>> import pyspark.pandas as ps
>>> ps.range(10)
...: PandasAPIOnSparkAdviceWarning: The config 'spark.sql.ansi.enabled' is set to True. This can cause unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)

...

Now that ANSI mode is enabled by default, pandas API on Spark should also support it.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The existing tests should pass.

Was this patch authored or co-authored using generative AI tooling?

@HyukjinKwon (Member) commented

Merged to master.
