Add support for pandas 2.0 #28636

Merged
merged 5 commits into from
Jan 10, 2024

Conversation

caneff
Contributor

@caneff caneff commented Sep 24, 2023

With all the tests now passing, I added a tox config for pandas 2.0 and updated setup.py and the Gradle config to support it. I am explicitly not supporting 2.1 yet because of an issue I can't figure out (will insert issue number here when I make it).

Fixes #27221

@caneff
Contributor Author

caneff commented Sep 24, 2023

R: @tvalentyn

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@codecov

codecov bot commented Sep 24, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7fabc12) 38.34% compared to head (66796bd) 38.28%.
Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #28636      +/-   ##
==========================================
- Coverage   38.34%   38.28%   -0.07%     
==========================================
  Files         693      690       -3     
  Lines      102237   102029     -208     
==========================================
- Hits        39199    39058     -141     
+ Misses      61446    61391      -55     
+ Partials     1592     1580      -12     
Flag Coverage Δ
python 29.87% <ø> (-0.01%) ⬇️

@tvalentyn
Contributor

Seeing an error in coverage suite:

E           ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
E           A suitable version of pyarrow or fastparquet is required for parquet support.
E           Trying to import the above resulted in these errors:
E            - Pandas requires version '7.0.0' or newer of 'pyarrow' (version '4.0.1' currently installed).
E            - Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

I suspect we need to configure some parquet io tests for pyarrow 4 to use only pandas 1.x; alternatively, we'd have to bump the lower bound for supported pyarrow versions.
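
To illustrate the first option, here is a minimal sketch of a version gate that decides whether a pandas/pyarrow pairing can run the parquet tests. The 7.0.0 minimum is taken from the error message above; the helper names `parse_version` and `parquet_supported` are hypothetical and are not part of Beam, pandas, or pyarrow. It assumes plain `X.Y.Z` version strings and leaves pandas 1.x unrestricted, which is a simplification.

```python
from typing import Tuple

# Minimum pyarrow version that pandas 2.x accepts for parquet support,
# as reported in the ImportError above.
MIN_PYARROW_FOR_PANDAS_2 = (7, 0, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison.

    Assumption: no pre-release or local suffixes such as '2.0.0rc1'.
    """
    return tuple(int(part) for part in version.split("."))


def parquet_supported(pandas_version: str, pyarrow_version: str) -> bool:
    """Return True if this pandas/pyarrow pairing can run the parquet tests."""
    if parse_version(pandas_version) < (2,):
        # pandas 1.x: left unrestricted in this sketch (an assumption).
        return True
    return parse_version(pyarrow_version) >= MIN_PYARROW_FOR_PANDAS_2
```

A test suite could use a check like this to skip parquet tests for unsupported pairings, for example `parquet_supported("2.0.3", "4.0.1")` for the combination in the failing run.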

@tvalentyn
Contributor

Run Python 3.8 PostCommit

@tvalentyn
Contributor

In postcommits, seeing one relevant error; the rest seem to be flakes and need a rerun.

FAILED apache_beam/examples/dataframe/flight_delays_it_test.py::FlightDelaysTest::test_flight_delays - TypeError: Cannot perform reduction 'mean' with string dtype

@tvalentyn
Contributor

@caneff
Contributor Author

caneff commented Oct 3, 2023

Run Python 3.8 PostCommit

@caneff
Contributor Author

caneff commented Oct 4, 2023

Run Python 3.8 PostCommit

@caneff
Contributor Author

caneff commented Oct 4, 2023

Run Python_Coverage PreCommit

@caneff
Contributor Author

caneff commented Oct 4, 2023

Run Python_Coverage PreCommit

@svetakvsundhar
Contributor

Run Python 3.8 PostCommit

@svetakvsundhar
Contributor

(running again as the failure should be fixed in #28896)

@damccorm
Contributor

@caneff @svetakvsundhar @tvalentyn what are next steps here?

@caneff
Contributor Author

caneff commented Oct 30, 2023

Run Python_Coverage PreCommit

@tvalentyn
Contributor

Run Python 3.8 PostCommit

@tvalentyn
Contributor

Run Python_PVR_Flink PreCommit

@tvalentyn
Contributor

Run Python_Coverage PreCommit

@tvalentyn
Contributor

Run Python_Coverage Precommit

@tvalentyn
Contributor

Run Python_Runners PreCommit

@tvalentyn
Contributor

12:00:35 ERROR: Cannot install pyarrow<4 and >=3 and pyarrow<5 and >=4 because these package versions have conflicting dependencies.
12:00:35   py38-pyarrow-3: FAIL code 1 (2.85 seconds)
12:00:35   evaluation failed :( (3.07 seconds)
12:00:35 ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
12:00:35 > Task :sdks:python:test-suites:tox:py38:testPy38pyarrow-3 FAILED

@tvalentyn
Contributor

It seems like something is not right in the definition of the coverage test suites; I wonder whether we are following tox.ini syntax correctly.

@tvalentyn
Contributor

Somehow the constraints for different pyarrow versions get conjoined. I wonder if listing pyarrow and pandas on the same line would help. See also: https://stackoverflow.com/questions/57024579/tox-ini-environment-with-multiple-dependencies
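
To see why conjoined constraints cannot be satisfied, here is a small sketch that models each pyarrow constraint from the log as a half-open range of major versions and intersects them. This is a simplified illustration, not how pip's resolver works internally; the `intersect` helper and the variable names are hypothetical.

```python
from typing import Optional, Tuple

# A version range modelled as a half-open interval [lower, upper)
# over major version numbers.
Range = Tuple[int, int]


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Return the overlap of two ranges, or None if no version satisfies both."""
    lower = max(a[0], b[0])
    upper = min(a[1], b[1])
    return (lower, upper) if lower < upper else None


pyarrow_3 = (3, 4)  # pyarrow>=3,<4
pyarrow_4 = (4, 5)  # pyarrow>=4,<5

# Both constraints applied to the same environment, as in the failing run.
conjoined = intersect(pyarrow_3, pyarrow_4)
```

Here `conjoined` is `None`: no release is both below 4 and at least 4, which matches the ResolutionImpossible error when both constraints land in one environment.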

@caneff caneff force-pushed the tox_2_again branch 3 times, most recently from cc94b49 to 2869a6e Compare November 14, 2023 15:39
@caneff
Contributor Author

caneff commented Nov 14, 2023

Run Python 3.8 PostCommit

@tvalentyn
Contributor

Looks like the prior errors are resolved; checking whether the remaining apache_beam.utils.multi_process_shared_test.MultiProcessSharedTest.test_connect error is a flake.

@tvalentyn
Contributor

Run Python_Coverage PreCommit

@tvalentyn
Contributor

tvalentyn commented Nov 15, 2023

Noting that the Py3.8 PostCommit has passed: https://ci-beam.apache.org/job/beam_PostCommit_Python38_PR/810/

@tvalentyn
Contributor

Run Python 3.11 PostCommit

@AnandInguva
Contributor

@caneff @damccorm I'm planning to merge this PR. Any objections?

@damccorm
Contributor

SGTM

@caneff
Contributor Author

caneff commented Jan 10, 2024

SGTM

@AnandInguva AnandInguva merged commit 0a36805 into apache:master Jan 10, 2024
91 checks passed
JayajP pushed a commit to JayajP/beam that referenced this pull request Jan 22, 2024
* Add support for pandas 2.0

* Fix pyarrow tests for pandas 2 compat

* Fix tox.ini maybe
Development

Successfully merging this pull request may close these issues.

Support Pandas==2.x in Apache Beam.
5 participants