ENH (string dtype): convert string_view columns to future string dtype instead of object dtype in Parquet/Feather IO #60235

Merged


jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Nov 7, 2024

This is a follow-up to #60222, which allows passing string_view data to the string dtype constructor. This PR ensures we also use that capability when reading Parquet (or Feather, ORC) files that might use that type.

PyArrow does not yet support writing string_view to Parquet, so we can't test it yet with Parquet, only with Feather.

@jorisvandenbossche jorisvandenbossche added Strings String extension data type and string data IO Parquet parquet, feather Arrow pyarrow functionality labels Nov 7, 2024
@jorisvandenbossche jorisvandenbossche added this to the 2.3 milestone Nov 7, 2024
@jorisvandenbossche jorisvandenbossche changed the title ENH (string dtype): convert string_view columns to future string dtype instead of object dtype in Parquet IO ENH (string dtype): convert string_view columns to future string dtype instead of object dtype in Parquet/Feather IO Nov 8, 2024
@mroeschke mroeschke merged commit f307a0a into pandas-dev:main Nov 11, 2024
51 checks passed
@mroeschke
Member

Thanks @jorisvandenbossche

lumberbot-app bot commented Nov 11, 2024

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it.
git checkout 2.3.x
git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
git cherry-pick -x -m1 f307a0a3615d93c2177f6581133bdb541e12a93c
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
git commit -am 'Backport PR #60235: ENH (string dtype): convert string_view columns to future string dtype instead of object dtype in Parquet/Feather IO'
  4. Push to a named branch:
git push YOURFORK 2.3.x:auto-backport-of-pr-60235-on-2.3.x
  5. Create a PR against branch 2.3.x. I would have named this PR:

"Backport PR #60235 on branch 2.3.x (ENH (string dtype): convert string_view columns to future string dtype instead of object dtype in Parquet/Feather IO)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

@jorisvandenbossche jorisvandenbossche deleted the string-dtype-parquet-string-view branch November 12, 2024 20:53
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Nov 12, 2024
…e instead of object dtype in Parquet/Feather IO (pandas-dev#60235)

(cherry picked from commit f307a0a)
@jorisvandenbossche
Member Author

Manual backport -> #60291

jorisvandenbossche added a commit that referenced this pull request Nov 13, 2024
…uture string dtype instead of object dtype in Parquet/Feather IO (#60235) (#60291)

(cherry picked from commit f307a0a)