Skip to content

Proposal: parquet 53.0.0 feature branch #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alamb opened this issue Jul 13, 2024 · 14 comments
Closed

Proposal: parquet 53.0.0 feature branch #63

alamb opened this issue Jul 13, 2024 · 14 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jul 13, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We are now being careful about breaking changes (see https://github.com/apache/arrow-rs/blob/master/CONTRIBUTING.md#breaking-changes)

This means we can't merger PRs with breaking API changes to main until early August

However there are now three potentially large parquet changes that could conflict with each other and have API changes:

Describe the solution you'd like
Some way to avoid a massive set of merge conflicts when we start merging changes to master for parquet 53

I would also love to be able to review and merge smaller PRs rather than keep several large ones outstanding

Describe alternatives you've considered

I would like to propose we create a feature branch (e.g. parquet-53.0.0) in the arrow-rs repo that we can merge parquet API changes to and develop new features

Once main opens for 53 (in early August) we can merge the branch to main

This approach does require maintenance of the parquet 53 branch and runs the risk of accumulating merge conflicts as it diverges from master. I am willing to help do the proces

Additional context

@adriangb
Copy link
Contributor

If we can't merge breaking changes to main I agree this is the path forward. It's very generous of you to take up that burden.

For what it's worth I don't think apache/arrow-rs#6000 should include breaking changes, if there are any I think they're accidental. I'll double check. I do agree it might cause merge conflicts.

@etseidl
Copy link

etseidl commented Jul 13, 2024

FWIW I just tried a merge of apache/arrow-rs#5486 and apache/arrow-rs#6000. The only significant conflict was missing (new) arguments for the thrift OffsetIndex and ColumnIndex structs. Shouldn't be hard to integrate those two, but it's probably better for the metadata encoder to merge first (especially since apache/arrow-rs#5486 will now be broken up into smaller chunks). apache/arrow-rs#6045 and apache/arrow-rs#6000 should be easier to reconcile and can probably be merged in any order.

The bloom filter changes seem to be orthogonal to the other two, so I don't anticipate any issues there.

@alamb
Copy link
Contributor Author

alamb commented Jul 14, 2024

For what it's worth I don't think apache/arrow-rs#6000 should include breaking changes, if there are any I think they're accidental. I'll double check. I do agree it might cause merge conflicts.

Thanks @adriangb -- I agree the first PR won't have API conflicts. However, I expect some iteration on the APIs, so once we have merged / released new API (Stats builder) then any changes we make to the new API will require "breaking changes" so we would have to remember what APIs we have released / haven't released.

Maybe we could put it behind a feature flag or something that made it clear the API would change 🤔

@alamb
Copy link
Contributor Author

alamb commented Jul 14, 2024

The bloom filter changes seem to be orthogonal to the other two, so I don't anticipate any issues there.

I guess I was imagining the parquet metadata encoder would also potentially have the ability to write bloom filters and thus may be affected by apache/arrow-rs#6000

@alamb
Copy link
Contributor Author

alamb commented Jul 15, 2024

🤔 maybe I can simply make a 53.0.0-dev branch and we can merge the 53 features there 🤔

@alamb
Copy link
Contributor Author

alamb commented Jul 15, 2024

The more I think about this the more sense I think it makes to have a 53 feature branch for the next few weeks so we can not build up a massive set of conclits. I plan to create one tomorrow unless there are objections and start merging stuff there to clear the review queue

@etseidl
Copy link

etseidl commented Jul 15, 2024

The more I think about this the more sense I think it makes to have a 53 feature branch for the next few weeks so we can not build up a massive set of conclits. I plan to create one tomorrow unless there are objections and start merging stuff there to clear the review queue

Sounds good to me. In addition to apache/arrow-rs#6045, I have 3 stacked PRs ready to go, and I also have a plan for integrating with the changes from apache/arrow-rs#6000.

@alamb
Copy link
Contributor Author

alamb commented Jul 16, 2024

I have created https://github.com/apache/arrow-rs/tree/53.0.0-dev as the branch and will now begin retargeting and merging PRs.

This will likely require me to do some release note finagling but we'll handle that when we get there

@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'next-major-release'} from apache/arrow-rs#5933

@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'arrow'} from apache/arrow-rs#6018

@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'auto-dependencies'} from apache/arrow-rs#6018

@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'arrow-flight'} from apache/arrow-rs#6041

@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'object-store'} from apache/arrow-rs#5930

@alamb
Copy link
Contributor Author

alamb commented Mar 20, 2025

Migrating from arrow-rs issue #6050

@alamb alamb transferred this issue from apache/arrow-rs Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants