forked from apache/avro
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release 1.12.0 redpanda rebase #272
Merged
pgellert
merged 23 commits into
redpanda-data:release-1.12.0-redpanda
from
pgellert:release-1.12.0-redpanda-rebase
Aug 13, 2024
Merged
Release 1.12.0 redpanda rebase #272
pgellert
merged 23 commits into
redpanda-data:release-1.12.0-redpanda
from
pgellert:release-1.12.0-redpanda-rebase
Aug 13, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
pgellert
force-pushed
the
release-1.12.0-redpanda-rebase
branch
2 times, most recently
from
August 12, 2024 14:22
86877f4
to
a007403
Compare
Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit 33cf3b2)
Node::getNode() is introduced to break dependencies. Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit 6d429a8)
Node::getNode() is introduced to break dependencies. Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit 88e6d1f)
Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit 963dfee)
pgellert
force-pushed
the
release-1.12.0-redpanda-rebase
branch
from
August 12, 2024 14:54
a007403
to
415ce0a
Compare
7 tasks
Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit ae72da2)
Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit 5746dc6)
Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit 5325063)
Don't require that the attribute is a string type. Regression introduced in 32aa94d Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit 266f040)
The early exit for no match prevents a union containing a string from resolving a string. Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit e275b5c)
Signed-off-by: Noah Watkins <[email protected]> (cherry picked from commit 1064134)
Signed-off-by: Noah Watkins <[email protected]> (cherry picked from commit c176f72)
Signed-off-by: Noah Watkins <[email protected]> (cherry picked from commit 9f2d4b3)
Signed-off-by: Noah Watkins <[email protected]> (cherry picked from commit 86aec3d)
This change bumps our C++ version to 17 and disables specific warnings that cause build errors. I was hitting three build errors, possibly due to the newer compiler: - lang/c++/impl/Compiler.cc:304:18: error: possibly dangling reference to a temporary [-Werror=dangling-reference] - /usr/include/boost/math/tools/config.hpp:23:6: error: #warning "The minimum language standard to use Boost.Math will be C++14 starting in July 2023 (Boost 1.82 release)" [-Werror=cpp] - avro/lang/c++/include/avro/Reader.hh:89:17: error: ‘v.avro::ReaderImpl<avro::NullValidator>::readValue(double&)::<unnamed union>::d’ may be used uninitialized [-Werror=maybe-uninitialized] The last one only occurred during `./build.sh install`, not during `./build.sh test` (cherry picked from commit d0967f5)
The Iceberg spec requires custom metadata fields for Avro files. It expects information such as the table scheme, the partition information and the manifest type to be in the Manifest File's metadata. This change makes the `setMetadata` method of `DataFileWriterBase` public instead of private, and adds a similar public method to `DataFileWriter`. (cherry picked from commit 98d18c3)
Maps can be represented by either the built-in map type, or an array of key-value pairs with the logical type `map`. The Iceberg "column_sizes" field is an example of the latter. This change allows arrays to be annotated with the `map` logical type, which was previously not supported in C++ Avro. (cherry picked from commit f2110ac)
`element-id` is an optional field used by Iceberg arrays and required by some Iceberg implmentations. (cherry picked from commit 966d457)
Iceberg manifests need to be able to write their JSONified Avro schemas in the Avro headers for certain Iceberg clients (e.g. PyIceberg) to be able to read them. Commit 966d457 was a stab at allowing this by hacking this custom field into the AST, but it did so as a string, instead of a long. This isn't exactly what other libraries output, but is sufficient for PyIceberg. To avoid surprises down the line, this quick hack switches us to serialize the long. (cherry picked from commit b2b82a4)
Re-adds the build for avrogencpp to the Redpanda open source build. Notable changes against the last attempt[1]: - fixes paths of avrogencpp.cc to be relativized to redpanda_build - adds the required avro include directory - depends on the avro module - sets the version (required by the avrogencpp tool) using a relativized version of the vanilla Avro version-parsing code - adds Boost program_options and regex [1] bc2e637 (cherry picked from commit 00cf62e)
The C++ avro library has readers and writers that depend on having an in-memory ValidSchema. The avrogencpp tool can take a JSON schema and generate code to interact with a given datum, but doesn't expose the ValidSchema, despite using it during codegen. This adds a valid_schema() method to the avrogencpp binary that builds the schema and exposes it via generated code. This is inspired largely by the kspp[1] library. Note, the escape() method used is copied from [2]. [1] https://github.com/bitbouncer/kspp/blob/8539f359e32bd3dd1360ac4616eab88e79aab607/tools/kspp_avrogencpp/kspp_avrogencpp.cpp [2] https://github.com/redpanda-data/avro/blob/1410e79f9df61669c2d52f6d0643e6c35156e615/lang/c%2B%2B/impl/NodeImpl.cc#L29-L69 (cherry picked from commit 2aef083)
A previous commit added a "element-id" attribute to arrays, which will be used in Iceberg manifests. This was previously only available when parsing a schema JSON file, not via the programmatic API. The way that Redpanda will build Iceberg manifests is going to change to use the programmatic API, in order to support some fields of the manifest that have a dynamic schema. To that end, this exposes the "element-id" via a new constructor for ArraySchemas. (cherry picked from commit 409503d)
Since release 1.12 avro uses the fmt library instead of boost::format. This was introduced in apache@19501fc. However, we want to avoid pulling in fmt using fetchcontent like the build currently does and instead we want to use find_package to reuse the existing fmt dependency in our vtools-based build. The version is fixed to the current fmt version used by the vtools build which works with the avro build. At the same time, we update the custom cmake build used by the redpanda open source build to also import the fmt dependency.
The file is using `std::find_if` without specifying the `algorithm` include. It has probably transitively included it previously. Without it, the redpanda open source build fails with the following: ``` /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-02d9cb4eee45a1cf5-1/redpanda/redpanda/build/release-ci/_deps/avro-src/lang/c++/impl/Node.cc:112:73: error: no member named 'find_if' in namespace 'std'; did you mean 'std::ranges::find_if'? 112 | if (!ns_.empty() && (ns_[0] == '.' || ns_[ns_.size() - 1] == '.' || std::find_if(ns_.begin(), ns_.end(), invalidChar1) != ns_.end())) { | ^~~~~~~~~~~~ | std::ranges::find_if ```
pgellert
force-pushed
the
release-1.12.0-redpanda-rebase
branch
from
August 12, 2024 17:27
415ce0a
to
a0dafd5
Compare
pgellert
changed the title
[WIP] Release 1.12.0 redpanda rebase
Release 1.12.0 redpanda rebase
Aug 12, 2024
BenPope
approved these changes
Aug 12, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks for all the details!
pgellert
merged commit Aug 13, 2024
b3d7efe
into
redpanda-data:release-1.12.0-redpanda
1 check passed
This is awesome. Thanks for putting this together Gellert! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This rebases Redpanda's Avro fork on top of the release-1.12.0 tag in upstream Avro.
Process for rebasing
Rebased commits
The table below shows the mapping between the commits on release-1.11.1-redpanda and their cherry picked version on release-1.12.0-redpanda including dropped commits and newly added commits:
map
logical type for arrays.Testing
The changes are being tested against all 3 redpanda builds (oss, vtools, bazel) here: