Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 1.12.0 redpanda rebase #272

Conversation

pgellert
Copy link

@pgellert pgellert commented Aug 9, 2024

This rebases Redpanda's Avro fork on top of the release-1.12.0 tag in upstream Avro.

Process for rebasing

git log release-1.11.1-redpanda --not upstream/branch-1.11 --pretty=format:"| %h | %an | %s |" |grep -v "Merge pull request"
  • Cherry pick all commits, skipping the ones that are noops because they have been already included:
git cherry-pick -x 33cf3b2b0 6d429a880 88e6d1f1c 963dfeea4 ae72da2a0 5746dc6bb 53250636b d9c693135 266f040ed e275b5cef 10641347e c176f72b8 9f2d4b385 86aec3d73 836cb1004 d0967f565 98d18c398 f2110ac83 966d457e7 acc9fbf57 2fce099de bc2e637af 23ae16d7d e8c4ca83b b2b82a455 00cf62e56 2aef08312 409503deb
  • Minor fixups including: accounting for changes for using fmt instead of boost::format, fixing avro unused argument warnings, dropping reverted commits, add missing headers to fix the redpanda build

Rebased commits

The table below shows the mapping between the commits on release-1.11.1-redpanda and their cherry picked version on release-1.12.0-redpanda including dropped commits and newly added commits:

Original commit New rebased commit Author Title
33cf3b2 e4e9eb0 Ben Pope Prefer make_shared
6d429a8 edd19c8 Ben Pope NodeSymbolic: Resolve should follow _actualNode
88e6d1f c7a2745 BenPope NodeSymbolic: Resolve should follow _actualNode
963dfee 4571154 Ben Pope NodeEnum: Support default value
ae72da2 952bdbb Ben Pope NodeEnum: Support compiling default value
5746dc6 e8bac21 Ben Pope NodeEnum: Support printJson for default value
5325063 5970178 Ben Pope NodePrimitive: Resolve AVRO_STRING <-> AVRO_BYTES
d9c6931 Dropped (duplicates 0163238) Ben Pope avrogencpp: Properly exclude constinit
266f040 2da75f1 Ben Pope c++/Compiler: CustomAttributes should allow json
e275b5c 931dcd4 Ben Pope c++: NodePrimitive::resolve allow futherResolution for STRING and BYTES
1064134 014a915 Noah Watkins build: add barebones cmake build
c176f72 3129565 Noah Watkins build: move cmake module into separate directory
9f2d4b3 2a56dbb Noah Watkins build: normalize include paths
86aec3d 93fe7d9 Noah Watkins fix -Winconsistent-missing-override warning
836cb10 Dropped (duplicates 258571b) Noah Watkins fix -Wfinal-dtor-non-final-class warning
d0967f5 1c6902a Jim Cipar Modify CMakeLists to fix build errors.
98d18c3 230fd84 Jim Cipar Allow custom metadata in data files.
f2110ac 4326f23 Jim Cipar Add map logical type for arrays.
966d457 cb32bf8 Jim Cipar Add element-id support for Avro arrays.
acc9fbf Dropped Gerrit Birkeland AVRO-3990 [C++] Fix invalid code generation for union with reserved name (apache#2930)
2fce099 Dropped Gerrit Birkeland AVRO-3992 [C++] Fix compiler warnings in code generated by schema with empty record (apache#2927)
bc2e637 Dropped Andrew Wong redpanda_build: add avrogencpp to RP build
23ae16d Dropped Andrew Wong Revert "redpanda_build: add avrogencpp to RP build"
e8c4ca8 Dropped Thiruvalluvan M G Fix for wrong behavior of Json codec when record schema has no fields (apache#2833)
b2b82a4 fbc8385 Andrew Wong allow for iceberg element-ids take 2
00cf62e d95eee9 Andrew Wong redpanda_build: add avrogencpp to RP build take 2
2aef083 17a2373 Andrew Wong avrogencpp: expose ValidSchema in generated code
409503d 6bd10af Andrew Wong C++: add elementId to programmatic API
Not present 854847a Gellert Peresztegi-Nagy build: import fmt using find_package
Not present a0dafd5 Gellert Peresztegi-Nagy fix algorithm header include

Testing

The changes are being tested against all 3 redpanda builds (oss, vtools, bazel) here:

@pgellert pgellert self-assigned this Aug 9, 2024
@github-actions github-actions bot added the C++ label Aug 9, 2024
@pgellert pgellert force-pushed the release-1.12.0-redpanda-rebase branch 2 times, most recently from 86877f4 to a007403 Compare August 12, 2024 14:22
BenPope and others added 4 commits August 12, 2024 15:49
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 33cf3b2)
Node::getNode() is introduced to break dependencies.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 6d429a8)
Node::getNode() is introduced to break dependencies.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 88e6d1f)
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 963dfee)
BenPope and others added 19 commits August 12, 2024 17:34
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit ae72da2)
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 5746dc6)
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 5325063)
Don't require that the attribute is a string type.

Regression introduced in 32aa94d

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 266f040)
The early exit for no match prevents a union containing a string from
resolving a string.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit e275b5c)
Signed-off-by: Noah Watkins <[email protected]>
(cherry picked from commit 1064134)
Signed-off-by: Noah Watkins <[email protected]>
(cherry picked from commit c176f72)
Signed-off-by: Noah Watkins <[email protected]>
(cherry picked from commit 9f2d4b3)
Signed-off-by: Noah Watkins <[email protected]>
(cherry picked from commit 86aec3d)
This change bumps our C++ version to 17 and disables specific warnings
that cause build errors.

I was hitting three build errors, possibly due to the newer compiler:
- lang/c++/impl/Compiler.cc:304:18: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
- /usr/include/boost/math/tools/config.hpp:23:6: error: #warning "The minimum language standard to use Boost.Math will be C++14 starting in July 2023 (Boost 1.82 release)" [-Werror=cpp]
- avro/lang/c++/include/avro/Reader.hh:89:17: error: ‘v.avro::ReaderImpl<avro::NullValidator>::readValue(double&)::<unnamed union>::d’ may be used uninitialized [-Werror=maybe-uninitialized]

The last one only occurred during `./build.sh install`, not during `./build.sh test`

(cherry picked from commit d0967f5)
The Iceberg spec requires custom metadata fields for Avro files. It
expects information such as the table scheme, the partition information
and the manifest type to be in the Manifest File's metadata.

This change makes the `setMetadata` method of `DataFileWriterBase`
public instead of private, and adds a similar public method to
`DataFileWriter`.

(cherry picked from commit 98d18c3)
Maps can be represented by either the built-in map type, or an array of
key-value pairs with the logical type `map`. The Iceberg "column_sizes"
field is an example of the latter.

This change allows arrays to be annotated with the `map` logical type,
which was previously not supported in C++ Avro.

(cherry picked from commit f2110ac)
`element-id` is an optional field used by Iceberg arrays and required by
some Iceberg implmentations.

(cherry picked from commit 966d457)
Iceberg manifests need to be able to write their JSONified Avro schemas
in the Avro headers for certain Iceberg clients (e.g. PyIceberg) to be
able to read them.

Commit 966d457 was a stab at allowing
this by hacking this custom field into the AST, but it did so as a
string, instead of a long. This isn't exactly what other libraries
output, but is sufficient for PyIceberg. To avoid surprises down the
line, this quick hack switches us to serialize the long.

(cherry picked from commit b2b82a4)
Re-adds the build for avrogencpp to the Redpanda open source build.

Notable changes against the last attempt[1]:
- fixes paths of avrogencpp.cc to be relativized to redpanda_build
- adds the required avro include directory
- depends on the avro module
- sets the version (required by the avrogencpp tool) using a relativized
  version of the vanilla Avro version-parsing code
- adds Boost program_options and regex

[1] bc2e637

(cherry picked from commit 00cf62e)
The C++ avro library has readers and writers that depend on having an
in-memory ValidSchema. The avrogencpp tool can take a JSON schema and
generate code to interact with a given datum, but doesn't expose the
ValidSchema, despite using it during codegen.

This adds a valid_schema() method to the avrogencpp binary that
builds the schema and exposes it via generated code.

This is inspired largely by the kspp[1] library.

Note, the escape() method used is copied from [2].

[1] https://github.com/bitbouncer/kspp/blob/8539f359e32bd3dd1360ac4616eab88e79aab607/tools/kspp_avrogencpp/kspp_avrogencpp.cpp
[2] https://github.com/redpanda-data/avro/blob/1410e79f9df61669c2d52f6d0643e6c35156e615/lang/c%2B%2B/impl/NodeImpl.cc#L29-L69

(cherry picked from commit 2aef083)
A previous commit added a "element-id" attribute to arrays, which will
be used in Iceberg manifests. This was previously only available when
parsing a schema JSON file, not via the programmatic API.

The way that Redpanda will build Iceberg manifests is going to change to
use the programmatic API, in order to support some fields of the
manifest that have a dynamic schema. To that end, this exposes the
"element-id" via a new constructor for ArraySchemas.

(cherry picked from commit 409503d)
Since release 1.12 avro uses the fmt library instead of boost::format.
This was introduced in apache@19501fc.

However, we want to avoid pulling in fmt using fetchcontent like the
build currently does and instead we want to use find_package to reuse
the existing fmt dependency in our vtools-based build.

The version is fixed to the current fmt version used by the vtools
build which works with the avro build.

At the same time, we update the custom cmake build used by the redpanda
open source build to also import the fmt dependency.
The file is using `std::find_if` without specifying the `algorithm`
include. It has probably transitively included it previously.

Without it, the redpanda open source build fails with the following:
```
/var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-02d9cb4eee45a1cf5-1/redpanda/redpanda/build/release-ci/_deps/avro-src/lang/c++/impl/Node.cc:112:73: error: no member named 'find_if' in namespace 'std'; did you mean 'std::ranges::find_if'?
  112 |     if (!ns_.empty() && (ns_[0] == '.' || ns_[ns_.size() - 1] == '.' || std::find_if(ns_.begin(), ns_.end(), invalidChar1) != ns_.end())) {
      |                                                                         ^~~~~~~~~~~~
      |                                                                         std::ranges::find_if
```
@pgellert pgellert force-pushed the release-1.12.0-redpanda-rebase branch from 415ce0a to a0dafd5 Compare August 12, 2024 17:27
@pgellert pgellert changed the title [WIP] Release 1.12.0 redpanda rebase Release 1.12.0 redpanda rebase Aug 12, 2024
@pgellert pgellert marked this pull request as ready for review August 12, 2024 19:48
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks for all the details!

@pgellert pgellert merged commit b3d7efe into redpanda-data:release-1.12.0-redpanda Aug 13, 2024
1 check passed
@andrwng
Copy link

andrwng commented Aug 22, 2024

This is awesome. Thanks for putting this together Gellert!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants