-
-
Notifications
You must be signed in to change notification settings - Fork 26
Support overrides to configure code analysis and generation to maximize Rust type reuse #327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6f884e3
to
6e03c85
Compare
ff98c47
to
d17842f
Compare
Signed-off-by: Brendan Hay <[email protected]>
d17842f
to
082c354
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this very detailed and rigorous extension. I'm very positive on this in general, because the motivating example is similar (while admittedly more java-extreme) to what I am facing in other crds, and this provides a clean method for codifying replacements that could be upstreamed to generator crates such as kube-custom-resources-rs
or gateway-api-rs
(who historically have had to do much worse things).
This is right up there around my threshold of how much complexity to accept in PRs, but you've managed to do it with minimal changes to the other files and do it in a basically orthogonal module which in my opinion makes it worth it.
I'm going to need a little time to digest the source changes in overrides.rs
, so have only given some documentation notes and interface thoughts on the (very detailed 👍 ) example config so far. Particularly there are some naming and nesting issues that perhaps makes it a little harder to grasp. LMK what you think. 🙏
propertyRules: | ||
# The action to perform if the type and name-directed matches below succeed. | ||
# `replace` is the `String` to use as the property's Rust type. This will prevent a Rust | ||
# `Container` being generated for the property, if matching succeeds. | ||
- matchSuccess: | ||
replace: ResourceRequirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This propertyRules
name feels a little unclear (and forces me to read the entire doc about what it does before I get it). maybe this should contain a more explicit verb about the action it is doing; e.g. it's mostly a vector of rules for replacing types. Maybe it should be called something like replaceRules
? Not sure about this, because it can also omit.
The closest thing to this type of rule in the Kubernetes world that comes to mind is prometheus' relabelConfig which has some nice terminology that is kind of internalized for me. Maybe it's worth considering re-using parts of that, e.g. maybe action
over matchSuccess
and source
over matchName
, and target
etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're rules, that only apply to properties. 😃 There's potentially other AST rewrites that can be performed beyond properties, (ie. on Rust types, rather than CRD types) and I introduced the top-level propertyRules
to try and leave room in the format for non-property related configuration. I'm unsure on alternative names, as you've noted replaceRules
isn't precise since you can do more than just replace. Maybe rewriteRules
to confuse ingress-nginx
users?
I have no issue with renaming match*
if you're settled on those you suggested, but I did find with terse names the rule structure became harder to parse when you have a large configuration with many CRD objects, as those tend to reuse names like action|source|target
a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good argument against replaceRules
. Agreed.
rewriteRules
is better, but feels imprecise as well. Parts of me wants to do something like relabelRules
except what we have is not rules that apply to "labels" but container names, and renameRules
doesn't really convey what we are renaming.
Tbh, I hadn't actually considered that when you say the "rules apply to properties", you were referring to the literal properties
in the schema, and not an abstract name for "stuff we are renaming". maybe propertyRules
is fine. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no issue with renaming match* if you're settled on those you suggested, but I did find with terse names the rule structure became harder to parse when you have a large configuration with many CRD objects, as those tend to reuse names like action|source|target a lot.
That feels more like a structuring + convention problem to me. If we always conventionally do "the schema to match against" at the bottom (possibly with wrapped comments), then I don't think we need to guard quite this hard against confusion of people "mixing up our schema with theirs".
I also think the explicit action separated from matchSuccess
makes it clearer to see what the rule is intended to do, plus de-indents a bit.
propertyRules:
# Common resource requirements
- action: replace
target: ResourceRequirements
sourceName:
- exact: ..
sourceSchema:
# Big blob. Maybe Documentation about why we subset the schema to this blob here.
{}
- action: ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...although, thinking about it more. this does make it possible to do action: omit
and still set a target
(something the current enum prevents). maybe we could keep the enum and the only rename the matchSuccess to action
instead;
propertyRules:
# Common resource requirements
- action:
replace: ResourceRequirements
matchName:
- exact: ..
matchSchema:
# Big blob. Maybe Documentation about why we subset the schema to this blob here.
{}
- action: ...
but that gets rid of the helpful target
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, actually, I think this can possibly be solved with deserializing PropertyAction
enum from an adjacently tagged enum:
with:
/// The behavior of this rule if the type and name-directed match phases succeed.
#[serde(tag = "action", content = "target")]
pub action: PropertyAction,
to allow;
propertyRules:
# Common resource requirements
- action: replace
target: ResourceRequirements
sourceName:
- exact: ..
sourceSchema:
# Big blob. Maybe Documentation about why we subset the schema to this blob here.
{}
- action: omit
sourceName: {..}
Maybe a key question around doing this is what other actions can you possibly foresee?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally find these suggestions more confusing than the status quo; source
is the CRD sub-property we're trying to match, target
is the result of running the action
. With the use of adjacently tagged enums (which I'm not a fan of - for evolutionary purposes) it implies target
is what you're matching, so in your example somehow we're going to replace occurrences of ResourceRequirements
. 🤷🏽
action: { replacement: ResourceRequirements }
or a variation of seems less ambiguous here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fair. I see your point and don't have strong feelings towards it. Happy to leave it as is.
Yeah this is really cool & I'm happy to integrate this for as many CRDs as possible! |
@brendanhay you said this compliments #298, but it looks like it could lead towards solving it. I had intended to start working on #298, but had not yet started. Are you interested in positioning this effort towards a resolution of #298? If so, I would be supportive. |
@shaneutt This doesn’t introduce sharing of generated types (ie. those defined in a set of CRDs) - it introduces sharing with existing Rust types, by short-circuiting the generation of overriden types to reference existing types. This means if you invoke There are two complimentary levels of structural (type) sharing in play:
Hopefully that explanation makes sense - if you’re still happy to continue working on #298, go for it. Alternatively, I can also look into it once this PR has been refined or sent out into the sea. |
c795991
to
05e9b96
Compare
Signed-off-by: Brendan Hay <[email protected]>
05e9b96
to
9ca39cf
Compare
Yes, I understood that this doesn't actually solve #298 currently.
If you want to take #298 I'm inclined to defer to you. If you don't actually want it, no sweat. Just seems like you're on a roll, so if you want/need it, take it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last go over. minor comment on testability
String, | ||
} | ||
|
||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have some unit tests here that is able to actually perform replacements. of course we could integration test this, but i feel it might not be too bad if we did what the analyzer unit tests are doing on a subset schema, and passing in some test override struct. should be enough then to check that the structs were actually replaced.
propertyRules: | ||
# The action to perform if the type and name-directed matches below succeed. | ||
# `replace` is the `String` to use as the property's Rust type. This will prevent a Rust | ||
# `Container` being generated for the property, if matching succeeds. | ||
- matchSuccess: | ||
replace: ResourceRequirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fair. I see your point and don't have strong feelings towards it. Happy to leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests notwithstanding, I happy to not block this as it stands on its own. If you are able to get the DCO check satisfied I am happy to merge this as is and put some more unit tests as a follow-up 😄
Motivation
This is complimentary to #298 (not a replacement for), and based roughly on how code synthesis is overridden/configured for the Haskell AWS and Google SDKs. I'm throwing this out there more for people to take a look at, than any driving force requiring it to be merged. It's currently being used to solve an immediate need to refine certain 3rd party CRDs (like
strimzi.io
) which are borderline unusable currently, due to the lack of structural sharing.Currently,
kopium
generated code (in say, the above linkedkube-custom-resources-rs
crate) is difficult to work with since even though the types in the CRD are straight-forwardly from Kubernetes (ie.k8s-openapi
proper),kopium
will always generate a new nominal type per property. This results in foreign type issues, endless amounts of conversion code to/from core Kubernetes types, and just generally makes it more preferrable to useserde_json::json!({ ... })
instead.This PR provides a way to compose multiple configuration files (termed "overrides", here) authored in YAML, that define an arbitrary number of rules directing how types should be modified during
kopium
's CRD analysis and generation phases. Initially this only performs verbatim type replacement, but it could be extended to perform other modifications, if desired (I've since addedmatchSuccess: omit
to allow removal of properties entirely.). The primary goal being to maximize the aforementioned structural sharing with existing types and prevent proliferation of pointless new nominal types, when possible.Additionally, since you can specify multiple configuration files, layered sets of rules can be composed. For example, a general set of overrides for
k8s_openapi::api::core::v1::*
related types via--overrides overrides.core.yaml
that can be applied consistently to every CRD, with additional CRD/group-specific overrides such as--overrides overrides.strimzi.yaml
, if needed.Implementation
The analysis behavior is extended as follows:
JSONSchemaProps::properties
to determine if aContainer
should be extracted, first check if the propertyname
under analysis has a configured override rule, either via an exact match (efficiently) or regular expression (much less efficiently). This is name-directed.schema
to determine if the entire rule "matches". This is type-directed (or structural.)Container
is created for the current property.Container
is created.The configuration and rules format is documented with extensive examples in overrides.example.yaml.
The approach is a defunctionalized version of how conditions and object references are currently overridden, via user-authored extensible configuration:
Which can be expressed in configuration as:
Note: all Rust types in the examples are fully qualified for simplicity. Since it's just a replacement
String
, you could choose to integrate this with a custom prelude and--no-prelude
, etc.Performance
Every property is checked for the existence of applicable overrides, how "expensive" this is in practice depends on how the rules are written. If
matchName.*.exact
is used to scope a rule to an explicit property name, it's a straight-forwardO(1)
HashMap::get()
. Otherwise, it becomes quite a bit more complicated as the regular expression based name-directed rules, or type-directed rules with no name-directed constraints, require a linear scan of all rules per property, assuming no rule matches. In the testing I've done against CRDs vendored in thekube-custom-resources-rs
source repository, there has been inappreciable slowdown, but YMMV - a note/caveat about pathological cases would probably be wise.Having no overrides incurs zero performance cost, having few overrides incurs some overhead beyond the status quo; but in reality you could construct extremely pathological match expressions or large rule sets. In the latter case, you'd probably be better served by modifying a CRD by hand, anyway.
Example
Running
kopium
(frommain
) on some of the Kafka-related CRDs taken fromstrimzi.io
with no additional command-line arguments (ie. no comments are being generated):cargo run -- -f kafka.yaml > kafka.rs
ls -lh kafka.yaml kafka.rs 469K kafka.yaml 405K kafka.rs # The generated Rust code is nearly the same size as the CRD.
Which contains an estimated more than 60-70% of duplicated types, and "fun" 90's Java naming, such as the following (annotations elided):
With the changes in this branch/PR, using the somewhat naive
overrides.example.yaml
:ls -lh kafka.yaml kafka.rs kafka-with-overrides.rs 469K kafka.yaml 405K kafka.rs 107K kafka-with-overrides.rs # 73% reduction in size.
Yields:
Which re-uses many existing
k8s-openapi
types, but could be tailored further with some domain knowledge of the CRD you're overriding. For example passing an additional--overrides overrides.strimzi.yaml
containing only:Yields:
Demonstrating a single rule for a common type providing a further 22% reduction, which should further motivate #298.