Support overrides to configure code analysis and generation to maximize Rust type reuse #327

brendanhay · 2025-04-06T07:49:44Z

Motivation

This is complimentary to #298 (not a replacement for), and based roughly on how code synthesis is overridden/configured for the Haskell AWS and Google SDKs. I'm throwing this out there more for people to take a look at, than any driving force requiring it to be merged. It's currently being used to solve an immediate need to refine certain 3rd party CRDs (like strimzi.io) which are borderline unusable currently, due to the lack of structural sharing.

Currently, kopium generated code (in say, the above linked kube-custom-resources-rs crate) is difficult to work with since even though the types in the CRD are straight-forwardly from Kubernetes (ie. k8s-openapi proper), kopium will always generate a new nominal type per property. This results in foreign type issues, endless amounts of conversion code to/from core Kubernetes types, and just generally makes it more preferrable to use serde_json::json!({ ... }) instead.

This PR provides a way to compose multiple configuration files (termed "overrides", here) authored in YAML, that define an arbitrary number of rules directing how types should be modified during kopium's CRD analysis and generation phases. Initially this only performs verbatim type replacement, but it could be extended to perform other modifications, if desired (I've since added matchSuccess: omit to allow removal of properties entirely.). The primary goal being to maximize the aforementioned structural sharing with existing types and prevent proliferation of pointless new nominal types, when possible.

Additionally, since you can specify multiple configuration files, layered sets of rules can be composed. For example, a general set of overrides for k8s_openapi::api::core::v1::* related types via --overrides overrides.core.yaml that can be applied consistently to every CRD, with additional CRD/group-specific overrides such as --overrides overrides.strimzi.yaml, if needed.

Implementation

The analysis behavior is extended as follows:

When descending into JSONSchemaProps::properties to determine if a Container should be extracted, first check if the property name under analysis has a configured override rule, either via an exact match (efficiently) or regular expression (much less efficiently). This is name-directed.
If an override rule is found, the rules within are matched against the property's schema to determine if the entire rule "matches". This is type-directed (or structural.)
If a match is successful, a replacement type is returned and no Container is created for the current property.
If no override was present in 1. or no match succeeds in 2. or 3. then the previous/existing behavior applies, and a new Container is created.

The configuration and rules format is documented with extensive examples in overrides.example.yaml.

The approach is a defunctionalized version of how conditions and object references are currently overridden, via user-authored extensible configuration:

fn is_object_ref(value: &JSONSchemaProps) -> bool {
    if let Some(p) = &value.properties {
        if p.len() != 7 {
            return false;
        }
        let api_version = p.get("apiVersion");
        let field_path = p.get("fieldPath");
        let kind = p.get("kind");
        let name = p.get("name");
        let ns = p.get("namespace");
        let rv = p.get("resourceVersion");
        let uid = p.get("uid");
        if [api_version, field_path, kind, name, ns, rv, uid]
            .iter()
            .all(|k| k.is_some())
        {
            return true;
        }
    }
    false
}

Which can be expressed in configuration as:

propertyRules:
  - matchSuccess:
      replace: k8s_openapi::api::core::v1::ObjectReference
    matchSchema:
      exhaustive:
        type: object
        properties:
          apiVersion:
            type: string
          fieldPath:
            type: string
          kind:
            type: string
          name:
            type: string
          namespace:
            type: string
          resourceVersion:
            type: string
          uid:
            type: string

Note: all Rust types in the examples are fully qualified for simplicity. Since it's just a replacement String, you could choose to integrate this with a custom prelude and --no-prelude, etc.

Performance

Every property is checked for the existence of applicable overrides, how "expensive" this is in practice depends on how the rules are written. If matchName.*.exact is used to scope a rule to an explicit property name, it's a straight-forward O(1) HashMap::get(). Otherwise, it becomes quite a bit more complicated as the regular expression based name-directed rules, or type-directed rules with no name-directed constraints, require a linear scan of all rules per property, assuming no rule matches. In the testing I've done against CRDs vendored in the kube-custom-resources-rs source repository, there has been inappreciable slowdown, but YMMV - a note/caveat about pathological cases would probably be wise.

Having no overrides incurs zero performance cost, having few overrides incurs some overhead beyond the status quo; but in reality you could construct extremely pathological match expressions or large rule sets. In the latter case, you'd probably be better served by modifying a CRD by hand, anyway.

Example

Running kopium (from main) on some of the Kafka-related CRDs taken from strimzi.io with no additional command-line arguments (ie. no comments are being generated):

curl -sL https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/refs/heads/main/install/cluster-operator/040-
Crd-kafka.yaml -o kafka.yaml

cargo run -- -f kafka.yaml > kafka.rs

ls -lh kafka.yaml kafka.rs
469K kafka.yaml
405K kafka.rs # The generated Rust code is nearly the same size as the CRD.

wc -l kafka.rs
7791

Which contains an estimated more than 60-70% of duplicated types, and "fun" 90's Java naming, such as the following (annotations elided):

pub struct KafkaCruiseControlTemplatePod {
    pub affinity: Option<KafkaCruiseControlTemplatePodAffinity>,
    pub enable_service_links: Option<bool>,
    pub host_aliases: Option<Vec<KafkaCruiseControlTemplatePodHostAliases>>,
    pub image_pull_secrets: Option<Vec<KafkaCruiseControlTemplatePodImagePullSecrets>>,
    pub metadata: Option<KafkaCruiseControlTemplatePodMetadata>,
    pub priority_class_name: Option<String>,
    pub scheduler_name: Option<String>,
    pub security_context: Option<KafkaCruiseControlTemplatePodSecurityContext>,
    pub termination_grace_period_seconds: Option<i64>,
    pub tmp_dir_size_limit: Option<String>,
    pub tolerations: Option<Vec<KafkaCruiseControlTemplatePodTolerations>>,
    pub topology_spread_constraints: Option<Vec<KafkaCruiseControlTemplatePodTopologySpreadConstraints>>,
    pub volumes: Option<Vec<KafkaCruiseControlTemplatePodVolumes>>,
}

#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct KafkaCruiseControlTemplateCruiseControlContainer {
    pub env: Option<Vec<KafkaCruiseControlTemplateCruiseControlContainerEnv>>,
    pub security_context: Option<KafkaCruiseControlTemplateCruiseControlContainerSecurityContext>,
    pub volume_mounts: Option<Vec<KafkaCruiseControlTemplateCruiseControlContainerVolumeMounts>>,
}

// ...

pub struct KafkaCruiseControlTemplatePodAffinityPodAntiAffinityPreferredDuringSchedulingIgnoredDuringExecutionPodAffinityTerm {
    pub label_selector: Option<KafkaCruiseControlTemplatePodAffinityPodAntiAffinityPreferredDuringSchedulingIgnoredDuringExecutionPodAffinityTermLabelSelector>,
    pub match_label_keys: Option<Vec<String>>,
    pub mismatch_label_keys: Option<Vec<String>>,
    pub namespace_selector: Option<KafkaCruiseControlTemplatePodAffinityPodAntiAffinityPreferredDuringSchedulingIgnoredDuringExecutionPodAffinityTermNamespaceSelector>,
    pub namespaces: Option<Vec<String>>,
    pub topology_key: Option<String>,
}

With the changes in this branch/PR, using the somewhat naive overrides.example.yaml:

cargo run -- -f kafka.yaml --overrides overrides.example.yaml > kafka-with-overrides.rs

ls -lh kafka.yaml kafka.rs kafka-with-overrides.rs
469K kafka.yaml
405K kafka.rs
107K kafka-with-overrides.rs # 73% reduction in size.

wc -l kafka-with-overrides.rs
2298

Yields:

pub struct KafkaCruiseControlTemplatePod {
    pub affinity: Option<k8s_openapi::api::core::v1::Affinity>,
    pub enable_service_links: Option<bool>,
    pub host_aliases: Option<Vec<k8s_openapi::api::core::v1::HostAlias>>,
    pub image_pull_secrets: Option<Vec<k8s_openapi::api::core::v1::LocalObjectReference>>,
    pub metadata: Option<MyKafakMetadataTemplate>,
    pub priority_class_name: Option<String>,
    pub scheduler_name: Option<String>,
    pub security_context: Option<k8s_openapi::api::core::v1::PodSecurityContext>,
    pub termination_grace_period_seconds: Option<i64>,
    pub tmp_dir_size_limit: Option<String>,
    pub tolerations: Option<Vec<k8s_openapi::api::core::v1::Toleration>>,
    pub topology_spread_constraints: Option<k8s_openapi::api::core::v1::TopologySpreadConstraint>,
    pub volumes: Option<Vec<k8s_openapi::api::core::v1::Volume>>,
}

pub struct KafkaCruiseControlTemplateCruiseControlContainer {
    pub env: Option<Vec<k8s_openapi::api::core::v1::EnvVar>>,
    pub security_context: Option<k8s_openapi::api::core::v1::SecurityContext>,
    pub volume_mounts: Option<Vec<k8s_openapi::api::core::v1::VolumeMount>>,
}

Which re-uses many existing k8s-openapi types, but could be tailored further with some domain knowledge of the CRD you're overriding. For example passing an additional --overrides overrides.strimzi.yaml containing only:

propertyRules:
  - matchSuccess:
      replace: MetadataTemplate
    matchAnyName:
      - exact: metadata
    matchSchema:
      exhaustive:
        type: object
        properties:
          labels:
            additionalProperties:
              type: string
            type: object
          annotations:
            additionalProperties:
              type: string
            type: object

Yields:

ls -lh
83K kafka-with-overrides.rs

wc -l kafka-with-overrides.rs
1792

Demonstrating a single rule for a common type providing a further 22% reduction, which should further motivate #298.

Signed-off-by: Brendan Hay <[email protected]>

clux

Thanks a lot for this very detailed and rigorous extension. I'm very positive on this in general, because the motivating example is similar (while admittedly more java-extreme) to what I am facing in other crds, and this provides a clean method for codifying replacements that could be upstreamed to generator crates such as kube-custom-resources-rs or gateway-api-rs (who historically have had to do much worse things).

This is right up there around my threshold of how much complexity to accept in PRs, but you've managed to do it with minimal changes to the other files and do it in a basically orthogonal module which in my opinion makes it worth it.

I'm going to need a little time to digest the source changes in overrides.rs, so have only given some documentation notes and interface thoughts on the (very detailed 👍 ) example config so far. Particularly there are some naming and nesting issues that perhaps makes it a little harder to grasp. LMK what you think. 🙏

overrides.example.yaml

clux · 2025-04-08T15:14:09Z

overrides.example.yaml

+propertyRules:
+    # The action to perform if the type and name-directed matches below succeed.
+    # `replace` is the `String` to use as the property's Rust type. This will prevent a Rust
+    # `Container` being generated for the property, if matching succeeds.
+  - matchSuccess:
+      replace: ResourceRequirements


This propertyRules name feels a little unclear (and forces me to read the entire doc about what it does before I get it). maybe this should contain a more explicit verb about the action it is doing; e.g. it's mostly a vector of rules for replacing types. Maybe it should be called something like replaceRules? Not sure about this, because it can also omit.

The closest thing to this type of rule in the Kubernetes world that comes to mind is prometheus' relabelConfig which has some nice terminology that is kind of internalized for me. Maybe it's worth considering re-using parts of that, e.g. maybe action over matchSuccess and source over matchName, and target etc?

They're rules, that only apply to properties. 😃 There's potentially other AST rewrites that can be performed beyond properties, (ie. on Rust types, rather than CRD types) and I introduced the top-level propertyRules to try and leave room in the format for non-property related configuration. I'm unsure on alternative names, as you've noted replaceRules isn't precise since you can do more than just replace. Maybe rewriteRules to confuse ingress-nginx users?

I have no issue with renaming match* if you're settled on those you suggested, but I did find with terse names the rule structure became harder to parse when you have a large configuration with many CRD objects, as those tend to reuse names like action|source|target a lot.

Good argument against replaceRules. Agreed.

rewriteRules is better, but feels imprecise as well. Parts of me wants to do something like relabelRules except what we have is not rules that apply to "labels" but container names, and renameRules doesn't really convey what we are renaming.

Tbh, I hadn't actually considered that when you say the "rules apply to properties", you were referring to the literal properties in the schema, and not an abstract name for "stuff we are renaming". maybe propertyRules is fine. 😄

I have no issue with renaming match* if you're settled on those you suggested, but I did find with terse names the rule structure became harder to parse when you have a large configuration with many CRD objects, as those tend to reuse names like action|source|target a lot.

That feels more like a structuring + convention problem to me. If we always conventionally do "the schema to match against" at the bottom (possibly with wrapped comments), then I don't think we need to guard quite this hard against confusion of people "mixing up our schema with theirs".

I also think the explicit action separated from matchSuccess makes it clearer to see what the rule is intended to do, plus de-indents a bit.

propertyRules: # Common resource requirements - action: replace target: ResourceRequirements sourceName: - exact: .. sourceSchema: # Big blob. Maybe Documentation about why we subset the schema to this blob here. {} - action: ...

...although, thinking about it more. this does make it possible to do action: omit and still set a target (something the current enum prevents). maybe we could keep the enum and the only rename the matchSuccess to action instead;

propertyRules: # Common resource requirements - action: replace: ResourceRequirements matchName: - exact: .. matchSchema: # Big blob. Maybe Documentation about why we subset the schema to this blob here. {} - action: ...

but that gets rid of the helpful target..

Ah, actually, I think this can possibly be solved with deserializing PropertyAction enum from an adjacently tagged enum:

with:

/// The behavior of this rule if the type and name-directed match phases succeed. #[serde(tag = "action", content = "target")] pub action: PropertyAction,

to allow;

propertyRules: # Common resource requirements - action: replace target: ResourceRequirements sourceName: - exact: .. sourceSchema: # Big blob. Maybe Documentation about why we subset the schema to this blob here. {} - action: omit sourceName: {..}

Maybe a key question around doing this is what other actions can you possibly foresee?

I personally find these suggestions more confusing than the status quo; source is the CRD sub-property we're trying to match, target is the result of running the action. With the use of adjacently tagged enums (which I'm not a fan of - for evolutionary purposes) it implies target is what you're matching, so in your example somehow we're going to replace occurrences of ResourceRequirements. 🤷🏽

action: { replacement: ResourceRequirements } or a variation of seems less ambiguous here.

That is fair. I see your point and don't have strong feelings towards it. Happy to leave it as is.

clux · 2025-04-08T15:23:18Z

cc @shaneutt who might be working on #298 separately
cc @sebhoss who might receive PRs for this type of configuration in the future.

sebhoss · 2025-04-08T15:27:01Z

Yeah this is really cool & I'm happy to integrate this for as many CRDs as possible!

shaneutt · 2025-04-08T15:33:26Z

@brendanhay you said this compliments #298, but it looks like it could lead towards solving it. I had intended to start working on #298, but had not yet started. Are you interested in positioning this effort towards a resolution of #298? If so, I would be supportive.

brendanhay · 2025-04-08T20:00:02Z

@shaneutt This doesn’t introduce sharing of generated types (ie. those defined in a set of CRDs) - it introduces sharing with existing Rust types, by short-circuiting the generation of overriden types to reference existing types.

This means if you invoke kopium on 3x different CRDs, you’d need to write your proposed shared_types.rs by hand - as well as overrides.yaml rules to then rereference those 3x separate outputs to use the shared types.

There are two complimentary levels of structural (type) sharing in play:

Reuse of existing types defined outside the input CRD - overrides, much like the existing ObjectReference handling, address this.
Reuse of generated types defined within a set of input CRDs - which is what Add a brute force option to re-use identical structs #298 addresses.

Hopefully that explanation makes sense - if you’re still happy to continue working on #298, go for it. Alternatively, I can also look into it once this PR has been refined or sent out into the sea.

Signed-off-by: Brendan Hay <[email protected]>

shaneutt · 2025-04-09T12:59:16Z

@shaneutt This doesn’t introduce sharing of generated types (ie. those defined in a set of CRDs) - it introduces sharing with existing Rust types, by short-circuiting the generation of overriden types to reference existing types.

Yes, I understood that this doesn't actually solve #298 currently.

Hopefully that explanation makes sense - if you’re still happy to continue working on #298, go for it. Alternatively, I can also look into it once this PR has been refined or sent out into the sea.

If you want to take #298 I'm inclined to defer to you. If you don't actually want it, no sweat. Just seems like you're on a roll, so if you want/need it, take it.

clux

last go over. minor comment on testability

clux · 2025-04-09T14:41:30Z

src/overrides.rs

+    String,
+}
+
+#[cfg(test)]


It would be good to have some unit tests here that is able to actually perform replacements. of course we could integration test this, but i feel it might not be too bad if we did what the analyzer unit tests are doing on a subset schema, and passing in some test override struct. should be enough then to check that the structs were actually replaced.

clux · 2025-04-29T00:02:49Z

overrides.example.yaml

+propertyRules:
+    # The action to perform if the type and name-directed matches below succeed.
+    # `replace` is the `String` to use as the property's Rust type. This will prevent a Rust
+    # `Container` being generated for the property, if matching succeeds.
+  - matchSuccess:
+      replace: ResourceRequirements


That is fair. I see your point and don't have strong feelings towards it. Happy to leave it as is.

clux

Tests notwithstanding, I happy to not block this as it stands on its own. If you are able to get the DCO check satisfied I am happy to merge this as is and put some more unit tests as a follow-up 😄

brendanhay force-pushed the bh/override-configuration branch 7 times, most recently from 6f884e3 to 6e03c85 Compare April 6, 2025 23:02

brendanhay marked this pull request as draft April 7, 2025 01:31

brendanhay force-pushed the bh/override-configuration branch 4 times, most recently from ff98c47 to d17842f Compare April 7, 2025 04:02

Add support for overriding assigned/generated types

082c354

Signed-off-by: Brendan Hay <[email protected]>

brendanhay force-pushed the bh/override-configuration branch from d17842f to 082c354 Compare April 7, 2025 04:02

brendanhay marked this pull request as ready for review April 8, 2025 04:23

clux reviewed Apr 8, 2025

View reviewed changes

brendanhay force-pushed the bh/override-configuration branch from c795991 to 05e9b96 Compare April 8, 2025 20:45

Address some documentation errors

9ca39cf

Signed-off-by: Brendan Hay <[email protected]>

brendanhay force-pushed the bh/override-configuration branch from 05e9b96 to 9ca39cf Compare April 8, 2025 20:46

shaneutt mentioned this pull request Apr 19, 2025

improvement(dev-ex): compatible/identical types should be aliased and reused where possible kube-rs/gateway-api-rs#38

Open

brendanhay and others added 2 commits April 24, 2025 11:46

Updating overrides.example.yaml wording

093f38c

Merge branch 'main' into bh/override-configuration

e8f4847

clux reviewed Apr 29, 2025

View reviewed changes

clux approved these changes May 8, 2025

View reviewed changes

Uh oh!

Support overrides to configure code analysis and generation to maximize Rust type reuse #327

Are you sure you want to change the base?

Support overrides to configure code analysis and generation to maximize Rust type reuse #327

Uh oh!

Conversation

brendanhay commented Apr 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Implementation

Performance

Example

Uh oh!

clux left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

clux Apr 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brendanhay Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

clux Apr 9, 2025

Choose a reason for hiding this comment

Uh oh!

clux Apr 9, 2025

Choose a reason for hiding this comment

Uh oh!

clux Apr 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clux Apr 9, 2025

Choose a reason for hiding this comment

Uh oh!

brendanhay Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clux Apr 29, 2025

Choose a reason for hiding this comment

Uh oh!

clux commented Apr 8, 2025

Uh oh!

sebhoss commented Apr 8, 2025

Uh oh!

shaneutt commented Apr 8, 2025

Uh oh!

brendanhay commented Apr 8, 2025

Uh oh!

shaneutt commented Apr 9, 2025

Uh oh!

clux left a comment

Choose a reason for hiding this comment

Uh oh!

clux Apr 9, 2025

Choose a reason for hiding this comment

Uh oh!

clux Apr 29, 2025

Choose a reason for hiding this comment

Uh oh!

clux left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

brendanhay commented Apr 6, 2025 •

edited

Loading

clux Apr 8, 2025 •

edited

Loading

clux Apr 9, 2025 •

edited

Loading

brendanhay Apr 23, 2025 •

edited

Loading