Skip to content

Support overrides to configure code analysis and generation to maximize Rust type reuse #327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

brendanhay
Copy link

@brendanhay brendanhay commented Apr 6, 2025

Motivation

This is complimentary to #298 (not a replacement for), and based roughly on how code synthesis is overridden/configured for the Haskell AWS and Google SDKs. I'm throwing this out there more for people to take a look at, than any driving force requiring it to be merged. It's currently being used to solve an immediate need to refine certain 3rd party CRDs (like strimzi.io) which are borderline unusable currently, due to the lack of structural sharing.

Currently, kopium generated code (in say, the above linked kube-custom-resources-rs crate) is difficult to work with since even though the types in the CRD are straight-forwardly from Kubernetes (ie. k8s-openapi proper), kopium will always generate a new nominal type per property. This results in foreign type issues, endless amounts of conversion code to/from core Kubernetes types, and just generally makes it more preferrable to use serde_json::json!({ ... }) instead.

This PR provides a way to compose multiple configuration files (termed "overrides", here) authored in YAML, that define an arbitrary number of rules directing how types should be modified during kopium's CRD analysis and generation phases. Initially this only performs verbatim type replacement, but it could be extended to perform other modifications, if desired (I've since added matchSuccess: omit to allow removal of properties entirely.). The primary goal being to maximize the aforementioned structural sharing with existing types and prevent proliferation of pointless new nominal types, when possible.

Additionally, since you can specify multiple configuration files, layered sets of rules can be composed. For example, a general set of overrides for k8s_openapi::api::core::v1::* related types via --overrides overrides.core.yaml that can be applied consistently to every CRD, with additional CRD/group-specific overrides such as --overrides overrides.strimzi.yaml, if needed.

Implementation

The analysis behavior is extended as follows:

  1. When descending into JSONSchemaProps::properties to determine if a Container should be extracted, first check if the property name under analysis has a configured override rule, either via an exact match (efficiently) or regular expression (much less efficiently). This is name-directed.
  2. If an override rule is found, the rules within are matched against the property's schema to determine if the entire rule "matches". This is type-directed (or structural.)
  3. If a match is successful, a replacement type is returned and no Container is created for the current property.
  4. If no override was present in 1. or no match succeeds in 2. or 3. then the previous/existing behavior applies, and a new Container is created.

The configuration and rules format is documented with extensive examples in overrides.example.yaml.

The approach is a defunctionalized version of how conditions and object references are currently overridden, via user-authored extensible configuration:

fn is_object_ref(value: &JSONSchemaProps) -> bool {
    if let Some(p) = &value.properties {
        if p.len() != 7 {
            return false;
        }
        let api_version = p.get("apiVersion");
        let field_path = p.get("fieldPath");
        let kind = p.get("kind");
        let name = p.get("name");
        let ns = p.get("namespace");
        let rv = p.get("resourceVersion");
        let uid = p.get("uid");
        if [api_version, field_path, kind, name, ns, rv, uid]
            .iter()
            .all(|k| k.is_some())
        {
            return true;
        }
    }
    false
}

Which can be expressed in configuration as:

propertyRules:
  - matchSuccess:
      replace: k8s_openapi::api::core::v1::ObjectReference
    matchSchema:
      exhaustive:
        type: object
        properties:
          apiVersion:
            type: string
          fieldPath:
            type: string
          kind:
            type: string
          name:
            type: string
          namespace:
            type: string
          resourceVersion:
            type: string
          uid:
            type: string

Note: all Rust types in the examples are fully qualified for simplicity. Since it's just a replacement String, you could choose to integrate this with a custom prelude and --no-prelude, etc.

Performance

Every property is checked for the existence of applicable overrides, how "expensive" this is in practice depends on how the rules are written. If matchName.*.exact is used to scope a rule to an explicit property name, it's a straight-forward O(1) HashMap::get(). Otherwise, it becomes quite a bit more complicated as the regular expression based name-directed rules, or type-directed rules with no name-directed constraints, require a linear scan of all rules per property, assuming no rule matches. In the testing I've done against CRDs vendored in the kube-custom-resources-rs source repository, there has been inappreciable slowdown, but YMMV - a note/caveat about pathological cases would probably be wise.

Having no overrides incurs zero performance cost, having few overrides incurs some overhead beyond the status quo; but in reality you could construct extremely pathological match expressions or large rule sets. In the latter case, you'd probably be better served by modifying a CRD by hand, anyway.

Example

Running kopium (from main) on some of the Kafka-related CRDs taken from strimzi.io with no additional command-line arguments (ie. no comments are being generated):

curl -sL https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/refs/heads/main/install/cluster-operator/040-
Crd-kafka.yaml -o kafka.yaml
cargo run -- -f kafka.yaml > kafka.rs
ls -lh kafka.yaml kafka.rs
469K kafka.yaml
405K kafka.rs # The generated Rust code is nearly the same size as the CRD.
wc -l kafka.rs
7791

Which contains an estimated more than 60-70% of duplicated types, and "fun" 90's Java naming, such as the following (annotations elided):

pub struct KafkaCruiseControlTemplatePod {
    pub affinity: Option<KafkaCruiseControlTemplatePodAffinity>,
    pub enable_service_links: Option<bool>,
    pub host_aliases: Option<Vec<KafkaCruiseControlTemplatePodHostAliases>>,
    pub image_pull_secrets: Option<Vec<KafkaCruiseControlTemplatePodImagePullSecrets>>,
    pub metadata: Option<KafkaCruiseControlTemplatePodMetadata>,
    pub priority_class_name: Option<String>,
    pub scheduler_name: Option<String>,
    pub security_context: Option<KafkaCruiseControlTemplatePodSecurityContext>,
    pub termination_grace_period_seconds: Option<i64>,
    pub tmp_dir_size_limit: Option<String>,
    pub tolerations: Option<Vec<KafkaCruiseControlTemplatePodTolerations>>,
    pub topology_spread_constraints: Option<Vec<KafkaCruiseControlTemplatePodTopologySpreadConstraints>>,
    pub volumes: Option<Vec<KafkaCruiseControlTemplatePodVolumes>>,
}

#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct KafkaCruiseControlTemplateCruiseControlContainer {
    pub env: Option<Vec<KafkaCruiseControlTemplateCruiseControlContainerEnv>>,
    pub security_context: Option<KafkaCruiseControlTemplateCruiseControlContainerSecurityContext>,
    pub volume_mounts: Option<Vec<KafkaCruiseControlTemplateCruiseControlContainerVolumeMounts>>,
}

// ...

pub struct KafkaCruiseControlTemplatePodAffinityPodAntiAffinityPreferredDuringSchedulingIgnoredDuringExecutionPodAffinityTerm {
    pub label_selector: Option<KafkaCruiseControlTemplatePodAffinityPodAntiAffinityPreferredDuringSchedulingIgnoredDuringExecutionPodAffinityTermLabelSelector>,
    pub match_label_keys: Option<Vec<String>>,
    pub mismatch_label_keys: Option<Vec<String>>,
    pub namespace_selector: Option<KafkaCruiseControlTemplatePodAffinityPodAntiAffinityPreferredDuringSchedulingIgnoredDuringExecutionPodAffinityTermNamespaceSelector>,
    pub namespaces: Option<Vec<String>>,
    pub topology_key: Option<String>,
}

With the changes in this branch/PR, using the somewhat naive overrides.example.yaml:

cargo run -- -f kafka.yaml --overrides overrides.example.yaml > kafka-with-overrides.rs
ls -lh kafka.yaml kafka.rs kafka-with-overrides.rs
469K kafka.yaml
405K kafka.rs
107K kafka-with-overrides.rs # 73% reduction in size.
wc -l kafka-with-overrides.rs
2298

Yields:

pub struct KafkaCruiseControlTemplatePod {
    pub affinity: Option<k8s_openapi::api::core::v1::Affinity>,
    pub enable_service_links: Option<bool>,
    pub host_aliases: Option<Vec<k8s_openapi::api::core::v1::HostAlias>>,
    pub image_pull_secrets: Option<Vec<k8s_openapi::api::core::v1::LocalObjectReference>>,
    pub metadata: Option<MyKafakMetadataTemplate>,
    pub priority_class_name: Option<String>,
    pub scheduler_name: Option<String>,
    pub security_context: Option<k8s_openapi::api::core::v1::PodSecurityContext>,
    pub termination_grace_period_seconds: Option<i64>,
    pub tmp_dir_size_limit: Option<String>,
    pub tolerations: Option<Vec<k8s_openapi::api::core::v1::Toleration>>,
    pub topology_spread_constraints: Option<k8s_openapi::api::core::v1::TopologySpreadConstraint>,
    pub volumes: Option<Vec<k8s_openapi::api::core::v1::Volume>>,
}

pub struct KafkaCruiseControlTemplateCruiseControlContainer {
    pub env: Option<Vec<k8s_openapi::api::core::v1::EnvVar>>,
    pub security_context: Option<k8s_openapi::api::core::v1::SecurityContext>,
    pub volume_mounts: Option<Vec<k8s_openapi::api::core::v1::VolumeMount>>,
}

Which re-uses many existing k8s-openapi types, but could be tailored further with some domain knowledge of the CRD you're overriding. For example passing an additional --overrides overrides.strimzi.yaml containing only:

propertyRules:
  - matchSuccess:
      replace: MetadataTemplate
    matchAnyName:
      - exact: metadata
    matchSchema:
      exhaustive:
        type: object
        properties:
          labels:
            additionalProperties:
              type: string
            type: object
          annotations:
            additionalProperties:
              type: string
            type: object

Yields:

ls -lh
83K kafka-with-overrides.rs
wc -l kafka-with-overrides.rs
1792

Demonstrating a single rule for a common type providing a further 22% reduction, which should further motivate #298.

@brendanhay brendanhay force-pushed the bh/override-configuration branch 7 times, most recently from 6f884e3 to 6e03c85 Compare April 6, 2025 23:02
@brendanhay brendanhay marked this pull request as draft April 7, 2025 01:31
@brendanhay brendanhay force-pushed the bh/override-configuration branch 4 times, most recently from ff98c47 to d17842f Compare April 7, 2025 04:02
@brendanhay brendanhay force-pushed the bh/override-configuration branch from d17842f to 082c354 Compare April 7, 2025 04:02
@brendanhay brendanhay marked this pull request as ready for review April 8, 2025 04:23
Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this very detailed and rigorous extension. I'm very positive on this in general, because the motivating example is similar (while admittedly more java-extreme) to what I am facing in other crds, and this provides a clean method for codifying replacements that could be upstreamed to generator crates such as kube-custom-resources-rs or gateway-api-rs (who historically have had to do much worse things).

This is right up there around my threshold of how much complexity to accept in PRs, but you've managed to do it with minimal changes to the other files and do it in a basically orthogonal module which in my opinion makes it worth it.

I'm going to need a little time to digest the source changes in overrides.rs, so have only given some documentation notes and interface thoughts on the (very detailed 👍 ) example config so far. Particularly there are some naming and nesting issues that perhaps makes it a little harder to grasp. LMK what you think. 🙏

Comment on lines +9 to +14
propertyRules:
# The action to perform if the type and name-directed matches below succeed.
# `replace` is the `String` to use as the property's Rust type. This will prevent a Rust
# `Container` being generated for the property, if matching succeeds.
- matchSuccess:
replace: ResourceRequirements
Copy link
Member

@clux clux Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This propertyRules name feels a little unclear (and forces me to read the entire doc about what it does before I get it). maybe this should contain a more explicit verb about the action it is doing; e.g. it's mostly a vector of rules for replacing types. Maybe it should be called something like replaceRules? Not sure about this, because it can also omit.

The closest thing to this type of rule in the Kubernetes world that comes to mind is prometheus' relabelConfig which has some nice terminology that is kind of internalized for me. Maybe it's worth considering re-using parts of that, e.g. maybe action over matchSuccess and source over matchName, and target etc?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're rules, that only apply to properties. 😃 There's potentially other AST rewrites that can be performed beyond properties, (ie. on Rust types, rather than CRD types) and I introduced the top-level propertyRules to try and leave room in the format for non-property related configuration. I'm unsure on alternative names, as you've noted replaceRules isn't precise since you can do more than just replace. Maybe rewriteRules to confuse ingress-nginx users?

I have no issue with renaming match* if you're settled on those you suggested, but I did find with terse names the rule structure became harder to parse when you have a large configuration with many CRD objects, as those tend to reuse names like action|source|target a lot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good argument against replaceRules. Agreed.

rewriteRules is better, but feels imprecise as well. Parts of me wants to do something like relabelRules except what we have is not rules that apply to "labels" but container names, and renameRules doesn't really convey what we are renaming.

Tbh, I hadn't actually considered that when you say the "rules apply to properties", you were referring to the literal properties in the schema, and not an abstract name for "stuff we are renaming". maybe propertyRules is fine. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no issue with renaming match* if you're settled on those you suggested, but I did find with terse names the rule structure became harder to parse when you have a large configuration with many CRD objects, as those tend to reuse names like action|source|target a lot.

That feels more like a structuring + convention problem to me. If we always conventionally do "the schema to match against" at the bottom (possibly with wrapped comments), then I don't think we need to guard quite this hard against confusion of people "mixing up our schema with theirs".

I also think the explicit action separated from matchSuccess makes it clearer to see what the rule is intended to do, plus de-indents a bit.

propertyRules:
  # Common resource requirements
  - action: replace
    target: ResourceRequirements
    sourceName:
      - exact: ..
    sourceSchema:
    # Big blob. Maybe Documentation about why we subset the schema to this blob here.
    {}

 - action: ...

Copy link
Member

@clux clux Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...although, thinking about it more. this does make it possible to do action: omit and still set a target (something the current enum prevents). maybe we could keep the enum and the only rename the matchSuccess to action instead;

propertyRules:
  # Common resource requirements
  - action:
      replace: ResourceRequirements
    matchName:
      - exact: ..
    matchSchema:
    # Big blob. Maybe Documentation about why we subset the schema to this blob here.
    {}

 - action: ...

but that gets rid of the helpful target..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, actually, I think this can possibly be solved with deserializing PropertyAction enum from an adjacently tagged enum:

with:

    /// The behavior of this rule if the type and name-directed match phases succeed.
    #[serde(tag = "action", content = "target")]
    pub action: PropertyAction,

to allow;

propertyRules:
  # Common resource requirements
  - action: replace
    target: ResourceRequirements
    sourceName:
      - exact: ..
    sourceSchema:
    # Big blob. Maybe Documentation about why we subset the schema to this blob here.
    {}

  - action: omit
    sourceName: {..}

Maybe a key question around doing this is what other actions can you possibly foresee?

Copy link
Author

@brendanhay brendanhay Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find these suggestions more confusing than the status quo; source is the CRD sub-property we're trying to match, target is the result of running the action. With the use of adjacently tagged enums (which I'm not a fan of - for evolutionary purposes) it implies target is what you're matching, so in your example somehow we're going to replace occurrences of ResourceRequirements. 🤷🏽

action: { replacement: ResourceRequirements } or a variation of seems less ambiguous here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fair. I see your point and don't have strong feelings towards it. Happy to leave it as is.

@clux
Copy link
Member

clux commented Apr 8, 2025

cc @shaneutt who might be working on #298 separately
cc @sebhoss who might receive PRs for this type of configuration in the future.

@sebhoss
Copy link
Contributor

sebhoss commented Apr 8, 2025

Yeah this is really cool & I'm happy to integrate this for as many CRDs as possible!

@shaneutt
Copy link
Member

shaneutt commented Apr 8, 2025

@brendanhay you said this compliments #298, but it looks like it could lead towards solving it. I had intended to start working on #298, but had not yet started. Are you interested in positioning this effort towards a resolution of #298? If so, I would be supportive.

@brendanhay
Copy link
Author

@shaneutt This doesn’t introduce sharing of generated types (ie. those defined in a set of CRDs) - it introduces sharing with existing Rust types, by short-circuiting the generation of overriden types to reference existing types.

This means if you invoke kopium on 3x different CRDs, you’d need to write your proposed shared_types.rs by hand - as well as overrides.yaml rules to then rereference those 3x separate outputs to use the shared types.

There are two complimentary levels of structural (type) sharing in play:

  1. Reuse of existing types defined outside the input CRD - overrides, much like the existing ObjectReference handling, address this.
  2. Reuse of generated types defined within a set of input CRDs - which is what Add a brute force option to re-use identical structs #298 addresses.

Hopefully that explanation makes sense - if you’re still happy to continue working on #298, go for it. Alternatively, I can also look into it once this PR has been refined or sent out into the sea.

@brendanhay brendanhay force-pushed the bh/override-configuration branch from c795991 to 05e9b96 Compare April 8, 2025 20:45
@brendanhay brendanhay force-pushed the bh/override-configuration branch from 05e9b96 to 9ca39cf Compare April 8, 2025 20:46
@shaneutt
Copy link
Member

shaneutt commented Apr 9, 2025

@shaneutt This doesn’t introduce sharing of generated types (ie. those defined in a set of CRDs) - it introduces sharing with existing Rust types, by short-circuiting the generation of overriden types to reference existing types.

Yes, I understood that this doesn't actually solve #298 currently.

Hopefully that explanation makes sense - if you’re still happy to continue working on #298, go for it. Alternatively, I can also look into it once this PR has been refined or sent out into the sea.

If you want to take #298 I'm inclined to defer to you. If you don't actually want it, no sweat. Just seems like you're on a roll, so if you want/need it, take it.

Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last go over. minor comment on testability

String,
}

#[cfg(test)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have some unit tests here that is able to actually perform replacements. of course we could integration test this, but i feel it might not be too bad if we did what the analyzer unit tests are doing on a subset schema, and passing in some test override struct. should be enough then to check that the structs were actually replaced.

Comment on lines +9 to +14
propertyRules:
# The action to perform if the type and name-directed matches below succeed.
# `replace` is the `String` to use as the property's Rust type. This will prevent a Rust
# `Container` being generated for the property, if matching succeeds.
- matchSuccess:
replace: ResourceRequirements
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fair. I see your point and don't have strong feelings towards it. Happy to leave it as is.

Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests notwithstanding, I happy to not block this as it stands on its own. If you are able to get the DCO check satisfied I am happy to merge this as is and put some more unit tests as a follow-up 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants