Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 reconciler/apibinding: make group resources of APIBindings + CRD race-free #3251

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sttts
Copy link
Member

@sttts sttts commented Jan 16, 2025

Summary

The conflict checker in the APIBinding reconciler computed a list of CRDs, both real and bound ones. It took the APIExport as reference instead of the bindings directly. This seems wrong. If a schema is removed from an export, or the whole export is deleted, the binding still exists, but it didn't contribute the conflict checker anymore.

This PR does the following:

  1. make the conflict checker use the APIBinding.status.boundResources as reference.
  2. store group resource bindings in an apis.kcp.io/resource-bindings annotation on the LogicalCluster. It records the APIBinding or whether it is a CRD. This annotation (a JSON map) is updated BEFORE (a) a CRD is created (admission) or (b) an APIBinding bind the group resource:
    internal.apis.kcp.io/resource-bindings: '{"partitions.topology.kcp.io":{"n":"topology.kcp.io"},"partitionsets.topology.kcp.io":{"n":"topology.kcp.io"},"shards.core.kcp.io":{"n":"shards.core.kcp.io"},"workspaces.tenancy.kcp.io":{"n":"tenancy.kcp.io"},"workspacetypes.tenancy.kcp.io":{"n":"tenancy.kcp.io"}}'

The tricky case is that we can only update the annotation during CRD creation via admission. And creation can fail. Then we have a dangling group resource reservation. This PR adds an expery timestamp to the group resource entries for CRD which matches the maximum request duration of 30s. For that time an APIBinding of the same group resource would fail (delayed). Don't know a better way to solve that.

Downside: CRD creation implies update of the LogicalCluster, with potential for conflicts. This might make CRD creation slower. So if you install 1000 CRDs, this PR will have an impact.

Related issue(s)

Fixes #3252

Release Notes

Fix critical race condition between APIBindings and CRDs potentially allowing the same resource to be bound by multiple bindings or CRDs, leading to data loss or inconsistent state.

@kcp-ci-bot kcp-ci-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 16, 2025
@sttts sttts force-pushed the sttts-multiple-identities branch 3 times, most recently from 90885fe to e338b59 Compare January 18, 2025 20:13
@kcp-ci-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from sttts. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 18, 2025
@sttts sttts changed the title WIP: 🐛 reconciler/apibinding: check conflict via bound APIs in bindings, not exports WIP: 🐛 reconciler/apibinding: make group resources of APIBindings + CRD race-free Jan 18, 2025
@sttts sttts force-pushed the sttts-multiple-identities branch 4 times, most recently from 93f12c1 to 8ba5890 Compare January 20, 2025 10:04
@sttts sttts changed the title WIP: 🐛 reconciler/apibinding: make group resources of APIBindings + CRD race-free 🐛 reconciler/apibinding: make group resources of APIBindings + CRD race-free Jan 20, 2025
@kcp-ci-bot kcp-ci-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 20, 2025
@embik
Copy link
Member

embik commented Jan 20, 2025

/retest

@sttts sttts force-pushed the sttts-multiple-identities branch 3 times, most recently from e20f05d to 3f47600 Compare January 20, 2025 13:18
@sttts sttts force-pushed the sttts-multiple-identities branch from 3f47600 to facb4eb Compare January 20, 2025 13:24
@kcp-ci-bot kcp-ci-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jan 20, 2025
@mjudeikis
Copy link
Contributor

/build-image

@kcp-ci-bot kcp-ci-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2025
@kcp-ci-bot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@embik embik added this to the v0.27.0 milestone Jan 31, 2025
@yastij
Copy link

yastij commented Feb 6, 2025

/cc

@kcp-ci-bot kcp-ci-bot requested a review from yastij February 6, 2025 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has signed the DCO. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: conflicting bound CRDs can exist in the same Workspace, resulting in non-deterministic behavior
5 participants