Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spike] Prevent DataPlane Lockups #2195

Closed
3 of 5 tasks
shaneutt opened this issue Jan 19, 2022 · 3 comments
Closed
3 of 5 tasks

[spike] Prevent DataPlane Lockups #2195

shaneutt opened this issue Jan 19, 2022 · 3 comments
Assignees
Labels
area/feature New feature or request
Milestone

Comments

@shaneutt
Copy link
Contributor

shaneutt commented Jan 19, 2022

Problem Statement

As of v2.x we have added several pieces of API validation for resources such as KongConsumers, Secrets, Gateways, and HTTPRoutes which include validation for multiple objects which refer to or are related to one another, for instance:

  1. KongConsumer credentials Secret multi-object validation for unique-key constraint violations
  2. HTTPRoute validations

These validations help to avoid lockups of dataplane synchronization but currently only if the validating webhook has been enabled and we're aware that the webhook validation is technical best effort with some timing issue edge cases that can occur.

The purpose of this task is to develop a comprehensive solution forward that enables us to run validation across multiple objects to avoid configuration that would lock up the data-plane while also improving the long term maintainability of validation code.

Proposed Solution

  • create a KEP describing an implementation that is 100% capable of avoiding data-plane lockups, looks like we may just be able to add this at the dataplane.KongClient level because the dataplane object cache already has a mutex.

Additional information

Notes:

  • there may be some prior art in CertManager for solving this problem

Acceptance Criteria

A "design proposal/draft PR/KEP" exists that provides a solution to the below points:

  • As an end-user of the KIC if I have provided configuration which is invalid due to conflicts between multiple objects, I get a clear and present message to this affect so I know exactly what resources and what configuration needs to be changed
  • As an end-user of the KIC if I have created an invalid configuration this should not halt back-end data-plane configurations for other unrelated resource updates
  • As a developer/contributor to the KIC it should be clear and simply to contribute to code that validates resources, particularly when validating multiple resources in relation
  • The consumer-credential ordering issue (see KongConsumer Secrets being required to exist before the KongConsumer causes some friction for deployments #2324) must be resolved by the new implementation
@shaneutt shaneutt added area/debt area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. labels Jan 19, 2022
@shaneutt shaneutt self-assigned this Jan 19, 2022
@shaneutt shaneutt removed their assignment Feb 1, 2022
@scseanchow scseanchow changed the title Multi-Object Validation Complexity [spike] Multi-Object Validation Complexity Feb 22, 2022
@shaneutt
Copy link
Contributor Author

I'm considering a compromise between the two extremes which I'll be writing up in a KEP: effectively significantly reducing multi-object validation, but not eliminating it, and simultaneously making it possible to run validation at the dataplane.KongClient level in a way that is synchronous so as to avoid data-plane configuration lock up as noted in issues like #680.

@shaneutt shaneutt changed the title [spike] Multi-Object Validation Complexity [spike] Validation to avoid DataPlane Lockups Mar 10, 2022
@shaneutt shaneutt removed their assignment Mar 16, 2022
@mflendrich
Copy link
Contributor

Original bug report for the ordering issue on credentials: #2324 (comment)

@scseanchow scseanchow added this to the Release v3.0.0 milestone May 24, 2022
@mflendrich mflendrich changed the title [spike] Validation to avoid DataPlane Lockups [spike] Prevent DataPlane Lockups Jul 19, 2022
@mflendrich mflendrich added bug Something isn't working and removed area/debt area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. labels Jul 19, 2022
@mflendrich mflendrich modified the milestones: KIC Next Major Release, Release v3.0.0 Aug 17, 2022
@mflendrich mflendrich assigned mflendrich and czeslavo and unassigned mflendrich Sep 22, 2023
@mflendrich mflendrich added area/feature New feature or request and removed bug Something isn't working priority/medium labels Sep 22, 2023
@mflendrich
Copy link
Contributor

Closing in favor of more targeted issues

As an end-user of the KIC if I have provided configuration which is invalid due to conflicts between multiple objects, I get a clear and present message to this affect so I know exactly what resources and what configuration needs to be changed

For DB-less covered by #3205
For DB mode covered by #3492

As an end-user of the KIC if I have created an invalid configuration this should not halt back-end data-plane configurations for other unrelated resource updates

The former part (_not halt back-end data-plane configurations) covered by #4048
The latter part (syncing for other unrelated resource updates) will come with redesign of the CP/DP protocol

As a developer/contributor to the KIC it should be clear and simply to contribute to code that validates resources, particularly when validating multiple resources in relation

validating relationships can only reliably happen post-facto because webhooks have only partial knowledge of the target state. Such validations can currently be added and use (1) events (2) status conditions to surface conflicts. Closing as stale.

The consumer-credential ordering issue (see #2324) must be resolved by the new implementation

the severity of this issue has greatly reduced with the events mechanism - closing this part as it's lacking clear scope in today's context. Closing as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/feature New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants