Limit amount of proxy restarts during one reconciliation #155

Closed
17 tasks done
triffer opened this issue May 26, 2023 · 9 comments
Assignees
Labels
area/service-mesh Issues or PRs related to service-mesh kind/feature Categorizes issue or PR as related to a new feature.

Comments

@triffer
Collaborator

triffer commented May 26, 2023

Description

During reconciliation, we read all proxies to filter out the ones that need to be restarted. Since the number of proxies is not known in advance, we should introduce two safeguards.

  1. We should limit how many proxies we read by using pagination.
  2. We should limit how many proxies we restart during one reconciliation.

Implementing this feature makes the reconciliation more responsive, because the reconciliation loop is no longer blocked for a long time when many proxies need to be restarted.
We should define some starting values, for example, limit proxy restarts to 30 per reconciliation, and measure how this performs.
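
The two safeguards could be combined roughly as in the sketch below. It is a minimal illustration and not the module's actual implementation: `Proxy`, `PageLister`, `sliceLister`, and `SelectForRestart` are hypothetical names, and an in-memory lister stands in for the Kubernetes API so that the example stays self-contained. The continue token here is just a slice index, whereas a real API server returns an opaque token.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Proxy is a hypothetical, simplified stand-in for a pod with a sidecar proxy.
type Proxy struct {
	Name         string
	NeedsRestart bool
}

// PageLister is a hypothetical interface: it returns at most pageSize proxies
// starting at continueToken, plus the token for the next page. An empty next
// token means there are no further pages.
type PageLister interface {
	ListPage(pageSize int, continueToken string) (items []Proxy, next string, err error)
}

// SelectForRestart reads proxies page by page and returns at most maxRestarts
// proxies that need a restart. It stops listing once the limit is reached.
// The boolean result is true when listing stopped before every proxy was examined.
func SelectForRestart(l PageLister, pageSize, maxRestarts int) ([]Proxy, bool, error) {
	if pageSize <= 0 || maxRestarts <= 0 {
		return nil, false, errors.New("pageSize and maxRestarts must be positive")
	}
	var selected []Proxy
	token := ""
	for {
		items, next, err := l.ListPage(pageSize, token)
		if err != nil {
			return nil, false, err
		}
		for i, p := range items {
			if !p.NeedsRestart {
				continue
			}
			selected = append(selected, p)
			if len(selected) == maxRestarts {
				// Limit reached: report whether unexamined proxies remain.
				return selected, i < len(items)-1 || next != "", nil
			}
		}
		if next == "" {
			return selected, false, nil
		}
		token = next
	}
}

// sliceLister is an in-memory PageLister used only for illustration.
type sliceLister struct{ items []Proxy }

func (s sliceLister) ListPage(pageSize int, continueToken string) ([]Proxy, string, error) {
	start := 0
	if continueToken != "" {
		n, err := strconv.Atoi(continueToken)
		if err != nil || n < 0 || n > len(s.items) {
			return nil, "", fmt.Errorf("invalid continue token %q", continueToken)
		}
		start = n
	}
	end := start + pageSize
	if end >= len(s.items) {
		return s.items[start:], "", nil
	}
	return s.items[start:end], strconv.Itoa(end), nil
}

func main() {
	var all []Proxy
	for i := 0; i < 10; i++ {
		all = append(all, Proxy{Name: fmt.Sprintf("pod-%d", i), NeedsRestart: i%2 == 0})
	}
	// Page size 3 and restart limit 2 are arbitrary values for the demo.
	selected, more, err := SelectForRestart(sliceLister{items: all}, 3, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(selected), more)
}
```

Returning a flag for "more proxies remain" lets the caller decide whether to requeue, without carrying a continue token from one reconciliation to the next.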

Reasons

Increase stability and reliability of the reconciliation of the Istio module.

TODO:

  • Get familiar with code
  • Basic implementation
  • Introduce envtest, since the controller fake client does not support pagination. This did not work out because of the list condition that pods must be in the Running status.phase.
  • Refine tests and implementation
  • Integration test needed? Too difficult to implement with reasonable effort and reliability.
  • Manual testing
  • Discuss few points on solution - https://sap-btp.slack.com/archives/C01L56SQ0QH/p1721045723840319
  • Utilize continue token between reconciliations + proper Ready condition for reconciliation runs
  • Ran as a POC and it worked, but it was unreliable (missed pods). We decided to keep it simple and not share context between reconciliations, which also avoids the complexity of handling a continue token that has become invalid, a case we are unsure how to handle. For now, we should tune the limit parameters instead.
  • Review

Findings:

  • The client cache must be excluded for Pod resources. With the cache turned on, the client did not return a Continue token when Limit was specified.

DoD:

  • provide documentation
  • release notes and What's New updates for Kyma customers
  • provide unit tests
  • provide integration tests (n/a)
  • test on production-like environment
  • verify resource limits
  • follow-up issue (n/a)

Attachments

https://kubernetes.io/docs/reference/using-api/api-concepts/#retrieving-large-results-sets-in-chunks

PR

@triffer triffer added kind/feature Categorizes issue or PR as related to a new feature. area/service-mesh Issues or PRs related to service-mesh labels May 26, 2023
@kyma-bot
Contributor

This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

If you think that I work incorrectly, kindly raise an issue with the problem.

/lifecycle stale

@kyma-bot kyma-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 25, 2023
@kyma-bot
Contributor

kyma-bot commented Aug 1, 2023

This issue or PR has been automatically closed due to the lack of activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle stale

If you think that I work incorrectly, kindly raise an issue with the problem.

/close

@kyma-bot kyma-bot closed this as completed Aug 1, 2023
@strekm strekm removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2023
@strekm strekm reopened this Aug 3, 2023
@kyma-bot kyma-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 2, 2023
@kyma-bot kyma-bot closed this as completed Oct 9, 2023
@strekm strekm removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 30, 2023
@strekm strekm reopened this Oct 30, 2023
@kyma-bot kyma-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2023
@kyma-bot kyma-bot closed this as completed Jan 5, 2024
@strekm strekm removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2024
@strekm strekm reopened this Jan 24, 2024
@videlov videlov self-assigned this Jul 5, 2024
@videlov videlov assigned videlov and unassigned videlov Jul 11, 2024
@werdes72 werdes72 assigned werdes72 and unassigned werdes72 Jul 16, 2024
@videlov videlov self-assigned this Jul 18, 2024
@strekm strekm closed this as completed Aug 1, 2024
5 participants