☂️ CA-MCM Overhaul #895

Open
unmarshall opened this issue Jan 19, 2024 · 0 comments
Labels
  • area/control-plane: Control plane related
  • kind/enhancement: Enhancement, improvement, extension
  • kind/epic: Large multi-story topic
  • lifecycle/stale: Nobody worked on this for 6 months (will further age)
  • priority/3: Priority (lower number equals higher priority)


unmarshall commented Jan 19, 2024

How to categorize this issue?
/area control-plane
/kind epic
/priority 3

What would you like to be added:
TO BE FILLED

Why is this needed:
There are several reasons why we need to re-examine CA and MCM, both in isolation and in how they interact.

  • Over time, the MCM code base has become quite complex and difficult to maintain.
  • CA
    • We maintain a fork of CA in order to include the MCM provider. Syncing and releasing the fork is a recurring effort with every new k8s version.
    • Over time our fork has diverged further from upstream, to the point where we have started to alter the core (provider-agnostic) code base of CA. Two examples: "Keep drain logic optional" (autoscaler#99) highlights this need, and for "Avoid fixing nodegroup size" (autoscaler#30) the fixNodeGroupSize logic in core CA was commented out.
    • With more than 90 CLI options, tuning CA for any given consumer is tricky.
    • The design philosophy of CA centres around the creation of node groups, each of which has a 1:1 correspondence with a specific machine type and zone. This is limiting: consumers want more flexibility w.r.t. machine types across zones, because resource quotas per machine type in any specific zone are always a challenge.
    • For spot instances, providers recommend greater flexibility w.r.t. machine types, zones and regions, to increase the probability of obtaining spot instances and to reduce the likelihood of spot evictions. This can technically be realised using node groups, but it can easily lead to a combinatorial explosion of node groups and complicated expander rules.
    • The scheduler module used in CA differs in configuration from kube-scheduler, which can lead to differing pod scheduling outcomes. One such issue, "Computation of expendable pods does not consider preemption policy" (kubernetes/autoscaler#6227), was raised recently.
  • CA and MCM overlap in the functionality they offer. This results in race conditions due to concurrent actions taken by CA and MCM, and handling these further complicates the code base. For instance, "Ensure that there is a single actor which reduces the machine deployment replicas" (autoscaler#181) was raised to highlight one such issue.
  • There is also an ask to make CA into a library, but given the massive effort involved this is unlikely to be taken up.
  • Due to several binaries (CA running in a separate pod, MCM having 2 containers in a single pod), we have the following problems:

The extent of change and the new direction (if any) proposed as part of this epic will also have an impact on the ongoing discussions on how to enhance worker pool configurations: gardener/gardener#8142 and an internal (draft) proposal on Enhancing Gardener's Worker Pool Configuration.
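
To make the node group explosion mentioned above concrete, here is a minimal sketch in Go. It assumes nothing beyond the 1:1 correspondence between a node group and a (machine type, zone) pair described in this issue. The `NodeGroup` type, the `enumerateNodeGroups` helper and all machine type and zone names are hypothetical and are not taken from the CA or MCM code base.

```go
package main

import "fmt"

// NodeGroup is a hypothetical stand-in for a CA node group:
// exactly one machine type in exactly one zone.
type NodeGroup struct {
	MachineType string
	Zone        string
}

// enumerateNodeGroups returns one node group per (machine type, zone) pair,
// which is what the 1:1 correspondence implies when a consumer wants to be
// flexible across all given machine types and zones.
func enumerateNodeGroups(machineTypes, zones []string) []NodeGroup {
	groups := make([]NodeGroup, 0, len(machineTypes)*len(zones))
	for _, mt := range machineTypes {
		for _, z := range zones {
			groups = append(groups, NodeGroup{MachineType: mt, Zone: z})
		}
	}
	return groups
}

func main() {
	// Illustrative names only; not real provider machine types or zones.
	machineTypes := []string{"type-small", "type-medium", "type-large"}
	zones := []string{"zone-a", "zone-b", "zone-c", "zone-d"}

	groups := enumerateNodeGroups(machineTypes, zones)
	fmt.Printf("%d machine types x %d zones = %d node groups\n",
		len(machineTypes), len(zones), len(groups))
}
```

The count grows multiplicatively with every machine type or zone added, and each resulting node group would also need to be covered by expander rules.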

unmarshall added the kind/enhancement label on Jan 19, 2024
gardener-robot added the area/control-plane, kind/epic and priority/3 labels on Jan 19, 2024
gardener-robot added the lifecycle/stale label on Sep 27, 2024