This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

Some pods not receiving a sidecar on cluster boot #276

Open
ajaffie opened this issue Jan 31, 2020 · 4 comments

@ajaffie

ajaffie commented Jan 31, 2020

Describe the bug
We shut down our cluster automatically every night and boot it in the morning because it is solely a dev cluster and does not need to run 24/7. Sometimes, when the cluster boots in the morning, pods deployed in our root dev space via helm in CI/CD do not get sidecars attached. I'm thinking this is caused by the pods starting before the azds admission controller, but I'm not sure. Killing the pods and letting them get re-created by the deployment results in pods with the sidecar injected.

To Reproduce

  1. Deploy a service into a dev space using the azds chart via helm, not azds up.
  2. Shut down the cluster by turning off the underlying VMs.
  3. Start the VMs.
  4. The pod may or may not have the devspaces-proxy sidecar injected.
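Step 4 can be checked mechanically instead of by eye. The following is a minimal Python sketch, assuming you feed it the parsed output of `kubectl get pods -n <dev-space> -o json` and that the injected sidecar shows up in `spec.containers` under the name `devspaces-proxy` mentioned above. It lists the pods without the sidecar and builds the `kubectl delete pod` commands for the kill-and-recreate workaround described in the bug report. The helper names `pods_missing_sidecar` and `delete_commands` are hypothetical, not part of any Dev Spaces tooling.

```python
import json

# Sidecar container name taken from the issue; treat it as an assumption
# if your Dev Spaces version names the injected container differently.
SIDECAR_NAME = "devspaces-proxy"


def pods_missing_sidecar(pod_list: dict, sidecar: str = SIDECAR_NAME) -> list:
    """Return the names of pods whose spec has no container called `sidecar`.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`,
    i.e. a List object with an "items" array of Pod objects.
    """
    missing = []
    for pod in pod_list.get("items", []):
        containers = pod.get("spec", {}).get("containers", [])
        names = {c.get("name") for c in containers}
        if sidecar not in names:
            missing.append(pod["metadata"]["name"])
    return missing


def delete_commands(names: list, namespace: str) -> list:
    """Build one `kubectl delete pod` command per pod, so that the owning
    Deployment recreates each pod (and the webhook can inject the sidecar)."""
    return [f"kubectl delete pod {name} -n {namespace}" for name in names]
```

For example, save the pod list with `kubectl get pods -n <dev-space> -o json > pods.json`, then call `delete_commands(pods_missing_sidecar(json.load(open("pods.json"))), "<dev-space>")`. Deleting pods is only safe here because they are owned by a Deployment that recreates them.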

Expected behavior
The pods should always have the sidecar injected.

@stepro
Member

stepro commented Jan 31, 2020

@ajaffie, yes I think your analysis is correct here. We had various reasons for running the webhook admission controller in the cluster rather than as part of our managed Dev Spaces endpoint, but I will need to go back and understand why that was. I will respond on this thread when I have more information.

@stepro
Member

stepro commented Jan 31, 2020

I think the reason for this was largely a case of not doing the necessary work to enable mutual-TLS (server authenticates with client, client authenticates with server).

How frustrating would you say this is for you? I wonder if changing the failurePolicy property on the MutatingWebhookConfiguration to Fail would actually give you what you want, assuming that failure causes the pod to go into a backoff state and keep trying to start up until the webhook is up and running. Might be worth a try. I will warn you though that as dev spaces manages the objects in the azds namespace, this change might get overwritten at some point. Let me know if this works and we can figure out on the team how to proceed. Thanks!
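A minimal Python sketch of the suggested change follows, assuming the webhook configuration is read with `kubectl get mutatingwebhookconfiguration <name> -o json` and patched with `kubectl patch --type=json`. The configuration name `azds-webhook` used in the test is a hypothetical placeholder; list the real ones with `kubectl get mutatingwebhookconfigurations`. The helpers `fail_policy_patch` and `patch_command` are illustrative only. The script builds an RFC 6902 JSON Patch that sets `failurePolicy` to `Fail` on every webhook that is not already set that way.

```python
import json


def fail_policy_patch(webhook_config: dict) -> list:
    """Build a JSON Patch that sets failurePolicy to "Fail" on every webhook
    in a MutatingWebhookConfiguration (parsed from `kubectl get ... -o json`)."""
    patch = []
    for i, webhook in enumerate(webhook_config.get("webhooks", [])):
        if webhook.get("failurePolicy") != "Fail":
            # "add" on an existing object member replaces its value, so this
            # works whether or not failurePolicy is already set explicitly.
            patch.append({
                "op": "add",
                "path": f"/webhooks/{i}/failurePolicy",
                "value": "Fail",
            })
    return patch


def patch_command(name: str, patch: list) -> list:
    """Return a kubectl argv list that applies `patch` to the named
    MutatingWebhookConfiguration (a cluster-scoped object, so no -n flag)."""
    return ["kubectl", "patch", "mutatingwebhookconfiguration", name,
            "--type=json", "-p", json.dumps(patch)]
```

The command list can be passed to `subprocess.run(..., check=True)`. Because Dev Spaces manages these objects and may overwrite the change, re-running the sketch and checking for an empty patch is a cheap way to detect that the policy has been reverted.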

@rakeshvanga
Contributor

@ajaffie, did you happen to try out the option mentioned by @stepro?
I have tested this out by setting failurePolicy: Fail on the MutatingWebhookConfiguration. The pods in the Azure Dev Spaces managed namespace wait until the admission controller starts properly.
Note: It doesn't matter whether the service is deployed with azds up or some other way, but the namespace into which the service is deployed must have been created using azds space select.

@ajaffie
Author

ajaffie commented Mar 12, 2020

Sorry about that; this is kind of a low priority since it's so easily worked around, so it got left behind. I've changed the failure policy to Fail and will see whether the issue happens again over the next few days.
