Helm chart install is incompatible #303

Open
bboreham opened this issue Oct 24, 2019 · 2 comments · Fixed by helm/charts#18969
Labels
bug Something isn't working

Comments

@bboreham
Contributor

The selector in the helm chart has two labels, whereas the one we serve for auto-update has one:

2019-10-24T14:34:41.838259571Z time="2019-10-24T14:34:41Z" level=info msg="Updating self from https://get.weave.works/k8s/agent.yaml?instanceID=<redacted>"
2019-10-24T14:34:41.856270392Z time="2019-10-24T14:34:41Z" level=info msg="Revision before self-update: 2"
2019-10-24T14:34:42.861367005Z time="2019-10-24T14:34:42Z" level=error msg="Failed to execute kubectl apply: The Deployment \"weave-agent\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"weave-cloud\", \"release\":\"weave-cloud\", \"name\":\"weave-agent\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\nFull output:\nnamespace/weave configured\nserviceaccount/weave-agent unchanged\nclusterrole.rbac.authorization.k8s.io/weave-agent configured\nclusterrolebinding.rbac.authorization.k8s.io/weave-agent configured\nThe Deployment \"weave-agent\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"weave-cloud\", \"release\":\"weave-cloud\", \"name\":\"weave-agent\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable"
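
A rough sketch of why the apply is rejected, assuming the helm chart's selector carries the `app` and `release` labels and the served agent.yaml carries only `name` (values taken from the error above). The merge rule here is a simplification of what `kubectl apply` does, inferred from the three-label selector in the log; the function names are illustrative only.

```python
# Selector installed by the helm chart (assumed: the app and release labels).
helm_selector = {"app": "weave-cloud", "release": "weave-cloud"}
# Selector in the agent.yaml served for self-update (assumed from the log).
served_selector = {"name": "weave-agent"}


def merged_selector(live, applied):
    """Approximate the patch result: applied keys are added, live keys are kept."""
    return {**live, **applied}


def apply_would_fail(live, applied):
    """spec.selector is immutable, so any difference from the live value is rejected."""
    return merged_selector(live, applied) != live


result = merged_selector(helm_selector, served_selector)
print(result)
print(apply_would_fail(helm_selector, served_selector))
```

Under these assumptions the merged selector has all three labels, which differs from the live two-label selector, so the API server refuses the update.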
@bboreham bboreham added the bug Something isn't working label Oct 24, 2019
@errordeveloper
Contributor

errordeveloper commented Nov 15, 2019

@bboreham here is the plan based on what we discussed the other day. Let me know if you agree and I'll start implementing it.

We don't know how many agents were installed from the helm chart and are still stuck in this state (see #306). There is probably a way to find out by analysing the request log and cross-checking which agents hit agent.yaml but remain unconnected.
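
The cross-check could look roughly like the sketch below. The request log format (a list of dicts with `instance_id` and `path`) and the set of connected instance IDs are hypothetical; the real log and connection data will have a different shape.

```python
def suspected_stuck_agents(request_log, connected_ids):
    """Return instance IDs that fetched agent.yaml but are not connected."""
    fetched = {
        entry["instance_id"]
        for entry in request_log
        if entry["path"] == "/k8s/agent.yaml"
    }
    return sorted(fetched - set(connected_ids))


# Hypothetical sample data for illustration only.
log = [
    {"instance_id": "a1", "path": "/k8s/agent.yaml"},
    {"instance_id": "b2", "path": "/k8s/agent.yaml"},
    {"instance_id": "b2", "path": "/k8s/agent.yaml"},
    {"instance_id": "c3", "path": "/other"},
]
stuck = suspected_stuck_agents(log, connected_ids={"a1"})
print(stuck)
```

This only gives an upper bound: an agent that is unconnected for some unrelated reason would be counted too.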

Aside from the question of how many agents there are in this state, a solution to recover agents from this state would involve the following.

We add a new parameter to the agent.yaml, let's call it agent-generation. It would allow us to migrate from one version to another in the future.

So for this case we should be able to apply the following migration plan:

  • release a new agent that uses agent-generation=v1
  • update the service to support optional custom-selector=<labelSelector>
  • update the helm chart to use the new agent
  • wait some period of time (e.g. a day or two)
  • assume that all agents which hit the service without agent-generation set are broken
  • fix broken agents by redirecting agent.yaml to agent.yaml?agent-generation=v1&custom-selector=<helmLabels>
  • agent-generation and custom-selector are sticky, i.e. any request with these params carries them forward as part of the URL used for subsequent requests
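
The redirect and sticky-parameter steps above can be sketched as follows. This is only an illustration of the proposal, not existing service code: the function names are hypothetical, the selector string format is an assumption, and it presumes the waiting period has already passed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Parameters proposed in the plan to be carried across requests.
STICKY_PARAMS = ("agent-generation", "custom-selector")


def redirect_target(request_url, helm_labels):
    """Return the redirect URL for an agent assumed broken, or None otherwise."""
    parts = urlsplit(request_url)
    query = parse_qs(parts.query)
    if "agent-generation" in query:
        return None  # a new-style agent, no redirect needed
    query["agent-generation"] = ["v1"]
    query["custom-selector"] = [helm_labels]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def next_update_url(current_url, base_url):
    """Copy the sticky params from the current request onto the next one."""
    current = parse_qs(urlsplit(current_url).query)
    parts = urlsplit(base_url)
    query = parse_qs(parts.query)
    for key in STICKY_PARAMS:
        if key in current:
            query[key] = current[key]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


old = "https://get.weave.works/k8s/agent.yaml?instanceID=abc"
fixed = redirect_target(old, "app=weave-cloud,release=weave-cloud")
print(fixed)
print(next_update_url(fixed, old))
```

A request that already carries agent-generation is left alone, so once an agent has been redirected it keeps using the same parameters on every later self-update.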

This plan could work in theory, but there is a risk of further breakage in cases like the following:

  • agent was installed without helm, but didn't run at the time we executed the migration plan, e.g.:

    • the cluster had zero nodes where the agent could run and was scaled up later
    • the agent was scaled to zero replicas and scaled up later
  • agent configuration is maintained by the user, e.g.:

    • they re-create their clusters and run kubectl apply -f agent.yaml (where agent.yaml is a copy of the config that they downloaded at some point)
    • they forked the helm chart, and possibly made further changes to it
  • the user didn't use the same helm install command, so the label values could be different

I suppose all of these cases can be considered as unorthodox usage that we cannot support, but I'm not sure.

Aside from this, I am not quite sure the helm chart fix would stick - the helm community appears to mandate use of app and release labels as it currently stands in our chart.

@bboreham
Contributor Author

bboreham commented Mar 1, 2021

Thinking about this again, a Weave Cloud helm chart should install just the launcher, which can then install everything else like it normally does.

This would parallel, for instance, https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack, which installs prometheus-operator which then in turn installs Prometheus and some exporters.
The Weave Cloud Launcher functions as an "operator".
