Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add verification step to WGE bootstrap #3732

Open
wants to merge 1 commit into
base: 3647-add-verify-step
Choose a base branch
from

Conversation

Samra10
Copy link
Member

@Samra10 Samra10 commented Dec 17, 2023

What changed?

  • Remove the component status checker because a dynamic list should exist.
  • Add a status checker for the ready property of the helmrelease.

Why was this change made?

  • To depend on the HR status regardless of the dependent components list.

How did you validate the change?

  • Explain how a reviewer can verify the change themselves

  • Throw interactive or non-interactive, after flux finishes HR reconciliation it will wait for the HelmRelease Ready status to be True, or timeout finish

  • Unit tests -- what is covered, what cannot be covered; are
    there tests that fail without the change?

  • Add unit_tests for case HR ready is True, and case timeout and status still not True.

@Samra10 Samra10 added the exclude from release notes Use this label to exclude a PR from the release notes label Dec 17, 2023
@Samra10 Samra10 changed the base branch from main to 3647-add-verify-step December 17, 2023 23:28
@Samra10 Samra10 marked this pull request as ready for review December 17, 2023 23:28
Copy link
Contributor

@enekofb enekofb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels better this direction so we have helm release a target. however, how this approach is different to flux native helm release wait?

btw, when you tested the helm release default wait mechanism ... what was the testing scenario so i could have a look in case find out anything ...

i just suspect that is missconfiguration during upgrades in wge side.

if you https://helm.sh/docs/helm/helm_upgrade/ ... my hypothesis is that

 --wait                                       if set, will wait until all Pods, PVCs, Services, and minimum number of Pods of a Deployment, StatefulSet, or ReplicaSet are in a ready state before marking the release as successful. It will wait for as long as --timeout

this condition

and minimum number of Pods of a Deployment,

is met but the application might not be healthy ...

likely because of this fluxcd/helm-controller#355 (comment)

@enekofb
Copy link
Contributor

enekofb commented Dec 18, 2023

Helm considers a deployment with one replica and strategy rollingUpdate with maxUnavailable of 1 to be ready when it has 1 unavailable deployment. (Even if the deployment's one and only pod has entered CrashLoopBackOff and readiness and liveness checks have all failed... with maxUnavailable of 1 and replicas of 1, the deployment technically has no more than the allowed number of unavailable pods, so it is considered ready.)

@enekofb
Copy link
Contributor

enekofb commented Dec 18, 2023

Options explored:

  1. helmrelease cr dependensOn
  2. helmrelease values -> not having dependencies
  3. helm release depndency from the helm chart -> not necessarily being used (for example template controller)
  4. compute helm release ready status on code (this PR)
  5. flux helm controller (a desirable solution but not having the expected results)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude from release notes Use this label to exclude a PR from the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants