Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rollout stuck in infinite "Progressing" even after progressDeadlineSeconds exceeds #3988

Open
2 tasks done
greykurotsuki opened this issue Dec 6, 2024 · 0 comments
Open
2 tasks done
Labels
bug Something isn't working

Comments

@greykurotsuki
Copy link

greykurotsuki commented Dec 6, 2024

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

In the default scenario, Rollouts work as expected: after a ReplicaSet becomes healthy, it starts the analysis and promotes the new stable version. However, I encountered an issue when deploying a broken revision.

For example, I intentionally misconfigured the startupProbe to make it fail, so the ReplicaSet could never become ready. I expected the Rollout to mark this revision as Degraded and abort the preview release once progressDeadlineSeconds was exceeded. This behavior worked as expected for the first broken revision, but did not for revision 4, which has different image tag and is broken.

To Reproduce

My configurations:

bluegreen-service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: bluegreen-demo-preview
  labels:
    app: bluegreen-demo
spec:
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: bluegreen-demo
---
apiVersion: v1
kind: Service
metadata:
  name: bluegreen-demo
  labels:
    app: bluegreen-demo
spec:
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: bluegreen-demo

bluegreen-rollout.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: bluegreen-demo
  labels:
    app: bluegreen-demo
spec:
  replicas: 3
  revisionHistoryLimit: 1
  progressDeadlineSeconds: 60
  progressDeadlineAbort: true
  selector:
    matchLabels:
      app: bluegreen-demo
  template:
    metadata:
      labels:
        app: bluegreen-demo
    spec:
      containers:
      - name: bluegreen-demo
        image: argoproj/rollouts-demo:blue # initially "blue", but changed to "green" to create a new "broken" release
        imagePullPolicy: Always
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m
        startupProbe: # initially commented section, but uncommented once I need to break the new release.
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 30
  strategy:
    blueGreen:
      autoPromotionEnabled: false
      activeService: bluegreen-demo
      previewService: bluegreen-demo-preview

And a kustomization.yaml to deploy it:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- bluegreen-rollout.yaml
- bluegreen-service.yaml
  1. Start with a fresh Rollout object with a healthy revision (Revision 1).
  2. Deploy a broken revision (Revision 2) with the misconfigured health check.
    Result: Rollout correctly marked it as Degraded after progressDeadlineSeconds and aborted the release.
  3. Deploy a healthy revision (Revision 3).
    Result: Revision 3 became the new stable version.
  4. Deploy another broken revision (Revision 4).
    Issue: The Rollout got stuck in an infinite Progressing state with the message: “active service cutover pending.”

Expected behavior

The Rollout should mark Revision 4 as Degraded and abort the release, similar to how it handled Revision 2.

Screenshots

Initial release
Broken revision 2  expected behavior
Successful revision 3
Broken revision 4, stuck in infinite progressing state

Version

Argo Rollouts Version: 2.37.7 (Chart version)
Argo Rollouts Image: quay.io/argoproj/argo-rollouts:v1.7.2

Logs

Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts
controller-logs.txt

Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=bluegreen-demo
bluegreen-rollout-grep-logs.txt


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@greykurotsuki greykurotsuki added the bug Something isn't working label Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant