bug: if a step-based promotion fails, the stage gets stuck in the promoting phase until a new promotion succeeds #2693

Open
krancour opened this issue Oct 7, 2024 · 4 comments

Comments

@krancour
Member

krancour commented Oct 7, 2024

This doesn't seem like the right thing to do.

After a Promotion fails or errors, a Stage may be between states -- neither the beginning state nor the target state. (Whether this is true or not would really depend on the particulars of the user-defined promotion process.) imho, this calls for us to move the Stage out of the Promoting phase and into some new phase like "Indeterminate."

@hiddeco any thoughts here?

P.S. Let's not have this issue snowball into one about automated rollbacks. Assuming we will do that at some point in the future, recognizing and categorizing these conditions as called for by this issue is certainly a prerequisite.

@hiddeco
Contributor

hiddeco commented Oct 8, 2024

Is the Stage actually still reporting it is promoting (which would be surprising to me, as I think we look at the terminated state of the last promotion)?

Or is this more about the Stage saying it's on Freight X, while it's actually "partially" on Freight Y?

@krancour
Member Author

krancour commented Oct 8, 2024

It still says the phase is Promoting, which isn't deliberate. But in terms of fixing it, it seems wrong to move it to Verifying or Steady.

@hiddeco
Contributor

hiddeco commented Oct 8, 2024

Based on the code that synchronizes the Promotions for a Stage, I would expect the Stage to be in a Steady phase as soon as the Promotion reaches a terminal state. 🤔

@krancour
Member Author

krancour commented Oct 8, 2024

So there are two things here.

  1. I agree with your interpretation of what the code says should be happening and I want to figure out how I've observed something different.

  2. I'm questioning whether what we think the code says should be happening is ideal to begin with. If a Promotion failed, depending on the particulars of the process, "steady" might be a poor descriptor for the current state.

Don't worry about this more until I can reproduce it and give some more details about how it happened.
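
To make the two positions in this thread concrete, here is a minimal Python sketch. It is not Kargo's actual Go code: the enum and function names are hypothetical, verification is left out to keep it small, and keeping a successful promotion at Steady in the second function is an assumption. `expected_phase` models "Steady as soon as the Promotion reaches a terminal state", and `proposed_phase` models moving a Stage to a new "Indeterminate" phase after a failed or errored Promotion.

```python
from enum import Enum
from typing import Optional


class PromotionPhase(Enum):
    # Hypothetical promotion outcomes for illustration only.
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    ERRORED = "Errored"


class StagePhase(Enum):
    PROMOTING = "Promoting"
    STEADY = "Steady"
    INDETERMINATE = "Indeterminate"  # the new phase suggested in this issue


# A promotion in any of these phases has stopped running.
TERMINAL = {
    PromotionPhase.SUCCEEDED,
    PromotionPhase.FAILED,
    PromotionPhase.ERRORED,
}


def expected_phase(last_promotion: Optional[PromotionPhase]) -> StagePhase:
    """Any terminal (or absent) promotion leaves the Stage Steady."""
    if last_promotion is None or last_promotion in TERMINAL:
        return StagePhase.STEADY
    return StagePhase.PROMOTING


def proposed_phase(last_promotion: Optional[PromotionPhase]) -> StagePhase:
    """A failed or errored promotion leaves the Stage Indeterminate."""
    if last_promotion is None or last_promotion is PromotionPhase.SUCCEEDED:
        return StagePhase.STEADY  # assumption: success is handled as before
    if last_promotion in TERMINAL:
        return StagePhase.INDETERMINATE
    return StagePhase.PROMOTING
```

The two functions differ only for `FAILED` and `ERRORED`. The reported behaviour, where the Stage stays in Promoting after a failure, matches neither of them.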
