-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add job timeout #199
Conversation
@@ -42,6 +43,7 @@ import ( | |||
) | |||
|
|||
const jobSelector = "stackrun.deployments.plural.sh" | |||
const jobTimout = time.Minute * 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this should be 30 or 60 minutes (could have a weird scheduling edge case that takes 10 minutes for benign reasons I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -72,15 +74,24 @@ func (r *StackRunJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) | |||
// Update step statuses, i.e., when stack run was successful or failed. | |||
for _, step := range stackRun.Steps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh we should actually not do these updates here (let the harness fully own them and it isn't going to impact anything on the api if they remain pending)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
} | ||
|
||
// Exit if stack run is not in running state (run status already updated), | ||
// or if the job is still running (harness controls run status). | ||
if stackRun.Status != console.StackStatusRunning || job.Status.CompletionTime.IsZero() { | ||
return ctrl.Result{}, nil | ||
if isActiveJobTimout(stackRun.Status, job) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also manually delete the job here, or perhaps kill its pod. I'm not sure if the ttl on the job will pick up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@michaeljguarino PTAL |
Add time out for stack run job reconciliation. After timeout mark the stack run with an error state.