Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add job timeout #199

Merged
merged 4 commits into from
May 28, 2024
Merged

feat: add job timeout #199

merged 4 commits into from
May 28, 2024

Conversation

zreigz
Copy link
Member

@zreigz zreigz commented May 28, 2024

Add time out for stack run job reconciliation. After timeout mark the stack run with an error state.

@zreigz zreigz added the enhancement New feature or request label May 28, 2024
@zreigz zreigz changed the title add job timeout feat: add job timeout May 28, 2024
@@ -42,6 +43,7 @@ import (
)

const jobSelector = "stackrun.deployments.plural.sh"
const jobTimout = time.Minute * 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this should be 30 or 60 minutes (could have a weird scheduling edge case that takes 10 minutes for benign reasons I think)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -72,15 +74,24 @@ func (r *StackRunJobReconciler) Reconcile(ctx context.Context, req ctrl.Request)
// Update step statuses, i.e., when stack run was successful or failed.
for _, step := range stackRun.Steps {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh we should actually not do these updates here (let the harness fully own them and it isn't going to impact anything on the api if they remain pending)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

// Exit if stack run is not in running state (run status already updated),
// or if the job is still running (harness controls run status).
if stackRun.Status != console.StackStatusRunning || job.Status.CompletionTime.IsZero() {
return ctrl.Result{}, nil
if isActiveJobTimout(stackRun.Status, job) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also manually delete the job here, or perhaps kill its pod. I'm not sure if the ttl on the job will pick up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@github-actions github-actions bot added size/M and removed size/S labels May 28, 2024
@zreigz
Copy link
Member Author

zreigz commented May 28, 2024

@michaeljguarino PTAL

@zreigz zreigz merged commit f9f252f into main May 28, 2024
15 checks passed
@zreigz zreigz deleted the job-timeout branch May 28, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants