Replies: 1 comment
-
Armada only retries jobs that didn't start correctly. We don't restart a job once its pod is running. It is an interesting feature request though! Typically Armada is coupled with some kind of workflow engine, so Airflow (for example) would handle restarts if a job failed.
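To make the "let the caller handle restarts" idea concrete, here is a minimal sketch of a retry loop on the submitting side. It is only an illustration, not Armada's or Airflow's API: `run_job` is a hypothetical placeholder for whatever submits a job and waits for its exit code, and `flaky_job` is a stand-in for a job that exits with 0 or 1 at random.

```python
import random
from typing import Callable, Tuple


def run_with_retries(run_job: Callable[[], int], max_attempts: int = 3) -> Tuple[bool, int]:
    """Call run_job until it returns exit code 0 or the attempts run out.

    run_job is a hypothetical callable (not part of Armada) that submits
    one job, waits for it to finish, and returns its exit code.
    Returns (succeeded, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        exit_code = run_job()
        if exit_code == 0:
            return True, attempt
    return False, max_attempts


def flaky_job(rng: random.Random) -> int:
    """Stand-in for a job that exits with 0 or 1 at random."""
    return rng.choice([0, 1])


if __name__ == "__main__":
    rng = random.Random()
    ok, attempts = run_with_retries(lambda: flaky_job(rng), max_attempts=5)
    print(f"succeeded={ok} after {attempts} attempt(s)")
```

A workflow engine would usually add things this sketch leaves out, such as a delay between attempts and telling apart failures that are worth retrying from ones that are not.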
-
Hi there, I was hoping for a little help. I am testing the Armada Quickstart for my use cases and I am having trouble understanding the scheduler's job retry functionality.
Reading through scheduler.go, I understood that if a job fails, it will be retried until the maximum number of retries is reached.
So I am running this pod, which randomly exits with a return code of either 0 or 1 to emulate job failures:
Say I submit 4 of these jobs: 2 succeed and 2 fail. The ones that fail are never retried, and Armada cleans up all records of them from Kubernetes after ~10 min.
In addition, I tried to use restartPolicy: OnFailure to take advantage of the Kubernetes native feature, which also did not work. I ran kubectl get po <failed_pod> -o yaml and found that restartPolicy: OnFailure was not passed through; it was set to "Never".
Please explain to me how job retry works at the job level (without using the K8s native restartPolicy: OnFailure) and how I can make this example work.
Any help would be appreciated as I could not find any info/existing complaints on this topic!
Thanks!