
Optional SIGTERM at Job Failure #60

Open
Xtigyro opened this issue Apr 12, 2020 · 18 comments

@Xtigyro

Xtigyro commented Apr 12, 2020

I would like to THANK YOU for this brilliant piece of software!

I have a use case in which supercronic acts as the entry point of a container. This means the container needs to be restarted whenever the job run by supercronic fails.

@krallin
Collaborator

krallin commented Apr 19, 2020

Can you clarify your use case a little bit, and explain what you're using Supercronic for here? I'm a little confused by exactly what you're trying to achieve. Thanks!

@Xtigyro
Author

Xtigyro commented Apr 19, 2020

@krallin At Puppetlabs we're about to include supercronic in the official r10k Docker image. We need it so we can run r10k as a sidecar container in K8s using our Puppet Server Helm chart.

When the r10k run fails - we need the container itself to reflect that in its run status. It would be handy if the container marked its run as failed - so that K8s restarts it, making it clear that there was a problem with the r10k run.

Right now we work around this as follows (https://github.com/Xtigyro/puppetserver-helm-chart/blob/edge/templates/r10k-code.configmap.yaml#L25):

    # note: "exec" would replace the shell and skip the status handling
    # below, so the entry point is run as a regular command instead
    /docker-entrypoint.sh deploy environment --config /etc/puppetlabs/puppet/r10k_code.yaml \
      --puppetfile {{ template "r10k.code.args" . }} > ~/.r10k_code_cronjob.out 2>&1
    retVal=$?
    if [ "$retVal" -eq 0 ]; then
      touch ~/.r10k_code_cronjob.success > /dev/null 2>&1
    else
      rm -f ~/.r10k_code_cronjob.success > /dev/null 2>&1
    fi
    exit $retVal
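
For completeness, the resulting marker file can then be consumed by a K8s exec probe. A sketch only - the absolute path is an assumption, since `~` is not expanded in probe commands:

```yaml
readinessProbe:
  exec:
    # assumed home directory of the container user; adjust to your image
    command: ["test", "-f", "/home/puppet/.r10k_code_cronjob.success"]
  periodSeconds: 30
```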

CC: @underscorgan @slconley

@krallin
Collaborator

krallin commented Apr 19, 2020

So, you're saying you want to run r10k on a periodic basis, and you want the container to exit if it ever fails. Is that correct?

@Xtigyro
Author

Xtigyro commented Apr 19, 2020

Yes, correct.
Thanks for your time, BTW!

@Xtigyro
Author

Xtigyro commented Apr 26, 2020

@krallin Another option would be for supercronic to make the status of the scheduled jobs externally visible - and I mean not only by logging to stdout.

If you set K8s readiness/liveness probes for the container that supercronic is always running in - you could check the state of the jobs (just examples):

  • from a supercronic local log file that matches the name of the job.
  • by running supercronic status [name_of_your_job].
  • through an HTTP API of supercronic where the status of the scheduled jobs is available.
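
As an illustration of the first option, a probe command could grep a per-job status file. This is purely a sketch - supercronic writes no such file today, and the path, file format, and `job_healthy` helper are all made up:

```shell
#!/bin/sh
# Hypothetical per-job health check. Assumes a (not yet existing) status
# file that supercronic would write after each run of the named job.
job_healthy() {
  # succeed only if the status file exists and records an "ok" run
  [ -f "$1" ] && grep -q '^ok$' "$1"
}

# Demo against two simulated status files
printf 'ok\n' > /tmp/r10k_code.status
job_healthy /tmp/r10k_code.status && echo "r10k_code: healthy"

printf 'failed\n' > /tmp/broken_job.status
job_healthy /tmp/broken_job.status || echo "broken_job: unhealthy"
```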

CC: @underscorgan @slconley @Iristyle

@hartwork

The SIGTERM feature requested in the first post can already be emulated by wrapping the command in the crontab line in bash -c "original command here || kill -15 $PPID", where $PPID is the process ID of supercronic as provided by the shell. For a demo:

# ./supercronic -debug <(echo '* * * * * * * bash -c "false || kill -15 $PPID"')
INFO[2020-11-14T15:36:12+01:00] read crontab: /dev/fd/63                     
DEBU[2020-11-14T15:36:12+01:00] try parse(7): * * * * * * * bash -c "false || kill -15 $PPID"[0:13] = * * * * * * * 
DEBU[2020-11-14T15:36:12+01:00] job will run next at 2020-11-14 15:36:13 +0100 CET  job.command="bash -c \"false || kill -15 $PPID\"" job.position=0 job.schedule="* * * * * * *"
INFO[2020-11-14T15:36:13+01:00] starting                                      iteration=0 job.command="bash -c \"false || kill -15 $PPID\"" job.position=0 job.schedule="* * * * * * *"
INFO[2020-11-14T15:36:13+01:00] received terminated, shutting down           
INFO[2020-11-14T15:36:13+01:00] waiting for jobs to finish                   
INFO[2020-11-14T15:36:13+01:00] job succeeded                                 iteration=0 job.command="bash -c \"false || kill -15 $PPID\"" job.position=0 job.schedule="* * * * * * *"
DEBU[2020-11-14T15:36:13+01:00] job will run next at 2020-11-14 15:36:14 +0100 CET  job.command="bash -c \"false || kill -15 $PPID\"" job.position=0 job.schedule="* * * * * * *"
DEBU[2020-11-14T15:36:13+01:00] shutting down                                 job.command="bash -c \"false || kill -15 $PPID\"" job.position=0 job.schedule="* * * * * * *"
INFO[2020-11-14T15:36:13+01:00] exiting

Would that be too brittle?

At a bird's-eye level, terminating supercronic on failures seems to be intended as a means of signaling to the outside world. From a monitoring point of view, rather than waiting for someone to signal "I'm dead", checking for repeated "I'm alive" would probably make for more robust monitoring. So if the context is monitoring, either the existing Prometheus HTTP endpoint or a new HTTP status endpoint would probably serve the bigger picture best. What do you think?

@hartwork

@Xtigyro any thoughts?

@Xtigyro
Author

Xtigyro commented Nov 24, 2020

@hartwork It's required so Kubernetes can monitor the Liveness/Readiness probes. A new HTTP status endpoint will be perfect!
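
For reference, a vanilla K8s httpGet probe only inspects the HTTP status code, so a hypothetical status endpoint would be wired roughly like this - the /status path and the port are assumptions, as no such endpoint exists in supercronic today:

```yaml
livenessProbe:
  httpGet:
    path: /status   # hypothetical supercronic status endpoint
    port: 9746      # assumed port
  periodSeconds: 30
  failureThreshold: 1
```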

@hartwork

@Xtigyro what about the bash wrapper and the existing Prometheus endpoint?

@Xtigyro
Author

Xtigyro commented Nov 24, 2020

@hartwork The Bash wrapper is a bit dirty but can be used. The Prom endpoint cannot work with a vanilla K8s cluster.

@krallin
Collaborator

krallin commented Nov 27, 2020

@Xtigyro : can you clarify what you mean by:

The Prom endpoint cannot work with a vanilla K8s cluster.

I'm not sure I follow what cannot work there. FWIW, I do think exposing a status API would be reasonable to do, but I also would like to make sure that whatever does not work with the Prometheus endpoint would work with a status API :)

@hartwork

hartwork commented Nov 27, 2020

The Prom endpoint cannot work with a vanilla K8s cluster.

I think he's referring to Kubernetes liveness HTTP requests. Kubernetes looks at HTTP status codes and considers 200 <= x < 400 a success. It doesn't seem to support checking for the absence of the word supercronic_failed_executions in the HTTP response body. @Xtigyro could you confirm or correct me on this?

It might be possible to build a Kubernetes liveness command that does the same, based on nothing but Bash, wget, and supercronic's existing Prometheus metrics endpoint. I imagine something like this would work well in practice:

bash -c '( set -o pipefail ; wget -O- http://localhost:9746/metrics 2>/dev/null | fgrep supercronic_ | { ! grep -q failed_executions ; } )'

What do you think?
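
In a pod spec, the check above could be wired as an exec liveness probe, roughly like this sketch - the timings are arbitrary, and the port assumes supercronic was started with -prometheus-listen-address localhost:9746:

```yaml
livenessProbe:
  exec:
    command:
      - bash
      - -c
      - "set -o pipefail; wget -O- http://localhost:9746/metrics 2>/dev/null | fgrep supercronic_ | { ! grep -q failed_executions ; }"
  periodSeconds: 60
  failureThreshold: 3
```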

@Xtigyro
Author

Xtigyro commented Nov 28, 2020

@hartwork @krallin Yup - spot on - that was exactly my point.

I'm gonna test that command and let you know - it looks alright to me, should be useful.

On a side note - in the Puppet Server Helm chart we've worked around it like this: https://github.com/puppetlabs/puppetserver-helm-chart/blob/master/templates/r10k-code.configmap.yaml#L20_L40

CC: @underscorgan @slconley @Iristyle @scottcressi @mwaggett @nwolfe @adrienthebo @dhollinger @binford2k @alannab

@Xtigyro
Author

Xtigyro commented Nov 28, 2020

http://localhost:9746/metrics

@hartwork A question - is this exposed by default?

~ $ wget -O- http://localhost:9746/metrics
Connecting to localhost:9746 (127.0.0.1:9746)
wget: can't connect to remote host (127.0.0.1): Connection refused

@hartwork

@hartwork A question - is this exposed by default?

It needs something like supercronic -prometheus-listen-address localhost:9746 [..] but that's all. The :9746 can be omitted if your copy of supercronic is fresh enough to include PR #81.

@Xtigyro
Author

Xtigyro commented Dec 5, 2020

Guys, replacing one Shell command with another doesn't make enough of a difference for us.

One of these might improve the current situation:

  • from a supercronic local log file that matches the name of the job.
  • by running supercronic status [name_of_your_job].
  • through an HTTP API of supercronic where the status of the scheduled jobs is available.

If any of those is too time-consuming to develop - we will completely understand, and we should just close the feature request.

CC: @underscorgan @slconley @Iristyle @scottcressi @mwaggett @nwolfe @adrienthebo @dhollinger @binford2k @alannab

@hartwork

hartwork commented Dec 5, 2020

Guys, replacing one Shell command with another doesn't make enough of a difference for us.

@Xtigyro could you elaborate? You asked for a liveness check for vanilla kubernetes and it seems like the command above would satisfy that request. If it's not the actual set of requirements, please share the actual current set of requirements. Thank you.

@Xtigyro
Author

Xtigyro commented Dec 6, 2020

The current workaround we use is essentially a much simpler shell command, used in a different way as well - so it's more resource-efficient. We were looking for a real status health check compatible with K8s probes.

The current situation is good enough anyway - so if such a per-job status health check won't be beneficial for supercronic in general, then we'd better just leave it the way it is. It's still a great piece of software which we're gonna keep using.

All good, in other words. 🙇‍♂️
