forked from flux-framework/flux-core
sdexec: add a mechanism to time out units
Problem: an sdexec imp-shell unit can get stuck as follows:
- flux-shell is killed or terminates
- flux-shell leaves behind unkillable children
- the IMP won't exit until the cgroup is empty
- the job appears to be running until the IMP exits with the shell exit code

This adds a configurable stop timer to sdexec. It is configured via subprocess command options and is disabled by default:

SDEXEC_STOP_TIMER_SEC
  Specify the timeout value in seconds. If non-negative, this enables the stop timer.

SDEXEC_STOP_TIMER_SIGNAL
  Specify a signal to send to the unit after the timeout. By default, SIGKILL is used.

The behavior of the stop timer is as follows:
- The timer is activated when the unit enters the "deactivating" state.
- After STOP_TIMER_SEC seconds, STOP_TIMER_SIGNAL is sent to the unit.
- After another STOP_TIMER_SEC seconds, the unit is abandoned and the subprocess exec RPC is terminated with an error.

This can be used with Type=notify, together with changes to the IMP so that it notifies STOPPING=1 between shell exit and cgroup polling. STOPPING=1 causes the unit to transition to the deactivating state, which triggers the timer.
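The option handling and two-stage expiration described above can be modeled as a small state machine. The sketch below is a standalone illustration, not the flux-core implementation: it does not use the Flux reactor or systemd APIs, and every name in it (`struct stop_timer`, `stop_timer_init`, `stop_timer_on_state`, `stop_timer_on_expire`, `enum stop_action`) is hypothetical. It also assumes the signal option is a decimal signal number; the real option may accept other forms.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: what the caller should do when the timer fires. */
enum stop_action {
    STOP_ACTION_NONE,    /* timer not armed: nothing to do */
    STOP_ACTION_SIGNAL,  /* first expiration: send signum to the unit */
    STOP_ACTION_ABANDON, /* second expiration: abandon unit, fail the RPC */
};

struct stop_timer {
    int enabled;     /* set if SDEXEC_STOP_TIMER_SEC is non-negative */
    double sec;      /* timeout in seconds, used for both stages */
    int signum;      /* signal sent on the first expiration */
    int armed;       /* set once the unit enters "deactivating" */
    int expirations; /* times the timer has fired since arming */
};

/* Returns 1 (enabled, *sec set), 0 (unset or negative: disabled),
 * or -1 if the value is not a number. */
static int parse_stop_timer_sec (const char *s, double *sec)
{
    char *end;
    double val;

    if (!s)
        return 0; /* disabled by default */
    errno = 0;
    val = strtod (s, &end);
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    if (val < 0)
        return 0;
    *sec = val;
    return 1;
}

/* Assumption: decimal signal number, Linux numbering (1..64).
 * Defaults to SIGKILL when unset; returns -1 on invalid input. */
static int parse_stop_timer_signal (const char *s)
{
    char *end;
    long val;

    if (!s)
        return SIGKILL;
    errno = 0;
    val = strtol (s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || val < 1 || val > 64)
        return -1;
    return (int)val;
}

/* Configure from the two option strings (NULL if unset).
 * Returns 0 on success, -1 on invalid input. */
static int stop_timer_init (struct stop_timer *t,
                            const char *sec_opt,
                            const char *sig_opt)
{
    int rc;

    assert (t != NULL);
    memset (t, 0, sizeof (*t));
    if ((rc = parse_stop_timer_sec (sec_opt, &t->sec)) < 0)
        return -1;
    t->enabled = rc;
    if ((t->signum = parse_stop_timer_signal (sig_opt)) < 0)
        return -1;
    return 0;
}

/* Unit state change: arm the timer on entry to "deactivating". */
static void stop_timer_on_state (struct stop_timer *t, const char *state)
{
    if (t->enabled && !t->armed && strcmp (state, "deactivating") == 0)
        t->armed = 1;
}

/* Called each time the armed timer fires (every t->sec seconds). */
static enum stop_action stop_timer_on_expire (struct stop_timer *t)
{
    if (!t->armed)
        return STOP_ACTION_NONE;
    if (++t->expirations == 1)
        return STOP_ACTION_SIGNAL;
    t->armed = 0; /* second expiration: stop tracking the unit */
    return STOP_ACTION_ABANDON;
}
```

Keeping the decision logic separate from the event loop makes the two-stage behavior easy to check in isolation: a caller would feed it unit state changes (for example, the transition triggered by STOPPING=1) and timer expirations, then send the signal or abandon the unit based on the returned action.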
Showing 1 changed file with 120 additions and 1 deletion.