builds.py: let mq lock be reclaimed after 30min (#337)
A typical job takes about 5-10 min. If the lock has been held for 30 min
in a single job, we assume something has gone wrong and the lock should
be released so other jobs (or humans needing the machine) can proceed.

This step is only needed when the other timeouts have failed. It guards
against the case where both the lock release after timeout and the
post-step lock release have failed. This can happen when the machine
queue server is temporarily unreachable on the network and then comes
back with the lock still in place. In that case, there is nothing the
scripts here can do to release the lock.

Signed-off-by: Gerwin Klein <[email protected]>
lsf37 authored Mar 1, 2024
1 parent 750696d commit d136cb0
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions seL4-platforms/builds.py
@@ -451,8 +451,8 @@ def mq_run(success_str: str,


 def mq_lock(machine: str) -> List[str]:
-    """Get lock for a machine."""
-    return ['time', 'mq.sh', 'sem', '-wait', machine, '-k', job_key()]
+    """Get lock for a machine. Allow lock to be reclaimed after 30min."""
+    return ['time', 'mq.sh', 'sem', '-wait', machine, '-k', job_key(), '-T', '1800']


 def mq_release(machine: str) -> List[str]:
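
For illustration only, here is a minimal sketch of how the changed function assembles its command list. It assumes that the value after `-T` is a duration in seconds (1800 s = 30 min, as the commit message suggests). The `reclaim_after` parameter, the `LOCK_RECLAIM_SECONDS` constant, the stub `job_key()`, and the machine name `example-board` are hypothetical and not part of the commit; the actual change hard-codes `'1800'`.

```python
from typing import List

# Assumption: '-T' takes seconds; 30 min matches the '1800' in the commit.
LOCK_RECLAIM_SECONDS = 30 * 60


def job_key() -> str:
    """Hypothetical stand-in; the real job_key() lives elsewhere in builds.py."""
    return "example-job-key"


def mq_lock(machine: str, reclaim_after: int = LOCK_RECLAIM_SECONDS) -> List[str]:
    """Build the command that waits for the lock on `machine`.

    The trailing '-T <seconds>' is what lets the lock be reclaimed
    once it has been held for `reclaim_after` seconds.
    """
    return ['time', 'mq.sh', 'sem', '-wait', machine, '-k', job_key(),
            '-T', str(reclaim_after)]


if __name__ == "__main__":
    # Only builds and prints the argument list; nothing is executed.
    print(' '.join(mq_lock('example-board')))
```

With the default, the last two list elements are `'-T'` and `'1800'`, the same arguments the commit appends.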