-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify poll handling of prematurely deleted job log dir #72
Simplify poll handling of prematurely deleted job log dir #72
Conversation
# A polling task lets us know that a task has failed because it's | ||
# log folder has been deleted whilst the task was active: | ||
if ( | ||
getattr(ctx, 'out', None) | ||
and JOB_FILES_REMOVED_MESSAGE in ctx.out | ||
): | ||
LOG.error( | ||
f'Task {ctx.cmd[-1]} failed because task log directory' | ||
f'\n{"/".join(ctx.cmd[-2:])}\nhas been removed.' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved this error logging into the _poll_task_job_callback
method seeing as it only applies for poll, not kill or submit which also use this _manip_task_jobs_callback
method
self.task_events_mgr.process_message( | ||
itask, log_lvl, TASK_OUTPUT_FAILED, jp_ctx.time_run_exit, flag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a duplication of what the next line does anyway
cylc/cylc/flow/task_events_mgr.py
Lines 788 to 802 in 5b2b180
elif message.startswith(FAIL_MESSAGE_PREFIX): | |
# Task received signal. | |
if ( | |
flag == self.FLAG_RECEIVED | |
and itask.state.is_gt(TASK_STATUS_FAILED) | |
): | |
# Already failed. | |
return True | |
signal = message[len(FAIL_MESSAGE_PREFIX):] | |
self._db_events_insert(itask, "signaled", signal) | |
self.workflow_db_mgr.put_update_task_jobs( | |
itask, {"run_signal": signal}) | |
if self._process_message_failed( | |
itask, event_time, self.JOB_FAILED, forced | |
): |
# Main loop is roughly 1 second, but integer rounding may give an apparent 2 | ||
# seconds delay, so set threshold as 2 seconds. | ||
run_ok "${TEST_NAME_BASE}-poll-time" \ | ||
cmp_times "${PREDICTED_POLL_TIME}" "${ACTUAL_POLL_TIME}" '10' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was horribly flaky for me locally on 8.4.x. Also the timings test was so loose so as to be pointless (threshold was at some point set to 10s instead of 2s as mentioned in the comment). I don't think it's feasible to test the timings
749ff2f
into
wxtim:fix.handle_deletion_of_job_logs
No description provided.