-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log job failure even when there are retries configured #6169
base: 8.4.x
Are you sure you want to change the base?
Conversation
73714c8
to
8f20ab0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I don't think it's this simple.
- I think this diff means that polled log messages for task failure will go back to being duplicated.
- This only covers failure, but submission failure also has retries so may require similar treatment.
- I think the failures before retries are exhausted will now get logged at CRITICAL level rather than INFO.
I think that you have a particular closed issue in mind, but I can't find it... Can you point it out to me?
I think that submission failure is already handled correctly - it certainly is in the simplistic case where you feed it
These are logged at critical - and I think they should be?
This would be consistent with submit failure... |
2c7e480
to
3cedf2f
Compare
No, I'm not thinking of the other log message duplication issue. The change made here bypassed logic that was used for suppressing duplicate log messages (the 8f20ab0#diff-d6de42ef75ecc801c26a6be3a9dc4885b64db89d32bce6f07e319595257b9b2eL930 However, in your more recent "fix" commit, you have put this back the way it was before: 3cedf2f#diff-d6de42ef75ecc801c26a6be3a9dc4885b64db89d32bce6f07e319595257b9b2eR930 |
This does not apply to submit failure, because submit failure will always log a critical warning through the
|
3cedf2f
to
1341355
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Test failures)
1341355
to
ce0498e
Compare
@wxtim this seems to have diverged a bit from what I thought was agreed above, which was:
Now, if there are retries we only get the retry warning. (Which I think is back to the problem we were trying to fix here, although the logging location has changed to the methods that would support the fix). |
fc12804
to
5453cac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The execution retry log message is now:
WARNING - [1/foo:waiting] failed/ERR - retrying in ...
Whereas the submission retry log message is still just:
WARNING - [1/foo:waiting] retrying in ...
I think it would make sense to update this too.
Here's a screenshot comparing two execution failures (one retry) with two submission failures (one retry): Contrary to @MetRonnie 's suggestion (I think) you just need to get rid of the |
This comment was marked as resolved.
This comment was marked as resolved.
Not really necessary because both entail a full "retry" (i.e., submit the job, then execute it). |
Still waiting on my suggestion above, I think. In case it got a bit lost in the log lines: # submission retry (good)
ERROR - [1/foo/01:preparing] submission failed
INFO - [1/foo/01:preparing] => waiting
WARNING - [1/foo:waiting] retrying in PT5S (after 2025-01-09T11:08:20+13:00) # execution retry (needs tweak)
ERROR - [1/foo/01:running] failed/ERR
INFO - [1/foo/01:running] => waiting
WARNING - [1/foo:waiting] failed/ERR - retrying in PT5S (after 2025-01-09T11:05:49+13:00) (Pretty minor, but there's no need to double-log the |
Done, but not pushed, because I was was also looking at Ronnie's Comments about the tests. |
This PR has stalled, what needs to be done? |
I thought I'd responded to Hilary's comments. Will double check that I have and poke him. Also rebase. |
81a3c87
to
4cb02e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now 🎉
This can go to 8.4.1 IMO - a quick review at this point, in spite of the rather long history! |
Co-authored-by: Hilary James Oliver <[email protected]> response to review
4cb02e4
to
08a7265
Compare
Closes #6151
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
(andconda-environment.yml
if present).CHANGES.md
entry included if this is a change that can affect users?.?.x
branch.