job status transition in ReqClient vs JobStatus #7799
Comments
Just a few days ago we created #7794, but we discussed that it would not make sense to move jobs from "COMPLETED" to "KILLED". The code in question was introduced in #5650 and I cannot find the operational reason behind it. I can see in LHCb that we currently have 2 jobs "DONE" with minorStatus "Marked for termination". I need some arguments on how to continue.
We have jobs with Status='Completed' and MinorStatus='Marked for termination'.
Looking into the job LoggingInfo, the user supposedly killed the job while it was in RMS, with Status='Completed'.
I think in this case the final job state should be Done or Failed, as in the checks just above. So the lines https://github.com/DIRACGrid/DIRAC/blob/v8.0.51/src/DIRAC/RequestManagementSystem/Client/ReqClient.py#L371-L373 can be removed. If we hit this case, it means the job kill command was given too late to influence the job execution: the job had already been executed, so the kill command should simply be ignored. This can be coded in JobManager.__killJob(): set the status to Killed and, only if that succeeds, set the minor status to "Marked for termination" in a separate call.
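To make the proposal concrete, here is a minimal sketch of the "set Killed first, then the minor status" ordering. It is not DIRAC code: `Job`, `set_status`, `kill_job` and the `ALLOWED` table are hypothetical names invented for illustration, and the table only assumes that Completed can go to Done or Failed but not to Killed.

```python
# Hypothetical, simplified model. None of these names are DIRAC's real API.
ALLOWED = {
    "Running": {"Killed", "Completed", "Failed"},
    "Completed": {"Done", "Failed"},  # assumption: no Completed -> Killed
}


class Job:
    def __init__(self, status, minor_status=""):
        self.status = status
        self.minor_status = minor_status


def set_status(job, new_status):
    """Apply the transition only if the table allows it; return success."""
    if new_status in ALLOWED.get(job.status, set()):
        job.status = new_status
        return True
    return False


def kill_job(job):
    """Set Killed first; touch the minor status only if that succeeded."""
    if not set_status(job, "Killed"):
        return False  # the kill came too late, so it is ignored
    job.minor_status = "Marked for termination"
    return True
```

With this ordering, a job that is already Completed keeps both its status and its minor status when a late kill arrives, so it can still be finalized as Done or Failed.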
That may be right from the system point of view. But requests may take time to be processed,
I thought "Marked for termination" is meant to be set as the MinorStatus when the Status cannot be set to "Killed" immediately, so that the status can go to "Killed" once there is no more blocker.
This code suggests a transition of job status from 'Completed' to 'Killed'
https://github.com/DIRACGrid/DIRAC/blob/v8.0.51/src/DIRAC/RequestManagementSystem/Client/ReqClient.py#L371-L373
but there is no such state transition defined in
https://github.com/DIRACGrid/DIRAC/blob/v8.0.51/src/DIRAC/WorkloadManagementSystem/Client/JobStatus.py#L88
Which is the policy?
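The mismatch can be illustrated with a small check that compares the transitions a client attempts against the ones a state table defines. This is only a sketch: `DEFINED` is a hypothetical stand-in for the table in JobStatus.py (the real one may differ), and `undefined_transitions` is an invented helper name.

```python
# Hypothetical transition table; the real one lives in JobStatus.py and may differ.
DEFINED = {
    "Completed": {"Done", "Failed"},
}


def undefined_transitions(attempted, defined):
    """Return the (source, target) pairs that the table does not define."""
    return [
        (src, dst)
        for src, dst in attempted
        if dst not in defined.get(src, set())
    ]


# Transitions a client such as ReqClient might attempt (illustrative list).
attempted = [
    ("Completed", "Done"),
    ("Completed", "Failed"),
    ("Completed", "Killed"),
]
print(undefined_transitions(attempted, DEFINED))
```

Under these assumptions only the Completed to Killed pair is reported, which is the case this issue asks about.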