-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Have BKILL signal SIGKILL after SIGTERM #7482
Conversation
I wanted to use
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7482 +/- ##
==========================================
- Coverage 84.51% 84.45% -0.07%
==========================================
Files 387 387
Lines 23081 23087 +6
Branches 877 886 +9
==========================================
- Hits 19507 19498 -9
- Misses 3463 3482 +19
+ Partials 111 107 -4 ☔ View full report in Codecov by Sentry. |
182a004
to
fae11b0
Compare
|
fae11b0
to
1ce8dda
Compare
Have you actually tested that it works? |
Use |
28b7b3b
to
d6140b4
Compare
I am not sure if the C part works. It seems to be logging duplicate. I wonder what happens when I pass NULL to the stdout-arg/stderr-arg instead of nullptr or a file. Is it then redirected to stdout? (Notice how it in the middle stops for a bit, then starts printing the same JobId again. Is this from the separate process?)
EDIT:
|
d6140b4
to
a0ba7c8
Compare
They actually had a really good description! |
77f6356
to
8e3f6b9
Compare
job->lsf_jobnr_char); | ||
|
||
spawn_blocking(driver->rsh_cmd, 2, (const char **)argv, NULL, NULL); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why this spawn_blocking
is removed, what use is there of argv
when you just free
it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this was removed by accident
free(argv[1]); | ||
free(argv); | ||
|
||
char **argv2 = (char **)calloc(2, sizeof *argv2); | ||
argv2[0] = driver->remote_lsf_server; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not checked what CHECK_ALLOC is doing, but should it be included here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK_ALLOC
was something that was introduced because that's what util_malloc
did. There is no reason for it to exist.
8e3f6b9
to
5b063dd
Compare
This commits makes the LSF driver spawn a non-blocking separate process running `bkill -s SIGKILL` to make sure the job is actually torn down.
5b063dd
to
7220a76
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now! 🚀
Issue
Resolves #7405
Approach
This PR makes the LSF drivers spawn a new process to run
bkill -s SIGKILL
in a fire-and-forget manner.(Screenshot of new behavior in GUI if applicable)
When applicable