Race condition with rstrnt-reboot giving control back to the runtest script #219
Comments
A If |
@veruu @wackrat I've written code to include a trap for SIGTERM, but I agree with wackrat that it's not a foolproof solution.
@cbouchar we added that as a workaround to all our tests when we ran into this problem. However, other people who don't have it in place for all of the tests they are running may hit the same issue, so it may still be a good idea to fix this on the restraint side, or at least to document the need clearly and prominently.
As long as callers of It is possible to assign a command with |
@wackrat |
We hit a race condition with rstrnt-reboot where the restraint process got terminated by systemd and it returned execution back to the test script. This happened when running restraint in standalone mode.
The specific case where we hit this problem: we have a script that installs a kernel, reboots the machine, and checks that the correct new kernel booted.
You can imagine the short version of the script as (pseudocode)
(Full version is available at https://gitlab.com/cki-project/kernel-tests/-/tree/main/distribution/kpkginstall)
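For readers without access to the full script, here is a rough sketch of that install/reboot/verify flow. It is not the kpkginstall code: `install_kernel`, `run_test`, and `EXPECTED_KERNEL` are placeholder names made up for illustration, and it assumes restraint exports `RSTRNT_REBOOTCOUNT` to tell the task how many times it has rebooted.

```shell
#!/bin/bash
# Hypothetical sketch only; not the real kpkginstall script.
# install_kernel, run_test and EXPECTED_KERNEL are illustrative names.

install_kernel() {
    # Placeholder: the real script installs the kernel package under test.
    :
}

run_test() {
    if [ "${RSTRNT_REBOOTCOUNT:-0}" -eq 0 ]; then
        # First pass: install the kernel, then ask restraint to reboot.
        install_kernel || return 1
        rstrnt-reboot
        # Not expected to be reached. No result is reported in this branch,
        # so if rstrnt-reboot does return, its return code becomes the
        # return code of the test.
    else
        # Second pass, after the reboot: check which kernel is running.
        [ "$(uname -r)" = "$EXPECTED_KERNEL" ]
    fi
}
```

A real runtest script would call `run_test` as its last command, so the task exits with whatever the function returns.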
After rstrnt-reboot was called, our logs contained an unexpected line
The index file generated by restraint had a new "exit_code" FAIL result.
In the console log, we see messages from systemd sending SIGTERM to running tasks:
The timing of all these logs and results matches up. The rstrnt-reboot command terminated and returned control to the test script, which then exited with the nonzero retcode of rstrnt-reboot (since that's the last command it ran). The script exiting with the retcode of the last executed command is expected, as there is no result reporting in that execution branch. What is not expected is rstrnt-reboot terminating and returning control before the actual reboot happens. The code for rstrnt-reboot confirms this, as it explicitly states this should not happen: https://github.com/beaker-project/restraint/blob/master/scripts/rstrnt-reboot#L13
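Until this is handled on the restraint side, a test can guard the call itself. The sketch below is one possible shape for the SIGTERM trap workaround mentioned in the comments, not the code anyone in this thread actually uses: `reboot_or_fail` and its timeout argument are made-up names. It ignores SIGTERM around the call and then waits, so the script does not fall through to its normal exit path if rstrnt-reboot returns early.

```shell
#!/bin/bash
# Hypothetical guard; reboot_or_fail and its timeout are illustrative
# names, not part of restraint.

reboot_or_fail() {
    local wait_secs="${1:-600}"

    # Ignore SIGTERM in this shell and in the commands started from it,
    # so the signal sent during shutdown does not end the test early.
    trap '' TERM

    # The return code is deliberately discarded: if control comes back
    # here at all, the reboot has not happened yet.
    rstrnt-reboot || true

    # Give the machine time to actually go down.
    sleep "$wait_secs"

    # Still running after the wait: restore the default SIGTERM handling
    # and report the failure to the caller.
    trap - TERM
    echo "rstrnt-reboot returned and no reboot followed within ${wait_secs}s" >&2
    return 1
}
```

As noted in the comments, this is not foolproof: it only helps tests that remember to use it, and ignoring SIGTERM may make the shutdown wait longer before the process is killed.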