-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always terminate the container if pihole-FTL binary exits #1649
Conversation
…urally or via an error. Don't attempt to restart it, allow the container's restart policy to do this. Signed-off-by: Adam Warner <[email protected]>
c762e72
to
210a117
Compare
Are you trying to re-invent what |
Not hugely - S6 is heavier than it needs to be for what we're doing. As this stands, its what the container should have been doing from the start. If FTL is not running, then the container should also stop. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
How much overhead is there when starting the container? I think I've seen in some thread (sorry, don't have a precise reference right now) a mentioning that restarting the container is slow on low-end hardware because of something else (gravity?) happening before the container comes to the point where FTL is launched. Edit: Reference here: pi-hole/FTL#2077
Note this is a v5 deployment, v6 may already have improved on this. |
That depends entirely. This function is run on every container start before FTL is started: docker-pi-hole/src/bash_functions.sh Lines 103 to 125 in 89eb7f6
If it is a new container, and
In the v5 image, gravity runs on every container start although there is an environment variable there to disable that behaviour Note that this environment variable does nothing on V6, as we have removed the gravity on boot entirely (unless the database does not exist) |
I still think the hand-crafted solution this PR adds is more complicated and error-prone than using s6. The issue I see here is that we are starting FTL from within a bash script that is sourced by another bash script, running it in the background, moving the exit code around and kill one bash script in case of failure from within another. The logic is difficult to follow. s6 adds ~5,2 MB when added via a multi-stage build. We could easily add different "pre-start" scripts e.g. to migrate v5-> v6 (and maybe later v6 -> v7), could use s6's
|
…start_ftl process and pass it's exit code to stop. Add in additional while/wait to ensure that after the FTL log exists, it contains the "FTL Started" line Signed-off-by: Adam Warner <[email protected]>
5336f3d
to
9e37aa8
Compare
I'm still not a fan of adding s6 - removing it was one of the motivating factors for the v6 rewrite. I've made some adjustments based on your comments, and it should be much easier to follow now |
Co-authored-by: RD WebDesign <[email protected]> Signed-off-by: Adam Warner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I shut it down with docker stop pihole
and the exit code reported is not the one from FTL, but from the container istself
2024-10-08 19:40:29.879 UTC [164M] INFO: Thread housekeeper (1) is idle, terminating it.
2024-10-08 19:40:29.879 UTC [164M] INFO: All threads joined
2024-10-08 19:40:29.880 UTC [164M] INFO: Stored 0 API sessions in the database
2024-10-08 19:40:30.630 UTC [164M] INFO: ########## FTL terminated after 9s 122ms (code 0)! ##########
[i] pihole-FTL exited with status 143
Is it expected that FTL crashes when signal 11 is sent?
Apparently
|
Yeah, I saw this, this is actually SIGTERM, and signals that the container gracefully stopped. What I am not sure about here, though, is why it's getting populated in the exit code of FTL Signal 11 is what @DL6ER told me to use in order to simulate a crash |
Log after a
|
Yes trap stop TERM INT QUIT HUP ERR
[snip]
stop() {
local FTL_EXIT_CODE=$1
# Only attempt to close pihole-FTL if it is running, it may already have crashed
if pgrep pihole-FTL >/dev/null; then
echo ""
echo " [i] Container stop requested..."
echo " [i] pihole-FTL is running - Attempting to shut it down cleanly"
echo ""
# Ensure pihole-FTL shuts down cleanly on SIGTERM/SIGINT
ftl_pid=$(pgrep pihole-FTL)
killall --signal 15 pihole-FTL
[snip] |
Maybe it is due to the return in the trap |
@yubiuser try this |
On a crash, this is the signal that is sent by the kernel to the process to allow for a (semi-)graceful shutdown. Sending it manually to the process is entirely artificial in this case but indeed a rather often used mechanism to simulate a crash. FTL will be unable to say where the crash happened in this case as seen in
(copied from the PR description text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See individual comments.
# Wait until the FTL log contains the "FTL started" message before continuing | ||
while ! grep -q '########## FTL started' /var/log/pihole/FTL.log; do | ||
sleep 0.5 | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What when the container is (re)started more than once per day? In this case, you may have this message multiple times in the file. I don't think restarting the container on the same day is unlikely, especially when users are playing around with environment variables, etc.
Obviously my comment assumes you have persistent /var/log
but this seems like a sensible thing users may be doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking again at:
pihole | 2024-10-06 22:42:42.417 BST [56M] INFO: ########## FTL started on 0d9149f7f785! ##########
A working solution may be reading the PID from the PID file and expecting this in the log, like
# Wait until the FTL log contains the "FTL started" message before continuing | |
while ! grep -q '########## FTL started' /var/log/pihole/FTL.log; do | |
sleep 0.5 | |
done | |
# Wait until the FTL log contains the "FTL started" message before continuing | |
FTL_PID_FILE="$(getFTLConfigValue files.pid)" | |
FTL_PID="$(cat "${FTL_PID_FILE}")" | |
while ! grep -q "[${FTL_PID}M] INFO: ########## FTL started" /var/log/pihole/FTL.log; do | |
sleep 0.5 | |
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried a few times to get this grep working (I had to put the FTL_PID=
part into a while loop until it was populated otherwise FTL_PID
was blank - however the grep didn't work, and I don't really know why!
@@ -75,33 +97,63 @@ start() { | |||
# Start tailing the FTL log from the most recent "FTL Started" message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Start tailing the FTL log from the most recent "FTL Started" message
This existing comment (emphasis on "the most recent") seems to support my comment above concerning the non-uniqueness of this message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends on the machine (just from looking at logs posted by others here) but the FTL pid is almost always the same between container restarts/destructions - so it doesn't really make for much of a unique identifier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although now in three consecutive runs on my machine, this does not ring true. 55, then 54, then 53
e640e08
to
931b216
Compare
echo "" | ||
echo " [i] Container stop requested..." | ||
echo " [i] pihole-FTL is running - Attempting to shut it down cleanly" | ||
echo "" | ||
# Ensure pihole-FTL shuts down cleanly on SIGTERM/SIGINT | ||
ftl_pid=$(pgrep pihole-FTL) | ||
|
||
killall --signal 15 pihole-FTL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if it's worth doing something like:
(sleep 3 && killall --signal 15 pihole-FTL) &
wait $CAPSH_PID
in case the $CAPSH_PID is already dead by the time we get to it after killing FTL? Or even
killall --signal 15 pihole-FTL &
wait $CAPSH_PID
Am I just being overly paranoid about slower systems here?
Last commit fixed the propagation and stop via
|
Huh, could have sworn it did when I tried it... I'll go back and try again! |
931b216
to
add6973
Compare
… code being passed to it, then FTL is still running - so we stop FTL and then get the exit code from the capsh PID by immediately `wait`ing it after killing FTL Signed-off-by: Adam Warner <[email protected]>
What does this PR aim to accomplish?:
Inspired by a conversation with @DL6ER - currently if pihole-FTL crashes, the container just sits there doing nothing - users are unable to do anything with it.
This PR addresses the issue by ensure that the container is always terminated in case of
pihole-FTL
being terminated. We also store the exit code of FTL, so that we can exit the container with the same code (useful in case ofon-failure[:max-retries]
restart policy)`docker stop pihole`
`docker exec -it pihole killall --signal 15 pihole-FTL`
`docker exec -it pihole killall --signal 11 pihole-FTL`
By submitting this pull request, I confirm the following:
git rebase
)