Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processess spawned with 'cmd' not getting terminated when your Ctrl-C #169

Open
dbp opened this issue Sep 13, 2014 · 34 comments
Open

Processess spawned with 'cmd' not getting terminated when your Ctrl-C #169

dbp opened this issue Sep 13, 2014 · 34 comments

Comments

@dbp
Copy link

dbp commented Sep 13, 2014

So I have a phony task that starts a process. I expect that if I Ctrl-C, that the signal will be sent to the process, and then shake will continue / terminate (and this is how make works) . But, instead shake terminates (with 'user interrupt'), and the process is still running in the background.

I looked at the various options to run commands with, but didn't see anything obvious. Is the current behavior expected? It seems wrong...

@dbp
Copy link
Author

dbp commented Sep 13, 2014

Just as a followup, I've replaced calls to cmd Shell with system from System.Process and that does have the right semantics.

@ndmitchell
Copy link
Owner

I haven't thought about the issue much, so thanks for your insight. However, when running Shake in staunch mode (which is where the approaches differ most) I would expect ctrl-c to exit Shake, and Shake to clean up the processes (if possible) not just abort the one process. That seems not to match make? I need to do some experimentation.

@dbp
Copy link
Author

dbp commented Sep 14, 2014

I think it really depends on what the processes are intended to do. I've been using Shake to run tasks and start up (sometimes interactive) processes (something I've used make for in the past). In that context, when there is an interactive process (which is not really something that's currently defined), I think I would want input to go to it. In particular, if there was a sequence of commands, it's possible that I would want to do some cleanup in case a command failed, and I think aborting it manually counts as a failure case.

But when the processes aren't interactive, I think what you are describing makes a lot more sense. Because one really common annoyance with, for example, shell scripts that run various processes, is when you have to Ctrl-C repeatedly until you can get the signal to get delivered when the script is what's running, so it actually exits (or just look up it's PID and deliver the signal with kill).

And of course you could expect many processes to be running at once in the non-interactive case, at which point there is no sensible answer to who should recieve a signal (unless it's all of them).

So perhaps there should be some distinction between interactive commands and non? Or, maybe the documentation should just point out that using cmd for interactive stuff isn't really recommended, and just point to System.Process (which I haven't run into a problem with yet).

@ndmitchell
Copy link
Owner

You can certainly always call System.Process, but I'd like it to be possible with cmd too. I agree we probably want interactive vs non-interactive. I'll use this ticket to add Interactive as a setting, which gives the command stdin and responds to Ctrl-C as you describe. I suspect the code will be easy to write, I'm just not sure how to go about automatically testing it since it's all about what happens when you hit the keyboard.

@dbp
Copy link
Author

dbp commented Sep 15, 2014

Well, you can test at the very least that cmd Interactive still runs. But yeah, seems hard to test that the special behavior is working. At least, though, it will be very obvious to users of Shake if it stops working (so it shouldn't lead to subtle bugs - famous last words).

@ndmitchell
Copy link
Owner

One of the reasons I'm really keen on having automated tests is that I develop on Windows, and my continuous integration tests run on Linux, so if I have tests I can be sure it works consistently. It's not fatal if I can't do that, but it is annoying.

Testing it out, I found that stdin in both is inherited from the process, so it seems cmd already works just fine with stdin, and any changes just concern Ctrl-C.

For Ctrl-C handling the docs (and my experiments) suggest that even using System.Process you still get a user exception, and the thread still dies. If you want to clean up after a process then using actionFinally and actionOnException might be sufficient. Can you give me a small example where process works but cmd fails? Are you catching exceptions yourself?

@ndmitchell
Copy link
Owner

See also snowleopard/hadrian#696 (comment) which describes issues around this problem

@AndreasPK
Copy link

So I tried this using the following code:

import Development.Shake
import Development.Shake.Command
import Development.Shake.FilePath
import Development.Shake.Util
import Control.Concurrent
import System.Process

main :: IO ()
main = shakeArgs shakeOptions{shakeFiles="_build"} $ do
    want ["_build/run" <.> exe]



    "_build/run" <.> exe %> \out -> do
        -- cmd_ "Sleep.exe" "" :: Action ()
        liftIO $ system "Sleep.exe"
        liftIO $ threadDelay $ 1000*1000*100

And at least for me under mintty the behaviour of cmd, system and process was the same.

@ndmitchell
Copy link
Owner

@AndreasPK and what was that behaviour? Did it work and clean up properly?

@ndmitchell
Copy link
Owner

FWIW I've pushed a lot of patches to this part of the code since the issue was first raised 5 years ago.

@AndreasPK
Copy link

@AndreasPK and what was that behaviour? Did it work and clean up properly?

No, it fails to terminate the subprocess and stays active in the background (when run in in the msys shell).

@hasufell
Copy link

hasufell commented Dec 6, 2019

This is still a problem. We have a jenkins that runs shake, which runs docker. If we cancel a job, docker will keep running, because SIGTERM is not propagated.

@ndmitchell
Copy link
Owner

@hasufell - is this pure Linux, or involving msys/mingw or similar?

@hasufell
Copy link

hasufell commented Dec 8, 2019

@ndmitchell linux

@ndmitchell
Copy link
Owner

I note that Docker seems to behave funny in #731 too, CC @symbiont-arnaud-bailly. I've no real idea what is happening here (and completely out of my depth). Maybe the way Docker and System.Process interact is different to normal?

@hasufell
Copy link

hasufell commented Dec 14, 2019

I don't think this has anything to do with docker. Sending SIGTERM to our docker containers manually works fine. We are not using actionFinally. If you look at the behavior of make, then it sends a SIGTERM to the subprocesses it has spawned. Shake doesn't.

#!/bin/sh

trap 'echo kill && kill -KILL $child 2>/dev/null' TERM
sleep 500000 &
child=$!
wait $child
all:
	./run.sh
#!/usr/bin/env -S cabal new-run
{- cabal:
build-depends: base
            , shake
-}

module Main (main) where

import Development.Shake
import Development.Shake.Classes
import Development.Shake.Command
import Development.Shake.FilePath
import Development.Shake.Rule
import Prelude hiding ((*>))



main :: IO ()
main = do
  shakeArgs shakeOptions { shakeFiles = "_build/"
                         , shakeVerbosity = Loud
                         , shakeThreads = 0
                         , shakeVersion = "4"} $ do

    phony "foo" $ do
      cmd_ "./run.sh"
      pure ()

sending SIGTERM to the shake process will not propagate to the script.

@ndmitchell
Copy link
Owner

The actionFinally ticket is probably not anything to do with actionFinally, but it has Docker as the example command that fails to terminate, making it likely to be the same as your example.

In Shake, when an exception happens, I call interruptProcessGroupOf which is documented as sending SIGINT. Then three seconds after that returns I call terminateProcess which sends SIGKILL. Is that sufficient to kill Docker? Does the SIGINT mess things up? Does the function abort in Shake get called?

(I'm on Windows not Linux, so happy to take any advice on the right thing to do here.)

@hasufell
Copy link

hasufell commented Dec 15, 2019

Then three seconds after that returns I call terminateProcess which sends SIGKILL.

Afais it sends SIGTERM.

Is that sufficient to kill Docker?

Yes, as described in the minimal example, this has nothing to do with docker.

Does the SIGINT mess things up?

SIGINT propagates properly when I press ctrl-c and abort gets called.

Does the function abort in Shake get called?

When you send SIGTERM to the shake process: no. But this is the expected behavior, which is convention in make too.

@hasufell
Copy link

hasufell commented Dec 15, 2019

Afais, the problem is

withCreateProcessCompat cp $ \inh outh errh pid ->
withExceptions (abort pid) $ withTimeout poTimeout (abort pid) $ do

withExceptions :: IO () -> IO a -> IO a
withExceptions stop go = do
bar <- newBarrier
v <- mask $ \unmask -> do
forkFinally (unmask go) $ signalBarrier bar
unmask (waitBarrier bar) `onException` do
forkIO stop
waitBarrier bar
either throwIO return v

This is not signal handling code at all. This is just catching asynchronous exceptions. SIGINT seems to be one, but SIGTERM is not.

Also see:

@ndmitchell
Copy link
Owner

Should I be installing a signal handler that turns SIGTERM into an exception so everything terminates properly? Should all Haskell processes be doing that?

@hasufell
Copy link

To me it seems async exceptions are messy. Taken from the gnu manual

Macro: int SIGTERM
The SIGTERM signal is a generic signal used to cause program termination. Unlike SIGKILL, this signal can be blocked, handled, and ignored. It is the normal way to politely ask a program to terminate.

The shell command kill generates SIGTERM by default.

So, in my opinion, a program that receives SIGTERM and decides it is OK to terminate, should also clean up after itself. That is... send SIGTERM to sub-processes, wait for a grace period, then send SIGKILL.

I don't think the question can be answered for every process, though.

@ndmitchell
Copy link
Owner

It seems that if your advice is right (and I have no reason to suspect its not!), then it should be the default of every Haskell program to turn SIGTERM into an async exception and let that take out the program. Why isn't that done for everyone by default? I don't think anything makes Shake special in this regard. Just trying to understand the context before doing something different for Shake.

@hasufell
Copy link

hasufell commented Dec 15, 2019

I think the problem here is that as the gnu manual says: SIGTERM may be ignored!

If we turn it into an async exception and people don't catch it... oops. I think this isn't meant to be an exception. So in my opinion it's really up to the program implementor on whether and how to define it. For everything else, there is SIGKILL.

@symbiont-arnaud-bailly
Copy link

FWIW, I have written some tool last year that spawned processes and managed them, and I baked in signal handling (SIGINT and SIGTERM essentially) because that's the expected behaviour in Unix-land: When a user hits Ctrl-C it expects the foreground process to terminate and cleanup properly.

@hasufell
Copy link

hasufell commented Dec 16, 2019

When a user hits Ctrl-C it expects the foreground process to terminate and cleanup properly.

That's true. When you run the following C program without explicit signal handling and hit Ctrl-C, then it terminates the child processes as well:

#include <unistd.h>

int main() {
  int mypid = fork();

  if (0 == mypid) {
    sleep(500000);
  } else {
    while (1) {
      sleep(2);
    }
  }

  return (0);
}

However, if you send SIGTERM to the parent, the child will keep running. This is why I think the default behavior for haskell programs should be left unchanged, but SIGTERM be implemented specifically for shake to match make behavior.

@symbiont-arnaud-bailly
Copy link

symbiont-arnaud-bailly commented Dec 16, 2019

When you run the following C program without explicit signal handling and hit Ctrl-C, then it terminates the child processes as well:

That's true but it's because the shell itself handles SIGINT properly, propagating the signal to the foreground process group. If you properly detach the subprocess, creating its own subgroup, then it won't receive SIGINT.

If you manually send SIGTERM using a process's PID, then the signal is not by default propagated to children which explains why the child process does not terminate in this case.

IMHO, shake should handle SIGINT and SIGTERM in such a way that all its subprocesses receive the signal, using standard POSIX signal handling which, obviously, only make sense on an Unix system. And it should probably use the Windows specific mechanism to provide this behaviour on Win32 platform.

@hasufell
Copy link

That's true but it's because the shell itself handles SIGINT properly

Nice catch. Now I'm starting to wonder what happens if that shell behavior might not in fact be problematic if the process has its own implementation of reacting to SIGINT?

@symbiont-arnaud-bailly
Copy link

If a subprocess of a shell handles SIGINT, I guess it can do whatever it please, including refusing to quit :) That's why there is SIGKILL and SIGSTOP which cannot have handlers attached.

@hasufell
Copy link

I meant that the shell will apparently send SIGINT to the sub processes without letting the parent process decide what to do?

@symbiont-arnaud-bailly
Copy link

If they are part of the same process group, which is the case by default, yes. But then the parent could handle SIGCHLD :)

@symbiont-arnaud-bailly
Copy link

The semantics of signals is fascinating, and the number of potential pitfalls staggering!

@hasufell
Copy link

@ndmitchell do you have a clearer picture about the problem now? Do you need further information?

@ndmitchell
Copy link
Owner

Thanks for all your discussion on this topic, but my current state is I'm totally confused. There was a long discussion about signals at #748 and StackOverflow info at https://stackoverflow.com/questions/61856063/spawning-a-process-with-create-group-true-set-pgid-hangs-when-starting-docke/61881005#61881005. It would be great to get a 5-year-old style answer to:

  • What behaviour should change?
  • What should Shake do to make that change?
  • Are there any downsides to that change?
  • Why should Shake do this and not all Haskell programs?

@mpickering
Copy link
Contributor

I don't think this has anything to do with docker. Sending SIGTERM to our docker containers manually works fine. We are not using actionFinally. If you look at the behavior of make, then it sends a SIGTERM to the subprocesses it has spawned. Shake doesn't.

#!/bin/sh

trap 'echo kill && kill -KILL $child 2>/dev/null' TERM
sleep 500000 &
child=$!
wait $child
all:
	./run.sh
#!/usr/bin/env -S cabal new-run
{- cabal:
build-depends: base
            , shake
-}

module Main (main) where

import Development.Shake
import Development.Shake.Classes
import Development.Shake.Command
import Development.Shake.FilePath
import Development.Shake.Rule
import Prelude hiding ((*>))



main :: IO ()
main = do
  shakeArgs shakeOptions { shakeFiles = "_build/"
                         , shakeVerbosity = Loud
                         , shakeThreads = 0
                         , shakeVersion = "4"} $ do

    phony "foo" $ do
      cmd_ "./run.sh"
      pure ()

sending SIGTERM to the shake process will not propagate to the script.

  1. This reply is about SIGTERM, not SIGINT. Ctrl-C sends SIGINT to the process and so SIGTERM is irrelevant. For SIGTERM handling you need to install a custom handler as there's no special handling in the RTS as there is for SIGINT.

  2. If you modify your program to catch SIGINT then shake cleans up correctly.

So I'm not sure there is even a problem with SIGINT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants