Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSDK-9591 - Add KillGroup to ManagedProcess #399

Merged
merged 11 commits into from
Jan 9, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions pexec/managed_process.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,10 @@ type ManagedProcess interface {
// there's any system level issue stopping the process.
Stop() error

// Kill will attempt to kill the process group and not wait for completion. Only use this if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the story behind killing the process group? Is the viam-server part of the same process group? If we're intentionally trying to kill the viam-server + all modules running in one swift process group kill -9, what's the utility in adding this functionality to the goutils/managed process code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the viam-server is not part of the same process group, each ManagedProcess is its own process group, I'm guessing the intention is to be able to Kill or Stop all spawned processes from one ManagedProcess at once

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the viam-server is not part of the same process group, each ManagedProcess is its own process group

I thought* that was a possibility. But I didn't see the code that did that. Is that in our code or the SDK side?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attrs := &syscall.SysProcAttr{Setpgid: true}
sets the process group id, and since we didn't give the process group id a value, it will default to the child process' id https://pkg.go.dev/syscall#SysProcAttr.Setpgid

cheukt marked this conversation as resolved.
Show resolved Hide resolved
// comfortable with leaking resources (in cases where exiting the program as quickly as possible is desired).
Kill()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this KillGroup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


// Status return nil when the process is both alive and owned.
// If err is non-nil, process may be a) alive but not owned or b) dead.
Status() error
Expand Down Expand Up @@ -432,3 +436,29 @@ func (p *managedProcess) Stop() error {
}
return errors.Errorf("non-successful exit code: %d", p.cmd.ProcessState.ExitCode())
}

func (p *managedProcess) Kill() {
// Minimally hold a lock here so that we can signal the
// management goroutine to stop. We will attempt to kill the
// process even if p.stopped is true.
p.mu.Lock()
if !p.stopped {
close(p.killCh)
p.stopped = true
}

if p.cmd == nil {
p.mu.Unlock()
return
}
p.mu.Unlock()

// Since p.cmd is mutex guarded and we just signaled the manage
// goroutine to stop, no new Start can happen and therefore
// p.cmd can no longer be modified rendering it safe to read
// without a lock held.
// We are intentionally not checking the error here, we are already
// in a bad state.
//nolint:errcheck,gosec
p.forceKill()
}
2 changes: 2 additions & 0 deletions pexec/managed_process_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -702,3 +702,5 @@ func (fp *fakeProcess) UnixPid() (int, error) {
in reality tests should just depend on the methods they rely on. UnixPid is not one
of those methods (for better or worse)`)
}

func (fp *fakeProcess) Kill() {}
9 changes: 9 additions & 0 deletions pexec/managed_process_unix.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ package pexec

import (
"os"
"os/exec"
"os/user"
"strconv"
"syscall"
Expand Down Expand Up @@ -126,6 +127,14 @@ func (p *managedProcess) kill() (bool, error) {
return forceKilled, nil
}

// forceKill kills everything in the process group. This will not wait for completion and may result in a zombie process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a zombie process

I have heard this term used in a couple different ways. What do you mean by "zombie process"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the process will have finished but not waited on by the parent process - it is still in the process table. I'm explicitly not waiting on the child process here so the Kills will be zombies until the viam-server terminates and the system reaps these processes. https://en.wikipedia.org/wiki/Zombie_process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// forceKill kills everything in the process group. This will not wait for completion and may result in a zombie process.
// forceKill kills everything in the process group. This will not wait for completion and may result in the kill becoming a zombie process.

func (p *managedProcess) forceKill() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this forceKillGroup?

pidStr := strconv.Itoa(-p.cmd.Process.Pid)
p.logger.Infof("killing entire process group %d", p.cmd.Process.Pid)
benjirewis marked this conversation as resolved.
Show resolved Hide resolved
//nolint:gosec
return exec.Command("kill", "-9", pidStr).Start()
}

func isWaitErrUnknown(err string, forceKilled bool) bool {
// This can easily happen if the process does not handle interrupts gracefully
// and it won't provide us any exit code info.
Expand Down
8 changes: 7 additions & 1 deletion pexec/managed_process_windows.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@ func parseSignal(sigStr, name string) (syscall.Signal, error) {
return 0, errors.New("signals not supported on Windows")
}


func (p *managedProcess) sysProcAttr() (*syscall.SysProcAttr, error) {
ret := &syscall.SysProcAttr{
CreationFlags: syscall.CREATE_NEW_PROCESS_GROUP,
Expand Down Expand Up @@ -107,6 +106,13 @@ func (p *managedProcess) kill() (bool, error) {
return forceKilled, nil
}

// forceKill kills everything in the process tree. This will not wait for completion and may result in a zombie process.
cheukt marked this conversation as resolved.
Show resolved Hide resolved
func (p *managedProcess) forceKill() error {
pidStr := strconv.Itoa(p.cmd.Process.Pid)
p.logger.Infof("force killing entire process tree %d", p.cmd.Process.Pid)
return exec.Command("taskkill", "/t", "/f", "/pid", pidStr).Start()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this on windows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet, I just copied this from kill() a bit above

}

func isWaitErrUnknown(err string, forceKilled bool) bool {
if !forceKilled {
return false
Expand Down
Loading