Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSDK-9591 - Add KillGroup to ManagedProcess #399

Merged
merged 11 commits into from
Jan 9, 2025

Conversation

cheukt
Copy link
Member

@cheukt cheukt commented Dec 27, 2024

This is part one of two PRs that will hopefully help with shutting down all module processes before viam-server exits. Part 2 is here

This is still a draft as I'm looking for thoughts and ideas around making this better.

The idea behind this is for a Kill() call to propagate from the viam-server, and we should not block on anything if possible. The kill()s here do not wait for the execed process to complete, and this is intentional - I couldn't find if syscall.Kill waits for completion, so did not use. I wasn't able to find documentation about whether kill -9 blocks on the signal getting received or if it immediately returns, so opted for .Start() instead of .Run(), which means we do not Wait() for the process to return. This does mean that the assumption is that the viam-server ends immediately after and that the system will properly await the processes that we spawned to kill the managed process, otherwise those processes will be zombie'd. I can see changing it so that we do a .Run() instead, but might mean we would have to do the module process killing in parallel in viam-server.

There is a todo to add Kill to the processManager, but a quick glance at how Close works for processManager makes me think that it should be a followup task

@@ -32,6 +32,10 @@ type ManagedProcess interface {
// there's any system level issue stopping the process.
Stop() error

// Kill will attempt to kill the process group and not wait for completion. Only use this if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the story behind killing the process group? Is the viam-server part of the same process group? If we're intentionally trying to kill the viam-server + all modules running in one swift process group kill -9, what's the utility in adding this functionality to the goutils/managed process code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the viam-server is not part of the same process group, each ManagedProcess is its own process group, I'm guessing the intention is to be able to Kill or Stop all spawned processes from one ManagedProcess at once

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the viam-server is not part of the same process group, each ManagedProcess is its own process group

I thought* that was a possibility. But I didn't see the code that did that. Is that in our code or the SDK side?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attrs := &syscall.SysProcAttr{Setpgid: true}
sets the process group id, and since we didn't give the process group id a value, it will default to the child process' id https://pkg.go.dev/syscall#SysProcAttr.Setpgid

Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just some initial comments.

@@ -126,6 +127,14 @@ func (p *managedProcess) kill() (bool, error) {
return forceKilled, nil
}

// forceKill kills everything in the process group. This will not wait for completion and may result in a zombie process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a zombie process

I have heard this term used in a couple different ways. What do you mean by "zombie process"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the process will have finished but not waited on by the parent process - it is still in the process table. I'm explicitly not waiting on the child process here so the Kills will be zombies until the viam-server terminates and the system reaps these processes. https://en.wikipedia.org/wiki/Zombie_process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// forceKill kills everything in the process group. This will not wait for completion and may result in a zombie process.
// forceKill kills everything in the process group. This will not wait for completion and may result in the kill becoming a zombie process.

pexec/managed_process_unix.go Show resolved Hide resolved
func (p *managedProcess) forceKill() error {
pidStr := strconv.Itoa(p.cmd.Process.Pid)
p.logger.Infof("force killing entire process tree %d", p.cmd.Process.Pid)
return exec.Command("taskkill", "/t", "/f", "/pid", pidStr).Start()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this on windows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet, I just copied this from kill() a bit above

Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test that a managed process that starts another process in the same process can both be killed by Kill (or KillGroup if you take my renaming suggestion)?

@@ -32,6 +32,10 @@ type ManagedProcess interface {
// there's any system level issue stopping the process.
Stop() error

// Kill will attempt to kill the process group and not wait for completion. Only use this if
// comfortable with leaking resources (in cases where exiting the program as quickly as possible is desired).
Kill()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this KillGroup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -126,6 +127,14 @@ func (p *managedProcess) kill() (bool, error) {
return forceKilled, nil
}

// forceKill kills everything in the process group. This will not wait for completion and may result in a zombie process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// forceKill kills everything in the process group. This will not wait for completion and may result in a zombie process.
// forceKill kills everything in the process group. This will not wait for completion and may result in the kill becoming a zombie process.

@@ -126,6 +127,14 @@ func (p *managedProcess) kill() (bool, error) {
return forceKilled, nil
}

// forceKill kills everything in the process group. This will not wait for completion and may result in a zombie process.
func (p *managedProcess) forceKill() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this forceKillGroup?

@cheukt cheukt changed the title RSDK-9591 - Add Kill to ManagedProcess RSDK-9591 - Add KillGroup to ManagedProcess Jan 9, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Jan 9, 2025
@cheukt cheukt marked this pull request as ready for review January 9, 2025 15:00
@cheukt
Copy link
Member Author

cheukt commented Jan 9, 2025

added one test as well

@cheukt cheukt requested review from dgottlieb and benjirewis January 9, 2025 15:00
pexec/managed_process.go Outdated Show resolved Hide resolved
pexec/managed_process.go Show resolved Hide resolved
pexec/managed_process_unix.go Outdated Show resolved Hide resolved
pexec/managed_process_windows.go Outdated Show resolved Hide resolved
@cheukt
Copy link
Member Author

cheukt commented Jan 9, 2025

you know, I'm surprised that it passes too

@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Jan 9, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Jan 9, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Jan 9, 2025
@viambot viambot removed the safe to test Mark as safe to test label Jan 9, 2025
@viambot viambot added the safe to test Mark as safe to test label Jan 9, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Jan 9, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Jan 9, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Jan 9, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Jan 9, 2025
@cheukt cheukt merged commit 95db6b6 into viamrobotics:main Jan 9, 2025
6 checks passed
@cheukt cheukt deleted the shutdown-faster branch January 9, 2025 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe to test Mark as safe to test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants