bare ssh image doesn't handle child processes #565

Closed
andy108369 opened this issue Aug 21, 2024 · 5 comments

andy108369 commented Aug 21, 2024

The bare ssh image doesn't handle (reap) child processes;
the zombie-killing script from https://akash.network/docs/providers/provider-faq-and-guide/#kill-zombie-processes kills such containers when they unintentionally spawn zombie (defunct) processes.

We should exec /usr/sbin/sshd -D instead of exec tail -f /dev/null to fix it.

Good ssh container example is here

Zombie process reproducer

Thanks Deval for the reproducer

step 1

---
version: "2.0"
services:
  deployment-rit-test:
    image: ghcr.io/akash-network/ubuntu-2404-ssh:1
    expose:
      - port: 80
        as: 80
        to:
          - global: true
      - port: 22
        as: 22
        to:
          - global: true
    env:
      - >-
        SSH_PUBKEY=[SSH_PUBKEY]
    params:
      storage:
        data:
          mount: /mnt/blabs
          readOnly: false
profiles:
  compute:
    deployment-rit-test:
      resources:
        cpu:
          units: 32
        memory:
          size: 64GB
        storage:
          - size: 128GB
          - name: data
            size: 256GB
            attributes:
              persistent: true
              class: beta3
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia:
                - model: rtx4090
  placement:
    dcloud:
      pricing:
        deployment-rit-test:
          denom: uakt
          amount: 10000
deployment:
  deployment-rit-test:
    dcloud:
      profile: deployment-rit-test
      count: 1

step 2

apt-get update
apt-get upgrade -y
apt-get install -y git nano wget curl build-essential

wget https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
chmod +x ./Anaconda3-2024.06-1-Linux-x86_64.sh
./Anaconda3-2024.06-1-Linux-x86_64.sh
# (I installed anaconda3 at /mnt/blabs/anaconda3)
export PATH=$PATH:/mnt/blabs/anaconda3/bin # or set at .bashrc

conda init

cd /mnt/blabs
git clone https://github.com/apple/ml-ferret
cd ml-ferret

conda create -n ferret python=3.10 -y
conda activate ferret
pip install --upgrade pip
pip install -e . # this is the point where it breaks if the kill_zombie script is running.

zombie process: [screenshot]

@andy108369 andy108369 self-assigned this Aug 21, 2024

andy108369 commented Aug 22, 2024

Test results

I've tested everything and explained it in detail below.
PR to fix this.

Current state

  1. Build & run
cd awesome-akash
docker build -t base-ssh:ubuntu -f base-ssh/Dockerfile.ubuntu base-ssh
docker run --rm -ti -p 22 -e SSH_PUBKEY="$(cat ~/.ssh/id_ed25519.pub)" base-ssh:ubuntu
  2. Note the mapped ssh port
$ docker ps -l
CONTAINER ID   IMAGE             COMMAND                  CREATED         STATUS         PORTS                                     NAMES
8ad9d18eb553   base-ssh:ubuntu   "/usr/local/bin/init…"   2 minutes ago   Up 2 minutes   0.0.0.0:32769->22/tcp, :::32769->22/tcp   inspiring_banzai
  3. ssh to the mapped ssh port
$ ssh -p 32769 root@localhost
  4. Run the zombie process reproducer
apt update
apt install --no-install-recommends -y -- ca-certificates wget gcc libc6-dev
wget https://raw.githubusercontent.com/jordanvrtanoski/zombie-docker-demo/main/zombie.c
gcc -o zombie zombie.c
nohup ./zombie >/dev/null 2>&1 & disown %1
  5. Verify whether there are any zombie (defunct) processes
  • Pod point of view:
root@33129d49b9d5:~# ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
      1       0 Ss+  tail -f /dev/null
      9       1 Ss   sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
     10       9 Ss    \_ sshd: root@pts/1
     21      10 Ss        \_ -bash
   3244      21 R+            \_ ps -eo pid,ppid,stat,cmd --forest
   3240       1 Z    [zombie] <defunct>
   3241       1 Z    [zombie] <defunct>
   3242       1 Z    [zombie] <defunct>
root@8ad9d18eb553:~# 
  • Host point of view:
$ ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
    ...
  43263       1 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 33129d49b9d596bb04f352c5da23e18dfa8068992cee671b5c4464f503f9fa2b -a
  43282   43263 Ss+   \_ tail -f /dev/null
  43312   43282 Ss        \_ sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
  43325   43312 Ss        |   \_ sshd: root@pts/1
  43337   43325 Ss+       |       \_ -bash
  46634   43282 Z         \_ [zombie] <defunct>
  46635   43282 Z         \_ [zombie] <defunct>
  46636   43282 Z         \_ [zombie] <defunct>

As you can see, tail -f /dev/null is running as PID 1, but tail is not an init process or service manager (such as systemd, tini, dumb-init, runit, supervisord, s6, ...), so it never calls wait() on the orphaned children it inherits and they stay defunct.
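
The manual zombie check above can also be scripted. Below is a minimal sketch, assuming ps output in the same pid,ppid,stat,cmd column order used in the listings; the function name filter_zombies is hypothetical and exists only for this illustration.

```shell
# filter_zombies: hypothetical helper, not part of the image.
# Reads `ps -eo pid,ppid,stat,cmd` output on stdin and prints the PID and
# PPID of every process whose STAT column starts with Z (zombie/defunct).
filter_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2 }'
}

# Usage inside the container (needs ps from procps):
#   ps -eo pid,ppid,stat,cmd | filter_zombies
```

Fed the pod-level listing above, it would print the three defunct PIDs (3240, 3241, 3242), each with parent PID 1.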

PID 1 and Child Process Reaping

The PID 1 process, also known as the init process, is responsible for reaping child processes that become orphaned when their parent process exits. Orphans are re-parented to PID 1; if PID 1 never waits on them after they terminate, they remain zombie processes, which can eventually exhaust the kernel process table.

From man ps:

           Z    defunct ("zombie") process, terminated but not  reaped  by
               its parent

From man 2 wait:

A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains
a minimal set of information about the zombie process (PID, termination status, resource usage
information) in order to allow the parent to later perform a wait to obtain information about
the child. As long as a zombie is not removed from the system via a wait, it will consume a
slot in the kernel process table, and if this table fills, it will not be possible to create
further processes. If a parent process terminates, then its "zombie" children (if any) are
adopted by init(1), (or by the nearest "subreaper" process as defined through the use of the
prctl(2) PR_SET_CHILD_SUBREAPER operation); init(1) automatically performs a wait to remove the
zombies.

More details:

man 2 wait
man 2 waitpid
man 1 init

What's "reaping"? How does it work?

"Reaping" in the context of process management refers to the action that a parent process takes to handle the termination of its child processes. When a child process exits, the kernel sends a SIGCHLD signal to the parent. The parent process must then call wait() (or a related system call like waitpid()) to read the exit status of the child process and remove its entry from the process table. Until that happens, the exited child remains a "zombie" or "defunct" process.

Key Points:

  • Reaping: The act of the parent process calling wait() or waitpid() to handle the exit status of a terminated child process.
  • Zombie (defunct) processes: These occur when a child process has exited, but its parent hasn't called wait() to collect its exit status. The process remains in the process table, occupying resources, until it's reaped by the parent.

In essence, reaping ensures that process-table entries are freed and that zombie processes don't accumulate. When you use systemd, tini, dumb-init, runit, supervisord or s6 as the init process in a container, it takes on this responsibility for orphaned processes whose original parent exited without waiting on them, so they are reaped shortly after they terminate.
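
As a small illustration of reaping, the shell's wait builtin plays the role of wait()/waitpid() for the shell's own children. A minimal sketch (the exit status 7 is an arbitrary value chosen for the example):

```shell
# Start a child in the background; it exits immediately with status 7.
( exit 7 ) &
child=$!

# Reap it: wait blocks until the child has terminated and returns its
# exit status. The shell obtains that status through waitpid(), which is
# what releases the child's process-table entry.
status=0
wait "$child" || status=$?

echo "reaped child with exit status $status"
```

tail, by contrast, never makes such a call, which is why the children it adopts as PID 1 stay defunct.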

Signal Handling

It's important to note that a process running as PID 1 in a PID namespace is treated specially by the kernel. Signals sent from inside the container are delivered to it only if it has installed a handler for them, so a SIGKILL, or a SIGTERM it does not handle, sent from inside has no effect. From the host, only SIGKILL (and SIGSTOP) is always effective; any other signal, such as SIGTERM, SIGINT or SIGHUP, takes effect only if the process handles it.

In this context, tail installs no SIGTERM handler, so when running as PID 1 within a pod it ignores SIGTERM instead of terminating as it would outside this environment.

Service managers such as systemd, tini, dumb-init, runit, supervisord, s6 and others are designed to handle signals like SIGTERM, SIGINT or SIGHUP, depending on their implementation. When they receive these signals, they forward them to the processes they manage or initiate a graceful shutdown, ensuring that those processes are properly terminated.

When Kubernetes (via kubelet/containerd) or Docker stops a container, it sends a SIGTERM signal to the primary container process; in our case, this is tail (which runs as PID 1 from the pod's perspective and as PID 43282 from the host's perspective).

Docker waits 10 seconds by default after sending SIGTERM. If the process doesn't terminate within this period, Docker sends a SIGKILL signal to forcibly kill the container. In Kubernetes, the default grace period is 30 seconds when using kubectl delete pod, after which a SIGKILL is sent, unless this period is explicitly adjusted using the --grace-period flag.
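
The stop sequence described above (SIGTERM, grace period, SIGKILL) can be imitated in plain shell. This is only a sketch under stated assumptions: stop_with_grace is a hypothetical helper, not how Docker or the kubelet is implemented, and it only works for children of the calling shell. The 1-second grace period is shortened for the demo.

```shell
# stop_with_grace PID SECONDS
# Hypothetical helper imitating the stop sequence: send SIGTERM, allow
# SECONDS of grace, then send SIGKILL.
# PID must be a child of the calling shell so that wait can reap it.
stop_with_grace() {
    target=$1
    grace=$2

    kill -TERM "$target" 2>/dev/null || true

    # Watchdog: fires SIGKILL once the grace period is over.
    ( sleep "$grace"; kill -KILL "$target" 2>/dev/null ) &
    watchdog=$!

    # Reap the target; the status is 128+15 or 128+9 if a signal killed it.
    status=0
    wait "$target" || status=$?

    # The target is gone, so the watchdog is no longer needed.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
    return "$status"
}

# A process with the default SIGTERM action stops at once.
sleep 30 &
rc=0; stop_with_grace "$!" 1 || rc=$?
echo "default SIGTERM action: exit status $rc"

# A process that ignores SIGTERM (the effective behaviour of a PID 1
# without a handler) is only stopped by the SIGKILL after the grace period.
sh -c 'trap "" TERM; exec sleep 30' &
stubborn=$!
sleep 1   # give the child time to install its trap
rc=0; stop_with_grace "$stubborn" 1 || rc=$?
echo "SIGTERM ignored: exit status $rc"
```

The first case should report 143 (128+15) and the second 137 (128+9), mirroring the difference between a container with and without a signal-aware PID 1.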

To verify this you can test it:

  1. Issue stop (SIGTERM) to the container
$ time -p docker stop $(docker ps -lq)
8ad9d18eb553
real 10.33
user 0.03
sys 0.03
  2. Notice that the container gets killed and exits with exit code 137
$ docker run --rm -ti -p 22 -e SSH_PUBKEY="$(cat ~/.ssh/id_ed25519.pub)" base-ssh:ubuntu 
Thu Aug 22 07:35:03 UTC 2024: SSH_PUBKEY (ssh-ed25519 AAAAC3Nz***) written to ~/.ssh/authorized_keys
$ echo $?
137
  • 137 is 128+n, where n is 9 (SIGKILL).
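
The 128+n arithmetic can be wrapped in a small helper. A minimal sketch, where decode_exit is a hypothetical name; it relies on the standard kill -l, which maps a signal number to its name:

```shell
# decode_exit CODE: hypothetical helper that explains a container exit code.
# Codes above 128 conventionally mean "terminated by signal CODE-128".
decode_exit() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        echo "$code = 128 + $sig (SIG$(kill -l "$sig"))"
    else
        echo "$code (plain exit status, no signal)"
    fi
}

decode_exit 137   # 137 = 128 + 9 (SIGKILL)
decode_exit 143   # 143 = 128 + 15 (SIGTERM)
```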

EXTRA: built-in docker-init service manager

This is mainly for those who were not aware that Docker has this feature built in.
Docker's docker-init is a copy of the tini code, integrated into Docker to provide init system functionality within containers.

You can add --init to the docker run as follows:

docker run --init --rm -ti -p 22 -e SSH_PUBKEY="$(cat ~/.ssh/id_ed25519.pub)" base-ssh:ubuntu

--init - Runs an init inside the container that forwards signals and reaps processes.

  • Pod point of view:
$ ssh -p 32770 root@localhost
root@df816e1e1373:~# ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
      1       0 Ss   /sbin/docker-init -- /usr/local/bin/init.sh tail -f /dev/null
      7       1 S+   tail -f /dev/null
     10       1 Ss   sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
     11      10 Ss    \_ sshd: root@pts/1
     22      11 Ss        \_ -bash
   3651      22 R+            \_ ps -eo pid,ppid,stat,cmd --forest
   3647       1 S    ./zombie
   3649    3647 Z     \_ [zombie] <defunct>

Within a couple of seconds, the zombies were reaped by the docker-init process:

root@df816e1e1373:~# ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
      1       0 Ss   /sbin/docker-init -- /usr/local/bin/init.sh tail -f /dev/null
      7       1 S+   tail -f /dev/null
     10       1 Ss   sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
     11      10 Ss    \_ sshd: root@pts/1
     22      11 Ss        \_ -bash
   3652      22 R+            \_ ps -eo pid,ppid,stat,cmd --forest
root@df816e1e1373:~# 
  • Host point of view:
$ ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
    ...
  46882       1 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id df816e1e1373715c5bd86fdf7e451b83dcb8e93e4949f7b85c46d2c241fb9f8a -a
  46902   46882 Ss    \_ /sbin/docker-init -- /usr/local/bin/init.sh tail -f /dev/null
  46929   46902 S+        \_ tail -f /dev/null
  46932   46902 Ss        \_ sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
  46944   46932 Ss            \_ sshd: root@pts/1
  46956   46944 Ss                \_ -bash
  50886   46956 S+                    \_ ./zombie
  50887   50886 S+                        \_ ./zombie
  50889   50887 Z+                        |   \_ [zombie] <defunct>
  50888   50886 Z+                        \_ [zombie] <defunct>

Within a couple of seconds, the zombies were reaped by the docker-init process:

  46882       1 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id df816e1e1373715c5bd86fdf7e451b83dcb8e93e4949f7b85c46d2c241fb9f8a -a
  46902   46882 Ss    \_ /sbin/docker-init -- /usr/local/bin/init.sh tail -f /dev/null
  46929   46902 S+        \_ tail -f /dev/null
  46932   46902 Ss        \_ sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
  46944   46932 Ss            \_ sshd: root@pts/1
  46956   46944 Ss+               \_ -bash

The container app gets terminated promptly (without waiting for the timeout after SIGTERM (15) and the ungraceful SIGKILL (9) that follows it), as the signal is properly handled by the docker-init parent process.

$ time -p docker stop $(docker ps -lq)
df816e1e1373
real 0.32
user 0.03
sys 0.03

The container app exits with exit code 143 instead of the 137 seen without a process manager such as docker-init / tini / ....
This decodes to 143-128 = 15 (SIGTERM), as expected from docker stop.

$ docker run --init --rm -ti -p 22 -e SSH_PUBKEY="$(cat ~/.ssh/id_ed25519.pub)" base-ssh:ubuntu
Thu Aug 22 08:17:02 UTC 2024: SSH_PUBKEY (ssh-ed25519 AAAAC3Nz***) written to ~/.ssh/authorized_keys
$ echo $?
143
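
To check from inside a running container whether PID 1 is an init that reaps, one can look at its command name. A rough sketch: the helper name is_known_init is hypothetical, /proc/1/comm is a Linux-specific path, and the name list is a deliberately short assumption based on inits mentioned in this thread, so extend it as needed.

```shell
# is_known_init NAME: hypothetical helper; succeeds if NAME is one of a few
# init programs known to reap orphaned children. The list is an assumption.
is_known_init() {
    case "$1" in
        tini|docker-init|dumb-init|systemd) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage inside a container:
#   is_known_init "$(cat /proc/1/comm)" && echo "PID 1 reaps orphans"
if is_known_init "tail"; then
    echo "tail: known init"
else
    echo "tail: not an init, orphaned zombies will pile up"
fi
```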

Fixed state (with tini)

Simply adding tini to the Dockerfile as the ENTRYPOINT fixes this issue:

# Add Tini
ADD https://github.com/krallin/tini/releases/download/v0.19.0/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "--"]

Adjusting the tini ENTRYPOINT to our use case, where we want to start /usr/local/bin/init.sh first:

ENTRYPOINT ["/tini", "--", "/usr/local/bin/init.sh"]
  • Pod point of view:

First seconds of zombie processes running:

root@3c9b95c872c7:~# ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
      1       0 Ss   /tini -- /usr/local/bin/init.sh tail -f /dev/null
      7       1 S+   tail -f /dev/null
     10       1 Ss   sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
     11      10 Ss    \_ sshd: root@pts/1
     22      11 Ss        \_ -bash
   3240      22 R+            \_ ps -eo pid,ppid,stat,cmd --forest
root@3c9b95c872c7:~# nohup ./zombie >/dev/null 2>&1 & disown %1
[1] 3241
root@3c9b95c872c7:~# ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
      1       0 Ss   /tini -- /usr/local/bin/init.sh tail -f /dev/null
      7       1 S+   tail -f /dev/null
     10       1 Ss   sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
     11      10 Ss    \_ sshd: root@pts/1
     22      11 Ss        \_ -bash
   3241      22 S             \_ ./zombie
   3242    3241 S             |   \_ ./zombie
   3244    3242 Z             |   |   \_ [zombie] <defunct>
   3243    3241 Z             |   \_ [zombie] <defunct>
   3245      22 R+            \_ ps -eo pid,ppid,stat,cmd --forest

After 1-2 seconds of zombie processes running:

root@3c9b95c872c7:~# ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
      1       0 Ss   /tini -- /usr/local/bin/init.sh tail -f /dev/null
      7       1 S+   tail -f /dev/null
     10       1 Ss   sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
     11      10 Ss    \_ sshd: root@pts/1
     22      11 Ss        \_ -bash
   3246      22 R+            \_ ps -eo pid,ppid,stat,cmd --forest
   3242       1 S    ./zombie
   3244    3242 Z     \_ [zombie] <defunct>

3-4 seconds after the zombie process was started:

root@3c9b95c872c7:~# ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
      1       0 Ss   /tini -- /usr/local/bin/init.sh tail -f /dev/null
      7       1 S+   tail -f /dev/null
     10       1 Ss   sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
     11      10 Ss    \_ sshd: root@pts/1
     22      11 Ss        \_ -bash
   3247      22 R+            \_ ps -eo pid,ppid,stat,cmd --forest
root@3c9b95c872c7:~# 
  • Host point of view:

First seconds of zombie processes running:

$ ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
    ...
  55730       1 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3c9b95c872c76e123d7a87a7c5cb2eea039eaa89f2b6d061202cffab300f0d38 -a
  55751   55730 Ss    \_ /tini -- /usr/local/bin/init.sh tail -f /dev/null
  55778   55751 S+        \_ tail -f /dev/null
  55781   55751 Ss        \_ sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
  55800   55781 Ss            \_ sshd: root@pts/1
  55812   55800 Ss+               \_ -bash
  59131   55812 S                     \_ ./zombie
  59132   59131 S                         \_ ./zombie
  59134   59132 Z                         |   \_ [zombie] <defunct>
  59133   59131 Z                         \_ [zombie] <defunct>

After 1-2 seconds of zombie processes running:

$ ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
    ...
  55730       1 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3c9b95c872c76e123d7a87a7c5cb2eea039eaa89f2b6d061202cffab300f0d38 -a
  55751   55730 Ss    \_ /tini -- /usr/local/bin/init.sh tail -f /dev/null
  55778   55751 S+        \_ tail -f /dev/null
  55781   55751 Ss        \_ sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
  55800   55781 Ss        |   \_ sshd: root@pts/1
  55812   55800 Ss+       |       \_ -bash
  59132   55751 S         \_ ./zombie
  59134   59132 Z             \_ [zombie] <defunct>

3-4 seconds after the zombie process was started:

$ ps -eo pid,ppid,stat,cmd --forest
    PID    PPID STAT CMD
    ...
  55730       1 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3c9b95c872c76e123d7a87a7c5cb2eea039eaa89f2b6d061202cffab300f0d38 -a
  55751   55730 Ss    \_ /tini -- /usr/local/bin/init.sh tail -f /dev/null
  55778   55751 S+        \_ tail -f /dev/null
  55781   55751 Ss        \_ sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
  55800   55781 Ss            \_ sshd: root@pts/1
  55812   55800 Ss+               \_ -bash

tini also handles signals properly

The container app gets terminated promptly (without waiting for the timeout after SIGTERM (15) and the ungraceful SIGKILL (9) that follows it), as the signal is properly handled by the tini parent process.

$ time -p docker stop $(docker ps -lq)
3c9b95c872c7
real 0.29
user 0.02
sys 0.03

The container app exits with exit code 143 instead of the 137 seen without a process manager such as docker-init / tini / ....
This decodes to 143-128 = 15 (SIGTERM), as expected from docker stop.

$ docker run --rm -ti -p 22 -e SSH_PUBKEY="$(cat ~/.ssh/id_ed25519.pub)" base-ssh:ubuntu
Thu Aug 22 09:14:24 UTC 2024: SSH_PUBKEY (ssh-ed25519 AAAAC3Nz***) written to ~/.ssh/authorized_keys
$ echo $?
143

Refs

#517
https://akash.network/docs/providers/provider-faq-and-guide/#kill-zombie-processes
https://computingpost.medium.com/how-to-use-tini-init-system-in-docker-containers-69283d0099ed

andy108369 added a commit to andy108369/awesome-akash that referenced this issue Aug 22, 2024
…ling (akash-network#565)

Use tini as an init system to manage orphaned child processes, ensuring they
don't become zombie (defunct) processes by reaping (cleaning up) them when
their parent process doesn't.

Tini will also correctly handle signals like SIGTERM (15), allowing child
processes to terminate gracefully within the allotted time, rather than
being forcefully killed with SIGKILL after a 15-second timeout.
ygrishajev pushed a commit that referenced this issue Aug 22, 2024
…ling (#565) (#566)

andy108369 commented Aug 22, 2024

Todo

Update:


#571
