Fixes ssh to Debian 12.8 container: #478

Open
wants to merge 4 commits into
base: master

ksalerno99
Contributor

@ksalerno99 ksalerno99 commented Dec 31, 2024

  1. enable quiet login by removing /etc/update-motd.d/10-uname and
    truncating /etc/motd
  2. modify PAM sshd session setting making pam_loginuid module optional
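
As a rough sketch of what those two changes amount to (assuming GNU sed and a stock Debian 12 `/etc/pam.d/sshd`; the function name `quiet_ssh_login` and its optional root-directory argument are illustrative and not part of the patch):

```shell
# Hypothetical helper: apply both changes under an optional root directory.
# With no argument it edits the real /etc; with a scratch directory it edits copies.
quiet_ssh_login() {
  root="${1:-}"
  # 1. Quiet login: drop the uname banner script and empty the static motd.
  rm -f "$root/etc/update-motd.d/10-uname"
  : > "$root/etc/motd"
  # 2. Make pam_loginuid optional so its failure no longer aborts the session.
  sed -i -E 's/^(session[[:space:]]+)required([[:space:]]+pam_loginuid\.so)/\1optional\2/' \
    "$root/etc/pam.d/sshd"
}
```

In an image build this would run once against the real root (`quiet_ssh_login`); passing a scratch directory lets you try it on copies of the files first.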

Testing done

Before the change, PAM would immediately terminate an SSH attempt to the container (RC=254):

root@docker-server:~# ssh -i .ssh/id_rsa -p9022 [email protected]
Linux 277b17f48f34 5.14.0-503.11.1.el9_5.ppc64le #1 SMP Mon Sep 30 10:17:22 EDT 2024 ppc64le

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Connection to 129.40.94.225 closed.

root@docker-server:~# echo $?
254

The second issue is that Jenkins authentication via SSH expects a quiet login. Even though we set "PrintMotd no" in /etc/ssh/sshd_config, /etc/pam.d/sshd overrides this with "session optional pam_motd.so motd=/run/motd.dynamic".
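
To see which PAM lines are still active for sshd, something like the following can help. It is only a sketch: `pam_active_lines` is a hypothetical helper, and it simply filters out commented lines before matching the module name as a fixed string.

```shell
# pam_active_lines FILE MODULE: print the uncommented lines of FILE that mention MODULE.
pam_active_lines() {
  grep -v '^[[:space:]]*#' "$1" | grep -F -- "$2"
}

# Example (inside the container):
#   pam_active_lines /etc/pam.d/sshd pam_motd.so
```

If this prints any line, pam_motd is still loaded for the session regardless of the "PrintMotd no" setting in sshd_config.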

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

   1) enable quiet login by removing /etc/update-motd.d/10-uname and
      truncating /etc/motd
   2) modify PAM sshd session setting making pam_loginuid module optional
@ksalerno99 ksalerno99 requested a review from a team as a code owner December 31, 2024 11:54
@ksalerno99
Contributor Author

This is what the symptoms look like in Jenkins when adding an SSH agent:

  1. motd banner issue
SSHLauncher{host='192.168.153.100', port=9022, credentialsId='b6c895a6-34e0-435b-9849-226ef6a756dc', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/31/24 11:49:05] [SSH] Opening SSH connection to 192.168.153.100:9022.
[12/31/24 11:49:05] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
[12/31/24 11:49:05] [SSH] Authentication successful.
SSH connection reports a garbage before a command execution.
Check your .bashrc, .profile, and so on to make sure it is quiet.
The received junk text is as follows:
Linux 277b17f48f34 5.14.0-503.11.1.el9_5.ppc64le #1 SMP Mon Sep 30 10:17:22 EDT 2024 ppc64le

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

null
[12/31/24 11:49:05] Launch failed - cleaning up connection
[12/31/24 11:49:05] [SSH] Connection closed.
  2. After resolving the motd issue, you then hit the PAM module terminating your session:
SSHLauncher{host='192.168.153.100', port=9022, credentialsId='b6c895a6-34e0-435b-9849-226ef6a756dc', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/31/24 11:57:14] [SSH] Opening SSH connection to 192.168.153.100:9022.
[12/31/24 11:57:14] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
[12/31/24 11:57:14] [SSH] Authentication successful.
[12/31/24 11:57:14] [SSH] The remote user's environment is:
[12/31/24 11:57:14] [SSH] Starting sftp client.
ERROR: [12/31/24 11:57:14] [SSH] SFTP failed. Copying via SCP.
java.io.IOException: Unexpected end of sftp stream.
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SFTPv3Client.readBytes(SFTPv3Client.java:217)
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SFTPv3Client.receiveMessage(SFTPv3Client.java:240)
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SFTPv3Client.init(SFTPv3Client.java:864)
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:108)
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:119)
	at PluginClassLoader for trilead-api//com.trilead.ssh2.jenkins.SFTPClient.<init>(SFTPClient.java:43)
	at PluginClassLoader for ssh-slaves//hudson.plugins.sshslaves.SSHLauncher.copyAgentJar(SSHLauncher.java:677)
	at PluginClassLoader for ssh-slaves//hudson.plugins.sshslaves.SSHLauncher.lambda$launch$0(SSHLauncher.java:456)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12/31/24 11:57:14] [SSH] Remote file system root /home/jenkins/agent does not exist. Will try to create it...
Failed to create /home/jenkins/agent
[12/31/24 11:57:14] [SSH] Copying latest remoting.jar...
java.io.IOException: Could not copy remoting.jar into '/home/jenkins/agent' on agent
	at PluginClassLoader for ssh-slaves//hudson.plugins.sshslaves.SSHLauncher.copySlaveJarUsingSCP(SSHLauncher.java:827)
	at PluginClassLoader for ssh-slaves//hudson.plugins.sshslaves.SSHLauncher.copyAgentJar(SSHLauncher.java:733)
	at PluginClassLoader for ssh-slaves//hudson.plugins.sshslaves.SSHLauncher.lambda$launch$0(SSHLauncher.java:456)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Error during SCP transfer.
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SCPClient.put(SCPClient.java:518)
	at PluginClassLoader for ssh-slaves//hudson.plugins.sshslaves.SSHLauncher.copySlaveJarUsingSCP(SSHLauncher.java:825)
	... 6 more
Caused by: java.io.IOException: Remote scp terminated with error code -1
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SCPClient.readResponse(SCPClient.java:54)
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SCPClient.sendBytes(SCPClient.java:135)
	at PluginClassLoader for trilead-api//com.trilead.ssh2.SCPClient.put(SCPClient.java:514)
	... 7 more
[12/31/24 11:57:14] Launch failed - cleaning up connection
[12/31/24 11:57:14] [SSH] Connection closed.

@ksalerno99
Contributor Author

After applying the patch and re-launching docker-ssh-agent, our Jenkins SSH agent is online:

SSHLauncher{host='192.168.153.100', port=9022, credentialsId='b6c895a6-34e0-435b-9849-226ef6a756dc', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/31/24 12:03:21] [SSH] Opening SSH connection to 192.168.153.100:9022.
[12/31/24 12:03:21] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
[12/31/24 12:03:21] [SSH] Authentication successful.
[12/31/24 12:03:21] [SSH] The remote user's environment is:
AGENT_WORKDIR=/home/jenkins/agent
BASH=/usr/bin/bash
BASHOPTS=checkwinsize:cmdhist:complete_fullquote:extquote:force_fignore:globasciiranges:globskipdots:hostcomplete:interactive_comments:patsub_replacement:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=([0]="0")
BASH_ARGV=()
BASH_CMDS=()
BASH_EXECUTION_STRING=set
BASH_LINENO=()
BASH_LOADABLES_PATH=/usr/local/lib/bash:/usr/lib/bash:/opt/local/lib/bash:/usr/pkg/lib/bash:/opt/pkg/lib/bash:.
BASH_SOURCE=()
BASH_VERSINFO=([0]="5" [1]="2" [2]="15" [3]="1" [4]="release" [5]="powerpc64le-unknown-linux-gnu")
BASH_VERSION='5.2.15(1)-release'
DIRSTACK=()
EUID=1000
GROUPS=()
HOME=/home/jenkins
HOSTNAME=6a25fddc64a7
HOSTTYPE=powerpc64le
IFS=$' \t\n'
JAVA_HOME=/opt/java/openjdk
JAVA_OPTS=-Djava.util.logging.config.file=/var/jenkins_home/log.properties
JENKINS_AGENT_HOME=/home/jenkins
JENKINS_AGENT_SSH_PUBKEY='ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCv12/jDoRww1hWGxIgwSJrST2cMfQMOKl7kCc01vUcBEGDbGFRhL8+KfzyEaGK1S5/ABHaE3mJ0nWleakn2HoGPWtmOSyd2P3voGaY4hBNPiOKPUqu+ZTHdDvKUVoWyUX6IG49ZgFQyYipIQjq3cUjlYqk5pIb3C53iRaMrr+j224DdfttdhV79gaw0A47h/tEl21moG3N+A2TCSi7eXKCZ/DEhQkiMRX9YV35gMr3PSMFWINfra9r+u5bfHa5ddNwAhmEb7AaqlFtc/UTxtXXm0wUp/kn/baw22yYZRCrUE7aWEeV/kfqgtnS8YC2sAeqRLAwxUEC3zJgPQ8mUSuTzRO6osUl+iDLp61BlHMFc65fkRK9y/LH9pnyF2Or8xX3DWK5636gZmIjaDYdlWO4FJnn+IIFp+DOkxA/O5qqxv4EQzjXGlkNaUgeaFyeY/YckK1GtuuPaSPOx8G5vudNrqgKS4v3FbRJLODVAQ/b6bTzD2MINW3M3MKsRS07CWs= root@docker-server'
LC_ALL=C.UTF-8
LOGNAME=jenkins
MACHTYPE=powerpc64le-unknown-linux-gnu
MOTD_SHOWN=pam
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PIPESTATUS=([0]="0")
PPID=18
PS4='+ '
PWD=/home/jenkins
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=1
SSH_CLIENT='172.17.0.1 36234 22'
SSH_CONNECTION='172.17.0.1 36234 172.17.0.11 22'
TERM=dumb
UID=1000
USER=jenkins
_=']'
[12/31/24 12:03:21] [SSH] Starting sftp client.
[12/31/24 12:03:21] [SSH] Copying latest remoting.jar...
Source agent hash is A2E96D08000E53966B4B8F68E4999483. Installed agent hash is A2E96D08000E53966B4B8F68E4999483
Verified agent jar. No update is necessary.
Expanded the channel window size to 4MB
[12/31/24 12:03:21] [SSH] Starting agent process: cd "/home/jenkins/agent" && java  -jar remoting.jar -workDir /home/jenkins/agent -jar-cache /home/jenkins/agent/remoting/jarCache
Dec 31, 2024 12:03:21 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Dec 31, 2024 12:03:22 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 3283.v92c105e0f819
Launcher: SSHLauncher
Communication Protocol: Standard in/out
This is a Unix agent
Agent successfully connected and online

@ksalerno99
Contributor Author

Tested on current weekly:

root@docker-server:~# env | grep JENKINS
JENKINS_SHA=ccef73536436ced77776c994cfc86897d6c3899efe8904def444036663111c9b
JENKINS_VERSION=2.491

@MarkEWaite
Contributor

I wasn't able to duplicate the issue from a Debian 12 computer. I don't object to the change, but would like to confirm that it is an issue that I can duplicate before the change is merged. Any suggestions of the mistake I might be making?

Steps that I took while trying to duplicate the issue:

  1. Run a Java 21 version of the most recent agent container image using the command:
    docker run -d --rm --name=agent --publish 2200:22 \
        -e "JENKINS_AGENT_SSH_PUBKEY=ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIheIXWrE/ZvcM0WKHdcOEsLUaJZ4rgXrfrhe7w3fcjI" \
        jenkins/ssh-agent:jdk21
    
  2. Define a Jenkins agent that uses /home/jenkins/agent as the user home directory, uses an ssh private key credential that matches the provided public key, and uses port 2200 to the host running the container. Confirmed that the agent is running Debian 12 from the Jenkins script console for the agent:
    println "cat /etc/os-release".execute().text
    PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
    NAME="Debian GNU/Linux"
    VERSION_ID="12"
    VERSION="12 (bookworm)"
    VERSION_CODENAME=bookworm
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"
    
  3. Connect from a Debian 12 computer (fully patched) to the Jenkins agent with the command:
    ssh -i .ssh/id_ed25519-mine -p 2200 jenkins@my-hostname
    

The ssh connection worked and the agent connected to Jenkins as expected.

@ksalerno99
Contributor Author

Interesting... I wonder if the PAM issue is specific to the ppc64le debian-slim docker image?

@ksalerno99
Contributor Author

This build test was on Ubuntu 24.04.1 LTS for ppc64le using docker-ce v27.4.1 and github.com/docker/buildx v0.19.3 48d6a39

Contributor

@MarkEWaite MarkEWaite left a comment

I've seen a frequent recommendation to reduce the number of layers in the container image. Since those two RUN commands are part of the SSH configuration of the container image, could you attach them to the preceding command that performs SSH configuration? That way we don't add two very small layers to the image.

@ksalerno99
Contributor Author

Done!

@ksalerno99
Contributor Author

Mark, I narrowed the issue down to whether or not your Docker platform has the CAP_AUDIT_WRITE capability.

If Docker removes the audit capability of the container, then the PAM module "pam_loginuid" breaks login because its purpose is to assign the audit attribute on the process.

@dduportal
Contributor

Interesting, good catch! For the sake of general security, I believe we should set up the test harness to test with this capability dropped, and then fix the PAM configuration as you did.

Medium term, we should test that the image can run with both --cap-drop=ALL and --read-only (we have an image with SSHD-rsyncd only at https://github.com/jenkins-infra/docker-rsyncd/blob/main/Dockerfile which works with these minimalistic permissions).

@ksalerno99
Contributor Author

Damien, I got it to work with docker run --cap-add AUDIT_CONTROL

When I tried with --cap-add AUDIT_WRITE it still failed, so what I read previously must have been misinformation. What's confusing is that AUDIT_CONTROL is a privileged capability, yet I never had to run with --privileged before.
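
One way to check which of the two audit capabilities a process actually holds is to decode its `CapEff` mask from `/proc/self/status`. A minimal sketch, assuming bash on Linux and the bit numbers from `linux/capability.h` (CAP_AUDIT_WRITE is bit 29, CAP_AUDIT_CONTROL is bit 30); `has_cap` is a hypothetical helper:

```shell
# has_cap BIT [HEXMASK]: succeed if capability bit BIT is set in HEXMASK.
# Without HEXMASK, the CapEff value is read from /proc/self/status.
has_cap() {
  bit="$1"
  mask="${2:-$(awk '/^CapEff:/ {print $2}' /proc/self/status)}"
  [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

# Example (inside the container):
#   has_cap 29 && echo "CAP_AUDIT_WRITE present"
#   has_cap 30 || echo "CAP_AUDIT_CONTROL absent"
```

For instance, the mask `00000000a80425fb` has bit 29 set but not bit 30, so a process with that effective set holds AUDIT_WRITE and lacks AUDIT_CONTROL.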

@MarkEWaite, what version of docker-ce are you running?

Reference:
https://docs.docker.com/engine/containers/run/#runtime-privilege-and-linux-capabilities

Ken

@MarkEWaite
Contributor

@MarkEWaite, what version of docker-ce are you running?

Docker version 27.4.1, build b9d17ea

@ksalerno99
Contributor Author

Damien, thank you for the rsyncd reference Dockerfile. I like the way you handled the motd issue better there. I am going to update this pull request with that method and also test running with all capabilities removed.

more elegant way the following:

1) Quiet login: disabling motd in PAM
2) enabling dropping of privileges in container: remove the requirement for
   CAP_AUDIT_CONTROL in PAM for SSH login
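
One possible way to do the first item, shown only as a sketch since the exact edit in the commit is not reproduced in this thread: comment out every `pam_motd.so` session line in the PAM file. It assumes GNU sed; `disable_pam_motd` is a hypothetical helper name.

```shell
# disable_pam_motd FILE: comment out session lines that load pam_motd.so.
# Lines that are already commented are left alone, so it is safe to run twice.
disable_pam_motd() {
  sed -i -E 's/^([[:space:]]*session[[:space:]].*pam_motd\.so.*)$/# \1/' "$1"
}

# Example (during the image build):
#   disable_pam_motd /etc/pam.d/sshd
```

Commenting the lines out rather than deleting them keeps the original configuration visible in the image.
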
@ksalerno99
Contributor Author

Report from running the following Jenkins containers with --cap-drop=ALL and --read-only, tested by adding agents and running test jobs:

docker-inbound-agent:

	--cap-drop=ALL	WORKS
	--read-only	DOES NOT WORK
		error: (Jenkins job fails to run on agent)
		FATAL: Unable to produce a script file
		java.io.IOException: Read-only file system

docker-ssh-agent:

	--cap-drop=ALL	DOES NOT WORK
		Minimal capabilities discovered through trial-and-error:
		--cap-drop=ALL \
		--cap-add=SYS_CHROOT \
		--cap-add=FOWNER \
		--cap-add=CHOWN \
		--cap-add=DAC_OVERRIDE \
		--cap-add=SETUID \
		--cap-add=SETGID \
		--cap-add=NET_BIND_SERVICE \
		--cap-add=AUDIT_WRITE 

(For reference, the default capabilities that remain disabled are KILL, FSETID, MKNOD, NET_RAW, SETFCAP, SETPCAP)

	--read-only	DOES NOT WORK
		error: (launching agent)
		/usr/local/bin/setup-sshd: line 41: /home/jenkins/.ssh/authorized_keys: Read-only file system

docker (controller):

	--cap-drop=ALL 	WORKS
	--read-only	DOES NOT WORK
		error: (launching controller)
		Running from: /usr/share/jenkins/jenkins.war
		webroot: /var/jenkins_home/war
		Exception in thread "main" java.io.UncheckedIOException: Jenkins failed to create a temporary file in /tmp: java.io.IOException: Read-only file system
		at executable.Main.extractFromJar(Main.java:435)
		at executable.Main.main(Main.java:249)
		Caused by: java.io.IOException: Read-only file system
		at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
		at java.base/java.io.UnixFileSystem.createFileExclusively(Unknown Source)
		at java.base/java.io.File.createTempFile(Unknown Source)
		at executable.Main.extractFromJar(Main.java:432)
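
The minimal capability set reported for docker-ssh-agent above can be turned into `docker run` flags with a small helper. This is a sketch only: `cap_flags` is a hypothetical function, and the image tag, port, and key placeholder in the commented example are taken from earlier in this thread rather than from the tested setup.

```shell
# cap_flags CAP...: print "--cap-drop=ALL" followed by one --cap-add per argument.
cap_flags() {
  flags="--cap-drop=ALL"
  for cap in "$@"; do
    flags="$flags --cap-add=$cap"
  done
  printf '%s\n' "$flags"
}

# Example (not run here; <public key> is a placeholder):
#   docker run -d --rm \
#     $(cap_flags SYS_CHROOT FOWNER CHOWN DAC_OVERRIDE SETUID SETGID NET_BIND_SERVICE AUDIT_WRITE) \
#     --publish 2200:22 -e "JENKINS_AGENT_SSH_PUBKEY=<public key>" jenkins/ssh-agent:jdk21
```

Keeping the list in one place makes it easier to retry the trial-and-error with one capability removed at a time.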

@ksalerno99
Contributor Author

I'm still unable to determine what's really different between Docker running on ppc64le vs amd64. Note that the issue only happens on real ppc64le hardware; I'm testing on a RHEL9 POWER10 host. The exact same build procedure, running Docker inside a Podman Ubuntu container, works without the patch for ppc64le on top of QEMU user-static on my WSL2 RHEL9 x86_64 host. I also sanity-tested amd64 on x86_64 to rule out QEMU.

The only difference I found is that the Docker control group namespace mode is set to private on the POWER10 host, while on the x86_64 host it is set to host.

ppc64le: docker info
...
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.14.0-503.11.1.el9_5.ppc64le
 Operating System: Ubuntu 24.04.1 LTS
 OSType: linux
 Architecture: ppc64le
 CPUs: 128
 Total Memory: 14.7GiB
...

amd64: docker info
...
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.15.167.4-microsoft-standard-WSL2
 Operating System: Ubuntu 24.04.1 LTS (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 15.26GiB
...

ppc64le: docker inspect docker-ssh-agent | sort >ppc64le.spec
amd64: docker inspect docker-ssh-agent | sort >amd64.spec

diff ppc64le.spec amd64.spec
...
<             "CgroupnsMode": "private",
---
>             "CgroupnsMode": "host",
...
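
To compare just that setting without diffing the whole spec, a small filter over the `docker inspect` output is enough. This is a sketch; `cgroupns_mode` is a hypothetical helper, and it assumes the field is printed as `"CgroupnsMode": "<value>"`, as in the diff above.

```shell
# cgroupns_mode: read `docker inspect` JSON on stdin and print the CgroupnsMode value.
cgroupns_mode() {
  sed -n 's/.*"CgroupnsMode":[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (run on each host):
#   docker inspect docker-ssh-agent | cgroupns_mode
# The same field can also be read with a Go template:
#   docker inspect --format '{{.HostConfig.CgroupnsMode}}' docker-ssh-agent
```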

The Linux kernel capabilities actually enabled are identical on both the POWER10 host and the x86_64 host:

root@docker-server:~# uname -m
ppc64le
root@docker-server:~# capsh --print
Current: =ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
Ambient set =
Current IAB:
Securebits: 00/0x0/1'b0 (no-new-privs=0)
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=
Guessed mode: HYBRID (4)

root@docker-server:~# uname -m
x86_64
root@docker-server:~# capsh --print
Current: =ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
Ambient set =
Current IAB:
Securebits: 00/0x0/1'b0 (no-new-privs=0)
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=
Guessed mode: HYBRID (4)

@ksalerno99
Contributor Author

My question is: why doesn't the x86_64 implementation execute these pluggable authentication modules?

session    optional     pam_motd.so  motd=/run/motd.dynamic
session    optional     pam_motd.so noupdate
session    required     pam_loginuid.so

Maybe the Power implementation is running these correctly, as it should. It's interesting that the rsyncd container also needed to suppress motd in PAM.

3 participants