Skip to content
This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

ci/openshift-ci: Enable selinux in CI runs #5798

Merged
merged 1 commit into from
Jan 19, 2024

Conversation

ldoktor
Copy link
Contributor

@ldoktor ldoktor commented Dec 4, 2023

as kata-deploy does not currently handles selinux, this requires manual relabel of the /opt/kata folder where custom binaries are deployed.

Fixes: #5802

@katacontainersbot katacontainersbot added the size/small Small and simple task label Dec 4, 2023
@ldoktor ldoktor marked this pull request as draft December 4, 2023 12:34
@ldoktor
Copy link
Contributor Author

ldoktor commented Dec 4, 2023

Oups, there is still some issue, let me debug it first

@ldoktor
Copy link
Contributor Author

ldoktor commented Dec 5, 2023

I'm still getting some issues, let's not merge this just yet.

@ldoktor
Copy link
Contributor Author

ldoktor commented Dec 6, 2023

It seems to be working, I'm not sure about the use of qemu_exec_t though.

image: alpine
securityContext:
privileged: true
command: ["/bin/sh", "-c", "nsenter --target 1 --mount bash -c \"ls -alZ /opt/kata/bin; semanage fcontext -a -t bin_t '/(.*/)?opt/kata/bin(/.*)?'; semanage fcontext -a -t bin_t '/(.*/)?opt/kata/libexec(/.*)?'; semanage fcontext -a -t bin_t '/(.*/)?opt/kata/runtime-rs/bin(/.*)?'; restorecon -Rv /opt/kata; ls -alZ /opt/kata/bin\"; sleep infinity"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldoktor I think you will need to pass -e to /bin/sh so to immediately exit error on nsenter failure. Otherwise it will always sleep and exit success.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: loop through the directories names, like:

for dir in bin libexec runtime-rs/bin; do echo semanage fcontext -a -t bin_t '/(.*/)?opt/kata/'$dir'(/.*)?'; done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I started with one dir but the list keeps growing :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the return code, we don't really care as the pod might be executed multiple times so IMO best effort way should do. Note we might be running on system without selinux completely and in such case it should also succeed.


# FIXME: Remove when https://github.com/kata-containers/kata-containers/pull/8417 is resolved
# Selinux context is currently not handled by kata-deploy
oc apply -f ${deployments_dir}/relable_selinux.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be relabel_selinux instead of relable?

Here you also will need to check the pod is running to ensure the relabel operation succeeded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it's true we should perhaps wait for a message of completion before proceeding. As for the successfullness I don't think that's a good idea as we don't know it's a selinux capable system.

app: restorecon
spec:
serviceAccountName: kata-deploy-sa
hostPID: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: is the hostPID needed so that it can nsenter --target 1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, yes.

@wainersm
Copy link
Contributor

@ldoktor you will need the Fixes label in at least one commit to make Commit Message Check / Commit Message Check happy.

@ldoktor
Copy link
Contributor Author

ldoktor commented Dec 12, 2023

OK since there are multiple changes requested I added a new commit to better visualize the changes. The only thing I have not tackled is the return code of the relabel as at this point I'd rather use a best-effort approach as we might be running on non-selinux host or multiple times. If you insist I can add a check for ls -Z of one of the changed dirs, but IMO that is unnecessary.

@wainersm
Copy link
Contributor

Thanks for addressing my comments @ldoktor ! I don't have further suggestions, you can squash if you wish.

@ldoktor
Copy link
Contributor Author

ldoktor commented Dec 12, 2023

Rebased

@ldoktor ldoktor force-pushed the selinux branch 2 times, most recently from ad19130 to 1d989e1 Compare December 13, 2023 09:14
@ldoktor
Copy link
Contributor Author

ldoktor commented Dec 13, 2023

I see, the Fixes needs to be #number and not a link to GH.

@wainersm
Copy link
Contributor

/test

Copy link
Contributor

@wainersm wainersm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @ldoktor !

@ldoktor
Copy link
Contributor Author

ldoktor commented Jan 3, 2024

@gkurz, @jepio, @fidencio would any of you have some time to review this?

image: alpine
securityContext:
privileged: true
command: ["/bin/sh", "-c", "nsenter --target 1 --mount bash -xc \"for ENTRY in '/(.*/)?opt/kata/share/kata-.*(/.*)?(/.*)?' '/(.*/)?opt/kata/share/ovmf(/.*)?' '/(.*/)?opt/kata/share/tdvf(/.*)?' '/(.*/)?opt/kata/libexec(/.*)?'; do semanage fcontext -a -t qemu_exec_t \\\"\\$ENTRY\\\"; done; restorecon -v -R /opt/kata\"; echo NSENTER_FINISHED_WITH: $?; sleep infinity"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the motivation behind each regexp. Also on a typical RHEL or RHCOS setup, only the QEMU binary gets the qemu_exec_t label. Other files have different labels, e.g. virtiofsd has bin_t.

Can you clarify @ldoktor ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Greg, I used qemu_exec_t as that one is defined in container-selinux and contains the map and execute privileges, which is what we need for binaries and bios files. The bin_t would do as well but might be too generic.

As for the /(.*/)?opt part it's because selinux policy uses realpath and /opt is mounted from /var/opt on azure so we need to allow this extra flexibility.

There might be few shortcuts as it should only serve the CI pipeline. Once we hammer-out the details the proper implementation should become part of kata-containers/kata-containers#8417 Anyway I'm listening so if you know how to do things properly, please let me know (I'm just a selinux user, not a policy writer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Greg, I used qemu_exec_t as that one is defined in container-selinux and contains the map and execute privileges, which is what we need for binaries and bios files. The bin_t would do as well but might be too generic.

Hi Lukas,

My point is that this CI is meant to test kata in an OCP environment. Why using different labels that the one you get on a real OCP setup ?

As for the /(.*/)?opt part it's because selinux policy uses realpath and /opt is mounted from /var/opt on azure so we need to allow this extra flexibility.

I'm not concerned by the /(.*/)?opt part but by the rest of the regexps... how did you come up with them ?

There might be few shortcuts as it should only serve the CI pipeline. Once we hammer-out the details the proper implementation should become part of kata-containers/kata-containers#8417 Anyway I'm listening so if you know how to do things properly, please let me know (I'm just a selinux user, not a policy writer)

Same here... hence my questions ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the real OCP cluster we rely on system-wide installed tools (qemu) while in the CI pipeline we use the latest built versions that are deployed via kata-deploy. The problem is that currently kata-deploy does not cover the selinux labels at all which results in selinux rejects.

For the rules I just collected the required privileges and found the "closest matching label" and, for consistency, enabled it on all executable/binary files delivered by the kata-deploy script, although I only tested qemu. This was sufficient for e2e pipeline to succeed while keeping selinux enabled in the CI, which is a step forward.

Obviously after the kata-deploy script is improved this CI-pipeline "hack" should be removed but for now it might serve as an incubator to see what all is needed in kata-deploy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarifications @ldoktor.

@wainersm
Copy link
Contributor

hi @gkurz , we would like to get this PR merged so that it will be possible to test upstream kata on Openshift with SELinux turned on (current we turn it off). do you have a reason to nack this change?

@beraldoleal mind to review this one too?

Copy link
Member

@beraldoleal beraldoleal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except from @gkurz comments and two small nitpics, lgtm.

@@ -181,3 +181,11 @@ if [ ${SELINUX_PERMISSIVE} == "yes" ]; then
# The new SELinux configuration will trigger another reboot.
wait_for_reboot
fi

# FIXME: Remove when https://github.com/kata-containers/kata-containers/pull/8417 is resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: My suggestion would be to convert this FIXME note into an Github issue, regardless if it will be solved soon or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to create an issue in the tests repo and to keep a FIXME mentioning this issue in the code for greater visibility. If this code is moved to the main repo before kata-containers/kata-containers#8417 is merged, the issue will need to be converted to a kata-containers issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

image: alpine
securityContext:
privileged: true
command: ["/bin/sh", "-c", "nsenter --target 1 --mount bash -xc \"for ENTRY in '/(.*/)?opt/kata/share/kata-.*(/.*)?(/.*)?' '/(.*/)?opt/kata/share/ovmf(/.*)?' '/(.*/)?opt/kata/share/tdvf(/.*)?' '/(.*/)?opt/kata/libexec(/.*)?'; do semanage fcontext -a -t qemu_exec_t \\\"\\$ENTRY\\\"; done; restorecon -v -R /opt/kata\"; echo NSENTER_FINISHED_WITH: $?; sleep infinity"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: To improve readability here, if possible, it would be nice to break the lines. Something like:

command: ["/bin/sh", "-c", "
    set -e;  # Exit on any error
    nsenter --target 1 --mount bash -xc '
        for ENTRY in \
            \"/(.*/)?opt/kata/share/kata-.*(/.*)?(/.*)?\" \
            \"/(.*/)?opt/kata/share/ovmf(/.*)?\" \
            \"/(.*/)?opt/kata/share/tdvf(/.*)?\" \
            \"/(.*/)?opt/kata/libexec(/.*)?\"; 
        do 
            semanage fcontext -a -t qemu_exec_t \"$ENTRY\" || { echo \"Error in semanage command\"; exit 1; }
        done;
        restorecon -v -R /opt/kata || { echo \"Error in restorecon command\"; exit 1; }
    ';
    echo NSENTER_FINISHED_WITH: $?;
    sleep infinity
"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks better and provides more output. I'll add a Starting the relabel message at the beginning and test everything again and hopefully we can merge this before the weekend to get few runs to see the stability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oups, I missed the /opt/kata/bin between the rebases... I'll include it as well.

Copy link
Member

@gkurz gkurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @gkurz , we would like to get this PR merged so that it will be possible to test upstream kata on Openshift with SELinux turned on (current we turn it off). do you have a reason to nack this change?

Not anymore. I needed some clarifications and I'm fine with @ldoktor's answers 😃

@beraldoleal mind to review this one too?

\lgtm with @beraldoleal 's suggestions.

Thanks @ldoktor !

as kata-deploy does not currently handles selinux, this requires manual
relabel of the /opt/kata folder where custom binaries are deployed.

Fixes: kata-containers#5802

Signed-off-by: Lukáš Doktor <[email protected]>
@ldoktor
Copy link
Contributor Author

ldoktor commented Jan 19, 2024

Changes:

  • Added /opt/kata/bin and /opt/kata/runtime-rs/bin paths
  • Increased busy-loop to relable to 300s (30s was not enough for azure)
  • Line-breaks in container's cmd
  • (Created issue on GH about workaround removal)
  • Rebased

Tested on azure 4.14, worked well.

@gkurz
Copy link
Member

gkurz commented Jan 19, 2024

/test

@wainersm wainersm merged commit eb30da4 into kata-containers:main Jan 19, 2024
8 of 9 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
size/small Small and simple task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable selinux in OCP e2e pipeline
5 participants