Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add systemd configurations to strengthen OS core security #17107

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG-3.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- GHA to verify checklist items completion in PR descriptions ([#10800](https://github.com/opensearch-project/OpenSearch/pull/10800))
- Allow to pass the list settings through environment variables (like [], ["a", "b", "c"], ...) ([#10625](https://github.com/opensearch-project/OpenSearch/pull/10625))
- Views, simplify data access and manipulation by providing a virtual layer over one or more indices ([#11957](https://github.com/opensearch-project/OpenSearch/pull/11957))
- Add systemd configurations to strengthen OS core security ([#17107](https://github.com/opensearch-project/OpenSearch/pull/17107))

### Dependencies
- Update Apache Lucene to 10.1.0 ([#16366](https://github.com/opensearch-project/OpenSearch/pull/16366))
Expand Down
91 changes: 91 additions & 0 deletions distribution/packages/src/common/systemd/opensearch.service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove the diff from this file? we can update our docs to use the template based config for more advanced security option. That would also prevent this change to be a breaking change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure 👍🏻

Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ ExecStart=/usr/share/opensearch/bin/systemd-entrypoint -p ${PID_DIR}/opensearch.
# logging, you can simply remove the "quiet" option from ExecStart.
StandardOutput=journal
StandardError=inherit
SyslogIdentifier=opensearch

# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65535
Expand Down Expand Up @@ -60,6 +61,96 @@ SuccessExitStatus=143
# Allow a slow startup before the systemd notifier module kicks in to extend the timeout
TimeoutStartSec=75

# Prevent modifications to the control group filesystem
ProtectControlGroups=true

# Prevent loading or reading kernel modules
ProtectKernelModules=true

# Prevent altering kernel tunables (sysctl parameters)
ProtectKernelTunables=true

# Set device access policy to 'closed', allowing access only to specific devices
DevicePolicy=closed

# Make /proc invisible to the service, enhancing isolation
ProtectProc=invisible

# Make /usr, /boot, and /etc read-only (less restrictive than 'strict')
ProtectSystem=full

# Prevent changes to control groups (redundant with earlier setting, can be removed)
ProtectControlGroups=yes

# Prevent changing the execution domain
LockPersonality=yes


# System call filtering
# System call filterings which restricts which system calls a process can make
# @ means allowed
# ~ means not allowed
SystemCallFilter=@system-service
SystemCallFilter=~@reboot
SystemCallFilter=~@swap

SystemCallErrorNumber=EPERM

# Capability restrictions
# Remove the ability to block system suspends
CapabilityBoundingSet=~CAP_BLOCK_SUSPEND

# Remove the ability to establish leases on files
CapabilityBoundingSet=~CAP_LEASE

# Remove the ability to use system resource accounting
CapabilityBoundingSet=~CAP_SYS_PACCT

# Remove the ability to configure TTY devices
CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG

# Remov below capabilities:
# - CAP_SYS_ADMIN: Various system administration operations
# - CAP_SYS_PTRACE: Ability to trace processes
# - CAP_NET_ADMIN: Various network-related operations
CapabilityBoundingSet=~CAP_SYS_ADMIN ~CAP_SYS_PTRACE ~CAP_NET_ADMIN


# Address family restrictions
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX

# Filesystem Access

ReadWritePaths=/var/log/opensearch
ReadWritePaths=/var/lib/opensearch
ReadWritePaths=/mnt/snapshots

## Allow read access to system files
ReadOnlyPaths=/etc/os-release /usr/lib/os-release /etc/system-release

## Allow read access to Linux IO stats
ReadOnlyPaths=/proc/self/mountinfo /proc/diskstats
Copy link
Collaborator

@reta reta Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @RajatGupta02 , I am wondering how does these conflicting settings

ReadOnlyPaths=/proc/self/mountinfo /proc/diskstats

and

ProtectProc=invisible

resolve?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProtectProc=invisible limits the visibility of /proc entries to the process itself whereas ReadOnlyPaths enforces a read-only policy, but it does not hide or restrict visibility.

So, in this case the service can read from /proc/self/mountinfo and /proc/diskstats, but it cannot write to them due to ReadOnlyPaths, these paths are not hidden by ProtectProc=invisible because they are specific files the service is allowed to access.


## Allow read access to control group stats
ReadOnlyPaths=/proc/self/cgroup /sys/fs/cgroup/cpu /sys/fs/cgroup/cpu/-
ReadOnlyPaths=/sys/fs/cgroup/cpuacct /sys/fs/cgroup/cpuacct/- /sys/fs/cgroup/memory /sys/fs/cgroup/memory/-


RestrictNamespaces=true

NoNewPrivileges=true

# Memory and execution protection
MemoryDenyWriteExecute=true # Prevent creating writable executable memory mappings
SystemCallArchitectures=native # Allow only native system calls
KeyringMode=private # Service does not share key material with other services
LockPersonality=true # Prevent changing ABI personality
RestrictSUIDSGID=true # Prevent creating SUID/SGID files
RestrictRealtime=true # Prevent acquiring realtime scheduling
ProtectHostname=true # Prevent changes to system hostname
ProtectKernelLogs=true # Prevent reading/writing kernel logs
ProtectClock=true # Prevent tampering with the system clock

[Install]
WantedBy=multi-user.target

Expand Down
Loading