Ping, for review feedback.
On Tue, Sep 26, 2023 at 05:11:44PM +0100, Daniel P. Berrangé wrote:
The 'systemd-analyze security' command looks at the unit
file
configuration and reports on any settings which increase the
attack surface for the daemon. Since most systemd units are
fairly minimalist, this is generally informing us about settings
that we never put any thought into using before.
In its current configuration it reports
# systemd-analyze security virtlogd.service
...snip...
→ Overall exposure level for virtlogd.service: 9.6 UNSAFE 😨
which is pretty terrible as a score.
If we apply all of the recommendations that appear possible
without (knowingly) breaking functionality it reports:
# systemd-analyze security virtlogd.service
...snip...
→ Overall exposure level for virtlogd.service: 2.2 OK 🙂
which is a pretty decent improvement.
Some of the settings we would like to enable require a systemd
version that is newer than that available in our oldest distro
target - RHEL-8 at v239.
NB, RestrictSUIDSGID is technically newer than 239, but RHEL-8
backported it, and other distros we target have it by default.
Remaining recommendations are
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)
We block FOWNER/IPC_OWNER, but can't block the two DAC
capabilities. Historically apps/users might point QEMU
to log files in $HOME, pre-created with their own user
ID.
✗ IPAddressDeny=
Not required since RestrictAddressFamilies blocks IP
usage. Ignoring this avoids the overhead of creating
a traffic filter than will never be used.
✗ NoNewPrivileges=
Highly desirable, but cannot enable it yet, because it
will block the ability to transition to the virtlogd_t
SELinux domain during execve. The SELinux policy needs
fixing to permit this transition under NNP first.
✗ PrivateTmp=
There is a decent chance people have VMs configured
with a serial port logfile pointing at /tmp. We would
cause a regression to use private /tmp for logging
✗ PrivateUsers=
This would put virtlogd inside a user namespace where
its root is in fact unprivileged. Same problem as the
User= setting below
✗ ProcSubset=
Libraries we link to might read certain non-PID related
files from /proc
✗ ProtectClock=
Requires v245
✗ ProtectHome=
Same problem as PrivateTmp=. There's a decent chance
that someone has a VM configured to write a logfile
to /home
✗ ProtectHostname=
Requires v241
✗ ProtectKernelLogs
Requires v244
✗ ProtectProc
Requires v247
✗ ProtectSystem=
We only set it to 'full', as 'strict' is not viable for
our required usage
✗ RootDirectory=/RootImage=
We are not capable of running inside a custom chroot
given needs to write log files to arbitrary places
✗ RestrictAddressFamilies=~AF_UNIX
We need AF_UNIX to communicate with other libvirt daemons
✗ SystemCallFilter=~@resources
We link to libvirt.so which links to libnuma.so which has
a constructor that calls set_mempolicy. This is highly
undesirable todo during a constructor.
✗ User=/DynamicUser=
This is highly desirable, but we currently read/write
logs as root, and directories we're told to write into
could be anywhere. So using a non-root user would have
a major risk of regressions for applications and also
have upgrade implications
Signed-off-by: Daniel P. Berrangé <berrange(a)redhat.com>
---
src/logging/virtlogd.service.in | 94 +++++++++++++++++++++++++++++++++
1 file changed, 94 insertions(+)
diff --git a/src/logging/virtlogd.service.in b/src/logging/virtlogd.service.in
index 8e245ddb43..9e3838ff34 100644
--- a/src/logging/virtlogd.service.in
+++ b/src/logging/virtlogd.service.in
@@ -20,5 +20,99 @@ OOMScoreAdjust=-900
# per systemd recommendations
LimitNOFILE=1024:524288
+CapabilityBoundingSet=~CAP_AUDIT_CONTROL
+CapabilityBoundingSet=~CAP_AUDIT_READ
+CapabilityBoundingSet=~CAP_AUDIT_WRITE
+CapabilityBoundingSet=~CAP_BLOCK_SUSPEND
+CapabilityBoundingSet=~CAP_CHOWN
+# Mgmt app/user might have pre-created log files that we're
+# told to open and write to, or be storing them in otherwise
+# inaccessible locations like $HOME. So we need to ignore
+# DAC permission checks.
+#CapabilityBoundingSet=~CAP_DAC_OVERRIDE
+#CapabilityBoundingSet=~CAP_DAC_READ_SEARCH
+CapabilityBoundingSet=~CAP_FOWNER
+CapabilityBoundingSet=~CAP_FSETID
+CapabilityBoundingSet=~CAP_IPC_LOCK
+CapabilityBoundingSet=~CAP_IPC_OWNER
+CapabilityBoundingSet=~CAP_KILL
+CapabilityBoundingSet=~CAP_LEASE
+CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE
+CapabilityBoundingSet=~CAP_MAC_ADMIN
+CapabilityBoundingSet=~CAP_MAC_OVERRIDE
+CapabilityBoundingSet=~CAP_MKNOD
+CapabilityBoundingSet=~CAP_NET_ADMIN
+CapabilityBoundingSet=~CAP_NET_BIND_SERVICE
+CapabilityBoundingSet=~CAP_NET_BROADCAST
+CapabilityBoundingSet=~CAP_NET_RAW
+CapabilityBoundingSet=~CAP_SETFCAP
+CapabilityBoundingSet=~CAP_SETPCAP
+CapabilityBoundingSet=~CAP_SETGID
+CapabilityBoundingSet=~CAP_SETUID
+CapabilityBoundingSet=~CAP_SYSLOG
+CapabilityBoundingSet=~CAP_SYS_ADMIN
+CapabilityBoundingSet=~CAP_SYS_BOOT
+CapabilityBoundingSet=~CAP_SYS_CHROOT
+CapabilityBoundingSet=~CAP_SYS_MODULE
+CapabilityBoundingSet=~CAP_SYS_NICE
+CapabilityBoundingSet=~CAP_SYS_PACCT
+CapabilityBoundingSet=~CAP_SYS_PTRACE
+CapabilityBoundingSet=~CAP_SYS_RAWIO
+CapabilityBoundingSet=~CAP_SYS_RESOURCE
+CapabilityBoundingSet=~CAP_SYS_TIME
+CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG
+CapabilityBoundingSet=~CAP_WAKE_ALARM
+
+LockPersonality=true
+MemoryDenyWriteExecute=true
+# Cannot enable this as it prevents transitioning to
+# the confined SELinux virtlogd_t domain on execve
+# unless we modify the policy to allow this.
+#NoNewPrivileges=true
+PrivateDevices=true
+PrivateMounts=true
+PrivateNetwork=true
+# XXX someone could configure QEMU to log a serial port to an
+# arbitrary directory, including /tmp, even if this is ill-advised
+#PrivateTmp=true
+# Not until oldest build target has systemd >= v245
+#ProtectClock=true
+ProtectControlGroups=true
+# Not until oldest build target has systemd >= v241
+#ProtectHostname=true
+# Not until oldest build target has systemd >= v244
+#ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+# Not until oldest build target has systemd >= v247
+#ProtectProc=invisible
+ProtectSystem=full
+RestrictAddressFamilies=AF_UNIX
+RestrictNamespaces=~cgroup
+RestrictNamespaces=~ipc
+RestrictNamespaces=~mnt
+RestrictNamespaces=~net
+RestrictNamespaces=~pid
+RestrictNamespaces=~user
+RestrictNamespaces=~uts
+RestrictRealtime=true
+RestrictSUIDSGID=true
+SystemCallArchitectures=native
+SystemCallFilter=~@clock
+SystemCallFilter=~@debug
+SystemCallFilter=~@module
+SystemCallFilter=~@mount
+SystemCallFilter=~@raw-io
+SystemCallFilter=~@reboot
+SystemCallFilter=~@swap
+SystemCallFilter=~@privileged
+# Unfortunately we link to libnuma via libvirt.so which
+# has a constructor that runs unconditionally that invokes
+# set_mempolicy()
+#SystemCallFilter=~@resources
+SystemCallFilter=~@cpu-emulation
+SystemCallFilter=~@obsolete
+UMask=077
+
[Install]
Also=virtlogd.socket
--
2.41.0
With regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|