How to harden a systemd service unit
Why harden systemd service units in the first place?
Systemd service units are often configured with a basic set of settings. This allows most people to run the service without any issues. While that is fine, it also means that there is typically room for improvement, especially when it comes to security. Over the years many new unit settings were added, including some great systemd security features.
Hardening your own services is not difficult, but it requires a good approach to find the optimal balance between security and a running service. If you tighten the security measures a bit too much, the service won’t work. If you are too sloppy, you don’t benefit from the great sandboxing features that systemd has to offer. In this article we take a step-by-step approach and increase the security measures gradually.
Hardening profiles
With many people running the same software packages, we crafted hardening profiles for them.
This article supports those hardening profiles and also shows how we arrived at their settings.
Restricting executable paths
Related settings:
- ExecPaths
- NoExecPaths
The easiest way to find out what components are started or required is by using the Linux Audit Framework.
Stop service
The first step is to stop the service, so we can do a clean start.
systemctl stop dovecot.service
Activate audit rule and start service
We are interested in all events where a binary is started. For Linux systems this means we are interested in the syscall execve(2).
With that in mind, we define our audit rule, where we capture the syscall with the -S option. The -k option labels the rule with a key, which we can later use to quickly find the relevant entries.
auditctl -a exit,always -F arch=b64 -S execve -k all-execve
Note: this rule is defined for a 64-bit architecture (arch=b64), which is common, but it may be different for your system.
Start the service directly after enabling the audit rule, so the audit log captures its startup with as little noise from other processes as possible.
systemctl start dovecot.service
Stop audit rule and search
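Let the software run for a bit, then disable the audit rule by deleting it. Note that auditctl -D deletes all loaded audit rules, not just this one. To delete only this rule, use auditctl -d followed by the same options that were passed to -a.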
auditctl -D
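The capture steps above can be bundled into a small function. This is a minimal sketch, assuming it runs as root on a 64-bit system with auditd installed; the function name capture_execs, the RUN dry-run variable, and the 60-second default are our own choices. Unlike auditctl -D, it removes only the rule it added.

```shell
#!/bin/sh
# Sketch: record which binaries a service starts, using a temporary audit rule.
# Assumptions: run as root, 64-bit (b64) syscall ABI, auditd installed.
# Set RUN=echo to print the commands instead of running them (dry run).

capture_execs() {
    unit="$1"             # systemd unit to observe, e.g. dovecot.service
    duration="${2:-60}"   # seconds to let it run (hypothetical default)
    rule="exit,always -F arch=b64 -S execve -k all-execve"

    $RUN systemctl stop "$unit"
    # $rule is intentionally unquoted so it splits into separate arguments
    $RUN auditctl -a $rule
    $RUN systemctl start "$unit"
    $RUN sleep "$duration"
    # Delete only this rule; other loaded audit rules stay in place
    $RUN auditctl -d $rule
}

# Example:
# capture_execs dovecot.service 120
# ausearch -i -ts today -k all-execve
```

Running it with RUN=echo first is a cheap way to review the exact commands before they touch the audit configuration.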
Now query all entries that were recorded while the audit rule was active.
ausearch -i -ts today -k all-execve
This will show a list of entries. As we are only interested in the lines mentioning the binaries on disk, we can filter a bit more.
# ausearch -i -ts today -k all-execve | grep item=0
type=PATH msg=audit(12/16/2024 21:07:52.598:6323) : item=0 name=/usr/bin/systemctl inode=526685 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(12/16/2024 21:07:52.602:6324) : item=0 name=/bin/systemd-tty-ask-password-agent inode=526697 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(12/16/2024 21:07:52.634:6326) : item=0 name=/usr/sbin/dovecot inode=570398 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(12/16/2024 21:07:52.638:6327) : item=0 name=/usr/bin/doveconf inode=570395 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(12/16/2024 21:07:52.642:6328) : item=0 name=/usr/sbin/dovecot inode=570398 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(12/16/2024 21:07:52.662:6330) : item=0 name=/usr/lib/dovecot/log inode=664678 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(12/16/2024 21:07:52.666:6331) : item=0 name=/usr/lib/dovecot/anvil inode=664653 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(12/16/2024 21:07:52.666:6332) : item=0 name=/usr/lib/dovecot/config inode=664656 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(12/16/2024 21:08:04.266:6334) : item=0 name=/usr/sbin/auditctl inode=570333 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
This list gives us a great start. From this list we exclude anything related to systemd itself (systemctl, systemd-tty-ask-password-agent) and the audit framework (auditctl).
With a little bit of scripting we can pull out the sixth field (name=...), strip the name= prefix, sort and deduplicate the results, filter out the systemd and audit commands, then show everything as a single line:
# ausearch -i -ts today -k all-execve | grep item=0 | awk '{print $6}' | awk -F= '{print $2}' | sort | uniq | grep -vE "(systemctl|systemd|auditctl)" | tr '\n' ' '
/usr/bin/doveconf /usr/lib/dovecot/anvil /usr/lib/dovecot/config /usr/lib/dovecot/log /usr/sbin/dovecot
These are the executables our service needs at a minimum. We can now define an explicit deny for the root path using NoExecPaths, and then allow our binaries again with ExecPaths.
[Service]
NoExecPaths=/
ExecPaths=/usr/bin/doveconf /usr/lib/dovecot/anvil /usr/lib/dovecot/config /usr/lib/dovecot/log /usr/sbin/dovecot
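After adding these lines, restart the service and check that everything keeps working. Keep in mind that the noexec restriction also applies to shared libraries and the dynamic loader, as these are mapped into memory as executable code. If the service fails to start, the library directories (for example /usr/lib and /usr/lib64) probably need to be added to ExecPaths as well.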
systemctl restart dovecot.service
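The extraction and the drop-in can also be produced in one go. A minimal sketch, assuming the ausearch -i output format shown above; list_execs and make_exec_dropin are hypothetical helper names. Instead of relying on the sixth field, it looks for the name= field on each item=0 record, which is slightly more tolerant of layout changes.

```shell
#!/bin/sh
# Sketch: turn `ausearch -i` output into an ExecPaths drop-in snippet.
# Assumes the interpreted PATH record format shown above; the helper names
# list_execs and make_exec_dropin are hypothetical.

# Print unique executables from item=0 PATH records on stdin, skipping the
# systemd and audit tools (same exclusion as the grep -vE filter above).
list_execs() {
    awk '/ item=0 / {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^name=/) { sub(/^name=/, "", $i); print $i }
         }' |
        grep -vE '(systemctl|systemd|auditctl)' |
        LC_ALL=C sort -u
}

# Read executables (one per line) on stdin and print the drop-in snippet.
make_exec_dropin() {
    printf '[Service]\n'
    printf 'NoExecPaths=/\n'
    printf 'ExecPaths=%s\n' "$(tr '\n' ' ' | sed 's/ *$//')"
}

# Example:
# ausearch -i -ts today -k all-execve | list_execs | make_exec_dropin
```

Review the generated list before using it: anything the service did not start during the capture window is missing from it.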
Restricting capabilities and syscalls
Most processes that run as a daemon require some of the available Linux capabilities. Some developers clearly document which capabilities their software needs, but most don’t. In that case, we need to figure out which capabilities are required for the service to operate correctly. Closely related to capabilities are the syscalls a program uses. These system calls allow a user space program to communicate with the kernel in a standardized way. For a process to work correctly, it must be allowed to use the syscalls it requires, just like the capabilities. This is also where capabilities and syscalls come together, as the syscalls a program uses usually give a very good hint about which capabilities it requires.
To find out more about the required capabilities and syscalls, we have a few options. Let’s have a look at them, so we can tune our systemd services as well as possible.
Option 1: Using strace
Inspect and adjust the existing service
The first action is to edit the systemd unit and add the strace command. This means that strace needs to be installed.
First we want to find the current ExecStart value. We need it for our override file, where it will be prepended with the strace command.
systemctl cat dovecot.service | grep ExecStart
Next step is to edit the service.
systemctl edit dovecot.service
Your editor will open, and it is time to define a [Service] block with two additional lines. The first one clears the existing ExecStart, while the second one adds the strace command to it.
[Service]
ExecStart=
ExecStart=/usr/bin/strace --absolute-timestamps=precision:us --daemonize --follow-forks --output=/tmp/strace.log /usr/sbin/dovecot -F
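If you prefer not to use the interactive editor, the same override can be generated. A sketch, assuming the original command is passed in by hand (here /usr/sbin/dovecot -F, as found with systemctl cat); strace_override and the strace.conf file name are our own choices, while the strace options mirror the override above.

```shell
#!/bin/sh
# Sketch: print an override that runs the original command under strace.
# The strace options mirror the override above; the function name is our own.
strace_override() {
    cmd="$1"                      # original ExecStart command line
    log="${2:-/tmp/strace.log}"   # where strace writes its output
    printf '[Service]\n'
    printf 'ExecStart=\n'         # clear the ExecStart from the original unit
    printf 'ExecStart=/usr/bin/strace --absolute-timestamps=precision:us --daemonize --follow-forks --output=%s %s\n' "$log" "$cmd"
}

# Example (as root):
# mkdir -p /etc/systemd/system/dovecot.service.d
# strace_override "/usr/sbin/dovecot -F" > /etc/systemd/system/dovecot.service.d/strace.conf
# systemctl daemon-reload
```

Because the override lives in its own file, removing that file and running systemctl daemon-reload undoes it later.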
Restart the service
systemctl restart dovecot.service
Perform some basic tasks
The service should now be running, with strace tracking what it does in the background. This may produce a lot of logging, so leave it on for only a short time, like a few minutes. Meanwhile, exercise the service: for an HTTP server, send a few requests; for a mail server, send an email; and so on.
Copy the log file
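There should now be a log file. If the unit uses PrivateTmp=, it is most likely stored as /tmp/systemd-private-IDENTIFIER-SERVICENAME.service-RANDOMSTRING/tmp/strace.log. The exact path differs on each system and run. Copy the file to a working location (the commands below use strace-dovecot.log) before stopping the service, as systemd removes the private /tmp directory when the service stops.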
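Because the random parts of the directory name change on every run, a small helper can locate the file. A sketch, assuming PrivateTmp= is active and the log is called strace.log as in the override above; find_strace_log is a hypothetical name, and the optional second argument only exists to make it easy to test.

```shell
#!/bin/sh
# Sketch: find the strace log inside a unit's private /tmp directory.
# Assumes PrivateTmp= is enabled and the log is named strace.log, as in the
# override above. The optional ROOT argument (default /tmp) is only there to
# make the function easy to test. Reading these directories requires root.
find_strace_log() {
    unit="$1"
    root="${2:-/tmp}"
    find "$root" -maxdepth 3 -type f \
        -path "$root/systemd-private-*-$unit-*/tmp/strace.log" 2>/dev/null
}

# Example: copy the log before stopping the service
# cp "$(find_strace_log dovecot.service)" ./strace-dovecot.log
```

If leftovers from earlier runs exist, the function prints more than one path, so check its output before copying.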
Stop the service and comment out the ExecStart lines
Next, stop the service. Edit the service unit again and disable the override by commenting out both ExecStart lines. If the service needs to be running, start it again.
First analysis of the strace log
The log file will be filled with syscalls that are requested. While they might look cryptic at first, we can learn a lot about the functionality that a service needs. Let’s have a look at a few of those lines:
43193 19:10:39.162977 execve("/usr/sbin/dovecot", ["/usr/sbin/dovecot", "-F"], 0x7ffecf1f3558 /* 7 vars */) = 0
43193 19:10:39.163186 brk(NULL) = 0x560b8b386000
43193 19:10:39.163241 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2425f00000
43193 19:10:39.163264 access("/etc/ld.so.preload", R_OK) = 0
43193 19:10:39.163287 openat(AT_FDCWD, "/etc/ld.so.preload", O_RDONLY|O_CLOEXEC) = 3
43193 19:10:39.163313 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_EMPTY_PATH) = 0
43193 19:10:39.163338 close(3) = 0
43193 19:10:39.163358 openat(AT_FDCWD, "/usr/lib/dovecot/glibc-hwcaps/x86-64-v4/libsystemd.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
43193 19:10:39.163417 newfstatat(AT_FDCWD, "/usr/lib/dovecot/glibc-hwcaps/x86-64-v4", 0x7ffd1fb8bab0, 0) = -1 ENOENT (No such file or directory)
43193 19:10:39.163438 openat(AT_FDCWD, "/usr/lib/dovecot/glibc-hwcaps/x86-64-v3/libsystemd.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
In this case we see execve(2), a call to execute a binary, followed by some file requests. At this moment we are not interested in the specifics yet. The first step is to gather all the system calls, which we can do by extracting the third column. Unfortunately, this column does not show only the system call, but also its parameters, so we need to filter those out.
awk '{print $3}' strace-dovecot.log | awk -F\( '{print $1}' | grep -E "^[a-z]" | sort | uniq -c | sort -k1 -n
Breakdown of this command:
- Print the third column using awk
- Split this output at the opening parenthesis and keep the left part (the syscall name)
- To reduce clutter, only show names that start with a lowercase character
- Sort the output
- Count all unique occurrences
- Sort the new output by the first key (the number of occurrences) and do this with numeric rules in mind
The output is a list of matches and might look like this:
1 capget
1 capset
1 exit_group
1 fdatasync
1 fstatfs
1 link
1 readlink
1 rt_sigreturn
1 sendmsg
1 socketpair
1 symlink
1 writev
2 chroot
2 mkdir
2 rename
2 rt_sigprocmask
2 sendto
2 sysinfo
2 wait4
3 chdir
3 dup
5 getpeername
5 uname
6 clone
6 mknodat
6 setgroups
7 getsockopt
7 getuid
8 alarm
8 setgid
8 setuid
9 arch_prctl
9 execve
9 geteuid
9 getgid
9 rseq
9 set_tid_address
10 epoll_create
10 getegid
11 munmap
14 chown
15 set_robust_list
16 prctl
19 futex
20 getdents64
23 getpid
25 prlimit64
26 getrandom
34 access
36 setsockopt
36 unlink
37 accept
39 listen
47 brk
52 connect
54 rt_sigaction
59 getsockname
63 bind
70 dup2
78 epoll_wait
81 write
86 lseek
87 pipe2
88 pread64
93 mprotect
99 umask
116 socket
253 epoll_ctl
298 read
323 close
381 mmap
452 openat
570 newfstatat
930 fcntl
Look these syscalls up to see which systemd filter sets they belong to, for example with systemd-analyze syscall-filter.
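To see which of the traced syscalls fall outside a set you plan to allow, the counting pipeline can be turned into a comparison. A sketch, assuming the strace output format shown above and an allow list file with one syscall name per line; strace_syscalls and syscalls_outside are hypothetical names. Building the allow list is up to you, for example from the output of systemd-analyze syscall-filter @system-service; if that output lists nested groups by name (such as @basic-io), expand those as well.

```shell
#!/bin/sh
# Sketch: compare the syscalls in an strace log with an allow list.
# ALLOWED_FILE is assumed to hold one syscall name per line.

# Unique syscall names from strace output on stdin (field 3 is "name(args...").
# Lines such as "<... read resumed>" do not start with a letter and are skipped.
strace_syscalls() {
    awk '{ split($3, a, "("); print a[1] }' | grep -E '^[a-z]' | LC_ALL=C sort -u
}

# Print the syscalls from the log on stdin that are not listed in ALLOWED_FILE.
syscalls_outside() {
    strace_syscalls | grep -vxF -f "$1"
}

# Example:
# syscalls_outside allowed.txt < strace-dovecot.log
```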
Option 2: Using the SystemCallLog setting
With the systemd unit setting SystemCallLog, available since systemd 247, we can log syscall usage. The interesting part of this setting is that we can tell it to log only those system calls that do, or do NOT, match a given list.
Enable the log setting
Since most services need a basic set, we will grant our service unit the @system-service filter set. So that is also the first set that we define in the log. All syscalls that are NOT part of this set can be discovered using the following configuration.
[Service]
SystemCallLog=~@system-service
Restart service and check seccomp output
If we restart our service and then filter on the recent entries related to seccomp, we can see whether anything would be blocked.
systemctl restart dovecot.service && journalctl _AUDIT_TYPE_NAME=SECCOMP --since "1 min ago"
We see the following entry showing up:
Dec 16 23:46:47 debian-test audit[44560]: SECCOMP auid=4294967295 uid=0 gid=114 ses=4294967295 subj=unconfined pid=44560 comm="anvil" exe="/usr/lib/dovecot/anvil" sig=0 arch=c000003e syscall=161 compat=0 ip=0x7f7d7904db57 code=0x7ffc0000
In this case the syscall has number 161, which on x86_64 (arch=c000003e in the record) translates to the chroot(2) syscall. With the help of the capabilities overview we can see that chroot(2) requires the capability CAP_SYS_CHROOT. So we need to make sure that this program is able to use this functionality.
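Translating numbers by hand gets tedious when more entries show up. A sketch that collects the unique syscall numbers from SECCOMP records, assuming the journal output format shown above; seccomp_syscalls is a hypothetical name. It calls ausyscall(8) from the audit userspace tools when available, which maps numbers using the architecture of the machine it runs on, so run it on the same system that produced the log.

```shell
#!/bin/sh
# Sketch: list the syscall numbers in SECCOMP audit records read on stdin.
# If ausyscall(8) from the audit userspace tools is installed, each number is
# translated to a name for the architecture of the machine running this.
seccomp_syscalls() {
    grep -o 'syscall=[0-9]*' | cut -d= -f2 | sort -un |
        while read -r nr; do
            if command -v ausyscall >/dev/null 2>&1; then
                printf '%s %s\n' "$nr" "$(ausyscall "$nr")"
            else
                printf '%s\n' "$nr"
            fi
        done
}

# Example:
# journalctl _AUDIT_TYPE_NAME=SECCOMP --since "1 min ago" | seccomp_syscalls
```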
When we look at the syscall filter sets used by systemd, we can see that chroot is part of the filter sets @privileged and @mount. As @mount is the smaller of the two, we try granting that set, besides giving the capability. Before we do that, we extend our existing logging.
Adjusting the service
[Service]
SystemCallLog=~@mount @system-service
Again we restart the service and run the journalctl command, requesting only the entries of the last minute.
# systemctl restart dovecot.service && journalctl _AUDIT_TYPE_NAME=SECCOMP --since "1 min ago"
-- No entries --
No entries are displayed. Instead of granting @mount, we could also check whether just chroot is enough. Let’s remove @mount and add chroot to the end of the list.
[Service]
SystemCallLog=~@system-service chroot
Again no entries show up, so we know that granting just chroot is already enough. This keeps the allowed set as small as possible.
Enable syscall filtering with SystemCallFilter
Now that we know what filter set(s) and syscall(s) we need, we can start enabling the syscall filtering with the SystemCallFilter setting.
Let’s reorder the setting a bit and begin with the filter set(s), followed by the individual syscalls.
[Service]
SystemCallFilter=@system-service chroot
Tip: you can rename Log into Filter, but don’t forget to remove the tilde (~) at the beginning of the line.
Restart the service to test if everything is still working as expected.
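The tip above is a simple text substitution, so it can be scripted. A sketch, assuming the settings live in the override file created by systemctl edit (/etc/systemd/system/dovecot.service.d/override.conf); log_to_filter is a hypothetical name. Only SystemCallLog lines that start with a tilde are rewritten, because only those list what should be allowed.

```shell
#!/bin/sh
# Sketch of the tip above: rewrite a "SystemCallLog=~..." line (log what is
# NOT in the list) into a "SystemCallFilter=..." line (allow only the list).
# Reads unit file text on stdin; other lines pass through unchanged.
log_to_filter() {
    sed 's/^SystemCallLog=~[[:space:]]*/SystemCallFilter=/'
}

# Example, assuming the override was created with systemctl edit:
# f=/etc/systemd/system/dovecot.service.d/override.conf
# log_to_filter < "$f" > "$f.new" && mv "$f.new" "$f"
# systemctl daemon-reload && systemctl restart dovecot.service
```

When editing the file directly instead of through systemctl edit, the daemon-reload is needed for systemd to pick up the change.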
Enable capabilities filtering with CapabilityBoundingSet
Capabilities are not as easily logged as syscalls. However, the syscalls gathered with the two options above give a good starting point: by looking them up in the capabilities overview, we know for example that the chroot syscall requires CAP_SYS_CHROOT.
A few capabilities that processes commonly need:
Capability | Purpose | Related syscalls |
---|---|---|
CAP_CHOWN | Allows changing file ownership | chown |
CAP_DAC_OVERRIDE | Bypasses file read, write, and execute permission checks | mount, utime, utimensat |
CAP_NET_BIND_SERVICE | Allows binding a socket to a privileged port number below 1024 | bind |
CAP_SETGID | Allows making changes to the group ID of a process | clone, setfsgid, setgid, setgroups, setregid, setresgid |
CAP_SETUID | Allows making changes to the user ID of a process | clone, keyctl, seteuid, setfsuid, setresuid, setreuid, setuid |
So let’s add these capabilities, together with CAP_SYS_CHROOT to allow the chroot functionality.
[Service]
CapabilityBoundingSet=CAP_CHOWN CAP_DAC_OVERRIDE CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_CHROOT
If all is well, a restart of the service should still succeed.
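To confirm that the running process really received this bounding set, it can be compared with the CapBnd line in /proc/PID/status, which shows the set as a hexadecimal bit mask. A sketch that computes the expected mask; cap_mask and cap_bit are hypothetical names, and the bit numbers (CAP_CHOWN=0, CAP_DAC_OVERRIDE=1, CAP_SETGID=6, CAP_SETUID=7, CAP_NET_BIND_SERVICE=10, CAP_SYS_CHROOT=18) come from the kernel headers. Only the capabilities used in this section are mapped.

```shell
#!/bin/sh
# Sketch: compute the CapBnd mask for a list of capability names.
# Only the capabilities used in this section are mapped (bit numbers from
# linux/capability.h); unknown names make cap_mask fail.
cap_bit() {
    case "$1" in
        CAP_CHOWN)            echo 0 ;;
        CAP_DAC_OVERRIDE)     echo 1 ;;
        CAP_SETGID)           echo 6 ;;
        CAP_SETUID)           echo 7 ;;
        CAP_NET_BIND_SERVICE) echo 10 ;;
        CAP_SYS_CHROOT)       echo 18 ;;
        *)                    return 1 ;;
    esac
}

cap_mask() {
    mask=0
    for cap in "$@"; do
        bit=$(cap_bit "$cap") || { echo "unknown capability: $cap" >&2; return 1; }
        mask=$(( mask | (1 << bit) ))
    done
    printf '%016x\n' "$mask"   # same 16-digit format as /proc/PID/status
}

# Example:
# cap_mask CAP_CHOWN CAP_DAC_OVERRIDE CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_CHROOT
# pid=$(systemctl show -p MainPID --value dovecot.service)
# grep CapBnd "/proc/$pid/status"
```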
Restricting resources
Systemd comes with two groups of settings to restrict resources: those starting with Restrict and those starting with Protect. Let’s have a look at common options to further enhance the security posture of our system services.
KeyringMode
The kernel keyring provides key material to services, such as security data, encryption keys, and authentication information. If a service does not need access to the keyring of a particular user (including root), then systemd allows restricting this using the KeyringMode setting.
When in doubt whether a service requests key material, inspect the program code or use strace to track the following syscalls:
- add_key(2)
- request_key(2)
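With the strace log from earlier, checking for these syscalls takes a single command. A sketch, assuming the strace output format shown above (PID, timestamp, then the syscall); log_has_syscalls is a hypothetical name. A syscall that is not seen was only absent while tracing, so exercise the relevant features of the service during the trace.

```shell
#!/bin/sh
# Sketch: report which of the given syscalls appear in an strace log.
# Assumes the strace output format used earlier (PID, timestamp, syscall).
log_has_syscalls() {
    log="$1"; shift
    for sc in "$@"; do
        # Field 3 is "name(args..."; exit status 0 if the name was found
        if awk -v sc="$sc" '{ split($3, a, "("); if (a[1] == sc) found = 1 }
                            END { exit !found }' "$log"; then
            echo "used: $sc"
        else
            echo "not seen: $sc"
        fi
    done
}

# Example:
# log_has_syscalls strace-dovecot.log add_key request_key
```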
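If the service does not need access to key material, lock access down. Note that private is already the default for system services, so setting it explicitly mainly documents the intent.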
[Service]
KeyringMode=private
ProtectClock
With the setting ProtectClock we can prevent a service from making any changes to the system clock.
Most processes should only be allowed to read clock information, not modify it. The obvious exceptions are an NTP daemon or a program such as rdate. For most services it is therefore safe to prevent them from making changes to the system clock.
[Service]
ProtectClock=yes
ProtectHostname
A process rarely needs to change the hostname or NIS domain name of the system. The ProtectHostname setting can be used to prevent such changes.
[Service]
ProtectHostname=yes
To know if changes to the hostname or NIS domain name are needed, we can look for the following syscalls:
- sethostname(2)
- setdomainname(2)
If these are not present, then this setting can be enabled.
ProtectKernelModules
Most services do not need to load new kernel modules. With the unit setting ProtectKernelModules, explicit loading of kernel modules can be blocked.
This setting can be applied to most system services. Some software, especially focused on network traffic capture, may use a custom kernel module and have the need to load it. But otherwise it is safe to block it, preventing any unauthorized loading of kernel modules.
[Service]
ProtectKernelModules=yes
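The checks in this section can be combined into a first draft of a drop-in. A sketch, assuming the strace log format used earlier; suggest_protections and suggest_if_unused are hypothetical names. The mapping from setting to syscalls is our own assumption: it uses the syscalls named in this section, adds keyctl for the keyring, and takes the clock and module syscalls from systemd’s @clock and @module filter sets. Treat the output as a starting point and still test each setting.

```shell
#!/bin/sh
# Sketch: suggest KeyringMode/Protect* settings from an strace log.
# The setting-to-syscall mapping is our own assumption (see the lead-in).
# "Unused" only means the syscall did not occur while tracing.

# Print SETTING unless one of the given syscalls occurs in $used.
suggest_if_unused() {
    setting="$1"; shift
    for sc in "$@"; do
        if printf '%s\n' "$used" | grep -qxF "$sc"; then
            return 0
        fi
    done
    printf '%s\n' "$setting"
}

suggest_protections() {
    # Unique syscall names; field 3 of each strace line is "name(args..."
    used=$(awk '{ split($3, a, "("); print a[1] }' "$1" | grep -E '^[a-z]' | sort -u)
    printf '[Service]\n'
    suggest_if_unused "KeyringMode=private" add_key request_key keyctl
    suggest_if_unused "ProtectClock=yes" adjtimex clock_adjtime clock_settime settimeofday
    suggest_if_unused "ProtectHostname=yes" sethostname setdomainname
    suggest_if_unused "ProtectKernelModules=yes" init_module finit_module delete_module
}

# Example:
# suggest_protections strace-dovecot.log
```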
This section is under development and new settings are being added.