How to harden a systemd service unit

New to securing and tuning systemd services? Start with the "How to harden a systemd service unit" article to learn tuning step by step, including how to use the relevant tools.

Why harden systemd service units in the first place?

Systemd service units are often configured with only a basic set of settings. This allows most people to run the service without any issues. While that is fine, it also means that there is typically room for improvement, especially when it comes to security. Over the years many new unit settings were added, including some great systemd security features.

Hardening your own services is not difficult, but it requires a good approach to find the optimal balance between security and a running service. If you tighten the security measures a bit too much, then the service won’t work. If you are too sloppy, then you don’t benefit from the great sandboxing features that systemd has to offer. In this article we look at how to take a step-by-step approach, and increase the security measures in levels.

Hardening profiles

With many people running the same software packages, we crafted some hardening profiles.

This article is to support those hardening profiles and also shows how we came up with the settings.

Restricting executable paths

Related settings:

  • ExecPaths
  • NoExecPaths

The easiest way to find out what components are started or required is by using the Linux Audit Framework.

Stop service

The first step is to stop the service, so we can do a clean start.

systemctl stop dovecot.service

Activate audit rule and start service

We are interested in all events where a binary is started. For Linux systems this means we are interested in the syscall execve(2).

With that in mind, we define our audit rule, where we capture the syscall with the -S option. The -k option labels it with a key, which we can later use to quickly find the relevant entries.

auditctl -a exit,always -F arch=b64 -S execve -k all-execve

Note: this rule is defined for a 64-bit architecture, which is common, but may be different for your system.
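
If you are unsure which value to use for arch=, the word size reported by getconf gives a hint. The sketch below is an assumption on our side: audit_arch is a hypothetical helper name, and it reports the word size of the userland, which may differ from the kernel on mixed 32/64-bit setups.

```shell
# Hypothetical helper: map a word size (32 or 64) to the arch value
# used in the audit rule. Defaults to the word size of the running userland.
audit_arch() {
    bits="${1:-$(getconf LONG_BIT)}"
    case "$bits" in
        32|64) printf 'b%s\n' "$bits" ;;
        *) echo "unexpected word size: $bits" >&2; return 1 ;;
    esac
}

# Example usage (as root):
#   auditctl -a exit,always -F arch="$(audit_arch)" -S execve -k all-execve
```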

Start the service directly after enabling the audit rule, so the audit can log and we don’t have pollution from other processes.

systemctl start dovecot.service

Let the software run for a bit, then disable the audit rule by deleting it. Be aware that auditctl -D deletes all loaded audit rules, not just the one we added.

auditctl -D

Time to query all entries that were logged while the audit rule was active.

ausearch -i -ts today -k all-execve

This will show a list of entries. As we are only interested in the lines mentioning the binaries on disk, we can filter a bit more.

# ausearch -i -ts today -k all-execve | grep item=0
type=PATH msg=audit(12/16/2024 21:07:52.598:6323) : item=0 name=/usr/bin/systemctl inode=526685 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=PATH msg=audit(12/16/2024 21:07:52.602:6324) : item=0 name=/bin/systemd-tty-ask-password-agent inode=526697 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=PATH msg=audit(12/16/2024 21:07:52.634:6326) : item=0 name=/usr/sbin/dovecot inode=570398 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=PATH msg=audit(12/16/2024 21:07:52.638:6327) : item=0 name=/usr/bin/doveconf inode=570395 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=PATH msg=audit(12/16/2024 21:07:52.642:6328) : item=0 name=/usr/sbin/dovecot inode=570398 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=PATH msg=audit(12/16/2024 21:07:52.662:6330) : item=0 name=/usr/lib/dovecot/log inode=664678 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=PATH msg=audit(12/16/2024 21:07:52.666:6331) : item=0 name=/usr/lib/dovecot/anvil inode=664653 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=PATH msg=audit(12/16/2024 21:07:52.666:6332) : item=0 name=/usr/lib/dovecot/config inode=664656 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=PATH msg=audit(12/16/2024 21:08:04.266:6334) : item=0 name=/usr/sbin/auditctl inode=570333 dev=fe:01 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 

This list gives us a great start. For this list we exclude anything related to systemd itself (systemctl, systemd-tty-ask-password-agent) and the audit framework (auditctl).

With a little bit of scripting we can pull in the sixth field, sort, make it unique, strip out some commands, then show it as a single line:

# ausearch -i -ts today -k all-execve | grep item=0 | awk '{print $6}' | awk -F= '{print $2}' | sort | uniq | grep -vE "(systemctl|systemd|auditctl)" | tr '\n' ' '
/usr/bin/doveconf /usr/lib/dovecot/anvil /usr/lib/dovecot/config /usr/lib/dovecot/log /usr/sbin/dovecot

This line lists the executables that our service needs at a minimum to run. We can now define an explicit deny for the root path using NoExecPaths and add our allowed binaries to ExecPaths.

[Service]
NoExecPaths=/
ExecPaths=/usr/bin/doveconf /usr/lib/dovecot/anvil /usr/lib/dovecot/config /usr/lib/dovecot/log /usr/sbin/dovecot
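
The ExecPaths line can also be generated straight from the audit output. The following is a minimal sketch, not a definitive implementation: exec_paths_from_audit is a hypothetical helper name, and it assumes the "ausearch -i" record layout shown above, where the path follows name= in the item=0 records.

```shell
# Hypothetical helper: read "ausearch -i" output on stdin and print an
# ExecPaths= line with the unique executables found in item=0 PATH records.
exec_paths_from_audit() {
    grep 'item=0' |
        sed -n 's/.* name=\([^ ]*\).*/\1/p' |
        grep -vE '(systemctl|systemd|auditctl)' |
        LC_ALL=C sort -u |
        paste -sd ' ' - |
        sed 's/^/ExecPaths=/'
}

# Example usage (as root):
#   ausearch -i -ts today -k all-execve | exec_paths_from_audit
```

Like the one-liner, this drops every path that contains systemctl, systemd, or auditctl, so review the result before using it.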

After adding these lines, it is time to restart the service and see if everything stays working.

systemctl restart dovecot.service

Restricting capabilities and syscalls

Most processes that run as a daemon will require some of the available Linux capabilities. Some developers define these capabilities clearly, but most of them don’t. In that case, we need to figure out what capabilities are required to operate correctly. As an extension to these capabilities, we have the syscalls that are used. These system functions allow the user space program to communicate with the kernel in a standardized way. To have a process working correctly, we need to make sure that it can also use the syscalls it requires, similar to the capabilities. This is also where capabilities and syscalls come together, as the usage of syscalls usually gives a very good hint about which capabilities are required.

To find out more about the capabilities and syscalls, we have a few options that we can use. Let’s have a look at them, so we can tune our systemd services the best way possible.

Option 1: Using strace

Inspect and adjust the existing service

The first action that we are going to take is to edit a systemd unit and add the strace command. This means that strace needs to be installed.

First we want to find the current ExecStart value. We need this, so we can add it to our override file, prepended with the strace command.

systemctl cat dovecot.service | grep ExecStart

Next step is to edit the service.

systemctl edit dovecot.service

Your editor will open and it is time to define a [Service] block with two additional lines. The first one clears the existing ExecStart, while the second one adds the strace command to it.

[Service]
ExecStart=
ExecStart=/usr/bin/strace --absolute-timestamps=precision:us --daemonize --follow-forks --output=/tmp/strace.log /usr/sbin/dovecot -F

Restart the service

systemctl restart dovecot.service

Perform some basic tasks

The service should be running now, and strace will track what it is doing in the background. This may result in a lot of logging, so we leave this on only for a short time, like a few minutes. In the case of an HTTP server, you could send a few requests; for a mail server it would be useful to send an email, and so on.

Copy the log file

There should be a log file, most likely stored as /tmp/systemd-private-IDENTIFIER-systemd-SERVICENAME-RANDOMSTRING/tmp/strace.log. Obviously the path is different on each system and run.
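
Because the directory name changes on every run, it may be easier to search for the file. The sketch below uses a hypothetical helper named find_strace_log and assumes the log ended up in a private temporary directory below /tmp, as happens for services that use PrivateTmp.

```shell
# Hypothetical helper: list strace.log files below the private tmp
# directories that systemd creates under the given base (default: /tmp).
find_strace_log() {
    base="${1:-/tmp}"
    find "$base" -path '*/systemd-private-*/tmp/strace.log' -type f 2>/dev/null
}

# Example usage (as root), copying the first match to the working directory:
#   cp "$(find_strace_log | head -n 1)" strace-dovecot.log
```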

Stop the service and comment out the ExecStart lines

The next step is to stop the service. Edit the service unit again and disable the two ExecStart lines by commenting them out. If the service needs to be running, start it again.

First analysis of the strace log

The log file will be filled with syscalls that are requested. While they might look cryptic at first, we can learn a lot about the functionality that a service needs. Let’s have a look at a few of those lines:

43193 19:10:39.162977 execve("/usr/sbin/dovecot", ["/usr/sbin/dovecot", "-F"], 0x7ffecf1f3558 /* 7 vars */) = 0
43193 19:10:39.163186 brk(NULL)         = 0x560b8b386000
43193 19:10:39.163241 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2425f00000
43193 19:10:39.163264 access("/etc/ld.so.preload", R_OK) = 0
43193 19:10:39.163287 openat(AT_FDCWD, "/etc/ld.so.preload", O_RDONLY|O_CLOEXEC) = 3
43193 19:10:39.163313 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_EMPTY_PATH) = 0
43193 19:10:39.163338 close(3)          = 0
43193 19:10:39.163358 openat(AT_FDCWD, "/usr/lib/dovecot/glibc-hwcaps/x86-64-v4/libsystemd.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
43193 19:10:39.163417 newfstatat(AT_FDCWD, "/usr/lib/dovecot/glibc-hwcaps/x86-64-v4", 0x7ffd1fb8bab0, 0) = -1 ENOENT (No such file or directory)
43193 19:10:39.163438 openat(AT_FDCWD, "/usr/lib/dovecot/glibc-hwcaps/x86-64-v3/libsystemd.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

In this case we see execve(2), a call to execute a binary, followed by some file requests. At this moment we are not interested in the specifics yet. The first action is to gather all the system calls, which we find in the third column. Unfortunately, the third column does not show the system call alone, but also some of its parameters, so that is something we need to filter out. The command below assumes that the log was copied to strace-dovecot.log.

awk '{print $3}' strace-dovecot.log | awk -F\( '{print $1}' | grep -E "^[a-z]" | sort | uniq -c | sort -k1 -n

Breakdown of this command:

  • Print the third column using awk
  • Split this output at the opening parenthesis, as we want to have the left part (syscall)
  • To reduce some clutter, only show those names that start with lowercase character
  • Sort the output
  • Count all unique occurrences
  • Sort the new output by the first key (the number of occurrences) and do this with numeric rules in mind
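
The same breakdown could also be written as a single awk program. This is a sketch under the assumption that the log uses the "PID TIMESTAMP syscall(args)" layout shown above; count_syscalls is a hypothetical helper name.

```shell
# Hypothetical helper: count syscalls in strace logs that use the
# "PID TIMESTAMP syscall(args) = result" layout shown earlier.
count_syscalls() {
    awk '
        {
            name = $3
            sub(/\(.*/, "", name)            # drop "(" and everything after it
            if (name ~ /^[a-z][a-z0-9_]*$/)  # skip exit markers and resumed calls
                count[name]++
        }
        END {
            for (name in count)
                printf "%7d %s\n", count[name], name
        }
    ' "$@" | sort -k1,1n -k2
}

# Example usage:
#   count_syscalls strace-dovecot.log
```

It reads the log only once and ignores lines where the third column is not a syscall name.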

The output is a list of matches and might look like this:

      1 capget
      1 capset
      1 exit_group
      1 fdatasync
      1 fstatfs
      1 link
      1 readlink
      1 rt_sigreturn
      1 sendmsg
      1 socketpair
      1 symlink
      1 writev
      2 chroot
      2 mkdir
      2 rename
      2 rt_sigprocmask
      2 sendto
      2 sysinfo
      2 wait4
      3 chdir
      3 dup
      5 getpeername
      5 uname
      6 clone
      6 mknodat
      6 setgroups
      7 getsockopt
      7 getuid
      8 alarm
      8 setgid
      8 setuid
      9 arch_prctl
      9 execve
      9 geteuid
      9 getgid
      9 rseq
      9 set_tid_address
     10 epoll_create
     10 getegid
     11 munmap
     14 chown
     15 set_robust_list
     16 prctl
     19 futex
     20 getdents64
     23 getpid
     25 prlimit64
     26 getrandom
     34 access
     36 setsockopt
     36 unlink
     37 accept
     39 listen
     47 brk
     52 connect
     54 rt_sigaction
     59 getsockname
     63 bind
     70 dup2
     78 epoll_wait
     81 write
     86 lseek
     87 pipe2
     88 pread64
     93 mprotect
     99 umask
    116 socket
    253 epoll_ctl
    298 read
    323 close
    381 mmap
    452 openat
    570 newfstatat
    930 fcntl

These syscalls are useful input: we can look them up and see which filter sets they belong to.
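
One way to do that lookup is to search the output of systemd-analyze syscall-filter, which prints the filter sets known to the installed systemd version. The sketch below rests on assumptions: sets_for_syscall is a hypothetical helper name, and it expects set names to start in the first column with their members indented below. It only reports sets that list the syscall directly and does not resolve sets nested inside other sets.

```shell
# Hypothetical helper: read the output of "systemd-analyze syscall-filter"
# on stdin and print every filter set that lists the given syscall directly.
# Assumes set names start in the first column and members are indented.
sets_for_syscall() {
    awk -v want="$1" '
        /^@/ { current = $1; next }
        $1 == want { print current }
    '
}

# Example usage:
#   systemd-analyze syscall-filter | sets_for_syscall chroot
```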

Option 2: Using the SystemCallLog setting

With the help of the systemd unit setting SystemCallLog, which is available since systemd 247, we can log any matching system calls. The interesting part of this setting is that we can tell it to only log those system calls that do or do NOT match.

Enable the log setting

Since most services need a basic set, we will be granting our service unit the @system-service filter set. So that is also the first set that we will define in the log. All syscalls that are NOT part of this set, can be discovered using the following configuration.

[Service]
SystemCallLog=~@system-service

Restart service and check seccomp output

If we restart our service and then filter on the recent items related to seccomp, we can find if anything would be blocked.

systemctl restart dovecot.service && journalctl _AUDIT_TYPE_NAME=SECCOMP --since "1 min ago"

We see the following entry showing up:

Dec 16 23:46:47 debian-test audit[44560]: SECCOMP auid=4294967295 uid=0 gid=114 ses=4294967295 subj=unconfined pid=44560 comm="anvil" exe="/usr/lib/dovecot/anvil" sig=0 arch=c000003e syscall=161 compat=0 ip=0x7f7d7904db57 code=0x7ffc0000

In this case the syscall has number 161, which on x86_64 translates to the chroot(2) syscall. With the help of the capabilities overview we can see that the chroot(2) syscall requires the capability CAP_SYS_CHROOT. So we need to make sure that this program is able to properly use this functionality.
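
When there are many entries, it helps to reduce them to the unique syscall numbers per executable. This is a minimal sketch that assumes the SECCOMP record layout shown above; seccomp_syscalls is a hypothetical helper name.

```shell
# Hypothetical helper: read journal lines on stdin and print the unique
# "syscall-number executable" pairs found in SECCOMP records.
seccomp_syscalls() {
    sed -n 's/.*SECCOMP.* exe="\([^"]*\)".* syscall=\([0-9][0-9]*\).*/\2 \1/p' |
        sort -u
}

# Example usage (as root):
#   journalctl _AUDIT_TYPE_NAME=SECCOMP --since "1 min ago" | seccomp_syscalls
# With the audit userspace tools installed, a number can then be translated
# with ausyscall, for example: ausyscall x86_64 161
```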

When we look at the syscall filter sets used by systemd, we can see that chroot is part of the filter sets @privileged and @mount. The latter is a common filter set for system services. So besides giving the capability, we will grant the filter set @mount. Before we do that, we extend our existing logging.

Adjusting the service

[Service]
SystemCallLog=~@mount @system-service

We restart the service again, followed by the journalctl command. This time we only request the items of the very last minute.

# systemctl restart dovecot.service && journalctl _AUDIT_TYPE_NAME=SECCOMP --since "1 min ago"
-- No entries --

No entries are displayed. Instead of @mount, we could also test whether just chroot is enough. Let’s remove @mount and add chroot to the end of the list.

[Service]
SystemCallLog=~@system-service chroot

Again it will show no entries, so we know that just granting chroot is already enough. This keeps the set of allowed syscalls as small as possible.

Enable syscall filtering with SystemCallFilter

Now that we know what filter set(s) and syscall(s) we need, we can start enabling the syscall filtering with the SystemCallFilter setting.

Let’s reorder the setting a bit and begin with the filter set(s), followed by the individual syscalls.

[Service]
SystemCallFilter=@system-service chroot

Tip: you can rename Log into Filter, but don’t forget to remove the tilde (~) at the beginning of the line.
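
The rename from the tip can be scripted as well. The following sketch uses a hypothetical helper named log_to_filter and only handles lines where the tilde directly follows the equals sign.

```shell
# Hypothetical helper: rewrite SystemCallLog= lines read from stdin into
# SystemCallFilter= lines, dropping a leading tilde from the value.
log_to_filter() {
    sed 's/^SystemCallLog=~\{0,1\}/SystemCallFilter=/'
}

# Example usage:
#   echo 'SystemCallLog=~@system-service chroot' | log_to_filter
```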

Restart the service to test if everything is still working as expected.

Enable capabilities filtering with CapabilityBoundingSet

Capabilities are not as easily logged as syscalls. At the same time, we have seen the two options to gather the system calls. By looking them up in the capabilities overview we know that the chroot syscall requires CAP_SYS_CHROOT.

A few common capabilities for processes to run:

| Capability | Purpose | Related syscalls |
|---|---|---|
| CAP_CHOWN | Allow changing file ownerships | chown |
| CAP_DAC_OVERRIDE | Bypasses file read, write, and execute permission checks | mount, utime, utimensat |
| CAP_NET_BIND_SERVICE | Bind a socket to a privileged port number below 1024 | bind |
| CAP_SETGID | Allows making changes to the group ID of a process | clone, getgroups, seteuid, setfsgid, setgid, setgroups, setresuid, setreuid |
| CAP_SETUID | Allows making changes to the user ID of a process | clone, keyctl, seteuid, setfsuid, setresuid, setreuid, setuid |

So let’s add these items and allow chroot functionality.

[Service]
CapabilityBoundingSet=CAP_CHOWN CAP_DAC_OVERRIDE CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_CHROOT

If all is well, a restart of the service should still succeed.
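
To get a first list of candidate capabilities from the syscalls we collected earlier, a small lookup can help. This is a rough sketch and not a complete mapping: caps_for_syscalls is a hypothetical helper name, and the table inside it covers only a handful of syscalls taken from the overview above plus chroot. A match is a hint, not proof. For example, bind only needs CAP_NET_BIND_SERVICE for ports below 1024.

```shell
# Hypothetical helper: read syscall names on stdin (optionally prefixed
# with a count, as in the "uniq -c" output) and print the candidate
# capabilities as a single space-separated line.
caps_for_syscalls() {
    awk '
        BEGIN {
            cap["chown"]     = "CAP_CHOWN"
            cap["bind"]      = "CAP_NET_BIND_SERVICE"
            cap["setgid"]    = "CAP_SETGID"
            cap["setgroups"] = "CAP_SETGID"
            cap["setuid"]    = "CAP_SETUID"
            cap["chroot"]    = "CAP_SYS_CHROOT"
        }
        ($NF in cap) { print cap[$NF] }
    ' | LC_ALL=C sort -u | paste -sd ' ' -
}

# Example usage:
#   awk '{print $3}' strace-dovecot.log | awk -F\( '{print $1}' | caps_for_syscalls
```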

Restricting resources

Systemd comes with two groups of settings that restrict resources. Their names start with Restrict and Protect. Let’s have a look at common options to further enhance the security posture of our system services.

KeyringMode

The kernel keyring provides key material to services, such as security data, encryption keys, and authentication information. If a service does not need access to the keyring of a particular user (including root), then systemd allows restricting this using the KeyringMode setting.

When in doubt whether key material is requested by a service, inspect the program code or use strace to track the following syscalls:

  • add_key(2)
  • request_key(2)

If no access is needed to key material, then lock access down.

[Service]
KeyringMode=private
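
If a strace log of the service is available, the check for these syscalls can be automated. The sketch below assumes the strace log layout used earlier in this article, with the syscall in the third column. The name log_uses_syscall is a hypothetical helper, and it works for any list of syscall names.

```shell
# Hypothetical helper: succeed when any of the named syscalls appears in
# the given strace log. Usage: log_uses_syscall LOGFILE SYSCALL...
log_uses_syscall() {
    log="$1"
    shift
    awk -v names="$*" '
        BEGIN {
            n = split(names, list, " ")
            for (i = 1; i <= n; i++) wanted[list[i]] = 1
        }
        {
            name = $3
            sub(/\(.*/, "", name)
            if (name in wanted) { found = 1; exit }
        }
        END { exit !found }
    ' "$log"
}

# Example usage:
#   if log_uses_syscall strace-dovecot.log add_key request_key; then
#       echo "keyring syscalls seen, review before setting KeyringMode=private"
#   fi
```

Keep in mind that a short strace run only shows what the service did during that run, so a missing syscall is an indication rather than a guarantee.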

ProtectClock

With the setting ProtectClock we can prevent a service from making any changes to the system clock.

Most processes should only be allowed to read clock information, but not modify it. The obvious exception to this is a service like an NTP daemon or a program such as rdate. For most services it is therefore safe to prevent the service from attempting to make changes to the system clock.

[Service]
ProtectClock=yes

ProtectHostname

A process rarely needs to change the hostname or NIS domain name of the system. The ProtectHostname setting can be used to prevent this.

[Service]
ProtectHostname=yes

To know if changes to the hostname or NIS domain name are needed, we can look for the following syscalls:

  • sethostname(2)
  • setdomainname(2)

If these are not present, then this setting can be enabled.

ProtectKernelModules

Most services do not need to load new kernel modules. With systemd unit setting ProtectKernelModules the explicit loading of kernel modules can be blocked.

This setting can be applied to most system services. Some software, especially focused on network traffic capture, may use a custom kernel module and have the need to load it. But otherwise it is safe to block it, preventing any unauthorized loading of kernel modules.

[Service]
ProtectKernelModules=yes
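
To see which of the settings from this section are still unset for a unit, a small check on the unit text can help. This is a sketch with a hypothetical helper named missing_settings. It only looks for the four settings discussed here and does not judge their values.

```shell
# Hypothetical helper: read unit file text on stdin and print the settings
# from this section that are not set in it.
missing_settings() {
    awk -F= '
        { seen[$1] = 1 }
        END {
            n = split("KeyringMode ProtectClock ProtectHostname ProtectKernelModules", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}

# Example usage:
#   systemctl cat dovecot.service | missing_settings
```

For a broader overview of the sandboxing settings of a unit, systemd-analyze security dovecot.service can be used as well.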

This section is under development and new settings are being added.

Feedback

This article has been written by our Linux security expert Michael Boelen. With focus on creating high-quality articles and relevant examples, he wants to improve the field of Linux security. No more web full of copy-pasted blog posts.
