Troubleshooting a failed systemd unit (with examples)

Units in systemd may fail for a variety of reasons. This article collects some examples to help with troubleshooting them.

Got an issue that you can’t resolve with this article? Share it via the contact page and we will try to help!

Common causes

Incorrect file permissions

Most services need to read or write data. When an underlying directory or specified file can’t be opened (for reading or writing), the service may exit. Typically it does so with a non-zero exit code, to indicate that the program terminated early.
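Keep in mind that a service needs search (execute) permission on every parent directory, not only read or write access on the file itself. A minimal bash sketch, assuming GNU `stat`, that prints the mode, owner, and group of each component of an absolute path. The function name `show_path_perms` is hypothetical.

```shell
# show_path_perms is a hypothetical helper name, not a standard tool.
# Print mode, owner and group for every component of an absolute path,
# similar to what `namei -l` from util-linux shows.
show_path_perms() {
    local path=$1 current="" part
    case $path in
        /*) ;;
        *) echo "show_path_perms: absolute path required" >&2; return 2 ;;
    esac
    # The root directory itself
    stat -c '%A %U %G %n' /
    # Split the path on '/' and show each level in turn
    local IFS='/'
    for part in $path; do
        [ -z "$part" ] && continue
        current="$current/$part"
        stat -c '%A %U %G %n' "$current" || return 1
    done
}
```

For example, run `show_path_perms /var/www/html/index.html`. If util-linux is installed, `namei -l` gives the same overview. To test access as the service user itself, something like `sudo -u www-data test -r /var/www/html/index.html && echo readable` works, where `www-data` is only an example user name.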

Configuration too strict

When systemd security features are used to harden a unit, the configuration itself may also prevent the service from starting. Commonly in that case a syscall is blocked, or file permissions or ownership are incorrect.
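To get an idea which restrictions apply to a unit, `systemd-analyze security nginx.service` gives a scored overview. For the raw values, here is a minimal sketch that filters the output of `systemctl show`. The function name `show_hardening` is hypothetical, and the property list is a selection of common sandboxing settings, not a complete list.

```shell
# show_hardening is a hypothetical helper name; the property list is a
# selection of common sandboxing settings, not a complete list.
show_hardening() {
    awk '
        BEGIN {
            n = split("ProtectSystem ProtectHome PrivateTmp NoNewPrivileges SystemCallFilter ReadOnlyPaths ReadWritePaths InaccessiblePaths NoExecPaths ExecPaths", names, " ")
            for (i = 1; i <= n; i++) wanted[names[i]] = 1
        }
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            # Skip settings that are empty or explicitly disabled
            if ((key in wanted) && value != "" && value != "no") print
        }
    '
}
```

Usage: `systemctl show nginx.service | show_hardening`.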

Generic troubleshooting steps

Systemd provides several ways of discovering why a service does not want to start, or terminated early or unexpectedly. Typically only a few of the methods below are needed to see when and why this happened.

Check the status of a service

systemctl status nginx

This output typically includes the most recent journal lines for the unit.
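Both the status output and the journal show how a process ended with a `code=..., status=...` field, like `code=exited, status=203/EXEC` in the Apache example further down. With `code=exited` the status is an exit code. With `code=killed` or `code=dumped` it is a signal number. A small sketch that explains this field; the function name `explain_exit` is hypothetical.

```shell
# explain_exit is a hypothetical helper name.
# Read systemd status or journal lines from stdin and explain the
# "code=..., status=..." part of them.
explain_exit() {
    sed -n 's/.*code=\([a-z]*\), status=\([0-9]*\)\/\([^ )]*\).*/\1 \2 \3/p' |
    while read -r code num name; do
        case $code in
            exited) echo "exited with status $num ($name)" ;;
            killed) echo "killed by signal $num (SIG$name)" ;;
            dumped) echo "killed by signal $num (SIG$name), core dumped" ;;
            *)      echo "unknown code '$code', status $num ($name)" ;;
        esac
    done
}
```

Usage: `systemctl status nginx.service | explain_exit`.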

Consult the journal for more entries

If the output from the status subcommand does not reveal the details, then consider listing all entries of the day.

journalctl -u nginx --since="today"

Still no answer? Then query the journal without the unit filter and show the last 50 lines.

journalctl -n 50

Additional steps to resolve failed units

Restart the service

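The obvious step after making changes is to restart the service. If you edited the unit file itself, run systemctl daemon-reload first, so that systemd picks up the new configuration.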

systemctl restart nginx.service
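Whether systemd still uses an outdated copy of the unit file can be checked with the `NeedDaemonReload` property, which is `yes` when the file changed on disk after systemd loaded it. A sketch of a hypothetical wrapper `restart_unit` that reloads only when needed. The `SYSTEMCTL` variable is an assumption added so the function can be tried with a stub.

```shell
# restart_unit is a hypothetical helper name.
# SYSTEMCTL can be overridden, for example with a stub for testing.
restart_unit() {
    local unit=$1
    local ctl=${SYSTEMCTL:-systemctl}
    # NeedDaemonReload=yes: the unit file changed on disk since systemd loaded it
    if [ "$("$ctl" show --property=NeedDaemonReload --value "$unit")" = "yes" ]; then
        echo "Unit file of $unit changed, running daemon-reload first"
        "$ctl" daemon-reload || return 1
    fi
    "$ctl" restart "$unit"
}
```

Usage: `restart_unit nginx.service`.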

Reset the unit

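Restarting does not make a difference? Try resetting the failed unit, as this also resets its start rate limit counter. After too many failed starts in a short time, systemd refuses to start the unit again (‘Start request repeated too quickly’) until that counter is reset.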

systemctl reset-failed nginx.service

Restore an altered unit

If you made changes to a unit, consider reverting them.

Not sure if a unit file was changed? Look at the output of the cat subcommand and see if it shows any overrides.

systemctl cat nginx.service
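The output of `systemctl cat` starts each file with a comment line holding its path. The first file is the unit itself; any further ones are drop-ins. A small sketch that lists only the local changes. The name `list_overrides` is hypothetical, and a comment inside a unit file that happens to start with `# /` would be misreported.

```shell
# list_overrides is a hypothetical helper name.
# Reads `systemctl cat` output on stdin and reports local changes.
list_overrides() {
    awk '/^# \// {
        n++
        path = substr($0, 3)
        if (n == 1) {
            # First file is the unit itself; a copy under /etc replaces the vendor unit
            if (path ~ /^\/etc\//) print "local unit file: " path
        } else {
            # Every further file is a drop-in that changes the unit
            print "drop-in: " path
        }
    }'
}
```

Usage: `systemctl cat nginx.service | list_overrides`. For a system-wide view of overridden and extended units, `systemd-delta` can help as well.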

Configuration changed? First make a copy of override.conf and any other drop-in files, as revert removes them. Then use revert to restore the unit to its original configuration.

systemctl revert nginx.service
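Making that copy can be scripted. This is a minimal sketch, assuming local changes live in /etc/systemd/system as a full unit copy and/or a `<unit>.d` drop-in directory. The function name `backup_unit_config` and the default backup location /root/systemd-backup are hypothetical choices.

```shell
# backup_unit_config is a hypothetical helper name; the default backup
# location /root/systemd-backup is an assumption, pick any safe place.
backup_unit_config() {
    local unit=$1
    local base=${2:-/etc/systemd/system}
    local dest=${3:-/root/systemd-backup}
    # Nothing to save if there is neither a local unit file nor a drop-in directory
    if [ ! -e "$base/$unit" ] && [ ! -d "$base/$unit.d" ]; then
        echo "No local configuration for $unit in $base" >&2
        return 1
    fi
    local target
    target="$dest/$unit-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$target" || return 1
    # Local copy of the unit file, which overrides the vendor version completely
    if [ -e "$base/$unit" ]; then
        cp -a "$base/$unit" "$target/" || return 1
    fi
    # Drop-in directory, for example with the override.conf from 'systemctl edit'
    if [ -d "$base/$unit.d" ]; then
        cp -a "$base/$unit.d" "$target/" || return 1
    fi
    echo "$target"
}
```

Usage: `backup_unit_config nginx.service && systemctl revert nginx.service`.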

Suspecting a blocked syscall?

Typically syscalls are blocked by the SystemCallFilter setting, which systemd enforces using seccomp. In that case, seccomp is responsible for blocking a particular syscall.

To see if this is the case, query the journal for SECCOMP audit events.

journalctl _AUDIT_TYPE_NAME=SECCOMP

Example output:

Dec 10 01:41:46 test audit[15307]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 subj=unconfined pid=15307 comm="smtpd" exe="/usr/sbin/smtpd" sig=31 arch=c000003e syscall=161 compat=0 ip=0x7fc8600cfb57 code=0x80000000
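When there are many of these events, the relevant fields can be extracted. In the example above, `comm` is the process name, `arch=c000003e` is the audit identifier for x86_64, and `syscall` is the number to look up. A minimal sketch; the name `parse_seccomp` is hypothetical, and process names containing spaces are not handled.

```shell
# parse_seccomp is a hypothetical helper name.
# Print "command arch syscall-number" for every SECCOMP audit record on stdin.
parse_seccomp() {
    awk '/SECCOMP/ {
        comm = ""; arch = ""; nr = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^comm=/)    { comm = substr($i, 6); gsub(/"/, "", comm) }
            if ($i ~ /^arch=/)    { arch = substr($i, 6) }
            if ($i ~ /^syscall=/) { nr = substr($i, 9) }
        }
        if (nr != "") print comm, arch, nr
    }'
}
```

Usage: `journalctl _AUDIT_TYPE_NAME=SECCOMP --no-pager | parse_seccomp | sort | uniq -c` to count blocked syscalls per process.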

Examples

Let’s have a look at some examples, with their cause and more advanced troubleshooting steps.

Failed to locate executable

Apache refuses to start with the error ‘Failed to locate executable’, followed by the name of the binary and ‘Permission denied’.

Dec 13 13:17:09 test systemd[1]: Starting apache2.service - The Apache HTTP Server...
Dec 13 13:17:09 test (pachectl)[28050]: apache2.service: Failed to locate executable /usr/sbin/apachectl: Permission denied
Dec 13 13:17:09 test (pachectl)[28050]: apache2.service: Failed at step EXEC spawning /usr/sbin/apachectl: Permission denied
Dec 13 13:17:09 test systemd[1]: apache2.service: Control process exited, code=exited, status=203/EXEC

The ‘Permission denied’ part of the message gives a good hint. This may be caused by file permissions, or by the setting NoExecPaths together with ExecPaths, when the executable is not covered by the paths listed in ExecPaths.

If this happens, check the following items:

  • Is the file that fails really a binary? (file /usr/sbin/apachectl)
  • Is it a symbolic link? (then include the target path as well)
  • Is it a shell script? (then include the path of its interpreter, usually /bin/sh)
  • Does the file have the executable permission bit set? (chmod +x /path/to/file)
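The checks in this list can be combined into one small function. A sketch with the hypothetical name `check_exec`. It reports symlink targets, the interpreter of a script, and the executable bit. For a precise file type, `file` remains the better tool.

```shell
# check_exec is a hypothetical helper name.
# Runs the checks from the list against the file systemd fails to execute.
check_exec() {
    local path=$1
    if [ ! -e "$path" ]; then
        echo "$path: does not exist (or is a broken symlink)"
        return 1
    fi
    # Symlink: the resolved target needs to be executable and allowed as well
    if [ -L "$path" ]; then
        echo "$path: symlink to $(readlink -f "$path")"
    fi
    # A file starting with '#!' is a script; its interpreter has to be executable too
    if [ "$(head -c 2 "$path")" = "#!" ]; then
        echo "$path: script, interpreter $(head -n 1 "$path" | cut -c 3- | awk '{print $1}')"
    fi
    if [ -x "$path" ]; then
        echo "$path: executable bit set"
    else
        echo "$path: executable bit NOT set (fix with chmod +x $path)"
    fi
}
```

Usage: `check_exec /usr/sbin/apachectl`.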

Core-dump

Let’s have a look at an example where nginx does not want to start and fails with a core dump (Failed with result ‘core-dump’).

# journalctl -u nginx --since="today"
Jun 22 10:13:26 test systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 22 10:13:27 test systemd[1]: nginx.service: Control process exited, code=dumped, status=31/SYS
Jun 22 10:13:27 test systemd[1]: nginx.service: Failed with result 'core-dump'.
Jun 22 10:13:27 test systemd[1]: Failed to start A high performance web server and a reverse proxy server.

The logging above does not give us a real clue why this failed. So let’s have a look at the journal again, but without the filter for the nginx service (journalctl --since="today").

Jun 22 10:13:26 test systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 22 10:13:26 test audit[48711]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 subj=unconfined pid=48711 comm="nginx" exe="/usr/sbin/nginx" sig=31 arch=c000003e syscall=41 compat=0 ip=0x7f698563db3b code=0x80000000
Jun 22 10:13:26 test kernel: kauditd_printk_skb: 1 callbacks suppressed
Jun 22 10:13:26 test kernel: audit: type=1326 audit(1719051206.884:68): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=unconfined pid=48711 comm="nginx" exe="/usr/sbin/nginx" sig=31 arch=c000003e syscall=41 compat=0 ip=0x7f698563db3b code=0x80000000
Jun 22 10:13:27 test systemd[1]: nginx.service: Control process exited, code=dumped, status=31/SYS
Jun 22 10:13:27 test systemd[1]: nginx.service: Failed with result 'core-dump'.
Jun 22 10:13:27 test systemd[1]: Failed to start A high performance web server and a reverse proxy server.

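In this case SECCOMP reports an issue for our nginx process, which probably means that the process tried to use a syscall it was not allowed to use. The sig=31 field matches the status=31/SYS in the unit log: a syscall blocked this way kills the process as if by the SIGSYS signal, which also explains the core dump. Fortunately, we can troubleshoot this, as the syscall number (41) is mentioned. The challenge is that a number alone does not say much yet.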

The next step is translating this syscall number into its name for our platform (uname -m shows ‘x86_64’). Looking it up online or in the kernel source shows that it is the syscall socket(2). Newer architectures typically use the generic syscall table.
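With the kernel headers installed, the lookup can also be done locally, as x86_64 syscall numbers are listed in unistd_64.h as lines like `#define __NR_socket 41`. A sketch with the hypothetical name `syscall_name`. The default header path is the one used on Debian and Ubuntu and is an assumption for other systems (Fedora and RHEL use /usr/include/asm/unistd_64.h).

```shell
# syscall_name is a hypothetical helper name.
# Look up the name of an x86_64 syscall number in the kernel headers.
syscall_name() {
    local nr=$1
    local header=${2:-/usr/include/x86_64-linux-gnu/asm/unistd_64.h}
    # Header lines look like: #define __NR_socket 41
    awk -v nr="$nr" '$1 == "#define" && $3 == nr { sub(/^__NR_/, "", $2); print $2; exit }' "$header"
}
```

Usage: `syscall_name 41`. If the audit userspace tools are installed, `ausyscall x86_64 41` does the same lookup.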

The socket(2) syscall creates an endpoint for network communication, so we now know that an important part of the networking functionality failed. As the message comes from SECCOMP, the most likely cause is that this syscall, or a group of syscalls, is blocked. So we have to check if the option SystemCallFilter is set for the unit.

We can query this property and see if it exists and is set:

# systemctl show --no-pager --property=SystemCallFilter nginx.service 
SystemCallFilter=_llseek _newselect access add_key alarm arch_prctl brk cacheflush capget capset chdir chmod chown chown32 clock_getres clock_getres_time64 clock_gettime clock_gettime64 clock_nanosleep clock_nanosleep_time64 clone clone3 close close_range copy_file_range creat dup dup2 dup3 epoll_create epoll_create1 epoll_ctl epoll_ctl_old epoll_pwait epoll_pwait2 epoll_wait epoll_wait_old eventfd eventfd2 execve execveat exit exit_group faccessat faccessat2 fadvise64 fadvise64_64 fallocate fchdir fchmod fchmodat fchown fchown32 fchownat fcntl fcntl64 fdatasync fgetxattr flistxattr flock fork fremovexattr fsetxattr fstat fstat64 fstatat64 fstatfs fstatfs64 fsync ftruncate ftruncate64 futex futex_time64 futimesat get_mempolicy get_robust_list get_thread_area getcpu getcwd getdents getdents64 getegid getegid32 geteuid geteuid32 getgid getgid32 getgroups getgroups32 getitimer getpgid getpgrp getpid getppid getpriority getrandom getresgid getresgid32 getresuid getresuid32 getrlimit getrusage getsid gettid gettimeofday getuid getuid32 getxattr inotify_add_watch inotify_init inotify_init1 inotify_rm_watch io_cancel io_destroy io_getevents io_pgetevents io_pgetevents_time64 io_setup io_submit io_uring_enter io_uring_register io_uring_setup ioctl ioprio_get ioprio_set ipc kcmp keyctl kill lchown lchown32 lgetxattr link linkat listxattr llistxattr lremovexattr lseek lsetxattr lstat lstat64 madvise mbind membarrier migrate_pages mkdir mkdirat mknod mknodat mlock mlock2 mlockall mmap mmap2 move_pages mprotect mq_getsetattr mq_notify mq_open mq_timedreceive mq_timedreceive_time64 mq_timedsend mq_timedsend_time64 mq_unlink mremap msgctl msgget msgrcv msgsnd msync munlock munlockall munmap name_to_handle_at nanosleep newfstatat nice oldfstat oldlstat oldolduname oldstat olduname open openat openat2 pause personality pidfd_open pidfd_send_signal pipe pipe2 poll ppoll ppoll_time64 prctl pread64 preadv preadv2 prlimit64 process_madvise process_vm_readv process_vm_writev pselect6 pselect6_time64 pwrite64 pwritev pwritev2 read readahead readdir readlink readlinkat readv remap_file_pages removexattr rename renameat renameat2 request_key restart_syscall rmdir rseq rt_sigaction rt_sigpending rt_sigprocmask rt_sigqueueinfo rt_sigreturn rt_sigsuspend rt_sigtimedwait rt_sigtimedwait_time64 rt_tgsigqueueinfo sched_get_priority_max sched_get_priority_min sched_getaffinity sched_getattr sched_getparam sched_getscheduler sched_rr_get_interval sched_rr_get_interval_time64 sched_setaffinity sched_setattr sched_setparam sched_setscheduler sched_yield select semctl semget semop semtimedop semtimedop_time64 sendfile sendfile64 set_mempolicy set_robust_list set_thread_area set_tid_address set_tls setfsgid setfsgid32 setfsuid setfsuid32 setgid setgid32 setgroups setgroups32 setitimer setns setpgid setpriority setregid setregid32 setresgid setresgid32 setresuid setresuid32 setreuid setreuid32 setrlimit setsid setuid setuid32 setxattr shmat shmctl shmdt shmget sigaction sigaltstack signal signalfd signalfd4 sigpending sigprocmask sigreturn sigsuspend splice stat stat64 statfs statfs64 statx swapcontext symlink symlinkat sync sync_file_range sync_file_range2 syncfs sysinfo tee tgkill time timer_create timer_delete timer_getoverrun timer_gettime timer_gettime64 timer_settime timer_settime64 timerfd_create timerfd_gettime timerfd_gettime64 timerfd_settime timerfd_settime64 times tkill truncate truncate64 ugetrlimit umask uname unlink unlinkat unshare userfaultfd utime utimensat utimensat_time64 utimes vfork vmsplice wait4 waitid waitpid write writev

So yes, the filter is active. The list shows the syscalls that are allowed, so let’s have a look.
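Scanning such a long list by eye is error-prone. Here is a sketch that checks a single syscall against the value printed by `systemctl show --value`. The name `syscall_allowed` is hypothetical. It also assumes that a deny list is printed with a leading `~`, as in the unit file syntax.

```shell
# syscall_allowed is a hypothetical helper name.
# Reads a SystemCallFilter value on stdin and reports whether syscall $1 passes.
syscall_allowed() {
    local name=$1 value list
    value=$(cat)
    if [ -z "$value" ]; then
        echo "$name: no SystemCallFilter set"
        return 0
    fi
    case $value in
        "~"*)
            # Deny list (assumed '~' prefix): listed syscalls are blocked
            list=${value#"~"}
            if printf '%s\n' "$list" | tr ' ' '\n' | grep -qx -- "$name"; then
                echo "$name: blocked (in deny list)"; return 1
            fi
            echo "$name: allowed (not in deny list)"; return 0 ;;
        *)
            # Allow list: only listed syscalls pass
            if printf '%s\n' "$value" | tr ' ' '\n' | grep -qx -- "$name"; then
                echo "$name: allowed (in allow list)"; return 0
            fi
            echo "$name: blocked (not in allow list)"; return 1 ;;
    esac
}
```

Usage: `systemctl show --property=SystemCallFilter --value nginx.service | syscall_allowed socket`.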

After looking at the sorted list, there is no ‘socket’, nor bind(2) or connect(2). As nginx relies on network functions like these, it makes sense that it is not able to run properly. To resolve this, allow the @network-io group in the SystemCallFilter. How do we know this is the related group? That is easy to figure out from the syscall filter overview.
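A common way to allow the group without touching the vendor unit is a drop-in file. When the existing filter is an allow list, an additional SystemCallFilter= line adds the listed syscalls to it. An empty assignment (SystemCallFilter=) would reset the filter instead. A minimal sketch; the function name `write_syscall_override` and the file name 50-syscall-filter.conf are hypothetical choices.

```shell
# write_syscall_override is a hypothetical helper name; the drop-in file name
# 50-syscall-filter.conf is an arbitrary choice.
write_syscall_override() {
    local unit=$1 group=$2
    local base=${3:-/etc/systemd/system}
    local dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # An additional SystemCallFilter= line extends an existing allow list
    printf '[Service]\nSystemCallFilter=%s\n' "$group" > "$dir/50-syscall-filter.conf" || return 1
    echo "$dir/50-syscall-filter.conf"
}
```

To see which syscalls a group contains, use `systemd-analyze syscall-filter @network-io`. Then run `write_syscall_override nginx.service @network-io`, followed by `systemctl daemon-reload` and `systemctl restart nginx.service`. Using `systemctl edit nginx.service` achieves the same interactively.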

Feedback

This article has been written by our Linux security expert Michael Boelen. With focus on creating high-quality articles and relevant examples, he wants to improve the field of Linux security. No more web full of copy-pasted blog posts.

Discovered outdated information or have a question? Share your thoughts. Thanks for your contribution!
