Troubleshooting a failed systemd unit (with examples)
Units in systemd may fail for a variety of reasons. This article collects some examples to help with troubleshooting them.
Got an issue that you can’t resolve with this article? Share it via the contact page and we will try to help!
Common causes
Incorrect file permissions
Most services need to read or write data. When an underlying directory or specified file can’t be opened (for reading or writing), then the service may exit. Typically this is done with an exit code greater than 0, to indicate that execution of the program was terminated early.
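A single directory without the right permissions anywhere along the path is enough to block access, so it helps to look at every level and not only at the file itself. The function below is a minimal sketch of that check, assuming GNU stat is available; the name show_path_permissions and the example path are made up for illustration.

```shell
# Print mode, owner:group and name for a path and each of its
# parent directories, walking up to the root (assumes GNU stat).
show_path_permissions() {
    current=$1
    while :; do
        stat -c '%a %U:%G %n' "$current"
        # Stop at the root; "." guards against relative paths.
        case $current in
            /|.) break ;;
        esac
        current=$(dirname "$current")
    done
}

# Example (hypothetical path):
#   show_path_permissions /var/log/nginx/error.log
```

Compare the output with the user and group the service runs as. On systems with util-linux, namei -l gives a similar overview.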
Configuration too strict
When using the systemd security features to secure units, the configuration might prevent a service from starting as well. Commonly in that case a syscall is blocked or the file permissions or ownership are incorrect.
Generic troubleshooting steps
Systemd provides several ways of discovering why a service does not start, or why it terminated early or unexpectedly. Typically only a few of the methods below are needed to see when and why this happened.
Check the status of a service
systemctl status nginx
This output typically includes the last few lines of the journal for the unit.
Consult the journal for more entries
If the output from the status subcommand does not reveal the details, then consider listing all entries of the day.
journalctl -u nginx --since="today"
If that still does not give the answer, query the journal without the unit filter and show the last 50 lines.
journalctl -n 50
Additional steps to resolve failed units
Restart the service
The obvious step after making changes is to restart the service.
systemctl restart nginx.service
Reset the unit
Restarting not making a difference? Try resetting the failed unit, as this also resets the restart counters.
systemctl reset-failed nginx.service
Restore an altered unit
If you made changes to a unit, consider clearing them.
Not sure if a unit file was changed? Look at the output of the cat subcommand and see if it shows any overrides.
systemctl cat nginx.service
Configuration changed? Make a copy of the override.conf or any other drop-in files. Then use revert to restore a unit to its original configuration.
systemctl revert nginx.service
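The backup step mentioned above can be sketched as a small helper to run before the revert. It assumes the drop-ins live in /etc/systemd/system/<unit>.d, which is where systemctl edit places override.conf; the function name backup_dropins and its arguments are hypothetical.

```shell
# Copy the drop-in directory of a unit to a timestamped backup.
# Arguments: unit name, systemd config root, backup destination.
backup_dropins() {
    unit=$1
    config_root=${2:-/etc/systemd/system}
    backup_root=${3:-$HOME}
    src=$config_root/$unit.d
    dest=$backup_root/$unit.d.$(date +%Y%m%d-%H%M%S)

    if [ ! -d "$src" ]; then
        echo "no drop-in directory at $src" >&2
        return 1
    fi
    # Print the backup location so it can be found again later.
    cp -a "$src" "$dest" && echo "$dest"
}

# Example:
#   backup_dropins nginx.service && systemctl revert nginx.service
```

Chaining the two commands with && means the revert only runs when the copy succeeded.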
Examples
Let’s have a look at some examples, each with its cause and more advanced troubleshooting steps.
Core-dump
Let’s have a look at an example where nginx fails to start and dumps core (Failed with result ‘core-dump’).
# journalctl -u nginx --since="today"
Jun 22 10:13:26 test systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 22 10:13:27 test systemd[1]: nginx.service: Control process exited, code=dumped, status=31/SYS
Jun 22 10:13:27 test systemd[1]: nginx.service: Failed with result 'core-dump'.
Jun 22 10:13:27 test systemd[1]: Failed to start A high performance web server and a reverse proxy server.
The logging above does not give us a real clue why this failed. So let’s have a look at the journal, but without the filter for just the nginx service.
# journalctl --since="today"
Jun 22 10:13:26 test systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 22 10:13:26 test audit[48711]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 subj=unconfined pid=48711 comm="nginx" exe="/usr/sbin/nginx" sig=31 arch=c000003e syscall=41 compat=0 ip=0x7f698563db3b code=0x80000000
Jun 22 10:13:26 test kernel: kauditd_printk_skb: 1 callbacks suppressed
Jun 22 10:13:26 test kernel: audit: type=1326 audit(1719051206.884:68): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=unconfined pid=48711 comm="nginx" exe="/usr/sbin/nginx" sig=31 arch=c000003e syscall=41 compat=0 ip=0x7f698563db3b code=0x80000000
Jun 22 10:13:27 test systemd[1]: nginx.service: Control process exited, code=dumped, status=31/SYS
Jun 22 10:13:27 test systemd[1]: nginx.service: Failed with result 'core-dump'.
Jun 22 10:13:27 test systemd[1]: Failed to start A high performance web server and a reverse proxy server.
In this case we see SECCOMP showing an issue for our nginx process. So that probably means our process wanted to use a syscall it was not allowed to use. Fortunately, we can troubleshoot this, as the syscall (41) is mentioned. The challenge is that this is a number, which does not say much yet.
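On a busy system the SECCOMP record is easy to miss between other messages. As a rough sketch, the relevant fields can be pulled out of the journal with sed; the function name seccomp_syscalls is made up for this example, and it assumes the record format shown above.

```shell
# Read journal lines on stdin and print "command syscall-number"
# for every SECCOMP audit record, without duplicates.
seccomp_syscalls() {
    sed -n '/SECCOMP/ s/.*comm="\([^"]*\)".* syscall=\([0-9][0-9]*\).*/\1 \2/p' | sort -u
}

# Example:
#   journalctl --since="today" | seccomp_syscalls
```

For the journal above this would report nginx together with syscall 41. The sig=31 field in the same record matches the status=31/SYS that systemd reported; in bash, kill -l 31 translates that number to SYS, the signal the kernel sends when a seccomp filter kills a process.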
The next step is translating this syscall number to the underlying name for our platform (uname -m shows ‘x86_64’). If we look this up online or in the kernel source, we see that number 41 is the syscall socket(2).
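When the kernel headers are installed, the lookup can also be done locally. The sketch below searches a unistd header for the matching define; the function name syscall_name is hypothetical and the default header path is an assumption that differs between distributions (Debian-based systems, for example, keep it under an architecture-specific directory).

```shell
# Print the syscall name for a number by searching a kernel header
# for the matching "#define __NR_<name> <number>" line.
# The default header path is an assumption; pass another as $2.
syscall_name() {
    number=$1
    header=${2:-/usr/include/asm/unistd_64.h}
    awk -v n="$number" '
        $1 == "#define" && $2 ~ /^__NR_/ && $3 == n {
            sub(/^__NR_/, "", $2)
            print $2
        }
    ' "$header"
}

# Example:
#   syscall_name 41
```

Where the audit userspace tools are installed, the ausyscall utility should give the same answer without needing the headers.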
As we now know that this syscall is used for creating a new socket to allow network communication on a port, we know that an important part of the networking functionality failed. As SECCOMP shows this message, it is most likely related to a syscall being blocked, or a group of syscalls. We have to check if we have the option SystemCallFilter set in our unit file.
We can query this property and see if it exists and is set:
# systemctl show --no-pager --property=SystemCallFilter nginx.service
SystemCallFilter=_llseek _newselect access add_key alarm arch_prctl brk cacheflush capget capset chdir chmod chown chown32 clock_getres clock_getres_time64 clock_gettime clock_gettime64 clock_nanosleep clock_nanosleep_time64 clone clone3 close close_range copy_file_range creat dup dup2 dup3 epoll_create epoll_create1 epoll_ctl epoll_ctl_old epoll_pwait epoll_pwait2 epoll_wait epoll_wait_old eventfd eventfd2 execve execveat exit exit_group faccessat faccessat2 fadvise64 fadvise64_64 fallocate fchdir fchmod fchmodat fchown fchown32 fchownat fcntl fcntl64 fdatasync fgetxattr flistxattr flock fork fremovexattr fsetxattr fstat fstat64 fstatat64 fstatfs fstatfs64 fsync ftruncate ftruncate64 futex futex_time64 futimesat get_mempolicy get_robust_list get_thread_area getcpu getcwd getdents getdents64 getegid getegid32 geteuid geteuid32 getgid getgid32 getgroups getgroups32 getitimer getpgid getpgrp getpid getppid getpriority getrandom getresgid getresgid32 getresuid getresuid32 getrlimit getrusage getsid gettid gettimeofday getuid getuid32 getxattr inotify_add_watch inotify_init inotify_init1 inotify_rm_watch io_cancel io_destroy io_getevents io_pgetevents io_pgetevents_time64 io_setup io_submit io_uring_enter io_uring_register io_uring_setup ioctl ioprio_get ioprio_set ipc kcmp keyctl kill lchown lchown32 lgetxattr link linkat listxattr llistxattr lremovexattr lseek lsetxattr lstat lstat64 madvise mbind membarrier migrate_pages mkdir mkdirat mknod mknodat mlock mlock2 mlockall mmap mmap2 move_pages mprotect mq_getsetattr mq_notify mq_open mq_timedreceive mq_timedreceive_time64 mq_timedsend mq_timedsend_time64 mq_unlink mremap msgctl msgget msgrcv msgsnd msync munlock munlockall munmap name_to_handle_at nanosleep newfstatat nice oldfstat oldlstat oldolduname oldstat olduname open openat openat2 pause personality pidfd_open pidfd_send_signal pipe pipe2 poll ppoll ppoll_time64 prctl pread64 preadv preadv2 prlimit64 process_madvise process_vm_readv process_vm_writev pselect6 
pselect6_time64 pwrite64 pwritev pwritev2 read readahead readdir readlink readlinkat readv remap_file_pages removexattr rename renameat renameat2 request_key restart_syscall rmdir rseq rt_sigaction rt_sigpending rt_sigprocmask rt_sigqueueinfo rt_sigreturn rt_sigsuspend rt_sigtimedwait rt_sigtimedwait_time64 rt_tgsigqueueinfo sched_get_priority_max sched_get_priority_min sched_getaffinity sched_getattr sched_getparam sched_getscheduler sched_rr_get_interval sched_rr_get_interval_time64 sched_setaffinity sched_setattr sched_setparam sched_setscheduler sched_yield select semctl semget semop semtimedop semtimedop_time64 sendfile sendfile64 set_mempolicy set_robust_list set_thread_area set_tid_address set_tls setfsgid setfsgid32 setfsuid setfsuid32 setgid setgid32 setgroups setgroups32 setitimer setns setpgid setpriority setregid setregid32 setresgid setresgid32 setresuid setresuid32 setreuid setreuid32 setrlimit setsid setuid setuid32 setxattr shmat shmctl shmdt shmget sigaction sigaltstack signal signalfd signalfd4 sigpending sigprocmask sigreturn sigsuspend splice stat stat64 statfs statfs64 statx swapcontext symlink symlinkat sync sync_file_range sync_file_range2 syncfs sysinfo tee tgkill time timer_create timer_delete timer_getoverrun timer_gettime timer_gettime64 timer_settime timer_settime64 timerfd_create timerfd_gettime timerfd_gettime64 timerfd_settime timerfd_settime64 times tkill truncate truncate64 ugetrlimit umask uname unlink unlinkat unshare userfaultfd utime utimensat utimensat_time64 utimes vfork vmsplice wait4 waitid waitpid write writev
So yes, the filter is definitely active. The list shows the allowed syscalls, so let’s have a look.
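Rather than scanning such a long list by eye, the check can be scripted. This is a minimal sketch with a hypothetical function name, filter_lists, and it assumes the property holds an allow-list as in the output above; a list that starts with "~" is a deny-list and means the opposite.

```shell
# Succeed when the given syscall appears in the SystemCallFilter
# line read from stdin (output of systemctl show).
filter_lists() {
    syscall=$1
    sed -n 's/^SystemCallFilter=//p' | tr -s ' ' '\n' | grep -qxF -- "$syscall"
}

# Example:
#   systemctl show --property=SystemCallFilter nginx.service | filter_lists socket \
#       && echo "socket is listed" || echo "socket is NOT listed"
```

The names are matched as whole words, so a search for socket does not accidentally match a longer name that merely contains it.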
After looking at the sorted list, there is no ‘socket’ available, nor are bind(2) or connect(2). For software that relies on network functions like these, it makes sense that nginx is not able to run properly. To resolve this, allow the @network-io group in the SystemCallFilter. How do we know this is the related group? That is easy to figure out from the syscall filter overview.
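The overview is available on the command line via systemd-analyze syscall-filter @network-io, which lists the syscalls in that group. Allowing the group could then look like the sketch below, which writes a drop-in file. The DROPIN_DIR variable and the file name are assumptions for this example; on a real system the directory would be /etc/systemd/system/nginx.service.d and writing there requires root.

```shell
# Write a drop-in that adds the @network-io group to the filter.
# DROPIN_DIR is a hypothetical variable for this sketch; it defaults
# to a directory below the current one so nothing is changed by accident.
dropin_dir=${DROPIN_DIR:-./nginx.service.d}
mkdir -p "$dropin_dir"
cat > "$dropin_dir/10-network-io.conf" <<'EOF'
[Service]
# Merged with the SystemCallFilter entries already set for the unit.
SystemCallFilter=@network-io
EOF

# Afterwards, with the file in the real location:
#   systemctl daemon-reload
#   systemctl restart nginx.service
```

Because multiple SystemCallFilter lines are merged, the drop-in only extends the existing allow-list and leaves the rest of the hardening in place.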