Auditing systemd: solving failed units with systemctl
Solving failed units with systemctl
Systemd is a system and service manager that replaces the more traditional init system. To keep a system healthy, failed units should be investigated on a regular basis. Sooner or later a unit will fail and show up in the systemctl listing. In this article we look at how to resolve that.
Why do services fail?
During boot, systemd queues a start job for every enabled unit and then executes these jobs. Most units start correctly, and systemd logs their status in the journal. In some cases, however, a unit enters the “failed” state, typically because a command it depends on did not finish properly. Such a unit is marked in the systemctl listing:
# systemctl
UNIT LOAD ACTIVE SUB DESCRIPTION
-.mount loaded active mounted /
boot.mount loaded active mounted /boot
dev-hugepages.mount loaded active mounted Huge Pages File System
● dev-mqueue.mount loaded failed failed POSIX Message Queue File System
run-user-0.mount loaded active mounted /run/user/0
sys-kernel-config.mount loaded active mounted Configuration File System
sys-kernel-debug.mount loaded active mounted Debug File System
tmp.mount loaded active mounted Temporary Directory
Services usually fail because of a missing dependency (e.g. a file or mount point), missing configuration, or incorrect permissions. In this example the dev-mqueue unit, which is of type mount, has failed. Since it is a mount unit, the most likely cause is that mounting the file system failed.
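On a long listing it is easier to show only the failed units. systemctl can filter by itself with `systemctl --failed` (or `systemctl list-units --state=failed`). As a minimal sketch, the same filtering can be done by hand on a saved listing, for example one copied from another machine. The function name failed_units is made up for this example, and it assumes the column layout shown above (UNIT, LOAD, ACTIVE, SUB).

```shell
# Print the names of units whose ACTIVE column reads "failed".
# Reads a systemctl listing on stdin. (failed_units is a hypothetical helper.)
failed_units() {
    awk '{
        # Unit names always contain a dot (name.type); if the first field
        # does not, it is the status marker and the name is in field 2.
        i = ($1 ~ /\./) ? 1 : 2
        if ($(i + 2) == "failed") print $i
    }'
}

# Example with three lines taken from the listing above.
failed_units <<'EOF'
boot.mount loaded active mounted /boot
● dev-mqueue.mount loaded failed failed POSIX Message Queue File System
tmp.mount loaded active mounted Temporary Directory
EOF
# prints: dev-mqueue.mount
```

On a live system the listing can be piped in directly: `systemctl list-units --no-pager | failed_units`.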
By using the systemctl status command we can see the details of the dev-mqueue.mount unit:
# systemctl status dev-mqueue.mount
● dev-mqueue.mount - POSIX Message Queue File System
Loaded: loaded (/usr/lib/systemd/system/dev-mqueue.mount; static)
Active: failed (Result: exit-code) since Sun 2014-11-23 17:53:10 CET; 4min 12s ago
Where: /dev/mqueue
What: mqueue
Docs: man:mq_overview(7)
http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
Process: 446 ExecMount=/bin/mount -n mqueue /dev/mqueue -t mqueue (code=exited, status=32)
Nov 23 17:53:10 localhost.localdomain systemd[1]: dev-mqueue.mount mount process exited, code=exited status=32
Nov 23 17:53:10 localhost.localdomain systemd[1]: Failed to mount POSIX Message Queue File System.
Nov 23 17:53:10 localhost.localdomain systemd[1]: Unit dev-mqueue.mount entered failed state.
This output shows the command that was executed. The unit failed with result exit-code because the command returned 32 instead of the expected 0. Running the command manually shows that the device /dev/mqueue is missing.
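To reproduce such a failure by hand, run the command from the Process line and look at its exit status. The sketch below wraps that in a small helper; report_status is a made-up name, and the `sh -c 'exit 32'` call is only a stand-in for the failing mount command so that the example runs anywhere. For mount itself, the mount(8) manual page lists status 32 as "mount failure".

```shell
# Run a command and print its exit status (0 means success).
# report_status is a hypothetical helper name.
report_status() {
    status=0
    "$@" || status=$?
    echo "exit status: $status"
}

# Stand-in for a failing command. On the affected host one would pass the
# ExecMount line from the status output instead:
#   report_status /bin/mount -n mqueue /dev/mqueue -t mqueue
report_status sh -c 'exit 32'
# prints: exit status: 32
```

Any non-zero status makes systemd record the result as exit-code and move the unit to the failed state.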
Similarly, the IPMI event daemon fails on our virtual machine. As there is no /dev/ipmi* device, the service cannot start:
# systemctl status ipmievd.service
● ipmievd.service - Ipmievd Daemon
Loaded: loaded (/usr/lib/systemd/system/ipmievd.service; enabled)
Active: failed (Result: exit-code) since Sun 2014-11-23 16:08:48 CET; 1h 36min ago
Process: 550 ExecStart=/usr/sbin/ipmievd $IPMIEVD_OPTIONS (code=exited, status=1/FAILURE)
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: ipmievd: using pidfile /var/run/ipmievd.pid0
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: Unable to open interface
Nov 23 16:08:48 localhost.localdomain systemd[1]: ipmievd.service: control process exited, code=exited status=1
Nov 23 16:08:48 localhost.localdomain systemd[1]: Failed to start Ipmievd Daemon.
Nov 23 16:08:48 localhost.localdomain systemd[1]: Unit ipmievd.service entered failed state.
Nov 23 16:08:48 localhost.localdomain systemd[1]: ipmievd.service failed.
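The log names the three device paths that ipmievd tried to open. A quick way to confirm the cause is to check whether any of them exists on the host. This is a minimal sketch; any_exists is a made-up helper name, and the paths are the ones from the log above.

```shell
# Return success if at least one of the given paths exists.
# any_exists is a hypothetical helper name.
any_exists() {
    for candidate in "$@"; do
        [ -e "$candidate" ] && return 0
    done
    return 1
}

# Device paths taken from the ipmievd log messages.
if any_exists /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; then
    echo "IPMI device present"
else
    echo "no IPMI device found: ipmievd cannot start on this host"
fi
```

On a virtual machine without IPMI hardware the second message is the expected outcome, which means the service can safely be disabled there.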
Clearing failed units
You can manually clear out failed units with the systemctl reset-failed command. This can be done for all units, or a single one.
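Both forms can be sketched as follows. The wrapper clear_failed and the SYSTEMCTL variable are made up for this example; the variable only exists so that the commands can be printed as a dry run instead of being executed.

```shell
# Reset the failed state of the given units, or of every unit when called
# without arguments. SYSTEMCTL is a hypothetical override for this sketch:
# setting it to "echo" prints the arguments instead of running systemctl.
clear_failed() {
    ${SYSTEMCTL:-systemctl} reset-failed "$@"
}

# Dry run: print what would be passed to systemctl.
SYSTEMCTL=echo
clear_failed                    # all units;     prints: reset-failed
clear_failed dev-mqueue.mount   # a single unit; prints: reset-failed dev-mqueue.mount
```

Without the SYSTEMCTL override (and with root privileges) the real command is executed. Keep in mind that reset-failed only clears the recorded state. It does not fix the underlying cause, so a unit that is still broken will fail again the next time it is started.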
Services that are no longer needed are better stopped and disabled:
# systemctl stop rngd.service
# systemctl disable rngd.service
That’s all!