Auditing systemd: solving failed units with systemctl

Systemd is a service manager and an alternative to the more traditional init system. To ensure the system is healthy, failed units should be investigated on a regular basis. Sooner or later a unit will fail and show up in the systemctl listing. In this article we have a look at how to solve it.

Why do services fail?

During the start of the system, enabled services are queued and then started. Most processes will start correctly, and systemd logs the related status in the journal. However, in some cases a service might enter a “failed state”, as a result of a command not finishing properly.

[root@localhost ~]# systemctl
 UNIT                                     LOAD   ACTIVE SUB       DESCRIPTION
 -.mount                                  loaded active mounted   /
 boot.mount                               loaded active mounted   /boot
 dev-hugepages.mount                      loaded active mounted   Huge Pages File System
 ● dev-mqueue.mount                       loaded failed failed    POSIX Message Queue File System
 run-user-0.mount                         loaded active mounted   /run/user/0
 sys-kernel-config.mount                  loaded active mounted   Configuration File System
 sys-kernel-debug.mount                   loaded active mounted   Debug File System
 tmp.mount                                loaded active mounted   Temporary Directory

Services usually fail because of a missing dependency (e.g. a file or mount point), missing configuration, or incorrect permissions. In this example we see that the dev-mqueue unit, which is of type mount, has failed. As it is a mount unit, the most likely reason is that mounting a particular file system failed.
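On a system with many units, the full listing gets long. The systemctl --failed option narrows it to failed units only. For a scripted audit, a small filter can reduce the listing to just the unit names. The sketch below is one way to do that. The function name failed_unit_names is made up for this example, and it assumes the column layout shown in the listing above (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION).

```shell
# Print the names of failed units, one per line.
# Reads systemctl listing output on stdin, so it works on saved output
# as well as on a live system. (failed_unit_names is a hypothetical helper.)
failed_unit_names() {
    # Failed units carry a leading "●" marker; strip it so the unit name is
    # always the first field, then select rows whose ACTIVE column is "failed".
    awk '{ sub(/^[ \t]*●[ \t]*/, "") } $3 == "failed" { print $1 }'
}
```

It could be used as systemctl --no-pager | failed_unit_names. Fed the listing above, it prints dev-mqueue.mount.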

By using the systemctl status command we can see the details of the dev-mqueue.mount unit:

[root@localhost ~]# systemctl status dev-mqueue.mount
● dev-mqueue.mount - POSIX Message Queue File System
   Loaded: loaded (/usr/lib/systemd/system/dev-mqueue.mount; static)
   Active: failed (Result: exit-code) since Sun 2014-11-23 17:53:10 CET; 4min 12s ago
    Where: /dev/mqueue
     What: mqueue
     Docs: man:mq_overview(7)
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
  Process: 446 ExecMount=/bin/mount -n mqueue /dev/mqueue -t mqueue (code=exited, status=32)

Nov 23 17:53:10 localhost.localdomain systemd[1]: dev-mqueue.mount mount process exited, code=exited status=32
Nov 23 17:53:10 localhost.localdomain systemd[1]: Failed to mount POSIX Message Queue File System.
Nov 23 17:53:10 localhost.localdomain systemd[1]: Unit dev-mqueue.mount entered failed state.

This shows the command that was executed. The unit failed with the result exit-code because the mount command returned 32 instead of the expected value of 0. Manually running the command shows the device /dev/mqueue is missing.
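The status value in the Process line is the exit status of the mount command, which the mount(8) man page documents as a bit mask. As a rough aid, the sketch below decodes such a value. The function name explain_mount_status is invented here, and the table follows the util-linux man page as we know it, so check it against man mount on your own system.

```shell
# Decode a mount(8) exit status into the meanings listed in its man page.
# The status is a bit mask, so more than one line may be printed.
# (explain_mount_status is a hypothetical helper name.)
explain_mount_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "0: success"
        return 0
    fi
    # Each line of the table below holds one bit value and its meaning.
    while read -r bit meaning; do
        if [ $((status & bit)) -ne 0 ]; then
            echo "$bit: $meaning"
        fi
    done <<'EOF'
1 incorrect invocation or permissions
2 system error (out of memory, cannot fork, no more loop devices)
4 internal mount bug
8 user interrupt
16 problems writing or locking /etc/mtab
32 mount failure
64 some mount succeeded
EOF
}
```

For the unit above, explain_mount_status 32 prints "32: mount failure", which matches the journal message that the file system could not be mounted.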

Similar to the dev-mqueue.mount unit, the IPMI service fails on our virtual machine. As there is no /dev/ipmi* device, the service can’t start:

[root@localhost ~]# systemctl status ipmievd.service
● ipmievd.service - Ipmievd Daemon
   Loaded: loaded (/usr/lib/systemd/system/ipmievd.service; enabled)
   Active: failed (Result: exit-code) since Sun 2014-11-23 16:08:48 CET; 1h 36min ago
  Process: 550 ExecStart=/usr/sbin/ipmievd $IPMIEVD_OPTIONS (code=exited, status=1/FAILURE)

Nov 23 16:08:47 localhost.localdomain ipmievd[550]: Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: ipmievd: using pidfile /var/run/ipmievd.pid0
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Nov 23 16:08:47 localhost.localdomain ipmievd[550]: Unable to open interface
Nov 23 16:08:48 localhost.localdomain systemd[1]: ipmievd.service: control process exited, code=exited status=1
Nov 23 16:08:48 localhost.localdomain systemd[1]: Failed to start Ipmievd Daemon.
Nov 23 16:08:48 localhost.localdomain systemd[1]: Unit ipmievd.service entered failed state.
Nov 23 16:08:48 localhost.localdomain systemd[1]: ipmievd.service failed.

Clearing failed units

You can manually clear out failed units with the systemctl reset-failed command. This can be done for all units, or a single one.
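Both forms of reset-failed can be wrapped in one small helper. This is only a sketch: the function name clear_failed and the SYSTEMCTL variable are made up for this example. The variable is there so the command can be previewed before it is run as root.

```shell
# Clear the failed state of the given units, or of all units when no
# argument is passed. clear_failed and SYSTEMCTL are hypothetical names:
# set SYSTEMCTL="echo systemctl" to preview the command instead of running it.
clear_failed() {
    ${SYSTEMCTL:-systemctl} reset-failed "$@"
}
```

Calling clear_failed dev-mqueue.mount resets only that unit, while clear_failed without arguments resets every failed unit. Resetting only clears the recorded state. If the underlying problem is still there, the unit will fail again the next time it is started.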

Services that are no longer needed are better stopped and disabled:

systemctl stop rngd.service
systemctl disable rngd.service

That’s all!
