Understanding what runs on your Linux system (and why)

Linux processes and daemons

Each Linux system has a bunch of processes running. Most of these processes might be familiar to you if you regularly use a command like ps or top to display them. Processes may look like just an item in a list, but they are actually complicated pieces of code, kept in check by the kernel's memory management. To truly understand how your system is running, knowledge of process (and memory) management is of great help. So let’s dive into the internals of Linux by learning the tools at our disposal.

What is a process?

Each system has a particular goal it wants to achieve. Such a goal could be providing a website to anonymous visitors all over the world. To enable that, something needs to listen for incoming website requests, process them, and finally send back the related web page. That something is a process: a running instance of a program, executing its machine code. Machine code consists of individual instructions on what the system should do, such as reading an image from the hard disk, sending data via the network interface, or saving an error message in a log file.

Processes come in different forms. The most familiar ones are the commands you type into a shell. A shell is the program that runs in your Linux console or terminal (including a remote session via SSH) and interprets what you type. Most users work with the default Bourne Again Shell (bash). When you type a command like ls, the shell recognizes it as a known command and executes it.

The real magic happens when you run a command. The shell decides whether to run a built-in action (such as cd) or to start a program from the hard disk. A program stored on disk is called a ‘binary’. On Linux, binaries are typically stored in ELF, the Executable and Linkable Format.
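Every ELF file starts with the same four magic bytes: 0x7F followed by the letters E, L, and F. The sketch below checks for that signature. It assumes a POSIX shell with the GNU coreutils or BusyBox versions of head and od. The function name is_elf is our own and is not a standard command. In practice, the file command gives a more complete answer.

```shell
# is_elf FILE: exit status 0 if FILE starts with the ELF magic bytes 7f 45 4c 46.
# Hypothetical helper for illustration; `file FILE` reports this and more.
is_elf() {
    # Read the first 4 bytes, print them as hex, and strip spaces/newlines.
    magic=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}
```

For example, is_elf /bin/sh succeeds on a typical Linux system because the shell is a compiled binary. A shell script fails the check because it starts with #! instead.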

What is a Linux daemon?

Some processes are meant to run for a long time in the background. This could be to fulfill requests like scanning an incoming email or sending back a page of a website. These processes are called daemons. Besides their long lifetime, another big difference is that daemons do not interact with a terminal. Typically they don’t send any output to it but use log files instead. Daemons are often started right after the operating system has booted. Many have a ‘d’ at the end of their name, such as sshd, to hint that they are a daemon process.

The name daemon was inspired by Maxwell’s demon, an imaginary being from a physics thought experiment that sorts molecules in the background.

Good to remember: a daemon is always a process, but not every process is a daemon.

What about services?

The term ‘service’ used to be associated mainly with Windows systems. With the introduction of systemd, it has become common on Linux as well. A service is a combination of resources that together provide some functionality. For example, an SSH service consists of the running SSH daemon and any dependencies, like networking.

Running processes

There are many ways to gather and show information about running processes. Common tools for this job are the ps and top commands. Let’s start with some basic output that you may get from the first one, using ps -ef.

[Screenshot: running processes on a Linux system, shown with the ps command]

What you see in this screenshot is a small listing of processes. Nothing really fancy, right?

The most common thing we look at is the process name itself, or rather the command. It is displayed at the right (CMD column). The first process is /sbin/init, the system manager that Linux distributions start first (on systemd-based distributions it is a link to systemd). The other processes have brackets around their name, which indicates that they are kernel threads. The PID and PPID columns back this up. The PID is the process identifier, a unique number for a process on that system at that time. The PPID is the parent process identifier, and the other kernel threads have [kthreadd] as their parent. Both the init process and [kthreadd] have a PPID of zero, which means they don’t have a parent process because the kernel starts them directly. In other words, these processes stand on their own.

Continuing through the output, the UID column shows the user identifier: the owner of the process, typically the user that started it. The small C column shows CPU utilization as an integer percentage over the lifetime of the process. On most systems, many processes show a zero here, meaning there isn’t a lot going on. Later in the list there is a good example of the opposite: the unattended-upgrade tool is demanding a good share of CPU cycles (67 percent).
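Both columns can be derived from /proc as well. The sketch below assumes a POSIX shell, awk, and getconf, which provides CLK_TCK, the number of clock ticks per second. It prints the real UID from the status file. It also approximates the C column the way the ps manual page describes it: CPU time used divided by the time elapsed since the process started, as an integer percentage. The function names are hypothetical.

```shell
# uid_of PID: print the real UID (first value on the Uid: line of /proc/PID/status).
uid_of() {
    awk '/^Uid:/ { print $2 }' "/proc/$1/status"
}

# lifetime_cpu_percent PID: CPU usage over the whole lifetime of PID, as an integer.
lifetime_cpu_percent() {
    hz=$(getconf CLK_TCK)
    stat=$(cat "/proc/$1/stat") || return 1
    # Drop "PID (comm) " first: comm may itself contain spaces or parentheses.
    rest=${stat##*") "}
    set -- $rest
    # Remaining fields start at field 3 (state), so field N is $((N-2)):
    # utime=14 -> $12, stime=15 -> $13, starttime=22 -> $20 (all in clock ticks).
    ticks=$(( ${12} + ${13} ))
    start=${20}
    # /proc/uptime holds seconds since boot; starttime is ticks since boot.
    awk -v ticks="$ticks" -v start="$start" -v hz="$hz" '{
        elapsed = $1 - start / hz
        if (elapsed <= 0) elapsed = 1 / hz
        printf "%d\n", 100 * (ticks / hz) / elapsed
    }' /proc/uptime
}
```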

[Screenshot: output of the ps command with the C column (CPU usage)]

If we look at the TTY column, we see that many processes have a question mark. This means they have no controlling terminal and are non-interactive. User ‘michael’ connected via SSH (third line) and got a bash shell (first line). That shell has a terminal (tty1) assigned, which means it is an interactive process. The STIME column shows when the process was started: just the time if it started today, otherwise the date. Finally, the TIME column shows the cumulative CPU time the process has consumed. CPU-hungry processes will have a higher number here.
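The TTY and TIME columns also come from /proc/PID/stat. Field 7 (tty_nr) is 0 when a process has no controlling terminal, which ps displays as a question mark. Fields 14 and 15 (utime and stime) hold the CPU time in clock ticks. The sketch below reads both. It uses hypothetical helper names and assumes a POSIX shell with getconf. It always prints TIME as HH:MM:SS, whereas ps switches to DD-HH:MM:SS for very long CPU times.

```shell
# stat_fields PID: print the fields of /proc/PID/stat that follow "(comm)".
stat_fields() {
    stat=$(cat "/proc/$1/stat") || return 1
    rest=${stat##*") "}
    printf '%s\n' "$rest"
}

# tty_nr_of PID: device number of the controlling terminal; 0 means none (ps shows "?").
tty_nr_of() {
    set -- $(stat_fields "$1")
    echo "$5"                          # field 7 of /proc/PID/stat
}

# cpu_time_of PID: total user + system CPU time as HH:MM:SS, like the TIME column.
cpu_time_of() {
    hz=$(getconf CLK_TCK)
    set -- $(stat_fields "$1")
    secs=$(( (${12} + ${13}) / hz ))   # fields 14 (utime) and 15 (stime)
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}
```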

Note: Linux users typically run ps with -ef, whereas BSD users are familiar with the dashless aux. The output is similar, but minor differences may exist between operating systems, versions of ps, and the flags used.

Process data

The Linux kernel is a complicated machine in itself. It forms the bridge between hardware and software. Its primary goal is to make sure that both sides behave while still processing as many requests as possible. That is a challenging task, with hardware interrupts continuously calling for attention and software that is sometimes less stable than anticipated.

To account for everything running on the system, the kernel needs to track every movement on the system. Memory management in particular needs attention. The memory is divided into several zones and then provisioned to the running processes. To prevent misuse, there are guards that monitor requests for more memory. The goal is to prevent the system from running Out Of Memory (OOM). Otherwise, the OOM killer has to be unleashed from its cage, and it starts killing processes to free up memory. Other guards ensure that one process can’t see the memory of other processes, which would be bad for security. Similarly, there are protections that prevent memory segments from being misused, such as a data-only area that suddenly runs (malicious) machine code.
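The kernel exposes the score it uses to pick an OOM-killer victim in /proc/PID/oom_score. A higher score means the process is a more likely target. An administrator can shift that score through /proc/PID/oom_score_adj, which ranges from -1000 to 1000. The POSIX shell sketch below uses hypothetical function names. It lists the processes the OOM killer would most likely pick first.

```shell
# oom_score_of PID: the kernel's current OOM-killer score for PID.
oom_score_of() {
    cat "/proc/$1/oom_score"
}

# oom_ranking [N]: print "SCORE PID NAME" for the N highest-scoring processes (default 5).
oom_ranking() {
    n=${1:-5}
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # Processes can exit while we loop; skip any that disappear.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s %s %s\n' "$score" "$pid" "$name"
    done | sort -rn | head -n "$n"
}
```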

Some of the internal data maintained by the kernel is also useful for the system administrator. The kernel exposes it through the pseudo-filesystem /proc, which contains a directory per PID. In line with the Linux idea that everything is a file, each of these directories consists of a bunch of files. Most of them can be viewed with the cat command.

[Screenshot: using the /proc directory to see process details]

Some examples:

  • cmdline – the full command line, as displayed by ps
  • comm – the command name (at most 15 characters)
  • limits – resource limits, like the maximum number of open files
  • mounts – the mounts visible to the process
  • sched – scheduler details and statistics
  • smaps – memory usage per memory mapping
  • status – process details, ownership, memory usage
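Reading these files needs little more than cat, although a few need small adjustments. In cmdline, the arguments are separated by NUL bytes rather than spaces. The comm file holds at most the first 15 characters of the command name. The POSIX shell sketch below uses our own function names.

```shell
# proc_cmdline PID: full command line, with the NUL separators turned into spaces.
proc_cmdline() {
    tr '\0' ' ' < "/proc/$1/cmdline" | sed 's/ $//'
}

# proc_comm PID: command name as tracked by the kernel (max. 15 characters).
proc_comm() {
    cat "/proc/$1/comm"
}

# proc_field PID FIELD: value of one line from /proc/PID/status, e.g. State or Threads.
proc_field() {
    awk -v key="$2:" '$1 == key { $1 = ""; sub(/^ +/, ""); print }' "/proc/$1/status"
}
```

For example, proc_field $$ State shows whether your shell is running or sleeping.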

Reviewing all these files is interesting, but some smart people created tools that read this data and show it in a friendlier way. Let’s have a look at those.

Monitoring processes

To really learn what your system is doing, you need the right tools for the job. Typically you need a combination of tools to find the cause of a problem. For example, a system with a high load could have multiple causes. One of them is high I/O, meaning data being written to disk or sent over the network. So troubleshooting may take two or three steps before you get close to the source.

[Image: Linux monitoring, troubleshooting, and in-depth analysis]

Let’s learn a bit about processes by answering the most common questions. These are situations you may encounter while running a system and keeping it stable.

How much memory does the system have available?

The free command shows the memory and swap statistics. This is useful to determine the total, used, and free memory.

free

[Screenshot: monitoring used and free memory on a Linux system]

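free gets its numbers from /proc/meminfo. The most useful single value there is usually MemAvailable. It is the kernel's estimate of how much memory can be given to new programs without swapping, and free shows it in its available column. The sketch below reads it, using hypothetical function names and assuming awk is present.

```shell
# mem_available_mib: MemAvailable from /proc/meminfo, converted from kB to MiB.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# mem_used_percent: share of RAM that is NOT available for new programs.
mem_used_percent() {
    awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
         END { printf "%d\n", 100 * (t - a) / t }' /proc/meminfo
}
```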
How can I find all the PIDs for a specific program?

The first thing that may come to mind is the ps command, combined with the grep command to show only the lines you want. Keep in mind that the grep process itself may also show up in the results, as its own command line contains the search term.

ps -ef | grep nginx

If you want a one-liner, or need the PIDs in a shell script, use the pgrep command instead.

pgrep nginx

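If you want the PIDs formatted differently (e.g. with a comma between each PID), use the -d flag to set the delimiter. The -f flag makes pgrep match against the full command line instead of only the process name.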

pgrep -d',' -f nginx
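pgrep does its work by scanning /proc. A stripped-down POSIX shell version of that idea could look like the sketch below. pgrep treats its argument as a regular expression, but this hypothetical find_pids helper only matches the exact process name from the comm file, similar to pgrep -x. It skips processes that exit while the loop runs. Names longer than 15 characters won't match, because comm is truncated.

```shell
# find_pids NAME: print the PIDs whose command name (comm) is exactly NAME.
find_pids() {
    for dir in /proc/[0-9]*; do
        # The process may exit between the glob and the read; ignore errors.
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        if [ "$name" = "$1" ]; then
            echo "${dir#/proc/}"
        fi
    done
}
```

To get a comma-separated list like the one pgrep -d',' produces, pipe the output through paste, as in find_pids nginx | paste -sd, -.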

How much memory does a process use?

Linux systems will try to use as much memory as possible for performance reasons, for example for caching, so it is good to know which values matter. The first field is RSS, which stands for Resident Set Size. While there are whole books about memory management, you can see this field as the part of the program that is actually held in physical memory (RAM). For small programs this is typically just a few megabytes or less. Then there is VSZ, the Virtual Memory Size. This is all the memory that the program has mapped and has access to. It is usually a lot more than the RSS, but that doesn’t mean all this memory is really in use.

To see these details, we can use the ps command. We can even select the fields we need and tell it which PID (or PIDs) we are interested in.

ps -o vsz,rss,cmd --pid "$(pgrep -d',' nginx)"

Make sure that the pgrep command returns something; otherwise the --pid option receives no values.

[Screenshot: display memory usage of a process on Linux]
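The same figures are available in /proc/PID/status. VmSize corresponds to the VSZ column and VmRSS to RSS, both in kilobytes, which is also the unit ps uses. The sketch below uses a hypothetical helper name and assumes awk. Kernel threads have no user memory, so the helper fails for them.

```shell
# mem_of PID: print "VSZ_KB RSS_KB" for PID, read from /proc/PID/status.
# Exits with status 1 for kernel threads, which have no VmSize line.
mem_of() {
    awk '/^VmSize:/ { vsz = $2 } /^VmRSS:/ { rss = $2 }
         END { if (vsz == "") exit 1; print vsz, rss }' "/proc/$1/status"
}
```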

If you want to keep monitoring a running process, you can use the top command. If you pass one or more PIDs with the -p flag, top only shows the processes you are interested in. The -c flag shows the full command line instead of just the program name. Great for monitoring database instances, or your favorite web server daemon.

top -c -p $(pgrep -d',' -f nginx)
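top calculates its %CPU column from how much CPU time a process used during the last refresh interval. The sketch below does the same for a single PID. It reads utime and stime (fields 14 and 15 of /proc/PID/stat, in clock ticks) twice and divides the difference by the interval. It assumes a POSIX shell with getconf available and whole-second intervals, and the function names are hypothetical. As with top, a multi-threaded process can exceed 100 percent.

```shell
# cpu_ticks PID: utime + stime of PID, in clock ticks.
cpu_ticks() {
    stat=$(cat "/proc/$1/stat") || return 1
    rest=${stat##*") "}              # drop "PID (comm) "
    set -- $rest
    echo $(( ${12} + ${13} ))        # fields 14 (utime) and 15 (stime)
}

# cpu_percent PID SECONDS: CPU usage of PID during the next SECONDS seconds.
cpu_percent() {
    hz=$(getconf CLK_TCK)
    before=$(cpu_ticks "$1") || return 1
    sleep "$2"
    after=$(cpu_ticks "$1") || return 1
    echo $(( (after - before) * 100 / (hz * $2) ))
}
```

For example, cpu_percent "$(pgrep -o nginx)" 5 reports the CPU usage of the oldest nginx process over five seconds.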

Which process has the most disk activity?

Disk activity is one of the most common reasons for a high load. In this case, finding the possible culprit is easy with the pidstat command, which is part of the sysstat package.

pidstat -d

To see individual disk I/O requests as they happen, use the iosnoop command.

[Screenshot: show which process performs disk activity on Linux]
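The kernel keeps per-process I/O counters in /proc/PID/io, the kind of data pidstat -d turns into per-second rates. read_bytes and write_bytes count the bytes a process caused to be fetched from or sent to the storage layer. rchar and wchar also count reads and writes served from the page cache. Reading another user's counters usually requires root. The sketch below uses hypothetical function names and ranks processes by cumulative bytes written, whereas pidstat shows rates.

```shell
# io_of PID: print "READ_BYTES WRITE_BYTES" (storage-level counters) for PID.
io_of() {
    awk '/^read_bytes:/ { r = $2 } /^write_bytes:/ { w = $2 }
         END { if (r == "") exit 1; print r, w }' "/proc/$1/io"
}

# io_ranking [N]: print "WRITE_BYTES PID NAME" for the N processes that wrote the most.
io_ranking() {
    n=${1:-5}
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # Skip processes we may not inspect or that have already exited.
        counters=$(io_of "$pid" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s %s %s\n' "${counters#* }" "$pid" "$name"
    done | sort -rn | head -n "$n"
}
```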

Which new processes are started?

The Linux audit framework can be used to monitor specific system calls, such as the ones that start new programs. An easier way is to use the execsnoop command. Depending on your distribution, you may be able to install it via the package manager.

execsnoop

[Screenshot: monitoring new processes on Linux with execsnoop]
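For comparison, a crude way to spot new processes without any tracing is to compare snapshots of /proc. The POSIX shell sketch below does that once per second. It uses hypothetical function names and needs sort, comm, and mktemp. Its weakness is also the reason execsnoop exists: a process that starts and exits between two snapshots is never seen, while execsnoop traces each exec() call as it happens.

```shell
# pid_snapshot FILE: write all current PIDs to FILE, sorted the way comm(1) expects.
pid_snapshot() {
    for dir in /proc/[0-9]*; do
        echo "${dir#/proc/}"
    done | sort > "$1"
}

# new_pids OLD NEW: print "PID NAME" for each PID listed in NEW but not in OLD.
# PIDs that already exited (e.g. the snapshot pipeline itself) are skipped.
new_pids() {
    comm -13 "$1" "$2" | while read -r pid; do
        name=$(cat "/proc/$pid/comm" 2>/dev/null) || continue
        printf '%s %s\n' "$pid" "$name"
    done
}

# watch_new_pids: report new processes once per second until interrupted (Ctrl+C).
# The two temporary files are left behind when the loop is interrupted.
watch_new_pids() {
    old=$(mktemp) && new=$(mktemp) || return 1
    pid_snapshot "$old"
    while sleep 1; do
        pid_snapshot "$new"
        new_pids "$old" "$new"
        mv "$new" "$old"
    done
}
```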


Any other tools or one-liners you use during troubleshooting? Let us know in the comments!

