Linux kernel scheduler

Introduction

Every Linux system runs many tasks at the same time. But a CPU can do only so much processing at any given moment. For this reason, the Linux kernel has a scheduling system. It queues tasks and aims to process them all in a timely manner, while taking the priority of each task into account.

What is CPU scheduling?

The CPU scheduler keeps track of all runnable tasks and selects which one runs next. It can be compared to an orchestrator that keeps a close eye on all available tasks. It decides when the next task will be executed, based on events like:

  • The running task terminates
  • The running task goes to sleep while waiting for an event
  • A new task starts
  • A sleeping task is woken up by an event
  • The running task exceeds its allowed time slice

While the system is running, tasks get swapped in and out on the CPU. With multi-core CPUs, this orchestration is an important part of the stability of the kernel. The CPU scheduler therefore has multiple goals: ensuring that tasks run at the right time while keeping the system stable.

CPU scheduling is a balancing act with the following goals:

  • Be fair to tasks
  • Provide the correct time slice depending on the task priority
  • Reduce the response time for tasks
  • Provide high throughput so tasks can complete
  • Be efficient
    • Proper balancing between CPUs
    • Reduce power usage
    • Keep the overhead of the scheduler itself low

Scheduling classes and policies

The Linux scheduler uses multiple classes to define the type of task scheduling:

  • Completely Fair Scheduler (CFS)
  • Real-time (RT)
  • Earliest Deadline First (EDF)

Completely Fair Scheduler (CFS)

The default scheduler in Linux is the Completely Fair Scheduler (CFS). It aims to give every task a fair share of CPU time. To do so, it tracks how much CPU time each process has used and weighs this against the priority assigned to that process; the process that has received the least of its fair share runs next. It’s a good scheduler for most processes, but not for those that need high-performance or real-time processing. For example, if you play some audio, you don’t want it to stutter.
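The idea of weighing CPU time against priority can be illustrated with a small simulation. This is a simplified sketch, not the kernel's implementation: it assumes a fixed 1 ms tick, approximates the kernel's nice-to-weight table with a factor of 1.25 per nice level, and always runs the task with the lowest weighted ("virtual") runtime. The names simulate and weight_for_nice are made up for this illustration.

```python
import heapq

NICE_0_WEIGHT = 1024  # weight of a nice-0 task; other weights are relative to it


def weight_for_nice(nice):
    """Approximate load weight: each nice step changes it by roughly 1.25x."""
    return NICE_0_WEIGHT / (1.25 ** nice)


def simulate(tasks, ticks, tick_ns=1_000_000):
    """Run `ticks` ticks over `tasks` (name -> nice); return CPU time per task in ns."""
    # Min-heap ordered by virtual runtime; every task starts at zero.
    queue = [(0.0, name) for name in sorted(tasks)]
    heapq.heapify(queue)
    cpu_time = {name: 0 for name in tasks}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(queue)  # least-served task runs next
        cpu_time[name] += tick_ns
        # Tasks with a higher nice value accumulate virtual runtime faster,
        # so they are picked less often.
        vruntime += tick_ns * NICE_0_WEIGHT / weight_for_nice(tasks[name])
        heapq.heappush(queue, (vruntime, name))
    return cpu_time


if __name__ == "__main__":
    usage = simulate({"editor": 0, "backup": 5}, ticks=4000)
    for name, ns in usage.items():
        print(f"{name}: {ns / 1e6:.0f} ms")
```

Under these assumptions the nice-0 task ends up with roughly three times the CPU time of the nice-5 task, while two tasks with the same nice value split the CPU evenly.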

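CFS was written by Ingo Molnar and introduced in Linux 2.6.23, replacing the then-default O(1) scheduler. Since Linux 6.6, the fair scheduling class uses the EEVDF algorithm in place of the original CFS algorithm; the policies described below remain the same.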

Policies:

  • SCHED_NORMAL
  • SCHED_OTHER
  • SCHED_BATCH
  • SCHED_IDLE

SCHED_NORMAL

The SCHED_NORMAL policy focuses on regular tasks. These tasks have no real-time requirements for processing. Each task has a priority called the nice value, which ranges from -20 (highest priority) to 19 (lowest priority). You may recognize these values from the nice command.
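A process can inspect and raise its own nice value without special privileges. A minimal sketch using Python's os module on Linux; the helper names current_nice and be_nicer are chosen for this example:

```python
import os


def current_nice():
    """Return the nice value of the calling process."""
    return os.getpriority(os.PRIO_PROCESS, 0)  # 0 means "this process"


def be_nicer(increment=1):
    """Raise the nice value (lower the priority) and return the new value."""
    if increment < 0:
        raise ValueError("lowering the nice value normally needs privileges")
    return os.nice(increment)


if __name__ == "__main__":
    print("nice before:", current_nice())
    print("nice after: ", be_nicer(5))
```

Going the other way, towards a lower nice value and thus a higher priority, is normally reserved for privileged users, which is why the helper refuses negative increments.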

SCHED_OTHER

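This is the older name for SCHED_NORMAL. User-space tools and interfaces, such as chrt and the sched_setscheduler system call, still use the name SCHED_OTHER.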

SCHED_BATCH

The scheduler policy SCHED_BATCH is aimed at batch processing tasks that are not time-critical. They can run in the background, but still benefit from good throughput. The scheduler assumes these tasks are CPU-intensive and mildly disfavors them when deciding which task to wake up. Like SCHED_NORMAL tasks, they also use a nice value.
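A process can usually move itself to SCHED_BATCH without extra privileges. The following is a minimal sketch using Python's os module on Linux; switch_to_batch is a hypothetical helper name, and the function simply reports failure if the kernel or a sandbox refuses the request:

```python
import os


def switch_to_batch(pid=0):
    """Ask the kernel to schedule `pid` (0 = this process) under SCHED_BATCH.

    Returns True on success, False if the request is refused.
    """
    try:
        # Non-real-time policies always use a static priority of 0.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    if switch_to_batch():
        print("policy is SCHED_BATCH:", os.sched_getscheduler(0) == os.SCHED_BATCH)
    else:
        print("not allowed to change the policy here")
```

On the command line, chrt with the --batch option achieves the same for a new command.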

SCHED_IDLE

The SCHED_IDLE policy is, as the name implies, for idle tasks. They are not time-critical and get time slices when the system is otherwise idle. Usually these are background tasks, such as maintenance tasks, including those of the kernel itself.

Real-time (RT)

Policies:

  • SCHED_FIFO
  • SCHED_RR

SCHED_FIFO

SCHED_FIFO is a policy for real-time tasks that need guaranteed CPU time. FIFO is short for First In, First Out: tasks are served on a ‘first-come, first-served’ basis, but ordered by priority. The priority is a number from 1 to 99, with 1 being the lowest priority. A task with a higher priority gets the CPU time it needs and won’t be interrupted by tasks with a lower priority. It keeps running until it blocks, yields, or is preempted by a task with an even higher priority.
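The priority range does not have to be hard-coded; it can be queried from the kernel. A minimal sketch using Python's os module on Linux. The helper names fifo_priority_range and try_fifo are made up for this example, and switching to a real-time policy typically requires root or the CAP_SYS_NICE capability, so the function reports a refusal instead of failing:

```python
import os


def fifo_priority_range():
    """Return the (lowest, highest) static priority accepted for SCHED_FIFO."""
    return (os.sched_get_priority_min(os.SCHED_FIFO),
            os.sched_get_priority_max(os.SCHED_FIFO))


def try_fifo(priority, pid=0):
    """Try to move `pid` (0 = this process) to SCHED_FIFO; return True on success."""
    low, high = fifo_priority_range()
    if not low <= priority <= high:
        raise ValueError(f"priority must be between {low} and {high}")
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False  # typically needs root or CAP_SYS_NICE
    return True


if __name__ == "__main__":
    low, high = fifo_priority_range()
    print(f"SCHED_FIFO priorities range from {low} to {high}")
```

Use try_fifo with care: a busy SCHED_FIFO task that never blocks can keep lower-priority tasks off its CPU.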

SCHED_RR

SCHED_RR is a policy where the ‘RR’ stands for Round Robin. It is similar to SCHED_FIFO, but adds a time slice. With SCHED_FIFO, a task keeps the CPU until it blocks, yields, or is preempted by a higher-priority task, which can prevent other tasks with the same priority from getting enough CPU time. With SCHED_RR, a task that has used up its time slice moves to the back of the queue for its priority, so multiple tasks with the same priority share the CPU time in turns.
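The difference between the two real-time policies shows up when several tasks share the same priority. The sketch below is a toy model, not kernel code: it assumes all tasks are CPU-bound, have equal priority, and never block. The function names are made up for this example, and the default slice of 100 ms mirrors the kernel.sched_rr_timeslice_ms value shown in the scheduler settings of this article.

```python
from collections import deque


def run_fifo(tasks):
    """SCHED_FIFO model: each task runs to completion in arrival order.

    `tasks` is a list of (name, work_ms); returns a timeline of (name, ran_ms).
    """
    return [(name, work) for name, work in tasks if work > 0]


def run_rr(tasks, slice_ms=100):
    """SCHED_RR model: a task that uses up its slice goes to the back of the queue."""
    queue = deque(tasks)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(slice_ms, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            queue.append((name, remaining - ran))  # unfinished: requeue at the back
    return timeline


if __name__ == "__main__":
    work = [("a", 250), ("b", 100)]
    print("FIFO:", run_fifo(work))
    print("RR:  ", run_rr(work))
```

In the FIFO model task b has to wait 250 ms before it runs at all; in the round-robin model it gets the CPU after the first 100 ms slice.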

Earliest Deadline First (EDF)

The scheduling class EDF (Earliest Deadline First) is focused on tasks that need low and predictable latency, like audio and video processing. Such work has to be completed before a deadline; finishing late degrades the result. A good example is a conference call, where even slight delays in audio processing can make the call unusable.

Policies:

  • SCHED_DEADLINE

SCHED_DEADLINE

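SCHED_DEADLINE is a scheduling policy designed for real-time applications that require guaranteed CPU time, such as multimedia and gaming. Each task declares how much runtime it needs, the deadline by which it needs it, and how often this repeats (its period). The scheduler runs the task with the earliest deadline first, and SCHED_DEADLINE tasks take precedence over tasks of all other policies.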

SCHED_DEADLINE was introduced in Linux 3.14.
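A SCHED_DEADLINE task is described by three parameters (runtime, deadline, and period), which the sched_setattr system call accepts. The sketch below models two ideas behind the policy in plain Python: a simplified admission check based on total utilization, and picking the task with the earliest absolute deadline. It is an illustration under stated assumptions, not the kernel's algorithm: the class and function names are made up, and the default limit of 0.95 mirrors the ratio of kernel.sched_rt_runtime_us to kernel.sched_rt_period_us in the settings listed in this article.

```python
from dataclasses import dataclass


@dataclass
class DeadlineTask:
    name: str
    runtime_us: int   # CPU time needed in each period
    deadline_us: int  # time after release by which the runtime must be delivered
    period_us: int    # how often the task is released


def utilization(tasks):
    """Fraction of one CPU the task set asks for (sum of runtime / period)."""
    return sum(t.runtime_us / t.period_us for t in tasks)


def admit(tasks, cpus=1, limit=0.95):
    """Simplified admission test: total demand must fit within `limit` of the CPUs."""
    return utilization(tasks) <= cpus * limit


def pick_next(released):
    """Return the task with the earliest absolute deadline.

    `released` is a list of (task, release_time_us) pairs for runnable tasks.
    """
    task, _ = min(released, key=lambda pair: pair[1] + pair[0].deadline_us)
    return task


if __name__ == "__main__":
    audio = DeadlineTask("audio", runtime_us=2_000, deadline_us=5_000, period_us=10_000)
    video = DeadlineTask("video", runtime_us=10_000, deadline_us=30_000, period_us=40_000)
    print("admitted:", admit([audio, video]))
    print("runs first:", pick_next([(audio, 0), (video, 0)]).name)
```

The admission check is what allows the policy to promise CPU time: a task set that asks for more than the system can deliver is rejected up front instead of missing deadlines later.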

Scheduler settings

The available settings for the scheduler can be seen with the sysctl command or via the /proc file system.

# sysctl -a --pattern kernel.sched
kernel.sched_autogroup_enabled = 1
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_deadline_period_max_us = 4194304
kernel.sched_deadline_period_min_us = 100
kernel.sched_rr_timeslice_ms = 100
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_schedstats = 0
kernel.sched_util_clamp_max = 1024
kernel.sched_util_clamp_min = 1024
kernel.sched_util_clamp_min_rt_default = 1024
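
The same values are exposed as files under /proc/sys/kernel, which makes them easy to read from a program. A minimal sketch, assuming a Linux system; read_sched_settings is a name chosen for this example, and which settings exist depends on the kernel version and configuration:

```python
from pathlib import Path


def read_sched_settings(root="/proc/sys/kernel"):
    """Return {sysctl name: value} for every readable kernel.sched_* setting."""
    settings = {}
    for path in sorted(Path(root).glob("sched_*")):
        try:
            settings[f"kernel.{path.name}"] = path.read_text().strip()
        except OSError:
            continue  # skip directories and entries that are not readable
    return settings


if __name__ == "__main__":
    for name, value in read_sched_settings().items():
        print(f"{name} = {value}")
```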

Useful commands for scheduler

Command   Description
chrt      Sets scheduler policy and priority for a running process or new command
nice      Runs a command with a specified priority
renice    Alters processes to run with another priority

Explanation of the values in /proc/PID/sched

The Linux kernel uses the scheduler to run tasks on the CPU and stores per-task scheduling statistics in the /proc/PID/sched file.
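
The file is plain text with one "name : value" pair per line after a header, so it is straightforward to load into a dictionary. A minimal sketch, assuming that layout; parse_sched and read_sched are hypothetical helper names, and the file may be missing on kernels built without scheduler debugging support, in which case an empty dictionary is returned:

```python
def parse_sched(text):
    """Parse the 'name : value' lines of /proc/PID/sched into a dict of strings."""
    stats = {}
    # The first line is the "command (pid, #threads: n)" header, so skip it.
    for line in text.splitlines()[1:]:
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # separator rows and lines without a value
        stats[key.strip()] = value.strip()
    return stats


def read_sched(pid="self"):
    """Read and parse /proc/<pid>/sched; return {} if the file is unavailable."""
    try:
        with open(f"/proc/{pid}/sched") as handle:
            return parse_sched(handle.read())
    except OSError:
        return {}


if __name__ == "__main__":
    stats = read_sched()
    for field in ("policy", "prio", "nr_switches", "se.sum_exec_runtime"):
        print(field, "=", stats.get(field, "n/a"))
```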
