Overview of Linux capabilities

Introduction

Linux uses capabilities to split privileged actions off from the root user, so that individual processes can perform them, often while running as a non-privileged user. If you are new to this subject, have a look at the Linux capabilities 101.
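To see which capabilities a process currently holds, the kernel exposes the capability sets as hexadecimal bit masks in /proc/PID/status (the CapInh, CapPrm, CapEff, CapBnd, and CapAmb lines). The following is a minimal sketch that decodes such a mask into capability names. The helper names decode_mask and parse_status are made up for this illustration, and the name table assumes the bit numbering from linux/capability.h up to CAP_CHECKPOINT_RESTORE; newer kernels may define more bits.

```python
# Capability names indexed by bit number (assumed from <linux/capability.h>).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]


def decode_mask(mask):
    """Return the names of all capabilities whose bit is set in mask."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits beyond the table get a placeholder name.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        bit += 1
    return names


def parse_status(text):
    """Extract the Cap* lines from the text of /proc/PID/status."""
    sets = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Cap"):
            sets[key] = int(value.strip(), 16)
    return sets


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            for name, mask in parse_status(handle.read()).items():
                print(name, decode_mask(mask))
    except FileNotFoundError:
        print("no /proc/self/status on this system")
```

For an unprivileged shell the CapEff line is typically empty, while a root shell usually shows most or all names.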

CAP_AUDIT_CONTROL

Enable or disable kernel auditing, see or change filter rules, and retrieve status

Capability to manage the Linux kernel auditing framework, including actions such as turning it on and off, and viewing or reloading filter rules. This capability is normally not needed for services. Systemd and tools like auditctl that deal with this functionality do need it.

To assign this capability to a binary use: setcap 'cap_audit_control=+ep' BINARY.
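The quoted argument passed to setcap has the form NAME=+FLAGS, where the flag letters e and p select the effective and permitted sets of the file (i would select the inheritable set). A minimal sketch that builds this command line for use with subprocess could look as follows. The helper names setcap_argv and assign_capability, and the example path, are hypothetical; actually running the command requires the setcap tool and sufficient privileges, normally root.

```python
import subprocess


def setcap_argv(capability, binary, flags="ep"):
    """Build the argument list for: setcap 'CAPABILITY=+FLAGS' BINARY."""
    if not flags or not set(flags) <= set("eip"):
        raise ValueError("flags must be a non-empty combination of e, i, and p")
    return ["setcap", f"{capability.lower()}=+{flags}", binary]


def assign_capability(capability, binary):
    """Run setcap; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(setcap_argv(capability, binary), check=True)


# Only prints the command; call assign_capability() to really run it.
print(" ".join(setcap_argv("CAP_AUDIT_CONTROL", "/usr/local/bin/example")))
```

Passing the arguments as a list avoids shell quoting issues with the capability clause.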

CAP_AUDIT_READ

Ability to read log events from audit framework

Capability to read audit log events, which the kernel makes available through a multicast netlink socket.

To assign this capability to a binary use: setcap 'cap_audit_read=+ep' BINARY.

CAP_AUDIT_WRITE

Write audit events to audit log

Capability to allow writing to the Linux kernel auditing framework and its underlying log

To assign this capability to a binary use: setcap 'cap_audit_write=+ep' BINARY.

CAP_BLOCK_SUSPEND

Features to block system suspend

Capability allowing processes to prevent a system going into suspend mode

Related system calls

  • epoll_ctl - manage (add, modify, remove) entries in an epoll instance, which is used to monitor whether I/O is possible on a defined set of file descriptors. Similar to poll(), with additional benefits. The EPOLLWAKEUP flag requires this capability.

Related files in /proc

  • /proc/sys/wake_lock

To assign this capability to a binary use: setcap 'cap_block_suspend=+ep' BINARY.

CAP_BPF

Privileged BPF operations

Capability that was introduced in Linux 5.8 to separate BPF operations from the overloaded CAP_SYS_ADMIN capability. It allows operations related to extended Berkeley Packet Filters, such as applying filters for networking, increasing security (e.g. sandboxing, SECCOMP), process tracing, and observability.

Related system calls

  • bpf

To assign this capability to a binary use: setcap 'cap_bpf=+ep' BINARY.

CAP_CHECKPOINT_RESTORE

Allow creating a checkpoint or restore it

Capability available since Linux 5.9, split off from the overloaded CAP_SYS_ADMIN. It allows non-root users to checkpoint and restore processes.

Related system calls

  • clone - similar to fork() to create a child process, with more fine-grained options to define what is shared between calling process and child. This system call can also make a new process part of newly created namespace by specifying a flag.

Related files in /proc

  • /proc/sys/kernel/ns_last_pid
  • /proc/PID/map_files

To assign this capability to a binary use: setcap 'cap_checkpoint_restore=+ep' BINARY.

CAP_CHOWN

Changes to file UIDs/GIDs

Capability to make changes to the user or group ID of files while not being the owner of these files. This capability is often needed for services that start as root but hand over execution tasks to child processes running as a non-privileged user. In that case the service needs to be able to adjust file ownership so these children can read and write data.

Related system calls

  • chown - changes ownership of file specified by pathname, dereferenced if file is a symbolic link

To assign this capability to a binary use: setcap 'cap_chown=+ep' BINARY.

CAP_DAC_OVERRIDE

Bypasses file read, write, and execute permission checks

Capability to bypass permission checks related to file read, write, or execution. DAC itself is an abbreviation for discretionary access control, referring to how permissions are linked to the resource.

Related system calls

  • mount
  • utime - change access and modification times of inode
  • utimensat

To assign this capability to a binary use: setcap 'cap_dac_override=+ep' BINARY.
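The permission check that this capability bypasses can be pictured as selecting one of the three permission triplets (owner, group, other) and testing the requested bit. The sketch below models that selection in a simplified way. It ignores ACLs and Linux security modules, and the function name dac_allows is invented for this example.

```python
import stat


def dac_allows(mode, file_uid, file_gid, uid, gids, want):
    """Simplified model of the classic owner/group/other check.

    want is one of "r", "w", "x". Exactly one triplet applies: owner if the
    UID matches, else group if any GID matches, else other.
    """
    bits = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if uid == file_uid:
        return bool(mode & bits[0])
    if file_gid in gids:
        return bool(mode & bits[1])
    return bool(mode & bits[2])


# A file with mode 0640 owned by UID 1000 and GID 50:
print(dac_allows(0o640, 1000, 50, uid=1000, gids={1000}, want="w"))  # owner triplet
print(dac_allows(0o640, 1000, 50, uid=1001, gids={50}, want="r"))    # group triplet
print(dac_allows(0o640, 1000, 50, uid=1002, gids={1002}, want="r"))  # other triplet
```

A process holding CAP_DAC_OVERRIDE is let through for reading and writing even in cases where this model returns False.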

CAP_DAC_READ_SEARCH

Bypasses permission checks to read files, and to read or search (execute) directories

Capability similar to CAP_DAC_OVERRIDE, but focused on bypassing file permissions to allow reading of files, and reading of directories and their content.

Related system calls

  • link - create new link (hard link) to existing file
  • open - opens file specified by pathname
  • open_by_handle_at

To assign this capability to a binary use: setcap 'cap_dac_read_search=+ep' BINARY.

CAP_FOWNER

Bypass for permission checks on files to allow specific operations

Capability that bypasses permission checks for files where otherwise the caller and owner of the file should be the same. It does not cover functions that are part of CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. This capability might be needed for some services to do proper cleanup, such as the removal of files or directories.

Related system calls

  • chmod - change mode of the file, dereferenced for symbolic links
  • ioctl_iflags
  • open - opens file specified by pathname
  • rename - rename a file, move it between directories if required
  • rmdir - delete directory
  • unlink - delete name from filesystem
  • utime - change access and modification times of inode
  • utimensat

To assign this capability to a binary use: setcap 'cap_fowner=+ep' BINARY.

CAP_FSETID

Special actions related to set-user-ID and set-group-ID

Capability that prevents the set-user-ID and set-group-ID mode bits from being cleared when a file is modified, and allows setting the set-group-ID bit on a file without being a member of the file's group.

Related system calls

  • chmod - change mode of the file, dereferenced for symbolic links

To assign this capability to a binary use: setcap 'cap_fsetid=+ep' BINARY.

CAP_IPC_LOCK

Lock memory and allocate memory using huge pages

Capability to lock memory so that it is not swapped out, and to allocate memory using huge pages, which are memory pages larger than the normal page size.

Related system calls

  • memfd_create
  • mlock - lock pages in a specified address range, so they are guaranteed to stay in memory instead of being swapped to disk
  • mlock2 - same as mlock() if flags is 0. With the flag MLOCK_ONFAULT it locks the currently resident pages, then marks the range so that currently nonresident pages are locked later when they are first used (page fault)
  • mlockall - similar to mlock, but tries to lock all the memory pages of the calling process to prevent swapping
  • mmap - create new mapping in the virtual address space of the calling process
  • shmctl
  • shmget

To assign this capability to a binary use: setcap 'cap_ipc_lock=+ep' BINARY.
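Without this capability, the amount of memory a process may lock is bounded by the RLIMIT_MEMLOCK resource limit. As a rough sketch, the current limit can be read with Python's resource module and compared with the size that is about to be locked. The helper name fits_memlock_limit is hypothetical, and the comparison ignores memory that the process has already locked.

```python
import resource


def fits_memlock_limit(nbytes, soft_limit):
    """Return True if nbytes could be locked under the given soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return nbytes <= soft_limit


# Limits of the current process, in bytes (or RLIM_INFINITY).
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("soft limit:", soft, "hard limit:", hard)
print("64 KiB fits:", fits_memlock_limit(64 * 1024, soft))
```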

CAP_IPC_OWNER

Bypass permission checks on System V IPC objects

Capability to skip permission checks on System V Inter-Process Communication (IPC) objects: message queues, semaphores, and shared memory segments.

Related system calls

  • msgctl
  • msgget
  • msgop
  • semctl
  • semget
  • semop
  • shmctl
  • shmget
  • shmop

To assign this capability to a binary use: setcap 'cap_ipc_owner=+ep' BINARY.

CAP_KILL

Bypass permission checks for sending process signals

Capability to send signals to processes while bypassing the permission checks, which makes it possible to stop processes owned by other users.

Related system calls

  • kill

To assign this capability to a binary use: setcap 'cap_kill=+ep' BINARY.
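Whether the current process is allowed to signal another one can be probed on Linux by sending signal number 0, which performs the existence and permission checks without delivering a signal. The following is a small sketch of that probe; the function name signal_permission and the returned strings are choices made for this example.

```python
import os


def signal_permission(pid):
    """Probe whether the caller may signal pid, without sending a signal."""
    try:
        os.kill(pid, 0)  # signal 0: existence and permission checks only
    except ProcessLookupError:
        return "no such process"
    except PermissionError:
        return "not permitted"
    return "allowed"


# A process may always signal itself.
print(signal_permission(os.getpid()))
```

Running the probe against a process owned by another user normally reports "not permitted", unless the caller is root or holds CAP_KILL.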

CAP_LEASE

Leases on files

Capability to establish leases on files. This allows the holder of the lease to be notified when some other process tries to open or truncate the related file.

Related system calls

  • fcntl - performs an action on file defined by a file descriptor, such as setting flags

To assign this capability to a binary use: setcap 'cap_lease=+ep' BINARY.

CAP_LINUX_IMMUTABLE

Set or clear flags to make a file append-only or immutable

Capability that allows setting or clearing the FS_APPEND_FL inode flag, so that data can only be appended to a file, or making the file immutable with the FS_IMMUTABLE_FL inode flag.

Related system calls

  • ioctl_iflags

To assign this capability to a binary use: setcap 'cap_linux_immutable=+ep' BINARY.

CAP_MAC_ADMIN

Functionality for the Smack Linux security module

Capability to allow Mandatory Access Control (MAC) configuration or state changes. It is implemented for the Smack Linux Security Module (LSM).

To assign this capability to a binary use: setcap 'cap_mac_admin=+ep' BINARY.

CAP_MAC_OVERRIDE

Override for Mandatory Access Control (MAC)

Capability to override MAC. It is implemented for the Smack Linux Security Module (LSM).

To assign this capability to a binary use: setcap 'cap_mac_override=+ep' BINARY.

CAP_MKNOD

Special files

Capability to create special files with mknod(2), such as character and block device files. Not needed for most services.

To assign this capability to a binary use: setcap 'cap_mknod=+ep' BINARY.

CAP_NET_ADMIN

Network management functions

Capability with a wide range of network-related actions, such as configuring an interface. It also allows making changes to the IP firewall, routing tables, multicast settings, promiscuous mode, define Type-of-Service (TOS), and set special flags to troubleshoot. Normal services should not have this capability.

To assign this capability to a binary use: setcap 'cap_net_admin=+ep' BINARY.

CAP_NET_BIND_SERVICE

Bind a socket to a privileged port number below 1024

This capability is typically required for services like HTTP, HTTPS, SMTP, that use port numbers in the 1-1023 range.

To assign this capability to a binary use: setcap 'cap_net_bind_service=+ep' BINARY.
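The port rule can be expressed as a small check. The sketch below assumes the default boundary of 1024; on kernels that provide the net.ipv4.ip_unprivileged_port_start sysctl the boundary can be changed, which is why it is a parameter here. The function names needs_bind_capability and try_bind are hypothetical.

```python
import socket


def needs_bind_capability(port, unprivileged_start=1024):
    """Return True if binding to port is a privileged operation."""
    return 0 < port < unprivileged_start


def try_bind(port, host="127.0.0.1"):
    """Attempt a TCP bind; return False if the kernel denies permission."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except PermissionError:
            return False
        return True


for port in (80, 443, 8080):
    state = "privileged" if needs_bind_capability(port) else "unprivileged"
    print(port, state)
```

Calling try_bind(80) as a normal user without the capability is expected to return False, while other errors such as an address already in use are passed on to the caller.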

CAP_NET_BROADCAST

Make socket broadcasts and listen to multicast messages

Capability that allows socket broadcasts and listening to multicast messages. This capability seems to be unused.

To assign this capability to a binary use: setcap 'cap_net_broadcast=+ep' BINARY.

CAP_NET_RAW

Usage of RAW and PACKET sockets, allow transparent proxying

Capability to use RAW and PACKET sockets, with the option to bind to any address and allow transparent proxying of traffic.

To assign this capability to a binary use: setcap 'cap_net_raw=+ep' BINARY.

CAP_PERFMON

Activities related to performance monitoring

Capability that was split off from CAP_SYS_ADMIN in Linux 5.8 to reduce the overloading of that capability. It covers performance monitoring functions used by tools such as perf, including perf events and related BPF operations.

To assign this capability to a binary use: setcap 'cap_perfmon=+ep' BINARY.
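Because CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE only exist on newer kernels, a tool may want to check the running kernel before relying on them. The following is a rough sketch using the version numbers stated in this overview. The helper names are made up for the example, and a plain version comparison is only an approximation, since distributions may backport features.

```python
import platform
import re

# Kernel versions that introduced each capability, as stated in the text.
INTRODUCED_IN = {
    "cap_bpf": (5, 8),
    "cap_perfmon": (5, 8),
    "cap_checkpoint_restore": (5, 9),
}


def kernel_version(release):
    """Parse the leading 'major.minor' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_available(capability, release):
    """Return True if the kernel release is new enough for the capability."""
    return kernel_version(release) >= INTRODUCED_IN[capability.lower()]


if platform.system() == "Linux":
    release = platform.release()
    for name in INTRODUCED_IN:
        print(name, is_available(name, release))
```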

CAP_SETFCAP

Capabilities on a file

Capability to set capabilities on a file, and to map user ID 0 in a new user namespace.

To assign this capability to a binary use: setcap 'cap_setfcap=+ep' BINARY.

CAP_SETGID

Group-ID actions

Capability to allow changes to process GIDs and supplementary GIDs, forging of the GID for Unix domain sockets, and writing the group ID mapping in user namespaces.

Related system calls

  • clone - similar to fork() to create a child process, with more fine-grained options to define what is shared between calling process and child. This system call can also make a new process part of newly created namespace by specifying a flag.
  • getgroups - returns supplementary group IDs of calling process
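  • setegid (documented in the seteuid(2) man page)
  • setfsgid
  • setgid - set effective group ID of calling process, with CAP_SETGID capability it also sets real GID and saved set-group-ID
  • setregid (documented in the setreuid(2) man page)
  • setresgid (documented in the setresuid(2) man page)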

To assign this capability to a binary use: setcap 'cap_setgid=+ep' BINARY.

CAP_SETPCAP

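Change the inheritable and bounding capability sets of a thread

Capability to add any capability from the calling thread's bounding set to its inheritable set, to drop capabilities from the bounding set, and to change the securebits flags. It cannot add capabilities to the bounding set.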

Related system calls

  • capget - retrieve thread capabilities
  • prctl

To assign this capability to a binary use: setcap 'cap_setpcap=+ep' BINARY.

CAP_SETUID

User-ID actions

Capability to allow changes to process UIDs, forging of the UID for Unix domain sockets, and writing the user ID mapping in user namespaces.

Related system calls

  • clone - similar to fork() to create a child process, with more fine-grained options to define what is shared between calling process and child. This system call can also make a new process part of newly created namespace by specifying a flag.
  • keyctl - allow user-space programs to take actions on keys, such as updating, revocation, ownership
  • seteuid
  • setfsuid
  • setresuid
  • setreuid
  • setuid - set effective user ID of calling process, with CAP_SETUID capability it also sets real UID and saved set-user-ID

To assign this capability to a binary use: setcap 'cap_setuid=+ep' BINARY.
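A common use of CAP_SETGID and CAP_SETUID together is a service that starts privileged and then switches to a non-privileged user. The order matters: the group changes have to come first, because once the user ID has been changed the process normally no longer holds the capability to change its groups. Below is a minimal sketch of that sequence. The function name drop_privileges and the injectable parameters are choices made for this example so that it can be exercised without root; the IDs are placeholders.

```python
import os


def drop_privileges(uid, gid, setgroups=os.setgroups, setgid=os.setgid,
                    setuid=os.setuid):
    """Switch to an unprivileged user; group changes must come first."""
    setgroups([])  # clear supplementary groups (needs CAP_SETGID)
    setgid(gid)    # change the group IDs (needs CAP_SETGID)
    setuid(uid)    # change the user IDs last (needs CAP_SETUID)


# Dry run with recording stand-ins instead of the real system calls.
calls = []
drop_privileges(
    1000, 1000,
    setgroups=lambda groups: calls.append(("setgroups", groups)),
    setgid=lambda gid: calls.append(("setgid", gid)),
    setuid=lambda uid: calls.append(("setuid", uid)),
)
print(calls)
```

Calling drop_privileges(uid, gid) without the stand-ins performs the real switch and fails with PermissionError if the process lacks the required capabilities.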

CAP_SYS_ADMIN

Generic system administration tasks that require additional privileges

Capability with a wide range of functions related to the file system, resources, and process inspection. The Linux kernel developers are moving functionality out of this capability into separate ones, so newer kernels will most likely grant fewer privileges through it. Reconsider whether it is really needed before assigning it to a binary or granting it in another way. The list of related system calls might also be inaccurate due to these changes. Normal services should not need this capability, as it may introduce a security risk.

Related system calls

  • bdflush
  • bpf
  • clone - similar to fork() to create a child process, with more fine-grained options to define what is shared between calling process and child. This system call can also make a new process part of newly created namespace by specifying a flag.
  • fanotify_init
  • getdomainname
  • gethostname
  • getrlimit - get resource limits
  • ioctl_fslabel
  • ioctl_getfsmap
  • ioctl_tty
  • ioprio_set
  • io_submit - submit asynchronous I/O blocks for processing, can be cancelled with io_cancel()
  • keyctl - allow user-space programs to take actions on keys, such as updating, revocation, ownership
  • lookup_dcookie
  • madvise
  • mount
  • msgctl
  • open_by_handle_at
  • pciconfig_read
  • perf_event_open
  • pivot_root
  • prctl
  • ptrace - process tracing; usually for breakpoint debugging and system call tracing
  • quotactl
  • seccomp
  • semctl
  • setns
  • shmctl
  • swapon
  • syslog
  • umount
  • unshare

To assign this capability to a binary use: setcap 'cap_sys_admin=+ep' BINARY.

CAP_SYS_BOOT

Ability to reboot the system or load a kernel for later execution

Capability to prepare a reboot by loading a kernel for later execution, or to actually request a reboot. This capability is normally not needed for services and is restricted to systemd or commands like reboot and shutdown.

Related system calls

  • kexec_file_load - similar to kexec_load(), but uses file descriptor for kernel and initrd (initial ram disk)
  • kexec_load - load new kernel for later execution
  • reboot - reboots the system, or enables/disables reboot keystroke (default: Ctrl+Alt+Delete; changed using loadkeys(1))

To assign this capability to a binary use: setcap 'cap_sys_boot=+ep' BINARY.

CAP_SYS_CHROOT

Change root directory or mount namespaces

Capability that allows changing the root directory or the mount namespace, so that a process has a different view of the file system.

Related system calls

  • chroot
  • setns

To assign this capability to a binary use: setcap 'cap_sys_chroot=+ep' BINARY.

CAP_SYS_MODULE

Actions related to loading and unloading Linux kernel modules

This capability includes the functionality that makes it possible to load and activate a kernel module or to remove it from the kernel. This applies to commands like insmod and modprobe. With finit_module(2) or init_module(2) a module can be loaded, while delete_module(2) performs the action to unload it.

Related system calls

  • delete_module - tries to remove an unused loadable module entry that belongs to a currently loaded Linux kernel module (LKM)
  • finit_module - similar to init_module(); loads image (ELF) but refers to it by a file descriptor
  • init_module - load image (ELF) into the kernel space including the required steps to initialize it, including triggering the init() function of the module

To assign this capability to a binary use: setcap 'cap_sys_module=+ep' BINARY.

CAP_SYS_NICE

Capability to adjust scheduling policies and priorities for processes, alter the nice level, and change the CPU affinity. It also allows adjusting the I/O scheduling class and priority for processes.

Related system calls

  • getpriority
  • ioprio_set
  • mbind
  • migrate_pages
  • move_pages
  • nice - change process priority, ranging from +19 (lowest priority) to -20 (highest priority)
  • sched_setaffinity
  • sched_setparam
  • spu_create

To assign this capability to a binary use: setcap 'cap_sys_nice=+ep' BINARY.
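Any process may raise its own nice value (lowering its priority), but lowering the nice value normally requires CAP_SYS_NICE, unless the RLIMIT_NICE resource limit permits it. The sketch below shows how a program could detect which case applies before calling nice. The helper name lowers_nice_value is hypothetical.

```python
import os


def lowers_nice_value(current, target):
    """Return True if moving from current to target raises the priority."""
    return target < current


current = os.nice(0)  # an increment of 0 only reports the current nice value
print("current nice value:", current)
print("moving to -5 raises the priority:", lowers_nice_value(current, -5))
print("moving to 19 raises the priority:", lowers_nice_value(current, 19))
```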

CAP_SYS_PACCT

Enable or disable process accounting

Capability to enable or disable process accounting with the acct(2) system call.

To assign this capability to a binary use: setcap 'cap_sys_pacct=+ep' BINARY.

CAP_SYS_PTRACE

Inspect and trace processes

Capability to trace processes and inspect them, including access to their memory. This capability is used for debugging and troubleshooting purposes and should normally not be assigned to services. A common tool that uses this capability is strace.

Related system calls

  • get_robust_list
  • kcmp
  • ptrace - process tracing; usually for breakpoint debugging and system call tracing
  • userfaultfd

To assign this capability to a binary use: setcap 'cap_sys_ptrace=+ep' BINARY.

CAP_SYS_RAWIO

Raw Input/Output (IO) operations

This capability allows access to files that are normally not accessible, specific device commands and operations, and access to or adjustment of memory limits that are normally out of range.

Related system calls

  • ioperm
  • iopl

Related device files

  • /dev/kmem
  • /dev/mem

Related files in /proc

  • /proc/bus/pci
  • /proc/kcore
  • /proc/sys/vm/mmap_min_addr

To assign this capability to a binary use: setcap 'cap_sys_rawio=+ep' BINARY.

CAP_SYS_RESOURCE

System resources related to CPU usage, disk, file system, and quota limits

This capability is focused on overriding limits that are normally set by a quota or kernel setting. It allows a process with this capability to go over the limit. This capability is normally not needed for common services, but is more applicable to system administration tools.

Related system calls

  • fcntl - performs an action on file defined by a file descriptor, such as setting flags
  • getrlimit - get resource limits
  • ioctl_iflags
  • msgctl
  • msgop
  • prctl

To assign this capability to a binary use: setcap 'cap_sys_resource=+ep' BINARY.
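For resource limits, the rule without this capability is that a process may set its soft limit anywhere up to the hard limit and may lower the hard limit, but may not raise the hard limit. The following is a minimal sketch of that rule, using Python's resource module to read the current values. The function name allowed_without_capability is made up for this illustration.

```python
import resource


def _as_number(limit):
    """Map RLIM_INFINITY to infinity so that limits compare naturally."""
    return float("inf") if limit == resource.RLIM_INFINITY else limit


def allowed_without_capability(new_soft, new_hard, current_hard):
    """Check a proposed (soft, hard) pair against the unprivileged rules."""
    new_soft, new_hard, current_hard = map(
        _as_number, (new_soft, new_hard, current_hard)
    )
    return new_soft <= new_hard <= current_hard


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files limit (soft, hard):", soft, hard)
# Raising a hard limit of 4096 to 8192 is the case that needs the capability.
print(allowed_without_capability(1024, 8192, 4096))
```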

CAP_SYS_TIME

Change to system time

Capability to set the system clock and the real-time (hardware) clock. Normally not needed for common services. Used by commands like date when setting the time.

Related system calls

  • adjtimex - reads and optionally sets adjustment parameters for clock adjustment algorithm used on Linux (RFC 5905)
  • gettimeofday - get time or timezone
  • stime
  • settimeofday - set time or timezone

To assign this capability to a binary use: setcap 'cap_sys_time=+ep' BINARY.

CAP_SYS_TTY_CONFIG

This capability covers functionality related to the terminal, such as simulating a hangup on the terminal with vhangup(2), which resets it so that another user can start with a fresh login.

Related system calls

  • ioctl_console
  • vhangup

To assign this capability to a binary use: setcap 'cap_sys_tty_config=+ep' BINARY.

CAP_SYSLOG

Privileged syslog(2) calls

Capability to use some privileged functions for syslog, including reading or clearing the kernel message ring buffer

Related system calls

  • syslog

Related files in /proc

  • /proc/sys/kernel/kptr_restrict
  • /proc/sys/kernel/printk

To assign this capability to a binary use: setcap 'cap_syslog=+ep' BINARY.

CAP_WAKE_ALARM

Trigger event that will wake up the system (set CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM timers)

This capability is required for events like waking up the system when it is suspended.

Related system calls

  • clock_nanosleep
  • timer_create
  • timerfd_create

To assign this capability to a binary use: setcap 'cap_wake_alarm=+ep' BINARY.