
Systemd syscall filtering

Introduction

Systemd uses seccomp to filter system calls (syscalls). Syscalls are the functions through which a process requests services from the kernel. Programs usually do not invoke them directly, but through wrapper functions in the C library (glibc), a generic library full of functions that handle communication between a process and the kernel. With seccomp support, these syscalls can be blocked. Systemd uses this to allow or deny specific syscalls with the SystemCallFilter= setting.

Besides allowing or denying specific syscalls, systemd also provides predefined sets. Each set groups similar or related syscalls, so the whole group can be allowed or denied at once.
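As a minimal sketch of how individual syscalls and sets combine in SystemCallFilter=, the helper below writes a drop-in file for a unit. The function name, the file name syscall-filter.conf, and the chosen sets are illustrative assumptions, not a recommendation for any particular service. The first line without a ~ prefix turns the filter into an allow-list, and the following line with ~ removes groups from it again.

```shell
# Hypothetical helper: write a drop-in with a syscall filter into a directory.
# The directory is passed as an argument so the sketch can be tried in a
# scratch location first.
write_syscall_dropin() {
    dropin_dir="$1"    # e.g. /etc/systemd/system/example.service.d (assumed unit name)
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/syscall-filter.conf" <<'EOF'
[Service]
# Allow-list: start from the set intended for typical services
SystemCallFilter=@system-service
# Then remove groups this service should not need
SystemCallFilter=~@clock @module @reboot @swap
# Return an error to the caller instead of terminating the process
SystemCallErrorNumber=EPERM
EOF
}
```

To try it on a real unit, point the function at the drop-in directory of that unit, then run systemctl daemon-reload and restart the service.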

What syscalls does a process use?

A web server running nginx should obviously be allowed to listen for network traffic on a port like 80 or 443. Most likely it does not need to be able to change the system clock, while an NTP daemon does. But how do you know what syscalls are used in the first place?

Dynamic analysis

Want to discover which syscalls are used by a running process? You can attach strace to it, although this may crash the process or decrease its performance. So when possible, do this only on systems that are not in production.
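A trace quickly becomes long, so it helps to reduce it to the unique syscall names. The sketch below assumes the trace was saved to a file first; the function name and the file name trace.log are made up for this example. It handles plain lines as well as lines prefixed with a PID, and skips signal and exit markers.

```shell
# Capture step (run manually, preferably on a non-production system):
#   strace -f -o trace.log -p <PID>
# -f follows child processes, -o writes the trace to a file, -p attaches to a PID.

# Print the unique syscall names found in strace output read from stdin.
syscalls_from_strace() {
    # Optional prefix: "[pid 123] " (terminal output) or "123 " (file output with -f).
    # Lines such as "+++ exited with 0 +++" or "--- SIGCHLD ... ---" do not match.
    sed -nE 's/^(\[pid +[0-9]+\] |[0-9]+ +)?([a-z_][a-z0-9_]*)\(.*/\2/p' | LC_ALL=C sort -u
}
```

Usage: syscalls_from_strace < trace.log. The resulting names can then be compared with the filter sets in the overview.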

Binary analysis

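Another way is to look at the binary and see which functions it mentions. This only catches function names that appear as text in the binary, so treat the result as a starting point rather than a complete list.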

strings /usr/sbin/nginx | grep -E '^[a-z0-9_]{4,32}\(\)' | awk '{print $1}' | sort | uniq
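A different angle, not used in the command above, is to list the functions that a dynamically linked binary imports from shared libraries. The sketch below assumes nm from binutils is available; the function name is invented for this example. Keep in mind that these are library functions: many share their name with a syscall, but a library function may also use a differently named syscall internally.

```shell
# Reduce the output of `nm -D --undefined-only <binary>` (read from stdin)
# to a sorted list of unique imported symbol names.
imported_functions() {
    # Keep undefined symbols (type "U") and strip version suffixes like @GLIBC_2.10.
    awk '$1 == "U" { sub(/@.*/, "", $2); print $2 }' | LC_ALL=C sort -u
}
```

Usage: nm -D --undefined-only /usr/sbin/nginx | imported_functions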

Filter sets

Systemd uses filter sets to allow or deny functionality per group. To see the content of a set (e.g. @clock):

systemd-analyze syscall-filter @clock
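The reverse lookup is often just as useful: given a syscall, which sets contain it? The sketch below assumes the usual output layout of systemd-analyze syscall-filter without arguments, where each set name starts at the beginning of a line and its members are indented beneath it. The function name is hypothetical.

```shell
# Print every filter set that lists the given syscall as a direct member.
# Reads the output of `systemd-analyze syscall-filter` from stdin.
sets_containing() {
    awk -v name="$1" '
        /^@/ { current = $1 }                            # set name at column 0
        /^[[:space:]]/ && $1 == name { print current }   # indented member line
    '
}
```

Usage: systemd-analyze syscall-filter | sets_containing close. A syscall can be a member of more than one set, as close is in this overview (@basic-io and @file-system).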

To simplify looking up this information, the sets are collected in the overview below.

@default

Description: These system calls are always permitted

Syscall           Purpose
arch_prctl
brk               Change the location of program break, specifically the end of the process's data segment
cacheflush        Flushes contents of cache(s) for user addresses in specified range
clock_getres      Retrieve the resolution (precision) of a specified clock
clock_getres_time64
clock_gettime     Retrieve time from specified clock
clock_gettime64   64-bit version of clock_gettime()
clock_nanosleep
clock_nanosleep_time64
execve            Executes the program referred to by specified pathname
exit              Terminates the calling process; parent process will receive a SIGCHLD signal
exit_group
futex             Provides a method for waiting until a certain condition becomes true
futex_time64
futex_waitv
get_robust_list
get_thread_area
getegid           Returns effective group ID of the calling process
getegid32
geteuid           Retrieve effective user ID of calling process
geteuid32         32-bit version of geteuid()
getgid            Returns real group ID of the calling process
getgid32          32-bit version of getgid()
getgroups         Returns supplementary group IDs of calling process
getgroups32       32-bit version of getgroups()
getpgid           Retrieve process group ID (PGID)
getpgrp
getpid            Returns process ID (PID) of calling process
getppid           Returns process ID (PID) of parent of the calling process
getrandom         Obtain random bytes
getresgid
getresgid32
getresuid
getresuid32
getrlimit         Get resource limits
getsid            Retrieve session ID of a defined process
gettid            Returns thread ID (TID) of caller. Same as process ID (PID) for single-threaded applications, otherwise different
gettimeofday      Get time or timezone
getuid            Returns real user ID of the calling process
getuid32          32-bit version of getuid()
membarrier
mmap              Create new mapping in the virtual address space of the calling process
mmap2
mprotect
munmap            Deletes the mappings for specified address range and marks range to generate invalid memory references
nanosleep
pause
prlimit64
restart_syscall
riscv_flush_icache
riscv_hwprobe
rseq
rt_sigreturn
sched_getaffinity
sched_yield
set_robust_list
set_thread_area
set_tid_address
set_tls
sigreturn
time              Return time as number of seconds since the Epoch (1970-01-01 00:00:00 +0000 (UTC))
ugetrlimit

@aio

Description: Asynchronous IO

Syscall           Purpose
io_cancel         Attempts to cancel asynchronous I/O operation that was submitted by io_submit()
io_destroy
io_getevents
io_pgetevents
io_pgetevents_time64
io_setup
io_submit         Submit asynchronous I/O blocks for processing, can be cancelled with io_cancel()
io_uring_enter
io_uring_register
io_uring_setup

@basic-io

Description: Basic IO

Almost all software requires this set to open a file, read from it, or write to it.

Syscall           Purpose
_llseek
close             Close file descriptor
close_range
dup               Duplicate file descriptor; more specifically it allocates a new file descriptor that also refers to open file description oldfd
dup2              Same as dup(), duplicate file descriptor; difference is that it uses file descriptor number specified in newfd
dup3              Same as dup2(); difference is that caller can force close-on-exec flag (O_CLOEXEC) to be set
lseek             Reposition file offset for read/write
pread64
preadv
preadv2
pwrite64
pwritev
pwritev2
read              Read from file descriptor
readv             Read buffers from file
write             Write to file descriptor
writev            Writes buffers to file

@chown

Description: Ability to change ownership of files and directories

Syscall           Purpose
chown             Changes ownership of file specified by pathname, dereferenced if file is a symbolic link
chown32           32-bit version of chown()
fchown            Changes ownership of file, referred to by open file descriptor (fd)
fchown32          32-bit version of fchown()
fchownat          Similar to fchown(), but deals differently with relative paths
lchown            Like chown(), does not dereference symbolic links
lchown32          32-bit version of lchown()

@clock

Description: Ability to change system time

Note: this is rarely needed for normal services.

Syscall           Purpose
adjtimex          Reads and optionally sets adjustment parameters for clock adjustment algorithm used on Linux (RFC 5905)
clock_adjtime     Behaves like adjtimex(), takes an additional clk_id argument to define the clock
clock_adjtime64   64-bit version of clock_adjtime()
clock_settime     Set time of specified clock
clock_settime64   64-bit version of clock_settime()
settimeofday      Set time or timezone

@cpu-emulation

Description: Ability to do CPU emulation

Syscall           Purpose
modify_ldt
subpage_prot
switch_endian
vm86
vm86old

@debug

Description: Debugging, performance monitoring, tracing functionality

Note: this is normally only used by tools like strace and perf.

Syscall           Purpose
lookup_dcookie
perf_event_open
pidfd_getfd
ptrace            Process tracing; usually for breakpoint debugging and system call tracing
rtas
s390_runtime_instr
sys_debug_setcontext

@file-system

Description: File system operations

Note: normally all processes need this to be able to read a directory or open a file.

Syscall           Purpose
access            Checks whether the calling process can access the pathname, dereferenced when it is a symbolic link
chdir             Change working directory
chmod             Change mode of the file, dereferenced for symbolic links
close             Close file descriptor
creat             Like open(), but sets flags O_CREAT|O_WRONLY|O_TRUNC
faccessat         Similar to access(), works slightly differently when pathname is relative
faccessat2        Very similar to faccessat(), but implements the flags argument to correct the incorrect implementation in faccessat()
fallocate
fchdir            Similar to chdir(), but uses open file descriptor
fchmod            Same as chmod(), but the file is specified by open file descriptor fd
fchmodat          Similar to chmod(), works slightly differently when pathname is relative
fchmodat2
fcntl             Performs an action on file defined by a file descriptor, such as setting flags
fcntl64           64-bit version of fcntl()
fgetxattr
flistxattr
fremovexattr
fsetxattr
fstat             Similar to stat(), but uses file descriptor fd
fstat64
fstatat64
fstatfs
fstatfs64
ftruncate         Truncate a file open for writing to specified number of bytes, which may fill it with null bytes (\0) or decrease its size and lose data
ftruncate64       64-bit version of ftruncate()
futimesat
getcwd            Copies the absolute pathname of current working directory to a buffer
getdents          Retrieve entries from a directory
getdents64        64-bit version of getdents()
getxattr
inotify_add_watch
inotify_init
inotify_init1
inotify_rm_watch
lgetxattr
link              Create new link (hard link) to existing file
linkat            Similar to link(), but deals differently with relative paths
listxattr
llistxattr
lremovexattr
lsetxattr
lstat             Similar to stat(), but if pathname is symbolic link, return information about link and not the file that symbolic link points to
lstat64
mkdir             Create directory
mkdirat           Similar to mkdir(), but deals differently with relative paths
mknod             Create filesystem node (file, device special file, or named pipe) named pathname
mknodat           Similar to mknod(), works slightly differently when pathname is relative
newfstatat
oldfstat
oldlstat
oldstat
open              Opens file specified by pathname
openat            Similar to open(), but uses dirfd argument and deals differently with path
openat2
readlink
readlinkat
removexattr
rename            Rename a file, move it between directories if required
renameat          Similar to rename(), but deals differently with relative paths
renameat2         Similar to renameat() when no flags are provided, otherwise it has additional options
rmdir             Delete directory
setxattr
stat              Get information about file
stat64            64-bit version of stat()
statfs
statfs64
statx
symlink           Create symbolic link
symlinkat         Similar to symlink(), but deals differently with relative paths
truncate          Truncate a writable file to specified number of bytes, which may fill it with null bytes (\0) or decrease its size and lose data
truncate64        64-bit version of truncate()
unlink            Delete name from filesystem
unlinkat          Similar to unlink(), but deals differently with relative paths
utime             Change access and modification times of inode
utimensat
utimensat_time64
utimes            Similar to utime(), but uses array instead of a structure

@io-event

Description: Event loop system calls

Syscall           Purpose
_newselect
epoll_create
epoll_create1
epoll_ctl         Manage (add, modify, remove) entries in epoll instance, which is used to monitor if I/O is possible on the defined set of file descriptors. Similar to poll(), with additional benefits.
epoll_ctl_old
epoll_pwait
epoll_pwait2
epoll_wait
epoll_wait_old
eventfd
eventfd2
poll              Similar task to select(2): waits for one of a set of file descriptors to become ready for I/O
ppoll             Lets an application wait until a file descriptor is ready or a signal is caught
ppoll_time64
pselect6
pselect6_time64
select            Lets a program monitor multiple file descriptors until one or more become ready for I/O. This system call has limitations and typically poll or epoll is used.

@ipc

Description: SysV IPC, POSIX Message Queues or other Inter-Process Communication (IPC)

Syscall           Purpose
ipc
memfd_create
mq_getsetattr
mq_notify
mq_open
mq_timedreceive
mq_timedreceive_time64
mq_timedsend
mq_timedsend_time64
mq_unlink
msgctl
msgget
msgrcv
msgsnd
pipe              Create a pipe that allows unidirectional communication between processes
pipe2             Similar to pipe(), to create a channel between two processes. With flag O_DIRECT it will use packet-style communication instead of a stream
process_madvise
process_vm_readv
process_vm_writev
semctl
semget
semop
semtimedop
semtimedop_time64
shmat
shmctl
shmdt
shmget

@keyring

Description: Kernel keyring access

Syscall           Purpose
add_key           Create or update a key for kernel key management facility
keyctl            Allow user-space programs to take actions on keys, such as updating, revocation, ownership
request_key       Request a key from kernel key management facility

@memlock

Description: Memory locking control

Syscall           Purpose
mlock             Lock pages in a specified address range, so they are guaranteed to stay in memory instead of being swapped to disk
mlock2            Same as mlock() if flags is 0. With flag MLOCK_ONFAULT it locks the currently resident pages and marks the range so that currently nonresident pages are locked later when they are used (page fault)
mlockall          Similar to mlock(), but tries to lock all the memory pages of the calling process to prevent swapping
munlock           Opposite of mlock(): releases the lock on a memory area, so it can be swapped to disk if needed
munlockall        Unlocks all memory pages of the calling process, so they can be swapped to disk again by the kernel

@module

Description: Ability to load or unload kernel modules

Syscall           Purpose
delete_module     Tries to remove an unused loadable module entry that belongs to a currently loaded Linux kernel module (LKM)
finit_module      Similar to init_module(), but reads the image (ELF) from a file descriptor
init_module       Load image (ELF) into kernel space and perform the required steps to initialize it, including triggering the init() function of the module

@mount

Description: Ability to mount or unmount a file system

Note: most services will not need to use mount/umount.

Syscall           Purpose
chroot
fsconfig
fsmount
fsopen
fspick
mount
mount_setattr
move_mount
open_tree
pivot_root
umount
umount2

@network-io

Description: Network or Unix socket actions, like opening a network port to listen

When to use: This filter set is only required for services that actually use sockets, for example to listen on a network port.

Syscall           Purpose
accept            Accept a connection on a socket
accept4
bind              Assigns address to a socket that was created with socket()
connect           Initiate connection on a defined socket
getpeername       Retrieve address of the peer connected to a socket
getsockname       Retrieve current address of defined socket
getsockopt        Get options for socket
listen            Marks socket as passive, so it can accept incoming connections with accept()
recv              Like read(), but normally only used on a socket and has additional flags that can be set
recvfrom          Receives a message on a socket, close to recv(), but with additional arguments related to the source address
recvmmsg
recvmmsg_time64
recvmsg           Receives a message on a socket with a predefined structure to minimize the number of arguments
send
sendmmsg
sendmsg
sendto
setsockopt        Set options on socket
shutdown
socket            Create endpoint for communication and return file descriptor
socketcall
socketpair        Create a pair of connected sockets, for example for communication between parent and child process

@obsolete

Description: Unusual, obsolete or unimplemented system calls, with some unknown to the underlying seccomp library

Syscall           Purpose
_sysctl
afs_syscall
bdflush
break
create_module
ftime
get_kernel_syms
getpmsg
gtty
idle
lock
mpx
prof
profil
putpmsg
query_module
security
sgetmask
ssetmask
stime
stty
sysfs
tuxcall
ulimit
uselib
ustat
vserver

@pkey

Description: Set of calls for memory protection keys

Syscall           Purpose
pkey_alloc
pkey_free
pkey_mprotect

@privileged

Description: System calls that typically need superuser capabilities. This set also includes other filter sets, which systemd-analyze syscall-filter @privileged shows alongside the syscalls below.

Syscall           Purpose
_sysctl
acct
bpf
capset
chroot
fanotify_init
fanotify_mark
nfsservctl
open_by_handle_at
pivot_root
quotactl
quotactl_fd
setdomainname
setfsuid
setfsuid32
setgroups
setgroups32
sethostname
setresuid
setresuid32
setreuid
setreuid32
setuid
setuid32
vhangup

@process

Description: Process control, execution, namespacing operations

Syscall           Purpose
capget
clone             Similar to fork() to create a child process, with more fine-grained options to define what is shared between calling process and child. This system call can also make a new process part of newly created namespace by specifying a flag.
clone3            Provides superset of the functionality of the older clone() interface to create child process
execveat
fork
getrusage
kill
pidfd_open
pidfd_send_signal
prctl
rt_sigqueueinfo
rt_tgsigqueueinfo
setns
swapcontext
tgkill
times
tkill
unshare
vfork
wait4
waitid
waitpid

@raw-io

Description: Raw I/O port access

Syscall           Purpose
ioperm
iopl
pciconfig_iobase
pciconfig_read
pciconfig_write
s390_pci_mmio_read
s390_pci_mmio_write

@reboot

Description: Ability to reboot, or to prepare a reboot using kexec functionality that loads a kernel for later execution

Note: normal services do not need this set of syscalls.

Syscall           Purpose
kexec_file_load   Similar to kexec_load(), but uses file descriptor for kernel and initrd (initial ram disk)
kexec_load        Load new kernel for later execution
reboot            Reboots the system, or enables/disables reboot keystroke (default: Ctrl+Alt+Delete; changed using loadkeys(1))

@resources

Description: Ability to alter resource settings, such as process priority

Syscall           Purpose
ioprio_set
mbind
migrate_pages
move_pages
nice              Change process priority, from +19 (lowest priority) to -20 (highest priority)
sched_setaffinity
sched_setattr
sched_setparam
sched_setscheduler
set_mempolicy
set_mempolicy_home_node
setpriority
setrlimit         Set resource limits

@sandbox

Description: Sandbox functionality, such as support for landlock and seccomp

Syscall           Purpose
landlock_add_rule
landlock_create_ruleset
landlock_restrict_self
seccomp

@setuid

Description: Operations for changing user/group credentials (setuid/setgid)

Syscall           Purpose
setgid            Set effective group ID of calling process, with CAP_SETGID capability it also sets real GID and saved set-group-ID
setgid32
setgroups
setgroups32
setregid
setregid32
setresgid
setresgid32
setresuid
setresuid32
setreuid
setreuid32
setuid            Set effective user ID of calling process, with CAP_SETUID capability it also sets real UID and saved set-user-ID
setuid32

@signal

Description: Signal handling for processes

Syscall           Purpose
rt_sigaction
rt_sigpending
rt_sigprocmask
rt_sigsuspend
rt_sigtimedwait
rt_sigtimedwait_time64
sigaction
sigaltstack
signal
signalfd
signalfd4
sigpending
sigprocmask
sigsuspend

@swap

Description: Ability to enable or disable swap devices

Note: not required for normal services.

Syscall           Purpose
swapoff
swapon

@sync

Description: Synchronize files and memory to storage

Syscall           Purpose
fdatasync
fsync
msync
sync
sync_file_range
sync_file_range2
syncfs

@system-service

Description: General system service operations

Besides the syscalls below, it also includes other filter sets, which systemd-analyze syscall-filter @system-service shows.

Syscall           Purpose
arm_fadvise64_64
capget            Retrieve thread capabilities
capset            Set thread capabilities
copy_file_range
fadvise64
fadvise64_64
flock             Apply or remove advisory lock on file
get_mempolicy
getcpu
getpriority
ioctl
ioprio_get
kcmp
madvise
mremap
name_to_handle_at
oldolduname
olduname
personality
readahead
readdir           Read a directory
remap_file_pages
sched_get_priority_max
sched_get_priority_min
sched_getattr
sched_getparam
sched_getscheduler
sched_rr_get_interval
sched_rr_get_interval_time64
sched_yield
sendfile          Copies data between one file descriptor and another
sendfile64
setfsgid
setfsgid32
setfsuid
setfsuid32
setpgid
setsid
splice
sysinfo
tee               Duplicate pipe content, does not consume the data
umask             Set file mode creation mask
uname             Retrieve name and information about the current kernel
userfaultfd
vmsplice

@timer

Description: Timers, to schedule operations by time

Syscall           Purpose
alarm             Schedule an alarm; it lets the system generate a SIGALRM signal for the process after a specified time
getitimer
setitimer
timer_create
timer_delete
timer_getoverrun
timer_gettime
timer_gettime64
timer_settime
timer_settime64
timerfd_create
timerfd_gettime
timerfd_gettime64
timerfd_settime
timerfd_settime64
times             Get process and child process times, including CPU time in userspace and by the system for the calling process, and similar for the child processes

@known

Description: Includes all syscalls that are known to the Linux kernel, plus the ones in @obsolete


This article has been written by our Linux security expert Michael Boelen. With focus on creating high-quality articles and relevant examples, he wants to improve the field of Linux security. No more web full of copy-pasted blog posts.
