User-space, Kernel-space, and System Calls
Last update: 2022-06-04
User Space and Kernel Space
A module runs in kernel space, whereas applications run in user space. This concept is at the base of operating system theory.
The role of the operating system, in practice, is to provide programs with a consistent view of the computer’s hardware. In addition, the operating system must account for independent operation of programs and protection against unauthorized access to resources. This nontrivial task is possible only if the CPU enforces protection of system software from the applications.
Every modern processor is able to enforce this behavior. The chosen approach is to implement different operating modalities (or levels) in the CPU itself. The levels have different roles, and some operations are disallowed at the lower levels; program code can switch from one level to another only through a limited number of gates.
Unix systems are designed to take advantage of this hardware feature, using two such levels. All current processors have at least two protection levels, and some, like the x86 family, have more levels; when several levels exist, the highest and lowest levels are used. Under Unix, the kernel executes in the highest level (also called supervisor mode), where everything is allowed, whereas applications execute in the lowest level (the so-called user mode), where the processor regulates direct access to hardware and unauthorized access to memory.
We usually refer to the execution modes as kernel space and user space.
Processes running in user space cannot access kernel space directly. They can reach only a small part of the kernel's functionality, through an interface that the kernel exposes: the system calls.
System Calls
A system call is the programmatic way in which a program requests a service from the kernel.
The system call interface consists of a number of functions that the operating system exports to the applications running on top of it. These functions allow actions such as opening files, creating network connections, reading from and writing to files, and so on.
System calls are commonly divided into five categories:
- Process Control
- File Management
- Device Management
- Information Maintenance
- Communication
Let's go through some basic system calls in each category:
- Process Control:
  These system calls handle process creation, process termination, and similar tasks. The Linux system calls in this category include fork(), exit(), and exec().
  - fork(): creates a new process. A new process can be created with fork() without a new program being run; the new child process simply continues to execute the same program that the parent process was running.
  - exit(): terminates the execution of the calling program. The operating system reclaims the resources that the process was using once it has exited.
  - exec(): starts executing a new program (on Linux the underlying system call is execve(); the other exec*() functions are library wrappers around it). Running a new program does not require that a new process be created first: any process may call exec() at any time. The currently running program is replaced immediately, and the new program starts executing in the context of the existing process.
- File Management:
  File management system calls handle file manipulation tasks such as creating, reading, and writing files. The Linux system calls in this category include open(), read(), write(), and close().
  - open(): opens a file and returns a file descriptor. Whether the file is opened for reading, writing, or both is chosen by the flags passed to open(). The call only opens the file; transferring data requires separate system calls.
  - read(): reads data from an open file descriptor into a buffer. It does not modify the file. Multiple processes can call read() on the same file simultaneously.
  - write(): writes data from a buffer to an open file descriptor. Multiple processes can also call write() on the same file at the same time; the kernel does not coordinate them on the application's behalf, so concurrent writers need their own scheme (for example file locks or O_APPEND) to avoid overwriting each other's data.
  - close(): closes an open file descriptor.
- Device Management:
  Device management system calls handle device manipulation, such as reading from and writing to device buffers. The main Linux system call in this category is ioctl().
  - ioctl(): short for input/output control. It is a system call for device-specific input/output operations and other operations that cannot be expressed by regular system calls.
- Information Maintenance:
  These calls handle information and its transfer between the OS and the user program. The OS also keeps information about all of its processes, and system calls are used to access it. The calls in this category include getpid(), alarm(), and sleep().
  - getpid(): short for "get process ID". It returns the process ID of the calling process. getpid() is always successful, and no return value is reserved to indicate an error.
  - alarm(): arranges for a SIGALRM signal to be delivered to the calling process after a given number of seconds.
  - sleep(): suspends the execution of the calling process for a given interval of time. During this interval, other processes get a chance to run. (On Linux, sleep() is a C library function built on top of the nanosleep family of system calls.)
- Communication:
  These system calls are used for inter-process communication. Two models are used for inter-process communication:
  - Message passing (processes exchange messages with one another)
  - Shared memory (processes share a memory region to communicate)
  The system calls in this category include pipe(), shmget(), and mmap().
  - pipe(): creates a unidirectional channel that processes can use to communicate. It returns two file descriptors, one for the read end of the pipe and one for the write end.
  - shmget(): short for "shared memory get". It is mainly used for shared-memory communication: it creates a System V shared memory segment, or looks up an existing one, and returns its identifier. Processes then attach the segment (with shmat()) and communicate through it.
  - mmap(): maps files or devices into memory; the mapping is removed with munmap(). The mmap() system call is responsible for mapping the content of a file into the virtual memory space of the process.
System call table
The system call table is defined in the Linux kernel source code.
For example, here is an excerpt from syscall_64.tbl, which defines the 64-bit system call numbers and entry points.
# The format is:
# <number> <abi> <name> <entry point>
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
# The abi is "common", "64" or "x32" for this file.
#
0 common read sys_read
1 common write sys_write
2 common open sys_open
3 common close sys_close
4 common stat sys_newstat
5 common fstat sys_newfstat
6 common lstat sys_newlstat
7 common poll sys_poll
8 common lseek sys_lseek
9 common mmap sys_mmap
10 common mprotect sys_mprotect
11 common munmap sys_munmap
Example
Use the printf function, which is provided in user space:
#include <stdio.h>

int main(void) {
    printf("USER: Hello World!\n");
    return 0;
}
gcc hello_userspace.c -o hello_userspace
Run with strace:
strace ./hello_userspace
execve("./hello_userspace", ["./hello_userspace"], 0x7ffd769d3740 /* 23 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
... read dynamic library linking index
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
... read symbols in libc library
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
... obtain the standard output
write(1, "USER: Hello World!\n", 19USER: Hello World!) = 19
You will see a list of the system calls invoked, including execve, access, openat, read, write, and close.
Use the low-level syscall function
glibc offers a function called syscall() that you can use to explore the system call interface. Remember to add #define _GNU_SOURCE before the includes to get access to the low-level functions.
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall(), implemented in libc.so */

int main(void) {
    syscall(SYS_write, 1, "SYSCALL: Hello World!\n", 22);
    return 0;
}
gcc hello_syscall_glibc.c -o hello_syscall_glibc
Call syscall directly using assembly
From the table above, the ID of the write system call is 1, so we can invoke it through the syscall instruction.
.global _start
.text
_start:
# write(1, message, 26)
mov $1, %rax # system call ID. 1 is write
mov $1, %rdi # file handle 1 is stdout
mov $message, %rsi # address of string to output
mov $26, %rdx # string length
syscall # system call invocation!
# exit(0)
mov $60, %rax # system call ID. 60 is exit
xor %rdi, %rdi # we want return code 0
syscall # system call invocation!
message:
.ascii "ASM SYSCALL: Hello World!\n"
gcc hello_syscall_asm.s -nostdlib -no-pie -o hello_syscall_asm
Run with strace to see that, apart from the initial execve, write is the only system call invoked before the program exits:
strace ./hello_syscall_asm
execve("./hello_syscall_asm", ["./hello_syscall_asm"], 0x7ffe4fc483a0 /* 23 vars */) = 0
write(1, "ASM SYSCALL: Hello World!\n", 26ASM SYSCALL: Hello World!
) = 26
Exercise
- Use ltrace -S instead of strace to investigate the system call order.
- strace can attach to a running process. Try it.