Commit 3d885bef authored by Jonathan Corbet

Merge branch 'user-space-api' into docs-next

Create a beginning user-space API manual, and add one old doc that nobody
cares much about.  More useful stuff should come soon...
parents 9581539e f504d47b
@@ -24,6 +24,18 @@ trying to get it to work optimally on a given system.

   admin-guide/index

Application-developer documentation
-----------------------------------

The user-space API manual gathers together documents describing aspects of
the kernel interface as seen by application developers.

.. toctree::
   :maxdepth: 2

   userspace-api/index

Introduction to kernel development
----------------------------------
......
# -*- coding: utf-8; mode: python -*-
project = "The Linux kernel user-space API guide"
tags.add("subproject")
latex_documents = [
    ('index', 'userspace-api.tex', project,
     'The kernel development community', 'manual'),
]
=====================================
The Linux kernel user-space API guide
=====================================

.. _man-pages: https://www.kernel.org/doc/man-pages/

While much of the kernel's user-space API is documented elsewhere
(particularly in the man-pages_ project), some user-space information can
also be found in the kernel tree itself. This manual is intended to be the
place where this information is gathered.

.. class:: toc-title

   Table of contents

.. toctree::
   :maxdepth: 2

   unshare

.. only:: subproject and html

   Indices
   =======

   * :ref:`genindex`
unshare system call
===================

This document describes the new system call, unshare(). The document
provides an overview of the feature, why it is needed, how it can
be used, its interface specification, design, implementation and
how it can be tested.

Change Log
----------
version 0.1 Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006

Contents
--------
        1) Overview
        2) Benefits
        3) Cost
@@ -24,6 +24,7 @@ Contents:

1) Overview
-----------

Most legacy operating system kernels support an abstraction of threads
as multiple execution contexts within a process. These kernels provide
special resources and mechanisms to maintain these "threads". The Linux
@@ -38,33 +39,35 @@ threads. On Linux, at the time of thread creation using the clone system
call, applications can selectively choose which resources to share
between threads.

The unshare() system call adds a primitive to the Linux thread model that
allows threads to selectively 'unshare' any resources that were being
shared at the time of their creation. unshare() was conceptualized by
Al Viro in August of 2000, on the Linux-Kernel mailing list, as part
of the discussion on POSIX threads on Linux. unshare() augments the
usefulness of Linux threads for applications that would like to control
shared resources without creating a new process. unshare() is a natural
addition to the set of available primitives on Linux that implement
the concept of process/thread as a virtual machine.

2) Benefits
-----------

unshare() would be useful to large application frameworks such as PAM
where creating a new process to control sharing/unsharing of process
resources is not possible. Since namespaces are shared by default
when creating a new process using fork or clone, unshare() can benefit
even non-threaded applications if they need to disassociate
from the default shared namespace. The following lists two use cases
where unshare() can be used.

2.1 Per-security context namespaces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

unshare() can be used to implement polyinstantiated directories using
the kernel's per-process namespace mechanism. Polyinstantiated directories,
such as per-user and/or per-security context instances of /tmp, /var/tmp or
a per-security context instance of a user's home directory, isolate user
processes when working with these directories. Using unshare(), a PAM
module can easily set up a private namespace for a user at login.
Polyinstantiated directories are required for Common Criteria certification
with Labeled System Protection Profile; however, with the availability
@@ -74,33 +77,36 @@ polyinstantiating /tmp, /var/tmp and other directories deemed
appropriate by system administrators.
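
As a rough illustration of the per-user /tmp idea, the following is a
minimal sketch, not taken from any PAM module. The helper name
``setup_private_tmp`` and the instance directory argument are hypothetical,
and the calls are assumed to run with CAP_SYS_ADMIN; without that
privilege unshare(CLONE_NEWNS) fails with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: give the calling process a private view of /tmp.
 * instance_dir is an assumed, pre-created per-user directory.
 * Returns 0 on success or a negative errno value on failure.
 */
int setup_private_tmp(const char *instance_dir)
{
    /* Detach from the mount namespace shared with the parent. */
    if (unshare(CLONE_NEWNS) == -1)
        return -errno;

    /* Keep later mounts from propagating back to other namespaces. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return -errno;

    /* Bind the per-user instance over /tmp in this namespace only. */
    if (mount(instance_dir, "/tmp", NULL, MS_BIND, NULL) == -1)
        return -errno;

    return 0;
}
```

Processes outside the new namespace keep seeing the original /tmp; only
the caller and the children it creates afterwards see the instance.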

2.2 unsharing of virtual memory and/or open files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider a client/server application where the server is processing
client requests by creating processes that share resources such as
virtual memory and open files. Without unshare(), the server has to
decide what needs to be shared at the time of creating the process
which services the request. unshare() gives the server the ability to
disassociate parts of the context during the servicing of the
request. For large and complex middleware application frameworks, this
ability to unshare() after the process was created can be very
useful.
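
The open-files case can be sketched in user space as follows. This is an
illustrative example under stated assumptions, not code from the patch:
the function names are hypothetical, the child is created with
clone(CLONE_FILES) so that it initially shares the descriptor table, and
a return of -1 covers environments where clone() or unshare() is not
permitted.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Child: stop sharing the descriptor table, then close its own copy. */
static int child_fn(void *arg)
{
    int fd = *(int *)arg;

    if (unshare(CLONE_FILES) == -1)
        return 1;   /* unshare not permitted; leave the fd alone */
    close(fd);      /* affects only the child's private table */
    return 0;
}

/*
 * Returns 1 if the parent's descriptor is still open after the child
 * unshared its table and closed the descriptor, 0 if it was closed,
 * and -1 if the experiment could not be run.
 */
int parent_fd_survives_child_close(void)
{
    int pipefd[2];
    int status;
    int result = -1;
    char *stack;
    pid_t pid;

    if (pipe(pipefd) == -1)
        return -1;
    stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* CLONE_FILES shares the fd table; SIGCHLD lets waitpid() work. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                CLONE_FILES | SIGCHLD, &pipefd[0]);
    if (pid != -1 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        result = (fcntl(pipefd[0], F_GETFD) != -1);

    free(stack);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Without the unshare() call in the child, the close() would act on the
shared table and the parent's descriptor would be closed as well.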

3) Cost
-------

In order to not duplicate code and to handle the fact that unshare()
works on an active task (as opposed to clone/fork working on a newly
allocated inactive task) unshare() had to make minor reorganizational
changes to copy_* functions utilized by clone/fork system call.
There is a cost associated with altering existing, well tested and
stable code to implement a new feature that may not get exercised
extensively in the beginning. However, with proper design and code
review of the changes and creation of an unshare() test for the LTP
the benefits of this new feature can exceed its cost.

4) Requirements
---------------

unshare() reverses sharing that was done using the clone(2) system call,
so unshare() should have an interface similar to clone(2). That is,
since the flags in clone(int flags, void *stack) specify what should
be shared, similar flags in unshare(int flags) should specify
what should be unshared. Unfortunately, this may appear to invert
@@ -108,13 +114,14 @@ the meaning of the flags from the way they are used in clone(2).
However, there was no easy solution that was less confusing and that
allowed incremental context unsharing in the future without an ABI change.

The unshare() interface should accommodate possible future addition of
new context flags without requiring a rebuild of old applications.
If and when new context flags are added, the unshare() design should allow
incremental unsharing of those resources on an as-needed basis.

5) Functional Specification
---------------------------

NAME
        unshare - disassociate parts of the process execution context

@@ -124,7 +131,7 @@ SYNOPSIS
        int unshare(int flags);

DESCRIPTION
        unshare() allows a process to disassociate parts of its execution
        context that are currently being shared with other processes. Part
        of the execution context, such as the namespace, is shared by default
        when a new process is created using fork(2), while other parts,
@@ -132,7 +139,7 @@ DESCRIPTION
        shared by explicit request to share them when creating a process
        using clone(2).

        The main use of unshare() is to allow a process to control its
        shared execution context without creating a new process.

        The flags argument specifies one or a bitwise OR of several of
@@ -176,17 +183,20 @@ SEE ALSO

6) High Level Design
--------------------

Depending on the flags argument, the unshare() system call allocates
appropriate process context structures, populates them with values from
the current shared version, associates newly duplicated structures
with the current task structure and releases corresponding shared
versions. Helper functions of clone (copy_*) could not be used
directly by unshare() for the following two reasons.

  1) clone operates on a newly allocated not-yet-active task
     structure, whereas unshare() operates on the current active
     task. Therefore unshare() has to take appropriate task_lock()
     before associating newly duplicated context structures.

  2) unshare() has to allocate and duplicate all context structures
     that are being unshared, before associating them with the
     current task and releasing older shared structures. Failure
     to do so will create race conditions and/or oops when trying
@@ -202,80 +212,106 @@ Therefore code from copy_* functions that allocated and duplicated
current context structure was moved into new dup_* functions. Now,
copy_* functions call dup_* functions to allocate and duplicate
appropriate context structures and then associate them with the
task structure that is being constructed. unshare() system call on
the other hand performs the following:

  1) Check flags to force missing, but implied, flags

  2) For each context structure, call the corresponding unshare()
     helper function to allocate and duplicate a new context
     structure, if the appropriate bit is set in the flags argument.

  3) If there is no error in allocation and duplication and there
     are new context structures then lock the current task structure,
     associate new context structures with the current task structure,
     and release the lock on the current task structure.

  4) Appropriately release older, shared, context structures.
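
The ordering of steps 2 to 4 (duplicate first, swap under the lock,
release last) can be modelled in user space. The sketch below is only a
model under stated assumptions: ``struct ctx_model``, ``struct task_model``
and the function names are hypothetical, a pthread mutex stands in for
task_lock(), and the plain integer reference count is adequate only for
this single-threaded illustration. It is not the kernel implementation.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shareable context such as fs or files. */
struct ctx_model {
    int refcount;   /* number of tasks pointing at this context */
    int value;      /* placeholder payload */
};

/* Hypothetical stand-in for the task structure. */
struct task_model {
    pthread_mutex_t lock;   /* plays the role of task_lock() */
    struct ctx_model *ctx;
};

/* Allocate a private copy of a context (the dup_* role). */
static struct ctx_model *dup_ctx_model(const struct ctx_model *old)
{
    struct ctx_model *copy = malloc(sizeof(*copy));

    if (copy == NULL)
        return NULL;
    *copy = *old;
    copy->refcount = 1;
    return copy;
}

/* Drop one reference; free the context when nobody uses it. */
static void put_ctx_model(struct ctx_model *ctx)
{
    if (--ctx->refcount == 0)
        free(ctx);
}

/* Returns 0 on success or -ENOMEM if duplication fails. */
int unshare_ctx_model(struct task_model *task)
{
    struct ctx_model *old = task->ctx;
    struct ctx_model *copy;

    if (old->refcount == 1)
        return 0;                       /* nothing is shared */

    copy = dup_ctx_model(old);          /* step 2: duplicate first */
    if (copy == NULL)
        return -ENOMEM;                 /* task is left untouched */

    pthread_mutex_lock(&task->lock);    /* step 3: swap under the lock */
    task->ctx = copy;
    pthread_mutex_unlock(&task->lock);

    put_ctx_model(old);                 /* step 4: release the old one */
    return 0;
}
```

Because the copy is made before the task is modified, an allocation
failure leaves the task pointing at its original shared context.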

7) Low Level Design
-------------------

Implementation of unshare() can be grouped in the following 4 different
items:

  a) Reorganization of existing copy_* functions

  b) unshare() system call service function

  c) unshare() helper functions for each different process context

  d) Registration of system call number for different architectures

7.1) Reorganization of copy_* functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each copy function such as copy_mm, copy_namespace, copy_files,
etc, had roughly two components. The first component allocated
and duplicated the appropriate structure and the second component
linked it to the task structure passed in as an argument to the copy
function. The first component was split into its own function.
These dup_* functions allocated and duplicated the appropriate
context structure. The reorganized copy_* functions invoked
their corresponding dup_* functions and then linked the newly
duplicated structures to the task structure with which the
copy function was called.

7.2) unshare() system call service function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Check flags.
  Force implied flags. If CLONE_THREAD is set, force CLONE_VM.
  If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is
  set and signals are also being shared, force CLONE_THREAD. If
  CLONE_NEWNS is set, force CLONE_FS.

* For each context flag, invoke the corresponding unshare_*
  helper routine with the flags passed into the system call and a
  reference to a pointer to the new unshared structure.

* If any new structures are created by unshare_* helper
  functions, take the task_lock() on the current task,
  modify appropriate context pointers, and release the
  task lock.

* For all newly unshared structures, release the corresponding
  older, shared, structures.
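
The flag-forcing rules in the first bullet can be written as a small
pure function. This is a sketch that follows the order given in the text;
the function name and the ``signals_shared`` parameter are hypothetical
stand-ins for whatever check the kernel performs on the current task.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/*
 * Return flags with the implied bits added.
 * signals_shared is an assumed input telling the function whether the
 * caller currently shares its signal state with another task.
 */
unsigned long force_implied_flags(unsigned long flags, bool signals_shared)
{
    if (flags & CLONE_THREAD)       /* unsharing thread group implies vm */
        flags |= CLONE_VM;
    if (flags & CLONE_VM)           /* unsharing vm implies sighand */
        flags |= CLONE_SIGHAND;
    if ((flags & CLONE_SIGHAND) && signals_shared)
        flags |= CLONE_THREAD;      /* shared signals imply thread group */
    if (flags & CLONE_NEWNS)        /* unsharing namespace implies fs */
        flags |= CLONE_FS;
    return flags;
}
```

For example, a caller passing only CLONE_NEWNS ends up with both
CLONE_NEWNS and CLONE_FS set, which is the behaviour the test
specification asks to be verified.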

7.3) unshare_* helper functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND,
and CLONE_THREAD, return -EINVAL since they are not implemented yet.
For others, check the flag value to see if the unsharing is
required for that structure. If it is, invoke the corresponding
dup_* function to allocate and duplicate the structure and return
a pointer to it.

7.4) Finally
~~~~~~~~~~~~

Appropriately modify architecture specific code to register the
new system call.
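
The two kinds of helper described in 7.3 can be sketched as follows. All
names ending in ``_model`` are hypothetical and the structure is a
placeholder; the sketch only shows the shape of a helper that rejects an
unimplemented flag and one that duplicates a structure on request.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

/* Placeholder for a per-process context such as the open file table. */
struct files_model {
    int nfds;
};

/* Allocate and duplicate the structure (the dup_* role). */
static struct files_model *dup_files_model(const struct files_model *old)
{
    struct files_model *copy = malloc(sizeof(*copy));

    if (copy != NULL)
        *copy = *old;
    return copy;
}

/* Unsharing signal handlers is not implemented: reject the request. */
int unshare_sighand_model(unsigned long flags)
{
    return (flags & CLONE_SIGHAND) ? -EINVAL : 0;
}

/*
 * If CLONE_FILES is requested, store a fresh copy in *new_out.
 * *new_out stays NULL when no unsharing was requested.
 * Returns 0 on success or -ENOMEM if duplication fails.
 */
int unshare_files_model(unsigned long flags, const struct files_model *cur,
                        struct files_model **new_out)
{
    *new_out = NULL;
    if (!(flags & CLONE_FILES))
        return 0;
    *new_out = dup_files_model(cur);
    return (*new_out != NULL) ? 0 : -ENOMEM;
}
```

A caller can tell from a non-NULL ``*new_out`` that a new structure exists
and must later be linked into the task under the task lock.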

8) Test Specification
---------------------

The test for unshare() should test the following:

  1) Valid flags: Test to check that clone flags for signal and
     signal handlers, for which unsharing is not implemented
     yet, return -EINVAL.

  2) Missing/implied flags: Test to make sure that unsharing the
     namespace without specifying unsharing of the filesystem correctly
     unshares both namespace and filesystem information.

  3) For each of the four (namespace, filesystem, files and vm)
     supported unsharing, verify that the system call correctly
     unshares the appropriate structure. Verify that unsharing
     them individually as well as in combination with each
     other works as expected.

  4) Concurrent execution: Use shared memory segments and futex on
     an address in the shm segment to synchronize execution of
     about 10 threads. Have a couple of threads execute execve,
@@ -285,11 +321,12 @@ The test for unshare should test the following:

9) Future Work
--------------

The current implementation of unshare() does not allow unsharing of
signals and signal handlers. Signals are complex to begin with and
to unshare signals and/or signal handlers of a currently running
process is even more complex. If in the future there is a specific
need to allow unsharing of signals and/or signal handlers, it can
be incrementally added to unshare() without affecting legacy
applications using unshare().