Commit 435a7743 authored by Jonathan Corbet's avatar Jonathan Corbet

Merge branch 'mauro' into docs-next

A big set of fixes and RST conversions from Mauro.  He swears that this is
the last RST conversion set, which is certainly cause for celebration.
parents 0b227076 4ac25081
......@@ -8,7 +8,7 @@ Description:
to device min/max capabilities. Values are integer as they are
stored in a 8bit register in the device. Lowest value is
automatically put to TL. Once set, alarms could be search at
master level, refer to Documentation/w1/w1_generic.rst for
master level, refer to Documentation/w1/w1-generic.rst for
detailed information
Users: any user space application which wants to communicate with
w1_term device
......
......@@ -265,7 +265,7 @@ Set the DMA mask size
---------------------
.. note::
If anything below doesn't make sense, please refer to
Documentation/DMA-API.txt. This section is just a reminder that
:doc:`/core-api/dma-api`. This section is just a reminder that
drivers need to indicate DMA capabilities of the device and is not
an authoritative source for DMA interfaces.
......@@ -291,7 +291,7 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
memory. See Documentation/DMA-API.txt for a full description of
memory. See :doc:`/core-api/dma-api` for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.
......@@ -421,7 +421,7 @@ owners if there is one.
Then clean up "consistent" buffers which contain the control data.
See Documentation/DMA-API.txt for details on unmapping interfaces.
See :doc:`/core-api/dma-api` for details on unmapping interfaces.
Unregister from other subsystems
......
......@@ -101,37 +101,48 @@ be specified in bytes with optional scale suffix [kKmMgG]. The default huge
page size may be selected with the "default_hugepagesz=<size>" boot parameter.
Hugetlb boot command line parameter semantics
hugepagesz - Specify a huge page size. Used in conjunction with hugepages
hugepagesz
Specify a huge page size. Used in conjunction with hugepages
parameter to preallocate a number of huge pages of the specified
size. Hence, hugepagesz and hugepages are typically specified in
pairs such as:
pairs such as::
hugepagesz=2M hugepages=512
hugepagesz can only be specified once on the command line for a
specific huge page size. Valid huge page sizes are architecture
dependent.
hugepages - Specify the number of huge pages to preallocate. This typically
hugepages
Specify the number of huge pages to preallocate. This typically
follows a valid hugepagesz or default_hugepagesz parameter. However,
if hugepages is the first or only hugetlb command line parameter it
implicitly specifies the number of huge pages of default size to
allocate. If the number of huge pages of default size is implicitly
specified, it can not be overwritten by a hugepagesz,hugepages
parameter pair for the default size.
For example, on an architecture with 2M default huge page size:
For example, on an architecture with 2M default huge page size::
hugepages=256 hugepagesz=2M hugepages=512
will result in 256 2M huge pages being allocated and a warning message
indicating that the hugepages=512 parameter is ignored. If a hugepages
parameter is preceded by an invalid hugepagesz parameter, it will
be ignored.
default_hugepagesz - Specify the default huge page size. This parameter can
default_hugepagesz
pecify the default huge page size. This parameter can
only be specified once on the command line. default_hugepagesz can
optionally be followed by the hugepages parameter to preallocate a
specific number of huge pages of default size. The number of default
sized huge pages to preallocate can also be implicitly specified as
mentioned in the hugepages section above. Therefore, on an
architecture with 2M default huge page size:
architecture with 2M default huge page size::
hugepages=256
default_hugepagesz=2M hugepages=256
hugepages=256 default_hugepagesz=2M
will all result in 256 2M huge pages being allocated. Valid default
huge page size is architecture dependent.
......
......@@ -31,6 +31,7 @@ the Linux memory management.
idle_page_tracking
ksm
memory-hotplug
nommu-map
numa_memory_policy
numaperf
pagemap
......
......@@ -583,7 +583,7 @@ trimming of allocations is initiated.
The default value is 1.
See Documentation/nommu-mmap.txt for more information.
See Documentation/admin-guide/mm/nommu-mmap.rst for more information.
numa_zonelist_order
......
......@@ -128,7 +128,7 @@ it. The recommended placement is in the first 16KiB of RAM.
The boot loader must load a device tree image (dtb) into system ram
at a 64bit aligned address and initialize it with the boot data. The
dtb format is documented in Documentation/devicetree/booting-without-of.txt.
dtb format is documented in Documentation/devicetree/booting-without-of.rst.
The kernel will look for the dtb magic value of 0xd00dfeed at the dtb
physical address to determine if a dtb has been passed instead of a
tagged list.
......
......@@ -196,7 +196,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address
do not have a corresponding kernel virtual address space mapping) and
low-memory pages.
Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion
Note: Please refer to :doc:`/core-api/dma-api-howto` for a discussion
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
for 64 bit PCI.
......
......@@ -8,7 +8,7 @@ How to access I/O mapped memory from within device drivers
The virt_to_bus() and bus_to_virt() functions have been
superseded by the functionality provided by the PCI DMA interface
(see Documentation/DMA-API-HOWTO.txt). They continue
(see :doc:`/core-api/dma-api-howto`). They continue
to be documented below for historical purposes, but new code
must not use them. --davidm 00/12/12
......
......@@ -5,7 +5,7 @@ Dynamic DMA mapping using the generic device
:Author: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
This document describes the DMA API. For a more gentle introduction
of the API (and actual examples), see Documentation/DMA-API-HOWTO.txt.
of the API (and actual examples), see :doc:`/core-api/dma-api-howto`.
This API is split into two pieces. Part I describes the basic API.
Part II describes extensions for supporting non-consistent memory
......@@ -471,7 +471,7 @@ without the _attrs suffixes, except that they pass an optional
dma_attrs.
The interpretation of DMA attributes is architecture-specific, and
each attribute should be documented in Documentation/DMA-attributes.txt.
each attribute should be documented in :doc:`/core-api/dma-attributes`.
If dma_attrs are 0, the semantics of each of these functions
is identical to those of the corresponding function
......@@ -484,7 +484,7 @@ for DMA::
#include <linux/dma-mapping.h>
/* DMA_ATTR_FOO should be defined in linux/dma-mapping.h and
* documented in Documentation/DMA-attributes.txt */
* documented in Documentation/core-api/dma-attributes.rst */
...
unsigned long attr;
......
......@@ -17,7 +17,7 @@ To do ISA style DMA you need to include two headers::
#include <asm/dma.h>
The first is the generic DMA API used to convert virtual addresses to
bus addresses (see Documentation/DMA-API.txt for details).
bus addresses (see :doc:`/core-api/dma-api` for details).
The second contains the routines specific to ISA DMA transfers. Since
this is not present on all platforms make sure you construct your
......
......@@ -39,6 +39,8 @@ Library functionality that is used throughout the kernel.
rbtree
generic-radix-tree
packing
bus-virt-phys-mapping
this_cpu_ops
timekeeping
errseq
......@@ -82,6 +84,7 @@ more memory-management documentation in :doc:`/vm/index`.
:maxdepth: 1
memory-allocation
unaligned-memory-access
dma-api
dma-api-howto
dma-attributes
......
Booting the Linux/ppc kernel without Open Firmware
--------------------------------------------------
.. SPDX-License-Identifier: GPL-2.0
(c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>,
IBM Corp.
(c) 2005 Becky Bruce <becky.bruce at freescale.com>,
Freescale Semiconductor, FSL SOC and 32-bit additions
(c) 2006 MontaVista Software, Inc.
Flash chip node definition
==================================================
Booting the Linux/ppc kernel without Open Firmware
==================================================
Table of Contents
=================
Copyright (c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>,
IBM Corp.
Copyright (c) 2005 Becky Bruce <becky.bruce at freescale.com>,
Freescale Semiconductor, FSL SOC and 32-bit additions
Copyright (c) 2006 MontaVista Software, Inc.
Flash chip node definition
.. Table of Contents
I - Introduction
1) Entry point for arch/arm
......@@ -61,15 +65,18 @@ Table of Contents
Revision Information
====================
May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet.
May 18, 2005: Rev 0.1
- Initial draft, no chapter III yet.
May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or
May 19, 2005: Rev 0.2
- Add chapter III and bits & pieces here or
clarifies the fact that a lot of things are
optional, the kernel only requires a very
small device tree, though it is encouraged
to provide an as complete one as possible.
May 24, 2005: Rev 0.3 - Precise that DT block has to be in RAM
May 24, 2005: Rev 0.3
- Precise that DT block has to be in RAM
- Misc fixes
- Define version 3 and new format version 16
for the DT block (version 16 needs kernel
......@@ -82,7 +89,8 @@ Revision Information
"name" property is now automatically
deduced from the unit name
June 1, 2005: Rev 0.4 - Correct confusion between OF_DT_END and
June 1, 2005: Rev 0.4
- Correct confusion between OF_DT_END and
OF_DT_END_NODE in structure definition.
- Change version 16 format to always align
property data to 4 bytes. Since tokens are
......@@ -260,7 +268,7 @@ it with special cases.
b) create your main platform file as
"arch/powerpc/platforms/myplatform/myboard_setup.c" and add it
to the Makefile under the condition of your CONFIG_
to the Makefile under the condition of your ``CONFIG_``
option. This file will define a structure of type "ppc_md"
containing the various callbacks that the generic code will
use to get to your platform specific code
......@@ -271,7 +279,7 @@ it with special cases.
with classic Powerpc architectures.
3) Entry point for arch/x86
-------------------------------
---------------------------
There is one single 32bit entry point to the kernel at code32_start,
the decompressor (the real mode entry point goes to the same 32bit
......@@ -280,7 +288,7 @@ it with special cases.
Documentation/x86/boot.rst
The physical pointer to the device-tree block (defined in chapter II)
is passed via setup_data which requires at least boot protocol 2.09.
The type filed is defined as
The type filed is defined as::
#define SETUP_DTB 2
......@@ -354,9 +362,9 @@ the block to RAM before passing it to the kernel.
The kernel is passed the physical address pointing to an area of memory
that is roughly described in include/linux/of_fdt.h by the structure
boot_param_header:
boot_param_header:::
struct boot_param_header {
struct boot_param_header {
u32 magic; /* magic word OF_DT_HEADER */
u32 totalsize; /* total size of DT block */
u32 off_dt_struct; /* offset to structure */
......@@ -374,19 +382,19 @@ struct boot_param_header {
/* version 17 fields below */
u32 size_dt_struct; /* size of the DT structure block */
};
};
Along with the constants:
Along with the constants::
/* Definitions used by the flattened device tree */
#define OF_DT_HEADER 0xd00dfeed /* 4: version,
/* Definitions used by the flattened device tree */
#define OF_DT_HEADER 0xd00dfeed /* 4: version,
4: total size */
#define OF_DT_BEGIN_NODE 0x1 /* Start node: full name
#define OF_DT_BEGIN_NODE 0x1 /* Start node: full name
*/
#define OF_DT_END_NODE 0x2 /* End node */
#define OF_DT_PROP 0x3 /* Property: name off,
#define OF_DT_END_NODE 0x2 /* End node */
#define OF_DT_PROP 0x3 /* Property: name off,
size, content */
#define OF_DT_END 0x9
#define OF_DT_END 0x9
All values in this header are in big endian format, the various
fields in this header are defined more precisely below. All
......@@ -430,7 +438,7 @@ struct boot_param_header {
way to avoid overriding critical things like, on Open Firmware
capable machines, the RTAS instance, or on some pSeries, the TCE
tables used for the iommu. Typically, the reserve map should
contain _at least_ this DT block itself (header,total_size). If
contain **at least** this DT block itself (header,total_size). If
you are passing an initrd to the kernel, you should reserve it as
well. You do not need to reserve the kernel image itself. The map
should be 64-bit aligned.
......@@ -485,7 +493,7 @@ struct boot_param_header {
So the typical layout of a DT block (though the various parts don't
need to be in that order) looks like this (addresses go from top to
bottom):
bottom)::
------------------------------
......@@ -600,7 +608,7 @@ discussed in a later chapter. At this point, it is only meant to give
you a idea of what a device-tree looks like. I have purposefully kept
the "name" and "linux,phandle" properties which aren't necessary in
order to give you a better idea of what the tree looks like in
practice.
practice::
/ o device-tree
|- name = "device-tree"
......@@ -650,6 +658,7 @@ properties and their content.
3) Device tree "structure" block
--------------------------------
The structure of the device tree is a linearized tree structure. The
"OF_DT_BEGIN_NODE" token starts a new node, and the "OF_DT_END_NODE"
......@@ -666,12 +675,14 @@ Here's the basic structure of a single node:
root node)
* [align gap to next 4 bytes boundary]
* for each property:
* token OF_DT_PROP (that is 0x00000003)
* 32-bit value of property value size in bytes (or 0 if no
value)
* 32-bit value of offset in string block of property name
* property value data if any
* [align gap to next 4 bytes boundary]
* [child nodes if any]
* token OF_DT_END_NODE (that is 0x00000002)
......@@ -688,6 +699,7 @@ manipulating a flattened tree must take care to preserve this
constraint.
4) Device tree "strings" block
------------------------------
In order to save space, property names, which are generally redundant,
are stored separately in the "strings" block. This block is simply the
......@@ -700,15 +712,17 @@ strings block.
III - Required content of the device tree
=========================================
WARNING: All "linux,*" properties defined in this document apply only
to a flattened device-tree. If your platform uses a real
implementation of Open Firmware or an implementation compatible with
the Open Firmware client interface, those properties will be created
by the trampoline code in the kernel's prom_init() file. For example,
that's where you'll have to add code to detect your board model and
set the platform number. However, when using the flattened device-tree
entry point, there is no prom_init() pass, and thus you have to
provide those properties yourself.
.. Warning::
All ``linux,*`` properties defined in this document apply only
to a flattened device-tree. If your platform uses a real
implementation of Open Firmware or an implementation compatible with
the Open Firmware client interface, those properties will be created
by the trampoline code in the kernel's prom_init() file. For example,
that's where you'll have to add code to detect your board model and
set the platform number. However, when using the flattened device-tree
entry point, there is no prom_init() pass, and thus you have to
provide those properties yourself.
1) Note about cells and address representation
......@@ -769,7 +783,7 @@ addresses), all buses must contain a "ranges" property. If the
"ranges" property is missing at a given level, it's assumed that
translation isn't possible, i.e., the registers are not visible on the
parent bus. The format of the "ranges" property for a bus is a list
of:
of::
bus address, parent bus address, size
......@@ -877,7 +891,7 @@ address which can extend beyond that limit.
This node is the parent of all individual CPU nodes. It doesn't
have any specific requirements, though it's generally good practice
to have at least:
to have at least::
#address-cells = <00000001>
#size-cells = <00000000>
......@@ -887,7 +901,7 @@ address which can extend beyond that limit.
that format when reading the "reg" properties of a CPU node, see
below
c) The /cpus/* nodes
c) The ``/cpus/*`` nodes
So under /cpus, you are supposed to create a node for every CPU on
the machine. There is no specific restriction on the name of the
......@@ -903,21 +917,23 @@ address which can extend beyond that limit.
- reg : This is the physical CPU number, it's a single 32-bit cell
and is also used as-is as the unit number for constructing the
unit name in the full path. For example, with 2 CPUs, you would
have the full path:
have the full path::
/cpus/PowerPC,970FX@0
/cpus/PowerPC,970FX@1
(unit addresses do not require leading zeroes)
- d-cache-block-size : one cell, L1 data cache block size in bytes (*)
- d-cache-block-size : one cell, L1 data cache block size in bytes [#]_
- i-cache-block-size : one cell, L1 instruction cache block size in
bytes
- d-cache-size : one cell, size of L1 data cache in bytes
- i-cache-size : one cell, size of L1 instruction cache in bytes
(*) The cache "block" size is the size on which the cache management
instructions operate. Historically, this document used the cache
"line" size here which is incorrect. The kernel will prefer the cache
block size and will fallback to cache line size for backward
compatibility.
.. [#] The cache "block" size is the size on which the cache management
instructions operate. Historically, this document used the cache
"line" size here which is incorrect. The kernel will prefer the cache
block size and will fallback to cache line size for backward
compatibility.
Recommended properties:
......@@ -963,7 +979,7 @@ compatibility.
#address-cells and #size-cells of the root node. For example,
with both of these properties being 2 like in the example given
earlier, a 970 based machine with 6Gb of RAM could typically
have a "reg" property here that looks like:
have a "reg" property here that looks like::
00000000 00000000 00000000 80000000
00000001 00000000 00000001 00000000
......@@ -1058,7 +1074,7 @@ compatibility.
on the SOC but are not used by a particular platform. See chapter VI
for more information on how to specify devices that are part of a SOC.
Example SOC node for the MPC8540:
Example SOC node for the MPC8540::
soc8540@e0000000 {
#address-cells = <1>;
......@@ -1079,18 +1095,20 @@ IV - "dtc", the device tree compiler
dtc source code can be found at
<http://git.jdl.com/gitweb/?p=dtc.git>
WARNING: This version is still in early development stage; the
resulting device-tree "blobs" have not yet been validated with the
kernel. The current generated block lacks a useful reserve map (it will
be fixed to generate an empty one, it's up to the bootloader to fill
it up) among others. The error handling needs work, bugs are lurking,
etc...
.. Warning::
This version is still in early development stage; the
resulting device-tree "blobs" have not yet been validated with the
kernel. The current generated block lacks a useful reserve map (it will
be fixed to generate an empty one, it's up to the bootloader to fill
it up) among others. The error handling needs work, bugs are lurking,
etc...
dtc basically takes a device-tree in a given format and outputs a
device-tree in another format. The currently supported formats are:
Input formats:
-------------
Input formats
-------------
- "dtb": "blob" format, that is a flattened device-tree block
with
......@@ -1102,8 +1120,8 @@ device-tree in another format. The currently supported formats are:
output of /proc/device-tree, that is nodes are directories and
properties are files
Output formats:
---------------
Output formats
--------------
- "dtb": "blob" format
- "dts": "source" format
......@@ -1113,7 +1131,7 @@ device-tree in another format. The currently supported formats are:
assembly file exports some symbols that can be used.
The syntax of the dtc tool is
The syntax of the dtc tool is::
dtc [-I <input-format>] [-O <output-format>]
[-o output-filename] [-V output_version] input_filename
......@@ -1127,15 +1145,17 @@ Additionally, dtc performs various sanity checks on the tree, like the
uniqueness of linux, phandle properties, validity of strings, etc...
The format of the .dts "source" file is "C" like, supports C and C++
style comments.
style comments::
/ {
}
/ {
}
The above is the "device-tree" definition. It's the only statement
supported currently at the toplevel.
/ {
::
/ {
property1 = "string_value"; /* define a property containing a 0
* terminated string
*/
......@@ -1163,7 +1183,7 @@ supported currently at the toplevel.
* childnode (in this case, a string)
*/
};
};
};
Nodes can contain other nodes etc... thus defining the hierarchical
structure of the tree.
......@@ -1322,7 +1342,7 @@ phandle of the parent node.
If the interrupt-parent property is not defined for a node, its
interrupt parent is assumed to be an ancestor in the node's
_device tree_ hierarchy.
*device tree* hierarchy.
3) OpenPIC Interrupt Controllers
--------------------------------
......@@ -1334,10 +1354,12 @@ information.
Sense and level information should be encoded as follows:
0 = low to high edge sensitive type enabled
1 = active low level sensitive type enabled
2 = active high level sensitive type enabled
3 = high to low edge sensitive type enabled
== ========================================
0 low to high edge sensitive type enabled
1 active low level sensitive type enabled
2 active high level sensitive type enabled
3 high to low edge sensitive type enabled
== ========================================
4) ISA Interrupt Controllers
----------------------------
......@@ -1350,13 +1372,15 @@ information.
ISA PIC interrupt controllers should adhere to the ISA PIC
encodings listed below:
0 = active low level sensitive type enabled
1 = active high level sensitive type enabled
2 = high to low edge sensitive type enabled
3 = low to high edge sensitive type enabled
== ========================================
0 active low level sensitive type enabled
1 active high level sensitive type enabled
2 high to low edge sensitive type enabled
3 low to high edge sensitive type enabled
== ========================================
VIII - Specifying Device Power Management Information (sleep property)
===================================================================
======================================================================
Devices on SOCs often have mechanisms for placing devices into low-power
states that are decoupled from the devices' own register blocks. Sometimes,
......@@ -1387,6 +1411,7 @@ reasonably grouped in this manner, then create a virtual sleep controller
sleep-map should wait until its necessity is demonstrated).
IX - Specifying dma bus information
===================================
Some devices may have DMA memory range shifted relatively to the beginning of
RAM, or even placed outside of kernel RAM. For example, the Keystone 2 SoC
......@@ -1404,7 +1429,9 @@ coherent DMA operations. The "dma-coherent" property is intended to be used
for identifying devices supported coherent DMA operations in DT.
* DMA Bus master
Optional property:
- dma-ranges: <prop-encoded-array> encoded as arbitrary number of triplets of
(child-bus-address, parent-bus-address, length). Each triplet specified
describes a contiguous DMA address range.
......@@ -1416,13 +1443,16 @@ Optional property:
(for more information see the Devicetree Specification)
* DMA Bus child
Optional property:
- dma-ranges: <empty> value. if present - It means that DMA addresses
translation has to be enabled for this device.
- dma-coherent: Present if dma operations are coherent
Example:
soc {
Example::
soc {
compatible = "ti,keystone","simple-bus";
ranges = <0x0 0x0 0x0 0xc0000000>;
dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>;
......@@ -1435,11 +1465,13 @@ soc {
[...]
dma-coherent;
};
};
};
Appendix A - Sample SOC node for MPC8540
========================================
::
soc@e0000000 {
#address-cells = <1>;
#size-cells = <1>;
......
......@@ -15,3 +15,4 @@ Open Firmware and Device Tree
overlay-notes
bindings/index
booting-without-of
......@@ -228,8 +228,6 @@ over management of devices from the bootloader, the usage of sync_state() is
not restricted to that. Use it whenever it makes sense to take an action after
all the consumers of a device have probed::
::
int (*remove) (struct device *dev);
remove is called to unbind a driver from a device. This may be
......
......@@ -48,6 +48,7 @@ available subsections can be seen below.
scsi
libata
target
mailbox
mtdnand
miscellaneous
mei/index
......
......@@ -10,7 +10,7 @@ API overview
The big picture is that USB drivers can continue to ignore most DMA issues,
though they still must provide DMA-ready buffers (see
``Documentation/DMA-API-HOWTO.txt``). That's how they've worked through
:doc:`/core-api/dma-api-howto`). That's how they've worked through
the 2.4 (and earlier) kernels, or they can now be DMA-aware.
DMA-aware usb drivers:
......@@ -60,7 +60,7 @@ and effects like cache-trashing can impose subtle penalties.
force a consistent memory access ordering by using memory barriers. It's
not using a streaming DMA mapping, so it's good for small transfers on
systems where the I/O would otherwise thrash an IOMMU mapping. (See
``Documentation/DMA-API-HOWTO.txt`` for definitions of "coherent" and
:doc:`/core-api/dma-api-howto` for definitions of "coherent" and
"streaming" DMA mappings.)
Asking for 1/Nth of a page (as well as asking for N pages) is reasonably
......@@ -91,7 +91,7 @@ Working with existing buffers
Existing buffers aren't usable for DMA without first being mapped into the
DMA address space of the device. However, most buffers passed to your
driver can safely be used with such DMA mapping. (See the first section
of Documentation/DMA-API-HOWTO.txt, titled "What memory is DMA-able?")
of :doc:`/core-api/dma-api-howto`, titled "What memory is DMA-able?")
- When you're using scatterlists, you can map everything at once. On some
systems, this kicks in an IOMMU and turns the scatterlists into single
......
......@@ -2179,46 +2179,44 @@ subset=pid hides all top level files and directories in the procfs that
are not related to tasks.
5 Filesystem behavior
----------------------------
---------------------------
Originally, before the advent of pid namepsace, procfs was a global file
system. It means that there was only one procfs instance in the system.
When pid namespace was added, a separate procfs instance was mounted in
each pid namespace. So, procfs mount options are global among all
mountpoints within the same namespace.
::
mountpoints within the same namespace::
# grep ^proc /proc/mounts
proc /proc proc rw,relatime,hidepid=2 0 0
# grep ^proc /proc/mounts
proc /proc proc rw,relatime,hidepid=2 0 0
# strace -e mount mount -o hidepid=1 -t proc proc /tmp/proc
mount("proc", "/tmp/proc", "proc", 0, "hidepid=1") = 0
+++ exited with 0 +++
# strace -e mount mount -o hidepid=1 -t proc proc /tmp/proc
mount("proc", "/tmp/proc", "proc", 0, "hidepid=1") = 0
+++ exited with 0 +++
# grep ^proc /proc/mounts
proc /proc proc rw,relatime,hidepid=2 0 0
proc /tmp/proc proc rw,relatime,hidepid=2 0 0
# grep ^proc /proc/mounts
proc /proc proc rw,relatime,hidepid=2 0 0
proc /tmp/proc proc rw,relatime,hidepid=2 0 0
and only after remounting procfs mount options will change at all
mountpoints.
mountpoints::
# mount -o remount,hidepid=1 -t proc proc /tmp/proc
# mount -o remount,hidepid=1 -t proc proc /tmp/proc
# grep ^proc /proc/mounts
proc /proc proc rw,relatime,hidepid=1 0 0
proc /tmp/proc proc rw,relatime,hidepid=1 0 0
# grep ^proc /proc/mounts
proc /proc proc rw,relatime,hidepid=1 0 0
proc /tmp/proc proc rw,relatime,hidepid=1 0 0
This behavior is different from the behavior of other filesystems.
The new procfs behavior is more like other filesystems. Each procfs mount
creates a new procfs instance. Mount options affect own procfs instance.
It means that it became possible to have several procfs instances
displaying tasks with different filtering options in one pid namespace.
displaying tasks with different filtering options in one pid namespace::
# mount -o hidepid=invisible -t proc proc /proc
# mount -o hidepid=noaccess -t proc proc /tmp/proc
# grep ^proc /proc/mounts
proc /proc proc rw,relatime,hidepid=invisible 0 0
proc /tmp/proc proc rw,relatime,hidepid=noaccess 0 0
# mount -o hidepid=invisible -t proc proc /proc
# mount -o hidepid=noaccess -t proc proc /tmp/proc
# grep ^proc /proc/mounts
proc /proc proc rw,relatime,hidepid=invisible 0 0
proc /tmp/proc proc rw,relatime,hidepid=noaccess 0 0
......@@ -314,7 +314,7 @@ To use drm_gem_cma_get_unmapped_area(), drivers must fill the struct
a pointer on drm_gem_cma_get_unmapped_area().
More detailed information about get_unmapped_area can be found in
Documentation/nommu-mmap.txt
Documentation/admin-guide/mm/nommu-mmap.rst
Memory Coherency
----------------
......
......@@ -87,6 +87,7 @@ Applications may chose a specific instance of the NX co-processor using
the vas_id field in the VAS_TX_WIN_OPEN ioctl as detailed below.
A userspace library libnxz is available here but still in development:
https://github.com/abalib/power-gzip
Applications that use inflate / deflate calls can link with libnxz
......@@ -110,6 +111,7 @@ Applications should use the VAS_TX_WIN_OPEN ioctl as follows to establish
a connection with NX co-processor engine:
::
struct vas_tx_win_open_attr {
__u32 version;
__s16 vas_id; /* specific instance of vas or -1
......@@ -119,8 +121,10 @@ a connection with NX co-processor engine:
__u64 reserved2[6];
};
version: The version field must be currently set to 1.
vas_id: If '-1' is passed, kernel will make a best-effort attempt
version:
The version field must be currently set to 1.
vas_id:
If '-1' is passed, kernel will make a best-effort attempt
to assign an optimal instance of NX for the process. To
select the specific VAS instance, refer
"Discovery of available VAS engines" section below.
......@@ -129,7 +133,8 @@ a connection with NX co-processor engine:
and must be set to 0.
The attributes attr for the VAS_TX_WIN_OPEN ioctl are defined as
follows:
follows::
#define VAS_MAGIC 'v'
#define VAS_TX_WIN_OPEN _IOW(VAS_MAGIC, 1,
struct vas_tx_win_open_attr)
......@@ -141,6 +146,8 @@ a connection with NX co-processor engine:
returns -1 and sets the errno variable to indicate the error.
Error conditions:
====== ================================================
EINVAL fd does not refer to a valid VAS device.
EINVAL Invalid vas ID
EINVAL version is not set with proper value
......@@ -149,6 +156,7 @@ a connection with NX co-processor engine:
ENOSPC System has too many active windows (connections)
opened
EINVAL reserved fields are not set to 0.
====== ================================================
See the ioctl(2) man page for more details, error codes and
restrictions.
......@@ -158,11 +166,13 @@ mmap() NX-GZIP device
The mmap() system call for a NX-GZIP device fd returns a paste_address
that the application can use to copy/paste its CRB to the hardware engines.
::
paste_addr = mmap(addr, size, prot, flags, fd, offset);
Only restrictions on mmap for a NX-GZIP device fd are:
* size should be PAGE_SIZE
* offset parameter should be 0ULL
......@@ -170,10 +180,12 @@ that the application can use to copy/paste its CRB to the hardware engines.
In addition to the error conditions listed on the mmap(2) man
page, can also fail with one of the following error codes:
====== =============================================
EINVAL fd is not associated with an open window
(i.e mmap() does not follow a successful call
to the VAS_TX_WIN_OPEN ioctl).
EINVAL offset field is not 0ULL.
====== =============================================
Discovery of available VAS engines
==================================
......@@ -210,7 +222,7 @@ In case if NX encounters translation error (called NX page fault) on CSB
address or any request buffer, raises an interrupt on the CPU to handle the
fault. Page fault can happen if an application passes invalid addresses or
request buffers are not in memory. The operating system handles the fault by
updating CSB with the following data:
updating CSB with the following data::
csb.flags = CSB_V;
csb.cc = CSB_CC_TRANSLATION;
......@@ -223,7 +235,7 @@ the application can resend this request to NX.
If the OS can not update CSB due to invalid CSB address, sends SEGV signal
to the process who opened the send window on which the original request was
issued. This signal returns with the following siginfo struct:
issued. This signal returns with the following siginfo struct::
siginfo.si_signo = SIGSEGV;
siginfo.si_errno = EFAULT;
......@@ -248,6 +260,7 @@ Simple example
==============
::
int use_nx_gzip()
{
int rc, fd;
......
......@@ -19,17 +19,41 @@ Unsorted Documentation
Atomic Types
============
.. raw:: latex
\footnotesize
.. include:: ../atomic_t.txt
:literal:
.. raw:: latex
\normalsize
Atomic bitops
=============
.. raw:: latex
\footnotesize
.. include:: ../atomic_bitops.txt
:literal:
.. raw:: latex
\normalsize
Memory Barriers
===============
.. raw:: latex
\footnotesize
.. include:: ../memory-barriers.txt
:literal:
.. raw:: latex
\normalsize
......@@ -22,6 +22,7 @@ Linux Tracing Technologies
boottime-trace
hwlat_detector
intel_th
ring-buffer-design
stm
sys-t
coresight/index
Lockless Ring Buffer Design
===========================
.. This file is dual-licensed: you can use it either under the terms
.. of the GPL 2.0 or the GFDL 1.2 license, at your option. Note that this
.. dual licensing only applies to this file, and not this project as a
.. whole.
..
.. a) This file is free software; you can redistribute it and/or
.. modify it under the terms of the GNU General Public License as
.. published by the Free Software Foundation version 2 of
.. the License.
..
.. This file is distributed in the hope that it will be useful,
.. but WITHOUT ANY WARRANTY; without even the implied warranty of
.. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.. GNU General Public License for more details.
..
.. Or, alternatively,
..
.. b) Permission is granted to copy, distribute and/or modify this
.. document under the terms of the GNU Free Documentation License,
.. Version 1.2 version published by the Free Software
.. Foundation, with no Invariant Sections, no Front-Cover Texts
.. and no Back-Cover Texts. A copy of the license is included at
.. Documentation/userspace-api/media/fdl-appendix.rst.
..
.. TODO: replace it to GPL-2.0 OR GFDL-1.2 WITH no-invariant-sections
===========================
Lockless Ring Buffer Design
===========================
Copyright 2009 Red Hat Inc.
Author: Steven Rostedt <srostedt@redhat.com>
License: The GNU Free Documentation License, Version 1.2
:Author: Steven Rostedt <srostedt@redhat.com>
:License: The GNU Free Documentation License, Version 1.2
(dual licensed under the GPL v2)
Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
:Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
and Frederic Weisbecker.
......@@ -14,37 +42,50 @@ Written for: 2.6.31
Terminology used in this Document
---------------------------------
tail - where new writes happen in the ring buffer.
tail
- where new writes happen in the ring buffer.
head - where new reads happen in the ring buffer.
head
- where new reads happen in the ring buffer.
producer - the task that writes into the ring buffer (same as writer)
producer
- the task that writes into the ring buffer (same as writer)
writer - same as producer
writer
- same as producer
consumer - the task that reads from the buffer (same as reader)
consumer
- the task that reads from the buffer (same as reader)
reader - same as consumer.
reader
- same as consumer.
reader_page - A page outside the ring buffer used solely (for the most part)
reader_page
- A page outside the ring buffer used solely (for the most part)
by the reader.
head_page - a pointer to the page that the reader will use next
head_page
- a pointer to the page that the reader will use next
tail_page - a pointer to the page that will be written to next
tail_page
- a pointer to the page that will be written to next
commit_page - a pointer to the page with the last finished non-nested write.
commit_page
- a pointer to the page with the last finished non-nested write.
cmpxchg - hardware-assisted atomic transaction that performs the following:
cmpxchg
- hardware-assisted atomic transaction that performs the following::
A = B if previous A == C
R = cmpxchg(A, C, B) is saying that we replace A with B if and only if
current A is equal to C, and we put the old (current) A into R
R = cmpxchg(A, C, B) is saying that we replace A with B if and only
if current A is equal to C, and we put the old (current)
A into R
R gets the previous A regardless if A is updated with B or not.
To see if the update was successful a compare of R == C may be used.
To see if the update was successful a compare of ``R == C``
may be used.
The Generic Ring Buffer
-----------------------
......@@ -64,7 +105,7 @@ No two writers can write at the same time (on the same per-cpu buffer),
but a writer may interrupt another writer, but it must finish writing
before the previous writer may continue. This is very important to the
algorithm. The writers act like a "stack". The way interrupts works
enforces this behavior.
enforces this behavior::
writer1 start
......@@ -115,6 +156,8 @@ A sample of how the reader page is swapped: Note this does not
show the head page in the buffer, it is for demonstrating a swap
only.
::
+------+
|reader| RING BUFFER
|page |
......@@ -172,6 +215,7 @@ only.
It is possible that the page swapped is the commit page and the tail page,
if what is in the ring buffer is less than what is held in a buffer page.
::
reader page commit page tail page
| | |
......@@ -184,8 +228,8 @@ if what is in the ring buffer is less than what is held in a buffer page.
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
This case is still valid for this algorithm.
......@@ -196,15 +240,19 @@ buffer.
The main pointers:
reader page - The page used solely by the reader and is not part
reader page
- The page used solely by the reader and is not part
of the ring buffer (may be swapped in)
head page - the next page in the ring buffer that will be swapped
head page
- the next page in the ring buffer that will be swapped
with the reader page.
tail page - the page where the next write will take place.
tail page
- the page where the next write will take place.
commit page - the page that last finished a write.
commit page
- the page that last finished a write.
The commit page only is updated by the outermost writer in the
writer stack. A writer that preempts another writer will not move the
......@@ -219,7 +267,7 @@ transaction. If another write happens it must finish before continuing
with the previous write.
Write reserve:
Write reserve::
Buffer page
+---------+
......@@ -230,7 +278,7 @@ with the previous write.
| empty |
+---------+
Write commit:
Write commit::
Buffer page
+---------+
......@@ -242,7 +290,7 @@ with the previous write.
+---------+
If a write happens after the first reserve:
If a write happens after the first reserve::
Buffer page
+---------+
......@@ -253,7 +301,7 @@ with the previous write.
|reserved |
+---------+ <--- tail pointer
After second writer commits:
After second writer commits::
Buffer page
......@@ -266,7 +314,7 @@ with the previous write.
|commit |
+---------+ <--- tail pointer
When the first writer commits:
When the first writer commits::
Buffer page
+---------+
......@@ -292,20 +340,21 @@ be several pages ahead. If the tail page catches up to the commit
page then no more writes may take place (regardless of the mode
of the ring buffer: overwrite and produce/consumer).
The order of pages is:
The order of pages is::
head page
commit page
tail page
Possible scenario:
Possible scenario::
tail page
head page commit page |
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
There is a special case that the head page is after either the commit page
......@@ -315,6 +364,7 @@ part of the ring buffer, but the reader page is not. Whenever there
has been less than a full page that has been committed inside the ring buffer,
and a reader swaps out a page, it will be swapping out the commit page.
::
reader page commit page tail page
| | |
......@@ -327,8 +377,8 @@ and a reader swaps out a page, it will be swapping out the commit page.
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
......@@ -347,14 +397,14 @@ When the tail meets the head page, if the buffer is in overwrite mode,
the head page will be pushed ahead one. If the buffer is in producer/consumer
mode, the write will fail.
Overwrite mode:
Overwrite mode::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
......@@ -365,8 +415,8 @@ Overwrite mode:
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
......@@ -377,8 +427,8 @@ Overwrite mode:
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
......@@ -397,7 +447,7 @@ State flags are placed inside the pointer to the page. To do this,
each page must be aligned in memory by 4 bytes. This will allow the 2
least significant bits of the address to be used as flags, since
they will always be zero for the address. To get the address,
simply mask out the flags.
simply mask out the flags::
MASK = ~3
......@@ -405,11 +455,14 @@ simply mask out the flags.
Two flags will be kept by these two bits:
HEADER - the page being pointed to is a head page
HEADER
- the page being pointed to is a head page
UPDATE - the page being pointed to is being updated by a writer
UPDATE
- the page being pointed to is being updated by a writer
and was or is about to be a head page.
::
reader page
|
......@@ -420,8 +473,8 @@ Two flags will be kept by these two bits:
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
......@@ -430,23 +483,23 @@ the next page is the next page to be swapped out by the reader.
This pointer means the next page is the head page.
When the tail page meets the head pointer, it will use cmpxchg to
change the pointer to the UPDATE state:
change the pointer to the UPDATE state::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
"-U->" represents a pointer in the UPDATE state.
......@@ -462,7 +515,7 @@ head page does not have the HEADER flag set, the compare will fail
and the reader will need to look for the new head page and try again.
Note, the flags UPDATE and HEADER are never set at the same time.
The reader swaps the reader page as follows:
The reader swaps the reader page as follows::
+------+
|reader| RING BUFFER
......@@ -477,7 +530,7 @@ The reader swaps the reader page as follows:
+-----H-------------+
The reader sets the reader page next pointer as HEADER to the page after
the head page.
the head page::
+------+
......@@ -495,7 +548,7 @@ the head page.
It does a cmpxchg with the pointer to the previous head page to make it
point to the reader page. Note that the new pointer does not have the HEADER
flag set. This action atomically moves the head page forward.
flag set. This action atomically moves the head page forward::
+------+
|reader| RING BUFFER
......@@ -511,7 +564,7 @@ flag set. This action atomically moves the head page forward.
+------------------------------------+
After the new head page is set, the previous pointer of the head page is
updated to the reader page.
updated to the reader page::
+------+
|reader| RING BUFFER
......@@ -548,7 +601,7 @@ prev pointers may not.
Note, the way to determine a reader page is simply by examining the previous
pointer of the page. If the next pointer of the previous page does not
point back to the original page, then the original page is a reader page:
point back to the original page, then the original page is a reader page::
+--------+
......@@ -572,53 +625,53 @@ not be able to swap the head page from the buffer, nor will it be able to
move the head page, until the writer is finished with the move.
This eliminates any races that the reader can have on the writer. The reader
must spin, and this is why the reader cannot preempt the writer.
must spin, and this is why the reader cannot preempt the writer::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The following page will be made into the new head page.
The following page will be made into the new head page::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the new head page has been set, we can set the old head page
pointer back to NORMAL.
pointer back to NORMAL::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the head page has been moved, the tail page may now move forward.
After the head page has been moved, the tail page may now move forward::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
......@@ -630,7 +683,7 @@ tail page may make it all the way around the buffer and meet the commit
page. At this time, we must start dropping writes (usually with some kind
of warning to the user). But what happens if the commit was still on the
reader page? The commit page is not part of the ring buffer. The tail page
must account for this.
must account for this::
reader page commit page
......@@ -644,8 +697,8 @@ must account for this.
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
......@@ -676,7 +729,7 @@ the head page if the head page is the next page. If the head page
is not the next page, the tail page is simply updated with a cmpxchg.
Only writers move the tail page. This must be done atomically to protect
against nested writers.
against nested writers::
temp_page = tail_page
next_page = temp_page->next
......@@ -684,7 +737,7 @@ against nested writers.
The above will update the tail page if it is still pointing to the expected
page. If this fails, a nested write pushed it forward, the current write
does not need to push it.
does not need to push it::
temp page
......@@ -694,43 +747,43 @@ does not need to push it.
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Nested write comes in and moves the tail page forward:
Nested write comes in and moves the tail page forward::
tail page (moved by nested writer)
temp page |
| |
v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The above would fail the cmpxchg, but since the tail page has already
been moved forward, the writer will just try again to reserve storage
on the new tail page.
But the moving of the head page is a bit more complex.
But the moving of the head page is a bit more complex::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The write converts the head page pointer to UPDATE.
The write converts the head page pointer to UPDATE::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But if a nested writer preempts here, it will see that the next
......@@ -739,217 +792,216 @@ it is nested and will save that information. The detection is the
fact that it sees the UPDATE flag instead of a HEADER or NORMAL
pointer.
The nested writer will set the new head page pointer.
The nested writer will set the new head page pointer::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But it will not reset the update back to normal. Only the writer
that converted a pointer from HEAD to UPDATE will convert it back
to NORMAL.
to NORMAL::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the nested writer finishes, the outermost writer will convert
the UPDATE pointer to NORMAL.
the UPDATE pointer to NORMAL::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
It can be even more complex if several nested writes came in and moved
the tail page ahead several pages:
the tail page ahead several pages::
(first writer)
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The write converts the head page pointer to UPDATE.
The write converts the head page pointer to UPDATE::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Next writer comes in, and sees the update and sets up the new
head page.
head page::
(second writer)
(second writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The nested writer moves the tail page forward. But does not set the old
update page to NORMAL because it is not the outermost writer.
update page to NORMAL because it is not the outermost writer::
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Another writer preempts and sees the page after the tail page is a head page.
It changes it from HEAD to UPDATE.
It changes it from HEAD to UPDATE::
(third writer)
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-U->| |--->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-U->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The writer will move the head page forward:
The writer will move the head page forward::
(third writer)
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-U->| |-H->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-U->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But now that the third writer did change the HEAD flag to UPDATE it
will convert it to normal:
will convert it to normal::
(third writer)
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Then it will move the tail page, and return back to the second writer.
Then it will move the tail page, and return back to the second writer::
(second writer)
(second writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The second writer will fail to move the tail page because it was already
moved, so it will try again and add its data to the new tail page.
It will return to the first writer.
It will return to the first writer::
(first writer)
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The first writer cannot know atomically if the tail page moved
while it updates the HEAD page. It will then update the head page to
what it thinks is the new head page.
what it thinks is the new head page::
(first writer)
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Since the cmpxchg returns the old value of the pointer the first writer
will see it succeeded in updating the pointer from NORMAL to HEAD.
But as we can see, this is not good enough. It must also check to see
if the tail page is either where it use to be or on the next page:
if the tail page is either where it use to be or on the next page::
(first writer)
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
If tail page != A and tail page != B, then it must reset the pointer
back to NORMAL. The fact that it only needs to worry about nested
writers means that it only needs to check this after setting the HEAD page.
writers means that it only needs to check this after setting the HEAD page::
(first writer)
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Now the writer can update the head page. This is also why the head page must
remain in UPDATE and only reset by the outermost writer. This prevents
the reader from seeing the incorrect head page.
the reader from seeing the incorrect head page::
(first writer)
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
<---| |--->| |--->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
......@@ -570,8 +570,8 @@ ACQUIRE 는 해당 오퍼레이션의 로드 부분에만 적용되고 RELEASE
[*] 버스 마스터링 DMA 와 일관성에 대해서는 다음을 참고하시기 바랍니다:
Documentation/driver-api/pci/pci.rst
Documentation/DMA-API-HOWTO.txt
Documentation/DMA-API.txt
Documentation/core-api/dma-api-howto.rst
Documentation/core-api/dma-api.rst
데이터 의존성 배리어 (역사적)
......@@ -1907,7 +1907,7 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효
writel_relaxed() 와 같은 완화된 I/O 접근자들에 대한 자세한 내용을 위해서는
"커널 I/O 배리어의 효과" 섹션을, consistent memory 에 대한 자세한 내용을
위해선 Documentation/DMA-API.txt 문서를 참고하세요.
위해선 Documentation/core-api/dma-api.rst 문서를 참고하세요.
=========================
......
......@@ -124,7 +124,7 @@ bootloader 必须传递一个系统内存的位置和最小值,以及根文件
bootloader 必须以 64bit 地址对齐的形式加载一个设备树映像(dtb)到系统
RAM 中,并用启动数据初始化它。dtb 格式在文档
Documentation/devicetree/booting-without-of.txt 中。内核将会在
Documentation/devicetree/booting-without-of.rst 中。内核将会在
dtb 物理地址处查找 dtb 魔数值(0xd00dfeed),以确定 dtb 是否已经代替
标签列表被传递进来。
......
......@@ -147,7 +147,7 @@ config HAVE_EFFICIENT_UNALIGNED_ACCESS
problems with received packets if doing so would not help
much.
See Documentation/unaligned-memory-access.txt for more
See Documentation/core-api/unaligned-memory-access.rst for more
information on the topic of unaligned memory accesses.
config ARCH_USE_BUILTIN_BSWAP
......
......@@ -907,7 +907,7 @@ sba_mark_invalid(struct ioc *ioc, dma_addr_t iova, size_t byte_cnt)
* @dir: dma direction
* @attrs: optional dma attributes
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static dma_addr_t sba_map_page(struct device *dev, struct page *page,
unsigned long poff, size_t size,
......@@ -1028,7 +1028,7 @@ sba_mark_clean(struct ioc *ioc, dma_addr_t iova, size_t size)
* @dir: R/W or both.
* @attrs: optional dma attributes
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static void sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size,
enum dma_data_direction dir, unsigned long attrs)
......@@ -1105,7 +1105,7 @@ static void sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size,
* @size: number of bytes mapped in driver buffer.
* @dma_handle: IOVA of new buffer.
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static void *
sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
......@@ -1162,7 +1162,7 @@ sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
* @vaddr: virtual address IOVA of "consistent" buffer.
* @dma_handler: IO virtual address of "consistent" buffer.
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static void sba_free_coherent(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
......@@ -1425,7 +1425,7 @@ static void sba_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist,
* @dir: R/W or both.
* @attrs: optional dma attributes
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist,
int nents, enum dma_data_direction dir,
......@@ -1524,7 +1524,7 @@ static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist,
* @dir: R/W or both.
* @attrs: optional dma attributes
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static void sba_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist,
int nents, enum dma_data_direction dir,
......
......@@ -3,7 +3,7 @@
** PARISC 1.1 Dynamic DMA mapping support.
** This implementation is for PA-RISC platforms that do not support
** I/O TLBs (aka DMA address translation hardware).
** See Documentation/DMA-API-HOWTO.txt for interface definitions.
** See Documentation/core-api/dma-api-howto.rst for interface definitions.
**
** (c) Copyright 1999,2000 Hewlett-Packard Company
** (c) Copyright 2000 Grant Grundler
......
......@@ -3,8 +3,8 @@
#define _ASM_X86_DMA_MAPPING_H
/*
* IOMMU interface. See Documentation/DMA-API-HOWTO.txt and
* Documentation/DMA-API.txt for documentation.
* IOMMU interface. See Documentation/core-api/dma-api-howto.rst and
* Documentation/core-api/dma-api.rst for documentation.
*/
#include <linux/scatterlist.h>
......
......@@ -6,7 +6,7 @@
* This allows to use PCI devices that only support 32bit addresses on systems
* with more than 4GB.
*
* See Documentation/DMA-API-HOWTO.txt for the interface specification.
* See Documentation/core-api/dma-api-howto.rst for the interface specification.
*
* Copyright 2002 Andi Kleen, SuSE Labs.
*/
......
......@@ -666,7 +666,7 @@ sba_mark_invalid(struct ioc *ioc, dma_addr_t iova, size_t byte_cnt)
* @dev: instance of PCI owned by the driver that's asking
* @mask: number of address bits this PCI device can handle
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static int sba_dma_supported( struct device *dev, u64 mask)
{
......@@ -698,7 +698,7 @@ static int sba_dma_supported( struct device *dev, u64 mask)
* @size: number of bytes to map in driver buffer.
* @direction: R/W or both.
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static dma_addr_t
sba_map_single(struct device *dev, void *addr, size_t size,
......@@ -788,7 +788,7 @@ sba_map_page(struct device *dev, struct page *page, unsigned long offset,
* @size: number of bytes mapped in driver buffer.
* @direction: R/W or both.
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static void
sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size,
......@@ -867,7 +867,7 @@ sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size,
* @size: number of bytes mapped in driver buffer.
* @dma_handle: IOVA of new buffer.
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static void *sba_alloc(struct device *hwdev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs)
......@@ -898,7 +898,7 @@ static void *sba_alloc(struct device *hwdev, size_t size, dma_addr_t *dma_handle
* @vaddr: virtual address IOVA of "consistent" buffer.
* @dma_handler: IO virtual address of "consistent" buffer.
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static void
sba_free(struct device *hwdev, size_t size, void *vaddr,
......@@ -933,7 +933,7 @@ int dump_run_sg = 0;
* @nents: number of entries in list
* @direction: R/W or both.
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static int
sba_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
......@@ -1017,7 +1017,7 @@ sba_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
* @nents: number of entries in list
* @direction: R/W or both.
*
* See Documentation/DMA-API-HOWTO.txt
* See Documentation/core-api/dma-api-howto.rst
*/
static void
sba_unmap_sg(struct device *dev, struct scatterlist *sglist, int nents,
......
......@@ -14,7 +14,7 @@
/**
* List of possible attributes associated with a DMA mapping. The semantics
* of each attribute should be defined in Documentation/DMA-attributes.txt.
* of each attribute should be defined in Documentation/core-api/dma-attributes.rst.
*/
/*
......
......@@ -2829,7 +2829,7 @@ static inline errseq_t filemap_sample_wb_err(struct address_space *mapping)
/**
* file_sample_sb_err - sample the current errseq_t to test for later errors
* @mapping: mapping to be sampled
* @file: file pointer to be sampled
*
* Grab the most current superblock-level errseq_t value for the given
* struct file.
......
......@@ -337,10 +337,12 @@ static inline void __kcsan_disable_current(void) { }
* release_for_reuse(obj);
* }
*
* Note: ASSERT_EXCLUSIVE_ACCESS_SCOPED(), if applicable, performs more thorough
* Note:
*
* 1. ASSERT_EXCLUSIVE_ACCESS_SCOPED(), if applicable, performs more thorough
* checking if a clear scope where no concurrent accesses are expected exists.
*
* Note: For cases where the object is freed, `KASAN <kasan.html>`_ is a better
* 2. For cases where the object is freed, `KASAN <kasan.html>`_ is a better
* fit to detect use-after-free bugs.
*
* @var: variable to assert on
......
......@@ -1742,6 +1742,8 @@ enum netdev_priv_flags {
* @real_num_rx_queues: Number of RX queues currently active in device
* @xdp_prog: XDP sockets filter program pointer
* @gro_flush_timeout: timeout for GRO layer in NAPI
* @napi_defer_hard_irqs: If not zero, provides a counter that would
* allow to avoid NIC hard IRQ, on busy queues.
*
* @rx_handler: handler for received packets
* @rx_handler_data: XXX: need comments on this one
......
......@@ -62,6 +62,10 @@ enum phylink_op_type {
* @dev: a pointer to a struct device associated with the MAC
* @type: operation type of PHYLINK instance
* @pcs_poll: MAC PCS cannot provide link change interrupt
* @poll_fixed_state: if true, starts link_poll,
* if MAC link is at %MLO_AN_FIXED mode.
* @get_fixed_state: callback to execute to determine the fixed link state,
* if MAC link is at %MLO_AN_FIXED mode.
*/
struct phylink_config {
struct device *dev;
......
......@@ -31,7 +31,7 @@
* does memory allocation too using vmalloc_32().
*
* videobuf_dma_*()
* see Documentation/DMA-API-HOWTO.txt, these functions to
* see Documentation/core-api/dma-api-howto.rst, these functions to
* basically the same. The map function does also build a
* scatterlist for the buffer (and unmap frees it ...)
*
......
......@@ -1957,7 +1957,7 @@ config MMAP_ALLOW_UNINITIALIZED
userspace. Since that isn't generally a problem on no-MMU systems,
it is normally safe to say Y here.
See Documentation/nommu-mmap.txt for more information.
See Documentation/mm/nommu-mmap.rst for more information.
config SYSTEM_DATA_VERIFICATION
def_bool n
......
......@@ -1071,7 +1071,7 @@ static void check_unmap(struct dma_debug_entry *ref)
/*
* Drivers should use dma_mapping_error() to check the returned
* addresses of dma_map_single() and dma_map_page().
* If not, print this warning message. See Documentation/DMA-API.txt.
* If not, print this warning message. See Documentation/core-api/dma-api.rst.
*/
if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
err_printk(ref->dev, entry,
......
......@@ -387,7 +387,7 @@ config NOMMU_INITIAL_TRIM_EXCESS
This option specifies the initial value of this option. The default
of 1 says that all excess pages should be trimmed.
See Documentation/nommu-mmap.txt for more information.
See Documentation/mm/nommu-mmap.rst for more information.
config TRANSPARENT_HUGEPAGE
bool "Transparent Hugepage Support"
......
......@@ -5,7 +5,7 @@
* Replacement code for mm functions to support CPU's that don't
* have any form of memory management unit (thus no virtual memory).
*
* See Documentation/nommu-mmap.txt
* See Documentation/mm/nommu-mmap.rst
*
* Copyright (c) 2004-2008 David Howells <dhowells@redhat.com>
* Copyright (c) 2000-2003 David McCullough <davidm@snapgear.com>
......
......@@ -7898,6 +7898,7 @@ EXPORT_SYMBOL(netdev_bonding_info_change);
/**
* netdev_get_xmit_slave - Get the xmit slave of master device
* @dev: device
* @skb: The packet
* @all_slaves: assume all the slaves are active
*
......
......@@ -1083,7 +1083,9 @@ sub dump_struct($$) {
$members =~ s/\s*__packed\s*/ /gos;
$members =~ s/\s*CRYPTO_MINALIGN_ATTR/ /gos;
$members =~ s/\s*____cacheline_aligned_in_smp/ /gos;
# replace DECLARE_BITMAP
$members =~ s/__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)/DECLARE_BITMAP($1, __ETHTOOL_LINK_MODE_MASK_NBITS)/gos;
$members =~ s/DECLARE_BITMAP\s*\(([^,)]+),\s*([^,)]+)\)/unsigned long $1\[BITS_TO_LONGS($2)\]/gos;
# replace DECLARE_HASHTABLE
$members =~ s/DECLARE_HASHTABLE\s*\(([^,)]+),\s*([^,)]+)\)/unsigned long $1\[1 << (($2) - 1)\]/gos;
......@@ -1769,6 +1771,11 @@ sub process_proto_function($$) {
$prototype =~ s@/\*.*?\*/@@gos; # strip comments.
$prototype =~ s@[\r\n]+@ @gos; # strip newlines/cr's.
$prototype =~ s@^\s+@@gos; # strip leading spaces
# Handle prototypes for function pointers like:
# int (*pcs_config)(struct foo)
$prototype =~ s@^(\S+\s+)\(\s*\*(\S+)\)@$1$2@gos;
if ($prototype =~ /SYSCALL_DEFINE/) {
syscall_munge();
}
......
// SPDX-License-Identifier: GPL-2.0
/*
* Tests Memory Protection Keys (see Documentation/vm/protection-keys.txt)
* Tests Memory Protection Keys (see Documentation/core-api/protection-keys.rst)
*
* There are examples in here of:
* * how to set protection keys on memory
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment