Commit f3dfe925 authored by Linus Torvalds

Merge tag 'docs-6.1' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "There's not a huge amount of activity in the docs tree this time
  around, but a few significant changes even so:

   - A complete rewriting of the top-level index.rst file, which mostly
     reflects itself in a redone top page in the HTML-rendered docs. The
     hope is that the new organization will be a friendlier starting
     point for both users and developers.

   - Some math-rendering improvements.

   - A coding-style.rst update on the use of BUG() and WARN()

   - A big maintainer-PGP guide update.

   - Some code-of-conduct updates

   - More Chinese translation work

  Plus the usual pile of typo fixes, corrections, and updates"

* tag 'docs-6.1' of git://git.lwn.net/linux: (66 commits)
  checkpatch: warn on usage of VM_BUG_ON() and other BUG variants
  coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  Documentation: devres: add missing IO helper
  Documentation: devres: update IRQ helper
  Documentation/mm: modify page_referenced to folio_referenced
  Documentation/CoC: Reflect current CoC interpretation and practices
  docs/doc-guide: Add documentation on SPHINX_IMGMATH
  docs: process/5.Posting.rst: clarify use of Reported-by: tag
  docs, kprobes: Fix the wrong location of Kprobes
  docs: add a man-pages link to the front page
  docs: put atomic*.txt and memory-barriers.txt into the core-api book
  docs: move asm-annotations.rst into core-api
  docs: remove some index.rst cruft
  docs: reconfigure the HTML left column
  docs: Rewrite the front page
  docs: promote the title of process/index.rst
  Documentation: devres: add missing SPI helper
  Documentation: devres: add missing PINCTRL helpers
  docs: hugetlbpage.rst: fix a typo of hugepage size
  docs/zh_CN: Add new translation of admin-guide/bootconfig.rst
  ...
parents 890f2420 69d517e6
......@@ -3,7 +3,7 @@ Date: May 2011
KernelVersion: 3.0
Contact: Rafał Miłecki <zajec5@gmail.com>
Description:
Each BCMA core has it's manufacturer id. See
Each BCMA core has its manufacturer id. See
include/linux/bcma/bcma.h for possible values.
What: /sys/bus/bcma/devices/.../id
......
......@@ -31,7 +31,7 @@ Description: 'FCoE Controller' instances on the fcoe bus.
1) Write interface name to ctlr_create 2) Configure the FCoE
Controller (ctlr_X) 3) Enable the FCoE Controller to begin
discovery and login. The FCoE Controller is destroyed by
writing it's name, i.e. ctlr_X to the ctlr_delete file.
writing its name, i.e. ctlr_X to the ctlr_delete file.
Attributes:
......
......@@ -18,7 +18,7 @@ Description:
on the signal from which time of flight measurements are
taken.
The appropriate values to take is dependent on both the
sensor and it's operating environment:
sensor and its operating environment:
* as3935 (0-31 range)
18 = indoors (default)
14 = outdoors
......@@ -296,7 +296,7 @@ Description: Processor frequency boosting control
This switch controls the boost setting for the whole system.
Boosting allows the CPU and the firmware to run at a frequency
beyond it's nominal limit.
beyond its nominal limit.
More details can be found in
Documentation/admin-guide/pm/cpufreq.rst
......
......@@ -2,8 +2,8 @@ What: /sys/bus/platform/devices/ci_hdrc.0/role
Date: Mar 2017
Contact: Peter Chen <peter.chen@nxp.com>
Description:
It returns string "gadget" or "host" when read it, it indicates
current controller role.
When read, it returns string "gadget" or "host", indicating
the current controller role.
It will do role switch when write "gadget" or "host" to it.
It will do role switch when "gadget" or "host" is written to it.
Only controller at dual-role configuration supports writing.
......@@ -152,7 +152,7 @@ Description:
case further investigation is required to determine which
device is causing the problem. Note that genuine RTC clock
values (such as when pm_trace has not been used), can still
match a device and output it's name here.
match a device and output its name here.
What: /sys/power/pm_async
Date: January 2009
......
......@@ -486,6 +486,6 @@ over a rather long period of time, but improvements are always welcome!
So if you need to wait for both an RCU grace period and for
all pre-existing call_rcu() callbacks, you will need to execute
both rcu_barrier() and synchronize_rcu(), if necessary, using
something like workqueues to to execute them concurrently.
something like workqueues to execute them concurrently.
See rcubarrier.rst for more information.
......@@ -61,7 +61,7 @@ checking of rcu_dereference() primitives:
rcu_access_pointer(p):
Return the value of the pointer and omit all barriers,
but retain the compiler constraints that prevent duplicating
or coalescsing. This is useful when when testing the
or coalescsing. This is useful when testing the
value of the pointer itself, for example, against NULL.
The rcu_dereference_check() check expression can be any boolean
......
......@@ -262,8 +262,6 @@ Compiling the kernel
- Make sure you have at least gcc 5.1 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
Please note that you can still run a.out user programs with this kernel.
- Do a ``make`` to create a compressed kernel image. It is also
possible to do ``make install`` if you have lilo installed to suit the
kernel makefiles, but you may want to check your particular lilo setup first.
......@@ -332,85 +330,10 @@ Compiling the kernel
If something goes wrong
-----------------------
- If you have problems that seem to be due to kernel bugs, please check
the file MAINTAINERS to see if there is a particular person associated
with the part of the kernel that you are having trouble with. If there
isn't anyone listed there, then the second best thing is to mail
them to me (torvalds@linux-foundation.org), and possibly to any other
relevant mailing-list or to the newsgroup.
- In all bug-reports, *please* tell what kernel you are talking about,
how to duplicate the problem, and what your setup is (use your common
sense). If the problem is new, tell me so, and if the problem is
old, please try to tell me when you first noticed it.
- If the bug results in a message like::
unable to handle kernel paging request at address C0000010
Oops: 0002
EIP: 0010:XXXXXXXX
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
ds: xxxx es: xxxx fs: xxxx gs: xxxx
Pid: xx, process nr: xx
xx xx xx xx xx xx xx xx xx xx
or similar kernel debugging information on your screen or in your
system log, please duplicate it *exactly*. The dump may look
incomprehensible to you, but it does contain information that may
help debugging the problem. The text above the dump is also
important: it tells something about why the kernel dumped code (in
the above example, it's due to a bad kernel pointer). More information
on making sense of the dump is in Documentation/admin-guide/bug-hunting.rst
- If you compiled the kernel with CONFIG_KALLSYMS you can send the dump
as is, otherwise you will have to use the ``ksymoops`` program to make
sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
This utility can be downloaded from
https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ .
Alternatively, you can do the dump lookup by hand:
- In debugging dumps like the above, it helps enormously if you can
look up what the EIP value means. The hex value as such doesn't help
me or anybody else very much: it will depend on your particular
kernel setup. What you should do is take the hex value from the EIP
line (ignore the ``0010:``), and look it up in the kernel namelist to
see which kernel function contains the offending address.
To find out the kernel function name, you'll need to find the system
binary associated with the kernel that exhibited the symptom. This is
the file 'linux/vmlinux'. To extract the namelist and match it against
the EIP from the kernel crash, do::
nm vmlinux | sort | less
This will give you a list of kernel addresses sorted in ascending
order, from which it is simple to find the function that contains the
offending address. Note that the address given by the kernel
debugging messages will not necessarily match exactly with the
function addresses (in fact, that is very unlikely), so you can't
just 'grep' the list: the list will, however, give you the starting
point of each kernel function, so by looking for the function that
has a starting address lower than the one you are searching for but
is followed by a function with a higher address you will find the one
you want. In fact, it may be a good idea to include a bit of
"context" in your problem report, giving a few lines around the
interesting one.
If you for some reason cannot do the above (you have a pre-compiled
kernel image or similar), telling me as much about your setup as
possible will help. Please read
'Documentation/admin-guide/reporting-issues.rst' for details.
- Alternatively, you can use gdb on a running kernel. (read-only; i.e. you
cannot change values or set break points.) To do this, first compile the
kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).
After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
You can now use all the usual gdb commands. The command to look up the
point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes
with the EIP value.)
gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
disregards the starting offset for which the kernel is compiled.
If you have problems that seem to be due to kernel bugs, please follow the
instructions at 'Documentation/admin-guide/reporting-issues.rst'.
Hints on understanding kernel bug reports are in
'Documentation/admin-guide/bug-hunting.rst'. More on debugging the kernel
with gdb is in 'Documentation/dev-tools/gdb-kernel-debugging.rst' and
'Documentation/dev-tools/kgdb.rst'.
......@@ -613,6 +613,7 @@ kernel command line.
eibrs enhanced IBRS
eibrs,retpoline enhanced IBRS + Retpolines
eibrs,lfence enhanced IBRS + LFENCE
ibrs use IBRS to protect kernel
Not specifying this option is equivalent to
spectre_v2=auto.
......
......@@ -200,7 +200,7 @@ prb
A pointer to the printk ringbuffer (struct printk_ringbuffer). This
may be pointing to the static boot ringbuffer or the dynamically
allocated ringbuffer, depending on when the the core dump occurred.
allocated ringbuffer, depending on when the core dump occurred.
Used by user-space tools to read the active kernel log buffer.
printk_rb_static
......
......@@ -65,7 +65,7 @@ HugePages_Surp
may be temporarily larger than the maximum number of surplus huge
pages when the system is under memory pressure.
Hugepagesize
is the default hugepage size (in Kb).
is the default hugepage size (in kB).
Hugetlb
is the total amount of memory (in kB), consumed by huge
pages of all sizes.
......
......@@ -133,7 +133,7 @@ code field of ``BPF_END``.
The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.
The 1-bit source operand field in the opcode is used to to select what byte
The 1-bit source operand field in the opcode is used to select what byte
order the operation convert from or to:
========= ===== =================================================
......
......@@ -31,7 +31,7 @@ The map uses key of type of either ``__u64 cgroup_inode_id`` or
};
``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the the program's attach type.
``attach_type`` is the program's attach type.
Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
When this key type is used, then all attach types of the particular cgroup and
......@@ -155,7 +155,7 @@ However, the BPF program can still only associate with one map of each type
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.
In all versions, userspace may use the the attach parameters of cgroup and
In all versions, userspace may use the attach parameters of cgroup and
attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
APIs to read or update the storage for a given attachment. For Linux 5.9
attach type shared storages, only the first value in the struct, cgroup inode
......
......@@ -15,6 +15,18 @@
import sys
import os
import sphinx
import shutil
# helper
# ------
def have_command(cmd):
"""Search ``cmd`` in the ``PATH`` environment.
If found, return True.
If not found, return False.
"""
return shutil.which(cmd) is not None
# Get Sphinx version
major, minor, patch = sphinx.version_info[:3]
......@@ -107,7 +119,32 @@ else:
autosectionlabel_prefix_document = True
autosectionlabel_maxdepth = 2
extensions.append("sphinx.ext.imgmath")
# Load math renderer:
# For html builder, load imgmath only when its dependencies are met.
# mathjax is the default math renderer since Sphinx 1.8.
have_latex = have_command('latex')
have_dvipng = have_command('dvipng')
load_imgmath = have_latex and have_dvipng
# Respect SPHINX_IMGMATH (for html docs only)
if 'SPHINX_IMGMATH' in os.environ:
env_sphinx_imgmath = os.environ['SPHINX_IMGMATH']
if 'yes' in env_sphinx_imgmath:
load_imgmath = True
elif 'no' in env_sphinx_imgmath:
load_imgmath = False
else:
sys.stderr.write("Unknown env SPHINX_IMGMATH=%s ignored.\n" % env_sphinx_imgmath)
# Always load imgmath for Sphinx <1.8 or for epub docs
load_imgmath = (load_imgmath or (major == 1 and minor < 8)
or 'epub' in sys.argv)
if load_imgmath:
extensions.append("sphinx.ext.imgmath")
math_renderer = 'imgmath'
else:
math_renderer = 'mathjax'
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
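Review note: the renderer selection added in this hunk is easier to check when restated as a pure function. The following is a minimal sketch, not the code in conf.py; the name choose_math_renderer and its parameters are hypothetical and exist only so the decision order can be exercised without Sphinx, latex, or dvipng installed.

```python
import shutil


def have_command(cmd):
    """Return True if ``cmd`` is found on PATH (same check as the new helper)."""
    return shutil.which(cmd) is not None


def choose_math_renderer(have_latex, have_dvipng, env, sphinx_version, argv):
    """Mirror the selection order of the hunk; returns 'imgmath' or 'mathjax'.

    The function name and parameters are illustrative, not kernel code.
    """
    major, minor = sphinx_version[:2]
    # Default: imgmath only when both of its external commands exist.
    load_imgmath = have_latex and have_dvipng
    # SPHINX_IMGMATH overrides the default; the hunk uses substring matching.
    value = env.get('SPHINX_IMGMATH')
    if value is not None:
        if 'yes' in value:
            load_imgmath = True
        elif 'no' in value:
            load_imgmath = False
        # Any other value is ignored (conf.py also writes a warning to stderr).
    # Sphinx < 1.8 and epub builds always end up with imgmath.
    load_imgmath = (load_imgmath or (major == 1 and minor < 8)
                    or 'epub' in argv)
    return 'imgmath' if load_imgmath else 'mathjax'


# Hypothetical inputs: no TeX tools installed, Sphinx 2.4.4, html build.
print(choose_math_renderer(False, False, {}, (2, 4, 4), ['html']))
# Same machine, but an epub build with SPHINX_IMGMATH=no.
print(choose_math_renderer(False, False, {'SPHINX_IMGMATH': 'no'},
                           (2, 4, 4), ['epub']))
```

The second call shows one consequence of the ordering: because the epub check runs last, SPHINX_IMGMATH=no does not switch an epub build to mathjax, which matches the "for html docs only" comment in the hunk.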
......@@ -333,7 +370,8 @@ html_static_path = ['sphinx-static']
html_use_smartypants = False
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Note that the RTD theme ignores this.
html_sidebars = { '**': ['searchbox.html', 'localtoc.html', 'sourcelink.html']}
# Additional templates that should be rendered to pages, maps page names to
# template names.
......
......@@ -43,10 +43,11 @@ annotated objects like this, tools can be run on them to generate more useful
information. In particular, on properly annotated objects, ``objtool`` can be
run to check and fix the object if needed. Currently, ``objtool`` can report
missing frame pointer setup/destruction in functions. It can also
automatically generate annotations for :doc:`ORC unwinder <x86/orc-unwinder>`
automatically generate annotations for the ORC unwinder
(Documentation/x86/orc-unwinder.rst)
for most code. Both of these are especially important to support reliable
stack traces which are in turn necessary for :doc:`Kernel live patching
<livepatch/livepatch>`.
stack traces which are in turn necessary for kernel live patching
(Documentation/livepatch/livepatch.rst).
Caveat and Discussion
---------------------
......
......@@ -560,7 +560,7 @@ available:
* cpuhp_state_remove_instance(state, node)
* cpuhp_state_remove_instance_nocalls(state, node)
The arguments are the same as for the the cpuhp_state_add_instance*()
The arguments are the same as for the cpuhp_state_add_instance*()
variants above.
The functions differ in the way how the installed callbacks are treated:
......
......@@ -23,6 +23,7 @@ it.
printk-formats
printk-index
symbol-namespaces
asm-annotations
Data structures and low-level utilities
=======================================
......@@ -44,6 +45,8 @@ Library functionality that is used throughout the kernel.
this_cpu_ops
timekeeping
errseq
wrappers/atomic_t
wrappers/atomic_bitops
Low level entry and exit
========================
......@@ -67,6 +70,7 @@ Documentation/locking/index.rst for more related documentation.
local_ops
padata
../RCU/index
wrappers/memory-barriers.rst
Low-level hardware management
=============================
......
......@@ -71,7 +71,7 @@ variety of methods:
Note that irq domain lookups must happen in contexts that are
compatible with a RCU read-side critical section.
The irq_create_mapping() function must be called *atleast once*
The irq_create_mapping() function must be called *at least once*
before any call to irq_find_mapping(), lest the descriptor will not
be allocated.
......
.. SPDX-License-Identifier: GPL-2.0
This is a simple wrapper to bring atomic_bitops.txt into the RST world
until such a time as that file can be converted directly.
=============
Atomic bitops
=============
.. raw:: latex
\footnotesize
.. include:: ../../atomic_bitops.txt
:literal:
.. raw:: latex
\normalsize
.. SPDX-License-Identifier: GPL-2.0
This is a simple wrapper to bring atomic_t.txt into the RST world
until such a time as that file can be converted directly.
============
Atomic types
============
.. raw:: latex
\footnotesize
.. include:: ../../atomic_t.txt
:literal:
.. raw:: latex
\normalsize
.. SPDX-License-Identifier: GPL-2.0
This is a simple wrapper to bring memory-barriers.txt into the RST world
until such a time as that file can be converted directly.
============================
Linux kernel memory barriers
============================
.. raw:: latex
\footnotesize
.. include:: ../../memory-barriers.txt
:literal:
.. raw:: latex
\normalsize
......@@ -48,10 +48,6 @@ or ``virtualenv``, depending on how your distribution packaged Python 3.
on the Sphinx version, it should be installed separately,
with ``pip install sphinx_rtd_theme``.
#) Some ReST pages contain math expressions. Due to the way Sphinx works,
those expressions are written using LaTeX notation. It needs texlive
installed with amsfonts and amsmath in order to evaluate them.
In summary, if you want to install Sphinx version 2.4.4, you should do::
$ virtualenv sphinx_2.4.4
......@@ -86,6 +82,27 @@ Depending on the distribution, you may also need to install a series of
``texlive`` packages that provide the minimal set of functionalities
required for ``XeLaTeX`` to work.
Math Expressions in HTML
------------------------
Some ReST pages contain math expressions. Due to the way Sphinx works,
those expressions are written using LaTeX notation.
There are two options for Sphinx to render math expressions in html output.
One is an extension called `imgmath`_ which converts math expressions into
images and embeds them in html pages.
The other is an extension called `mathjax`_ which delegates math rendering
to JavaScript capable web browsers.
The former was the only option for pre-6.1 kernel documentation and it
requires quite a few texlive packages including amsfonts and amsmath among
others.
Since kernel release 6.1, html pages with math expressions can be built
without installing any texlive packages. See `Choice of Math Renderer`_ for
further info.
.. _imgmath: https://www.sphinx-doc.org/en/master/usage/extensions/math.html#module-sphinx.ext.imgmath
.. _mathjax: https://www.sphinx-doc.org/en/master/usage/extensions/math.html#module-sphinx.ext.mathjax
.. _sphinx-pre-install:
Checking for Sphinx dependencies
......@@ -164,6 +181,38 @@ To remove the generated documentation, run ``make cleandocs``.
as well would improve the quality of images embedded in PDF
documents, especially for kernel releases 5.18 and later.
Choice of Math Renderer
-----------------------
Since kernel release 6.1, mathjax works as a fallback math renderer for
html output.\ [#sph1_8]_
Math renderer is chosen depending on available commands as shown below:
.. table:: Math Renderer Choices for HTML
============= ================= ============
Math renderer Required commands Image format
============= ================= ============
imgmath latex, dvipng PNG (raster)
mathjax
============= ================= ============
The choice can be overridden by setting an environment variable
``SPHINX_IMGMATH`` as shown below:
.. table:: Effect of Setting ``SPHINX_IMGMATH``
====================== ========
Setting Renderer
====================== ========
``SPHINX_IMGMATH=yes`` imgmath
``SPHINX_IMGMATH=no`` mathjax
====================== ========
.. [#sph1_8] Fallback of math renderer requires Sphinx >=1.8.
Writing Documentation
=====================
......
......@@ -301,6 +301,7 @@ IO region
devm_release_region()
devm_release_resource()
devm_request_mem_region()
devm_request_free_mem_region()
devm_request_region()
devm_request_resource()
......@@ -334,7 +335,7 @@ IRQ
devm_irq_alloc_descs_from()
devm_irq_alloc_generic_chip()
devm_irq_setup_generic_chip()
devm_irq_sim_init()
devm_irq_domain_create_sim()
LED
devm_led_classdev_register()
......@@ -392,7 +393,9 @@ PHY
PINCTRL
devm_pinctrl_get()
devm_pinctrl_put()
devm_pinctrl_get_select()
devm_pinctrl_register()
devm_pinctrl_register_and_init()
devm_pinctrl_unregister()
POWER
......@@ -427,6 +430,8 @@ SLAVE DMA ENGINE
devm_acpi_dma_controller_register()
SPI
devm_spi_alloc_master()
devm_spi_alloc_slave()
devm_spi_register_master()
WATCHDOG
......
......@@ -100,7 +100,7 @@ I believe platform_data is available for this, but if rather not, moving
the isa_driver pointer to the private struct isa_dev is ofcourse fine as
well.
Then, if the the driver did not provide a .match, it matches. If it did,
Then, if the driver did not provide a .match, it matches. If it did,
the driver match() method is called to determine a match.
If it did **not** match, dev->platform_data is reset to indicate this to
......
......@@ -86,17 +86,24 @@ Module Options
Special configuration for udlfb is usually unnecessary. There are a few
options, however.
From the command line, pass options to modprobe
modprobe udlfb fb_defio=0 console=1 shadow=1
From the command line, pass options to modprobe::
Or modify options on the fly at /sys/module/udlfb/parameters directory via
sudo nano fb_defio
change the parameter in place, and save the file.
modprobe udlfb fb_defio=0 console=1 shadow=1
Unplug/replug USB device to apply with new settings
Or change options on the fly by editing
/sys/module/udlfb/parameters/PARAMETER_NAME ::
Or for permanent option, create file like /etc/modprobe.d/udlfb.conf with text
options udlfb fb_defio=0 console=1 shadow=1
cd /sys/module/udlfb/parameters
ls # to see a list of parameter names
sudo nano PARAMETER_NAME
# change the parameter in place, and save the file.
Unplug/replug USB device to apply with new settings.
Or to apply options permanently, create a modprobe configuration file
like /etc/modprobe.d/udlfb.conf with text::
options udlfb fb_defio=0 console=1 shadow=1
Accepted boolean options:
......
......@@ -122,7 +122,7 @@ volumes, calling::
to tell fscache that a volume has been withdrawn. This waits for all
outstanding accesses on the volume to complete before returning.
When the the cache is completely withdrawn, fscache should be notified by
When the cache is completely withdrawn, fscache should be notified by
calling::
void fscache_relinquish_cache(struct fscache_cache *cache);
......
......@@ -456,15 +456,15 @@ The ext4 superblock is laid out as follows in
* - 0x277
- __u8
- s_lastcheck_hi
- Upper 8 bits of the s_lastcheck_hi field.
- Upper 8 bits of the s_lastcheck field.
* - 0x278
- __u8
- s_first_error_time_hi
- Upper 8 bits of the s_first_error_time_hi field.
- Upper 8 bits of the s_first_error_time field.
* - 0x279
- __u8
- s_last_error_time_hi
- Upper 8 bits of the s_last_error_time_hi field.
- Upper 8 bits of the s_last_error_time field.
* - 0x27A
- __u8
- s_pad[2]
......
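Review note: the corrected descriptions say that each *_hi byte holds the upper 8 bits of its base field. A minimal sketch of how a reader might join the two halves follows, assuming the base field is a 32-bit value (this hunk does not show its width); the function name and the sample values are hypothetical.

```python
def combine_ext4_time(lo32, hi8):
    """Join a 32-bit low field with its 8-bit *_hi companion into one value."""
    if not 0 <= lo32 <= 0xFFFFFFFF:
        raise ValueError("low field must fit in 32 bits")
    if not 0 <= hi8 <= 0xFF:
        raise ValueError("high field must fit in 8 bits")
    # The high byte supplies bits 32..39 of the combined value.
    return (hi8 << 32) | lo32


# Hypothetical on-disk values, chosen only for illustration.
s_lastcheck = 0x00000010
s_lastcheck_hi = 0x01
print(hex(combine_ext4_time(s_lastcheck, s_lastcheck_hi)))
```

The same helper would apply to the s_first_error_time and s_last_error_time pairs under the same width assumption.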
......@@ -286,9 +286,8 @@ compress_algorithm=%s:%d Control compress algorithm and its compress level, now,
algorithm level range
lz4 3 - 16
zstd 1 - 22
compress_log_size=%u Support configuring compress cluster size, the size will
be 4KB * (1 << %u), 16KB is minimum size, also it's
default size.
compress_log_size=%u Support configuring compress cluster size. The size will
be 4KB * (1 << %u). The default and minimum sizes are 16KB.
compress_extension=%s Support adding specified extension, so that f2fs can enable
compression on those corresponding files, e.g. if all files
with '.ext' has high compression rate, we can set the '.ext'
......
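Review note: the reworded compress_log_size text gives the cluster size as 4KB * (1 << %u) with a 16KB minimum. A small worked sketch of that arithmetic follows; the function name is hypothetical, and raising an error for values below the minimum is a choice made for this sketch, not a statement about how the kernel treats such a mount option.

```python
def f2fs_cluster_size_kb(compress_log_size):
    """Cluster size in KB for a compress_log_size value: 4KB * (1 << n)."""
    size_kb = 4 * (1 << compress_log_size)
    # 16KB is the documented minimum, which corresponds to n = 2.
    if size_kb < 16:
        raise ValueError("cluster size below the documented 16KB minimum")
    return size_kb


for n in (2, 3, 4):
    print(n, f2fs_cluster_size_kb(n))
```

With n = 2 the formula yields 16KB, which is why the documented default and minimum coincide.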
......@@ -661,7 +661,7 @@ idmappings::
mount idmapping: u0:k10000:r10000
Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
to ``k21000`` according to it's idmapping. This is what is stored in the
to ``k21000`` according to its idmapping. This is what is stored in the
inode's ``i_uid`` and ``i_gid`` fields.
When the caller queries the ownership of this file via ``stat()`` the kernel
......
......@@ -176,7 +176,7 @@ Then userspace.
The requirement for a static, fixed preallocated system area comes from how
qnx6fs deals with writes.
Each superblock got it's own half of the system area. So superblock #1
Each superblock got its own half of the system area. So superblock #1
always uses blocks from the lower half while superblock #2 just writes to
blocks represented by the upper half bitmap system area bits.
......
......@@ -227,7 +227,7 @@ Files
from the data buffer, updating the value of the specified signal
notification register. The signal notification register will
either be replaced with the input data or will be updated to the
bitwise OR or the old value and the input data, depending on the
bitwise OR of the old value and the input data, depending on the
contents of the signal1_type, or signal2_type respectively,
file.
......
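Review note: the "OR or" to "OR of" fix matters because the sentence describes two distinct update rules for the signal notification register. A minimal sketch of the two rules follows, assuming a 32-bit register; the function name is hypothetical, and how the contents of signal1_type or signal2_type map onto the or_mode flag is not shown in this hunk, so the flag is left as a plain boolean.

```python
def write_signal(old_value, data, or_mode):
    """Return the new signal register value after a write.

    or_mode=False: the register is replaced with the input data.
    or_mode=True: the register becomes the bitwise OR of old value and data.
    """
    mask = 0xFFFFFFFF  # assumed 32-bit register width
    if or_mode:
        return (old_value | data) & mask
    return data & mask


# Hypothetical register contents and input data.
print(hex(write_signal(0x0F, 0xF0, or_mode=False)))
print(hex(write_signal(0x0F, 0xF0, or_mode=True)))
```

In the replace case earlier bits are lost, while in the OR case bits from successive writes accumulate.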
......@@ -100,7 +100,7 @@ transactions together::
ntp = xfs_trans_dup(tp);
xfs_trans_commit(tp);
xfs_log_reserve(ntp);
xfs_trans_reserve(ntp);
This results in a series of "rolling transactions" where the inode is locked
across the entire chain of transactions. Hence while this series of rolling
......@@ -191,7 +191,7 @@ transaction rolling mechanism to re-reserve space on every transaction roll. We
know from the implementation of the permanent transactions how many transaction
rolls are likely for the common modifications that need to be made.
For example, and inode allocation is typically two transactions - one to
For example, an inode allocation is typically two transactions - one to
physically allocate a free inode chunk on disk, and another to allocate an inode
from an inode chunk that has free inodes in it. Hence for an inode allocation
transaction, we might set the reservation log count to a value of 2 to indicate
......@@ -200,7 +200,7 @@ chain. Each time a permanent transaction rolls, it consumes an entire unit
reservation.
Hence when the permanent transaction is first allocated, the log space
reservation is increases from a single unit reservation to multiple unit
reservation is increased from a single unit reservation to multiple unit
reservations. That multiple is defined by the reservation log count, and this
means we can roll the transaction multiple times before we have to re-reserve
log space when we roll the transaction. This ensures that the common
......@@ -259,7 +259,7 @@ the next transaction in the sequeunce, but we have none remaining. We cannot
sleep during the transaction commit process waiting for new log space to become
available, as we may end up on the end of the FIFO queue and the items we have
locked while we sleep could end up pinning the tail of the log before there is
enough free space in the log to fulfil all of the pending reservations and
enough free space in the log to fulfill all of the pending reservations and
then wake up transaction commit in progress.
To take a new reservation without sleeping requires us to be able to take a
......@@ -551,14 +551,14 @@ Essentially, this shows that an item that is in the AIL can still be modified
and relogged, so any tracking must be separate to the AIL infrastructure. As
such, we cannot reuse the AIL list pointers for tracking committed items, nor
can we store state in any field that is protected by the AIL lock. Hence the
committed item tracking needs it's own locks, lists and state fields in the log
committed item tracking needs its own locks, lists and state fields in the log
item.
Similar to the AIL, tracking of committed items is done through a new list
called the Committed Item List (CIL). The list tracks log items that have been
committed and have formatted memory buffers attached to them. It tracks objects
in transaction commit order, so when an object is relogged it is removed from
it's place in the list and re-inserted at the tail. This is entirely arbitrary
its place in the list and re-inserted at the tail. This is entirely arbitrary
and done to make it easy for debugging - the last items in the list are the
ones that are most recently modified. Ordering of the CIL is not necessary for
transactional integrity (as discussed in the next section) so the ordering is
......@@ -615,7 +615,7 @@ those changes into the current checkpoint context. We then initialise a new
context and attach that to the CIL for aggregation of new transactions.
This allows us to unlock the CIL immediately after transfer of all the
committed items and effectively allow new transactions to be issued while we
committed items and effectively allows new transactions to be issued while we
are formatting the checkpoint into the log. It also allows concurrent
checkpoints to be written into the log buffers in the case of log force heavy
workloads, just like the existing transaction commit code does. This, however,
......@@ -884,9 +884,9 @@ pin the object the first time it is inserted into the CIL - if it is already in
the CIL during a transaction commit, then we do not pin it again. Because there
can be multiple outstanding checkpoint contexts, we can still see elevated pin
counts, but as each checkpoint completes the pin count will retain the correct
value according to it's context.
value according to its context.
Just to make matters more slightly more complex, this checkpoint level context
Just to make matters slightly more complex, this checkpoint level context
for the pin count means that the pinning of an item must take place under the
CIL commit/flush lock. If we pin the object outside this lock, we cannot
guarantee which context the pin count is associated with. This is because of
......
.. SPDX-License-Identifier: GPL-2.0
.. The Linux Kernel documentation master file, created by
sphinx-quickstart on Fri Feb 12 13:51:46 2016.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. _linux_doc:
The Linux Kernel documentation
......@@ -18,133 +12,84 @@ documents into a coherent whole. Please note that improvements to the
documentation are welcome; join the linux-doc list at vger.kernel.org if
you want to help out.
Licensing documentation
-----------------------
Working with the development community
--------------------------------------
The following describes the license of the Linux kernel source code
(GPLv2), how to properly mark the license of individual files in the source
tree, as well as links to the full license text.
* :ref:`kernel_licensing`
User-oriented documentation
---------------------------
The following manuals are written for *users* of the kernel — those who are
trying to get it to work optimally on a given system.
The essential guides for interacting with the kernel's development
community and getting your work upstream.
.. toctree::
:maxdepth: 2
admin-guide/index
kbuild/index
Firmware-related documentation
------------------------------
The following holds information on the kernel's expectations regarding the
platform firmwares.
:maxdepth: 1
.. toctree::
:maxdepth: 2
process/development-process
process/submitting-patches
Code of conduct <process/code-of-conduct>
maintainer/index
All development-process docs <process/index>
firmware-guide/index
devicetree/index
Application-developer documentation
-----------------------------------
Internal API manuals
--------------------
The user-space API manual gathers together documents describing aspects of
the kernel interface as seen by application developers.
Manuals for use by developers working to interface with the rest of the
kernel.
.. toctree::
:maxdepth: 2
userspace-api/index
:maxdepth: 1
core-api/index
driver-api/index
subsystem-apis
Locking in the kernel <locking/index>
Introduction to kernel development
----------------------------------
Development tools and processes
-------------------------------
These manuals contain overall information about how to develop the kernel.
The kernel community is quite large, with thousands of developers
contributing over the course of a year. As with any large community,
knowing how things are done will make the process of getting your changes
merged much easier.
Various other manuals with useful information for all kernel developers.
.. toctree::
:maxdepth: 2
:maxdepth: 1
process/index
dev-tools/index
process/license-rules
doc-guide/index
dev-tools/index
dev-tools/testing-overview
kernel-hacking/index
trace/index
maintainer/index
fault-injection/index
livepatch/index
Kernel API documentation
------------------------
User-oriented documentation
---------------------------
These books get into the details of how specific kernel subsystems work
from the point of view of a kernel developer. Much of the information here
is taken directly from the kernel source, with supplemental material added
as needed (or at least as we managed to add it — probably *not* all that is
needed).
The following manuals are written for *users* of the kernel — those who are
trying to get it to work optimally on a given system and application
developers seeking information on the kernel's user-space APIs.
.. toctree::
:maxdepth: 2
:maxdepth: 1
driver-api/index
core-api/index
locking/index
accounting/index
block/index
cdrom/index
cpu-freq/index
fb/index
fpga/index
hid/index
i2c/index
iio/index
isdn/index
infiniband/index
leds/index
netlabel/index
networking/index
pcmcia/index
power/index
target/index
timers/index
spi/index
w1/index
watchdog/index
virt/index
input/index
hwmon/index
gpu/index
security/index
sound/index
crypto/index
filesystems/index
mm/index
bpf/index
usb/index
PCI/index
scsi/index
misc-devices/index
scheduler/index
mhi/index
peci/index
Architecture-agnostic documentation
-----------------------------------
admin-guide/index
The kernel build system <kbuild/index>
admin-guide/reporting-issues.rst
User-space tools <tools/index>
userspace-api/index
See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_,
which are kept separately from the kernel's own documentation.
Firmware-related documentation
------------------------------
The following holds information on the kernel's expectations regarding the
platform firmwares.
.. toctree::
:maxdepth: 2
:maxdepth: 1
firmware-guide/index
devicetree/index
asm-annotations
Architecture-specific documentation
-----------------------------------
......@@ -163,9 +108,8 @@ of the documentation body, or may require some adjustments and/or conversion
to ReStructured Text format, or are simply too old.
.. toctree::
:maxdepth: 2
:maxdepth: 1
tools/index
staging/index
......
......@@ -90,7 +90,11 @@ e.g., on Ubuntu for gcc-10::
Or on Fedora::
dnf install gcc-plugin-devel
dnf install gcc-plugin-devel libmpc-devel
Or on Fedora when using cross-compilers that include plugins::
dnf install libmpc-devel
Enable the GCC plugin infrastructure and some plugin(s) you want to use
in the kernel config::
......@@ -99,6 +103,19 @@ in the kernel config::
CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y
...
Run gcc (native or cross-compiler) to ensure plugin headers are detected::
gcc -print-file-name=plugin
CROSS_COMPILE=arm-linux-gnu- ${CROSS_COMPILE}gcc -print-file-name=plugin
The word "plugin" means they are not detected::
plugin
A full path means they are detected::
/usr/lib/gcc/x86_64-redhat-linux/12/plugin
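
The check above is easy to script. The following is a minimal sketch, not part of the kernel tree: the helper name ``plugin_headers_detected`` is hypothetical, and it only interprets text that was already captured from ``gcc -print-file-name=plugin``.

```python
def plugin_headers_detected(output: str) -> bool:
    """Interpret the captured output of `gcc -print-file-name=plugin`.

    Hypothetical helper: the bare word "plugin" means the headers were
    not detected; anything else (a full path) means they were detected.
    """
    path = output.strip()
    return bool(path) and path != "plugin"


# The two outcomes described in the text above.
print(plugin_headers_detected("plugin\n"))
print(plugin_headers_detected("/usr/lib/gcc/x86_64-redhat-linux/12/plugin\n"))
```

For a cross-compiler, the same helper can be fed the output of ``${CROSS_COMPILE}gcc -print-file-name=plugin``.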
To compile the minimum tool set including the plugin(s)::
make scripts
......
......@@ -39,7 +39,7 @@ as the writer can invalidate a pointer that the reader is following.
Sequence counters (``seqcount_t``)
==================================
This is the the raw counting mechanism, which does not protect against
This is the raw counting mechanism, which does not protect against
multiple writers. Write side critical sections must thus be serialized
by an external lock.
......
......@@ -197,7 +197,7 @@ unevictable list for the memory cgroup and node being scanned.
There may be situations where a page is mapped into a VM_LOCKED VMA, but the
page is not marked as PG_mlocked. Such pages will make it all the way to
shrink_active_list() or shrink_page_list() where they will be detected when
vmscan walks the reverse map in page_referenced() or try_to_unmap(). The page
vmscan walks the reverse map in folio_referenced() or try_to_unmap(). The page
is culled to the unevictable list when it is released by the shrinker.
To "cull" an unevictable page, vmscan simply puts the page back on the LRU list
......@@ -267,7 +267,7 @@ the LRU. Such pages can be "noticed" by memory management in several places:
(4) in the fault path and when a VM_LOCKED stack segment is expanded; or
(5) as mentioned above, in vmscan:shrink_page_list() when attempting to
reclaim a page in a VM_LOCKED VMA by page_referenced() or try_to_unmap().
reclaim a page in a VM_LOCKED VMA by folio_referenced() or try_to_unmap().
mlocked pages become unlocked and rescued from the unevictable list when:
......@@ -547,7 +547,7 @@ vmscan's shrink_inactive_list() and shrink_page_list() also divert obviously
unevictable pages found on the inactive lists to the appropriate memory cgroup
and node unevictable list.
rmap's page_referenced_one(), called via vmscan's shrink_active_list() or
rmap's folio_referenced_one(), called via vmscan's shrink_active_list() or
shrink_page_list(), and rmap's try_to_unmap_one() called via shrink_page_list(),
check for (3) pages still mapped into VM_LOCKED VMAs, and call mlock_vma_page()
to correct them. Such pages are culled to the unevictable list when released
......
......@@ -256,8 +256,10 @@ The tags in common use are:
- Cc: the named person received a copy of the patch and had the
opportunity to comment on it.
Be careful in the addition of tags to your patches: only Cc: is appropriate
for addition without the explicit permission of the person named.
Be careful in the addition of tags to your patches, as only Cc: is appropriate
for addition without the explicit permission of the person named; using
Reported-by: is fine most of the time as well, but ask for permission if
the bug was reported in private.
Sending the patch
......
......@@ -51,7 +51,7 @@ the Technical Advisory Board (TAB) or other maintainers if you're
uncertain how to handle situations that come up. It will not be
considered a violation report unless you want it to be. If you are
uncertain about approaching the TAB or any other maintainers, please
reach out to our conflict mediator, Mishi Choudhary <mishi@linux.com>.
reach out to our conflict mediator, Joanna Lee <joanna.lee@gesmer.com>.
In the end, "be kind to each other" is really what the end goal is for
everybody. We know everyone is human and we all fail at times, but the
......@@ -127,10 +127,12 @@ are listed at https://kernel.org/code-of-conduct.html. Members can not
access reports made before they joined or after they have left the
committee.
The initial Code of Conduct Committee consists of volunteer members of
the TAB, as well as a professional mediator acting as a neutral third
party. The first task of the committee is to establish documented
processes, which will be made public.
The Code of Conduct Committee consists of volunteer community members
appointed by the TAB, as well as a professional mediator acting as a
neutral third party. The processes the Code of Conduct Committee will
use to address reports are varied and will depend on the individual
circumstances; however, this file serves as documentation for the
general process used.
Any member of the committee, including the mediator, can be contacted
directly if a reporter does not wish to include the full committee in a
......@@ -141,16 +143,16 @@ processes (see above) and consults with the TAB as needed and
appropriate, for instance to request and receive information about the
kernel community.
Any decisions by the committee will be brought to the TAB, for
implementation of enforcement with the relevant maintainers if needed.
A decision by the Code of Conduct Committee can be overturned by the TAB
by a two-thirds vote.
Any decisions regarding enforcement recommendations will be brought to
the TAB for implementation of enforcement with the relevant maintainers
if needed. A decision by the Code of Conduct Committee can be overturned
by the TAB by a two-thirds vote.
At quarterly intervals, the Code of Conduct Committee and TAB will
provide a report summarizing the anonymised reports that the Code of
Conduct committee has received and their status, as well as details of any
overridden decisions including complete and identifiable voting details.
We expect to establish a different process for Code of Conduct Committee
staffing beyond the bootstrap period. This document will be updated
with that information when this occurs.
Because how we interpret and enforce the Code of Conduct will evolve over
time, this document will be updated when necessary to reflect any
changes.
......@@ -1186,6 +1186,68 @@ expression used. For instance:
#endif /* CONFIG_SOMETHING */
22) Do not crash the kernel
---------------------------
In general, the decision to crash the kernel belongs to the user, rather
than to the kernel developer.
Avoid panic()
*************
panic() should be used with care and primarily only during system boot.
panic() is, for example, acceptable when running out of memory during boot and
not being able to continue.
Use WARN() rather than BUG()
****************************
Do not add new code that uses any of the BUG() variants, such as BUG(),
BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
required if there is no reasonable way to at least partially recover.
"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
internal corruptions with no way of continuing may still use BUG(), but need
good justification.
Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
**************************************************
WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
is common for a given warning condition, if it occurs at all, to occur
multiple times. This can fill up and wrap the kernel log, and can even slow
the system enough that the excessive logging turns into its own, additional
problem.
Do not WARN lightly
*******************
WARN*() is intended for unexpected, this-should-never-happen situations.
WARN*() macros are not to be used for anything that is expected to happen
during normal operation. These are not pre- or post-condition asserts, for
example. Again: WARN*() must not be used for a condition that is expected
to trigger easily, for example, by user space actions. pr_warn_once() is a
possible alternative, if you need to notify the user of a problem.
Do not worry about panic_on_warn users
**************************************
A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
available kernel option, and that many users set this option. This is why
there is a "Do not WARN lightly" writeup, above. However, the existence of
panic_on_warn users is not a valid reason to avoid the judicious use of
WARN*(). That is because whoever enables panic_on_warn has explicitly
asked the kernel to crash if a WARN*() fires, and such users must be
prepared to deal with the consequences of a system that is somewhat more
likely to crash.
Use BUILD_BUG_ON() for compile-time assertions
**********************************************
The use of BUILD_BUG_ON() is acceptable and encouraged, because it is a
compile-time assertion that has no effect at runtime.
Appendix I) References
----------------------
......
......@@ -5,6 +5,7 @@
.. _process_index:
=============================================
Working with the kernel development community
=============================================
......
......@@ -97,6 +97,12 @@ text, like this:
commit <sha1> upstream.
or alternatively:
.. code-block:: none
[ Upstream commit <sha1> ]
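
A tool that scans stable submissions could recognise both accepted forms with a single pattern. This is only an illustrative sketch: the names ``UPSTREAM_RE`` and ``upstream_sha`` are hypothetical, and it assumes that ``<sha1>`` is written as a full 40-character lowercase hexadecimal commit ID.

```python
import re

# Matches "commit <sha1> upstream." or "[ Upstream commit <sha1> ]".
# ASSUMPTION: <sha1> is a full 40-character lowercase hex commit ID.
UPSTREAM_RE = re.compile(
    r"^\s*(?:commit ([0-9a-f]{40}) upstream\."
    r"|\[ Upstream commit ([0-9a-f]{40}) \])\s*$"
)


def upstream_sha(line: str):
    """Return the upstream commit ID named on this line, or None."""
    match = UPSTREAM_RE.match(line)
    if match is None:
        return None
    # Exactly one of the two groups is set, depending on the form used.
    return match.group(1) or match.group(2)
```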
Additionally, some patches submitted via :ref:`option_1` may have additional
patch prerequisites which can be cherry-picked. This can be specified in the
following format in the sign-off area:
......
......@@ -715,8 +715,8 @@ references.
.. _backtraces:
Backtraces in commit mesages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Backtraces in commit messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Backtraces help document the call chain leading to a problem. However,
not all backtraces are helpful. For example, early boot call chains are
......
......@@ -94,7 +94,7 @@ other HZ detail. Thus the CFS scheduler has no notion of "timeslices" in the
way the previous scheduler had, and has no heuristics whatsoever. There is
only one central tunable (you have to switch on CONFIG_SCHED_DEBUG):
/proc/sys/kernel/sched_min_granularity_ns
/sys/kernel/debug/sched/min_granularity_ns
which can be used to tune the scheduler from "desktop" (i.e., low latencies) to
"server" (i.e., good batching) workloads. It defaults to a setting suitable
......
......@@ -14,45 +14,3 @@ Unsorted Documentation
static-keys
tee
xz
Atomic Types
============
.. raw:: latex
\footnotesize
.. include:: ../atomic_t.txt
:literal:
.. raw:: latex
\normalsize
Atomic bitops
=============
.. raw:: latex
\footnotesize
.. include:: ../atomic_bitops.txt
:literal:
.. raw:: latex
\normalsize
Memory Barriers
===============
.. raw:: latex
\footnotesize
.. include:: ../memory-barriers.txt
:literal:
.. raw:: latex
\normalsize
.. SPDX-License-Identifier: GPL-2.0
==============================
Kernel subsystem documentation
==============================
These books get into the details of how specific kernel subsystems work
from the point of view of a kernel developer. Much of the information here
is taken directly from the kernel source, with supplemental material added
as needed (or at least as we managed to add it — probably *not* all that is
needed).
**Fixme**: much more organizational work is needed here.
.. toctree::
:maxdepth: 1
driver-api/index
core-api/index
locking/index
accounting/index
block/index
cdrom/index
cpu-freq/index
fb/index
fpga/index
hid/index
i2c/index
iio/index
isdn/index
infiniband/index
leds/index
netlabel/index
networking/index
pcmcia/index
power/index
target/index
timers/index
spi/index
w1/index
watchdog/index
virt/index
input/index
hwmon/index
gpu/index
security/index
sound/index
crypto/index
filesystems/index
mm/index
bpf/index
usb/index
PCI/index
scsi/index
misc-devices/index
scheduler/index
mhi/index
peci/index
......@@ -412,7 +412,7 @@ Extended error information
Because the default sort key above is 'hitcount', the above shows
the list of call_sites by increasing hitcount, so that at the bottom
we see the functions that made the most kmalloc calls during the
run. If instead we we wanted to see the top kmalloc callers in
run. If instead we wanted to see the top kmalloc callers in
terms of the number of bytes requested rather than the number of
calls, and we wanted the top caller to appear at the top, we can use
the 'sort' parameter, along with the 'descending' modifier::
......
......@@ -328,8 +328,8 @@ Configuring Kprobes
===================
When configuring the kernel using make menuconfig/xconfig/oldconfig,
ensure that CONFIG_KPROBES is set to "y". Under "General setup", look
for "Kprobes".
ensure that CONFIG_KPROBES is set to "y"; look for "Kprobes" under
"General architecture-dependent options".
So that you can load and unload Kprobes-based instrumentation modules,
make sure "Loadable module support" (CONFIG_MODULES) and "Module
......
......@@ -20,7 +20,7 @@ For example::
[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo timerlat > current_tracer
It is possible to follow the trace by reading the trace trace file::
It is possible to follow the trace by reading the trace file::
[root@f32 tracing]# cat trace
# tracer: timerlat
......
Chinese translated version of Documentation/core-api/irq/index.rst
If you have any comment or update to the content, please contact the
original document maintainer directly. However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help. Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.
Maintainer: Eric W. Biederman <ebiederman@xmission.com>
Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
---------------------------------------------------------------------
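Chinese translation of Documentation/core-api/irq/index.rst
If you want to comment on or update the content of this document, please
contact the maintainer of the original document directly. If you have
difficulty communicating in English, you can also ask the Chinese
maintainer for help. If this translation is out of date or has problems,
please contact the Chinese maintainer.
English maintainer: Eric W. Biederman <ebiederman@xmission.com>
Chinese maintainer: 傅炜 Fu Wei <tekkamanninja@gmail.com>
Chinese translator: 傅炜 Fu Wei <tekkamanninja@gmail.com>
Chinese proofreader: 傅炜 Fu Wei <tekkamanninja@gmail.com>
The main text follows
---------------------------------------------------------------------
What is an IRQ?
An IRQ is an interrupt request from a device. Currently, they can come in
over a hardware pin or over a packet. Several devices may be connected to
the same hardware pin and thus share an IRQ.
An IRQ number is a kernel identifier used to talk about a hardware
interrupt source. Typically, this is an index into the global irq_desc
array, but except for what linux/interrupt.h implements, the details are
architecture specific.
An IRQ number is an enumeration of a possible interrupt source on a device.
Typically, what is enumerated is the number of the pin among all input pins
of the interrupt controllers in the system. In the case of the ISA bus,
what is enumerated are the 16 input pins on the two i8259 interrupt
controllers.
Architectures can assign additional meaning to the IRQ numbers, and are
encouraged to do so where any manual configuration of the hardware is
involved. The ISA IRQs are a classic example of assigning this kind of
additional meaning.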
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/PCI/acpi-info.rst
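:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
========================================
ACPI considerations for PCI host bridges
========================================
The general rule is that the ACPI namespace should describe everything the
OS might use unless there is another way for the OS to find it [1, 2].
For example, there is no standard hardware mechanism for enumerating PCI
host bridges, so the ACPI namespace must describe each host bridge, the
method for accessing PCI config space below it, the address space windows
the host bridge forwards to PCI (using _CRS), and the routing of legacy INTx
interrupts (using _PRT).
PCI devices below the host bridge generally do not need to be described via
ACPI. The OS can discover them via the standard PCI enumeration mechanism,
using config accesses to discover and identify devices and to read and size
their BARs. However, ACPI may describe PCI devices if it provides power
management or hotplug functionality for them, or if the device has INTx
interrupts connected by platform interrupt controllers and a _PRT is needed
to describe those connections.
ACPI resource description is done via _CRS objects of devices in the ACPI
namespace [2]. The _CRS is like a generalized PCI BAR: the OS can read _CRS
and figure out what resource is being consumed even if it does not have a
driver for the device [3]. That is important because it means an old OS can
work correctly even on a system with new devices unknown to the OS. The new
devices might not do anything, but the OS can at least make sure no
resources conflict with them.
Static tables like MCFG, HPET, ECDT, etc., are not mechanisms for reserving
address space. The static tables are for things the OS needs to know early
in boot, before it can parse the ACPI namespace. If a new table is defined,
an old OS needs to operate correctly even though it ignores the table. _CRS
allows that because it is generic and understood by the old OS; a static
table does not.
If the OS is expected to manage a non-discoverable device described via
ACPI, that device will have a specific _HID/_CID that tells the OS what
driver to bind to it, and the _CRS tells the OS and the driver where the
device's registers are.
PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should describe
all the address space they consume. This includes all the windows they
forward down to the PCI bus, as well as registers of the host bridge itself
that are not forwarded to PCI. The host bridge registers include things like
secondary/subordinate bus registers that determine the bus range below the
bridge, window registers that describe the apertures, and so on. These are
all device-specific, non-architected things, so the only way a
PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which contain
the device-specific details. The host bridge registers also include ECAM
space, since it is consumed by the host bridge.
ACPI defines a Consumer/Producer bit to distinguish the bridge registers
("Consumer") from the bridge apertures ("Producer") [4, 5], but early BIOSes
did not use that bit correctly. The result is that the current ACPI spec
defines Consumer/Producer only for the Extended Address Space descriptors;
the bit should be ignored in the older QWord/DWord/Word Address Space
descriptors. Consequently, OSes have to assume all QWord/DWord/Word
descriptors are windows.
Prior to the addition of Extended Address Space descriptors, the failure of
Consumer/Producer meant there was no way to describe bridge registers in the
PNP0A03/PNP0A08 device itself. The workaround was to describe the bridge
registers (including ECAM space) in PNP0C02 catch-all devices [6]. With the
exception of ECAM, the bridge register space is device-specific anyway, so
the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about
it.
New architectures should be able to use "Consumer" Extended Address Space
descriptors in the PNP0A03 device for bridge registers, including ECAM,
although a strict interpretation of [6] might prohibit this. Old x86 and
ia64 kernels assume all address space descriptors, including "Consumer"
Extended Address Space ones, are windows, so it would not be safe to
describe bridge registers this way on those architectures.
PNP0C02 "motherboard" devices are basically a catch-all. There is no
programming model for them other than "don't use these resources for
anything else." So a PNP0C02 _CRS should claim any address space that is
(1) not claimed by _CRS under any other device object in the ACPI namespace
and (2) should not be assigned by the OS to something else.
The PCIe spec requires the Enhanced Configuration Access Method (ECAM)
unless there is a standard firmware interface for config access, e.g., the
ia64 SAL interface [7]. A host bridge consumes ECAM memory address space and
converts memory accesses into PCI configuration accesses. The spec defines
the ECAM address space layout and functionality; only the base of the
address space is device-specific. An ACPI OS learns the base address from
either the static MCFG table or a _CBA method in the PNP0A03 device.
The MCFG table must describe the ECAM space of non-hot pluggable host
bridges [8]. Since MCFG is a static table and cannot be updated by hotplug,
a _CBA method in the PNP0A03 device describes the ECAM space of a
hot-pluggable host bridge [9]. Note that for both MCFG and _CBA, the base
address always corresponds to bus 0, even if the bus range below the bridge
(which is reported via _CRS) does not start at 0.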
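[1] ACPI 6.2, sec 6.1:
For any device that is on a non-enumerable type of bus (for example, an ISA
bus), OSPM enumerates the devices' identifier(s) and the ACPI system
firmware must supply an _HID object ... for each device to enable OSPM to do
that.
[2] ACPI 6.2, sec 3.7:
The OS enumerates motherboard devices simply by reading through the ACPI
Namespace looking for devices with hardware IDs.
Each device enumerated by ACPI includes ACPI-defined objects in the ACPI
Namespace that report the hardware resources the device could occupy [_PRS],
an object that reports the resources that are currently used by the device
[_CRS], and objects for configuring those resources [_SRS]. The information
is used by the Plug and Play OS (OSPM) to configure the devices.
[3] ACPI 6.2, sec 6.2:
OSPM uses device configuration objects to configure hardware resources for
devices enumerated via ACPI. Device configuration objects provide
information about current and possible resource requirements, the
relationship between shared resources, and methods for configuring hardware
resources.
When OSPM enumerates a device, it calls _PRS to determine the resource
requirements of the device. It may also call _CRS to find the current
resource settings for the device. Using this information, the Plug and Play
system determines what resources the device should consume and sets those
resources by calling the device's _SRS control method.
In ACPI, devices can consume resources (for example, legacy keyboards),
provide resources (for example, a proprietary PCI bridge), or do both.
Unless otherwise specified, resources for a device are assumed to be taken
from the nearest matching resource above the device in the device hierarchy.
[4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
General Flags: Bit [0] Ignored
Extended Address Space Descriptor (.4)
General Flags: Bit [0] Consumer/Producer:
* 1 – This device consumes this resource
* 0 – This device produces and consumes this resource
[5] ACPI 6.2, sec 19.6.43:
ResourceUsage specifies whether the Memory range is consumed by this device
(ResourceConsumer) or passed on to child devices (ResourceProducer). If
nothing is specified, then ResourceConsumer is assumed.
[6] PCI Firmware 3.2, sec 4.1.2:
If the operating system does not natively comprehend reserving the MMCFG
region, the MMCFG region must be reserved by firmware. The address range
reported in the MCFG table or by the _CBA method (see Section 4.1.3) must be
reserved by declaring a motherboard resource. For most systems, the
motherboard resource would appear at the root of the ACPI namespace (under
_SB) in a node with a _HID of EISAID (PNP0C02), and the resources in this
case should not be claimed in the root PCI bus's _CRS. The resources can
optionally be returned in Int15 E820 or EFIGetMemoryMap as reserved memory
but must always be reported through ACPI as a motherboard resource.
[7] PCI Express 4.0, sec 7.2.2:
For systems that are PC-compatible, or that do not implement a
processor-architecture-specific firmware interface standard that allows
access to the Configuration Space, the ECAM is required as defined in this
section.
[8] PCI Firmware 3.2, sec 4.1.2:
The MCFG table is an ACPI table that is used to communicate the base
addresses corresponding to the non-hot removable PCI Segment Groups range
within a PCI Segment Group available to the operating system at boot. This
is required for PC-compatible systems.
The MCFG table is only used to communicate the base addresses corresponding
to the PCI Segment Groups available to the system at boot.
[9] PCI Firmware 3.2, sec 4.1.3:
The _CBA (Memory mapped Configuration Base Address) control method is an
optional ACPI object that returns the 64-bit memory mapped configuration
base address for the hot plug capable host bridge. The base address returned
by _CBA is a processor-relative address. The _CBA control method evaluates
to an Integer.
This control method appears under a host bridge object. When the _CBA method
appears under an active host bridge object, the operating system evaluates
this structure to identify the memory mapped configuration base address
corresponding to the PCI Segment Group for the bus number range specified in
the _CRS method. An ACPI name space object that contains the _CBA method
must also contain a corresponding _SEG method.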
......@@ -10,9 +10,6 @@
:Proofreader:
.. _cn_PCI_index.rst:
=======================
Linux PCI bus subsystem
=======================
......@@ -26,12 +23,12 @@ Linux PCI总线子系统
pci-iov-howto
msi-howto
sysfs-pci
acpi-info
Todolist:
acpi-info
pci-error-recovery
pcieaer-howto
endpoint/index
boot-interrupts
* pci-error-recovery
* pcieaer-howto
* endpoint/index
* boot-interrupts
......@@ -6,10 +6,10 @@
吴想成 Wu XiangCheng <bobwxc@email.cn>
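Linux kernel release 5.x <http://kernel.org/>
Linux kernel release 6.x <http://kernel.org/>
=============================================
These are the release notes for Linux version 5. Read them carefully,
These are the release notes for Linux version 6. Read them carefully,
as they tell you what this is all about, explain how to install the kernel,
and what to do if something goes wrong.
What is Linux?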
......@@ -61,27 +61,27 @@ Linux内核5.x版本 <http://kernel.org/>
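- If you install the full sources, put the kernel tarball in a directory
where you have permissions (e.g. your home directory) and unpack it::
xz -cd linux-5.x.tar.xz | tar xvf -
xz -cd linux-6.x.tar.xz | tar xvf -
Replace "X" with the version number of the latest kernel.
Do NOT use the /usr/src/linux area! This area has a (usually incomplete) set
of kernel headers that are used by the library header files. They should
match the library, and not get messed up by changes in the kernel.
- You can also upgrade between 5.x releases by patching. Patches are
distributed in the xz format. To install by patching, get all the newer
patch files, enter the directory of the kernel source (linux-5.x) and
- You can also upgrade between 6.x releases by patching. Patches are
distributed in the xz format. To install by patching, get all the newer
patch files, enter the directory of the kernel source (linux-6.x) and
execute::
xz -cd ../patch-5.x.xz | patch -p1
xz -cd ../patch-6.x.xz | patch -p1
Replace "x" for all versions bigger than the version of your current source
tree, IN ORDER, and you should be ok. You may want to remove the backup
files (names like xxx~ or xxx.orig), and make sure that there are no failed
patches (names like xxx# or xxx.rej). If there are, either you or I have
made a mistake.
Unlike patches for the 5.x kernels, patches for the 5.x.y kernels (also
known as the stable kernels) are not incremental but instead apply directly
to the base 5.x kernel. For example, if your base kernel is 5.0 and you want
to apply the 5.0.3 patch, you must not first apply the 5.0.1 and 5.0.2
patches. Similarly, if you are running kernel version 5.0.2 and want to jump
to 5.0.3, then before applying the 5.0.3 patch you must first reverse the
5.0.2 patch
Unlike patches for the 6.x kernels, patches for the 6.x.y kernels (also
known as the stable kernels) are not incremental but instead apply directly
to the base 6.x kernel. For example, if your base kernel is 6.0 and you want
to apply the 6.0.3 patch, you must not first apply the 6.0.1 and 6.0.2
patches. Similarly, if you are running kernel version 6.0.2 and want to jump
to 6.0.3, then before applying the 6.0.3 patch you must first reverse the
6.0.2 patch
(that is, patch -R). For more on this, please read
:ref:`Documentation/process/applying-patches.rst <applying_patches>` .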
......@@ -103,7 +103,7 @@ Linux内核5.x版本 <http://kernel.org/>
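Software requirements
---------------------
Compiling and running the 5.x kernels requires up-to-date versions of
Compiling and running the 6.x kernels requires up-to-date versions of
various software packages. Consult
:ref:`Documentation/process/changes.rst <changes>`
for the minimum version numbers required and how to get updates for these
packages. Beware that using excessively old versions of these packages can
cause indirect errors that are very difficult to track down, so don't assume
that you can just update packages when obvious problems arise during build
or operation.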
......@@ -116,12 +116,12 @@ Linux内核5.x版本 <http://kernel.org/>
The ``make O=output/dir`` option allows you to specify an alternate place
for the output files (including .config).
Example::
kernel source code: /usr/src/linux-5.x
kernel source code: /usr/src/linux-6.x
build directory: /home/name/build/kernel
To configure and build the kernel, use::
cd /usr/src/linux-5.x
cd /usr/src/linux-6.x
make O=/home/name/build/kernel menuconfig
make O=/home/name/build/kernel
sudo make O=/home/name/build/kernel modules_install install
......@@ -227,8 +227,6 @@ Linux内核5.x版本 <http://kernel.org/>
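- Make sure you have at least gcc 5.1 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>` .
Please note that you can still run a.out user programs with this kernel.
- Do a ``make`` to create a compressed kernel image. It is also possible to
do ``make install`` if you have lilo installed to suit the kernel makefiles,
but you may want to check your particular lilo setup first.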
......@@ -282,67 +280,12 @@ Linux内核5.x版本 <http://kernel.org/>
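If something goes wrong
-----------------------
- If you have problems that seem to be due to kernel bugs, please check the
file MAINTAINERS to see if there is a particular person associated with the
part of the kernel that you are having trouble with. If there isn't anyone
listed there, then the second best thing is to mail them to me
(torvalds@linux-foundation.org), and possibly to any other relevant
mailing list or newsgroup.
- In all bug reports, PLEASE tell us what kernel you are talking about, how
to reproduce the problem, and what your setup is (use your common sense). If
the problem is new, tell me so, and if the problem is old, please try to
tell me when you first noticed it.
- If the bug results in a message like::
unable to handle kernel paging request at address C0000010
Oops: 0002
EIP: 0010:XXXXXXXX
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
ds: xxxx es: xxxx fs: xxxx gs: xxxx
Pid: xx, process nr: xx
xx xx xx xx xx xx xx xx xx xx
or similar kernel debugging information on your screen or in your system
log, please duplicate it EXACTLY. The dump may look incomprehensible to you,
but it does contain information that may help debugging the problem. The
text above the dump is also important: it tells something about why the
kernel dumped code (in the above example, it is due to a bad kernel
pointer). More information on making sense of the dump is in
Documentation/admin-guide/bug-hunting.rst.
- If you compiled the kernel with CONFIG_KALLSYMS you can send the dump as
is, otherwise you will have to use the ``ksymoops`` program to make sense of
the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
This utility can be downloaded from
https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ .
Alternatively, you can do the dump lookup by hand:
- In debugging dumps like the above, it helps enormously if you can look up
what the EIP value means. The hex value as such doesn't help me or anybody
else very much: it will depend on your particular kernel setup. What you
should do is take the hex value from the EIP line (ignore the ``0010:``),
and look it up in the kernel namelist to see which kernel function contains
the offending address.
To find out the kernel function name, you'll need to find the system binary
associated with the kernel that exhibited the symptom. This is the file
"linux/vmlinux". To extract the namelist and match it against the EIP from
the kernel crash, do::
nm vmlinux | sort | less
This will give you a list of kernel addresses sorted in ascending order,
from which it is simple to find the function that contains the offending
address. Note that the address given by the kernel debugging messages will
not necessarily match exactly with the function addresses (in fact, that is
practically impossible), so you can't just "grep" the list: the list will,
however, give you the starting point of each kernel function, so by looking
for the function that has a starting address lower than the one you are
searching for but is followed by a function with a higher address you will
find the one you want. In fact, it may be a good idea to include a bit of
"context" in your problem report, giving a few lines around the interesting
one.
If you for some reason cannot do the above (you have a pre-compiled kernel
image or similar), telling me as much about your setup as possible will
help. Please read
'Documentation/admin-guide/reporting-issues.rst' for details.
- Alternatively, you can use gdb on a running kernel (read-only; i.e. you
cannot change values or set break points). To do this, first compile the
kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).
After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
You can now use all the usual gdb commands. The command to look up the point
where your system crashed is ``l *0xXXXXXXXX`` (replace the XXXes with the
EIP value).
gdb'ing a non-running kernel currently fails because gdb (wrongly)
disregards the starting offset for which the kernel is compiled.
If you have problems that seem to be due to kernel bugs, please see:
Documentation/translations/zh_CN/admin-guide/reporting-issues.rst .
To make sense of kernel bug reports, please see:
Documentation/translations/zh_CN/admin-guide/bug-hunting.rst .
For more information on debugging the kernel with GDB, please see:
Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
and Documentation/dev-tools/kgdb.rst .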
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/bootconfig.rst
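:Translator: 吴想成 Wu XiangCheng <bobwxc@email.cn>
==================
Boot Configuration
==================
:Author: Masami Hiramatsu <mhiramat@kernel.org>
Overview
========
The boot configuration expands the current kernel command line to support
additional key-value data when booting the kernel in an efficient way. This
allows administrators to pass a structured-key config file.
Config File Syntax
==================
The boot config syntax is a very simple key-value structure. Each key
consists of dot-connected words, and key and value are connected by ``=``.
The value has to be terminated by a semicolon ( ``;`` ) or a newline
( ``\n`` ). For an array value, array entries are separated by a comma
( ``,`` ). ::
KEY[.WORD[...]] = VALUE[, VALUE2[...]][;]
Unlike the kernel command line syntax, spaces are allowed around the comma
and ``=``.
Each key word must contain only letters, digits, dash ( ``-`` ) or
underscore ( ``_`` ). Each value may contain printable characters and
spaces, except for delimiters such as semicolon ( ``;`` ), newline
( ``\n`` ), comma ( ``,`` ), hash ( ``#`` ) and closing brace ( ``}`` ).
If you want to use those delimiters in a value, you can use either
double quotes ( ``"VALUE"`` ) or single quotes ( ``'VALUE'`` ) to quote it.
Note that you cannot escape these quotes.
A key may have an empty value or no value at all. Such keys are used for
checking whether the key exists or not (like a boolean).
Key-Value Syntax
----------------
The boot config file syntax allows the user to merge keys that share leading
words by using braces. For example::
foo.bar.baz = value1
foo.bar.qux.quux = value2
These can also be written as::
foo.bar {
baz = value1
qux.quux = value2
}
Or, more compactly, as::
foo.bar { baz = value1; qux.quux = value2 }
In both styles, the same key words are automatically merged when parsing at
boot time. So you can append similar trees or key-values.
Same-key Values
---------------
Two or more values or arrays are not allowed to share the same key. For
example::
foo = bar, baz
foo = qux # !ERROR! we can not re-define the same key
If you want to update the value, you must use the override operator ``:=``
explicitly. For example::
foo = bar, baz
foo := qux
Then ``qux`` is assigned to the ``foo`` key. This is useful for overriding
the default value by adding (partial) custom bootconfigs without parsing the
default bootconfig.
If you want to append a value to an existing key as an array member, you can
use the ``+=`` operator. For example::
foo = bar, baz
foo += qux
In this case, the key ``foo`` has ``bar``, ``baz`` and ``qux``.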
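
The three operators can be modelled in a few lines. The sketch below is not the kernel parser: ``apply_assignment`` is a hypothetical helper that stores each key's values in a plain dict, and it assumes that ``+=`` on a key that does not exist yet simply creates it.

```python
def apply_assignment(config: dict, key: str, op: str, values: list) -> None:
    """Apply one `key OP values` statement to a dict of key -> list of values."""
    if op == "=":
        # Plain assignment must not redefine a key that already has a value.
        if key in config:
            raise ValueError(f"cannot redefine {key!r}; use ':=' or '+='")
        config[key] = list(values)
    elif op == ":=":
        # Override: replace whatever value the key had before.
        config[key] = list(values)
    elif op == "+=":
        # Append: extend the existing array.
        # ASSUMPTION: a missing key is created rather than rejected.
        config.setdefault(key, []).extend(values)
    else:
        raise ValueError(f"unknown operator {op!r}")


cfg = {}
apply_assignment(cfg, "foo", "=", ["bar", "baz"])
apply_assignment(cfg, "foo", "+=", ["qux"])
print(cfg["foo"])
```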
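Moreover, a value and sub-keys can coexist under a parent key.
For example, the following config is allowed. ::
foo = value1
foo.bar = value2
foo := value3 # This will update foo's value.
Note that a raw value cannot be put directly inside a structured key; you
have to define it outside of the brace. For example::
foo {
bar = value1
bar {
baz = value2
qux = value3
}
}
Also, the order of the value node under a key is fixed. If there are a value
and sub-keys, the value is always the first child node of the key. Thus if
the user specifies sub-keys first, e.g.::
foo.bar = value1
foo = value2
then in the program (and /proc/bootconfig), it will be shown as below::
foo = value2
foo.bar = value1
Comments
--------
The config syntax accepts shell-script style comments. A comment starts with
a hash ( ``#`` ) and runs until the newline ( ``\n`` ).
::
# comment line
foo = value # value is set to foo.
bar = 1, # 1st element
2, # 2nd element
3 # 3rd element
This is parsed as below::
foo = value
bar = 1, 2, 3
Note that you cannot put a comment between a value and its delimiter
( ``,`` or ``;`` ). The following config has a syntax error::
key = 1 # comment
,2
/proc/bootconfig
================
/proc/bootconfig is the user-space interface of the boot config. Unlike
/proc/cmdline, this file shows its content as a key-value list.
Each key-value pair is shown on its own line in the following style::
KEY[.WORDS...] = "[VALUE]"[,"VALUE2"...]
Boot Kernel With a Boot Config
==============================
There are two ways to boot the kernel with a boot config: attaching the boot
config to the initrd image or embedding it in the kernel itself.
*initrd: initial RAM disk*
Attaching a Boot Config to Initrd
---------------------------------
Since the boot configuration file is loaded with the initrd by default, it
is added to the end of the initrd (initramfs) image file with padding, size,
checksum and a 12-byte magic word, as below::
[initrd][bootconfig][padding][size(le32)][checksum(le32)][#BOOTCONFIG\n]
The size and checksum fields are unsigned 32-bit little-endian values.
When the boot configuration is added to the initrd image, the total file
size is aligned to 4 bytes. Null characters ( ``\0`` ) are added to fill the
gap. Thus ``size`` is the length of the bootconfig file plus the padding
bytes.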
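
The layout above can be sketched as follows. This is an illustration, not the ``tools/bootconfig`` implementation: the function names are hypothetical, and the checksum is ASSUMED to be a plain 32-bit sum of the bootconfig bytes, which the text above does not specify.

```python
import struct

MAGIC = b"#BOOTCONFIG\n"           # the 12-byte magic word
FOOTER_LEN = 4 + 4 + len(MAGIC)    # size(le32) + checksum(le32) + magic


def append_bootconfig(initrd: bytes, config: bytes) -> bytes:
    """Return initrd + [bootconfig][padding][size][checksum][magic]."""
    total = len(initrd) + len(config) + FOOTER_LEN
    pad = -total % 4                       # null bytes needed for 4-byte alignment
    data = config + b"\0" * pad            # size covers bootconfig + padding
    checksum = sum(data) & 0xFFFFFFFF      # ASSUMPTION: plain byte sum
    return initrd + data + struct.pack("<II", len(data), checksum) + MAGIC


def split_bootconfig(image: bytes):
    """Return (initrd, padded_bootconfig), or None if no valid trailer exists."""
    if len(image) < FOOTER_LEN or not image.endswith(MAGIC):
        return None
    end = len(image) - FOOTER_LEN
    size, checksum = struct.unpack_from("<II", image, end)
    if size > end:
        return None
    data = image[end - size:end]
    if (sum(data) & 0xFFFFFFFF) != checksum:
        return None
    return image[:end - size], data


image = append_bootconfig(b"x" * 10, b"a=1\n")
print(len(image) % 4, split_bootconfig(image))
```

Because the trailer is located from the end of the image, a loader that reports a wrong initrd length makes the magic word impossible to find, which is the failure mode the surrounding text warns about.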
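The Linux kernel decodes the last part of the initrd image in memory to get
the boot configuration data. Because of this "piggyback" method, there is no
need to change or update the boot loader or the kernel image itself as long
as the boot loader passes the correct initrd file size. If by any chance the
boot loader passes a longer size, the kernel fails to find the bootconfig
data.
To do this operation, the Linux kernel provides the ``bootconfig`` command
under tools/bootconfig, which allows the administrator to delete the config
file from, or append it to, the initrd image. You can build it with the
following command::
# make -C tools/bootconfig
To add your boot config file to the initrd image, run bootconfig as below
(old data is removed automatically)::
# tools/bootconfig/bootconfig -a your-config /boot/initrd.img-X.Y.Z
To remove the config from the image, you can use the -d option::
# tools/bootconfig/bootconfig -d /boot/initrd.img-X.Y.Z
Then add ``bootconfig`` on the kernel command line to tell the kernel to
look for the boot config at the end of the initrd file.
Embedding a Boot Config into Kernel
-----------------------------------
If you cannot use an initrd, you can also embed the bootconfig file in the
kernel through Kconfig options. In this case, you need to recompile the
kernel with the following options::
CONFIG_BOOT_CONFIG_EMBED=y
CONFIG_BOOT_CONFIG_EMBED_FILE="/PATH/TO/BOOTCONFIG/FILE"
``CONFIG_BOOT_CONFIG_EMBED_FILE`` requires an absolute path, or a path
relative to the source tree or object tree, to the bootconfig file. The
kernel will embed it as the default bootconfig.
Just as when attaching the bootconfig to the initrd, you need to add
``bootconfig`` on the kernel command line to enable the embedded bootconfig.
Note that even if you set this option, you can still override the embedded
bootconfig with another bootconfig attached to the initrd.
Kernel parameters via Boot Config
=================================
In addition to the kernel command line, the boot config can be used for
passing kernel parameters. All the key-value pairs under the ``kernel`` key
are passed to the kernel command line directly. Moreover, the key-value
pairs under ``init`` are passed to the init process via the command line.
The parameters are concatenated with the user-given kernel command line
string in the following order, so that the command line parameters can
override the bootconfig parameters (this depends on how the subsystem
handles parameters, but in general an earlier parameter is overwritten by a
later one)::
[bootconfig params][cmdline params] -- [bootconfig init params][cmdline init params]
If the kernel/init parameters given by the bootconfig file are::
kernel {
root = 01234567-89ab-cdef-0123-456789abcd
}
init {
splash
}
This will be copied into the kernel command line string as follows::
root="01234567-89ab-cdef-0123-456789abcd" -- splash
If the user gives some other command line like::
ro bootconfig -- quiet
The final kernel command line will be the following::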
root="01234567-89ab-cdef-0123-456789abcd" ro bootconfig -- splash quiet
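
The ordering rule can be written out as a short function. This is a sketch of the concatenation order only, with a hypothetical name ``build_cmdline``; it takes the already-rendered bootconfig strings as input and does not reproduce how the kernel renders or quotes them.

```python
def build_cmdline(bootconfig_kernel: str, bootconfig_init: str,
                  user_cmdline: str) -> str:
    """Join parameters in the order
    [bootconfig params][cmdline params] -- [bootconfig init params][cmdline init params].
    """
    # Everything after " -- " on the user command line belongs to init.
    kernel_part, _, init_part = user_cmdline.partition(" -- ")
    kernel = " ".join(p for p in (bootconfig_kernel, kernel_part.strip()) if p)
    init = " ".join(p for p in (bootconfig_init, init_part.strip()) if p)
    return f"{kernel} -- {init}" if init else kernel


print(build_cmdline('root="01234567-89ab-cdef-0123-456789abcd"',
                    "splash",
                    "ro bootconfig -- quiet"))
```

Placing the bootconfig parameters first is what lets a later command line parameter win when a subsystem keeps only the last value it sees.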
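Config File Limitation
======================
Currently the maximum config size is 32KB and the total number of key words
(not key-value entries) must be under 1024 nodes.
Note: this is not the number of entries but of nodes; an entry consumes at
least 2 nodes (a key word and a value).
So theoretically there can be up to 512 key-value pairs. If keys contain 3
words on average, the file can hold 256 key-value pairs.
In most cases, the number of config items will be under 100 entries and
smaller than 8KB, so this should be enough. If the node count exceeds 1024,
the parser returns an error even if the file size is smaller than 32KB.
(Note that this maximum size does not include the padding null characters.)
Anyway, since the ``bootconfig`` command verifies the boot config when
appending it to the initrd image, the user can notice the problem before
boot.
Bootconfig APIs
===============
The user can query or loop over key-value pairs; it is also possible to find
a root (prefix) key node and look up key-values under that node.
If you have a key string, you can query the value directly with the key
using xbc_find_value(). If you want to know what keys exist in the boot
config, you can use xbc_for_each_key_value() to iterate over key-value
pairs. Note that you need to use xbc_array_for_each_value() to access the
values of an array, e.g.::
vnode = NULL;
xbc_find_value("key.word", &vnode);
if (vnode && xbc_node_is_array(vnode))
xbc_array_for_each_value(vnode, value) {
printk("%s ", value);
}
If you want to focus on keys that have a prefix string, you can use
xbc_find_node() to find a node by the prefix string, and then iterate over
the keys under the prefix node with xbc_node_for_each_key_value().
But the most typical usage is to get the named value under a prefix or the
named array under a prefix, as below::
root = xbc_find_node("key.prefix");
value = xbc_node_find_value(root, "option", &vnode);
...
xbc_node_for_each_array_value(root, "array-option", value, anode) {
...
}
This accesses the value of "key.prefix.option" and the array of
"key.prefix.array-option".
Locking is not needed, since the config becomes read-only after
initialization. All data and keys must be copied if you need to modify them.
Functions and structures
========================
For the kernel-doc of the related definitions, see: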
- include/linux/bootconfig.h
- lib/bootconfig.c
......@@ -63,6 +63,7 @@ Todolist:
.. toctree::
:maxdepth: 1
bootconfig
clearing-warn-once
cpu-load
cputopology
......@@ -80,7 +81,6 @@ Todolist:
* binderfs
* binfmt-misc
* blockdev/index
* bootconfig
* braille-console
* btmrvl
* cgroup-v1/index
......
.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/circular-buffers.rst
:翻译:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:校译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
吴想成 Wu Xiangcheng <bobwxc@email.cn>
时奎亮 Alex Shi <alexs@kernel.org>
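================
Circular Buffers
================

:Author: David Howells <dhowells@redhat.com>
:Author: Paul E. McKenney <paulmck@linux.ibm.com>


Linux provides a number of features that can be used to implement circular
buffering.  There are two sets of such features:

 (1) Convenience functions for determining information about power-of-2 sized
     buffers.

 (2) Memory barriers for when the producer and the consumer of objects in the
     buffer don't want to share a lock.

To use these facilities, as discussed below, there needs to be just one
producer and just one consumer.  It is possible to handle multiple producers by
serialising them, and to handle multiple consumers by serialising them.


.. Contents:

 (*) What is a circular buffer?

 (*) Measuring power-of-2 buffers.

 (*) Using memory barriers with circular buffers.
     - The producer.
     - The consumer.

 (*) Further reading.


What is a circular buffer?
==========================

First of all, what is a circular buffer?  A circular buffer is a buffer of
fixed, finite size into which there are two indices:

 (1) A 'head' index - the point at which the producer inserts items into the
     buffer.

 (2) A 'tail' index - the point at which the consumer finds the next item in
     the buffer.

Typically when the tail pointer is equal to the head pointer, the buffer is
empty; and the buffer is full when the head pointer is one less than the tail
pointer.

The head index is incremented when items are added, and the tail index when
items are removed.  The tail index should never jump the head index, and both
indices should be wrapped to 0 when they reach the end of the buffer, thus
allowing an unbounded amount of data to flow through the buffer.

Typically, items will all be of the same unit size, but this isn't strictly
required to use the techniques below.  The indices can be increased by more
than 1 if multiple items or variable-sized items are to be included in the
buffer, provided that neither index overtakes the other.  The implementer must
be careful, however, as a region more than one unit in size may wrap the end of
the buffer and be broken into two segments.
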
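
Measuring power-of-2 buffers
============================

Calculation of the occupancy or the remaining capacity of an arbitrarily sized
circular buffer would normally be a slow operation, requiring the use of a
modulus (divide) instruction.  However, if the buffer is of a power-of-2 size,
then a much quicker bitwise-AND instruction can be used instead.

Linux provides a set of macros for handling power-of-2 circular buffers.  These
can be made use of by::

    #include <linux/circ_buf.h>

The macros are:

 (#) Measure the remaining capacity of a buffer::

        CIRC_SPACE(head_index, tail_index, buffer_size);

     This returns the amount of space left in the buffer[1] into which items
     can be inserted.


 (#) Measure the maximum consecutive immediate space in a buffer::

        CIRC_SPACE_TO_END(head_index, tail_index, buffer_size);

     This returns the amount of consecutive space left in the buffer[1] into
     which items can be immediately inserted without having to wrap back to the
     beginning of the buffer.


 (#) Measure the occupancy of a buffer::

        CIRC_CNT(head_index, tail_index, buffer_size);

     This returns the number of items currently occupying a buffer[2].


 (#) Measure the non-wrapping occupancy of a buffer::

        CIRC_CNT_TO_END(head_index, tail_index, buffer_size);

     This returns the number of consecutive items[2] that can be extracted from
     the buffer without having to wrap back to the beginning of the buffer.


Each of these macros will nominally return a value between 0 and buffer_size-1,
however:

 (1) CIRC_SPACE*() are intended to be used in the producer.  To the producer
     they will return a lower bound as the producer controls the head index,
     but the consumer may still be depleting the buffer on another CPU and
     moving the tail index.

     To the consumer it will show an upper bound as the producer may be busy
     depleting the space.

 (2) CIRC_CNT*() are intended to be used in the consumer.  To the consumer they
     will return a lower bound as the consumer controls the tail index, but the
     producer may still be filling the buffer on another CPU and moving the
     head index.

     To the producer it will show an upper bound as the consumer may be busy
     emptying the buffer.

 (3) To a third party, the order in which the writes to the indices by the
     producer and consumer become visible cannot be guaranteed as they are
     independent and may be made on different CPUs, so the result in such a
     situation will merely be a guess, and may even be negative.
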
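
The occupancy and free-space calculations described above can be tried outside the kernel. The following is a userspace sketch; the ``DEMO_*`` macros are local copies written to mirror the two basic macros in ``<linux/circ_buf.h>`` as understood here, and the wrapper functions are hypothetical helpers added only for illustration.

```c
#include <assert.h>

/* Local stand-ins modelled on CIRC_CNT() and CIRC_SPACE(); not the kernel header. */
#define DEMO_CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define DEMO_CIRC_SPACE(head, tail, size) DEMO_CIRC_CNT((tail), ((head) + 1), (size))

/* Number of occupied slots; size must be a power of two. */
static unsigned int demo_circ_count(unsigned int head, unsigned int tail,
				    unsigned int size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return DEMO_CIRC_CNT(head, tail, size);
}

/* Number of free slots; one slot always stays empty to tell full from empty. */
static unsigned int demo_circ_space(unsigned int head, unsigned int tail,
				    unsigned int size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return DEMO_CIRC_SPACE(head, tail, size);
}
```

Because one slot is always left unused, count plus space is ``size - 1`` for any pair of indices, which is why the macros never report ``buffer_size`` itself.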
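
Using memory barriers with circular buffers
===========================================

By using memory barriers in conjunction with circular buffers, you can avoid
the need to:

 (1) use a single lock to govern access to both ends of the buffer, thus
     allowing the buffer to be filled and emptied at the same time; and

 (2) use atomic counter operations.

There are two sides to this: the producer that fills the buffer, and the
consumer that empties it.  Only one thing should be filling a buffer at any one
time, and only one thing should be emptying a buffer at any one time, but the
two sides can operate simultaneously.


The producer
------------

The producer will look something like this::

    spin_lock(&producer_lock);

    unsigned long head = buffer->head;
    /* The spin_unlock() and next spin_lock() provide needed ordering. */
    unsigned long tail = READ_ONCE(buffer->tail);

    if (CIRC_SPACE(head, tail, buffer->size) >= 1) {
        /* insert one item into the buffer */
        struct item *item = buffer[head];

        produce_item(item);

        /* smp_store_release() takes a pointer to the index */
        smp_store_release(&buffer->head,
                          (head + 1) & (buffer->size - 1));

        /* wake_up() will make sure that the head is committed before
         * waking anyone up */
        wake_up(consumer);
    }

    spin_unlock(&producer_lock);

This will instruct the CPU that the contents of the new item must be written
before the head index makes it available to the consumer and then instructs the
CPU that the revised head index must be written before the consumer is woken.

Note that wake_up() does not guarantee any sort of barrier unless something
is actually awakened.  We therefore cannot rely on it for ordering.  However,
there is always one element of the array left empty.  Therefore, the
producer must produce two elements before it could possibly corrupt the
element currently being read by the consumer.  Therefore, the unlock-lock
pair between consecutive invocations of the consumer provides the necessary
ordering between the read of the index indicating that the consumer has
vacated a given element and the write by the producer to that same element.
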
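
The consumer
------------

The consumer will look something like this::

    spin_lock(&consumer_lock);

    /* Read index before reading contents at that index. */
    unsigned long head = smp_load_acquire(&buffer->head);
    unsigned long tail = buffer->tail;

    if (CIRC_CNT(head, tail, buffer->size) >= 1) {

        /* extract one item from the buffer */
        struct item *item = buffer[tail];

        consume_item(item);

        /* Finish reading descriptor before incrementing tail. */
        smp_store_release(&buffer->tail,
                          (tail + 1) & (buffer->size - 1));
    }

    spin_unlock(&consumer_lock);

This will instruct the CPU to make sure the index is up to date before reading
the new item, and then it shall make sure the CPU has finished reading the item
before it writes the new tail pointer, which will erase the item.

Note the use of READ_ONCE() and smp_load_acquire() to read the opposition
index.  This prevents the compiler from discarding and reloading its cached
value.  This isn't strictly needed if you can be sure that the opposition
index will only be used once.  The smp_load_acquire() additionally forces the
CPU to order against subsequent memory references.  Similarly,
smp_store_release() is used in both algorithms to write the thread's index.
This documents the fact that we are writing to something that can be read
concurrently, prevents the compiler from tearing the store, and enforces
ordering against previous accesses.
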
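
The producer and consumer pairing described above can be approximated in userspace. The following is a sketch only: it swaps the kernel primitives for C11 ``<stdatomic.h>`` acquire/release operations, drops the spinlocks and wake-up, and uses an acquire load where the kernel producer uses READ_ONCE(). All ``demo_*`` names are hypothetical, and exactly one producer thread and one consumer thread are assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_RING_SIZE 8u	/* number of slots, must be a power of two */

static_assert((DEMO_RING_SIZE & (DEMO_RING_SIZE - 1)) == 0,
	      "ring size must be a power of two");

struct demo_ring {
	_Atomic unsigned int head;	/* written only by the producer */
	_Atomic unsigned int tail;	/* written only by the consumer */
	int slots[DEMO_RING_SIZE];
};

/* Producer side: returns false when the ring is full. */
static bool demo_ring_put(struct demo_ring *r, int value)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store to tail. */
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (((tail - (head + 1)) & (DEMO_RING_SIZE - 1)) == 0)
		return false;

	r->slots[head] = value;
	/* Release: the slot contents are visible before the new head is. */
	atomic_store_explicit(&r->head, (head + 1) & (DEMO_RING_SIZE - 1),
			      memory_order_release);
	return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool demo_ring_get(struct demo_ring *r, int *value)
{
	/* Acquire: read the index before reading the contents at that index. */
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	if (((head - tail) & (DEMO_RING_SIZE - 1)) == 0)
		return false;

	*value = r->slots[tail];
	/* Release: finish reading the slot before handing it back. */
	atomic_store_explicit(&r->tail, (tail + 1) & (DEMO_RING_SIZE - 1),
			      memory_order_release);
	return true;
}
```

As in the kernel examples, one slot always stays empty, so a ring of 8 slots holds at most 7 items.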

Further reading
===============

See also Documentation/memory-barriers.txt for a description of Linux's memory
barrier facilities.

.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/generic-radix-tree.rst
:翻译:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
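=================================
Generic radix trees/sparse arrays
=================================

See the "DOC: Generic radix trees/sparse arrays" comment in
include/linux/generic-radix-tree.h.

Generic radix tree functions
----------------------------

This API is defined in the following kernel code:

include/linux/generic-radix-tree.h
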
.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/idr.rst
:翻译:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:校译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
吴想成 Wu Xiangcheng <bobwxc@email.cn>
时奎亮 Alex Shi <alexs@kernel.org>
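=============
ID Allocation
=============

:Author: Matthew Wilcox

Overview
========

A common problem to solve is allocating identifiers (IDs); generally small
numbers which identify a thing.  Examples include file descriptors, process
IDs, packet identifiers in networking protocols, SCSI tags and device instance
numbers.  The IDR and the IDA provide a reasonable solution to the problem to
avoid everybody inventing their own.  The IDR provides the ability to map an ID
to a pointer, while the IDA provides only ID allocation, and as a result is
much more memory-efficient.

The IDR interface is deprecated; please use the ``XArray`` instead.

IDR usage
=========

Start by initialising an IDR, either with DEFINE_IDR() for statically
allocated IDRs or idr_init() for dynamically allocated IDRs.

You can call idr_alloc() to allocate an unused ID.  Look up the pointer you
associated with the ID by calling idr_find() and free the ID by calling
idr_remove().

If you need to change the pointer associated with an ID, you can call
idr_replace().  One common reason to do this is to reserve an ID by passing a
``NULL`` pointer to the allocation function; initialise the object with the
reserved ID and finally insert the initialised object into the IDR.

Some users need to allocate IDs larger than ``INT_MAX``.  So far all of these
users have been content with a ``UINT_MAX`` limit, and they use
idr_alloc_u32().  If you need IDs that will not fit in a u32, we will work
with you to address your needs.

If you need to allocate IDs sequentially, you can use idr_alloc_cyclic().
The IDR becomes less efficient when dealing with larger IDs, so using this
function comes at a slight cost.

To perform an action on all pointers used by the IDR, you can either use the
callback-based idr_for_each() or the iterator-style idr_for_each_entry().
You may need to use idr_for_each_entry_continue() to continue an iteration.
You can also use idr_get_next() if the iterator doesn't fit your needs.

When you have finished using an IDR, you can call idr_destroy() to release
the memory used by the IDR.  This will not free the objects pointed to from
the IDR; if you want to do that, use one of the iterators to do it.

You can use idr_is_empty() to find out whether there are any IDs currently
allocated.

If you need to take a lock while allocating a new ID from the IDR, you may
need to pass a restrictive set of GFP flags, which can lead to the IDR being
unable to allocate memory.  To work around this, you can call idr_preload()
before taking the lock, and then idr_preload_end() after the allocation.

For IDR synchronisation, see the "DOC: idr sync" comment in
include/linux/idr.h.

IDA usage
=========

For IDA usage, see the "DOC: IDA description" comment in lib/idr.c.

Functions and structures
========================

This API is defined in the following kernel code:

include/linux/idr.h

lib/idr.c
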
......@@ -44,15 +44,15 @@
assoc_array
xarray
rbtree
idr
circular-buffers
generic-radix-tree
packing
Todolist:
idr
circular-buffers
generic-radix-tree
packing
this_cpu_ops
timekeeping
errseq
......
.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/packing.rst
:翻译:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:校译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
吴想成 Wu Xiangcheng <bobwxc@email.cn>
时奎亮 Alex Shi <alexs@kernel.org>
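================================================
Generic bitfield packing and unpacking functions
================================================

Problem statement
-----------------

When working with hardware, one has to choose between several approaches of
interfacing with it.

One can memory-map a pointer to a carefully crafted struct over the hardware
device's memory region, and access its fields as struct members (potentially
declared as bitfields). But writing code this way would make it less portable,
due to potential endianness mismatches between the CPU and the hardware device.

Additionally, one has to pay close attention when translating register
definitions from the hardware documentation into bit field indices for the
structs. Also, some hardware (typically networking equipment) tends to group
its register fields in ways that violate any reasonable word boundaries
(sometimes even 64 bit ones). This creates the inconvenience of having to
define "high" and "low" portions of register fields within the struct.

A more robust alternative to struct field definitions would be to extract the
required fields by shifting the appropriate number of bits. But this would
still not protect from endianness mismatches, except if all memory accesses
were performed byte-by-byte. Also the code can easily get cluttered, and the
high-level idea might get lost among the many bit shifts required.

Many drivers take the bit-shifting approach and then attempt to reduce the
clutter with tailored macros, but more often than not these macros take
shortcuts that still prevent the code from being truly portable.

The solution
------------

This API deals with 2 basic operations:

- Packing a CPU-usable number into a memory buffer (with hardware
  constraints/quirks).
- Unpacking a memory buffer (which has hardware constraints/quirks) into a
  CPU-usable number.

The API offers an abstraction over said hardware constraints and quirks and
over CPU endianness, and therefore over possible mismatches between the two.

The basic unit of these API functions is the u64. From the CPU's perspective,
bit 63 always means bit offset 7 of byte 7, albeit only logically. The question
is: where do we lay this bit out in memory?

The following examples cover the memory layout of a packed u64 field.
The byte offsets in the packed buffer are always implicitly 0, 1, ... 7.
What the examples show is where the logical bytes and bits sit.
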
1. Normally (no quirks), we would do it like this:

::

  63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
  7                       6                       5                       4
  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  3                       2                       1                       0

That is, the MSByte (7) of the CPU-usable u64 sits at memory offset 0, and the
LSByte (0) of the u64 sits at memory offset 7.
This corresponds to what most people would regard as "big endian", where
bit i corresponds to the number 2^i. This is also referred to in the code
comments as "logical" notation.

2. If QUIRK_MSB_ON_THE_RIGHT is set, we do it like this:

::

  56 57 58 59 60 61 62 63 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 32 33 34 35 36 37 38 39
  7                       6                       5                       4
  24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23  8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7
  3                       2                       1                       0

That is, QUIRK_MSB_ON_THE_RIGHT does not affect byte positioning, but
inverts bit offsets inside a byte.

3. If QUIRK_LITTLE_ENDIAN is set, we do it like this:

::

  39 38 37 36 35 34 33 32 47 46 45 44 43 42 41 40 55 54 53 52 51 50 49 48 63 62 61 60 59 58 57 56
  4                       5                       6                       7
   7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
  0                       1                       2                       3

Therefore, QUIRK_LITTLE_ENDIAN means that inside the memory region, every
byte from each 4-byte word is placed at its mirrored position compared to
the boundary of that word.

4. If QUIRK_MSB_ON_THE_RIGHT and QUIRK_LITTLE_ENDIAN are both set, we do it
   like this:

::

  32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
  4                       5                       6                       7
   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  0                       1                       2                       3

5. If just QUIRK_LSW32_IS_FIRST is set, we do it like this:

::

  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  3                       2                       1                       0
  63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
  7                       6                       5                       4

In this case the 8 byte memory region is interpreted as follows: the first
4 bytes correspond to the least significant 4-byte word, the next 4 bytes to
the more significant 4-byte word.

6. If QUIRK_LSW32_IS_FIRST and QUIRK_MSB_ON_THE_RIGHT are set, we do it like
   this:

::

  24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23  8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7
  3                       2                       1                       0
  56 57 58 59 60 61 62 63 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 32 33 34 35 36 37 38 39
  7                       6                       5                       4

7. If QUIRK_LSW32_IS_FIRST and QUIRK_LITTLE_ENDIAN are set, it looks like
   this:

::

   7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
  0                       1                       2                       3
  39 38 37 36 35 34 33 32 47 46 45 44 43 42 41 40 55 54 53 52 51 50 49 48 63 62 61 60 59 58 57 56
  4                       5                       6                       7

8. If QUIRK_LSW32_IS_FIRST, QUIRK_LITTLE_ENDIAN and QUIRK_MSB_ON_THE_RIGHT
   are set, it looks like this:

::

   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  0                       1                       2                       3
  32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
  4                       5                       6                       7

We always think of our offsets as if there were no quirk, and we translate
them afterwards, before accessing the memory region.

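
The eight layouts above follow from three independent translations of a quirk-free bit location. The following userspace sketch illustrates that idea only; it is not the kernel's packing() implementation, it always packs a full 64-bit value, and the ``DEMO_QUIRK_*`` values and ``demo_pack_u64()`` are hypothetical names chosen for this example.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative flag values for this sketch; the kernel defines its own QUIRK_* bits. */
#define DEMO_QUIRK_MSB_ON_THE_RIGHT	(1u << 0)
#define DEMO_QUIRK_LITTLE_ENDIAN	(1u << 1)
#define DEMO_QUIRK_LSW32_IS_FIRST	(1u << 2)

/*
 * Pack all 64 bits of val into an 8-byte buffer.  Each logical bit is
 * first located as if there were no quirks, then the location is
 * translated according to the quirks that are set.
 */
static void demo_pack_u64(uint8_t buf[8], uint64_t val, unsigned int quirks)
{
	memset(buf, 0, 8);

	for (unsigned int bit = 0; bit < 64; bit++) {
		unsigned int off = 7 - bit / 8;	/* no quirks: MSByte at offset 0 */
		unsigned int pos = bit % 8;	/* no quirks: bit i has weight 2^i */

		if (quirks & DEMO_QUIRK_LITTLE_ENDIAN)
			off = (off & ~3u) | (3 - (off & 3u));	/* mirror bytes in each 4-byte word */
		if (quirks & DEMO_QUIRK_LSW32_IS_FIRST)
			off ^= 4;				/* swap the two 4-byte words */
		if (quirks & DEMO_QUIRK_MSB_ON_THE_RIGHT)
			pos = 7 - pos;				/* reverse bits inside the byte */

		if (val & (UINT64_C(1) << bit))
			buf[off] |= (uint8_t)(1u << pos);
	}
}
```

For example, packing ``0x0102030405060708`` with no quirks yields the bytes ``01 02 ... 08`` in memory order, while setting both the little-endian and LSW32-first quirks yields ``08 07 ... 01``.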
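
Intended use
------------

Drivers that opt to use this API first need to identify which combination of
the 3 quirks above (8 combinations in total) matches what the hardware
documentation describes. Then they should wrap the packing() function,
creating a new xxx_packing() that calls it with the proper QUIRK_* one-hot
bits set.

The packing() function returns an int-encoded error code, which protects the
programmer against incorrect API use.  The errors are not expected to occur
during runtime, therefore it is reasonable for xxx_packing() to return void
and simply swallow those errors. Optionally it can dump stack or print the
error description.
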
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/changesets.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
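=====================
Devicetree Changesets
=====================

A Devicetree changeset is a method which allows one to apply changes in the
live tree in such a way that either the full set of changes will be applied,
or none of them will be. If an error occurs partway through applying the
changeset, then the tree will be rolled back to the previous state. A
changeset can also be removed after it has been applied.

When a changeset is applied, all of the changes get applied to the tree at
once before emitting OF_RECONFIG notifiers. This is so that the receiver sees
a complete and consistent state of the tree when it receives the notifier.

The sequence of a changeset is as follows.

1. of_changeset_init() - initializes a changeset.

2. A number of DT tree change calls, of_changeset_attach_node(),
   of_changeset_detach_node(), of_changeset_add_property(),
   of_changeset_remove_property(), of_changeset_update_property(), to prepare
   a set of changes. No changes to the active tree are made at this point.
   All the change operations are recorded in the of_changeset `entries` list.

3. of_changeset_apply() - applies the changes to the tree. Either the entire
   changeset will get applied, or if there is an error the tree will be
   restored to the previous state. The core ensures proper serialization
   through locking. An unlocked version, __of_changeset_apply, is available
   if needed.

If a successfully applied changeset needs to be removed, it can be done with
of_changeset_revert().
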
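
The all-or-nothing behaviour described above can be illustrated with a toy model. This is only a sketch of the general technique (record the old state while applying, undo in reverse order on failure); it does not use or reproduce the kernel's of_changeset code, the "tree" is just an array of integers, and every ``demo_*`` name is hypothetical.

```c
#include <stddef.h>

/* One recorded change: set tree[index] to new_value, remembering the old value. */
struct demo_change {
	size_t index;
	int new_value;
	int old_value;	/* filled in while applying */
};

/*
 * Apply all changes or none.  If a change cannot be applied, the ones
 * already applied are undone in reverse order and -1 is returned.
 */
static int demo_changeset_apply(int *tree, size_t tree_len,
				struct demo_change *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (c[i].index >= tree_len)
			goto rollback;
		c[i].old_value = tree[c[i].index];
		tree[c[i].index] = c[i].new_value;
	}
	return 0;

rollback:
	while (i-- > 0)
		tree[c[i].index] = c[i].old_value;
	return -1;
}

/* Remove a changeset that was applied successfully, newest change first. */
static void demo_changeset_revert(int *tree, struct demo_change *c, size_t n)
{
	while (n-- > 0)
		tree[c[n].index] = c[n].old_value;
}
```

Undoing in reverse order matters when two changes touch the same entry: the oldest saved value is the one restored last.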
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/dynamic-resolution-notes.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
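==================================
Devicetree Dynamic Resolver Notes
==================================

This document describes the implementation of the in-kernel DeviceTree
resolver, residing in drivers/of/resolver.c.

How the resolver works
----------------------

The resolver is given as an input an arbitrary tree compiled with the proper
dtc option and having a /plugin/ tag. This generates the appropriate
__fixups__ and __local_fixups__ nodes.

In sequence the resolver works by the following steps:

1. Get the maximum device tree phandle value from the live tree, plus 1.
2. Adjust all the local phandles of the tree to be resolved by that amount.
3. Using the __local_fixups__ node information, adjust all local references
   by the same amount.
4. For each property in the __fixups__ node, locate the node it references
   in the live tree. This is the label used to tag the node.
5. Retrieve the phandle of the target of the fixup.
6. For each fixup in the property, locate the node:property:offset location
   and replace it with the phandle value.
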
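
Steps 1 to 3 above amount to shifting every phandle in the incoming tree past the largest phandle already in use. The following is a simplified userspace sketch of that arithmetic, assuming phandles are held in flat arrays; the ``demo_*`` functions are hypothetical and do not correspond to functions in drivers/of/resolver.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: the largest phandle found in the (modelled) live tree, plus one. */
static uint32_t demo_phandle_delta(const uint32_t *live, size_t n)
{
	uint32_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (live[i] > max)
			max = live[i];
	return max + 1;
}

/* Steps 2 and 3: shift every local phandle or local reference by delta. */
static void demo_adjust_phandles(uint32_t *local, size_t n, uint32_t delta)
{
	for (size_t i = 0; i < n; i++) {
		assert(local[i] <= UINT32_MAX - delta);	/* no wrap-around */
		local[i] += delta;
	}
}
```

Because the same offset is applied to the phandles and to the references to them, the incoming tree stays internally consistent while no longer colliding with the live tree.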
......@@ -24,21 +24,16 @@ Open Firmware 和 Devicetree
usage-model
of_unittest
Todolist:
* kernel-api
kernel-api
Devicetree Overlays
===================
.. toctree::
:maxdepth: 1
Todolist:
* changesets
* dynamic-resolution-notes
* overlay-notes
changesets
dynamic-resolution-notes
overlay-notes
Devicetree Bindings
===================
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/kernel-api.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
=====================
DeviceTree Kernel API
=====================

Core functions
--------------

This API is defined in the following kernel code:

drivers/of/base.c

include/linux/of.h

drivers/of/property.c

include/linux/of_graph.h

drivers/of/address.c

drivers/of/irq.c

drivers/of/fdt.c

Driver model functions
----------------------

This API is defined in the following kernel code:

include/linux/of_device.h

drivers/of/device.c

include/linux/of_platform.h

drivers/of/platform.c

Overlay and Dynamic DT functions
--------------------------------

This API is defined in the following kernel code:

drivers/of/resolver.c

drivers/of/dynamic.c

drivers/of/overlay.c

.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/overlay-notes.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
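========================
Devicetree Overlay Notes
========================

This document describes the implementation of the in-kernel device tree
overlay functionality residing in drivers/of/overlay.c and is a companion
document to Documentation/devicetree/dynamic-resolution-notes.rst[1].

How overlays work
-----------------

The purpose of a Devicetree overlay is to modify the kernel's live tree, and
to have the modification affect the state of the kernel in a way that
reflects the changes.
Since the kernel mainly deals with devices, any new device node that results
in an active device should have that device created, while if a device node is
either disabled or removed altogether, the affected device should be
deregistered.

Let's take an example where we have a foo board with the following base tree::

    ---- foo.dts ---------------------------------------------------------------
        /* FOO platform */
        /dts-v1/;
        / {
                compatible = "corp,foo";

                /* shared resources */
                res: res {
                };

                /* On chip peripherals */
                ocp: ocp {
                        /* peripherals that are always instantiated */
                        peripheral1 { ... };
                };
        };
    ---- foo.dts ---------------------------------------------------------------

The overlay bar.dts,
::

    ---- bar.dts - overlay target location by label ----------------------------
        /dts-v1/;
        /plugin/;
        &ocp {
                /* bar peripheral */
                bar {
                        compatible = "corp,bar";
                        ... /* various properties and child nodes */
                };
        };
    ---- bar.dts ---------------------------------------------------------------
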
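when loaded (and resolved as described in [1]) should result in foo+bar.dts::

    ---- foo+bar.dts -----------------------------------------------------------
        /* FOO platform + bar peripheral */
        / {
                compatible = "corp,foo";

                /* shared resources */
                res: res {
                };

                /* On chip peripherals */
                ocp: ocp {
                        /* peripherals that are always instantiated */
                        peripheral1 { ... };

                        /* bar peripheral */
                        bar {
                                compatible = "corp,bar";
                                ... /* various properties and child nodes */
                        };
                };
        };
    ---- foo+bar.dts -----------------------------------------------------------

As a result of the overlay, a new device node (bar) has been created, so a bar
platform device will be registered, and if a matching device driver is loaded
the device will be created as expected.

If the base DT was not compiled with the -@ option then the "&ocp" label
will not be available to resolve the overlay node(s) to the proper location
in the base DT. In this case, the target path can be provided. The target
location by label syntax is preferred because the overlay can be applied to
any base DT containing the label, no matter where the label occurs in the DT.

The above bar.dts example modified to use target path syntax is::

    ---- bar.dts - overlay target location by explicit path --------------------
        /dts-v1/;
        /plugin/;
        &{/ocp} {
                /* bar peripheral */
                bar {
                        compatible = "corp,bar";
                        ... /* various properties and child nodes */
                };
        };
    ---- bar.dts ---------------------------------------------------------------
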
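
Overlay in-kernel API
---------------------

The API is quite easy to use.

1) Call of_overlay_fdt_apply() to create and apply an overlay changeset. The
   return value is an error or a cookie identifying this overlay.

2) Call of_overlay_remove() to remove and clean up the overlay changeset
   previously created via the call to of_overlay_fdt_apply(). Removal of an
   overlay changeset that is stacked by another will not be permitted.

Finally, if you need to remove all overlays in one go, just call
of_overlay_remove_all(), which will remove every single one in the correct
order.

There is the option to register notifiers that get called on overlay
operations. See of_overlay_notifier_register/unregister and
enum of_overlay_notify_action for details.

A notifier callback for OF_OVERLAY_PRE_APPLY, OF_OVERLAY_POST_APPLY, or
OF_OVERLAY_PRE_REMOVE may store pointers to a device tree node in the overlay
or its content, but these pointers must not persist past the notifier callback
for OF_OVERLAY_POST_REMOVE.  The memory containing the overlay will be
kfree()ed after OF_OVERLAY_POST_REMOVE notifiers are called.  Note that the
memory will be kfree()ed even if the notifier for OF_OVERLAY_POST_REMOVE
returns an error.

The changeset notifiers in drivers/of/dynamic.c are a second type of notifier
that could be triggered by applying or removing an overlay.  These notifiers
are not allowed to store pointers to a device tree node in the overlay
or its content.  The overlay code does not protect against such pointers
remaining active when the memory containing the overlay is freed as a result
of removing the overlay.

Any other code that retains a pointer to the overlay nodes or data is
considered to be a bug because after removing the overlay the pointer
will refer to freed memory.

Users of overlays must be especially aware of the overall operations that
occur on the system to ensure that other kernel code does not retain any
pointers to the overlay nodes or data.  An example of an inadvertent use
of such pointers is a driver or subsystem module that is loaded after an
overlay has been applied and that scans the entire devicetree, or a large
portion of it, including the overlay nodes.
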
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../../disclaimer-zh_CN.rst
:Original: Documentation/driver-api/gpio/index.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
===================================
General Purpose Input/Output (GPIO)
===================================

Contents:

.. toctree::
:maxdepth: 2
legacy
Todolist:
* intro
* using-gpio
* driver
* consumer
* board
* drivers-on-gpio
* bt8xxgpio
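
Core
====

This API is defined in the following kernel code:

include/linux/gpio/driver.h

drivers/gpio/gpiolib.c

ACPI support
============

This API is defined in the following kernel code:

drivers/gpio/gpiolib-acpi.c

Device tree support
===================

This API is defined in the following kernel code:

drivers/gpio/gpiolib-of.c

Device-managed API
==================

This API is defined in the following kernel code:

drivers/gpio/gpiolib-devres.c

sysfs helpers
=============

This API is defined in the following kernel code:

drivers/gpio/gpiolib-sysfs.c
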
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/driver-api/index.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
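==============================
Driver implementer's API guide
==============================

The kernel offers a wide variety of interfaces to support the development of
device drivers.  This document is an only somewhat organized collection of
some of those interfaces; it will hopefully get better over time.  The
available subsections can be seen below.

.. class:: toc-title

	   Table of contents
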
.. toctree::
:maxdepth: 2
gpio/index
io_ordering
Todolist:
* driver-model/index
* basics
* infrastructure
* ioctl
* early-userspace/index
* pm/index
* clk
* device-io
* dma-buf
* device_link
* component
* message-based
* infiniband
* aperture
* frame-buffer
* regulator
* reset
* iio/index
* input
* usb/index
* firewire
* pci/index
* cxl/index
* spi
* i2c
* ipmb
* ipmi
* i3c/index
* interconnect
* devfreq
* hsi
* edac
* scsi
* libata
* target
* mailbox
* mtdnand
* miscellaneous
* mei/index
* mtd/index
* mmc/index
* nvdimm/index
* w1
* rapidio/index
* s390-drivers
* vme
* 80211/index
* uio-howto
* firmware/index
* pin-control
* md/index
* media/index
* misc_devices
* nfc/index
* dmaengine/index
* slimbus
* soundwire/index
* thermal/index
* fpga/index
* acpi/index
* auxiliary_bus
* backlight/lp855x-driver.rst
* connector
* console
* dcdbas
* eisa
* isa
* isapnp
* io-mapping
* generic-counter
* memory-devices/index
* men-chameleon-bus
* ntb
* nvmem
* parport-lowlevel
* pps
* ptp
* phy/index
* pwm
* pldmfw/index
* rfkill
* serial/index
* sm501
* surface_aggregator/index
* switchtec
* sync_file
* tty/index
* vfio-mediated-device
* vfio
* vfio-pci-device-specific-driver-acceptance
* xilinx/index
* xillybus
* zorro
* hte/index
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/driver-api/io_ordering.rst
:翻译:
林永听 Lin Yongting <linyongting@gmail.com>
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
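==============================================
Ordering I/O writes to memory-mapped addresses
==============================================

On some platforms, so-called memory-mapped I/O is weakly ordered.  On such
platforms, driver writers are responsible for ensuring that I/O writes to
memory-mapped addresses on their device arrive in the order intended.  This is
typically done by reading a "safe" device or bridge register, causing the I/O
chipset to flush pending writes to the device before any reads are posted.  A
driver would usually use this technique immediately prior to the exit of a
critical section of code protected by spinlocks.  This ensures that
subsequent writes to I/O space arrive only after all prior writes (much like a
memory barrier op, mb(), only with respect to I/O).

A more concrete example from a hypothetical device driver::

                ...
        CPU A:  spin_lock_irqsave(&dev_lock, flags)
        CPU A:  val = readl(my_status);
        CPU A:  ...
        CPU A:  writel(newval, ring_ptr);
        CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
                ...
        CPU B:  spin_lock_irqsave(&dev_lock, flags)
        CPU B:  val = readl(my_status);
        CPU B:  ...
        CPU B:  writel(newval2, ring_ptr);
        CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
                ...

In the case above, the device may receive newval2 before it receives newval,
which could cause problems.  Fixing it is easy enough though::

                ...
        CPU A:  spin_lock_irqsave(&dev_lock, flags)
        CPU A:  val = readl(my_status);
        CPU A:  ...
        CPU A:  writel(newval, ring_ptr);
        CPU A:  (void)readl(safe_register); /* maybe a config register? */
        CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
                ...
        CPU B:  spin_lock_irqsave(&dev_lock, flags)
        CPU B:  val = readl(my_status);
        CPU B:  ...
        CPU B:  writel(newval2, ring_ptr);
        CPU B:  (void)readl(safe_register); /* maybe a config register? */
        CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Here, the reads from safe_register will cause the I/O chipset to flush any
pending writes before actually posting the read to the chipset, preventing
possible data corruption.
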
......@@ -108,6 +108,7 @@ TODOList:
:maxdepth: 2
core-api/index
driver-api/index
locking/index
accounting/index
cpu-freq/index
......@@ -120,10 +121,10 @@ TODOList:
scheduler/index
mm/index
peci/index
PCI/index
TODOList:
* driver-api/index
* block/index
* cdrom/index
* ide/index
......@@ -148,7 +149,6 @@ TODOList:
* crypto/index
* bpf/index
* usb/index
* PCI/index
* scsi/index
* misc-devices/index
* mhi/index
......
Chinese translated version of Documentation/driver-api/io_ordering.rst
If you have any comment or update to the content, please contact the
original document maintainer directly. However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help. Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.
Chinese maintainer: Lin Yongting <linyongting@gmail.com>
---------------------------------------------------------------------
Documentation/driver-api/io_ordering.rst 的中文翻译
如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文
交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻
译存在问题,请联系中文版维护者。
中文版维护者: 林永听 Lin Yongting <linyongting@gmail.com>
中文版翻译者: 林永听 Lin Yongting <linyongting@gmail.com>
中文版校译者: 林永听 Lin Yongting <linyongting@gmail.com>
以下为正文
---------------------------------------------------------------------
在某些平台上,所谓的内存映射I/O是弱顺序。在这些平台上,驱动开发者有责任
保证I/O内存映射地址的写操作按程序图意的顺序达到设备。通常读取一个“安全”
设备寄存器或桥寄存器,触发IO芯片清刷未处理的写操作到达设备后才处理读操作,
而达到保证目的。驱动程序通常在spinlock保护的临界区退出之前使用这种技术。
这也可以保证后面的写操作只在前面的写操作之后到达设备(这非常类似于内存
屏障操作,mb(),不过仅适用于I/O)。
假设一个设备驱动程的具体例子:
...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: val = readl(my_status);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: val = readl(my_status);
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
...
上述例子中,设备可能会先接收到newval2的值,然后接收到newval的值,问题就
发生了。不过很容易通过下面方法来修复:
...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: val = readl(my_status);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: (void)readl(safe_register); /* 配置寄存器?*/
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: val = readl(my_status);
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: (void)readl(safe_register); /* 配置寄存器?*/
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
在解决方案中,读取safe_register寄存器,触发IO芯片清刷未处理的写操作,
再处理后面的读操作,防止引发数据不一致问题。
......@@ -80,7 +80,7 @@ p->se.vruntime。一旦p->se.vruntime变得足够大,其它的任务将成为
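CFS uses nanosecond granularity accounting and does not rely on any jiffies or
other HZ detail. Thus the CFS scheduler has no notion of "timeslices" in the
way the previous scheduler had, and has no heuristics whatsoever. There is
only one central tunable (you have to switch on CONFIG_SCHED_DEBUG):
-/proc/sys/kernel/sched_min_granularity_ns
+/sys/kernel/debug/sched/min_granularity_ns
which can be used to tune the scheduler from "desktop" (i.e., low latencies)
to "server" (i.e., good batching) workloads. It defaults to a setting suitable
for desktop workloads. SCHED_BATCH is handled by the CFS scheduler module too.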
......
......@@ -377,7 +377,7 @@ Emulating cr0.wp
================
If tdp is not enabled, the host must keep cr0.wp=1 so page write protection
-works for the guest kernel, not guest guest userspace. When the guest
+works for the guest kernel, not guest userspace. When the guest
cr0.wp=1, this does not present a problem. However when the guest cr0.wp=0,
we cannot map the permissions for gpte.u=1, gpte.w=0 to any spte (the
semantics require allowing any guest kernel access plus user read access).
......
......@@ -52,7 +52,7 @@ Notes and limitations.
clear the entire bulk in buffer. It would be possible to read the
maximum buffer size to not run into this error condition, only extra
bytes in the buffer is a logic error in the driver. The code should
-should match reads and writes as well as data sizes. Reads and
+match reads and writes as well as data sizes. Reads and
writes are serialized and the status verifies that the chip is idle
(and data is available) before the read is executed, so it should
not happen.
......
......@@ -113,7 +113,7 @@ generally only make sense when searching is disabled, as a search will
redetect manually removed devices that are present and timeout manually
added devices that aren't on the bus.
-Bus searches occur at an interval, specified as a summ of timeout and
+Bus searches occur at an interval, specified as a sum of timeout and
timeout_us module parameters (either of which may be 0) for as long as
w1_master_search remains greater than 0 or is -1. Each search attempt
decrements w1_master_search by 1 (down to 0) and increments
......
......@@ -33,8 +33,8 @@ Some of these entries are:
- interrupt: An array of entries. Every IDT vector that doesn't
explicitly point somewhere else gets set to the corresponding
value in interrupts. These point to a whole array of
-magically-generated functions that make their way to do_IRQ with
-the interrupt number as a parameter.
+magically-generated functions that make their way to common_interrupt()
+with the interrupt number as a parameter.
- APIC interrupts: Various special-purpose interrupts for things
like TLB shootdown.
......