Commit aad26f55 authored by Linus Torvalds

Merge tag 'docs-6.0' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "This was a moderately busy cycle for documentation, but nothing
  all that earth-shaking:

   - More Chinese translations, and an update to the Italian
     translations.

     The Japanese, Korean, and traditional Chinese translations
     are, by contrast, more-or-less unmaintained at this point.

   - Some build-system performance improvements.

   - The removal of the archaic submitting-drivers.rst document,
     with the movement of what useful material that remained into
     other docs.

   - Improvements to sphinx-pre-install to, hopefully, give more
     useful suggestions.

   - A number of build-warning fixes

  Plus the usual collection of typo fixes, updates, and more"

* tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits)
  docs: efi-stub: Fix paths for x86 / arm stubs
  Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8
  Docs/zh_CN: Update the translation of pci to 5.19-rc8
  Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8
  Docs/zh_CN: Update the translation of usage to 5.19-rc8
  Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8
  Docs/zh_CN: Update the translation of sparse to 5.19-rc8
  Docs/zh_CN: Update the translation of kasan to 5.19-rc8
  Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8
  doc:it_IT: align Italian documentation
  docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst
  Documentation: process: Update email client instructions for Thunderbird
  docs: ABI: correct QEMU fw_cfg spec path
  doc/zh_CN: remove submitting-driver reference from docs
  docs: zh_TW: align to submitting-drivers removal
  docs: zh_CN: align to submitting-drivers removal
  docs: ko_KR: howto: remove reference to removed submitting-drivers
  docs: ja_JP: howto: remove reference to removed submitting-drivers
  docs: it_IT: align to submitting-drivers removal
  docs: process: remove outdated submitting-drivers.rst
  ...
parents b0691222 339170d8
......@@ -12,8 +12,9 @@ Description:
configuration data to the guest userspace.
The authoritative guest-side hardware interface documentation
to the fw_cfg device can be found in "docs/specs/fw_cfg.txt"
in the QEMU source tree.
to the fw_cfg device can be found in "docs/specs/fw_cfg.rst"
in the QEMU source tree, or online at:
https://qemu-project.gitlab.io/qemu/specs/fw_cfg.html
**SysFS fw_cfg Interface**
......
config WARN_MISSING_DOCUMENTS
bool "Warn if there's a missing documentation file"
depends on COMPILE_TEST
help
......
......@@ -7,10 +7,9 @@ This list is the Linux Device List, the official registry of allocated
device numbers and ``/dev`` directory nodes for the Linux operating
system.
The LaTeX version of this document is no longer maintained, nor is
the document that used to reside at lanana.org. This version in the
mainline Linux kernel is the master document. Updates shall be sent
as patches to the kernel maintainers (see the
The version of this document at lanana.org is no longer maintained. This
version in the mainline Linux kernel is the master document. Updates
shall be sent as patches to the kernel maintainers (see the
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` document).
Specifically explore the sections titled "CHAR and MISC DRIVERS", and
"BLOCK LAYER" in the MAINTAINERS file to find the right maintainers
......
......@@ -7,10 +7,10 @@ as a PE/COFF image, thereby convincing EFI firmware loaders to load
it as an EFI executable. The code that modifies the bzImage header,
along with the EFI-specific entry point that the firmware loader
jumps to are collectively known as the "EFI boot stub", and live in
arch/x86/boot/header.S and arch/x86/boot/compressed/eboot.c,
arch/x86/boot/header.S and drivers/firmware/efi/libstub/x86-stub.c,
respectively. For ARM the EFI stub is implemented in
arch/arm/boot/compressed/efi-header.S and
arch/arm/boot/compressed/efi-stub.c. EFI stub code that is shared
drivers/firmware/efi/libstub/arm32-stub.c. EFI stub code that is shared
between architectures is in drivers/firmware/efi/libstub.
For arm64, there is no compressed kernel support, so the Image itself
......
......@@ -3109,7 +3109,7 @@
mem_encrypt=on: Activate SME
mem_encrypt=off: Do not activate SME
Refer to Documentation/virt/kvm/amd-memory-encryption.rst
Refer to Documentation/virt/kvm/x86/amd-memory-encryption.rst
for details on when memory encryption can be activated.
mem_sleep_default= [SUSPEND] Default system suspend mode:
......
......@@ -38,8 +38,8 @@ acct
If BSD-style process accounting is enabled these values control
its behaviour. If free space on filesystem where the log lives
goes below ``lowwater``% accounting suspends. If free space gets
above ``highwater``% accounting resumes. ``frequency`` determines
goes below ``lowwater``\ % accounting suspends. If free space gets
above ``highwater``\ % accounting resumes. ``frequency`` determines
how often do we check the amount of free space (value is in
seconds). Default:
......
......@@ -171,96 +171,73 @@ HWCAP_PACG
Documentation/arm64/pointer-authentication.rst.
HWCAP2_DCPODP
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.
HWCAP2_SVE2
Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001.
HWCAP2_SVEAES
Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001.
HWCAP2_SVEPMULL
Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010.
HWCAP2_SVEBITPERM
Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001.
HWCAP2_SVESHA3
Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001.
HWCAP2_SVESM4
Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001.
HWCAP2_FLAGM2
Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010.
HWCAP2_FRINT
Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001.
HWCAP2_SVEI8MM
Functionality implied by ID_AA64ZFR0_EL1.I8MM == 0b0001.
HWCAP2_SVEF32MM
Functionality implied by ID_AA64ZFR0_EL1.F32MM == 0b0001.
HWCAP2_SVEF64MM
Functionality implied by ID_AA64ZFR0_EL1.F64MM == 0b0001.
HWCAP2_SVEBF16
Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0001.
HWCAP2_I8MM
Functionality implied by ID_AA64ISAR1_EL1.I8MM == 0b0001.
HWCAP2_BF16
Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0001.
HWCAP2_DGH
Functionality implied by ID_AA64ISAR1_EL1.DGH == 0b0001.
HWCAP2_RNG
Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001.
HWCAP2_BTI
Functionality implied by ID_AA64PFR1_EL1.BT == 0b0001.
HWCAP2_MTE
Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
by Documentation/arm64/memory-tagging-extension.rst.
HWCAP2_ECV
Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001.
HWCAP2_AFP
Functionality implied by ID_AA64MMFR1_EL1.AFP == 0b0001.
HWCAP2_RPRES
Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001.
HWCAP2_MTE3
Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described
by Documentation/arm64/memory-tagging-extension.rst.
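As an illustration of how userspace might consume the HWCAP2 bits listed above, here is a minimal C sketch. It assumes glibc's ``getauxval()`` and ``AT_HWCAP2`` from ``<sys/auxv.h>``. The helper names ``hwcap2_has()`` and ``cpu_has_sve2()`` are hypothetical, and the fallback definition of ``HWCAP2_SVE2`` is an assumption that mirrors the arm64 UAPI value so the sketch also builds on other hosts.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/*
 * Assumption: fallback for hosts whose headers do not define the arm64
 * hwcap bits. On arm64 the system headers normally provide this value.
 */
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)
#endif

/* Hypothetical helper: test one feature bit in an AT_HWCAP2 word. */
bool hwcap2_has(unsigned long hwcap2, unsigned long feature)
{
	return (hwcap2 & feature) != 0;
}

/* Hypothetical helper: report SVE2 support as seen by this process. */
bool cpu_has_sve2(void)
{
#if defined(__aarch64__)
	/* getauxval() returns 0 when the entry is absent from the aux vector. */
	return hwcap2_has(getauxval(AT_HWCAP2), HWCAP2_SVE2);
#else
	/* The same bit position means something else on other architectures. */
	return false;
#endif
}
```

Keeping the bit test separate from the ``getauxval()`` call makes the check easy to exercise with synthetic values.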
......
......@@ -66,7 +66,7 @@ The wiki documentation always refers to the linux-next version of the script.
For Semantic Patch Language(SmPL) grammar documentation refer to:
http://coccinelle.lip6.fr/documentation.php
https://coccinelle.gitlabpages.inria.fr/website/docs/main_grammar.html
Using Coccinelle on the Linux kernel
------------------------------------
......
......@@ -208,6 +208,14 @@ In general, the rules for selftests are
Contributing new tests (details)
================================
* In your Makefile, use facilities from lib.mk by including it instead of
reinventing the wheel. Specify compiler flags and binary-generation variables
as needed before including lib.mk. ::
CFLAGS = $(KHDR_INCLUDES)
TEST_GEN_PROGS := close_range_test
include ../lib.mk
* Use TEST_GEN_XXX if such binaries or files are generated during
compiling.
......@@ -230,13 +238,30 @@ Contributing new tests (details)
* First use the headers inside the kernel source and/or git repo, and then the
system headers. Headers for the kernel release as opposed to headers
installed by the distro on the system should be the primary focus to be able
to find regressions.
to find regressions. Use KHDR_INCLUDES in Makefile to include headers from
the kernel source.
* If a test needs specific kernel config options enabled, add a config file in
the test directory to enable them.
e.g: tools/testing/selftests/android/config
* Create a .gitignore file inside test directory and add all generated objects
in it.
* Add new test name in TARGETS in selftests/Makefile::
TARGETS += android
* All changes should pass::
kselftest-{all,install,clean,gen_tar}
kselftest-{all,install,clean,gen_tar} O=abo_path
kselftest-{all,install,clean,gen_tar} O=rel_path
make -C tools/testing/selftests {all,install,clean,gen_tar}
make -C tools/testing/selftests {all,install,clean,gen_tar} O=abs_path
make -C tools/testing/selftests {all,install,clean,gen_tar} O=rel_path
Test Module
===========
......
......@@ -2,7 +2,7 @@
This module is part of the DA9061/DA9062/DA9063. For more details about entire
DA9062 and DA9061 chips see Documentation/devicetree/bindings/mfd/da9062.txt
For DA9063 see Documentation/devicetree/bindings/mfd/da9063.txt
For DA9063 see Documentation/devicetree/bindings/mfd/dlg,da9063.yaml
This module provides the KEY_POWER event.
......
.. title:: Kernel-doc comments
===========================
Writing kernel-doc comments
===========================
......
......@@ -132,7 +132,8 @@ format-specific subdirectories under ``Documentation/output``.
To generate documentation, Sphinx (``sphinx-build``) must obviously be
installed. For prettier HTML output, the Read the Docs Sphinx theme
(``sphinx_rtd_theme``) is used if available. For PDF output you'll also need
``XeLaTeX`` and ``convert(1)`` from ImageMagick (https://www.imagemagick.org).
``XeLaTeX`` and ``convert(1)`` from ImageMagick
(https://www.imagemagick.org).\ [#ink]_
All of these are widely available and packaged in distributions.
To pass extra options to Sphinx, you can use the ``SPHINXOPTS`` make
......@@ -150,8 +151,19 @@ If the theme is not available, it will fall-back to the classic one.
The Sphinx theme can be overridden by using the ``DOCS_THEME`` make variable.
There is another make variable ``SPHINXDIRS``, which is useful when test
building a subset of documentation. For example, you can build documents
under ``Documentation/doc-guide`` by running
``make SPHINXDIRS=doc-guide htmldocs``.
The documentation section of ``make help`` will show you the list of
subdirectories you can specify.
To remove the generated documentation, run ``make cleandocs``.
.. [#ink] Having ``inkscape(1)`` from Inkscape (https://inkscape.org)
as well would improve the quality of images embedded in PDF
documents, especially for kernel releases 5.18 and later.
Writing Documentation
=====================
......
......@@ -114,7 +114,7 @@ For a function using multiple GPIOs all of those can be obtained with one call::
This function returns a struct gpio_descs which contains an array of
descriptors. It also contains a pointer to a gpiolib private structure which,
if passed back to get/set array functions, may speed up I/O proocessing::
if passed back to get/set array functions, may speed up I/O processing::
struct gpio_descs {
struct gpio_array *info;
......
......@@ -119,7 +119,7 @@ GPIO lines with debounce support
Debouncing is a configuration set to a pin indicating that it is connected to
a mechanical switch or button, or similar that may bounce. Bouncing means the
line is pulled high/low quickly at very short intervals for mechanical
reasons. This can result in the value being unstable or irqs fireing repeatedly
reasons. This can result in the value being unstable or irqs firing repeatedly
unless the line is debounced.
Debouncing in practice involves setting up a timer when something happens on
......@@ -219,7 +219,7 @@ use a trick: when a line is set as output, if the line is flagged as open
drain, and the IN output value is low, it will be driven low as usual. But
if the IN output value is set to high, it will instead *NOT* be driven high,
instead it will be switched to input, as input mode is high impedance, thus
achieveing an "open drain emulation" of sorts: electrically the behaviour will
achieving an "open drain emulation" of sorts: electrically the behaviour will
be identical, with the exception of possible hardware glitches when switching
the mode of the line.
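The open drain emulation described above can be modelled in a few lines of C. This is only a sketch of the decision logic, not gpiolib code: the ``line_state`` enum and the ``emulate_set_output()`` function are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of what the pin ends up doing electrically. */
enum line_state {
	LINE_DRIVEN_LOW,	/* output stage actively pulls the line low */
	LINE_DRIVEN_HIGH,	/* output stage actively drives the line high */
	LINE_INPUT_HIGH_Z	/* line switched to input, i.e. high impedance */
};

/*
 * Hypothetical helper: decide how to realise an output value on a line.
 * For an open drain line a high value is never driven; the line is
 * switched to input instead, leaving it in high impedance.
 */
enum line_state emulate_set_output(bool open_drain, bool value)
{
	if (!value)
		return LINE_DRIVEN_LOW;
	return open_drain ? LINE_INPUT_HIGH_Z : LINE_DRIVEN_HIGH;
}
```

The mode switch in the open drain case is the point at which the hardware glitches mentioned above may occur.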
......@@ -642,7 +642,7 @@ In this case the typical set-up will look like this:
As you can see pretty similar, but you do not supply a parent handler for
the IRQ, instead a parent irqdomain, an fwnode for the hardware and
a funcion .child_to_parent_hwirq() that has the purpose of looking up
a function .child_to_parent_hwirq() that has the purpose of looking up
the parent hardware irq from a child (i.e. this gpio chip) hardware irq.
As always it is good to look at examples in the kernel tree for advice
on how to find the required pieces.
......
......@@ -44,7 +44,7 @@ These devices will appear on the system as ``/dev/gpiochip0`` thru
found in the kernel tree ``tools/gpio`` subdirectory.
For structured and managed applications, we recommend that you make use of the
libgpiod_ library. This provides helper abstractions, command line utlities
libgpiod_ library. This provides helper abstractions, command line utilities
and arbitration for multiple simultaneous consumers on the same GPIO chip.
.. _libgpiod: https://git.kernel.org/pub/scm/libs/libgpiod/libgpiod.git/
......@@ -25,8 +25,7 @@ and userspace consumers. The kernel space consumers can directly talk to HTE
subsystem while userspace consumers timestamp requests go through GPIOLIB CDEV
framework to HTE subsystem.
.. kernel-doc:: drivers/gpio/gpiolib.c
:functions: gpiod_enable_hw_timestamp_ns gpiod_disable_hw_timestamp_ns
See gpiod_enable_hw_timestamp_ns() and gpiod_disable_hw_timestamp_ns().
For userspace consumers, GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE flag must be
specified during IOCTL calls. Refer to ``tools/gpio/gpio-event-mon.c``, which
......@@ -37,7 +36,7 @@ LIC (Legacy Interrupt Controller) IRQ GTE
This GTE instance timestamps LIC IRQ lines in real time. There are 352 IRQ
lines which this instance can add timestamps to in real time. The hte
devicetree binding described at ``Documentation/devicetree/bindings/hte/``
devicetree binding described at ``Documentation/devicetree/bindings/timestamp``
provides an example of how a consumer can request an IRQ line. Since it is a
one-to-one mapping with IRQ GTE provider, consumers can simply specify the IRQ
number that they are interested in. There is no userspace consumer support for
......
......@@ -818,10 +818,11 @@ Compression implementation
Instead, the main goal is to reduce data writes to flash disk as much as
possible, resulting in extending disk life time as well as relaxing IO
congestion. Alternatively, we've added ioctl(F2FS_IOC_RELEASE_COMPRESS_BLOCKS)
interface to reclaim compressed space and show it to user after putting the
immutable bit. Immutable bit, after release, it doesn't allow writing/mmaping
on the file, until reserving compressed space via
ioctl(F2FS_IOC_RESERVE_COMPRESS_BLOCKS) or truncating filesize to zero.
interface to reclaim compressed space and show it to user after setting a
special flag to the inode. Once the compressed space is released, the flag
will block writing data to the file until either the compressed space is
reserved via ioctl(F2FS_IOC_RESERVE_COMPRESS_BLOCKS) or the file size is
truncated to zero.
Compress metadata layout::
......
......@@ -607,7 +607,7 @@ can be removed.
User xattr
----------
The the "-o userxattr" mount option forces overlayfs to use the
The "-o userxattr" mount option forces overlayfs to use the
"user.overlay." xattr namespace instead of "trusted.overlay.". This is
useful for unprivileged mounting of overlayfs.
......
......@@ -12,7 +12,6 @@ increase the chances of your change being accepted.
* It should be unnecessary to mention, but please read and follow:
- Documentation/process/submit-checklist.rst
- Documentation/process/submitting-drivers.rst
- Documentation/process/submitting-patches.rst
- Documentation/process/coding-style.rst
......
......@@ -755,8 +755,7 @@ make a neat patch, there's administrative work to be done:
it implies a more-than-passing commitment to some part of the code.
- Finally, don't forget to read
``Documentation/process/submitting-patches.rst`` and possibly
``Documentation/process/submitting-drivers.rst``.
``Documentation/process/submitting-patches.rst``.
Kernel Cantrips
===============
......
......@@ -10,8 +10,7 @@ of conventions and procedures which are used in the posting of patches;
following them will make life much easier for everybody involved. This
document will attempt to cover these expectations in reasonable detail;
more information can also be found in the files
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>`,
:ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>`
and :ref:`Documentation/process/submit-checklist.rst <submitchecklist>`.
......
......@@ -5,15 +5,13 @@ For more information
There are numerous sources of information on Linux kernel development and
related topics. First among those will always be the Documentation
directory found in the kernel source distribution. The top-level :ref:`process/howto.rst <process_howto>`
file is an important starting point; :ref:`process/submitting-patches.rst <submittingpatches>`
and :ref:`process/submitting-drivers.rst <submittingdrivers>`
are also something which all kernel developers should
read. Many internal kernel APIs are documented using the kerneldoc
mechanism; "make htmldocs" or "make pdfdocs" can be used to generate those
documents in HTML or PDF format (though the version of TeX shipped by some
distributions runs into internal limits and fails to process the documents
properly).
directory found in the kernel source distribution. Start with the
top-level :ref:`process/howto.rst <process_howto>`; also read
:ref:`process/submitting-patches.rst <submittingpatches>`. Many internal
kernel APIs are documented using the kerneldoc mechanism; "make htmldocs"
or "make pdfdocs" can be used to generate those documents in HTML or PDF
format (though the version of TeX shipped by some distributions runs into
internal limits and fails to process the documents properly).
Various web sites discuss kernel development at all levels of detail. Your
author would like to humbly suggest https://lwn.net/ as a source;
......
......@@ -277,36 +277,61 @@ Thunderbird (GUI)
Thunderbird is an Outlook clone that likes to mangle text, but there are ways
to coerce it into behaving.
After doing the modifications, this includes installing the extensions,
you need to restart Thunderbird.
- Allow use of an external editor:
The easiest thing to do with Thunderbird and patches is to use an
"external editor" extension and then just use your favorite ``$EDITOR``
for reading/merging patches into the body text. To do this, download
and install the extension, then add a button for it using
:menuselection:`View-->Toolbars-->Customize...` and finally just click on it
when in the :menuselection:`Compose` dialog.
Please note that "external editor" requires that your editor must not
The easiest thing to do with Thunderbird and patches is to use extensions
which open your favorite external editor.
Here are some example extensions which are capable of doing this.
- "External Editor Revived"
https://github.com/Frederick888/external-editor-revived
https://addons.thunderbird.net/en-GB/thunderbird/addon/external-editor-revived/
It requires installing a "native messaging host".
Please read the wiki which can be found here:
https://github.com/Frederick888/external-editor-revived/wiki
- "External Editor"
https://github.com/exteditor/exteditor
To do this, download and install the extension, then open the
:menuselection:`compose` window, add a button for it using
:menuselection:`View-->Toolbars-->Customize...`
then just click on the new button when you wish to use the external editor.
Please note that "External Editor" requires that your editor must not
fork, or in other words, the editor must not return before closing.
You may have to pass additional flags or change the settings of your
editor. Most notably if you are using gvim then you must pass the -f
option to gvim by putting ``/usr/bin/gvim -f`` (if the binary is in
option to gvim by putting ``/usr/bin/gvim --nofork`` (if the binary is in
``/usr/bin``) to the text editor field in :menuselection:`external editor`
settings. If you are using some other editor then please read its manual
to find out how to do this.
To beat some sense out of the internal editor, do this:
- Edit your Thunderbird config settings so that it won't use ``format=flowed``.
Go to :menuselection:`edit-->preferences-->advanced-->config editor` to bring up
the thunderbird's registry editor.
- Edit your Thunderbird config settings so that it won't use ``format=flowed``!
Go to your main window and find the button for your main dropdown menu.
:menuselection:`Main Menu-->Preferences-->General-->Config Editor...`
to bring up Thunderbird's registry editor.
- Set ``mailnews.send_plaintext_flowed`` to ``false``
- Set ``mailnews.send_plaintext_flowed`` to ``false``
- Set ``mailnews.wraplength`` from ``72`` to ``0``
- Set ``mailnews.wraplength`` from ``72`` to ``0``
- :menuselection:`View-->Message Body As-->Plain Text`
- Don't write HTML messages! Go to the main window
:menuselection:`Main Menu-->Account Settings-->youracc@server.something-->Composition & Addressing`!
There you can disable the option "Compose messages in HTML format".
- :menuselection:`View-->Character Encoding-->Unicode (UTF-8)`
- Open messages only as plain text! Go to the main window
:menuselection:`Main Menu-->View-->Message Body As-->Plain Text`!
TkRat (GUI)
***********
......
......@@ -105,8 +105,8 @@ required reading:
patches if these rules are followed, and many people will only
review code if it is in the proper style.
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` and :ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
These files describe in explicit detail how to successfully create
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>`
This file describes in explicit detail how to successfully create
and send a patch, including (but not limited to):
- Email contents
......
......@@ -40,7 +40,6 @@ Other guides to the community that are of interest to most developers are:
:maxdepth: 1
changes
submitting-drivers
stable-api-nonsense
management-style
stable-kernel-rules
......
.. _kernel_docs:
Index of Documentation for People Interested in Writing and/or Understanding the Linux Kernel
=============================================================================================
Index of Further Kernel Documentation
=====================================
Juan-Mariano de Goyeneche <jmseyas@dit.upm.es>
Initial Author: Juan-Mariano de Goyeneche (<jmseyas@dit.upm.es>;
email address is defunct now.)
The need for a document like this one became apparent in the
linux-kernel mailing list as the same questions, asking for pointers
......@@ -16,21 +17,16 @@ philosophy and design decisions behind this code.
Unfortunately, not many documents are available for beginners to
start. And, even if they exist, there was no "well-known" place which
kept track of them. These lines try to cover this lack. All documents
available on line known by the author are listed, while some reference
books are also mentioned.
kept track of them. These lines try to cover this lack.
PLEASE, if you know any paper not listed here or write a new document,
send me an e-mail, and I'll include a reference to it here. Any
corrections, ideas or comments are also welcomed.
include a reference to it here, following the kernel's patch submission
process. Any corrections, ideas or comments are also welcome.
The papers that follow are listed in no particular order. All are
cataloged with the following fields: the document's "Title", the
"Author"/s, the "URL" where they can be found, some "Keywords" helpful
when searching for specific topics, and a brief "Description" of the
Document.
Enjoy!
All documents are cataloged with the following fields: the document's
"Title", the "Author"/s, the "URL" where they can be found, some
"Keywords" helpful when searching for specific topics, and a brief
"Description" of the Document.
.. note::
......@@ -83,6 +79,18 @@ On-line docs
Finally this trace-log is used as base for a more exact conceptual
exploration and description of the Linux TCP/IP implementation.*
* Title: **The Linux Kernel Module Programming Guide**
:Author: Peter Jay Salzman, Michael Burian, Ori Pomerantz, Bob Mottram,
Jim Huang.
:URL: https://sysprog21.github.io/lkmpg/
:Date: 2021
:Keywords: modules, GPL book, /proc, ioctls, system calls,
interrupt handlers.
:Description: A very nice GPL book on the topic of modules
programming. Lots of examples. Currently the new version is being
actively maintained at https://github.com/sysprog21/lkmpg.
* Title: **On submitting kernel Patches**
:Author: Andi Kleen
......@@ -126,17 +134,19 @@ On-line docs
describes how to write user-mode utilities for communicating with
Card Services.
* Title: **The Linux Kernel Module Programming Guide**
:Author: Peter Jay Salzman, Michael Burian, Ori Pomerantz, Bob Mottram,
Jim Huang.
:URL: https://sysprog21.github.io/lkmpg/
:Date: 2021
:Keywords: modules, GPL book, /proc, ioctls, system calls,
interrupt handlers.
:Description: A very nice GPL book on the topic of modules
programming. Lots of examples. Currently the new version is being
actively maintained at https://github.com/sysprog21/lkmpg.
* Title: **How NOT to write kernel drivers**
:Author: Arjan van de Ven.
:URL: https://landley.net/kdocs/ols/2002/ols2002-pages-545-555.pdf
:Date: 2002
:Keywords: driver.
:Description: Programming bugs and Do-nots in kernel driver development
:Abstract: *Quite a few tutorials, articles and books give an introduction
on how to write Linux kernel drivers. Unfortunately the things one
should NOT do in Linux kernel code is either only a minor appendix
or, more commonly, completely absent. This paper tries to briefly touch
the areas in which the most common and serious bugs and do-nots are
encountered.*
* Title: **Global spinlock list and usage**
......
.. _submittingdrivers:
Submitting Drivers For The Linux Kernel
=======================================
This document is intended to explain how to submit device drivers to the
various kernel trees. Note that if you are interested in video card drivers
you should probably talk to XFree86 (https://www.xfree86.org/) and/or X.Org
(https://x.org/) instead.
.. note::
This document is old and has seen little maintenance in recent years; it
should probably be updated or, perhaps better, just deleted. Most of
what is here can be found in the other development documents anyway.
Oh, and we don't really recommend submitting changes to XFree86 :)
Also read the :ref:`Documentation/process/submitting-patches.rst <submittingpatches>`
document.
Allocating Device Numbers
-------------------------
Major and minor numbers for block and character devices are allocated
by the Linux assigned name and number authority (currently this is
Torben Mathiasen). The site is https://www.lanana.org/. This
also deals with allocating numbers for devices that are not going to
be submitted to the mainstream kernel.
See :ref:`Documentation/admin-guide/devices.rst <admin_devices>`
for more information on this.
If you don't use assigned numbers then when your device is submitted it will
be given an assigned number even if that is different from values you may
have shipped to customers before.
Who To Submit Drivers To
------------------------
Linux 2.0:
No new drivers are accepted for this kernel tree.
Linux 2.2:
No new drivers are accepted for this kernel tree.
Linux 2.4:
If the code area has a general maintainer then please submit it to
the maintainer listed in MAINTAINERS in the kernel file. If the
maintainer does not respond or you cannot find the appropriate
maintainer then please contact Willy Tarreau <w@1wt.eu>.
Linux 2.6 and upper:
The same rules apply as 2.4 except that you should follow linux-kernel
to track changes in API's. The final contact point for Linux 2.6+
submissions is Andrew Morton.
What Criteria Determine Acceptance
----------------------------------
Licensing:
The code must be released to us under the
GNU General Public License. If you wish the driver to be
useful to other communities such as BSD you may release
under multiple licenses. If you choose to release under
licenses other than the GPL, you should include your
rationale for your license choices in your cover letter.
See accepted licenses at include/linux/module.h
Copyright:
The copyright owner must agree to use of GPL.
It's best if the submitter and copyright owner
are the same person/entity. If not, the name of
the person/entity authorizing use of GPL should be
listed in case it's necessary to verify the will of
the copyright owner.
Interfaces:
If your driver uses existing interfaces and behaves like
other drivers in the same class it will be much more likely
to be accepted than if it invents gratuitous new ones.
If you need to implement a common API over Linux and NT
drivers do it in userspace.
Code:
Please use the Linux style of code formatting as documented
in :ref:`Documentation/process/coding-style.rst <codingStyle>`.
If you have sections of code
that need to be in other formats, for example because they
are shared with a windows driver kit and you want to
maintain them just once separate them out nicely and note
this fact.
Portability:
Pointers are not always 32bits, not all computers are little
endian, people do not all have floating point and you
shouldn't use inline x86 assembler in your driver without
careful thought. Pure x86 drivers generally are not popular.
If you only have x86 hardware it is hard to test portability
but it is easy to make sure the code can easily be made
portable.
Clarity:
It helps if anyone can see how to fix the driver. It helps
you because you get patches not bug reports. If you submit a
driver that intentionally obfuscates how the hardware works
it will go in the bitbucket.
PM support:
Since Linux is used on many portable and desktop systems, your
driver is likely to be used on such a system and therefore it
should support basic power management by implementing, if
necessary, the .suspend and .resume methods used during the
system-wide suspend and resume transitions. You should verify
that your driver correctly handles the suspend and resume, but
if you are unable to ensure that, please at least define the
.suspend method returning the -ENOSYS ("Function not
implemented") error. You should also try to make sure that your
driver uses as little power as possible when it's not doing
anything. For the driver testing instructions see
Documentation/power/drivers-testing.rst and for a relatively
complete overview of the power management issues related to
drivers see :ref:`Documentation/driver-api/pm/devices.rst <driverapi_pm_devices>`.
Control:
In general if there is active maintenance of a driver by
the author then patches will be redirected to them unless
they are totally obvious and without need of checking.
If you want to be the contact and update point for the
driver it is a good idea to state this in the comments,
and include an entry in MAINTAINERS for your driver.
What Criteria Do Not Determine Acceptance
-----------------------------------------
Vendor:
Being the hardware vendor and maintaining the driver is
often a good thing. If there is a stable working driver from
other people already in the tree don't expect 'we are the
vendor' to get your driver chosen. Ideally work with the
existing driver author to build a single perfect driver.
Author:
It doesn't matter if a large Linux company wrote the driver,
or you did. Nobody has any special access to the kernel
tree. Anyone who tells you otherwise isn't telling the
whole story.
Resources
---------
Linux kernel master tree:
ftp.\ *country_code*\ .kernel.org:/pub/linux/kernel/...
where *country_code* == your country code, such as
**us**, **uk**, **fr**, etc.
https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git
Linux kernel mailing list:
linux-kernel@vger.kernel.org
[mail majordomo@vger.kernel.org to subscribe]
Linux Device Drivers, Third Edition (covers 2.6.10):
https://lwn.net/Kernel/LDD3/ (free version)
LWN.net:
Weekly summary of kernel development activity - https://lwn.net/
2.6 API changes:
https://lwn.net/Articles/2.6-kernel-api/
Porting drivers from prior kernels to 2.6:
https://lwn.net/Articles/driver-porting/
KernelNewbies:
Documentation and assistance for new kernel programmers
https://kernelnewbies.org/
Linux USB project:
http://www.linux-usb.org/
How to NOT write kernel driver by Arjan van de Ven:
https://landley.net/kdocs/ols/2002/ols2002-pages-545-555.pdf
Kernel Janitor:
https://kernelnewbies.org/KernelJanitors
GIT, Fast Version Control System:
https://git-scm.com/
......@@ -12,9 +12,8 @@ This document contains a large number of suggestions in a relatively terse
format. For detailed information on how the kernel development process
works, see Documentation/process/development-process.rst. Also, read
Documentation/process/submit-checklist.rst
for a list of items to check before submitting code. If you are submitting
a driver, also read Documentation/process/submitting-drivers.rst; for device
tree binding patches, read
for a list of items to check before submitting code.
For device tree binding patches, read
Documentation/devicetree/bindings/submitting-patches.rst.
This documentation assumes that you're using ``git`` to prepare your patches.
......
......@@ -1046,7 +1046,7 @@ The keyctl syscall functions are:
"filter" is either NULL to remove a watch or a filter specification to
indicate what events are required from the key.
See Documentation/watch_queue.rst for more information.
See Documentation/core-api/watch_queue.rst for more information.
Note that only one watch may be emplaced for any particular { key,
queue_fd } combination.
......
......@@ -98,6 +98,6 @@ References
See [sev-api-spec]_ for more info regarding SEV ``LAUNCH_SECRET`` operation.
.. [sev] Documentation/virt/kvm/amd-memory-encryption.rst
.. [sev] Documentation/virt/kvm/x86/amd-memory-encryption.rst
.. [secrets-coco-abi] Documentation/ABI/testing/securityfs-secrets-coco
.. [sev-api-spec] https://www.amd.com/system/files/TechDocs/55766_SEV-KM_API_Specification.pdf
......@@ -85,7 +85,7 @@ Often times the XuY functions will not be large enough, and instead you'll
want to pass a pre-filled struct to siphash. When doing this, it's important
to always ensure the struct has no padding holes. The easiest way to do this
is to simply arrange the members of the struct in descending order of size,
and to use offsetendof() instead of sizeof() for getting the size. For
and to use offsetofend() instead of sizeof() for getting the size. For
performance reasons, if possible, it's probably a good thing to align the
struct to the right boundary. Here's an example::
......
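The document's own example falls outside this hunk. As a hedged, userspace-only sketch of the padding advice above: the ``offsetofend()`` macro below is a local stand-in written for this illustration, and ``struct flow_key`` with its members is hypothetical. Members are arranged in descending order of size, so the only padding is at the tail, and ``offsetofend()`` gives the number of bytes that actually carry data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-in for an offsetofend() helper: the offset of the first
 * byte after MEMBER within TYPE.
 */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/*
 * Hypothetical hash input. Members are ordered from largest to smallest,
 * so no padding holes appear between them.
 */
struct flow_key {
	uint64_t cookie;	/* 8 bytes at offset 0 */
	uint32_t addr;		/* 4 bytes at offset 8 */
	uint16_t port;		/* 2 bytes at offset 12 */
};

/* Length to hash: stops after the last member, before any tail padding. */
static inline size_t flow_key_hash_len(void)
{
	return offsetofend(struct flow_key, port);
}

/* The hashed length can never exceed the full struct size. */
static_assert(offsetofend(struct flow_key, port) <= sizeof(struct flow_key),
	      "hashed length must fit inside the struct");
```

With this layout ``sizeof(struct flow_key)`` may include trailing padding added for alignment, whereas ``flow_key_hash_len()`` covers only the three members, which is why the text prefers it over ``sizeof()`` when passing a struct to a hash function.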
......@@ -120,14 +120,21 @@ def markup_refs(docname, app, node):
repl.append(nodes.Text(t[done:]))
return repl
#
# Keep track of cross-reference lookups that failed so we don't have to
# do them again.
#
failed_lookups = { }
def failure_seen(target):
return (target) in failed_lookups
def note_failure(target):
failed_lookups[target] = True
#
# In sphinx3 we can cross-reference to C macro and function, each one with its
# own C role, but both match the same regex, so we try both.
#
def markup_func_ref_sphinx3(docname, app, match):
class_str = ['c-func', 'c-macro']
reftype_str = ['function', 'macro']
cdom = app.env.domains['c']
#
# Go through the dance of getting an xref out of the C domain
......@@ -143,13 +150,13 @@ def markup_func_ref_sphinx3(docname, app, match):
if base_target not in Skipnames:
for target in possible_targets:
if target not in Skipfuncs:
for class_s, reftype_s in zip(class_str, reftype_str):
lit_text = nodes.literal(classes=['xref', 'c', class_s])
if (target not in Skipfuncs) and not failure_seen(target):
lit_text = nodes.literal(classes=['xref', 'c', 'c-func'])
lit_text += target_text
pxref = addnodes.pending_xref('', refdomain = 'c',
reftype = reftype_s,
reftarget = target, modname = None,
reftype = 'function',
reftarget = target,
modname = None,
classname = None)
#
# XXX The Latex builder will throw NoUri exceptions here,
......@@ -157,13 +164,14 @@ def markup_func_ref_sphinx3(docname, app, match):
#
try:
xref = cdom.resolve_xref(app.env, docname, app.builder,
reftype_s, target, pxref,
'function', target, pxref,
lit_text)
except NoUri:
xref = None
if xref:
return xref
note_failure(target)
return target_text
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../../disclaimer-ita.rst
:Original: Documentation/devicetree/bindings/submitting-patches.rst
================================================
Sottomettere patch per devicetree (DT) *binding*
================================================
.. note:: to be translated
......@@ -5,6 +5,7 @@
.. _it_kernel_doc:
=================================
Scrivere i commenti in kernel-doc
=================================
......@@ -469,6 +470,7 @@ Il titolo che segue ``DOC:`` funziona da intestazione all'interno del file
sorgente, ma anche come identificatore per l'estrazione di questi commenti di
documentazione. Quindi, il titolo dev'essere unico all'interno del file.
=======================================
Includere i commenti di tipo kernel-doc
=======================================
......
......@@ -5,8 +5,9 @@
.. _it_sphinxdoc:
Introduzione
============
=============================================
Usare Sphinx per la documentazione del kernel
=============================================
Il kernel Linux usa `Sphinx`_ per la generazione della documentazione a partire
dai file `reStructuredText`_ che si trovano nella cartella ``Documentation``.
......@@ -158,6 +159,9 @@ Per poter passare ulteriori opzioni a Sphinx potete utilizzare la variabile
make ``SPHINXOPTS``. Per esempio, se volete che Sphinx sia più verboso durante
la generazione potete usare il seguente comando ``make SPHINXOPTS=-v htmldocs``.
Potete anche personalizzare l'output html passando un livello aggiuntivo
DOCS_CSS usando la rispettiva variabile d'ambiente ``DOCS_CSS``.
Potete eliminare la documentazione generata tramite il comando
``make cleandocs``.
......@@ -276,11 +280,11 @@ incrociato quando questa ha una voce nell'indice. Se trovate degli usi di
Tabelle a liste
---------------
Raccomandiamo l'uso delle tabelle in formato lista (*list table*). Le tabelle
in formato lista sono liste di liste. In confronto all'ASCII-art potrebbero
non apparire di facile lettura nei file in formato testo. Il loro vantaggio è
che sono facili da creare o modificare e che la differenza di una modifica è
molto più significativa perché limitata alle modifiche del contenuto.
Il formato ``list-table`` può essere utile per tutte quelle tabelle che non
possono essere facilmente scritte usando il formato ASCII-art di Sphinx. Però,
questo genere di tabelle sono illeggibili per chi legge direttamente i file di
testo. Dunque, questo formato dovrebbe essere evitato senza forti argomenti che
ne giustifichino l'uso.
La ``flat-table`` è anch'essa una lista di liste simile alle ``list-table``
ma con delle funzionalità aggiuntive:
......
......@@ -129,8 +129,7 @@ eseguiti simultaneamente.
.. warning::
Il nome 'tasklet' è ingannevole: non hanno niente a che fare
con i 'processi' ('tasks'), e probabilmente hanno più a che vedere
con qualche pessima vodka che Alexey Kuznetsov si fece a quel tempo.
con i 'processi' ('tasks').
Potete determinare se siete in un softirq (o tasklet) utilizzando la
macro :c:func:`in_softirq()` (``include/linux/preempt.h``).
......@@ -308,7 +307,7 @@ esse copiano una quantità arbitraria di dati da e verso lo spazio utente.
Al contrario di :c:func:`put_user()` e :c:func:`get_user()`, queste
funzioni ritornano la quantità di dati copiati (0 è comunque un successo).
[Sì, questa stupida interfaccia mi imbarazza. La battaglia torna in auge anno
[Sì, questa interfaccia mi imbarazza. La battaglia torna in auge anno
dopo anno. --RR]
Le funzioni potrebbero dormire implicitamente. Queste non dovrebbero mai essere
......@@ -679,9 +678,8 @@ tutti sulle spine: questo riflette cambiamenti fondamentali (eg. la funzione
non può più essere chiamata con le funzioni attive, o fa controlli aggiuntivi,
o non fa più controlli che venivano fatti in precedenza). Solitamente a questo
s'accompagna un'adeguata e completa nota sulla lista di discussione
linux-kernel; cercate negli archivi.
Solitamente eseguire una semplice sostituzione su tutto un file rendere
le cose **peggiori**.
più adatta; cercate negli archivi. Solitamente eseguire una semplice
sostituzione su tutto un file rende le cose **peggiori**.
Inizializzazione dei campi d'una struttura
------------------------------------------
......@@ -759,14 +757,14 @@ Mettere le vostre cose nel kernel
Al fine d'avere le vostre cose in ordine per l'inclusione ufficiale, o
anche per avere patch pulite, c'è del lavoro amministrativo da fare:
- Trovare di chi è lo stagno in cui state pisciando. Guardare in cima
- Trovare chi è responsabile del codice che state modificando. Guardare in cima
ai file sorgenti, all'interno del file ``MAINTAINERS``, ed alla fine
di tutti nel file ``CREDITS``. Dovreste coordinarvi con queste persone
per evitare di duplicare gli sforzi, o provare qualcosa che è già stato
rigettato.
Assicuratevi di mettere il vostro nome ed indirizzo email in cima a
tutti i file che create o che mangeggiate significativamente. Questo è
tutti i file che create o che maneggiate significativamente. Questo è
il primo posto dove le persone guarderanno quando troveranno un baco,
o quando **loro** vorranno fare una modifica.
......@@ -787,16 +785,15 @@ anche per avere patch pulite, c'è del lavoro amministrativo da fare:
"obj-$(CONFIG_xxx) += xxx.o". La sintassi è documentata nel file
``Documentation/kbuild/makefiles.rst``.
- Aggiungete voi stessi in ``CREDITS`` se avete fatto qualcosa di notevole,
solitamente qualcosa che supera il singolo file (comunque il vostro nome
dovrebbe essere all'inizio dei file sorgenti). ``MAINTAINERS`` significa
- Aggiungete voi stessi in ``CREDITS`` se credete di aver fatto qualcosa di
notevole, solitamente qualcosa che supera il singolo file (comunque il vostro
nome dovrebbe essere all'inizio dei file sorgenti). ``MAINTAINERS`` significa
che volete essere consultati quando vengono fatte delle modifiche ad un
sottosistema, e quando ci sono dei bachi; questo implica molto di più
di un semplice impegno su una parte del codice.
sottosistema, e quando ci sono dei bachi; questo implica molto di più di un
semplice impegno su una parte del codice.
- Infine, non dimenticatevi di leggere
``Documentation/process/submitting-patches.rst`` e possibilmente anche
``Documentation/process/submitting-drivers.rst``.
``Documentation/process/submitting-patches.rst``.
Trucchetti del kernel
=====================
......
......@@ -102,16 +102,11 @@ che non esistano.
Sincronizzazione nel kernel Linux
=================================
Se posso darvi un suggerimento: non dormite mai con qualcuno più pazzo di
voi. Ma se dovessi darvi un suggerimento sulla sincronizzazione:
**mantenetela semplice**.
Se dovessi darvi un suggerimento sulla sincronizzazione: **mantenetela
semplice**.
Siate riluttanti nell'introduzione di nuovi *lock*.
Abbastanza strano, quest'ultimo è l'esatto opposto del mio suggerimento
su quando **avete** dormito con qualcuno più pazzo di voi. E dovreste
pensare a prendervi un cane bello grande.
I due principali tipi di *lock* nel kernel: spinlock e mutex
------------------------------------------------------------
......@@ -316,7 +311,7 @@ Pete Zaitcev ci offre il seguente riassunto:
- Se siete in un contesto utente (una qualsiasi chiamata di sistema)
e volete sincronizzarvi con altri processi, usate i mutex. Potete trattenere
il mutex e dormire (``copy_from_user*(`` o ``kmalloc(x,GFP_KERNEL)``).
il mutex e dormire (``copy_from_user(`` o ``kmalloc(x,GFP_KERNEL)``).
- Altrimenti (== i dati possono essere manipolati da un'interruzione) usate
spin_lock_irqsave() e spin_unlock_irqrestore().
......@@ -979,9 +974,6 @@ fallisce nel trovare quello che vuole, quindi rilascia il *lock* di lettura,
trattiene un *lock* di scrittura ed inserisce un oggetto; questo genere di
codice presenta una corsa critica.
Se non riuscite a capire il perché, per favore state alla larga dal mio
codice.
corsa fra temporizzatori: un passatempo del kernel
--------------------------------------------------
......
.. include:: ../disclaimer-ita.rst
:Original: Documentation/process/botching-up-ioctls.rst
.. _it_configuregit:
Configurare Git
===============
.. note:: To be translated
.. include:: ../disclaimer-ita.rst
:Original: :ref:`Documentation/networking/netdev-FAQ.rst <netdev-FAQ>`
:Original: :ref:`Documentation/process/maintainer-netdev.rst <netdev-FAQ>`
.. _it_netdev-FAQ:
......
......@@ -168,14 +168,15 @@ in questa ricerca:
.../scripts/get_maintainer.pl
Se questo script viene eseguito con l'opzione "-f" ritornerà il
manutentore(i) attuale per un dato file o cartella. Se viene passata una
patch sulla linea di comando, lo script elencherà i manutentori che
dovrebbero riceverne una copia. Ci sono svariate opzioni che regolano
quanto a fondo get_maintainer.pl debba cercare i manutentori;
siate quindi prudenti nell'utilizzare le opzioni più aggressive poiché
potreste finire per includere sviluppatori che non hanno un vero interesse
per il codice che state modificando.
Se questo script viene eseguito con l'opzione "-f" ritornerà il manutentore(i)
attuale per un dato file o cartella. Se viene passata una patch sulla linea di
comando, lo script elencherà i manutentori che dovrebbero riceverne una copia.
Questo è la maniera raccomandata (non quella con "-f") per ottenere la lista di
persone da aggiungere a Cc per le vostre patch. Ci sono svariate opzioni che
regolano quanto a fondo get_maintainer.pl debba cercare i manutentori; siate
quindi prudenti nell'utilizzare le opzioni più aggressive poiché potreste finire
per includere sviluppatori che non hanno un vero interesse per il codice che
state modificando.
Se tutto ciò dovesse fallire, parlare con Andrew Morton potrebbe essere
un modo efficace per capire chi è il manutentore di un dato pezzo di codice.
......
......@@ -16,9 +16,8 @@ e di procedure per la pubblicazione delle patch; seguirle renderà la vita
più facile a tutti quanti. Questo documento cercherà di coprire questi
argomenti con un ragionevole livello di dettaglio; più informazioni possono
essere trovate nella cartella 'Documentation', nei file
:ref:`translations/it_IT/process/submitting-patches.rst <it_submittingpatches>`,
:ref:`translations/it_IT/process/submitting-drivers.rst <it_submittingdrivers>`, e
:ref:`translations/it_IT/process/submit-checklist.rst <it_submitchecklist>`.
:ref:`translations/it_IT/process/submitting-patches.rst <it_submittingpatches>`
e :ref:`translations/it_IT/process/submit-checklist.rst <it_submitchecklist>`.
Quando pubblicarle
......@@ -214,13 +213,28 @@ irrilevanti (quelli generati dal processo di generazione, per esempio, o i file
di backup del vostro editor). Il file "dontdiff" nella cartella Documentation
potrà esservi d'aiuto su questo punto; passatelo a diff con l'opzione "-X".
Le etichette sopra menzionante sono usate per descrivere come i vari
sviluppatori sono stati associati allo sviluppo di una patch. Sono descritte
in dettaglio nel documento :ref:`translations/it_IT/process/submitting-patches.rst <it_submittingpatches>`;
quello che segue è un breve riassunto. Ognuna di queste righe ha il seguente
formato:
Le etichette sopracitate danno un'idea di come una patch prende vita e sono
descritte nel dettaglio nel documento
:ref:`Documentation/translations/it_IT/process/submitting-patches.rst <it_submittingpatches>`.
Qui di seguito un breve riassunto.
::
Un'etichetta ci può dire quale commit ha introdotto il problema che viene corretto nella patch::
Fixes: 1f2e3d4c5b6a ("The first line of the commit specified by the first 12 characters of its SHA-1 ID")
Un'altra etichetta viene usata per fornire collegamenti a pagine web contenenti
maggiori informazioni, per esempio un rapporto circa il baco risolto dalla
patch, oppure un documento con le specifiche implementate dalla patch::
Link: https://example.com/somewhere.html optional-other-stuff
Alcuni manutentori aggiungono quest'etichetta alla patch per fare riferimento
alla più recente discussione pubblica. A volte questo è fatto automaticamente da
alcuni strumenti come b4 o un *hook* git come quello descritto qui
'Documentation/translations/it_IT/maintainer/configure-git.rst'
Un terzo tipo di etichetta viene usato per indicare chi ha contribuito allo
sviluppo della patch. Tutte queste etichette seguono il formato::
tag: Full Name <email address> optional-other-stuff
......
......@@ -13,9 +13,8 @@ e argomenti correlati. Primo tra questi sarà sempre la cartella Documentation
che si trova nei sorgenti kernel.
Il file :ref:`process/howto.rst <it_process_howto>` è un punto di partenza
importante; :ref:`process/submitting-patches.rst <it_submittingpatches>` e
:ref:`process/submitting-drivers.rst <it_submittingdrivers>` sono
anch'essi qualcosa che tutti gli sviluppatori del kernel dovrebbero leggere.
importante; :ref:`process/submitting-patches.rst <it_submittingpatches>` è
anch'esso qualcosa che tutti gli sviluppatori del kernel dovrebbero leggere.
Molte API interne al kernel sono documentate utilizzando il meccanismo
kerneldoc; "make htmldocs" o "make pdfdocs" possono essere usati per generare
quei documenti in HTML o PDF (sebbene le versioni di TeX di alcune
......
......@@ -11,8 +11,8 @@ Requisiti minimi per compilare il kernel
Introduzione
============
Questo documento fornisce una lista dei software necessari per eseguire i
kernel 4.x.
Questo documento fornisce una lista dei software necessari per eseguire questa
versione del kernel.
Questo documento è basato sul file "Changes" del kernel 2.0.x e quindi le
persone che lo scrissero meritano credito (Jared Mauch, Axel Boldt,
......@@ -32,12 +32,13 @@ PC Card, per esempio, probabilmente non dovreste preoccuparvi di pcmciautils.
====================== ================= ========================================
Programma Versione minima Comando per verificare la versione
====================== ================= ========================================
GNU C 4.9 gcc --version
Clang/LLVM (optional) 10.0.1 clang --version
GNU C 5.1 gcc --version
Clang/LLVM (optional) 11.0.0 clang --version
GNU make 3.81 make --version
binutils 2.23 ld -v
flex 2.5.35 flex --version
bison 2.0 bison --version
pahole 1.16 pahole --version
util-linux 2.10o fdformat --version
kmod 13 depmod -V
e2fsprogs 1.41.4 e2fsck -V
......@@ -58,6 +59,7 @@ iptables 1.4.2 iptables -V
openssl & libcrypto 1.0.0 openssl version
bc 1.06.95 bc --version
Sphinx\ [#f1]_ 1.7 sphinx-build --version
cpio any cpio --version
====================== ================= ========================================
.. [#f1] Sphinx è necessario solo per produrre la documentazione del Kernel
......@@ -111,6 +113,16 @@ Bison
Dalla versione 4.16, il sistema di compilazione, durante l'esecuzione, genera
un parsificatore. Questo richiede bison 2.0 o successivo.
pahole
------
Dalla versione 5.2, quando viene impostato CONFIG_DEBUG_INFO_BTF, il sistema di
compilazione genera BTF (BPF Type Format) a partire da DWARF per vmlinux. Più
tardi anche per i moduli. Questo richiede pahole v1.16 o successivo.
A seconda della distribuzione, lo si può trovare nei pacchetti 'dwarves' o
'pahole'. Oppure lo si può trovare qui: https://fedorapeople.org/~acme/dwarves/.
Perl
----
......@@ -455,6 +467,11 @@ mcelog
- <http://www.mcelog.org/>
cpio
----
- <https://www.gnu.org/software/cpio/>
Rete
****
......
......@@ -466,14 +466,52 @@ la riga della parentesi graffa di chiusura. Ad esempio:
}
EXPORT_SYMBOL(system_is_up);
6.1) Prototipi di funzione
**************************
Nei prototipi di funzione, includete i nomi dei parametri e i loro tipi.
Nonostante questo non sia richiesto dal linguaggio C, in Linux viene preferito
perché è un modo semplice per aggiungere informazioni importanti per il
lettore.
Non usate la parola chiave ``extern`` coi prototipi di funzione perché
Non usate la parola chiave ``extern`` con le dichiarazioni di funzione perché
rende le righe più lunghe e non è strettamente necessario.
Quando scrivete i prototipi di funzione mantenete `l'ordine degli elementi <https://lore.kernel.org/mm-commits/CAHk-=wiOCLRny5aifWNhr621kYrJwhfURsa0vFPeUEm8mF0ufg@mail.gmail.com/>`_.
Prendiamo questa dichiarazione di funzione come esempio::
__init void * __must_check action(enum magic value, size_t size, u8 count,
char *fmt, ...) __printf(4, 5) __malloc;
L'ordine suggerito per gli elementi di un prototipo di funzione è il seguente:
- classe d'archiviazione (in questo caso ``static __always_inline``. Da notare
che ``__always_inline`` è tecnicamente un attributo ma che viene trattato come
``inline``)
- attributi della classe di archiviazione (in questo caso ``__init``, in altre
parole la sezione, ma anche cose tipo ``__cold``)
- il tipo di ritorno (in questo caso, ``void *``)
- attributi per il valore di ritorno (in questo caso, ``__must_check``)
- il nome della funzione (in questo caso, ``action``)
- i parametri della funzione (in questo caso,
``(enum magic value, size_t size, u8 count, char *fmt, ...)``,
da notare che va messo anche il nome del parametro)
- attributi dei parametri (in questo caso, ``__printf(4, 5)``)
- attributi per il comportamento della funzione (in questo caso, ``__malloc``)
Notate che per la **definizione** di una funzione (in altre parole il corpo
della funzione), il compilatore non permette di usare gli attributi per i
parametri dopo i parametri. In questi casi, devono essere messi dopo gli
attributi della classe d'archiviazione (notate che la posizione di
``__printf(4,5)`` cambia rispetto alla **dichiarazione**)::
static __always_inline __init __printf(4, 5) void * __must_check action(enum magic value,
size_t size, u8 count, char *fmt, ...) __malloc
{
...
}
7) Centralizzare il ritorno delle funzioni
------------------------------------------
......@@ -855,7 +893,7 @@ I messaggi del kernel non devono terminare con un punto fermo.
Scrivere i numeri fra parentesi (%d) non migliora alcunché e per questo
dovrebbero essere evitati.
Ci sono alcune macro per la diagnostica in <linux/device.h> che dovreste
Ci sono alcune macro per la diagnostica in <linux/dev_printk.h> che dovreste
usare per assicurarvi che i messaggi vengano associati correttamente ai
dispositivi e ai driver, e che siano etichettati correttamente: dev_err(),
dev_warn(), dev_info(), e così via. Per messaggi che non sono associati ad
......
......@@ -69,8 +69,8 @@ dovrebbero essere fatto negli argomenti di funzioni di allocazione di memoria
piccoli di quelli che il chiamante si aspettava. L'uso di questo modo di
allocare può portare ad un overflow della memoria di heap e altri
malfunzionamenti. (Si fa eccezione per valori numerici per i quali il
compilatore può generare avvisi circa un potenziale overflow. Tuttavia usare
i valori numerici come suggerito di seguito è innocuo).
compilatore può generare avvisi circa un potenziale overflow. Tuttavia, anche in
questi casi è preferibile riscrivere il codice come suggerito di seguito).
Per esempio, non usate ``count * size`` come argomento::
......@@ -80,6 +80,9 @@ Al suo posto, si dovrebbe usare l'allocatore a due argomenti::
foo = kmalloc_array(count, size, GFP_KERNEL);
Nello specifico, kmalloc() può essere sostituita da kmalloc_array(), e kzalloc()
da kcalloc().
Se questo tipo di allocatore non è disponibile, allora dovrebbero essere usate
le funzioni del tipo *saturate-on-overflow*::
......@@ -100,9 +103,20 @@ Invece, usate la seguente funzione::
invitati a riorganizzare il vostro codice usando il
`flexible array member <#zero-length-and-one-element-arrays>`_.
Per maggiori dettagli fate riferimento a array_size(),
array3_size(), e struct_size(), così come la famiglia di
funzioni check_add_overflow() e check_mul_overflow().
Per altri calcoli, usate le funzioni size_mul(), size_add(), e size_sub(). Per
esempio, al posto di::
foo = krealloc(current_size + chunk_size * (count - 3), GFP_KERNEL);
dovreste scrivere::
foo = krealloc(size_add(current_size,
size_mul(chunk_size,
size_sub(count, 3))), GFP_KERNEL);
Per maggiori dettagli fate riferimento a array3_size() e flex_array_size(), ma
anche le funzioni della famiglia check_mul_overflow(), check_add_overflow(),
check_sub_overflow(), e check_shl_overflow().
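To make the saturate-on-overflow idea concrete, here is a minimal userspace sketch. The helpers ``sat_size_mul()``, ``sat_size_add()``, ``sat_size_sub()`` and ``grown_size()`` are hypothetical names written for this illustration; they are not the kernel's implementation. They only assume the behaviour described above: on overflow or underflow the result pins to ``SIZE_MAX``, which an allocator can then reject.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two sizes; return SIZE_MAX instead of wrapping around. */
static inline size_t sat_size_mul(size_t a, size_t b)
{
	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	return a * b;
}

/* Add two sizes; return SIZE_MAX instead of wrapping around. */
static inline size_t sat_size_add(size_t a, size_t b)
{
	if (b > SIZE_MAX - a)
		return SIZE_MAX;
	return a + b;
}

/*
 * Subtract two sizes. An already saturated input stays saturated, and an
 * underflow (a < b) also saturates to SIZE_MAX instead of wrapping.
 */
static inline size_t sat_size_sub(size_t a, size_t b)
{
	if (a == SIZE_MAX || b == SIZE_MAX || a < b)
		return SIZE_MAX;
	return a - b;
}

/* The size expression from the text: current + chunk * (count - 3). */
static inline size_t grown_size(size_t current_size, size_t chunk_size,
				size_t count)
{
	return sat_size_add(current_size,
			    sat_size_mul(chunk_size,
					 sat_size_sub(count, 3)));
}
```

For example, ``grown_size(100, 10, 5)`` evaluates to 120, while ``grown_size(100, 10, 2)`` saturates to ``SIZE_MAX`` because ``count - 3`` would underflow. With plain arithmetic the second case would silently wrap to a bogus size.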
simple_strtol(), simple_strtoll(), simple_strtoul(), simple_strtoull()
----------------------------------------------------------------------
......
......@@ -109,8 +109,7 @@ Di seguito una lista di file che sono presenti nei sorgente del kernel e che
accetteranno patch solo se queste osserveranno tali regole, e molte
persone revisioneranno il codice solo se scritto nello stile appropriato.
:ref:`Documentation/translations/it_IT/process/submitting-patches.rst <it_submittingpatches>` e
:ref:`Documentation/translations/it_IT/process/submitting-drivers.rst <it_submittingdrivers>`
:ref:`Documentation/translations/it_IT/process/submitting-patches.rst <it_submittingpatches>`
Questo file descrive dettagliatamente come creare ed inviare una patch
con successo, includendo (ma non solo questo):
......
......@@ -41,12 +41,12 @@ degli sviluppatori:
:maxdepth: 1
changes
submitting-drivers
stable-api-nonsense
management-style
stable-kernel-rules
submit-checklist
kernel-docs
maintainers
Ed infine, qui ci sono alcune guide più tecniche che son state messe qua solo
perché non si è trovato un posto migliore.
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-ita.rst
:Original: Documentation/process/maintainer-handbooks.rst
:Translator: Federico Vaga <federico.vaga@vaga.pv.it>
.. _it_maintainer_handbooks_main:
Note sul processo di sviluppo dei sottosistemi e dei sorgenti dei manutentori
=============================================================================
Lo scopo di questo documento è quello di fornire informazioni sul processo di
sviluppo dedicate ai sottosistemi che vanno ad integrare quelle più generali
descritte in :ref:`Documentation/translations/it_IT/process
<it_development_process_main>`.
Indice:
.. toctree::
:numbered:
:maxdepth: 2
maintainer-tip
......@@ -931,12 +931,11 @@ che avete nel vostro portachiavi::
uid [ unknown] Linus Torvalds <torvalds@kernel.org>
sub rsa2048 2011-09-20 [E]
Poi, aprite il `PGP pathfinder`_. Nel campo "From", incollate l'impronta
digitale della chiave di Linus Torvalds che si vede nell'output qui sopra.
Nel campo "to", incollate il key-id della chiave sconosciuta che avete
trovato con ``gpg --search``, e poi verificare il risultato:
- `Finding paths to Linus`_
Poi, cercate un percorso affidabile da Linus Torvalds alla chiave che avete
trovato con ``gpg --search`` usando la chiave sconosciuta. Per farlo potete usare
diversi strumenti come https://github.com/mricon/wotmate,
https://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git/tree/graphs, e
https://the.earth.li/~noodles/pathfind.html.
Se trovate un paio di percorsi affidabili è un buon segno circa la validità
della chiave. Ora, potete aggiungerla al vostro portachiavi dal keyserver::
......@@ -948,6 +947,3 @@ fiducia nell'amministratore del servizio *PGP Pathfinder* sperando che non
sia malintenzionato (infatti, questo va contro :ref:`it_devs_not_infra`).
Tuttavia, se mantenete con cura la vostra rete di fiducia sarà un deciso
miglioramento rispetto alla cieca fiducia nei keyserver.
.. _`PGP pathfinder`: https://pgp.cs.uu.nl/
.. _`Finding paths to Linus`: https://pgp.cs.uu.nl/paths/79BE3E4300411886/to/C94035C21B4F2AEB.html
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-ita.rst
:Original: Documentation/process/maintainer-tip.rst
Il tascabile dei sorgenti tip
=============================
.. note:: To be translated
:Original: Documentation/process/maintainers.rst
Lista dei manutentori e come inviare modifiche al kernel
========================================================
Questa pagina non verrà tradotta. Fate riferimento alla versione originale in
inglese.
.. note:: La pagina originale usa una direttiva speciale per integrare il file
`MAINTAINERS` in sphinx. La parte di quel documento che si potrebbe
tradurre contiene comunque informazioni già presenti in
:ref:`Documentation/translations/it_IT/process/submitting-patches.rst
<it_submittingpatches>`.
......@@ -41,11 +41,11 @@ Regole sul tipo di patch che vengono o non vengono accettate nei sorgenti
Procedura per sottomettere patch per i sorgenti -stable
-------------------------------------------------------
- Una patch di sicurezza non dovrebbero essere gestite (solamente) dal processo
.. note::
Una patch di sicurezza non dovrebbe essere gestita (solamente) dal processo
di revisione -stable, ma dovrebbe seguire le procedure descritte in
:ref:`Documentation/translations/it_IT/admin-guide/security-bugs.rst <it_securitybugs>`.
Per tutte le altre sottomissioni, scegliere una delle seguenti procedure
------------------------------------------------------------------------
......@@ -90,9 +90,9 @@ L':ref:`it_option_2` e l':ref:`it_option_3` sono più utili quando, al momento
dell'inclusione dei sorgenti principali, si ritiene che non debbano essere
incluse anche in quelli stabili (per esempio, perché si crede che si dovrebbero
fare più verifiche per eventuali regressioni). L':ref:`it_option_3` è
particolarmente utile se la patch ha bisogno di qualche modifica per essere
applicata ad un kernel più vecchio (per esempio, perché nel frattempo l'API è
cambiata).
particolarmente utile se una patch dev'essere riportata su una versione
precedente (per esempio la patch richiede modifiche a causa di cambiamenti di
API).
Notate che per l':ref:`it_option_3`, se la patch è diversa da quella nei
sorgenti principali (per esempio perché è stato necessario un lavoro di
......@@ -167,9 +167,18 @@ Ciclo di una revisione
della lista linux-kernel obietta la bontà della patch, sollevando problemi
che i manutentori ed i membri non avevano compreso, allora la patch verrà
rimossa dalla coda.
- Alla fine del ciclo di revisione tutte le patch che hanno ricevuto l'ACK
verranno aggiunte per il prossimo rilascio -stable, e successivamente
questo nuovo rilascio verrà fatto.
- Le patch che hanno ricevuto un ACK verranno inviate nuovamente come parte di
un rilascio candidato (-rc) al fine di essere verificate dagli sviluppatori e
dai testatori.
- Solitamente si pubblica solo una -rc, tuttavia se si riscontrano problemi
importanti, alcune patch potrebbero essere modificate o essere scartate,
oppure nuove patch potrebbero essere messe in coda. Dunque, verranno pubblicate
nuove -rc e così via finché non si ritiene che non vi siano più problemi.
- Si può rispondere ad una -rc scrivendo sulla lista di discussione un'email
con l'etichetta "Tested-by:". Questa etichetta verrà raccolta ed aggiunta al
commit di rilascio.
- Alla fine del ciclo di revisione il nuovo rilascio -stable conterrà tutte le
patch che erano in coda e sono state verificate.
- Le patch di sicurezza verranno accettate nei sorgenti -stable direttamente
dalla squadra per la sicurezza del kernel, e non passerà per il normale
ciclo di revisione. Contattate la suddetta squadra per maggiori dettagli
......@@ -186,8 +195,19 @@ Sorgenti
- Il rilascio definitivo, e marchiato, di tutti i kernel stabili può essere
trovato in rami distinti per versione al seguente indirizzo:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
- I rilasci candidati di tutti i kernel stabili possono essere trovati al
seguente indirizzo:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/
.. warning::
I sorgenti -stable-rc sono un'istantanea dei sorgenti stable-queue e
subiranno frequenti modifiche, dunque verranno anche trapiantati spesso.
Dovrebbero essere usati solo allo scopo di verifica (per esempio in un
sistema di CI).
Comitato per la revisione
-------------------------
......
.. include:: ../disclaimer-ita.rst
:Original: :ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
:Translator: Federico Vaga <federico.vaga@vaga.pv.it>
.. _it_submittingdrivers:
Sottomettere driver per il kernel Linux
=======================================
.. note::
Questo documento è vecchio e negli ultimi anni non è stato più aggiornato;
dovrebbe essere aggiornato, o forse meglio, rimosso. La maggior parte di
quello che viene detto qui può essere trovato anche negli altri documenti
dedicati allo sviluppo. Per questo motivo il documento non verrà tradotto.
......@@ -18,16 +18,18 @@ Questo documento contiene un vasto numero di suggerimenti concisi. Per maggiori
dettagli su come funziona il processo di sviluppo del kernel leggete
Documentation/translations/it_IT/process/development-process.rst. Leggete anche
Documentation/translations/it_IT/process/submit-checklist.rst per una lista di
punti da verificare prima di inviare del codice. Se state inviando un driver,
allora leggete anche
Documentation/translations/it_IT/process/submitting-drivers.rst; per delle patch
relative alle associazioni per Device Tree leggete
punti da verificare prima di inviare del codice.
Per delle patch relative alle associazioni per Device Tree leggete
Documentation/translations/it_IT/process/submitting-patches.rst.
Questa documentazione assume che sappiate usare ``git`` per preparare le patch.
Se non siete pratici di ``git``, allora è bene che lo impariate;
renderà la vostra vita di sviluppatore del kernel molto più semplice.
I sorgenti di alcuni sottosistemi e manutentori contengono più informazioni
riguardo al loro modo di lavorare ed aspettative. Consultate
:ref:`Documentation/translations/it_IT/process/maintainer-handbooks.rst <it_maintainer_handbooks_main>`
Ottenere i sorgenti attuali
---------------------------
......@@ -84,11 +86,11 @@ comporti come descritto.
I manutentori vi saranno grati se scrivete la descrizione della patch in un
formato che sia compatibile con il gestore dei sorgenti usato dal kernel,
``git``, come un "commit log". Leggete :ref:`it_explicit_in_reply_to`.
``git``, come un "commit log". Leggete :ref:`it_the_canonical_patch_format`.
Risolvete solo un problema per patch. Se la vostra descrizione inizia ad
essere lunga, potrebbe essere un segno che la vostra patch necessita d'essere
divisa. Leggete :ref:`split_changes`.
divisa. Leggete :ref:`it_split_changes`.
Quando inviate o rinviate una patch o una serie, includete la descrizione
completa delle modifiche e la loro giustificazione. Non limitatevi a dire che
......@@ -104,17 +106,28 @@ do frotz" piuttosto che "[This patch] makes xyzzy do frotz" or "[I] changed
xyzzy to do frotz", come se steste dando ordini al codice di cambiare il suo
comportamento.
Se la patch corregge un baco conosciuto, fare riferimento a quel baco inserendo
il suo numero o il suo URL. Se la patch è la conseguenza di una discussione
su una lista di discussione, allora fornite l'URL all'archivio di quella
discussione; usate i collegamenti a https://lore.kernel.org/ con il
``Message-Id``, in questo modo vi assicurerete che il collegamento non diventi
invalido nel tempo.
Se ci sono delle discussioni, o altre informazioni d'interesse, che fanno
riferimento alla patch, allora aggiungete l'etichetta 'Link:' per farvi
riferimento. Per esempio, se la vostra patch corregge un baco potete aggiungere
quest'etichetta per fare riferimento ad un rapporto su una lista di discussione
o un *bug tracker*. Un altro esempio; potete usare quest'etichetta per far
riferimento ad una discussione precedentemente avvenuta su una lista di
discussione, o qualcosa di documentato sul web, da cui poi è nata la patch in
questione.
Quando volete fare riferimento ad una lista di discussione, preferite il
servizio d'archiviazione lore.kernel.org. Per creare un collegamento URL è
sufficiente usare il campo ``Message-Id``, presente nell'intestazione del
messaggio, senza parentesi angolari. Per esempio::
Link: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
Prima d'inviare il messaggio ricordatevi di verificare che il collegamento così
creato funzioni e che indirizzi verso il messaggio desiderato.
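
The rule above (take the ``Message-Id`` header, drop the angle brackets, prefix the lore.kernel.org redirector) can be sketched as a small helper. This is only an illustration: the function name ``lore_link_tag`` is hypothetical, and the sketch assumes the identifier needs no URL escaping, which may not hold for unusual Message-Ids.

```python
def lore_link_tag(message_id: str) -> str:
    """Return a 'Link:' trailer pointing at the lore.kernel.org archive."""
    # Mail headers wrap the Message-Id in angle brackets; the URL must not.
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    # Reject empty or obviously malformed identifiers instead of guessing.
    if not bare or "@" not in bare or any(ch.isspace() for ch in bare):
        raise ValueError(f"not a usable Message-Id: {message_id!r}")
    return f"Link: https://lore.kernel.org/r/{bare}/"


print(lore_link_tag("<30th.anniversary.repost@klaava.Helsinki.FI>"))
```

The helper only builds the string; opening the resulting URL before sending the patch is still needed to confirm that it resolves to the intended message.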
Tuttavia, cercate di rendere la vostra spiegazione comprensibile anche senza
far riferimento a fonti esterne. In aggiunta ai collegamenti a bachi e liste
di discussione, riassumete i punti più importanti della discussione che hanno
portato alla creazione della patch.
Tuttavia, provate comunque a dare una spiegazione comprensibile anche senza
accedere alle fonti esterne. Inoltre, riassumete i punti più salienti che hanno
condotto all'invio della patch.
Se volete far riferimento a uno specifico commit, non usate solo
l'identificativo SHA-1. Per cortesia, aggiungete anche la breve riga
......@@ -227,9 +240,10 @@ nella vostra patch.
Dovreste sempre inviare una copia della patch ai manutentori dei sottosistemi
interessati dalle modifiche; date un'occhiata al file MAINTAINERS e alla storia
delle revisioni per scoprire chi si occupa del codice. Lo script
scripts/get_maintainer.pl può esservi d'aiuto. Se non riuscite a trovare un
manutentore per il sottosistema su cui state lavorando, allora Andrew Morton
(akpm@linux-foundation.org) sarà la vostra ultima possibilità.
scripts/get_maintainer.pl può esservi d'aiuto (passategli il percorso alle
vostre patch). Se non riuscite a trovare un manutentore per il sottosistema su
cui state lavorando, allora Andrew Morton (akpm@linux-foundation.org) sarà la
vostra ultima possibilità.
Normalmente, dovreste anche scegliere una lista di discussione a cui inviare la
vostra serie di patch. La lista di discussione linux-kernel@vger.kernel.org
......@@ -324,14 +338,19 @@ cosa stia accadendo.
Assicuratevi di dire ai revisori quali cambiamenti state facendo e di
ringraziarli per il loro tempo. Revisionare codice è un lavoro faticoso e che
richiede molto tempo, e a volte i revisori diventano burberi. Tuttavia, anche
in questo caso, rispondete con educazione e concentratevi sul problema che
hanno evidenziato.
richiede molto tempo, e a volte i revisori diventano burberi. Tuttavia, anche in
questo caso, rispondete con educazione e concentratevi sul problema che hanno
evidenziato. Quando inviate una versione successiva ricordatevi di aggiungere un
``patch changelog`` alla email di intestazione o ad ogni singola patch spiegando
le differenze rispetto a sottomissioni precedenti (vedere
:ref:`it_the_canonical_patch_format`).
Leggete Documentation/translations/it_IT/process/email-clients.rst per
le raccomandazioni sui programmi di posta elettronica e l'etichetta da usare
sulle liste di discussione.
.. _it_resend_reminders:
Non scoraggiatevi - o impazientitevi
------------------------------------
......@@ -504,7 +523,8 @@ Utilizzare Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: e Fixes:
L'etichetta Reported-by dà credito alle persone che trovano e riportano i bachi
e si spera che questo possa ispirarli ad aiutarci nuovamente in futuro.
Rammentate che se il baco è stato riportato in privato, dovrete chiedere il
permesso prima di poter utilizzare l'etichetta Reported-by.
permesso prima di poter utilizzare l'etichetta Reported-by. Questa etichetta va
usata per i bachi, dunque non usatela per richieste di nuove funzionalità.
L'etichetta Tested-by: indica che la patch è stata verificata con successo
(su un qualche sistema) dalla persona citata. Questa etichetta informa i
......@@ -574,6 +594,8 @@ previste per i kernel stabili, e nemmeno dalla necessità di aggiungere
in copia conoscenza stable@vger.kernel.org su tutte le patch per
suddetti kernel.
.. _it_the_canonical_patch_format:
Il formato canonico delle patch
-------------------------------
......@@ -719,6 +741,8 @@ messe correttamente oltre la riga.::
Maggiori dettagli sul formato delle patch nei riferimenti qui di seguito.
.. _it_backtraces:
Aggiungere i *backtrace* nei messaggi di commit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......
......@@ -129,8 +129,8 @@ linux-api@vger.kernel.org に送ることを勧めます。
ルに従っているものだけを受け付け、多くの人は正しいスタイルのコード
だけをレビューします。
:ref:`Documentation/process/submitting-patches.rst <codingstyle>` と :ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
これらのファイルには、どうやってうまくパッチを作って投稿するかにつ
:ref:`Documentation/process/submitting-patches.rst <codingstyle>`
このファイルには、どうやってうまくパッチを作って投稿するかにつ
いて非常に詳しく書かれており、以下を含みます (これだけに限らない
けれども)
......
......@@ -124,7 +124,7 @@ mtk.manpages@gmail.com의 메인테이너에게 보낼 것을 권장한다.
메인테이너들은 이 규칙을 따르는 패치들만을 받아들일 것이고 많은 사람들이
그 패치가 올바른 스타일일 경우만 코드를 검토할 것이다.
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` 와 :ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>`
이 파일들은 성공적으로 패치를 만들고 보내는 법을 다음의 내용들로
굉장히 상세히 설명하고 있다(그러나 다음으로 한정되진 않는다).
......
......@@ -123,14 +123,14 @@ nr_virtfn'是要启用的VF的编号。
...
}
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
static int dev_suspend(struct device *dev)
{
...
return 0;
}
static int dev_resume(struct pci_dev *dev)
static int dev_resume(struct device *dev)
{
...
......@@ -163,8 +163,7 @@ nr_virtfn'是要启用的VF的编号。
.id_table = dev_id_table,
.probe = dev_probe,
.remove = dev_remove,
.suspend = dev_suspend,
.resume = dev_resume,
.driver.pm = &dev_pm_ops
.shutdown = dev_shutdown,
.sriov_configure = dev_sriov_configure,
};
......@@ -255,13 +255,13 @@ pci_set_master()将通过设置PCI_COMMAND寄存器中的总线主控位来启
虽然所有的驱动程序都应该明确指出PCI总线主控的DMA功能(如32位或64位),但对于流式
数据来说,具有超过32位总线主站功能的设备需要驱动程序通过调用带有适当参数的
``pci_set_dma_mask()`` 来“注册”这种功能。一般来说,在系统RAM高于4G物理地址的情
``dma_set_mask()`` 来“注册”这种功能。一般来说,在系统RAM高于4G物理地址的情
况下,这允许更有效的DMA。
所有PCI-X和PCIe兼容设备的驱动程序必须调用 ``pci_set_dma_mask()`` ,因为它们
所有PCI-X和PCIe兼容设备的驱动程序必须调用 ``dma_set_mask()`` ,因为它们
是64位DMA设备。
同样,如果设备可以通过调用 ``pci_set_consistent_dma_mask()`` 直接寻址到
同样,如果设备可以通过调用 ``dma_set_coherent_mask()`` 直接寻址到
4G物理地址以上的系统RAM中的“一致性内存”,那么驱动程序也必须“注册”这种功能。同
样,这包括所有PCI-X和PCIe兼容设备的驱动程序。许多64位“PCI”设备(在PCI-X之前)
和一些PCI-X设备对有效载荷(“流式”)数据具有64位DMA功能,但对控制(“一致性”)数
......
......@@ -36,6 +36,7 @@ Todolist:
:maxdepth: 1
reporting-issues
reporting-regressions
security-bugs
bug-hunting
bug-bisect
......@@ -44,7 +45,6 @@ Todolist:
Todolist:
* reporting-bugs
* ramoops
* dynamic-debug-howto
* kdump/index
......
......@@ -210,6 +210,8 @@ schemes/<N>/
- ``pageout``: 为具有 ``MADV_PAGEOUT`` 的区域调用 ``madvise()`` 。
- ``hugepage``: 为带有 ``MADV_HUGEPAGE`` 的区域调用 ``madvise()`` 。
- ``nohugepage``: 为带有 ``MADV_NOHUGEPAGE`` 的区域调用 ``madvise()``。
- ``lru_prio``: 在其LRU列表上对区域进行优先排序。
- ``lru_deprio``: 对区域的LRU列表进行降低优先处理。
- ``stat``: 什么都不做,只计算统计数据
schemes/<N>/access_pattern/
......
.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
..
If you want to distribute this text under CC-BY-4.0 only, please use 'The
Linux kernel developers' for author attribution and link this as source:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst
..
Note: Only the content of this RST file as found in the Linux kernel sources
is available under CC-BY-4.0, as versions of this text that were processed
(for example by the kernel's build system) might contain content taken from
files which use a more restrictive license.
.. See the bottom of this file for additional redistribution information.
.. include:: ../disclaimer-zh_CN.rst
......@@ -29,7 +22,9 @@
请搜索 `LKML内核邮件列表 <https://lore.kernel.org/lkml/>`_ 和
`Linux稳定版邮件列表 <https://lore.kernel.org/stable/>`_ 存档中匹配的报告并
加入讨论。如果找不到匹配的报告,请安装该系列的最新版本。如果它仍然出现问题,
报告给稳定版邮件列表(stable@vger.kernel.org)。
请报告给稳定版邮件列表(stable@vger.kernel.org)并抄送回归邮件列表
(regressions@lists.linux.dev);理想情况下,还可以抄送维护者和相关子系统的
邮件列表。
在所有其他情况下,请尽可能猜测是哪个内核部分导致了问题。查看MAINTAINERS文件,
了解开发人员希望如何得知问题,大多数情况下,报告问题都是通过电子邮件和抄送
......@@ -46,9 +41,10 @@
有使用附加模块)。还要确保它是在一个正常的环境中构建和运行,并且在问题发生
之前没有被污染(tainted)。
在编写报告时,要涵盖与问题相关的所有信息,如使用的内核和发行版。在碰见回归时,
尝试给出引入它的更改的提交ID,二分可以找到它。如果您同时面临Linux内核的多个
问题,请分别报告每个问题。
当你同时面临Linux内核的多个问题时,请分别报告。在编写报告时,要涵盖与问题
相关的所有信息,如使用的内核和发行版。如果碰见回归,请把报告抄送回归邮件列表
(regressions@lists.linux.dev)。也请试试用二分法找出源头;如果成功找到,请
在报告中写上它的提交ID并抄送sign-off-by链中的所有人。
一旦报告发出,请回答任何出现的问题,并尽可能地提供帮助。这包括通过不时重新
测试新版本并发送状态更新来推动进展。
......@@ -156,9 +152,10 @@
存在问题,因为问题可能已经在那里被修复了。如果您第一次发现供应商内核的问题,
请检查已知最新版本的普通构建是否可以正常运行。
* 向Linux稳定版邮件列表发送一个简短的问题报告(stable@vger.kernel.org)。大致
描述问题,并解释如何复现。讲清楚首个出现问题的版本和最后一个工作正常的版本。
然后等待进一步的指示。
* 向Linux稳定版邮件列表发送一个简短的问题报告(stable@vger.kernel.org)并抄送
Linux回归邮件列表(regressions@lists.linux.dev);如果你怀疑是由某子系统
引起的,请抄送其维护人员和子系统邮件列表。大致描述问题,并解释如何复现。
讲清楚首个出现问题的版本和最后一个工作正常的版本。然后等待进一步的指示。
下面的参考章节部分详细解释了这些步骤中的每一步。
......@@ -296,17 +293,14 @@ Linus Torvalds和主要的Linux内核开发人员希望看到一些问题尽快
报告过程中有一些“高优先级问题”的处理略有不同。有三种情况符合条件:回归、安全
问题和非常严重的问题。
如果在旧版本的Linux内核中工作的东西不能在新版本的Linux内核中工作,或者某种
程度上在新版本的Linux内核中工作得更差,那么你就需要处理“回归”。因此,当一个
在Linux 5.7中表现良好的WiFi驱动程序在5.8中表现不佳或根本不能工作时,这是一
种回归。如果应用程序在新的内核中出现不稳定的现象,这也是一种回归,这可能是
由于内核和用户空间之间的接口(如procfs和sysfs)发生不兼容的更改造成的。显著
的性能降低或功耗增加也可以称为回归。但是请记住:新内核需要使用与旧内核相似的
配置来构建(参见下面如何实现这一点)。这是因为内核开发人员在实现新特性时有
时无法避免不兼容性;但是为了避免回归,这些特性必须在构建配置期间显式地启用。
如果某个应用程序或实际用例在原先的Linux内核上运行良好,但在使用类似配置编译的
较新版本上效果更差、或者根本不能用,那么你就需要处理回归问题。
Documentation/admin-guide/reporting-regressions.rst 对此进行了更详细的解释。
它还提供了很多你可能想知道的关于回归的其他信息;例如,它解释了如何将您的问题
添加到回归跟踪列表中,以确保它不会被忽略。
什么是安全问题留给您自己判断。在继续之前,请考虑阅读
“Documentation/translations/zh_CN/admin-guide/security-bugs.rst”
Documentation/translations/zh_CN/admin-guide/security-bugs.rst
因为它提供了如何最恰当地处理安全问题的额外细节。
当发生了完全无法接受的糟糕事情时,此问题就是一个“非常严重的问题”。例如,
......@@ -390,7 +384,7 @@ Linux内核破坏了它处理的数据或损坏了它运行的硬件。当内核
核未被污染,那么它应该以“Not infected”结束;如果你看到“Tainted:”且后跟一些
空格和字母,那就被污染了。
如果你的内核被污染了,请阅读“Documentation/translations/zh_CN/admin-guide/tainted-kernels.rst”
如果你的内核被污染了,请阅读 Documentation/translations/zh_CN/admin-guide/tainted-kernels.rst
以找出原因。设法消除污染因素。通常是由以下三种因素之一引起的:
1. 发生了一个可恢复的错误(“kernel Oops”),内核污染了自己,因为内核知道在
......@@ -591,7 +585,8 @@ ath10k@lists.infradead.org”,将引导您到ath10k邮件列表的信息页,
搜索引擎,并添加类似“site:lists.infradead.org/pipermail/ath10k/”这
样的搜索条件,这会把结果限制在该链接中的档案。
也请进一步搜索网络、LKML和bugzilla.kernel.org网站。
也请进一步搜索网络、LKML和bugzilla.kernel.org网站。如果你的报告需要发送到缺陷
跟踪器中,那么您可能还需要检查子系统的邮件列表存档,因为可能有人只在那里报告了它。
有关如何搜索以及在找到匹配报告时如何操作的详细信息,请参阅上面的“搜索现有报告
(第一部分)”。
......@@ -802,10 +797,10 @@ Linux 首席开发者 Linus Torvalds 认为 Linux 内核永远不应恶化,这
重现它。
有一个叫做“二分”的过程可以来寻找变化,这在
“Documentation/translations/zh_CN/admin-guide/bug-bisect.rst”文档中进行了详细
Documentation/translations/zh_CN/admin-guide/bug-bisect.rst 文档中进行了详细
的描述,这个过程通常需要你构建十到二十个内核镜像,每次都尝试在构建下一个镜像
之前重现问题。是的,这需要花费一些时间,但不用担心,它比大多数人想象的要快得多。
多亏了“binary search二进制搜索”,这将引导你在源代码管理系统中找到导致回归的提交。
多亏了“binary search二分搜索”,这将引导你在源代码管理系统中找到导致回归的提交。
一旦你找到它,就在网上搜索其主题、提交ID和缩短的提交ID(提交ID的前12个字符)。
如果有的话,这将引导您找到关于它的现有报告。
......@@ -823,10 +818,10 @@ Linux 首席开发者 Linus Torvalds 认为 Linux 内核永远不应恶化,这
当处理回归问题时,请确保你所面临的问题真的是由内核引起的,而不是由其他东西
引起的,如上文所述。
在整个过程中,请记住:只有当旧内核和新内核的配置相似时,问题才算回归。最好
的方法是:把配置文件(``.config``)从旧的工作内核直接复制到你尝试的每个新内
核版本。之后运行 ``make oldnoconfig`` 来调整它以适应新版本的需要,而不启用
任何新的功能,因为那些功能也可能导致回归
在整个过程中,请记住:只有当旧内核和新内核的配置相似时,问题才算回归。这可以
通过 ``make olddefconfig`` 来实现,详细解释参见
Documentation/admin-guide/reporting-regressions.rst ;它还提供了大量其他您
可能希望了解的有关回归的信息
撰写并发送报告
......@@ -959,11 +954,19 @@ Linux 首席开发者 Linus Torvalds 认为 Linux 内核永远不应恶化,这
**非常严重的缺陷** :确保在主题或工单标题以及第一段中明显标出 severeness
(非常严重的)。
**回归** :如果问题是一个回归,请在邮件的主题或缺陷跟踪器的标题中添加
[REGRESSION]。如果您没有进行二分,请至少注明您测试的最新主线版本(比如 5.7)
和出现问题的最新版本(比如 5.8)。如果您成功地进行了二分,请注明导致回归
的提交ID和主题。也请添加该变更的作者到你的报告中;如果您需要将您的缺陷提交
到缺陷跟踪器中,请将报告以私人邮件的形式转发给他,并注明报告提交地点。
**回归** :报告的主题应以“[REGRESSION]”开头。
如果您成功用二分法定位了问题,请使用引入回归之更改的标题作为主题的第二部分。
请在报告中写明“罪魁祸首”的提交ID。如果未能成功二分,请在报告中讲明最后一个
正常工作的版本(例如5.7)和最先发生问题的版本(例如5.8-rc1)。
通过邮件发送报告时,请抄送Linux回归邮件列表(regressions@lists.linux.dev)。
如果报告需要提交到某个web追踪器,请继续提交;并在提交后,通过邮件将报告转发
至回归列表;抄送相关子系统的维护人员和邮件列表。请确保报告是内联转发的,不要
把它作为附件。另外请在顶部添加一个简短的说明,在那里写上工单的网址。
在邮寄或转发报告时,如果成功二分,需要将“罪魁祸首”的作者添加到收件人中;同时
抄送signed-off-by链中的每个人,您可以在提交消息的末尾找到。
**安全问题** :对于这种问题,你将必须评估:如果细节被公开披露,是否会对其他
用户产生短期风险。如果不会,只需按照所述继续报告问题。如果有此风险,你需要
......@@ -980,7 +983,7 @@ Linux 首席开发者 Linus Torvalds 认为 Linux 内核永远不应恶化,这
报告,请将报告的文本转发到这些地址;但请在报告的顶部加上注释,表明您提交了
报告,并附上工单链接。
更多信息请参见“Documentation/translations/zh_CN/admin-guide/security-bugs.rst”
更多信息请参见 Documentation/translations/zh_CN/admin-guide/security-bugs.rst
发布报告后的责任
......@@ -1173,14 +1176,18 @@ FLOSS 问题报告的人看,询问他们的意见。同时征求他们关于
报告回归
~~~~~~~~~~
*向Linux稳定版邮件列表发送一个简短的问题报告(stable@vger.kernel.org)。
大致描述问题,并解释如何复现。讲清楚首个出现问题的版本和最后一个工作正常
的版本。然后等待进一步的指示。*
*向Linux稳定版邮件列表发送一个简短的问题报告(stable@vger.kernel.org)并
抄送Linux回归邮件列表(regressions@lists.linux.dev);如果你怀疑是由某
子系统引起的,请抄送其维护人员和子系统邮件列表。大致描述问题,并解释如
何复现。讲清楚首个出现问题的版本和最后一个工作正常的版本。然后等待进一
步的指示。*
当报告在稳定版或长期支持内核线内发生的回归(例如在从5.10.4更新到5.10.5时),
一份简短的报告足以快速报告问题。因此只需要粗略的描述。
一份简短的报告足以快速报告问题。因此只需向稳定版和回归邮件列表发送粗略的描述;
不过如果你怀疑某子系统导致此问题的话,请一并抄送其维护人员和子系统邮件列表,
这会加快进程。
但是请注意,如果您能够指明引入问题的确切版本,这将对开发人员有很大帮助。因此
请注意,如果您能够指明引入问题的确切版本,这将对开发人员有很大帮助。因此
如果有时间的话,请尝试使用普通内核找到该版本。让我们假设发行版发布Linux内核
5.10.5到5.10.8的更新时发生了故障。那么按照上面的指示,去检查该版本线中的最新
内核,比如5.10.9。如果问题出现,请尝试普通5.10.5,以确保供应商应用的补丁不会
......@@ -1190,7 +1197,9 @@ FLOSS 问题报告的人看,询问他们的意见。同时征求他们关于
前一段基本粗略地概述了“二分”方法。一旦报告出来,您可能会被要求做一次正式的
二分,因为它允许精确地定位导致问题的确切更改(然后很容易被恢复以快速修复问题)。
因此如果时间允许,考虑立即进行适当的二分。有关如何操作的详细信息,请参阅“对回归的
特别关照”部分和文档“Documentation/translations/zh_CN/admin-guide/bug-bisect.rst”。
特别关照”部分和文档 Documentation/translations/zh_CN/admin-guide/bug-bisect.rst 。
如果成功二分的话,请将“罪魁祸首”的作者添加到收件人中;同时抄送所有在
signed-off-by链中的人,您可以在提交消息的末尾找到。
“报告仅在旧内核版本线中发生的问题”的参考
......@@ -1207,7 +1216,7 @@ FLOSS 问题报告的人看,询问他们的意见。同时征求他们关于
即使是微小的、看似明显的代码变化,有时也会带来新的、完全意想不到的问题。稳
定版和长期支持内核的维护者非常清楚这一点,因此他们只对这些内核进行符合
“Documentation/translations/zh_CN/process/stable-kernel-rules.rst”中所列出的
Documentation/translations/zh_CN/process/stable-kernel-rules.rst 中所列出的
规则的修改。
复杂或有风险的修改不符合条件,因此只能应用于主线。其他的修复很容易被回溯到
......@@ -1333,3 +1342,27 @@ FLOSS 问题报告的人看,询问他们的意见。同时征求他们关于
向 Linux 内核开发者报告问题是很难的:这个文档的长度和复杂性以及字里行间的内
涵都说明了这一点。但目前就是这样了。这篇文字的主要作者希望通过记录现状来为
以后改善这种状况打下一些基础。
..
end-of-content
..
This English version of this document is maintained by Thorsten Leemhuis
<linux@leemhuis.info>. If you spot a typo or small mistake, feel free to
let him know directly and he'll fix it. For translation problems, please
contact the translators. You are free to do the same in a mostly informal
way if you want to contribute changes to the text, but for copyright
reasons please CC linux-doc@vger.kernel.org and "sign-off" your
contribution as Documentation/process/submitting-patches.rst outlines in
the section "Sign your work - the Developer's Certificate of Origin".
..
This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
of the file. If you want to distribute this text under CC-BY-4.0 only,
please use "The Linux kernel developers" for author attribution and link
this as source:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst
..
Note: Only the content of this RST file as found in the Linux kernel sources
is available under CC-BY-4.0, as versions of this text that were processed
(for example by the kernel's build system) might contain content taken from
files which use a more restrictive license.
.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
.. 【重分发信息参见本文件结尾】
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/reporting-regressions.rst
:译者:
吴想成 Wu XiangCheng <bobwxc@email.cn>
============
报告回归问题
============
“*我们拒绝出现回归*”是Linux内核开发的首要规则;Linux的发起者和领军开发者Linus
Torvalds立下了此规则并确保它被落实。
本文档描述了这条规则对用户的意义,以及Linux内核开发模型如何确保解决所有被报告
的回归;关于内核开发者如何处理的方面参见 Documentation/process/handling-regressions.rst 。
本文重点(亦即“太长不看”)
==========================
#. 如果某程序在原先的Linux内核上运行良好,但在较新版本上效果更差、或者根本不
能用,那么你就碰见回归问题了。注意,新内核需要使用类似配置编译;更多相关细
节参见下方。
#. 按照 Documentation/translations/zh_CN/admin-guide/reporting-issues.rst 中
所说的报告你的问题,该文档已经包含了所有关于回归的重要方面,为了方便起见也
复制到了下面。两个重点:在报告主题中使用“[REGRESSION]”开头并抄送或转发到
`回归邮件列表 <https://lore.kernel.org/regressions/>`_
(regressions@lists.linux.dev)。
#. 可选但是建议:在发送或转发报告时,指明该回归发生的起点,以便Linux内核回归
追踪机器人“regzbot”可以追踪此问题::
#regzbot introduced v5.13..v5.14-rc1
与用户相关的所有Linux内核回归细节
=================================
基本重点
--------
什么是“回归”以及什么是“无回归规则”?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
如果某程序/实例在原先的Linux内核上运行良好,但在较新版本上效果更差、或者根本
不能用,那么你就碰见回归问题了。“无回归规则”不允许出现这种情况。如果偶然发
生了,导致问题的开发者应当迅速修复问题。
也就是说,若Linux 5.13中的WiFi驱动程序运行良好,但是在5.14版本上却不能用、速
度明显变慢或出现错误,那就出现了回归。如果某正常工作的应用程序突然在新内核上
出现不稳定,这也是回归;这些问题可能是由于procfs、sysfs或Linux提供给用户空间
软件的许多其他接口之一的变化。但请记住,前述例子中的5.14需要使用类似于5.13的
配置构建。这可以用 ``make olddefconfig`` 实现,详细解释见下。
注意本节第一句话中的“实例”:即使开发者需要遵循“无回归”规则,但仍可自由地改
变内核的任何方面,甚至是导出到用户空间的API或ABI,只要别破坏现有的应用程序或
用例。
还需注意,“无回归”规则只限制内核提供给用户空间的接口。它不适用于内核内部接
口,比如一些外部开发的驱动程序用来插入钩子到内核的模块API。
如何报告回归?
~~~~~~~~~~~~~~
只需按照 Documentation/translations/zh_CN/admin-guide/reporting-issues.rst 中
所说的报告你的问题,该文档已经包含了要点。下面几点概述了一下只在回归中重要的
方面:
* 在检查可加入讨论的现有报告时,别忘了搜索 `Linux回归邮件列表
<https://lore.kernel.org/regressions/>`_ 和 `regzbot网页界面
<https://linux-regtracking.leemhuis.info/regzbot/>`_ 。
* 在报告主题的开头加上“[REGRESSION]”。
* 在你的报告中明确最后一个正常工作的内核版本和首个出问题的版本。如若可能,
用二分法尝试找出导致回归的变更,更多细节见下。
* 记得把报告发到Linux回归邮件列表(regressions@lists.linux.dev)。
* 如果通过邮件报告回归,请抄送回归列表。
* 如果你使用某些缺陷追踪器报告回归,请通过邮件转发已提交的报告到回归列表,
并抄送维护者以及出问题的相关子系统的邮件列表。
如果是稳定版或长期支持版系列(如v5.15.3…v5.15.5)的回归,请记得抄送
`Linux稳定版邮件列表 <https://lore.kernel.org/stable/>`_ (stable@vger.kernel.org)。
如果你成功地执行了二分,请抄送肇事提交的信息中所有签了“Signed-off-by:”的人。
在抄送你的报告到列表时,也请记得通知前述的Linux内核回归追踪机器人。只需在邮件
中包含如下片段::
#regzbot introduced: v5.13..v5.14-rc1
Regzbot就会将你的邮件视为在某个特定版本区间的回归报告。上例中即Linux v5.13仍
然正常,而Linux 5.14-rc1是首个您遇到问题的版本。如果你执行了二分以查找导致回
归的提交,请使用指定肇事提交的id代替::
#regzbot introduced: 1f2e3d4c5d
添加这样的“regzbot命令”对你是有好处的,它会确保报告不会被忽略。如果你省略了
它,Linux内核的回归跟踪者会把你的回归告诉regzbot,只要你发送了一个副本到回归
邮件列表。但是回归跟踪者只有一个人,有时不得不休息或甚至偶尔享受可以远离电脑
的时光(听起来很疯狂)。因此,依赖此人手动将回归添加到 `已追踪且尚未解决的
Linux内核回归列表 <https://linux-regtracking.leemhuis.info/regzbot/>`_ 和
regzbot发送的每周回归报告,可能会出现延迟。这样的延误会导致Linus Torvalds
在决定“继续开发还是发布新版本?”时忽略严重的回归。
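
The two forms of the ``#regzbot introduced:`` command shown above (a version range, or the id of the culprit commit found by bisection) can be generated with a small helper. This is a minimal sketch: the function names are hypothetical, and the commit-id check (7 to 40 lowercase hex digits) is an assumption made for illustration, not a rule taken from regzbot.

```python
import re
from typing import Optional

# Assumption for this sketch: commit ids are 7 to 40 lowercase hex digits.
_COMMIT_ID = re.compile(r"[0-9a-f]{7,40}")


def regzbot_introduced(start: str, first_bad: Optional[str] = None) -> str:
    """Build the command from a version range or from a single commit id."""
    if first_bad is not None:
        # Range form: last good version, two dots, first bad version.
        return f"#regzbot introduced: {start}..{first_bad}"
    if not _COMMIT_ID.fullmatch(start):
        raise ValueError(f"not a commit id: {start!r}")
    return f"#regzbot introduced: {start}"


def append_command(report: str, command: str) -> str:
    """Append the command as its own paragraph, set off by a blank line."""
    return report.rstrip("\n") + "\n\n" + command + "\n"


print(append_command("WiFi stopped working after the update.",
                     regzbot_introduced("v5.13", "v5.14-rc1")))
```

Placing the command in a paragraph of its own keeps it clearly separated from the rest of the report text.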
真的修复了所有的回归吗?
~~~~~~~~~~~~~~~~~~~~~~~~
几乎所有都是,只要引起问题的变更(肇事提交)被可靠定位。也有些回归可以不用这
样,但通常是必须的。
谁需要找出回归的根本原因?
~~~~~~~~~~~~~~~~~~~~~~~~~~
受影响代码区域的开发者应该自行尝试定位问题所在。但仅靠他们的努力往往是不可
能做到的,很多问题只发生在开发者无法接触的其他特定外部环境中——例如特定的
硬件平台、固件、Linux发行版、系统的配置或应用程序。这就是为什么最终往往是报
告者定位肇事提交;有时用户甚至需要再运行额外测试以查明确切的根本原因。开发
者应该提供建议和可能的帮助,以使普通用户更容易完成该流程。
如何找到罪魁祸首?
~~~~~~~~~~~~~~~~~~
如 Documentation/translations/zh_CN/admin-guide/reporting-issues.rst (简要)
和 Documentation/translations/zh_CN/admin-guide/bug-bisect.rst (详细)中所
述,执行二分。听起来工作量很大,但大部分情况下很快就能找到罪魁祸首。如果这很
困难或可靠地重现问题很耗时,请考虑与其他受影响的用户合作,一起缩小搜索范围。
当出现回归时我可以向谁寻求建议?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
发送邮件到回归邮件列表(regressions@lists.linux.dev)同时抄送Linux内核的回归
跟踪者(regressions@leemhuis.info);如果问题需要保密处理,可以省略列表。
关于回归的更多细节
------------------
“无回归规则”的目标是什么?
~~~~~~~~~~~~~~~~~~~~~~~~~~
用户应该放心升级内核版本,而不必担心有程序可能崩溃。这符合内核开发者的利益,
可以使更新有吸引力:他们不希望用户停留在停止维护或超过一年半的稳定/长期Linux
版本系列上。这也符合所有人的利益,因为 `那些系列可能含有已知的缺陷、安全问题
或其他后续版本已经修复的问题
<http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_ 。
此外,内核开发者希望使用户测试最新的预发行版或常规发行版变得简单而有吸引力。
这同样符合所有人的利益,如果新版本出来后很快就有相关报告,会使追踪和修复问题
更容易。
实际中“无回归”规则真的可行吗?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
这不是句玩笑话,请见Linux创建者和主要开发人员Linus Torvalds在邮件列表中的许
多发言,其中一些在 Documentation/process/handling-regressions.rst 中被引用。
此规则的例外情况极为罕见;之前当开发者认为某个特定的情况有必要援引例外时,
基本都被证明错了。
谁来确保“无回归”被落实?
~~~~~~~~~~~~~~~~~~~~~~~~
照看和支撑树的子系统维护者应该关心这一点——例如,Linus Torvalds之于主线,
Greg Kroah-Hartman等人之于各种稳定/长期系列。
他们都得到了别人的帮助,以确保回归报告不会被遗漏。其中之一是Thorsten
Leemhuis,他目前担任Linux内核的“回归跟踪者”;为了做好这项工作,他使用了
regzbot——Linux内核回归跟踪机器人。所以这就是为什么要抄送或转发你的报告到
回归邮件列表来通知这些人,以及最好在你的邮件中包含“regzbot命令”来立即追踪它。
回归通常多久能修复?
~~~~~~~~~~~~~~~~~~~~
开发者应该尽快修复任何被报告的回归,以及时为受影响的用户提供解决方案,并
防止更多用户遇到问题;然而,开发人员需要花足够的时间和注意力确保回归修复不会
造成额外的损害。
因此,答案取决于各种因素,如回归的影响、存在时长或出现于哪个Linux版本系列。
但最终,大多数的回归应该在两周内修复。
当问题可以通过升级某些软件解决时,是回归吗?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
基本都是。如果开发人员告诉您其他情况,请咨询上述回归跟踪者。
当新内核变慢或能耗增加,是回归吗?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
是的,但有一些差别。在微型基准测试中变慢5%不太可能被视为回归,除非它也会对
广泛基准测试的结果产生超过1%的影响。如果有疑问,请寻求建议。
当更新Linux时外部内核模块崩溃了,是回归吗?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
不,因为“无回归”规则仅限于Linux内核提供给用户空间的接口和服务。因此,它不包括
构建或运行外部开发的内核模块,因为它们在内核空间中运行,并且挂进内核时使用的
内部接口偶尔会变化。
如何处理安全修复引起的回归?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
在极为罕见的情况下,安全问题无法在不引起回归的情况下修复;这些修复都被放弃了,
因为它们终究会引起问题。幸运的是这种两难境地基本都可以避免,受影响区域的主要
开发者以及Linus Torvalds本人通常都会努力在不引入回归的情况下解决安全问题。
如果你仍然面临此种情况,请查看邮件列表档案是否有人尽力避免过回归。如果没有,
请报告它;如有疑问,请如上所述寻求建议。
当修复回归时不可避免会引入另一个,如何处理?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
很遗憾这种事确实会出现,但幸运的是并不经常出现;如果发生了,受影响代码区的资
深开发者应当调查该问题以找到避免回归的解决方法,至少避免它们的影响。如果你遇
到这样的情况,如上所述:检查之前的讨论是否有人已经尽了最大努力,如有疑问请寻
求建议。
小提示:如果人们在每个开发周期中定期给出主线预发布(即v5.15-rc1或-rc3)以供
测试,则可以避免这种情况。为了更好地解释,可以设想一个在Linux v5.14和v5.15-rc1
之间集成的更改,该更改导致了回归,但同时是应用于5.15-rc1的其他改进的强依赖。
如果有人在5.15发布之前就发现并报告了这个问题,那么所有更改都可以直接撤销,从
而解决回归问题。而就在几天或几周后,此解决方案变成了不可能,因为一些软件可能
已经开始依赖于后续更改之一:撤销所有更改将导致上述用户软件出现回归,这是不可
接受的。
若我所依赖的功能在数月前被移除了,是回归吗?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
是的,但如前节所述,通常很难修复此类回归。因此需要逐案处理。这也是定期测试主
线预发布对所有人有好处的另一个原因。
如果我似乎是唯一受影响的人,是否仍适用“无回归”规则?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
适用,但仅限于实际使用:Linux开发人员希望能够自由地取消那些只能在阁楼和博物
馆中找到的硬件的支持。
请注意,有时为了取得进展,不得不出现回归——后者也是防止Linux停滞不前所必需
的。因此如果回归所影响的用户很少,那么为了他们和其他人更大的利益,还是让事情
过去吧。尤其是存在某种规避回归的简单方法,例如更新一些软件或者使用专门为此目
的创建的内核参数。
回归规则是否也适用于staging树中的代码?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
不,参见 `适用于所有staging代码配置选项的帮助文本
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_ ,
其早已声明::
请注意:这些驱动正在积极开发中,可能无法正常工作,并可能包含会在不久的
将来发生变化的用户接口。
虽然staging开发人员通常坚持“无回归”的原则,但有时为了取得进展也会违背它。这就
是为什么当staging树的WiFi驱动被基本推倒重来时,有些用户不得不处理回归(通常可
以忽略)。
为什么较新版本必须“使用相似配置编译”?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
因为Linux内核开发人员有时会集成已知的会导致回归的变更,但使它们成为可选的,并
在内核的默认配置下禁用它们。这一技巧允许进步,否则“无回归”规则将导致停滞。
例如,试想一个新的安全特性,它可以阻止恶意软件滥用某个内核接口,但少数很少使用的
应用程序运行时又需要这个接口。上述的方法可使两方都满意:使用这些应用程序的人可以关闭
新的安全功能,而其他不会遇到麻烦的人可以启用它。
如何创建与旧内核相似的配置?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
用一个已知良好的内核启动机器,并用 ``make olddefconfig`` 配置新版的Linux。这
会让内核的构建脚本从正在运行的内核中摘录配置文件(“.config”文件),作为即将编
译的新版本的基础配置;同时将所有新的配置选项设为默认值,以禁用可能导致回归的
新功能。
如何报告在预编译的普通内核中发现的回归?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
您需要确保新的内核是用与旧版相似的配置编译(见上文),因为那些构建它们的人可
能启用了一些已知的与新内核不兼容的特性。如有疑问,请向内核的提供者报告问题并
寻求建议。
用“regzbot”追踪回归的更多信息
-----------------------------
什么是回归追踪?为啥我需要关心它?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
像“无回归”这样的规则需要有人来确保它们被遵守,否则会被有意/无意打破。历史证
明了这一点对于Linux内核开发也适用。这就是为什么Linux内核的回归跟踪者Thorsten
Leemhuis和另一些人尽力关注所有的回归直到它们解决。他们从未为此获得报酬,
因此这项工作是在尽最大努力的基础上完成的。
为什么/如何使用机器人追踪Linux内核回归?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
由于Linux内核开发过程的分布式和松散结构,完全手动跟踪回归已经被证明是相当困难
的。因此Linux内核的回归跟踪者开发了regzbot来促进这项工作,其长期目标是尽可能为
所有相关人员自动化回归跟踪。
Regzbot通过监视跟踪的回归报告的回复来工作。此外,它还查找用“Link:”标签引用这
些报告的补丁;对这些补丁的回复也会被跟踪。结合这些数据,可以很好地了解当前修
复过程的状态。
如何查看regzbot当前追踪的回归?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
参见 `regzbot在线 <https://linux-regtracking.leemhuis.info/regzbot/>`_ 。
何种问题可以由regzbot追踪?
~~~~~~~~~~~~~~~~~~~~~~~~~~~
该机器人只为了跟踪回归,因此请不要让regzbot涉及常规问题。但是对于Linux内核的
回归跟踪者来说,让regzbot跟踪严重问题也可以,如有关挂起、损坏数据或内部错误
(Panic、Oops、BUG()、warning…)的报告。
如何修改被追踪回归的相关信息?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
在直接或间接回复报告邮件时使用“regzbot命令”即可。最简单的方法是:在“已发送”文
件夹或邮件列表存档中找到报告,然后使用邮件客户端的“全部回复”功能对其进行回复。
在该邮件中的独立段落中可使用以下命令之一(即使用空行将这些命令中的一个或多个与
其余邮件文本分隔开)。
* 更新回归引入起点,例如在执行二分之后::
#regzbot introduced: 1f2e3d4c5d
* 设置或更新标题::
#regzbot title: foo
* 监视讨论或bugzilla.kernel.org上有关讨论或修复的工单::
#regzbot monitor: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
#regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789
* 标记一个有更多相关细节的地方,例如有关但主题不同的邮件列表帖子或缺陷追踪器中的工单::
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789
* 标记回归已失效::
#regzbot invalid: wasn't a regression, problem has always existed
Regzbot还支持其他一些主要由开发人员或回归追踪人员使用的命令。命令的更多细节请
参考 `入门指南 <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_
和 `参考手册 <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_ 。
..
正文结束
..
如本文件开头所述,本文以GPL-2.0+或CC-BY-4.0许可发行。如您想仅在CC-BY-4.0许
可下重分发本文,请用“Linux内核开发者”作为作者,并用如下链接作为来源:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/translations/zh_CN/admin-guide/reporting-regressions.rst
..
注意:本RST文件内容只有在来自Linux内核源代码时是使用CC-BY-4.0许可的,因为经
过处理的版本(如经内核的构建系统)可能包含来自使用更严格许可证的文件的内容。
......@@ -5,6 +5,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:校译:
......@@ -278,6 +279,11 @@ HyperSparc cpu就是这样一个具有这种属性的cpu。
CPU上,因为它将cpu存储到页面上,使其变脏。同样,请看
sparc64关于如何处理这个问题的例子。
``void flush_dcache_folio(struct folio *folio)``
该函数的调用情形与flush_dcache_page()相同。它允许架构针对刷新整个
folio页面进行优化,而不是一次刷新一页。
``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long user_vaddr, void *dst, void *src, int len)``
``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
......
......@@ -4,6 +4,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:校译:
......@@ -15,12 +16,13 @@
内核中的CPU热拔插
=================
:时间: 2016年12月
:时间: 2021年9月
:作者: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Rusty Russell <rusty@rustcorp.com.au>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Ashok Raj <ashok.raj@intel.com>,
Joel Schopp <jschopp@austin.ibm.com>
Joel Schopp <jschopp@austin.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>
简介
====
......@@ -139,7 +141,7 @@ CPU的热拔插协作
下线情况
--------
一旦CPU被逻辑关闭,注册的热插拔状态的清除回调将被调用,从 ``CPUHP_ONLINE`` 开始,
一旦CPU被逻辑关闭,注册的热插拔状态的清除回调将被调用,从 ``CPUHP_ONLINE`` 开始,
``CPUHP_OFFLINE`` 状态结束。这包括:
* 如果任务因暂停操作而被冻结,那么 *cpuhp_tasks_frozen* 将被设置为true。
......@@ -154,82 +156,399 @@ CPU的热拔插协作
* 一旦所有的服务被迁移,内核会调用一个特定的例程 ``__cpu_disable()`` 来进行特定的清
理。
使用热插拔API
-------------
CPU热插拔API
============
CPU热拔插状态机
---------------
CPU热插拔使用一个从CPUHP_OFFLINE到CPUHP_ONLINE的线性状态空间的普通状态机。每个状态都
有一个startup和teardown的回调。
当一个CPU上线时,将按顺序调用startup回调,直到达到CPUHP_ONLINE状态。当设置状态的回调
或将实例添加到多实例状态时,也可以调用它们。
当一个CPU下线时,将按相反的顺序依次调用teardown回调,直到达到CPUHP_OFFLINE状态。当删
除状态的回调或从多实例状态中删除实例时,也可以调用它们。
如果某个使用场景只需要一个方向的热插拔操作回调(CPU上线或CPU下线),则在设置状态时,
可以将另一个不需要的回调设置为NULL。
状态空间被划分成三个阶段:
* PREPARE阶段
PREPARE阶段涵盖了从CPUHP_OFFLINE到CPUHP_BRINGUP_CPU之间的状态空间。
在该阶段中,startup回调在CPU上线操作启动CPU之前被调用,teardown回调在CPU下线操作使
CPU功能失效之后被调用。
这些回调是在控制CPU上调用的,因为它们显然不能在热插拔的CPU上运行,此时热插拔的CPU要
么还没有启动,要么已经功能失效。
startup回调用于设置CPU成功上线所需要的资源。teardown回调用于释放资源或在热插拔的CPU
功能失效后,将待处理的工作转移到在线的CPU上。
允许startup回调失败。如果回调失败,CPU上线操作被中止,CPU将再次被降到之前的状态(通
常是CPUHP_OFFLINE)。
本阶段中的teardown回调不允许失败。
* STARTING阶段
STARTING阶段涵盖了CPUHP_BRINGUP_CPU + 1到CPUHP_AP_ONLINE之间的状态空间。
该阶段中的startup回调是在早期CPU设置代码中的CPU上线操作期间,禁用中断的情况下在热拔
插的CPU上被调用。teardown回调是在CPU完全关闭前不久的CPU下线操作期间,禁用中断的情况
下在热拔插的CPU上被调用。
该阶段中的回调不允许失败。
回调用于低级别的硬件初始化/关机和核心子系统。
* ONLINE阶段
ONLINE阶段涵盖了CPUHP_AP_ONLINE + 1到CPUHP_ONLINE之间的状态空间。
该阶段中的startup回调是在CPU上线时在热插拔的CPU上调用的。teardown回调是在CPU下线操
作时在热插拔CPU上调用的。
回调是在每个CPU热插拔线程的上下文中调用的,该线程绑定在热插拔的CPU上。回调是在启用
中断和抢占的情况下调用的。
允许回调失败。如果回调失败,CPU热插拔操作被中止,CPU将恢复到之前的状态。
CPU 上线/下线操作
-----------------
一个成功的上线操作如下::
[CPUHP_OFFLINE]
[CPUHP_OFFLINE + 1]->startup() -> 成功
[CPUHP_OFFLINE + 2]->startup() -> 成功
[CPUHP_OFFLINE + 3] -> 略过,因为startup == NULL
...
[CPUHP_BRINGUP_CPU]->startup() -> 成功
=== PREPARE阶段结束
[CPUHP_BRINGUP_CPU + 1]->startup() -> 成功
...
[CPUHP_AP_ONLINE]->startup() -> 成功
=== STARTUP阶段结束
[CPUHP_AP_ONLINE + 1]->startup() -> 成功
...
[CPUHP_ONLINE - 1]->startup() -> 成功
[CPUHP_ONLINE]
一个成功的下线操作如下::
[CPUHP_ONLINE]
[CPUHP_ONLINE - 1]->teardown() -> 成功
...
[CPUHP_AP_ONLINE + 1]->teardown() -> 成功
=== STARTUP阶段开始
[CPUHP_AP_ONLINE]->teardown() -> 成功
...
[CPUHP_BRINGUP_ONLINE - 1]->teardown()
...
=== PREPARE阶段开始
[CPUHP_BRINGUP_CPU]->teardown()
[CPUHP_OFFLINE + 3]->teardown()
[CPUHP_OFFLINE + 2] -> 略过,因为teardown == NULL
[CPUHP_OFFLINE + 1]->teardown()
[CPUHP_OFFLINE]
A failed online operation looks like this::
  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_AP_ONLINE + N]->startup()     -> fail
  [CPUHP_AP_ONLINE + (N - 1)]->teardown()
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]
A failed offline operation looks like this::
  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()
  ...
  [CPUHP_ONLINE - 1]->startup()
  [CPUHP_ONLINE]
递归失败不能被合理地处理。
请看下面的例子,由于下线操作失败而导致的递归失败::
[CPUHP_ONLINE]
[CPUHP_ONLINE - 1]->teardown() -> 成功
...
[CPUHP_ONLINE - N]->teardown() -> 失败
[CPUHP_ONLINE - (N - 1)]->startup() -> 成功
[CPUHP_ONLINE - (N - 2)]->startup() -> 失败
CPU热插拔状态机在此停止,且不再尝试回滚,因为这可能会导致死循环::
[CPUHP_ONLINE - (N - 1)]->teardown() -> 成功
[CPUHP_ONLINE - N]->teardown() -> 失败
[CPUHP_ONLINE - (N - 1)]->startup() -> 成功
[CPUHP_ONLINE - (N - 2)]->startup() -> 失败
[CPUHP_ONLINE - (N - 1)]->teardown() -> 成功
[CPUHP_ONLINE - N]->teardown() -> 失败
周而复始,不断重复。在这种情况下,CPU留在该状态中::
[CPUHP_ONLINE - (N - 1)]
这至少可以让系统取得进展,让用户有机会进行调试,甚至解决这个问题。
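The online and rollback sequences above can be approximated by a small user-space model. This is only a sketch: `struct model_step`, `model_cpu_up()` and the demo callbacks are hypothetical names invented for illustration, not kernel interfaces, and the model deliberately ignores the recursive-failure case described above.

```c
#include <stddef.h>

/* User-space model only; none of these names exist in the kernel. */
struct model_step {
	int (*startup)(void);	/* NULL means the state is skipped going up */
	int (*teardown)(void);	/* NULL means the state is skipped going down */
};

/*
 * *state counts how many steps have completed startup (0 == "offline").
 * On a startup failure, the teardown callbacks of the steps completed in
 * this call run in reverse order and the original state is restored.
 */
int model_cpu_up(const struct model_step *steps, int nsteps, int *state)
{
	int origin = *state;

	while (*state < nsteps) {
		const struct model_step *st = &steps[*state];
		int ret = st->startup ? st->startup() : 0;

		if (ret) {
			/* The failed step itself is not torn down. */
			while (*state > origin) {
				st = &steps[*state - 1];
				if (st->teardown)
					st->teardown();
				(*state)--;
			}
			return ret;
		}
		(*state)++;
	}
	return 0;
}

/* Demo callbacks that record the call order in model_log. */
static char model_log[16];
static size_t model_log_len;

static void model_note(char c)
{
	if (model_log_len + 1 < sizeof(model_log))
		model_log[model_log_len++] = c;
}

static int up_a(void)   { model_note('A'); return 0; }
static int down_a(void) { model_note('a'); return 0; }
static int up_b(void)   { model_note('B'); return 0; }
static int down_b(void) { model_note('b'); return 0; }
static int up_bad(void) { model_note('X'); return -1; }

const struct model_step model_steps[] = {
	{ up_a, down_a },
	{ NULL, NULL },		/* no callbacks: skipped in both directions */
	{ up_b, down_b },
	{ up_bad, NULL },	/* startup fails, triggering the rollback */
};
```

Calling `model_cpu_up(model_steps, 4, &state)` with `state == 0` runs the startup callbacks in ascending order, hits the failing step, and then unwinds in descending order, which mirrors the shape of the failed online diagram.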
分配一个状态
------------
有两种方式分配一个CPU热插拔状态:
* 静态分配
当子系统或驱动程序有相对于其他CPU热插拔状态的排序要求时,必须使用静态分配。例如,
在CPU上线操作期间,PERF核心startup回调必须在PERF驱动startup回调之前被调用。在CPU
下线操作中,驱动teardown回调必须在核心teardown回调之前调用。静态分配的状态由
cpuhp_state枚举中的常量描述,可以在include/linux/cpuhotplug.h中找到。
在适当的位置将状态插入枚举中,这样就满足了排序要求。状态常量必须被用于状态的设置
和移除。
当状态回调不是在运行时设置的,并且是kernel/cpu.c中CPU热插拔状态数组初始化的一部分
时,也需要静态分配。
* 动态分配
当对状态回调没有排序要求时,动态分配是首选方法。状态编号由setup函数分配,并在成功
后返回给调用者。
只有PREPARE和ONLINE阶段提供了一个动态分配范围。STARTING阶段则没有,因为该部分的大多
数回调都有明确的排序要求。
CPU热插拔状态的设置
-------------------
核心代码提供了以下函数用来设置状态:
* cpuhp_setup_state(state, name, startup, teardown)
* cpuhp_setup_state_nocalls(state, name, startup, teardown)
* cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)
对于一个驱动程序或子系统有多个实例,并且每个实例都需要调用相同的CPU hotplug状态回
调的情况,CPU hotplug核心提供多实例支持。与驱动程序特定的实例列表相比,其优势在于
与实例相关的函数完全针对CPU hotplug操作进行序列化,并在添加和删除时提供状态回调的
自动调用。要设置这样一个多实例状态,可以使用以下函数:
* cpuhp_setup_state_multi(state, name, startup, teardown)
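The @state argument is either a statically allocated state or one of the constants for dynamically allocated states (CPUHP_BP_PREPARE_DYN, CPUHP_AP_ONLINE_DYN), depending on the state section (PREPARE, ONLINE) in which the dynamic state should be allocated.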
@name参数用于sysfs输出和检测。命名惯例是"subsys:mode"或"subsys/driver:mode",
例如 "perf:mode"或"perf/x86:mode"。常见的mode名称有:
======== ============================================
prepare 对应PREPARE阶段中的状态
dead 对应PREPARE阶段中不提供startup回调的状态
starting 对应STARTING阶段中的状态
dying 对应STARTING阶段中不提供startup回调的状态
online 对应ONLINE阶段中的状态
offline 对应ONLINE阶段中不提供startup回调的状态
======== ============================================
由于@name参数只用于sysfs和检测,如果其他mode描述符比常见的描述符更好地描述状态的性质,
也可以使用。
@name参数的示例:"perf/online", "perf/x86:prepare", "RCU/tree:dying", "sched/waitempty"
@startup参数是一个指向回调的函数指针,在CPU上线操作时被调用。若应用不需要startup
回调,则将该指针设为NULL。
@teardown参数是一个指向回调的函数指针,在CPU下线操作时调用。若应用不需要teardown
回调,则将该指针设为NULL。
这些函数在处理已注册回调的方式上有所不同:
* cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()和
cpuhp_setup_state_multi()只注册回调。
* cpuhp_setup_state()和cpuhp_setup_state_cpuslocked()注册回调,并对当前状态大于新
安装状态的所有在线CPU调用@startup回调(如果不是NULL)。根据状态阶段,回调要么在
当前的CPU上调用(PREPARE阶段),要么在CPU的热插拔线程中调用每个在线CPU(ONLINE阶段)。
如果CPU N的回调失败,那么CPU 0...N-1的teardown回调被调用以回滚操作。状态设置失败,
状态的回调没有被注册,在动态分配的情况下,分配的状态被释放。
状态设置和回调调用是针对CPU热拔插操作进行序列化的。如果设置函数必须从CPU热插拔的读
锁定区域调用,那么必须使用_cpuslocked()变体。这些函数不能在CPU热拔插回调中使用。
函数返回值:
======== ==========================================================
0 静态分配的状态设置成功
>0 动态分配的状态设置成功
返回的数值是被分配的状态编号。如果状态回调后来必须被移除,
例如模块移除,那么这个数值必须由调用者保存,并作为状态移
除函数的@state参数。对于多实例状态,动态分配的状态编号也
需要作为实例添加/删除操作的@state参数。
<0 操作失败
======== ==========================================================
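The return-value convention in the table above (0 for a static state, a positive state number for a dynamic one, a negative value on failure) can be illustrated with a toy allocator. This is a user-space sketch under stated assumptions: `model_setup_state()`, `model_remove_state()` and the range constants are hypothetical and only mimic the convention; they say nothing about how the kernel implements it.

```c
#include <errno.h>

#define MODEL_DYN_FIRST	100	/* hypothetical: request constant and first dynamic slot */
#define MODEL_DYN_COUNT	4	/* hypothetical size of the dynamic range */

static unsigned char model_used[MODEL_DYN_FIRST + MODEL_DYN_COUNT];

/* Returns 0 (static state), > 0 (allocated dynamic state) or < 0 (error). */
int model_setup_state(int state)
{
	int i;

	if (state == MODEL_DYN_FIRST) {
		/* Dynamic request: hand out the first free slot in the range. */
		for (i = MODEL_DYN_FIRST; i < MODEL_DYN_FIRST + MODEL_DYN_COUNT; i++) {
			if (!model_used[i]) {
				model_used[i] = 1;
				return i;
			}
		}
		return -ENOSPC;
	}
	if (state <= 0 || state > MODEL_DYN_FIRST)
		return -EINVAL;
	if (model_used[state])
		return -EBUSY;
	model_used[state] = 1;
	return 0;
}

/* The caller must pass the number returned by a dynamic setup. */
void model_remove_state(int state)
{
	if (state > 0 && state < MODEL_DYN_FIRST + MODEL_DYN_COUNT)
		model_used[state] = 0;
}
```

The point of the sketch is the caller-side pattern: a dynamically allocated number has to be kept, because it is the only handle that later identifies the state for removal.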
移除CPU热拔插状态
-----------------
为了移除一个之前设置好的状态,提供了如下函数:
* cpuhp_remove_state(state)
* cpuhp_remove_state_nocalls(state)
* cpuhp_remove_state_nocalls_cpuslocked(state)
* cpuhp_remove_multi_state(state)
@state参数要么是静态分配的状态,要么是由cpuhp_setup_state*()在动态范围内分配
的状态编号。如果状态在动态范围内,则状态编号被释放,可再次进行动态分配。
这些函数在处理已注册回调的方式上有所不同:
* cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
和 cpuhp_remove_multi_state()只删除回调。
* cpuhp_remove_state()删除回调,并调用所有当前状态大于被删除状态的在线CPU的
teardown回调(如果不是NULL)。根据状态阶段,回调要么在当前的CPU上调用
(PREPARE阶段),要么在CPU的热插拔线程中调用每个在线CPU(ONLINE阶段)。
为了完成移除工作,teardown回调不能失败。
状态移除和回调调用是针对CPU热拔插操作进行序列化的。如果移除函数必须从CPU hotplug
读取锁定区域调用,那么必须使用_cpuslocked()变体。这些函数不能从CPU热插拔的回调中使用。
如果一个多实例的状态被移除,那么调用者必须先移除所有的实例。
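It is possible to receive notifications once a CPU goes offline or comes online. This might be important to certain drivers which need to perform some kind of setup or cleanup based on the number of available CPUs::
  #include <linux/cpuhotplug.h>
  ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                          Y_online, Y_prepare_down);
*X* is the subsystem and *Y* the particular driver. The *Y_online* callback is invoked during registration on all online CPUs. If an error occurs during the online callback, the *Y_prepare_down* callback is invoked on all CPUs on which the online callback was previously invoked. After registration has completed, the *Y_online* callback is invoked whenever a CPU is brought online and *Y_prepare_down* is invoked when a CPU is shut down. All resources which were previously allocated in *Y_online* should be released in *Y_prepare_down*. The return value *ret* is negative if an error occurred during registration. Otherwise a positive value is returned which contains the allocated hotplug state for dynamically allocated states ( *CPUHP_AP_ONLINE_DYN* ). For predefined states it returns zero.
The callback can be removed by invoking ``cpuhp_remove_state()``. In case of a dynamically allocated state ( *CPUHP_AP_ONLINE_DYN* ) use the returned state. During the removal of a hotplug state the teardown callback is invoked.
Multiple instances
~~~~~~~~~~~~~~~~~~
If a driver has multiple instances and each instance needs to perform the callback independently, then it is likely that a ``multi-state`` should be used. First a multi-state state needs to be registered::
  ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                                Y_online, Y_prepare_down);
  Y_hp_online = ret;
``cpuhp_setup_state_multi()`` behaves similarly to ``cpuhp_setup_state()`` except that it prepares the callbacks for a multi state and does not invoke the callbacks. This is a one-time setup.
Once a new instance is allocated, you need to register this new instance::
  ret = cpuhp_state_add_instance(Y_hp_online, &d->node);
This function adds the instance to your previously allocated ``Y_hp_online`` state and invokes the previously registered callback ( ``Y_online`` ) on all online CPUs. The *node* element is a ``struct hlist_node`` member of your per-instance data structure.
On removal of the instance::
  cpuhp_state_remove_instance(Y_hp_online, &d->node)
should be invoked, which invokes the teardown callback on all online CPUs.
Manual setup
~~~~~~~~~~~~
Usually it is handy to invoke the setup and teardown callbacks on registration or removal of a state, because the operation normally needs to be performed both when a CPU goes online (offline) and during the initial setup (shutdown) of the driver. However, each registration and removal function is also available with a _nocalls suffix, which does not invoke the provided callbacks if that is not desired. During the manual setup (or teardown) the functions ``get_online_cpus()`` and ``put_online_cpus()`` should be used to inhibit CPU hotplug operations.
The ordering of the events
--------------------------
The hotplug states are defined in ``include/linux/cpuhotplug.h``:
* The states ``CPUHP_OFFLINE`` ... ``CPUHP_AP_OFFLINE`` are invoked before the CPU is up.
* The states ``CPUHP_AP_OFFLINE`` ... ``CPUHP_AP_ONLINE`` are invoked just after the CPU has been brought up. Interrupts are off and the scheduler is not yet active on this CPU. Starting with ``CPUHP_AP_OFFLINE`` the callbacks are invoked on the target CPU.
* The states between ``CPUHP_AP_ONLINE_DYN`` and ``CPUHP_AP_ONLINE_DYN_END`` are reserved for dynamic allocation.
* The states are invoked in the reverse order on CPU shutdown, starting with ``CPUHP_ONLINE`` and stopping at ``CPUHP_OFFLINE``. Here the callbacks are invoked on the CPU that will be shut down until ``CPUHP_AP_OFFLINE``.
A dynamically allocated state via ``CPUHP_AP_ONLINE_DYN`` is often enough. However, if an earlier invocation during bring-up or shutdown is required, then an explicit state should be acquired. An explicit state might also be required if the hotplug event requires specific ordering with respect to another hotplug event.
Multi-Instance state instance management
----------------------------------------
Once the multi-instance state is set up, instances can be added to the state:
* cpuhp_state_add_instance(state, node)
* cpuhp_state_add_instance_nocalls(state, node)
The @state argument is either a statically allocated state or the state number which was allocated in the dynamic range by cpuhp_setup_state_multi().
The @node argument is a pointer to an hlist_node which is embedded in the instance's data structure. The pointer is handed to the multi-instance state callbacks and can be used by the callback to retrieve the instance via container_of().
The functions differ in the way the installed callbacks are treated:
* cpuhp_state_add_instance_nocalls() only adds the instance to the multi-instance state's node list.
* cpuhp_state_add_instance() adds the instance and invokes the startup callback (if not NULL) associated with @state for all online CPUs whose current state is greater than @state. The callback is only invoked for the instance being added. Depending on the state section, the callback is either invoked on the current CPU (PREPARE section) or on each online CPU (ONLINE section) in the context of the CPU's hotplug thread.
If a callback fails for CPU N, the teardown callback for CPU 0 ... N-1 is invoked to roll back the operation, the function fails, and the instance is not added to the node list of the multi-instance state.
To remove an instance from the state's node list these functions are available:
* cpuhp_state_remove_instance(state, node)
* cpuhp_state_remove_instance_nocalls(state, node)
The arguments are the same as for the cpuhp_state_add_instance*() variants above.
The functions differ in the way the installed callbacks are treated:
* cpuhp_state_remove_instance_nocalls() only removes the instance from the state's node list.
* cpuhp_state_remove_instance() removes the instance and invokes the teardown callback (if not NULL) associated with @state for all online CPUs whose current state is greater than @state. The callback is only invoked for the instance being removed. Depending on the state section, the callback is either invoked on the current CPU (PREPARE section) or on each online CPU (ONLINE section) in the context of the CPU's hotplug thread.
In order to complete the removal, the teardown callback should not fail.
The node list add/remove operations and the callback invocations are serialized against CPU hotplug operations. These functions cannot be used from within CPU hotplug callbacks or CPU hotplug read-locked regions.
Examples
--------
Set up and tear down a statically allocated state in the STARTING section for notifications on online and offline operations::
  ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
  if (ret < 0)
      return ret;
  ....
  cpuhp_remove_state(CPUHP_SUBSYS_STARTING);
Set up and tear down a dynamically allocated state in the ONLINE section for notifications on offline operations::
  state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
  if (state < 0)
      return state;
  ....
  cpuhp_remove_state(state);
Set up and tear down a dynamically allocated state in the ONLINE section for notifications on online operations without invoking the callbacks::
  state = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL);
  if (state < 0)
      return state;
  ....
  cpuhp_remove_state_nocalls(state);
Set up, use and tear down a dynamically allocated multi-instance state in the ONLINE section for notifications on online and offline operations::
  state = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
  if (state < 0)
      return state;
  ....
  ret = cpuhp_state_add_instance(state, &inst1->node);
  if (ret)
      return ret;
  ....
  ret = cpuhp_state_add_instance(state, &inst2->node);
  if (ret)
      return ret;
  ....
  cpuhp_state_remove_instance(state, &inst1->node);
  ....
  cpuhp_state_remove_instance(state, &inst2->node);
  ....
  cpuhp_remove_multi_state(state);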
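For multi-instance states, the callbacks receive a pointer to the `hlist_node` embedded in the instance and are expected to recover the instance with `container_of()`. The following user-space sketch shows that pattern in isolation. It assumes minimal stand-ins for `struct hlist_node` and `container_of()` so that it compiles outside the kernel; `struct subsys_inst` and its fields are hypothetical.

```c
#include <stddef.h>

/* Minimal stand-ins so the sketch builds in user space. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-instance driver data with the embedded node. */
struct subsys_inst {
	int id;
	int online_calls;
	struct hlist_node node;
};

/* Same shape as a multi-instance startup callback: (cpu, node). */
int subsys_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	struct subsys_inst *inst = container_of(node, struct subsys_inst, node);

	(void)cpu;		/* a real callback would set up per-CPU resources */
	inst->online_calls++;
	return 0;
}
```

Because the node is embedded rather than pointed to, no extra lookup table is needed: the address of the node is enough to find the instance it belongs to.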
测试热拔插状态
==============
......
......@@ -28,6 +28,7 @@
printk-basics
printk-formats
workqueue
watch_queue
symbol-namespaces
数据结构和低级实用程序
......
......@@ -5,6 +5,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
.. _cn_irq-domain.rst:
......@@ -52,8 +53,18 @@ irq_domain和一个hwirq号作为参数。 如果hwirq的映射还不存在,
一个新的Linux irq_desc,将其与hwirq关联起来,并调用.map()回调,这样驱动
程序就可以执行任何必要的硬件设置。
当接收到一个中断时,应该使用irq_find_mapping()函数从hwirq号中找到
Linux IRQ号。
一旦建立了映射,可以通过多种方法检索或使用它:
- irq_resolve_mapping()返回一个指向给定域和hwirq号的irq_desc结构指针,
如果没有映射则返回NULL。
- irq_find_mapping()返回给定域和hwirq的Linux IRQ号,如果没有映射则返回0。
- irq_linear_revmap()现与irq_find_mapping()相同,已被废弃。
- generic_handle_domain_irq()处理一个由域和hwirq号描述的中断。
请注意,irq域的查找必须发生在与RCU读临界区兼容的上下文中。
在调用irq_find_mapping()之前,至少要调用一次irq_create_mapping()函数,
以免描述符不能被分配。
......@@ -119,7 +130,8 @@ irq_domain_add_tree()和irq_domain_create_tree()在功能上是等价的,除
Linux IRQ号编入硬件本身,这样就不需要映射了。 调用irq_create_direct_mapping()
会分配一个Linux IRQ号,并调用.map()回调,这样驱动就可以将Linux IRQ号编入硬件中。
大多数驱动程序不能使用这个映射。
大多数驱动程序无法使用此映射,现在它由CONFIG_IRQ_DOMAIN_NOMAP选项控制。
请不要引入此API的新用户。
传统映射类型
------------
......@@ -128,7 +140,6 @@ Linux IRQ号编入硬件本身,这样就不需要映射了。 调用irq_create
irq_domain_add_simple()
irq_domain_add_legacy()
irq_domain_add_legacy_isa()
irq_domain_create_simple()
irq_domain_create_legacy()
......@@ -137,6 +148,9 @@ Linux IRQ号编入硬件本身,这样就不需要映射了。 调用irq_create
一组用于IRQ号的定义(#define),这些定义被传递给struct设备注册。 在这种情况下,
不能动态分配Linux IRQ号,应该使用传统映射。
顾名思义,\*_legacy()系列函数已被废弃,只是为了方便对古老平台的支持而存在。
不应该增加新的用户。当\*_simple()系列函数的使用导致遗留行为时,他们也是如此。
传统映射假设已经为控制器分配了一个连续的IRQ号范围,并且可以通过向hwirq号添加一
个固定的偏移来计算IRQ号,反之亦然。 缺点是需要中断控制器管理IRQ分配,并且需要为每
个hwirq分配一个irq_desc,即使它没有被使用。
......
......@@ -5,6 +5,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
.. _cn_kernel-api.rst:
......@@ -282,6 +283,8 @@ kernel/acct.c
该API在以下内核代码中:
include/linux/bio.h
block/blk-core.c
block/blk-core.c
......
......@@ -5,6 +5,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:校译:
......@@ -66,12 +67,24 @@ mm/vmalloc.c
该API在以下内核代码中:
mm/readahead.c
文件映射
--------
mm/filemap.c
预读
----
mm/readahead.c
回写
----
mm/page-writeback.c
截断
----
mm/truncate.c
include/linux/pagemap.h
......@@ -105,6 +118,14 @@ mm/mempolicy.c
include/linux/mm_types.h
include/linux/mm_inline.h
include/linux/page-flags.h
include/linux/mm.h
include/linux/page_ref.h
include/linux/mmzone.h
mm/util.c
......@@ -6,6 +6,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
.. _cn_printk-basics.rst:
......@@ -107,6 +108,4 @@ pr_debug()和pr_devel(),除非定义了 ``DEBUG`` (或者在pr_debug()的情
该API在以下内核代码中:
kernel/printk/printk.c
include/linux/printk.h
......@@ -5,6 +5,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
.. _cn_printk-formats.rst:
......@@ -548,7 +549,7 @@ nodemask_pr_args()来方便打印cpumask和nodemask。
::
%pGp referenced|uptodate|lru|active|private|node=0|zone=2|lastcpupid=0x1fffff
%pGp 0x17ffffc0002036(referenced|uptodate|lru|active|private|node=0|zone=2|lastcpupid=0x1fffff)
%pGg GFP_USER|GFP_DMA32|GFP_NOWARN
%pGv read|exec|mayread|maywrite|mayexec|denywrite
......
.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/watch_queue.rst
:翻译:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:校译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
吴想成 Wu Xiangcheng <bobwxc@email.cn>
============
通用通知机制
============
通用通知机制是建立在标准管道驱动之上的,它可以有效地将来自内核的通知消息拼接到用
户空间打开的管道中。这可以与以下方面结合使用::
* Key/keyring 通知
通知缓冲区可以通过以下方式启用:
“General setup”/“General notification queue”
(CONFIG_WATCH_QUEUE)
文档包含以下章节:
.. contents:: :local:
概述
====
该设施以一种特殊模式打开的管道形式出现,管道的内部环形缓冲区用于保存内核生成的消
息。然后通过read()读出这些消息。在此类管道上禁用拼接以及类似的操作,因为它们希望
在某些情况下将其添加的内容还原到环中-这可能最终会与通知消息重叠。
管道的所有者必须告诉内核它想通过该管道观察哪些源。只有连接到该管道上的源才会将消
息插入其中。请注意,一个源可能绑定到多个管道,并同时将消息插入到所有管道中。
还可以将过滤器放置在管道上,以便在不感兴趣时可以忽略某些源类型和子事件。
如果环中没有可用的插槽,或者没有预分配的消息缓冲区可用,则将丢弃消息。在这两种情
况下,read()都会在读取缓冲区中当前的最后一条消息后,将WATCH_META_LOSS_NOTIFICATION
插入到输出缓冲区中。
请注意,当生成一个通知时,内核不会等待消费者收集它,而是继续执行。这意味着可以在
持有自旋锁的同时生成通知,并且还可以保护内核不被用户空间故障无限期地阻碍。
消息结构
========
通知消息由一个简短的头部开始::
struct watch_notification {
__u32 type:24;
__u32 subtype:8;
__u32 info;
};
“type”表示通知记录的来源,“subtype”表示该来源的记录类型(见下文观测源章节)。该类
型也可以是“WATCH_TYPE_META”。这是一个由观测队列本身在内部生成的特殊记录类型。有两
个子类型:
* WATCH_META_REMOVAL_NOTIFICATION
* WATCH_META_LOSS_NOTIFICATION
第一个表示安装了观察的对象已被删除或销毁,第二个表示某些消息已丢失。
“info”表示一系列东西,包括:
* 消息的长度,以字节为单位,包括头(带有WATCH_INFO_LENGTH的掩码,并按
WATCH_INFO_LENGTH__SHIFT移位)。这表示记录的大小,可能在8到127字节之间。
* 观测ID(带有WATCH_INFO_ID掩码,并按WATCH_INFO_ID__SHIFT移位)。这表示观测的主
叫ID,可能在0到255之间。多个观测组可以共享一个队列,这提供了一种区分它们的方法。
* 特定类型的字段(WATCH_INFO_TYPE_INFO)。这是由通知生产者设置的,以指示类型和
子类型的某些特定含义。
除长度外,信息中的所有内容都可以用于过滤。
头部后面可以有补充信息。此格式是由类型和子类型决定的。
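The layout of the `info` field described above can be made concrete with a few helper functions. This is a sketch: the mask and shift values below are written from my understanding of `include/uapi/linux/watch_queue.h` and should be treated as assumptions to check against the header, and the `watch_info_*()` helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed values; verify against include/uapi/linux/watch_queue.h. */
#define WATCH_INFO_LENGTH		0x0000007fU
#define WATCH_INFO_LENGTH__SHIFT	0
#define WATCH_INFO_ID			0x0000ff00U
#define WATCH_INFO_ID__SHIFT		8
#define WATCH_INFO_TYPE_INFO		0xffff0000U

/* Record length in bytes, including the header. */
unsigned int watch_info_length(uint32_t info)
{
	return (info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
}

/* Watch ID chosen by the subscriber (0..255). */
unsigned int watch_info_id(uint32_t info)
{
	return (info & WATCH_INFO_ID) >> WATCH_INFO_ID__SHIFT;
}

/* Type-specific bits, left in place (not shifted down). */
uint32_t watch_info_type_info(uint32_t info)
{
	return info & WATCH_INFO_TYPE_INFO;
}
```

A consumer would typically read the length first to find the end of the record, then use the ID to tell apart watches that share one queue.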
观测列表(通知源)API
=====================
“观测列表“是订阅通知源的观测者的列表。列表可以附加到对象(比如键或超级块),也可
以是全局的(比如对于设备事件)。从用户空间的角度来看,一个非全局的观测列表通常是
通过引用它所属的对象来引用的(比如使用KEYCTL_NOTIFY并给它一个密钥序列号来观测特定
的密钥)。
为了管理观测列表,提供了以下函数:
* ::
void init_watch_list(struct watch_list *wlist,
void (*release_watch)(struct watch *wlist));
初始化一个观测列表。 如果 ``release_watch`` 不是NULL,那么这表示当watch_list对
象被销毁时,应该调用函数来丢弃观测列表对被观测对象的任何引用。
* ``void remove_watch_list(struct watch_list *wlist);``
这将删除订阅watch_list的所有观测,并释放它们,然后销毁watch_list对象本身。
观测队列(通知输出)API
=======================
“观测队列”是由应用程序分配的用以记录通知的缓冲区,其工作原理完全隐藏在管道设备驱
动中,但必须获得对它的引用才能设置观测。可以通过以下方式进行管理:
* ``struct watch_queue *get_watch_queue(int fd);``
由于观测队列在内核中通过实现缓冲区的管道的文件描述符表示,用户空间必须通过系
统调用传递该文件描述符,这可以用于从系统调用中查找指向观测队列的不透明指针。
* ``void put_watch_queue(struct watch_queue *wqueue);``
该函数用以丢弃从 ``get_watch_queue()`` 获得的引用。
观测订阅API
===========
“观测”是观测列表上的订阅,表示观测队列,从而表示应写入通知记录的缓冲区。观测队列
对象还可以携带该对象的过滤规则,由用户空间设置。watch结构体的某些部分可以由驱动程
序设置::
struct watch {
union {
u32 info_id; /* 在info字段中进行OR运算的ID */
...
};
void *private; /* 被观测对象的私有数据 */
u64 id; /* 内部标识符 */
...
};
``info_id`` 值是从用户空间获得并按WATCH_INFO_ID__SHIFT移位的8位数字。当通知写入关
联的观测队列缓冲区时,这将与struct watch_notification::info的WATCH_INFO_ID字段进
行或运算。
``private`` 字段是与watch_list相关联的驱动程序数据,并由 ``watch_list::release_watch()``
函数清除。
``id`` 字段是源的ID。使用不同ID发布的通知将被忽略。
提供以下函数来管理观测:
* ``void init_watch(struct watch *watch, struct watch_queue *wqueue);``
初始化一个观测对象,把它的指针设置到观察队列中,使用适当的限制来避免死锁。
* ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);``
将观测订阅到观测列表(通知源)。watch结构体中的driver-settable字段必须在调用
它之前设置。
* ::
int remove_watch_from_object(struct watch_list *wlist,
struct watch_queue *wqueue,
u64 id, false);
从观测列表中删除一个观测,该观测必须与指定的观测队列(``wqueue``)和对象标识
符(``id``)匹配。通知(``WATCH_META_REMOVAL_NOTIFICATION``)被发送到观测队列
表示该观测已被删除。
* ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);``
从观测列表中删除所有观测。预计这将被称为销毁前的准备工作,届时新的观测将无法
访问观测列表。通知(``WATCH_META_REMOVAL_NOTIFICATION``)被发送到每个订阅观测
的观测队列,以表明该观测已被删除。
通知发布API
===========
要将通知发布到观测列表以便订阅的观测可以看到,应使用以下函数::
void post_watch_notification(struct watch_list *wlist,
struct watch_notification *n,
const struct cred *cred,
u64 id);
应预先设置通知格式,并应传入一个指向头部(``n``)的指针。通知可能大于此值,并且缓
冲槽为单位的大小在 ``n->info & WATCH_INFO_LENGTH`` 中注明。
``cred`` 结构体表示源(对象)的证书,并传递给LSM,例如SELinux,以允许或禁止根据该队
列(对象)的证书在每个单独队列中记录注释。
``id`` 是源对象ID(如密钥上的序列号)。只有设置相同ID的观测才能看到这个通知。
观测源
======
任何特定的缓冲区都可以从多个源获取信息。 这些源包括:
* WATCH_TYPE_KEY_NOTIFY
这种类型的通知表示密钥和密钥环的变化,包括密钥环内容或密钥属性的变化。
更多信息请参见Documentation/security/keys/core.rst。
事件过滤
========
当创建观测队列后,我们可以应用一组过滤器以限制接收的事件::
struct watch_notification_filter filter = {
...
};
ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter)
过滤器的描述的类型变量是::
struct watch_notification_filter {
__u32 nr_filters;
__u32 __reserved;
struct watch_notification_type_filter filters[];
};
其中“nr_filters”表示filters[]数组中过滤器的数量,而“__reserved”应为0。
“filter”数组有以下类型的元素::
struct watch_notification_type_filter {
__u32 type;
__u32 info_filter;
__u32 info_mask;
__u32 subtype_filter[8];
};
其中:
* ``type`` 是过滤的事件类型,应类似于“WATCH_TYPE_KEY_NOTIFY”。
* ``info_filter`` 与 ``info_mask`` 充当通知记录的信息字段的过滤器,只有在以下情
况,通知才会写入缓冲区::
(watch.info & info_mask) == info_filter
例如,这可以用于忽略不在一个挂载树上的观测点的事件。
* ``subtype_filter`` 是一个位掩码,表示感兴趣的子类型。subtype_filter[0]的
bit[0]对应子类型0,bit[1]对应子类型1,以此类推。
若ioctl()的参数为NULL,则过滤器将被移除,并且来自观测源的所有事件都将通过。
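The matching rule for a single filter entry can be sketched as a plain function. This is an illustrative user-space model, not the kernel's implementation: `struct model_type_filter` mirrors the fields listed above, and `model_filter_accepts()` is a hypothetical name. It checks one entry only; how several entries combine is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the fields of watch_notification_type_filter for illustration. */
struct model_type_filter {
	uint32_t type;
	uint32_t info_filter;
	uint32_t info_mask;
	uint32_t subtype_filter[8];	/* 256 bits, one per subtype */
};

/* Returns true if a record with the given header passes this entry. */
bool model_filter_accepts(const struct model_type_filter *f,
			  uint32_t type, uint8_t subtype, uint32_t info)
{
	if (f->type != type)
		return false;

	/* Bit N of word N / 32 selects subtype N. */
	if (!(f->subtype_filter[subtype / 32] & (UINT32_C(1) << (subtype % 32))))
		return false;

	return (info & f->info_mask) == f->info_filter;
}
```

With `subtype_filter[0] = 0x5`, for instance, only subtypes 0 and 2 are of interest, and `info_mask`/`info_filter` further restrict which records are kept.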
Example userspace code
======================
A buffer is created with something like the following::
  pipe2(fds, O_NOTIFICATION_PIPE);
  ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256);
It can then be set to receive keyring change notifications::
  keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
The notifications can then be consumed by something like the following::
  static void consumer(int rfd, struct watch_queue_buffer *buf)
  {
      unsigned char buffer[128];
      ssize_t buf_len;
      /* Read a batch of notification records from the pipe. */
      while (buf_len = read(rfd, buffer, sizeof(buffer)),
             buf_len > 0
             ) {
          unsigned char *p = buffer;
          unsigned char *end = buffer + buf_len;
          while (p < end) {
              union {
                  struct watch_notification n;
                  unsigned char buf1[128];
              } n;
              size_t largest, len;
              /* Copy out at most one maximum-sized record. */
              largest = end - p;
              if (largest > 128)
                  largest = 128;
              memcpy(&n, p, largest);
              /* The record length is encoded in the info field. */
              len = (n.n.info & WATCH_INFO_LENGTH) >>
                  WATCH_INFO_LENGTH__SHIFT;
              if (len == 0 || len > largest)
                  return;
              switch (n.n.type) {
              case WATCH_TYPE_META:
                  got_meta(&n.n);
                  break;
              case WATCH_TYPE_KEY_NOTIFY:
                  saw_key_change(&n.n);
                  break;
              }
              p += len;
          }
      }
  }
......@@ -6,6 +6,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
.. _cn_workqueue.rst:
......@@ -178,10 +179,6 @@ workqueue将自动创建与属性相匹配的后备工作者池。调节并发
这个标志对于未绑定的wq来说是没有意义的。
请注意,标志 ``WQ_NON_REENTRANT`` 不再存在,因为现在所有的工作
队列都是不可逆的——任何工作项都保证在任何时间内最多被整个系统的一
个工作者执行。
``max_active``
--------------
......@@ -328,6 +325,22 @@ And with cmwq with ``@max_active`` >= 3, ::
工作项函数在堆栈追踪中应该是微不足道的。
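Non-reentrance Conditions
=========================
Workqueue guarantees that a work item cannot be re-entrant if the following conditions hold after the work item gets queued:
1. The work function hasn't been changed.
2. No one queues the work item to another workqueue.
3. The work item hasn't been reinitiated.
In other words, if the above conditions hold, the work item is guaranteed to be executed by at most one worker system-wide at any given time.
Note that requeuing the work item (to the same queue) from within its own work function doesn't break these conditions, so it is safe to do. Otherwise, caution is required when breaking the conditions inside a work function.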
内核内联文档参考
================
......
......@@ -6,6 +6,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:校译:
......@@ -254,7 +255,8 @@ __xa_set_mark() 和 __xa_clear_mark() 函数也适用于你查找一个条目并
高级API是基于xa_state的。这是一个不透明的数据结构,你使用XA_STATE()宏在堆栈中声明。这个宏初始化了
xa_state,准备开始在XArray上移动。它被用作一个游标来保持在XArray中的位置,并让你把各种操作组合在一
起,而不必每次都从头开始。
起,而不必每次都从头开始。xa_state的内容受rcu_read_lock()或xas_lock()的保护。如果需要删除保护状态
和树的这些锁中的任何一个,你必须调用xas_pause()以便将来的调用不会依赖于状态中未受保护的部分。
xa_state也被用来存储错误(store errors)。你可以调用xas_error()来检索错误。所有的操作在进行之前都
会检查xa_state是否处于错误状态,所以你没有必要在每次调用之后检查错误;你可以连续进行多次调用,只在
......
......@@ -11,34 +11,65 @@
概述
----
KernelAddressSANitizer(KASAN)是一种动态内存安全错误检测工具,主要功能是
检查内存越界访问和使用已释放内存的问题。KASAN有三种模式:
Kernel Address SANitizer(KASAN)是一种动态内存安全错误检测工具,主要功能是
检查内存越界访问和使用已释放内存的问题。
1. 通用KASAN(与用户空间的ASan类似)
2. 基于软件标签的KASAN(与用户空间的HWASan类似)
3. 基于硬件标签的KASAN(基于硬件内存标签)
KASAN有三种模式:
由于通用KASAN的内存开销较大,通用KASAN主要用于调试。基于软件标签的KASAN
可用于dogfood测试,因为它具有较低的内存开销,并允许将其用于实际工作量。
基于硬件标签的KASAN具有较低的内存和性能开销,因此可用于生产。同时可用于
检测现场内存问题或作为安全缓解措施。
1. 通用KASAN
2. 基于软件标签的KASAN
3. 基于硬件标签的KASAN
软件KASAN模式(#1和#2)使用编译时工具在每次内存访问之前插入有效性检查,
因此需要一个支持它的编译器版本
用CONFIG_KASAN_GENERIC启用的通用KASAN,是用于调试的模式,类似于用户空
间的ASan。这种模式在许多CPU架构上都被支持,但它有明显的性能和内存开销
通用KASAN在GCC和Clang受支持。GCC需要8.3.0或更高版本。任何受支持的Clang
版本都是兼容的,但从Clang 11才开始支持检测全局变量的越界访问。
基于软件标签的KASAN或SW_TAGS KASAN,通过CONFIG_KASAN_SW_TAGS启用,
可以用于调试和自我测试,类似于用户空间HWASan。这种模式只支持arm64,但其
适度的内存开销允许在内存受限的设备上用真实的工作负载进行测试。
基于软件标签的KASAN模式仅在Clang中受支持。
基于硬件标签的KASAN或HW_TAGS KASAN,用CONFIG_KASAN_HW_TAGS启用,被
用作现场内存错误检测器或作为安全缓解的模式。这种模式只在支持MTE(内存标签
扩展)的arm64 CPU上工作,但它的内存和性能开销很低,因此可以在生产中使用。
硬件KASAN模式(#3)依赖硬件来执行检查,但仍需要支持内存标签指令的编译器
版本。GCC 10+和Clang 11+支持此模式。
关于每种KASAN模式的内存和性能影响的细节,请参见相应的Kconfig选项的描述。
两种软件KASAN模式都适用于SLUB和SLAB内存分配器,而基于硬件标签的KASAN目前
仅支持SLUB
通用模式和基于软件标签的模式通常被称为软件模式。基于软件标签的模式和基于
硬件标签的模式被称为基于标签的模式
目前x86_64、arm、arm64、xtensa、s390、riscv架构支持通用KASAN模式,仅
arm64架构支持基于标签的KASAN模式。
支持
----
体系架构
~~~~~~~~
在x86_64、arm、arm64、powerpc、riscv、s390和xtensa上支持通用KASAN,
而基于标签的KASAN模式只在arm64上支持。
编译器
~~~~~~
软件KASAN模式使用编译时工具在每个内存访问之前插入有效性检查,因此需要一个
提供支持的编译器版本。基于硬件标签的模式依靠硬件来执行这些检查,但仍然需要
一个支持内存标签指令的编译器版本。
通用KASAN需要GCC 8.3.0版本或更高版本,或者内核支持的任何Clang版本。
基于软件标签的KASAN需要GCC 11+或者内核支持的任何Clang版本。
基于硬件标签的KASAN需要GCC 10+或Clang 12+。
内存类型
~~~~~~~~
通用KASAN支持在所有的slab、page_alloc、vmap、vmalloc、堆栈和全局内存
中查找错误。
基于软件标签的KASAN支持slab、page_alloc、vmalloc和堆栈内存。
基于硬件标签的KASAN支持slab、page_alloc和不可执行的vmalloc内存。
对于slab,两种软件KASAN模式都支持SLUB和SLAB分配器,而基于硬件标签的
KASAN只支持SLUB。
用法
----
......@@ -53,7 +84,7 @@ arm64架构支持基于标签的KASAN模式。
对于软件模式,还可以在 ``CONFIG_KASAN_OUTLINE`` 和 ``CONFIG_KASAN_INLINE``
之间进行选择。outline和inline是编译器插桩类型。前者产生较小的二进制文件,
而后者快1.1-2倍。
而后者快2倍。
要将受影响的slab对象的alloc和free堆栈跟踪包含到报告中,请启用
``CONFIG_STACKTRACE`` 。要包括受影响物理页面的分配和释放堆栈跟踪的话,
......@@ -172,21 +203,29 @@ KASAN受通用 ``panic_on_warn`` 命令行参数的影响。启用该功能后
默认情况下,KASAN只为第一次无效内存访问打印错误报告。使用 ``kasan_multi_shot`` ,
KASAN会针对每个无效访问打印报告。这有效地禁用了KASAN报告的 ``panic_on_warn`` 。
另外,独立于 ``panic_on_warn`` , ``kasan.fault=`` 引导参数可以用来控制恐慌和报
告行为:
- ``kasan.fault=report`` 或 ``=panic`` 控制是只打印KASAN报告还是同时使内核恐慌
(默认: ``report`` )。即使启用了 ``kasan_multi_shot`` ,也会发生内核恐慌。
基于硬件标签的KASAN模式(请参阅下面有关各种模式的部分)旨在在生产中用作安全缓解
措施。因此,它支持允许禁用KASAN或控制其功能的引导参数。
措施。因此,它支持允许禁用KASAN或控制其功能的附加引导参数。
- ``kasan=off`` 或 ``=on`` 控制KASAN是否启用 (默认: ``on`` )。
- ``kasan.mode=sync`` 或 ``=async`` 控制KASAN是否配置为同步或异步执行模式(默认:
``sync`` )。同步模式:当标签检查错误发生时,立即检测到错误访问。异步模式:
延迟错误访问检测。当标签检查错误发生时,信息存储在硬件中(在arm64的
- ``kasan.mode=sync`` 、 ``=async`` 或 ``=asymm`` 控制KASAN是否配置
为同步或异步执行模式(默认:``sync`` )。
同步模式:当标签检查错误发生时,立即检测到错误访问。
异步模式:延迟错误访问检测。当标签检查错误发生时,信息存储在硬件中(在arm64的
TFSR_EL1寄存器中)。内核会定期检查硬件,并且仅在这些检查期间报告标签错误。
非对称模式:读取时同步检测不良访问,写入时异步检测。
- ``kasan.vmalloc=off`` 或 ``=on`` 禁用或启用vmalloc分配的标记(默认:``on`` )。
- ``kasan.stacktrace=off`` 或 ``=on`` 禁用或启用alloc和free堆栈跟踪收集
(默认: ``on`` )。
- ``kasan.fault=report`` 或 ``=panic`` 控制是只打印KASAN报告还是同时使内核恐慌
(默认: ``report`` )。即使启用了 ``kasan_multi_shot`` ,也会发生内核恐慌。
实施细则
--------
......@@ -244,7 +283,6 @@ KASAN会针对每个无效访问打印报告。这有效地禁用了KASAN报告
基于软件标签的KASAN使用0xFF作为匹配所有指针标签(不检查通过带有0xFF指针标签
的指针进行的访问)。值0xFE当前保留用于标记已释放的内存区域。
基于软件标签的KASAN目前仅支持对Slab和page_alloc内存进行标记。
基于硬件标签的KASAN模式
~~~~~~~~~~~~~~~~~~~~~~~
......@@ -262,8 +300,6 @@ KASAN会针对每个无效访问打印报告。这有效地禁用了KASAN报告
基于硬件标签的KASAN使用0xFF作为匹配所有指针标签(不检查通过带有0xFF指针标签的
指针进行的访问)。值0xFE当前保留用于标记已释放的内存区域。
基于硬件标签的KASAN目前仅支持对Slab和page_alloc内存进行标记。
如果硬件不支持MTE(ARMv8.5之前),则不会启用基于硬件标签的KASAN。在这种情况下,
所有KASAN引导参数都将被忽略。
......@@ -275,6 +311,8 @@ KASAN会针对每个无效访问打印报告。这有效地禁用了KASAN报告
影子内存
--------
本节的内容只适用于软件KASAN模式。
内核将内存映射到地址空间的几个不同部分。内核虚拟地址的范围很大:没有足够的真实
内存来支持内核可以访问的每个地址的真实影子区域。因此,KASAN只为地址空间的某些
部分映射真实的影子。
......@@ -297,7 +335,7 @@ CONFIG_KASAN_VMALLOC
~~~~~~~~~~~~~~~~~~~~
使用 ``CONFIG_KASAN_VMALLOC`` ,KASAN可以以更大的内存使用为代价覆盖vmalloc
空间。目前,这在x86、riscv、s390和powerpc上受支持。
空间。目前,这在arm64、x86、riscv、s390和powerpc上受支持。
这通过连接到vmalloc和vmap并动态分配真实的影子内存来支持映射。
......@@ -349,10 +387,10 @@ KASAN连接到vmap基础架构以懒清理未使用的影子内存。
``kasan_disable_current()``/``kasan_enable_current()`` 部分注释这部分代码。
这也会禁用通过函数调用发生的间接访问的报告。
对于基于标签的KASAN模式(包括硬件模式),要禁用访问检查,请使用
``kasan_reset_tag()`` 或 ``page_kasan_tag_reset()`` 。请注意,通过
``page_kasan_tag_reset()`` 临时禁用访问检查需要通过 ``page_kasan_tag``
/ ``page_kasan_tag_set`` 保存和恢复每页KASAN标签。
对于基于标签的KASAN模式,要禁用访问检查,请使用 ``kasan_reset_tag()`` 或
``page_kasan_tag_reset()`` 。请注意,通过 ``page_kasan_tag_reset()``
临时禁用访问检查需要通过 ``page_kasan_tag`` / ``page_kasan_tag_set`` 保
存和恢复每页KASAN标签。
测试
~~~~
......@@ -381,11 +419,10 @@ KASAN连接到vmap基础架构以懒清理未使用的影子内存。
当由于缺少KASAN报告而导致测试失败时::
# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:629
Expected kasan_data->report_expected == kasan_data->report_found, but
kasan_data->report_expected == 1
kasan_data->report_found == 0
not ok 28 - kmalloc_double_kzfree
# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
not ok 44 - kmalloc_double_kzfree
最后打印所有KASAN测试的累积状态。成功::
......
......@@ -106,3 +106,5 @@ __releases - 指定的锁在函数进入时被持有,但在退出时不被持
make 的可选变量 CHECKFLAGS 可以用来向 sparse 工具传递参数。编译系统会自
动向 sparse 工具传递 -Wbitwise 参数。
注意sparse定义了__CHECKER__预处理器符号。
\ No newline at end of file
......@@ -107,3 +107,28 @@ Documentation/dev-tools/kcov.rst 是能够构建在内核之中,用于在每
之后你就能确保这些错误在测试过程中都不会发生了。
一些工具与KUnit和kselftest集成,并且在检测到问题时会自动打断测试。
静态分析工具
============
除了测试运行中的内核,我们还可以使用**静态分析**工具直接分析内核的源代
码(**在编译时**)。内核中常用的工具允许人们检查整个源代码树或其中的特
定文件。它们使得在开发过程中更容易发现和修复问题。
Sparse可以通过执行类型检查、锁检查、值范围检查来帮助测试内核,此外还
可以在检查代码时报告各种错误和警告。关于如何使用它的细节,请参阅
Documentation/translations/zh_CN/dev-tools/sparse.rst。
Smatch扩展了Sparse,并提供了对编程逻辑错误的额外检查,如开关语句中
缺少断点,错误检查中未使用的返回值,忘记在错误路径的返回中设置错误代
码等。Smatch也有针对更严重问题的测试,如整数溢出、空指针解除引用和内
存泄漏。见项目页面http://smatch.sourceforge.net/。
Coccinelle是我们可以使用的另一个静态分析器。Coccinelle经常被用来
帮助源代码的重构和并行演化,但它也可以帮助避免常见代码模式中出现的某
些错误。可用的测试类型包括API测试、内核迭代器的正确使用测试、自由操
作的合理性检查、锁定行为的分析,以及已知的有助于保持内核使用一致性的
进一步测试。详情请见Documentation/dev-tools/coccinelle.rst。
不过要注意的是,静态分析工具存在**假阳性**的问题。在试图修复错误和警
告之前,需要仔细评估它们。
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/index.rst
:Original: Documentation/devicetree/index.rst
:翻译:
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/of_unittest.rst
:Original: Documentation/devicetree/of_unittest.rst
:翻译:
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/usage-model.rst
:Original: Documentation/devicetree/usage-model.rst
:翻译:
......
......@@ -14,7 +14,7 @@ Linux内核源文件可以包含kernel-doc格式的结构化文档注释,用
实际有着明显的不同。内核源包含成千上万个kernel-doc注释。请坚持遵循
此处描述的风格。
.. note:: kernel-doc无法包含Rust代码:请参考 Documentation/rust/docs.rst 。
.. note:: kernel-doc无法包含Rust代码:请参考 Documentation/rust/general-information.rst 。
从注释中提取kernel-doc结构,并从中生成适当的 `Sphinx C 域`_ 函数和带有锚点的
类型描述。这些注释将被过滤以生成特殊kernel-doc高亮和交叉引用。详见下文。
......
......@@ -37,10 +37,10 @@ configfs轻松配置的对象(例如:设备,触发器)。
3. 软件触发器
=============
IIO默认configfs组之一是“触发器”组。 挂载configfs后可以自动访问它,并且可
IIO默认configfs组之一是“触发器”组。挂载configfs后可以自动访问它,并且可
以在/config/iio/triggers下找到。
IIO软件触发器为创建多种触发器类型提供了支持。 通常在include/linux/iio
IIO软件触发器为创建多种触发器类型提供了支持。通常在include/linux/iio
/sw_trigger.h:中的接口下将新的触发器类型实现为单独的内核模块:
::
......@@ -76,10 +76,10 @@ IIO软件触发器为创建多种触发器类型提供了支持。 通常在incl
.ops = &iio_trig_sample_ops,
};
module_iio_sw_trigger_driver(iio_trig_sample);
module_iio_sw_trigger_driver(iio_trig_sample);
每种触发器类型在/config/iio/triggers下都有其自己的目录。 加载iio-trig-sample
模块将创建“ trig-sample”触发器类型目录/config/iio/triggers/trig-sample.
每种触发器类型在/config/iio/triggers下都有其自己的目录。加载iio-trig-sample
模块将创建“trig-sample”触发器类型目录/config/iio/triggers/trig-sample.
我们支持以下中断源(触发器类型)
......@@ -102,3 +102,5 @@ module_iio_sw_trigger_driver(iio_trig_sample);
----------------------------
"hrtimer”触发器类型没有来自/config dir的任何可配置属性。
它确实引入了触发目录的sampling_frequency属性。
该属性以Hz为单位设置轮询频率,精度为mHz。
\ No newline at end of file
......@@ -81,7 +81,7 @@
过硬件中断)的“软件中断”将运行( ``kernel/softirq.c`` )。
此处完成了许多真正的中断处理工作。在向SMP过渡的早期,只有“bottom halves下半
部”(BHs)机制,无法利用多个CPU的优势。在从那些一团糟的电脑切换过来后不久,
部”(BHs)机制,无法利用多个CPU的优势。在从那些一团糟的电脑切换过来后不久,
我们放弃了这个限制,转而使用“软中断”。
``include/linux/interrupt.h`` 列出了不同的软中断。定时器软中断是一个非常重要
......@@ -95,8 +95,7 @@
.. warning::
“tasklet”这个名字是误导性的:它们与“任务”无关,可能更多与当时
阿列克谢·库兹涅佐夫享用的糟糕伏特加有关。
“tasklet”这个名字是误导性的:它们与“任务”无关。
你可以使用 :c:func:`in_softirq()` 宏( ``include/linux/preempt.h`` )来确认
是否处于软中断(或子任务)中。
......@@ -247,7 +246,7 @@ Provide mechanism not policy”。
与 :c:func:`put_user()` 和 :c:func:`get_user()` 不同,它们返回未复制的
数据量(即0仍然意味着成功)。
【是的,这个愚蠢的接口真心让我尴尬。火爆的口水仗大概每年都会发生。
【是的,这个讨厌的接口真心让我尴尬。火爆的口水仗大概每年都会发生。
—— Rusty Russell】
这些函数可以隐式睡眠。它不应该在用户上下文之外调用(没有意义)、调用时禁用中断
......@@ -538,9 +537,9 @@ Documentation/core-api/symbol-namespaces.rst 。
Linus和其他开发人员有时会更改开发内核中的函数或结构体名称;这样做不仅是为了
让每个人都保持警惕,还反映了一个重大的更改(例如,不能再在打开中断的情况下
调用,或者执行额外的检查,或者不执行以前捕获的检查)。通常这会附带一个linux
内核邮件列表中相当全面的注释;请搜索存档以查看。简单地对文件进行全局替换通常
会让事情变得 **更糟** 。
调用,或者执行额外的检查,或者不执行以前捕获的检查)。通常这会附带发送一个
相当全面的注释到相应的内核邮件列表中;请搜索存档以查看。简单地对文件进行全局
替换通常只会让事情变得 **更糟** 。
初始化结构体成员
------------------
......@@ -610,7 +609,7 @@ C++
为了让你的东西更正式、补丁更整洁,还有一些工作要做:
- 搞清楚你在谁的地界儿上干活。查看源文件的顶部、 ``MAINTAINERS`` 文件以及
- 搞清楚你修改的代码属于谁。查看源文件的根目录、 ``MAINTAINERS`` 文件以及
``CREDITS`` 文件的最后一部分。你应该和此人协调,确保你没有重新发明轮子,
或者尝试一些已经被拒绝的东西。
......@@ -629,12 +628,12 @@ C++
“obj-$(CONFIG_xxx) += xxx.o”。语法记录在
Documentation/kbuild/makefiles.rst 。
- 如果你做了一些有意义的事情,那可以把自己放进 ``CREDITS`` ,通常不止一个
文件(无论如何你的名字都应该在源文件的顶部)。维护人员意味着您希望在对
子系统进行更改时得到询问,并了解缺陷;这意味着对某部分代码做出更多承诺。
- 如果你认为自己做了一些有意义的事情,可以把自己放进 ``CREDITS`` ,通常不
止一个文件(无论如何你的名字都应该在源文件的顶部)。 ``MAINTAINERS``
意味着您希望在对子系统进行更改时得到询问,并了解缺陷;这意味着对某部分
代码做出更多承诺。
- 最后,别忘记去阅读 Documentation/process/submitting-patches.rst ,
也许还有 Documentation/process/submitting-drivers.rst 。
- 最后,别忘记去阅读 Documentation/process/submitting-patches.rst。
Kernel 仙女棒
===============
......
......@@ -14,17 +14,18 @@
.. toctree::
:maxdepth: 1
mutex-design
spinlocks
TODOList:
* locktypes
* lockdep-design
* lockstat
* locktorture
* mutex-design
* rt-mutex-design
* rt-mutex
* seqlock
* spinlocks
* ww-mutex-design
* preempt-locking
* pi-futex
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/locking/mutex-design.rst
:翻译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
================
通用互斥锁子系统
================
:初稿:
Ingo Molnar <mingo@redhat.com>
:更新:
Davidlohr Bueso <davidlohr@hp.com>
什么是互斥锁?
--------------
在Linux内核中,互斥锁(mutex)指的是一个特殊的加锁原语,它在共享内存系统上
强制保证序列化,而不仅仅是指在学术界或类似的理论教科书中出现的通用术语“相互
排斥”。互斥锁是一种睡眠锁,它的行为类似于二进制信号量(semaphores),在
2006年被引入时[1],作为后者的替代品。这种新的数据结构提供了许多优点,包括更
简单的接口,以及在当时更少的代码量(见缺陷)。
[1] https://lwn.net/Articles/164802/
实现
----
互斥锁由“struct mutex”表示,在include/linux/mutex.h中定义,并在
kernel/locking/mutex.c中实现。这些锁使用一个原子变量(->owner)来跟踪
它们生命周期内的锁状态。字段owner实际上包含的是指向当前锁所有者的
`struct task_struct *` 指针,因此如果无人持有锁,则它的值为空(NULL)。
由于task_struct的指针至少按L1_CACHE_BYTES对齐,低位(3)被用来存储额外
的状态(例如,等待者列表非空)。在其最基本的形式中,它还包括一个等待队列和
一个确保对其序列化访问的自旋锁。此外,CONFIG_MUTEX_SPIN_ON_OWNER=y的
系统使用一个自旋MCS锁(->osq,译注:MCS是两个人名的合并缩写),在下文的
(ii)中描述。
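When acquiring a mutex, there are three possible paths that can be taken, depending on the state of the lock:
(i) fastpath: tries to atomically acquire the lock by using cmpxchg() to set the owner to the current task. This only works in the uncontended case (cmpxchg() checks against 0, so all 3 state bits have to be 0). If the lock is contended, the code goes to the next possible path.
(ii) midpath: also known as optimistic spinning. The current task tries to spin for acquisition while the lock owner is running and there are no other tasks ready to run that have higher priority (need_resched). The rationale is that if the lock owner is running, it is likely to release the lock soon. The mutex spinners are queued up using an MCS lock so that only one spinner can compete for the mutex.
The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock with the desirable properties of being fair and of having each CPU that tries to acquire the lock spin on a local variable. It avoids the expensive cacheline bouncing that common test-and-set spinlock implementations incur. An MCS-like lock is specially tailored for optimistic spinning in sleeping lock implementations. An important feature of the customized MCS lock is that spinners are able to exit the MCS spinlock queue when they need to reschedule. This further helps avoid situations where MCS spinners that need to reschedule would continue waiting to spin on the mutex owner, only to go directly to the slowpath upon obtaining the MCS lock.
(iii) slowpath: the last resort. If the lock still cannot be acquired, the task is added to the wait-queue and sleeps until woken up by the unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.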
虽然从形式上看,内核互斥锁是可睡眠的锁,路径(ii)使它实际上成为混合类型。通过
简单地不中断一个任务并忙着等待几个周期,而不是立即睡眠,这种锁已经被认为显著
改善一些工作负载的性能。注意,这种技术也被用于读写信号量(rw-semaphores)。
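The fastpath and the use of the low owner bits can be sketched in user space with C11 atomics. This is a simplified model under stated assumptions: `struct model_mutex`, `model_trylock_fast()`, `model_owner()` and the flag constants are hypothetical names, the "task" is just an 8-byte-aligned address, and the midpath, slowpath and unlock handling are not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FLAG_WAITERS	((uintptr_t)0x1)	/* hypothetical: wait list non-empty */
#define MODEL_FLAG_MASK		((uintptr_t)0x7)	/* low 3 bits carry state */

struct model_mutex {
	atomic_uintptr_t owner;	/* owner pointer with flag bits in the low bits */
};

/*
 * Fastpath: succeed only if the whole word is 0, that is, there is no
 * owner and none of the three state bits is set.
 */
bool model_trylock_fast(struct model_mutex *m, const void *task)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&m->owner, &expected,
					      (uintptr_t)task);
}

/* Strip the flag bits to recover the owner pointer. */
const void *model_owner(struct model_mutex *m)
{
	return (const void *)(atomic_load(&m->owner) & ~MODEL_FLAG_MASK);
}
```

Packing the flags into the pointer keeps the uncontended acquire down to a single compare-and-exchange on one word, which is the property the fastpath relies on.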
语义
----
互斥锁子系统检查并强制执行以下规则:
- 每次只有一个任务可以持有该互斥锁。
- 只有锁的所有者可以解锁该互斥锁。
- 不允许多次解锁。
- 不允许递归加锁/解锁。
- 互斥锁只能通过API进行初始化(见下文)。
- 一个任务不能在持有互斥锁的情况下退出。
- 持有锁的内存区域不得被释放。
- 被持有的锁不能被重新初始化。
- 互斥锁不能用于硬件或软件中断上下文,如小任务(tasklet)和定时器。
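These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled. In addition, the mutex debugging code implements a number of other features that make lock debugging easier and faster: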
- 当打印到调试输出时,总是使用互斥锁的符号名称。
- 加锁点跟踪,函数名符号化查找,系统持有的全部锁的列表,打印出它们。
- 所有者跟踪。
- 检测自我递归的锁并打印所有相关信息。
- 检测多任务环形依赖死锁,并打印所有受影响的锁和任务(并且只限于这些任务)。
接口
----
静态定义互斥锁::
DEFINE_MUTEX(name);
动态初始化互斥锁::
mutex_init(mutex);
以不可中断方式(uninterruptible)获取互斥锁::
void mutex_lock(struct mutex *lock);
void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
int mutex_trylock(struct mutex *lock);
以可中断方式(interruptible)获取互斥锁::
int mutex_lock_interruptible_nested(struct mutex *lock,
unsigned int subclass);
int mutex_lock_interruptible(struct mutex *lock);
当原子变量减为0时,以可中断方式(interruptible)获取互斥锁::
int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
释放互斥锁::
void mutex_unlock(struct mutex *lock);
检测是否已经获取互斥锁::
int mutex_is_locked(struct mutex *lock);
缺陷
----
与它最初的设计和目的不同,'struct mutex' 是内核中最大的锁之一。例如:在
x86-64上它是32字节,而 'struct semaphore' 是24字节,rw_semaphore是
40字节。更大的结构体大小意味着更多的CPU缓存和内存占用。
何时使用互斥锁
--------------
总是优先选择互斥锁而不是任何其它锁原语,除非互斥锁的严格语义不合适,和/或临界区
阻止锁被共享。
......@@ -19,8 +19,7 @@
内核开发社区已经发展出一套用于发布补丁的约定和过程;遵循这些约定和过程将使
参与其中的每个人的生活更加轻松。本文档试图描述这些约定的部分细节;更多信息
也可在以下文档中找到
:ref:`Documentation/translations/zh_CN/process/submitting-patches.rst <cn_submittingpatches>`,
:ref:`Documentation/translations/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>`
:ref:`Documentation/translations/zh_CN/process/submitting-patches.rst <cn_submittingpatches>`
和 :ref:`Documentation/translations/zh_CN/process/submit-checklist.rst <cn_submitchecklist>`。
何时寄送
......
......@@ -19,7 +19,6 @@
:ref:`Documentation/translations/zh_CN/process/howto.rst <cn_process_howto>`
文件是一个重要的起点;
:ref:`Documentation/translations/zh_CN/process/submitting-patches.rst <cn_submittingpatches>`
和 :ref:`Documentation/translations/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>`
也是所有内核开发人员都应该阅读的内容。许多内部内核API都是使用kerneldoc机制
记录的;“make htmldocs”或“make pdfdocs”可用于以HTML或PDF格式生成这些文档
(尽管某些发行版提供的tex版本会遇到内部限制,无法正确处理文档)。
......
......@@ -96,7 +96,6 @@ Linux内核代码中包含有大量的文档。这些文档对于学习如何与
的代码。
:ref:`Documentation/translations/zh_CN/process/submitting-patches.rst <cn_submittingpatches>`
:ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
这两份文档明确描述如何创建和发送补丁,其中包括(但不仅限于):
- 邮件内容
......
......@@ -40,7 +40,6 @@
.. toctree::
:maxdepth: 1
submitting-drivers
submit-checklist
stable-api-nonsense
stable-kernel-rules
......
.. _cn_submittingdrivers:
.. include:: ../disclaimer-zh_CN.rst
:Original: :ref:`Documentation/process/submitting-drivers.rst
<submittingdrivers>`
如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文
交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻
译存在问题,请联系中文版维护者::
中文版维护者: 李阳 Li Yang <leoyang.li@nxp.com>
中文版翻译者: 李阳 Li Yang <leoyang.li@nxp.com>
中文版校译者: 陈琦 Maggie Chen <chenqi@beyondsoft.com>
王聪 Wang Cong <xiyou.wangcong@gmail.com>
张巍 Zhang Wei <wezhang@outlook.com>
如何向 Linux 内核提交驱动程序
=============================
这篇文档将会解释如何向不同的内核源码树提交设备驱动程序。请注意,如果你感
兴趣的是显卡驱动程序,你也许应该访问 XFree86 项目(https://www.xfree86.org/)
和/或 X.org 项目 (https://x.org)。
另请参阅 Documentation/translations/zh_CN/process/submitting-patches.rst 文档。
分配设备号
----------
块设备和字符设备的主设备号与从设备号是由 Linux 命名编号分配权威 LANANA(
现在是 Torben Mathiasen)负责分配。申请的网址是 https://www.lanana.org/。
即使不准备提交到主流内核的设备驱动也需要在这里分配设备号。有关详细信息,
请参阅 Documentation/admin-guide/devices.rst。
如果你使用的不是已经分配的设备号,那么当你提交设备驱动的时候,它将会被强
制分配一个新的设备号,即便这个设备号和你之前发给客户的截然不同。
设备驱动的提交对象
------------------
Linux 2.0:
此内核源码树不接受新的驱动程序。
Linux 2.2:
此内核源码树不接受新的驱动程序。
Linux 2.4:
如果所属的代码领域在内核的 MAINTAINERS 文件中列有一个总维护者,
那么请将驱动程序提交给他。如果此维护者没有回应或者你找不到恰当的
维护者,那么请联系 Willy Tarreau <w@1wt.eu>。
Linux 2.6:
除了遵循和 2.4 版内核同样的规则外,你还需要在 linux-kernel 邮件
列表上跟踪最新的 API 变化。向 Linux 2.6 内核提交驱动的顶级联系人
是 Andrew Morton <akpm@linux-foundation.org>。
决定设备驱动能否被接受的条件
----------------------------
许可: 代码必须使用 GNU 通用公开许可证 (GPL) 提交给 Linux,但是
我们并不要求 GPL 是唯一的许可。你或许会希望同时使用多种
许可证发布,如果希望驱动程序可以被其他开源社区(比如BSD)
使用。请参考 include/linux/module.h 文件中所列出的可被
接受共存的许可。
版权: 版权所有者必须同意使用 GPL 许可。最好提交者和版权所有者
是相同个人或实体。否则,必需列出授权使用 GPL 的版权所有
人或实体,以备验证之需。
接口: 如果你的驱动程序使用现成的接口并且和其他同类的驱动程序行
为相似,而不是去发明无谓的新接口,那么它将会更容易被接受。
如果你需要一个 Linux 和 NT 的通用驱动接口,那么请在用
户空间实现它。
代码: 请使用 Documentation/process/coding-style.rst 中所描述的 Linux 代码风
格。如果你的某些代码段(例如那些与 Windows 驱动程序包共
享的代码段)需要使用其他格式,而你却只希望维护一份代码,
那么请将它们很好地区分出来,并且注明原因。
可移植性: 请注意,指针并不永远是 32 位的,不是所有的计算机都使用小
尾模式 (little endian) 存储数据,不是所有的人都拥有浮点
单元,不要随便在你的驱动程序里嵌入 x86 汇编指令。只能在
x86 上运行的驱动程序一般是不受欢迎的。虽然你可能只有 x86
硬件,很难测试驱动程序在其他平台上是否可用,但是确保代码
可以被轻松地移植却是很简单的。
清晰度: 做到所有人都能修补这个驱动程序将会很有好处,因为这样你将
会直接收到修复的补丁而不是 bug 报告。如果你提交一个试图
隐藏硬件工作机理的驱动程序,那么它将会被扔进废纸篓。
电源管理: 因为 Linux 正在被很多移动设备和桌面系统使用,所以你的驱
动程序也很有可能被使用在这些设备上。它应该支持最基本的电
源管理,即在需要的情况下实现系统级休眠和唤醒要用到的
.suspend 和 .resume 函数。你应该检查你的驱动程序是否能正
确地处理休眠与唤醒,如果实在无法确认,请至少把 .suspend
函数定义成返回 -ENOSYS(功能未实现)错误。你还应该尝试确
保你的驱动在什么都不干的情况下将耗电降到最低。要获得驱动
程序测试的指导,请参阅
Documentation/power/drivers-testing.rst。有关驱动程序电
源管理问题相对全面的概述,请参阅
Documentation/driver-api/pm/devices.rst。
管理: 如果一个驱动程序的作者还在进行有效的维护,那么通常除了那
些明显正确且不需要任何检查的补丁以外,其他所有的补丁都会
被转发给作者。如果你希望成为驱动程序的联系人和更新者,最
好在代码注释中写明并且在 MAINTAINERS 文件中加入这个驱动
程序的条目。
不影响设备驱动能否被接受的条件
------------------------------
供应商: 由硬件供应商来维护驱动程序通常是一件好事。不过,如果源码
树里已经有其他人提供了可稳定工作的驱动程序,那么请不要期
望“我是供应商”会成为内核改用你的驱动程序的理由。理想的情
况是:供应商与现有驱动程序的作者合作,构建一个统一完美的
驱动程序。
作者: 驱动程序是由大的 Linux 公司研发还是由你个人编写,并不影
响其是否能被内核接受。没有人对内核源码树享有特权。只要你
充分了解内核社区,你就会发现这一点。
资源列表
--------
Linux 内核主源码树:
ftp.??.kernel.org:/pub/linux/kernel/...
?? == 你的国家代码,例如 "cn"、"us"、"uk"、"fr" 等等
Linux 内核邮件列表:
linux-kernel@vger.kernel.org
[可通过向majordomo@vger.kernel.org发邮件来订阅]
Linux 设备驱动程序,第三版(探讨 2.6.10 版内核):
https://lwn.net/Kernel/LDD3/ (免费版)
LWN.net:
每周内核开发活动摘要 - https://lwn.net/
2.6 版中 API 的变更:
https://lwn.net/Articles/2.6-kernel-api/
将旧版内核的驱动程序移植到 2.6 版:
https://lwn.net/Articles/driver-porting/
内核新手(KernelNewbies):
为新的内核开发者提供文档和帮助
https://kernelnewbies.org/
Linux USB项目:
http://www.linux-usb.org/
写内核驱动的“不要”(Arjan van de Ven著):
http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf
内核清洁工 (Kernel Janitor):
https://kernelnewbies.org/KernelJanitors
......@@ -23,9 +23,7 @@
以下文档含有大量简洁的建议, 具体请见:
:ref:`Documentation/process <development_process_main>`
同样,:ref:`Documentation/translations/zh_CN/process/submit-checklist.rst <cn_submitchecklist>`
给出在提交代码前需要检查的项目的列表。如果你在提交一个驱动程序,那么
同时阅读一下:
:ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
给出在提交代码前需要检查的项目的列表。
其中许多步骤描述了Git版本控制系统的默认行为;如果您使用Git来准备补丁,
您将发现它为您完成的大部分机械工作,尽管您仍然需要准备和记录一组合理的
......
......@@ -19,7 +19,6 @@ RISC-V 体系结构
boot-image-header
vm-layout
pmu
patch-acceptance
......
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/riscv/pmu.rst
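:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
.. _cn_riscv_pmu:
===================================
Supporting PMUs on RISC-V platforms
===================================
Alan Kao <alankao@andestech.com>, Mar 2018
Introduction
------------
As of this writing, the perf_event-related features mentioned in The RISC-V ISA
Privileged Version 1.10 are as follows:
(please check the manual for more details)
* [m|s]counteren
* mcycle[h], cycle[h]
* minstret[h], instret[h]
* mhpeventx, mhpcounterx[h]
With only this function set, porting perf would require a lot of work, because
the following general architectural performance monitoring features are missing:
* Enabling/disabling counters
  Counters are free-running all the time in our case.
* Interrupt caused by counter overflow
  No such feature exists in the spec.
* Interrupt indicator
  It is not possible to have an interrupt port for every counter, so an
  interrupt indicator is required for software to tell which counter has just
  overflowed.
* Writing to counters
  There will be an SBI to support this, since the kernel cannot modify the
  counters [1]. Alternatively, some vendors are considering a hardware
  extension for M-S-U model machines to write counters directly.
This document aims to give developers a quick guide to supporting their PMUs in
the kernel. The following sections briefly explain perf's mechanism and the
to-dos.
You may check previous discussions here [1][2]. It might also be helpful to
check the appendix for related kernel structures.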
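1. Initialization
-----------------
*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
various methods that follow perf's internal conventions, plus PMU-specific
parameters. One should declare such an instance to represent the PMU. By
default, *riscv_pmu* points to a constant structure *riscv_base_pmu*, which has
very basic support for a baseline QEMU model.
A developer can then either assign the instance's pointer to *riscv_pmu*, so
that the minimal, already-implemented logic can be leveraged, or write his/her
own *riscv_init_platform_pmu* implementation.
In other words, the existing *riscv_base_pmu* sources merely provide a
reference implementation. Developers can flexibly decide how much of it to
reuse, and in the most extreme case they can customize every function
according to their needs.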
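2. Event Initialization
-----------------------
When a user launches a perf command to monitor some events, the userspace perf
tool first turns it into multiple *perf_event_open* system calls, and each of
them then calls the body of the *event_init* member function that was assigned
in the previous step. In the case of *riscv_base_pmu*, this is
*riscv_event_init*.
The main purpose of this function is to translate the event provided by the
user into a bitmap, so that HW-related control registers or counters can be
manipulated directly. The translation is based on the mappings and methods
provided in *riscv_pmu*.
Note that some features can be set up at this stage as well:
(1) interrupt setting, which is covered in the next section;
(2) privilege level setting (user space only, kernel space only, both);
(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*;
(4) tweaks for non-sampling events, which are used by functions such as
    *perf_adjust_period*, usually something like the following::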
if (!is_sampling_event(event)) {
hwc->sample_period = x86_pmu.max_period;
hwc->last_period = hwc->sample_period;
local64_set(&hwc->period_left, hwc->sample_period);
}
In the case of *riscv_base_pmu*, only (3) is provided for now.
3. Interrupt
------------
3.1. Interrupt Initialization
This often occurs at the beginning of the *event_init* method. In common
practice, it is a code segment like::
int x86_reserve_hardware(void)
{
int err = 0;
if (!atomic_inc_not_zero(&pmc_refcount)) {
mutex_lock(&pmc_reserve_mutex);
if (atomic_read(&pmc_refcount) == 0) {
if (!reserve_pmc_hardware())
err = -EBUSY;
else
reserve_ds_buffers();
}
if (!err)
atomic_inc(&pmc_refcount);
mutex_unlock(&pmc_reserve_mutex);
}
return err;
}
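The magic is in *reserve_pmc_hardware*, which usually performs atomic
operations to make the implemented IRQ accessible from some global function
pointer. *release_pmc_hardware* serves the opposite purpose, and it is used in
the event destructors mentioned in the previous section.
(Note: judging from the implementations in all architectures, the
*reserve/release* pair always deals with IRQ settings, so the name
*pmc_hardware* seems somewhat misleading. It does NOT deal with the binding
between an event and a physical counter, which is introduced in the next
section.)
3.2. IRQ Structure
Basically, an IRQ runs the following pseudo code::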
for each hardware counter that triggered this overflow
get the event of this counter
// following two steps are defined as *read()*,
// check the section Reading/Writing Counters for details.
count the delta value since previous interrupt
update the event->count (# event occurs) by adding delta, and
event->hw.period_left by subtracting delta
if the event overflows
sample data
set the counter appropriately for the next overflow
if the event overflows again
too frequently, throttle this event
fi
fi
end for
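However, as of this writing, none of the RISC-V implementations have designed
an interrupt for perf, so the details are to be completed in the future.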
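4. Reading/Writing Counters
---------------------------
Reading and writing look symmetric, but perf treats them quite differently. For
reading, there is a *read* interface in *struct pmu*, but it does more than
just read. Depending on the context, the *read* function not only reads the
content of the counter (event->count), but also updates the period left until
the next interrupt (event->hw.period_left).
The core of perf, however, does not need to write counters directly. Writing
counters is hidden behind two abstractions:
1) *pmu->start*, which literally starts counting, so the counter has to be set
   to a suitable value for the next interrupt;
2) inside the IRQ, the counter should be set to the same reasonable value.
Reading is not a problem in RISC-V, but writing needs some effort, since
counters are not allowed to be written by S-mode.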
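The double duty of *read* described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct sim_event` and `sim_read()` are hypothetical names that stand in for the relevant fields of a perf event, and the raw counter value is passed in as an argument instead of being read from a CSR.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields that a PMU read() callback maintains. */
struct sim_event {
    uint64_t prev_raw;    /* raw counter value seen at the previous read */
    uint64_t count;       /* accumulated count, like event->count */
    int64_t  period_left; /* events until the next interrupt, like event->hw.period_left */
};

/* Fold the raw counter value into the event and return the delta. */
static uint64_t sim_read(struct sim_event *ev, uint64_t raw_now)
{
    /* Unsigned subtraction stays correct across one counter wrap-around. */
    uint64_t delta = raw_now - ev->prev_raw;

    ev->prev_raw = raw_now;
    ev->count += delta;                /* "read the counter" */
    ev->period_left -= (int64_t)delta; /* "update the left period" */
    return delta;
}
```

For example, an event that last saw the raw value 100 and now reads 350 gains 250 in its count and has 250 fewer events left before its next sample.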
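5. add()/del()/start()/stop()
-----------------------------
Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop()
starts/stops the counter of some event in the PMU. All of them take the same
arguments: *struct perf_event *event* and *int flag*.
If you think of perf as a state machine, these functions are the transitions
between its states. Three states (event->hw.state) are defined:
* PERF_HES_STOPPED: the counter is stopped
* PERF_HES_UPTODATE: event->count is up to date
* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now
The normal flow of these state transitions is as follows:
* A user launches a perf event, which results in a call to *event_init*.
* When being context-switched in, *add* is called by the perf core with the
  flag PERF_EF_START, which means that the event should be started after it is
  added. At this stage, a general event is bound to a physical counter, if any.
  The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because the
  counter is now stopped and the (software) event count does not need updating.
  - *start* is then called, and the counter is enabled.
    With the flag PERF_EF_RELOAD, it writes an appropriate value to the counter
    (see the previous section for details).
    Nothing is written if the flag does not contain PERF_EF_RELOAD.
    The state is now reset to none, because the counter is neither stopped nor
    up to date (counting has already started).
* When being context-switched out, *del* is called. It then checks out all the
  events in the PMU and calls *stop* to update their counts.
  - *stop* is called by *del* and by the perf core with the flag
    PERF_EF_UPDATE, and it often shares the same subroutine and logic as *read*.
    The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE again.
  - Life cycle of these two pairs: *add* and *del* are called repeatedly as
    tasks switch in and out; *start* and *stop* are also called when the perf
    core needs a quick stop-and-start, for instance when the interrupt period
    is being adjusted.
The current implementation is sufficient for now and can easily be extended
with more features in the future.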
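The state transitions listed above can be traced with a small userspace model. It is a sketch, not kernel code: the `SIM_` constants and `sim_` functions are hypothetical names that only mirror the PERF_HES_* states and PERF_EF_* flags from the text, their numeric values are chosen for this sketch, and "writing the counter" is reduced to incrementing a field.

```c
/* Hypothetical mirrors of the states and flags named in the text. */
enum { SIM_HES_STOPPED = 0x01, SIM_HES_UPTODATE = 0x02 };
enum { SIM_EF_START = 0x01, SIM_EF_RELOAD = 0x02, SIM_EF_UPDATE = 0x04 };

struct sim_hw {
    int state;    /* like event->hw.state */
    int counting; /* 1 while the modelled counter is enabled */
    int reloads;  /* how many times start() "wrote" the counter */
};

static void sim_start(struct sim_hw *hw, int flags)
{
    if (flags & SIM_EF_RELOAD)
        hw->reloads++;  /* stands in for writing a value to the counter */
    hw->state = 0;      /* neither stopped nor up to date: counting */
    hw->counting = 1;
}

static void sim_stop(struct sim_hw *hw, int flags)
{
    hw->counting = 0;
    hw->state |= SIM_HES_STOPPED;
    if (flags & SIM_EF_UPDATE)
        hw->state |= SIM_HES_UPTODATE; /* count folded in, as read() would do */
}

static void sim_add(struct sim_hw *hw, int flags)
{
    /* A freshly added event is stopped and has nothing left to update. */
    hw->state = SIM_HES_STOPPED | SIM_HES_UPTODATE;
    if (flags & SIM_EF_START)
        sim_start(hw, SIM_EF_RELOAD);
}

static void sim_del(struct sim_hw *hw)
{
    sim_stop(hw, SIM_EF_UPDATE);
}
```

Calling `sim_add()` with `SIM_EF_START` and then `sim_del()` walks one context-switch-in/out cycle: the state goes to none while counting, then back to stopped and up to date.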
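A. Related Structures
---------------------
* struct pmu: include/linux/perf_event.h
* struct riscv_pmu: arch/riscv/include/asm/perf_event.h
  Both structures are designed to be read-only.
  *struct pmu* defines some function pointer interfaces. Most of them take
  *struct perf_event* as the main argument and deal with perf events according
  to perf's internal state machine (check kernel/events/core.c for details).
  *struct riscv_pmu* defines PMU-specific parameters. The naming follows the
  convention of all other architectures.
* struct perf_event: include/linux/perf_event.h
* struct hw_perf_event
  The generic structure that represents perf events, and the hardware-related
  details.
* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h
  The structure that holds the status of events. It has two fixed members:
  the number of events and the array of events.
References
----------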
[1] https://github.com/riscv/riscv-linux/pull/124
[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA
......@@ -6,6 +6,7 @@
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
Binbin Zhou <zhoubinbin@loongson.cn>
============================
RISC-V Linux上的虚拟内存布局
......@@ -65,3 +66,39 @@ RISC-V Linux Kernel SV39
ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | modules, BPF
ffffffff80000000 | -2 GB | ffffffffffffffff | 2 GB | kernel
__________________|____________|__________________|_________|____________________________________________________________
RISC-V Linux Kernel SV48
------------------------
::
  ========================================================================================================================
      Start addr    |   Offset   |     End addr     |  Size   | VM area description
  ========================================================================================================================
                    |            |                  |         |
   0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
  __________________|____________|__________________|_________|___________________________________________________________
                    |            |                  |         |
   0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
                    |            |                  |         | virtual memory addresses up to the -128 TB
                    |            |                  |         | starting offset of kernel mappings.
  __________________|____________|__________________|_________|___________________________________________________________
                                                              |
                                                              | Kernel-space virtual memory, shared between all processes:
  ____________________________________________________________|___________________________________________________________
                    |            |                  |         |
   ffff8d7ffee00000 | -114.5  TB | ffff8d7ffeffffff |    2 MB | fixmap
   ffff8d7fff000000 | -114.5  TB | ffff8d7fffffffff |   16 MB | PCI io
   ffff8d8000000000 | -114.5  TB | ffff8f7fffffffff |    2 TB | vmemmap
   ffff8f8000000000 | -112.5  TB | ffffaf7fffffffff |   32 TB | vmalloc/ioremap space
   ffffaf8000000000 |  -80.5  TB | ffffef7fffffffff |   64 TB | direct mapping of all physical memory
   ffffef8000000000 |  -16.5  TB | fffffffeffffffff | 16.5 TB | kasan
  __________________|____________|__________________|_________|____________________________________________________________
                                                              |
                                                              | Identical layout to the 39-bit one from here on:
  ____________________________________________________________|____________________________________________________________
                    |            |                  |         |
   ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | modules, BPF
   ffffffff80000000 |   -2    GB | ffffffffffffffff |    2 GB | kernel
  __________________|____________|__________________|_________|____________________________________________________________
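The Offset and Size columns of the SV48 table can be cross-checked with a little arithmetic. The sketch below is an assumption-laden helper, not kernel code: `kernel_offset_tib()` and `region_size()` are hypothetical names, and "TB" in the table is treated as 2^40 bytes.

```c
#include <stdint.h>

#define TIB ((double)(1ULL << 40)) /* the table's "TB", taken as 2^40 bytes */

/* Offset of addr from the top of the 64-bit address space, in TiB (negative). */
static double kernel_offset_tib(uint64_t addr)
{
    /* (0 - addr) in unsigned arithmetic is the distance to 2^64. */
    return -(double)((uint64_t)0 - addr) / TIB;
}

/* Size in bytes of the inclusive address range [start, end]. */
static uint64_t region_size(uint64_t start, uint64_t end)
{
    return end - start + 1;
}
```

For instance, the vmemmap row starts at ffff8d8000000000, which is 0x728000000000 bytes below 2^64, i.e. an offset of -114.5 TB, and the vmalloc/ioremap row spans 0x200000000000 bytes, i.e. 32 TB.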
......@@ -57,8 +57,8 @@ cpu<N> 1 2 3 4 5 6 7 8 9
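The next three are statistics describing scheduling latency:
7) sum of all time spent running by tasks on this processor (in jiffies)
8) sum of all time spent waiting to run by tasks on this processor (in jiffies)
7) sum of all time spent running by tasks on this processor (in nanoseconds)
8) sum of all time spent waiting to run by tasks on this processor (in nanoseconds)
9) # of timeslices run on this cpu
Domain statistics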
......@@ -146,8 +146,8 @@ domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
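schedstats also adds a new /proc/<pid>/schedstat file to provide some of the
same information at the per-process level. There are three fields in this file
that relate to the process:
1) time spent on the cpu
2) time spent waiting on a runqueue
1) time spent on the cpu (in nanoseconds)
2) time spent waiting on a runqueue (in nanoseconds)
3) # of timeslices run on this cpu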
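A minimal sketch of reading these three fields, assuming the file holds them as three whitespace-separated unsigned decimal numbers on one line. `struct schedstat` and `parse_schedstat()` are hypothetical names; the function parses a line already read from /proc/<pid>/schedstat so that it stays independent of any particular process.

```c
#include <inttypes.h>
#include <stdio.h>

/* The three per-process fields, in the order listed above. */
struct schedstat {
    uint64_t run_ns;     /* 1) time spent on the cpu (nanoseconds) */
    uint64_t wait_ns;    /* 2) time spent waiting on a runqueue (nanoseconds) */
    uint64_t timeslices; /* 3) # of timeslices run on this cpu */
};

/* Parse one line of schedstat text; returns 0 on success, -1 on malformed input. */
static int parse_schedstat(const char *line, struct schedstat *out)
{
    int n = sscanf(line, "%" SCNu64 " %" SCNu64 " %" SCNu64,
                   &out->run_ns, &out->wait_ns, &out->timeslices);
    return n == 3 ? 0 : -1;
}
```

A caller would typically read the line with `fgets()` from the opened file and pass it to `parse_schedstat()`.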
A program could easily be written to make use of these extra fields to report on how a particular process or set of processes
......
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/_free_page_reporting.rst
:Original: Documentation/vm/free_page_reporting.rst
:翻译:
......
:Original: Documentation/vm/_free_page_reporting.rst
:Original: Documentation/vm/free_page_reporting.rst
:翻译:
......
......@@ -50,55 +50,55 @@
临时虚拟映射
============
内核包含几种创建临时映射的方法。:
内核包含几种创建临时映射的方法。下面的列表按照使用的优先顺序显示了它们。
* vmap(). 这可以用来将多个物理页长期映射到一个连续的虚拟空间。它需要synchronization
来解除映射
* kmap_local_page()。这个函数是用来要求短期映射的。它可以从任何上下文(包括中断)中调用,
但是映射只能在获取它们的上下文中使用
* kmap(). 这允许对单个页面进行短期映射。它需要synchronization,但在一定程度上被摊销。
当以嵌套方式使用时,它也很容易出现死锁,因此不建议在新代码中使用它。
在可行的情况下,这个函数应该比其他所有的函数优先使用。
* kmap_atomic(). 这允许对单个页面进行非常短的时间映射。由于映射被限制在发布它的CPU上,
它表现得很好,但发布任务因此被要求留在该CPU上直到它完成,以免其他任务取代它的映射。
kmap_atomic() 也可以由中断上下文使用,因为它不睡眠,而且调用者可能在调用kunmap_atomic()
之后才睡眠。
可以假设k[un]map_atomic()不会失败。
这些映射是线程本地和CPU本地的,这意味着映射只能从这个线程中访问,并且当映射处于活动状
态时,该线程与CPU绑定。即使线程被抢占了(因为抢占永远不会被函数禁用),CPU也不能通过
CPU-hotplug从系统中拔出,直到映射被处理掉。
在本地的kmap区域中采取pagefaults是有效的,除非获取本地映射的上下文由于其他原因不允许
这样做。
使用kmap_atomic
===============
kmap_local_page()总是返回一个有效的虚拟地址,并且假定kunmap_local()不会失败。
何时何地使用 kmap_atomic() 是很直接的。当代码想要访问一个可能从高内存(见__GFP_HIGHMEM)
分配的页面的内容时,例如在页缓存中的页面,就会使用它。该API有两个函数,它们的使用方式与
下面类似::
嵌套kmap_local_page()和kmap_atomic()映射在一定程度上是允许的(最多到KMAP_TYPE_NR),
但是它们的调用必须严格排序,因为映射的实现是基于堆栈的。关于如何管理嵌套映射的细节,
请参见kmap_local_page() kdocs(包含在 "函数 "部分)。
/* 找到感兴趣的页面。 */
struct page *page = find_get_page(mapping, offset);
/* 获得对该页内容的访问权。 */
void *vaddr = kmap_atomic(page);
* kmap_atomic(). 这允许对单个页面进行非常短的时间映射。由于映射被限制在发布它的CPU上,
它表现得很好,但发布的任务因此被要求留在该CPU上直到它完成,以免其他任务取代它的映射。
/* 对该页的内容做一些处理。 */
memset(vaddr, 0, PAGE_SIZE);
kmap_atomic()也可以被中断上下文使用,因为它不睡眠,调用者也可能在调用kunmap_atomic()
后才睡眠。
/* 解除该页面的映射。 */
kunmap_atomic(vaddr);
内核中对kmap_atomic()的每次调用都会创建一个不可抢占的段,并禁用缺页异常。这可能是
未预期延迟的来源之一。因此用户应该选择kmap_local_page()而不是kmap_atomic()。
注意,kunmap_atomic()调用的是kmap_atomic()调用的结果而不是参数
假设k[un]map_atomic()不会失败
如果你需要映射两个页面,因为你想从一个页面复制到另一个页面,你需要保持kmap_atomic调用严
格嵌套,如::
* kmap()。这应该被用来对单个页面进行短时间的映射,对抢占或迁移没有限制。它会带来开销,
因为映射空间是受限制的,并且受到全局锁的保护,以实现同步。当不再需要映射时,必须用
kunmap()释放该页被映射的地址。
vaddr1 = kmap_atomic(page1);
vaddr2 = kmap_atomic(page2);
映射变化必须广播到所有CPU(核)上,kmap()还需要在kmap的池被回绕(TLB项用光了,需要从第
一项复用)时进行全局TLB无效化,当映射空间被完全利用时,它可能会阻塞,直到有一个可用的
槽出现。因此,kmap()只能从可抢占的上下文中调用。
memcpy(vaddr1, vaddr2, PAGE_SIZE);
如果一个映射必须持续相对较长的时间,上述所有的工作都是必要的,但是内核中大部分的
高内存映射都是短暂的,而且只在一个地方使用。这意味着在这种情况下,kmap()的成本大
多被浪费了。kmap()并不是为长期映射而设计的,但是它已经朝着这个方向发展了,在较新
的代码中强烈不鼓励使用它,前面的函数集应该是首选。
kunmap_atomic(vaddr2);
kunmap_atomic(vaddr1);
在64位系统中,调用kmap_local_page()、kmap_atomic()和kmap()没有实际作用,因为64位
地址空间足以永久映射所有物理内存页面。
* vmap()。这可以用来将多个物理页长期映射到一个连续的虚拟空间。它需要全局同步来解除
映射。
临时映射的成本
==============
......@@ -126,3 +126,12 @@ i386 PAE
一般的建议是,你不要在32位机器上使用超过8GiB的空间--尽管更多的空间可能对你和你的工作
量有用,但你几乎是靠你自己--不要指望内核开发者真的会很关心事情的进展情况。
函数
====
该API在以下内核代码中:
include/linux/highmem.h
include/linux/highmem-internal.h
......@@ -12,11 +12,27 @@
Linux内存管理文档
=================
这是一个关于Linux内存管理(mm)子系统内部的文档集,其中有不同层次的细节,包括注释
和邮件列表的回复,用于阐述数据结构和算法的基本情况。如果你正在寻找关于简单分配内存的建
议,请参阅(Documentation/translations/zh_CN/core-api/memory-allocation.rst)。
对于控制和调整指南,请参阅(Documentation/admin-guide/mm/index)。
TODO:待引用文档集被翻译完毕后请及时修改此处)
这是一份关于了解Linux的内存管理子系统的指南。如果你正在寻找关于简单分配内存的
建议,请参阅内存分配指南
(Documentation/translations/zh_CN/core-api/memory-allocation.rst)。
关于控制和调整的指南,请看管理指南
(Documentation/translations/zh_CN/admin-guide/mm/index.rst)。
.. toctree::
:maxdepth: 1
highmem
该处剩余文档待原始文档有内容后翻译。
遗留文档
========
这是一个关于Linux内存管理(MM)子系统内部的旧文档的集合,其中有不同层次的细节,
包括注释和邮件列表的回复,用于阐述数据结构和算法的描述。它应该被很好地整合到上述
结构化的文档中,如果它已经完成了它的使命,可以删除。
.. toctree::
:maxdepth: 1
......@@ -25,7 +41,6 @@ TODO:待引用文档集被翻译完毕后请及时修改此处)
balance
damon/index
free_page_reporting
highmem
ksm
frontswap
hmm
......@@ -36,10 +51,12 @@ TODO:待引用文档集被翻译完毕后请及时修改此处)
numa
overcommit-accounting
page_frags
page_migration
page_owner
page_table_check
remap_file_pages
split_page_table_lock
vmalloced-kernel-stacks
z3fold
zsmalloc
......@@ -47,8 +64,6 @@ TODOLIST:
* arch_pgtable_helpers
* free_page_reporting
* hugetlbfs_reserv
* page_migration
* slub
* transhuge
* unevictable-lru
* vmalloced-kernel-stacks
:Original: Documentation/vm/page_frag.rst
:Original: Documentation/vm/page_frags.rst
:翻译:
......
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/index.rst
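:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
==============
Page migration
==============
Page migration allows moving the physical location of pages between nodes in a
NUMA system while the process is running. This means that the virtual addresses
that the process sees do not change. However, the system rearranges the
physical location of those pages.
Also see :ref:`Heterogeneous Memory Management (HMM) <hmm>` for migrating pages
to or from device private memory.
The main intent of page migration is to reduce the latency of memory accesses
by moving pages near to the processor where the process accessing that memory
is running.
Page migration allows a process to manually relocate the node on which its
pages are located through the MF_MOVE and MF_MOVE_ALL options while setting a
new memory policy via mbind(). The pages of a process can also be relocated
from another process using the sys_migrate_pages() function call. The
migrate_pages() function call takes two sets of nodes and moves the pages of a
process that are located on the old nodes to the destination nodes. Page
migration functions are provided by the numactl package by Andi Kleen (a
version later than 0.9.3 is required; its repository is at
https://github.com/numactl/numactl.git). numactl provides libnuma, which offers
an interface for page migration similar to other NUMA functionality. Running
cat ``/proc/<pid>/numa_maps`` gives an easy view of where the pages of a
process are located. See also the numa_maps documentation in the proc(5) man
page.
Manual migration is useful if, for example, the scheduler has relocated a
process to a processor on a distant node. A batch scheduler or an administrator
may detect the situation and move the pages of the process nearer to the new
processor. The kernel itself only provides manual page migration support.
Automatic page migration may be implemented through user space processes that
move pages. A special function call "move_pages" allows individual pages to be
moved within a process. For example, a NUMA profiler may obtain a log showing
frequent off-node accesses and use the result to move pages to more
advantageous locations.
Larger installations usually partition the system into sections of nodes using
cpusets. Paul Jackson has equipped cpusets with the ability to move pages when
a task is moved to another cpuset (see :ref:`CPUSETS <cpusets>`). Cpusets allow
the automation of process locality. If a task is moved to a new cpuset, all its
pages are moved with it so that the performance of the process does not drop
dramatically. The pages of processes in a cpuset are also moved if the allowed
memory nodes of the cpuset change.
Page migration preserves the relative location of pages within a group of nodes
for all migration techniques, so a particular memory allocation pattern is
kept even after a process has been migrated. This is necessary in order to
preserve memory latencies. Processes will run with similar performance after
migration.
Page migration occurs in several steps. First comes a high-level description
for those trying to use migrate_pages() from the kernel (for userspace usage
see Andi Kleen's numactl package mentioned above), followed by a low-level
description of how the details work.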
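In kernel use of migrate_pages()
================================
1. Remove pages from the LRU.
   Lists of pages to be migrated are generated by scanning over pages and
   moving them into lists. This is done by calling isolate_lru_page(). Calling
   isolate_lru_page() increases the references to the page so that it cannot
   vanish while the page migration occurs. It also prevents the swapper or
   other scans from encountering the page.
2. We need a function of type new_page_t that can be passed to
   migrate_pages(). This function should figure out how to allocate the correct
   new page given the old page.
3. The migrate_pages() function is called, which attempts the migration. It
   calls the function to allocate a new page for each page that is considered
   for moving.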
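How migrate_pages() works
=========================
migrate_pages() makes several passes over its list of pages. A page is moved if
all references to it are removable at the time. The page has already been
removed from the LRU via isolate_lru_page() and its refcount has been
increased so that the page cannot be freed while page migration occurs.
Steps:
1. Lock the page to be migrated.
2. Ensure that writeback is complete.
3. Lock the new page that we want to move to. It is locked so that accesses to
   this (not yet up-to-date) page block immediately while the move is in
   progress.
4. All the page table references to the page are converted to migration
   entries. This decreases the mapcount of the page. If the resulting mapcount
   is not zero, we do not migrate the page. All user space processes that
   attempt to access the page will now wait on the page lock or wait for the
   migration page table entry to be removed.
5. The i_pages lock is taken. This causes all processes trying to access the
   page via the mapping to block on the spinlock.
6. The refcount of the page is examined and we back out if references remain.
   Otherwise, we know that we are the only one referencing this page.
7. The radix tree is checked, and if it does not contain the pointer to this
   page we back out, because someone else modified the radix tree.
8. The new page is prepped with some settings from the old page so that
   accesses to the new page will find a page with the correct settings.
9. The radix tree is changed to point to the new page.
10. The reference count of the old page is dropped because the address space
    reference is gone. A reference to the new page is established because the
    new page is referenced by the address space.
11. The i_pages lock is dropped. With that, lookups in the mapping become
    possible again. Processes move from spinning on the lock to sleeping on the
    locked new page.
12. The page contents are copied to the new page.
13. The remaining page flags are copied to the new page.
14. The old page flags are cleared to indicate that the page no longer provides
    any information.
15. Queued-up writeback on the new page is triggered.
16. If migration entries were inserted into the page table, they are replaced
    with real ptes. Doing so enables access for user space processes that are
    not already waiting for the page lock.
17. The page locks are dropped from the old and new pages. Processes waiting on
    the page lock will redo their page faults and reach the new page.
18. The new page is moved to the LRU and can again be scanned by the swapper
    and others.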
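Non-LRU page migration
======================
Although migration originally aimed to reduce the latency of memory accesses on
NUMA, compaction also uses migration to create high-order pages.
The problem with the current implementation is that it is designed to migrate
only *LRU* pages. However, there are potential non-LRU pages that drivers could
migrate, for example zsmalloc and virtio-balloon pages.
For virtio-balloon pages, some parts of the migration code path have been
hooked up and virtio-balloon specific functions have been added to intercept
the migration logic. This is too specific to one driver, so other drivers that
want to make their pages movable would have to add their own specific hooks to
the migration path.
To overcome this problem, the VM supports non-LRU page migration, which
provides generic functions for non-LRU movable pages without driver-specific
hooks in the migration path.
If a driver wants to make its pages movable, it should define three functions,
which are function pointers of struct address_space_operations.
1. ``bool (*isolate_page) (struct page *page, isolate_mode_t mode);``
   The VM expects the driver's isolate_page() function to return *true* if the
   driver isolated the page successfully. On returning true, the VM marks the
   page as PG_isolated so that concurrent isolation on several CPUs skips the
   page. If a driver cannot isolate the page, it should return *false*.
   Once a page is successfully isolated, the VM uses the page.lru fields, so
   the driver should not expect the values in those fields to be preserved.
2. ``int (*migratepage) (struct address_space *mapping,``
   | ``struct page *newpage, struct page *oldpage, enum migrate_mode);``
   After isolation, the VM calls the driver's migratepage() with the isolated
   page. The job of migratepage() is to move the contents of the old page to
   the new page and to set up the fields of struct page newpage. Keep in mind
   that if you migrated the old page successfully and returned
   MIGRATEPAGE_SUCCESS, you should indicate to the VM that the old page is no
   longer movable via __ClearPageMovable() under page_lock. If the driver
   cannot migrate the page at the moment, it can return -EAGAIN. On -EAGAIN,
   the VM retries the page migration after a short time, because it interprets
   -EAGAIN as "temporary migration failure". On any error other than -EAGAIN,
   the VM gives up the page migration without retrying.
   The driver should not touch the page.lru field while in the migratepage()
   function.
3. ``void (*putback_page)(struct page *);``
   If migration fails on the isolated page, the VM should return the isolated
   page to the driver, so the VM calls the driver's putback_page() with the
   isolated page. In this function, the driver should put the isolated page
   back into its own data structure.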
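The return-value contract of migratepage() and putback_page() described above can be modelled in userspace. Everything here is a hypothetical sketch: the `sim_` and `demo_` names, the retry bound `SIM_MAX_RETRIES` and the use of an `int` as a stand-in "page" are assumptions made for illustration, and the real VM's retry policy is not specified by this document beyond "retry in a short time".

```c
#include <errno.h>

#define SIM_MIGRATEPAGE_SUCCESS 0
#define SIM_MAX_RETRIES 10 /* arbitrary bound chosen for this sketch */

/* Hypothetical stand-in for the two driver callbacks discussed above. */
struct sim_page_ops {
    int  (*migratepage)(void *page);
    void (*putback_page)(void *page);
};

/* Model of the caller: retry on -EAGAIN, give up on any other error. */
static int sim_migrate_one(const struct sim_page_ops *ops, void *page)
{
    int rc = -EAGAIN;

    for (int attempt = 0; attempt < SIM_MAX_RETRIES && rc == -EAGAIN; attempt++)
        rc = ops->migratepage(page);

    if (rc != SIM_MIGRATEPAGE_SUCCESS)
        ops->putback_page(page); /* hand the isolated page back to the driver */
    return rc;
}

/* Demo driver: the "page" is an int holding how many more times it is busy. */
static int demo_migratepage(void *page)
{
    int *busy = page;

    if (*busy > 0) {
        (*busy)--;
        return -EAGAIN;  /* temporary failure: caller should retry */
    }
    if (*busy < 0)
        return -EINVAL;  /* permanent failure: caller gives up */
    return SIM_MIGRATEPAGE_SUCCESS;
}

static int demo_putbacks; /* counts putback_page() calls in the demo */

static void demo_putback_page(void *page)
{
    (void)page;
    demo_putbacks++;
}
```

With the demo driver, a page that is busy twice migrates on the third attempt, while a permanently failing page is put back after a single attempt.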
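Non-LRU movable page flags
There are two page flags for supporting non-LRU movable pages.
* PG_movable
  The driver should use the function below to make a page movable under
  page_lock::
   void __SetPageMovable(struct page *page, struct address_space *mapping)
  It takes an address_space argument to register the migration family of
  functions that the VM will call. Strictly speaking, PG_movable is not a real
  flag of struct page. Rather, the VM reuses the lower bits of page->mapping to
  represent it::
   #define PAGE_MAPPING_MOVABLE 0x2
   page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
  so the driver should not access page->mapping directly. Instead, the driver
  should use page_mapping(), which masks off the low two bits of page->mapping
  under the page lock and so returns the right struct address_space.
  For testing non-LRU movable pages, the VM supports the __PageMovable()
  function. However, it is not guaranteed to identify non-LRU movable pages,
  because the page->mapping field is unified with other variables in struct
  page. If the driver releases the page after the VM has isolated it,
  page->mapping does not have a stable value even though it has
  PAGE_MAPPING_MOVABLE set (look at __ClearPageMovable). But __PageMovable() is
  cheap to call once the page has been isolated, whether the page is LRU or
  non-LRU movable, because LRU pages can never have PAGE_MAPPING_MOVABLE set in
  page->mapping. It is also good for just peeking to test for non-LRU movable
  pages before doing the more expensive check with lock_page() during pfn
  scanning to select a victim.
  To reliably identify a non-LRU movable page, the VM provides the
  PageMovable() function. Unlike __PageMovable(), PageMovable() validates
  page->mapping and mapping->a_ops->isolate_page under lock_page(). The
  lock_page() prevents page->mapping from being destroyed suddenly.
  Drivers using __SetPageMovable() should clear the flag via
  __ClearPageMovable() under page_lock() before releasing the page.
* PG_isolated
  To prevent concurrent isolation among several CPUs, the VM marks an isolated
  page as PG_isolated under lock_page(). So if a CPU encounters a PG_isolated
  non-LRU movable page, it can skip it. The driver does not need to manipulate
  the flag because the VM sets and clears it automatically. Keep in mind that
  if the driver sees a PG_isolated page, the page has been isolated by the VM,
  so the driver should not touch the page.lru field. The PG_isolated flag is
  aliased with the PG_reclaim flag, so drivers should not use PG_isolated for
  their own purposes.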
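The idea of keeping a flag in the low bits of page->mapping can be shown with ordinary pointer tagging in userspace. This is a sketch under stated assumptions: all `sim_`/`SIM_` names are hypothetical, `struct sim_page` stores the tagged value as a `uintptr_t`, and the scheme relies on the pointed-to structure being at least 4-byte aligned so that its low two bits are free.

```c
#include <stdint.h>

#define SIM_MAPPING_MOVABLE 0x2u /* mirrors the value of PAGE_MAPPING_MOVABLE above */
#define SIM_MAPPING_FLAGS   0x3u /* the low two bits reserved for flags */

struct sim_address_space { int dummy; }; /* int member gives >= 4-byte alignment */
struct sim_page { uintptr_t mapping; };  /* tagged pointer, like page->mapping */

/* Record the mapping and tag the page as movable in one store. */
static void sim_set_movable(struct sim_page *page, struct sim_address_space *mapping)
{
    page->mapping = (uintptr_t)mapping | SIM_MAPPING_MOVABLE;
}

/* Cheap test on the tag bits only, in the spirit of __PageMovable(). */
static int sim_is_movable(const struct sim_page *page)
{
    return (page->mapping & SIM_MAPPING_FLAGS) == SIM_MAPPING_MOVABLE;
}

/* Mask off the low two bits to recover the real pointer, like page_mapping(). */
static struct sim_address_space *sim_page_mapping(const struct sim_page *page)
{
    return (struct sim_address_space *)(page->mapping & ~(uintptr_t)SIM_MAPPING_FLAGS);
}
```

This also shows why a driver should not read the raw field directly: once tagged, the stored value no longer equals the address of the address space.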
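Monitoring Migration
====================
The following events (counters) can be used to monitor page migration.
1. PGMIGRATE_SUCCESS: Normal page migration success. Each count means that one
   page was migrated. If the page was a non-THP and non-hugetlb page, this
   counter is increased by one. If the page was a THP or hugetlb page, this
   counter is increased by the number of THP or hugetlb subpages. For example,
   migrating a single 2MB THP that has 4KB base pages (subpages) increases this
   counter by 512.
2. PGMIGRATE_FAIL: Normal page migration failure. The counting rules are the
   same as for PGMIGRATE_SUCCESS above: for a THP or hugetlb page, the counter
   is increased by the number of subpages.
3. THP_MIGRATION_SUCCESS: A THP was migrated without being split.
4. THP_MIGRATION_FAIL: A THP could neither be migrated nor be split.
5. THP_MIGRATION_SPLIT: A THP was migrated, but not as such: first, the THP had
   to be split. After splitting, a migration retry was used for its subpages.
THP_MIGRATION_* events also update the appropriate PGMIGRATE_SUCCESS or
PGMIGRATE_FAIL events. For example, a THP migration failure causes both
THP_MIGRATION_FAIL and PGMIGRATE_FAIL to increase.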
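The counting rules above amount to simple arithmetic, sketched here in userspace. The `sim_` names and the `struct sim_counters` layout are hypothetical; only the rules (count subpages for PGMIGRATE_*, count one per THP for THP_MIGRATION_*) come from the text, and the 4KB base page size is the one used in its example.

```c
#define SIM_BASE_PAGE_SIZE (4UL * 1024) /* 4KB base pages, as in the example above */

struct sim_counters {
    unsigned long pgmigrate_success;
    unsigned long pgmigrate_fail;
    unsigned long thp_migration_success;
    unsigned long thp_migration_fail;
};

/* Number of base pages covered by a page of the given size. */
static unsigned long sim_subpages(unsigned long page_size)
{
    return page_size / SIM_BASE_PAGE_SIZE;
}

/* Account one migration attempt for a page that was not split. */
static void sim_account(struct sim_counters *c, unsigned long page_size,
                        int is_thp, int succeeded)
{
    unsigned long n = sim_subpages(page_size);

    if (succeeded) {
        c->pgmigrate_success += n;
        if (is_thp)
            c->thp_migration_success++;
    } else {
        c->pgmigrate_fail += n;
        if (is_thp)
            c->thp_migration_fail++;
    }
}
```

A 2MB THP covers 2MB / 4KB = 512 base pages, which is where the figure of 512 in the text comes from.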
Christoph Lameter, May 8, 2006.
Minchan Kim, Mar 28, 2016.
......@@ -96,21 +96,82 @@ page owner在默认情况下是禁用的。所以,如果你想使用它,你
默认情况下, ``page_owner_sort`` 是根据buf的时间来排序的。如果你想
按buf的页数排序,请使用-m参数。详细的参数是:
基本函数:
基本函数::
Sort:
排序:
-a 按内存分配时间排序
-m 按总内存排序
-p 按pid排序。
-P 按tgid排序。
-n 按任务命令名称排序。
-r 按内存释放时间排序。
-s 按堆栈跟踪排序。
-t 按时间排序(默认)。
其它函数:
Cull:
-c 通过比较堆栈跟踪而不是总块来进行剔除。
Filter:
--sort <order> 指定排序顺序。排序的语法是[+|-]key[,[+|-]key[,...]]。从
**标准格式指定器**那一节选择一个键。"+"是可选的,因为默认的方向是数字或
词法的增加。允许混合使用缩写和完整格式的键。
例子:
./page_owner_sort <input> <output> --sort=n,+pid,-tgid
./page_owner_sort <input> <output> --sort=at
其它函数::
剔除:
--cull <rules>
指定剔除规则。剔除的语法是key[,key[,...]]。从**标准格式指定器**
部分选择一个多字母键。
<rules>是一个以逗号分隔的列表形式的单一参数,它提供了一种指定单个剔除规则的
方法。 识别的关键字在下面的**标准格式指定器**部分有描述。<规则>可以通过键的
序列k1,k2,...来指定,在下面的标准排序键部分有描述。允许混合使用简写和完整形
式的键。
Examples:
./page_owner_sort <input> <output> --cull=stacktrace
./page_owner_sort <input> <output> --cull=st,pid,name
./page_owner_sort <input> <output> --cull=n,f
过滤:
-f 过滤掉内存已被释放的块的信息。
选择:
--pid <pidlist> 按pid选择。这将选择进程ID号出现在<pidlist>中的块。
--tgid <tgidlist> 按tgid选择。这将选择其线程组ID号出现在<tgidlist>
中的块。
--name <cmdlist> 按任务命令名称选择。这将选择其任务命令名称出现在
<cmdlist>中的区块。
<pidlist>, <tgidlist>, <cmdlist>是以逗号分隔的列表形式的单个参数,
它提供了一种指定单个选择规则的方法。
例子:
./page_owner_sort <input> <output> --pid=1
./page_owner_sort <input> <output> --tgid=1,2,3
./page_owner_sort <input> <output> --name name1,name2
标准格式指定器
==============
::
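   For the --sort option:
   short key   long key     description
   p           pid          process ID
   tg          tgid         thread group ID
   n           name         task command name
   st          stacktrace   stack trace of the page allocation
   T           txt          full text of the block
   ft          free_ts      timestamp of when the page was released
   at          alloc_ts     timestamp of when the page was allocated
   ator        allocator    memory allocator of the page
   For the --cull option:
   short key   long key     description
   p           pid          process ID
   tg          tgid         thread group ID
   n           name         task command name
   f           free         whether the page has been released or not
   st          stacktrace   stack trace of the page allocation
   ator        allocator    memory allocator of the page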
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/vmalloced-kernel-stacks.rst
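:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
=====================================
Virtually Mapped Kernel Stack Support
=====================================
:Author: Shuah Khan <skhan@linuxfoundation.org>
.. contents:: :local:
Overview
--------
This is a compilation of information from the code and the original patch
series that introduced the
`Virtually Mapped Kernel Stacks feature <https://lwn.net/Articles/694348/>`_.
Introduction
------------
Kernel stack overflows are often hard to debug and make the kernel susceptible
to exploits. Problems may show up at a later time, which makes them difficult
to isolate and root-cause.
With virtually mapped kernel stacks and guard pages, a kernel stack overflow is
caught immediately rather than causing difficult-to-diagnose corruption.
The HAVE_ARCH_VMAP_STACK and VMAP_STACK configuration options enable support
for virtually mapped stacks with guard pages. This feature causes reliable
faults when the stack overflows. The usability of the stack trace after an
overflow, and the response to the overflow itself, are architecture dependent.
.. note::
   As of this writing, arm64, powerpc, riscv, s390, um, and x86 support
   VMAP_STACK.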
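HAVE_ARCH_VMAP_STACK
--------------------
Architectures that can support virtually mapped kernel stacks should enable
this bool configuration option. The requirements are:
- vmalloc space must be large enough to hold many kernel stacks. This may rule
  out many 32-bit architectures.
- Stacks in vmalloc space need to work reliably. For example, if vmap page
  tables are created on demand, either this mechanism needs to work while the
  stack points to a virtual address with unpopulated page tables, or arch code
  (switch_to() and switch_mm(), most likely) needs to ensure that the stack's
  page table entries are populated before running on a possibly unpopulated
  stack.
- If the stack overflows into a guard page, something reasonable should
  happen. The definition of "reasonable" is flexible, but instantly rebooting
  without logging anything would be unfriendly.
VMAP_STACK
----------
When enabled, the VMAP_STACK bool configuration option allocates virtually
mapped task stacks. This option depends on HAVE_ARCH_VMAP_STACK.
- Enable this if you want to use virtually mapped kernel stacks with guard
  pages. This causes kernel stack overflows to be caught immediately rather
  than causing difficult-to-diagnose corruption.
.. note::
   Using this feature with KASAN requires architecture support for backing
   virtual mappings with real shadow memory, and KASAN_VMALLOC must be
   enabled.
.. note::
   When VMAP_STACK is enabled, it is not possible to run DMA on stack-allocated
   data.
Kernel configuration options and dependencies keep changing. Refer to the
latest code base:
`Kconfig <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/Kconfig>`_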
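Allocation
----------
When a new kernel thread is created, its stack is allocated as virtually
contiguous memory pages from the page-level allocator. These pages are mapped
into contiguous kernel virtual space with PAGE_KERNEL protections.
alloc_thread_stack_node() calls __vmalloc_node_range() to allocate the stack
with PAGE_KERNEL protections.
- Allocated stacks are cached and later reused by new threads, so memcg
  accounting is performed manually when assigning/releasing stacks to tasks.
  Hence, __vmalloc_node_range is called without __GFP_ACCOUNT.
- vm_struct is cached so that it can be found when thread free is initiated in
  interrupt context. free_thread_stack() can be called in interrupt context.
- On arm64, all VMAP'd stacks need to have the same alignment to ensure that
  VMAP'd stack overflow detection works correctly. The arch-specific vmap stack
  allocator takes care of this detail.
- This does not address interrupt stacks, according to the original patch.
Thread stack allocation is initiated from clone(), fork(), vfork() and
kernel_thread() via kernel_clone(). These names are left here as hints for
searching the code base to understand when and how the thread stack is
allocated.
The bulk of the code is in:
`kernel/fork.c <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/fork.c>`_.
The stack_vm_area pointer in task_struct keeps track of the virtually allocated
stack, and a non-null stack_vm_area pointer indicates that virtually mapped
kernel stacks are enabled.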
::
struct vm_struct *stack_vm_area;
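Stack overflow handling
-----------------------
Leading and trailing guard pages help detect stack overflows. When the stack
overflows into a guard page, handlers have to be careful not to overflow the
stack again. When handlers are called, it is likely that very little stack
space is left.
On x86, this is done by handling the page fault that indicates the kernel stack
overflow on the double-fault stack.
Testing VMAP allocation with guard pages
----------------------------------------
How do we ensure that VMAP_STACK is actually allocating with a leading and a
trailing guard page? The following lkdtm tests can help detect any regressions.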
::
void lkdtm_STACK_GUARD_PAGE_LEADING()
void lkdtm_STACK_GUARD_PAGE_TRAILING()
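Conclusions
-----------
- A percpu cache of vmalloced stacks appears to be a bit faster than a
  high-order stack allocation, at least when the cache hits.
- THREAD_INFO_IN_TASK gets rid of the arch-specific thread_info entirely and
  simply embeds thread_info (containing only flags) and 'int cpu' into
  task_struct.
- The thread stack can be freed as soon as the task is dead (without waiting
  for RCU) and then, if vmapped stacks are in use, the entire stack can be
  cached for reuse on the same cpu.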
:Original: Documentation/vm/zs_malloc.rst
:Original: Documentation/vm/zsmalloc.rst
:翻译:
......
......@@ -22,8 +22,7 @@
內核開發社區已經發展出一套用於發布補丁的約定和過程;遵循這些約定和過程將使
參與其中的每個人的生活更加輕鬆。本文檔試圖描述這些約定的部分細節;更多信息
也可在以下文檔中找到
:ref:`Documentation/translations/zh_TW/process/submitting-patches.rst <tw_submittingpatches>`,
:ref:`Documentation/translations/zh_TW/process/submitting-drivers.rst <tw_submittingdrivers>`
:ref:`Documentation/translations/zh_TW/process/submitting-patches.rst <tw_submittingpatches>`
和 :ref:`Documentation/translations/zh_TW/process/submit-checklist.rst <tw_submitchecklist>`。
何時郵寄
......
......@@ -22,7 +22,6 @@
:ref:`Documentation/translations/zh_TW/process/howto.rst <tw_process_howto>`
文件是一個重要的起點;
:ref:`Documentation/translations/zh_TW/process/submitting-patches.rst <tw_submittingpatches>`
和 :ref:`Documentation/translations/zh_TW/process/submitting-drivers.rst <tw_submittingdrivers>`
也是所有內核開發人員都應該閱讀的內容。許多內部內核API都是使用kerneldoc機制
記錄的;「make htmldocs」或「make pdfdocs」可用於以HTML或PDF格式生成這些文檔
(儘管某些發行版提供的tex版本會遇到內部限制,無法正確處理文檔)。
......
......@@ -99,7 +99,6 @@ Linux內核代碼中包含有大量的文檔。這些文檔對於學習如何與
的代碼。
:ref:`Documentation/translations/zh_TW/process/submitting-patches.rst <tw_submittingpatches>`
:ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
這兩份文檔明確描述如何創建和發送補丁,其中包括(但不僅限於):
- 郵件內容
......
......@@ -43,7 +43,6 @@
.. toctree::
:maxdepth: 1
submitting-drivers
submit-checklist
stable-api-nonsense
stable-kernel-rules
......
.. SPDX-License-Identifier: GPL-2.0
.. _tw_submittingdrivers:
.. include:: ../disclaimer-zh_TW.rst
:Original: :ref:`Documentation/process/submitting-drivers.rst
<submittingdrivers>`
如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文
交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻
譯存在問題,請聯繫中文版維護者::
中文版維護者: 李陽 Li Yang <leoyang.li@nxp.com>
中文版翻譯者: 李陽 Li Yang <leoyang.li@nxp.com>
中文版校譯者: 陳琦 Maggie Chen <chenqi@beyondsoft.com>
王聰 Wang Cong <xiyou.wangcong@gmail.com>
張巍 Zhang Wei <wezhang@outlook.com>
胡皓文 Hu Haowen <src.res@email.cn>
如何向 Linux 內核提交驅動程序
=============================
這篇文檔將會解釋如何向不同的內核源碼樹提交設備驅動程序。請注意,如果你感
興趣的是顯卡驅動程序,你也許應該訪問 XFree86 項目(https://www.xfree86.org/)
和/或 X.org 項目 (https://x.org)。
另請參閱 Documentation/translations/zh_TW/process/submitting-patches.rst 文檔。
分配設備號
----------
塊設備和字符設備的主設備號與從設備號是由 Linux 命名編號分配權威 LANANA(
現在是 Torben Mathiasen)負責分配。申請的網址是 https://www.lanana.org/。
即使不準備提交到主流內核的設備驅動也需要在這裡分配設備號。有關詳細信息,
請參閱 Documentation/admin-guide/devices.rst。
如果你使用的不是已經分配的設備號,那麼當你提交設備驅動的時候,它將會被強
制分配一個新的設備號,即便這個設備號和你之前發給客戶的截然不同。
設備驅動的提交對象
------------------
Linux 2.0:
此內核源碼樹不接受新的驅動程序。
Linux 2.2:
此內核源碼樹不接受新的驅動程序。
Linux 2.4:
如果所屬的代碼領域在內核的 MAINTAINERS 文件中列有一個總維護者,
那麼請將驅動程序提交給他。如果此維護者沒有回應或者你找不到恰當的
維護者,那麼請聯繫 Willy Tarreau <w@1wt.eu>。
Linux 2.6:
除了遵循和 2.4 版內核同樣的規則外,你還需要在 linux-kernel 郵件
列表上跟蹤最新的 API 變化。向 Linux 2.6 內核提交驅動的頂級聯繫人
是 Andrew Morton <akpm@linux-foundation.org>。
決定設備驅動能否被接受的條件
----------------------------
許可: 代碼必須使用 GNU 通用公開許可證 (GPL) 提交給 Linux,但是
我們並不要求 GPL 是唯一的許可。你或許會希望同時使用多種
許可證發布,如果希望驅動程序可以被其他開源社區(比如BSD)
使用。請參考 include/linux/module.h 文件中所列出的可被
接受共存的許可。
版權: 版權所有者必須同意使用 GPL 許可。最好提交者和版權所有者
是相同個人或實體。否則,必需列出授權使用 GPL 的版權所有
人或實體,以備驗證之需。
接口: 如果你的驅動程序使用現成的接口並且和其他同類的驅動程序行
爲相似,而不是去發明無謂的新接口,那麼它將會更容易被接受。
如果你需要一個 Linux 和 NT 的通用驅動接口,那麼請在用
戶空間實現它。
代碼: 請使用 Documentation/process/coding-style.rst 中所描述的 Linux 代碼風
格。如果你的某些代碼段(例如那些與 Windows 驅動程序包共
享的代碼段)需要使用其他格式,而你卻只希望維護一份代碼,
那麼請將它們很好地區分出來,並且註明原因。
可移植性: 請注意,指針並不永遠是 32 位的,不是所有的計算機都使用小
尾模式 (little endian) 存儲數據,不是所有的人都擁有浮點
單元,不要隨便在你的驅動程序里嵌入 x86 彙編指令。只能在
x86 上運行的驅動程序一般是不受歡迎的。雖然你可能只有 x86
硬體,很難測試驅動程序在其他平台上是否可用,但是確保代碼
可以被輕鬆地移植卻是很簡單的。
清晰度: 做到所有人都能修補這個驅動程序將會很有好處,因爲這樣你將
會直接收到修復的補丁而不是 bug 報告。如果你提交一個試圖
隱藏硬體工作機理的驅動程序,那麼它將會被扔進廢紙簍。
電源管理: 因爲 Linux 正在被很多行動裝置和桌面系統使用,所以你的驅
動程序也很有可能被使用在這些設備上。它應該支持最基本的電
源管理,即在需要的情況下實現系統級休眠和喚醒要用到的
.suspend 和 .resume 函數。你應該檢查你的驅動程序是否能正
確地處理休眠與喚醒,如果實在無法確認,請至少把 .suspend
函數定義成返回 -ENOSYS(功能未實現)錯誤。你還應該嘗試確
保你的驅動在什麼都不乾的情況下將耗電降到最低。要獲得驅動
程序測試的指導,請參閱
Documentation/power/drivers-testing.rst。有關驅動程序電
源管理問題相對全面的概述,請參閱
Documentation/driver-api/pm/devices.rst。
管理: 如果一個驅動程序的作者還在進行有效的維護,那麼通常除了那
些明顯正確且不需要任何檢查的補丁以外,其他所有的補丁都會
被轉發給作者。如果你希望成爲驅動程序的聯繫人和更新者,最
好在代碼注釋中寫明並且在 MAINTAINERS 文件中加入這個驅動
程序的條目。
不影響設備驅動能否被接受的條件
------------------------------
供應商: 由硬體供應商來維護驅動程序通常是一件好事。不過,如果源碼
樹里已經有其他人提供了可穩定工作的驅動程序,那麼請不要期
望「我是供應商」會成爲內核改用你的驅動程序的理由。理想的情
況是:供應商與現有驅動程序的作者合作,構建一個統一完美的
驅動程序。
作者: 驅動程序是由大的 Linux 公司研發還是由你個人編寫,並不影
響其是否能被內核接受。沒有人對內核源碼樹享有特權。只要你
充分了解內核社區,你就會發現這一點。
資源列表
--------
Linux 內核主源碼樹:
ftp.??.kernel.org:/pub/linux/kernel/...
?? == 你的國家代碼,例如 "cn"、"us"、"uk"、"fr" 等等
Linux 內核郵件列表:
linux-kernel@vger.kernel.org
[可通過向majordomo@vger.kernel.org發郵件來訂閱]
Linux 設備驅動程序,第三版(探討 2.6.10 版內核):
https://lwn.net/Kernel/LDD3/ (免費版)
LWN.net:
每周內核開發活動摘要 - https://lwn.net/
2.6 版中 API 的變更:
https://lwn.net/Articles/2.6-kernel-api/
將舊版內核的驅動程序移植到 2.6 版:
https://lwn.net/Articles/driver-porting/
內核新手(KernelNewbies):
爲新的內核開發者提供文檔和幫助
https://kernelnewbies.org/
Linux USB項目:
http://www.linux-usb.org/
寫內核驅動的「不要」(Arjan van de Ven著):
http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf
內核清潔工 (Kernel Janitor):
https://kernelnewbies.org/KernelJanitors
......@@ -26,9 +26,7 @@
以下文檔含有大量簡潔的建議, 具體請見:
:ref:`Documentation/process <development_process_main>`
同樣,:ref:`Documentation/translations/zh_TW/process/submit-checklist.rst <tw_submitchecklist>`
給出在提交代碼前需要檢查的項目的列表。如果你在提交一個驅動程序,那麼
同時閱讀一下:
:ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
給出在提交代碼前需要檢查的項目的列表。
其中許多步驟描述了Git版本控制系統的默認行爲;如果您使用Git來準備補丁,
您將發現它爲您完成的大部分機械工作,儘管您仍然需要準備和記錄一組合理的
......
.. SPDX-License-Identifier: GPL-2.0
Clocks and Timers
=================
arm64
-----
On arm64, Hyper-V virtualizes the ARMv8 architectural system counter
and timer. Guest VMs use this virtualized hardware as the Linux
clocksource and clockevents via the standard arm_arch_timer.c
driver, just as they would on bare metal. Linux vDSO support for the
architectural system counter is functional in guest VMs on Hyper-V.
While Hyper-V also provides a synthetic system clock and four synthetic
per-CPU timers as described in the TLFS, they are not used by the
Linux kernel in a Hyper-V guest on arm64. However, older versions
of Hyper-V for arm64 only partially virtualize the ARMv8
architectural timer, such that the timer does not generate
interrupts in the VM. Because of this limitation, running current
Linux kernel versions on these older Hyper-V versions requires an
out-of-tree patch to use the Hyper-V synthetic clocks/timers instead.
x86/x64
-------
On x86/x64, Hyper-V provides guest VMs with a synthetic system clock
and four synthetic per-CPU timers as described in the TLFS. Hyper-V
also provides access to the virtualized TSC via the RDTSC and
related instructions. These TSC instructions do not trap to
the hypervisor and so provide excellent performance in a VM.
Hyper-V performs TSC calibration, and provides the TSC frequency
to the guest VM via a synthetic MSR. Hyper-V initialization code
in Linux reads this MSR to get the frequency, so it skips TSC
calibration and sets tsc_reliable. Hyper-V provides virtualized
versions of the PIT (in Hyper-V Generation 1 VMs only), local
APIC timer, and RTC. Hyper-V does not provide a virtualized HPET in
guest VMs.
The Hyper-V synthetic system clock can be read via a synthetic MSR,
but this access traps to the hypervisor. As a faster alternative,
the guest can configure a memory page to be shared between the guest
and the hypervisor. Hyper-V populates this memory page with a
64-bit scale value and offset value. To read the synthetic clock
value, the guest reads the TSC and then applies the scale and offset
as described in the Hyper-V TLFS. The resulting value advances
at a constant 10 MHz frequency. In the case of a live migration
to a host with a different TSC frequency, Hyper-V adjusts the
scale and offset values in the shared page so that the 10 MHz
frequency is maintained.
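
The scale-and-offset step can be sketched as follows. This is a minimal Python model, not the kernel's code. It assumes the TLFS formula: take the upper 64 bits of the 128-bit product of the TSC value and the scale, then add the offset. The function names and the ``scale_for_tsc_hz()`` helper are hypothetical, and the sequence-number check that a real reader performs on the shared page is omitted.

```python
MASK64 = (1 << 64) - 1


def tsc_page_time(tsc: int, scale: int, offset: int) -> int:
    """Return the synthetic clock value in 10 MHz ticks (100 ns units)."""
    # Upper 64 bits of the 128-bit product, plus the offset, wrapped to 64 bits.
    return (((tsc * scale) >> 64) + offset) & MASK64


def scale_for_tsc_hz(tsc_hz: int) -> int:
    """Hypothetical helper: a scale value that maps tsc_hz ticks to 10 MHz."""
    return (10_000_000 << 64) // tsc_hz


# One second of a 3 GHz TSC corresponds to roughly 10,000,000 ticks.
one_second = tsc_page_time(3_000_000_000, scale_for_tsc_hz(3_000_000_000), 0)
```

After a live migration to a host with a different TSC frequency, only ``scale`` and ``offset`` change; the guest-side arithmetic stays the same.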
Starting with Windows Server 2022 Hyper-V, Hyper-V uses hardware
support for TSC frequency scaling to enable live migration of VMs
across Hyper-V hosts where the TSC frequency may be different.
When a Linux guest detects that this Hyper-V functionality is
available, it prefers to use Linux's standard TSC-based clocksource.
Otherwise, it uses the clocksource for the Hyper-V synthetic system
clock implemented via the shared page (identified as
"hyperv_clocksource_tsc_page").
The Hyper-V synthetic system clock is available to user space via
vDSO, and gettimeofday() and related system calls can execute
entirely in user space. The vDSO is implemented by mapping the
shared page with scale and offset values into user space. User
space code performs the same algorithm of reading the TSC and
applying the scale and offset to get the constant 10 MHz clock.
Linux clockevents are based on Hyper-V synthetic timer 0. While
Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
timer 0. Interrupts from stimer0 are recorded on the "HVS" line in
/proc/interrupts. Clockevents based on the virtualized PIT and
local APIC timer also work, but the Hyper-V synthetic timer is
preferred.
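
As a rough illustration, the per-CPU stimer0 counts can be pulled out of /proc/interrupts with a few lines of Python. The parser below is a sketch: the function name is hypothetical, and the sample text is made up to resemble the file's layout rather than copied from a real guest.

```python
def interrupt_counts(text: str, label: str) -> list:
    """Return the per-CPU counters from the /proc/interrupts line named label."""
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if sep and name == label:
            counts = []
            for field in rest.split():
                if not field.isdigit():
                    break  # the description text follows the counters
                counts.append(int(field))
            return counts
    return []


# Made-up sample in the general shape of /proc/interrupts on a 2-CPU guest.
SAMPLE = (
    "           CPU0       CPU1\n"
    "HVS:       1200       1340   Hyper-V stimer0 interrupts\n"
)
stimer0_counts = interrupt_counts(SAMPLE, "HVS")
```

On a running guest the same function could be given the contents of /proc/interrupts instead of the sample string.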
The driver for the Hyper-V synthetic system clock and timers is
drivers/clocksource/hyperv_timer.c.
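
To check which clocksource a guest actually selected, the standard clocksource sysfs files can be read. The sketch below assumes the usual /sys/devices/system/clocksource/clocksource0 location; the function names are hypothetical, and the ``base`` parameter exists only so the code can be pointed at a different directory.

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def current_clocksource(base: Path = CLOCKSOURCE_DIR) -> str:
    """Return the name of the clocksource currently in use."""
    return (base / "current_clocksource").read_text().strip()


def available_clocksources(base: Path = CLOCKSOURCE_DIR) -> list:
    """Return the names of all clocksources the kernel has registered."""
    return (base / "available_clocksource").read_text().split()
```

On a guest that uses the shared-page clock, ``current_clocksource()`` would be expected to return "hyperv_clocksource_tsc_page".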
.. SPDX-License-Identifier: GPL-2.0
======================
Hyper-V Enlightenments
======================
.. toctree::
:maxdepth: 1
overview
vmbus
clocks
.. SPDX-License-Identifier: GPL-2.0
Overview
========
The Linux kernel contains a variety of code for running as a fully
enlightened guest on Microsoft's Hyper-V hypervisor. Hyper-V
consists primarily of a bare-metal hypervisor plus a virtual machine
management service running in the parent partition (roughly
equivalent to KVM and QEMU, for example). Guest VMs run in child
partitions. In this documentation, references to Hyper-V usually
encompass both the hypervisor and the VMM service without making a
distinction about which functionality is provided by which
component.
Hyper-V runs on x86/x64 and arm64 architectures, and Linux guests
are supported on both. The functionality and behavior of Hyper-V is
generally the same on both architectures unless noted otherwise.
Linux Guest Communication with Hyper-V
--------------------------------------
Linux guests communicate with Hyper-V in four different ways:
* Implicit traps: As defined by the x86/x64 or arm64 architecture,
some guest actions trap to Hyper-V. Hyper-V emulates the action and
returns control to the guest. This behavior is generally invisible
to the Linux kernel.
* Explicit hypercalls: Linux makes an explicit function call to
Hyper-V, passing parameters. Hyper-V performs the requested action
and returns control to the caller. Parameters are passed in
processor registers or in memory shared between the Linux guest and
Hyper-V. On x86/x64, hypercalls use a Hyper-V specific calling
sequence. On arm64, hypercalls use the ARM standard SMCCC calling
sequence.
* Synthetic register access: Hyper-V implements a variety of
synthetic registers. On x86/x64 these registers appear as MSRs in
the guest, and the Linux kernel can read or write these MSRs using
the normal mechanisms defined by the x86/x64 architecture. On
arm64, these synthetic registers must be accessed using explicit
hypercalls.
* VMbus: VMbus is a higher-level software construct that is built on
the other 3 mechanisms. It is a message passing interface between
the Hyper-V host and the Linux guest. It uses memory that is shared
between Hyper-V and the guest, along with various signaling
mechanisms.
The first three communication mechanisms are documented in the
`Hyper-V Top Level Functional Spec (TLFS)`_. The TLFS describes
general Hyper-V functionality and provides details on the hypercalls
and synthetic registers. The TLFS is currently written for the
x86/x64 architecture only.
.. _Hyper-V Top Level Functional Spec (TLFS): https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
VMbus is not documented. This documentation provides a high-level
overview of VMbus and how it works, but the details can be discerned
only from the code.
Sharing Memory
--------------
Many aspects of communication between Hyper-V and Linux are based
on sharing memory. Such sharing is generally accomplished as
follows:
* Linux allocates memory from its physical address space using
standard Linux mechanisms.
* Linux tells Hyper-V the guest physical address (GPA) of the
allocated memory. Many shared areas are kept to 1 page so that a
single GPA is sufficient. Larger shared areas require a list of
GPAs, which usually do not need to be contiguous in the guest
physical address space. How Hyper-V is told about the GPA or list
of GPAs varies. In some cases, a single GPA is written to a
synthetic register. In other cases, a GPA or list of GPAs is sent
in a VMbus message.
* Hyper-V translates the GPAs into "real" physical memory addresses,
and creates a virtual mapping that it can use to access the memory.
* Linux can later revoke sharing it has previously established by
telling Hyper-V to set the shared GPA to zero.
Hyper-V operates with a page size of 4 Kbytes. GPAs communicated to
Hyper-V may be in the form of page numbers, and always describe a
range of 4 Kbytes. Since the Linux guest page size on x86/x64 is
also 4 Kbytes, the mapping from guest page to Hyper-V page is 1-to-1.
On arm64, Hyper-V supports guests with 4/16/64 Kbyte pages as
defined by the arm64 architecture. If Linux is using 16 or 64
Kbyte pages, Linux code must be careful to communicate with Hyper-V
only in terms of 4 Kbyte pages. HV_HYP_PAGE_SIZE and related macros
are used in code that communicates with Hyper-V so that it works
correctly in all configurations.
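
The page-size mismatch can be illustrated with a small calculation. The sketch below is Python rather than kernel C; the constants mirror the kernel's HV_HYP_PAGE_SHIFT and HV_HYP_PAGE_SIZE names, while the ``hv_pfns()`` helper is hypothetical.

```python
HV_HYP_PAGE_SHIFT = 12
HV_HYP_PAGE_SIZE = 1 << HV_HYP_PAGE_SHIFT  # Hyper-V always works in 4 Kbyte pages


def hv_pfns(gpa: int, length: int) -> list:
    """List the 4 Kbyte Hyper-V page numbers covering [gpa, gpa + length)."""
    if length <= 0:
        return []
    first = gpa >> HV_HYP_PAGE_SHIFT
    last = (gpa + length - 1) >> HV_HYP_PAGE_SHIFT
    return list(range(first, last + 1))


# A single 64 Kbyte guest page at GPA 0x10000 spans sixteen Hyper-V pages.
pfns_for_64k_page = hv_pfns(0x10000, 64 * 1024)
```

The point of the example is that one guest page may have to be described to Hyper-V as several 4 Kbyte page numbers.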
As described in the TLFS, a few memory pages shared between Hyper-V
and the Linux guest are "overlay" pages. With overlay pages, Linux
uses the usual approach of allocating guest memory and telling
Hyper-V the GPA of the allocated memory. But Hyper-V then replaces
that physical memory page with a page it has allocated, and the
original physical memory page is no longer accessible in the guest
VM. Linux may access the memory normally as if it were the memory
that it originally allocated. The "overlay" behavior is visible
only because the contents of the page (as seen by Linux) change at
the time that Linux originally establishes the sharing and the
overlay page is inserted. Similarly, the contents change if Linux
revokes the sharing, in which case Hyper-V removes the overlay page,
and the guest page originally allocated by Linux becomes visible
again.
Before Linux does a kexec to a kdump kernel or any other kernel,
memory shared with Hyper-V should be revoked. Hyper-V could modify
a shared page or remove an overlay page after the new kernel is
using the page for a different purpose, corrupting the new kernel.
Hyper-V does not provide a single "set everything" operation to
guest VMs, so Linux code must individually revoke all sharing before
doing kexec. See hv_kexec_handler() and hv_crash_handler(). But
the crash/panic path still has holes in cleanup because some shared
pages are set using per-CPU synthetic registers and there's no
mechanism to revoke the shared pages for CPUs other than the CPU
running the panic path.
CPU Management
--------------
Hyper-V does not have the ability to hot-add or hot-remove a CPU
from a running VM. However, Windows Server 2019 Hyper-V and
earlier versions may provide guests with ACPI tables that indicate
more CPUs than are actually present in the VM. As is normal, Linux
treats these additional CPUs as potential hot-add CPUs, and reports
them as such even though Hyper-V will never actually hot-add them.
Starting in Windows Server 2022 Hyper-V, the ACPI tables reflect
only the CPUs actually present in the VM, so Linux does not report
any hot-add CPUs.
A Linux guest CPU may be taken offline using the normal Linux
mechanisms, provided no VMbus channel interrupts are assigned to
the CPU. See the section on VMbus Interrupts for more details
on how VMbus channel interrupts can be re-assigned to permit
taking a CPU offline.
32-bit and 64-bit
-----------------
On x86/x64, Hyper-V supports 32-bit and 64-bit guests, and Linux
will build and run in either version. While the 32-bit version is
expected to work, it is used rarely and may suffer from undetected
regressions.
On arm64, Hyper-V supports only 64-bit guests.
Endian-ness
-----------
All communication between Hyper-V and guest VMs uses little-endian
format on both x86/x64 and arm64. Big-endian format on arm64 is not
supported by Hyper-V, and Linux code does not use endian-ness macros
when accessing data shared with Hyper-V.
Versioning
----------
Current Linux kernels operate correctly with older versions of
Hyper-V back to Windows Server 2012 Hyper-V. Support for running
on the original Hyper-V release in Windows Server 2008/2008 R2
has been removed.
A Linux guest on Hyper-V outputs in dmesg the version of Hyper-V
it is running on. This version is in the form of a Windows build
number and is for display purposes only. Linux code does not
test this version number at runtime to determine available features
and functionality. Hyper-V indicates feature/function availability
via flags in synthetic MSRs that Hyper-V provides to the guest,
and the guest code tests these flags.
VMbus has its own protocol version that is negotiated during the
initial VMbus connection from the guest to Hyper-V. This version
number is also output to dmesg during boot. This version number
is checked in a few places in the code to determine if specific
functionality is present.
Furthermore, each synthetic device on VMbus also has a protocol
version that is separate from the VMbus protocol version. Device
drivers for these synthetic devices typically negotiate the device
protocol version, and may test that protocol version to determine
if specific device functionality is present.
Code Packaging
--------------
Hyper-V related code appears in the Linux kernel code tree in three
main areas:
1. drivers/hv
2. arch/x86/hyperv and arch/arm64/hyperv
3. individual device driver areas such as drivers/scsi, drivers/net,
drivers/clocksource, etc.
A few miscellaneous files appear elsewhere. See the full list under
"Hyper-V/Azure CORE AND DRIVERS" and "DRM DRIVER FOR HYPERV
SYNTHETIC VIDEO DEVICE" in the MAINTAINERS file.
The code in #1 and #2 is built only when CONFIG_HYPERV is set.
Similarly, the code for most Hyper-V related drivers is built only
when CONFIG_HYPERV is set.
Most Hyper-V related code in #1 and #3 can be built as a module.
The architecture specific code in #2 must be built-in. Also,
drivers/hv/hv_common.c is low-level code that is common across
architectures and must be built-in.
.. SPDX-License-Identifier: GPL-2.0
VMbus
=====
VMbus is a software construct provided by Hyper-V to guest VMs. It
consists of a control path and common facilities used by synthetic
devices that Hyper-V presents to guest VMs. The control path is
used to offer synthetic devices to the guest VM and, in some cases,
to rescind those devices. The common facilities include software
channels for communicating between the device driver in the guest VM
and the synthetic device implementation that is part of Hyper-V, and
signaling primitives to allow Hyper-V and the guest to interrupt
each other.
VMbus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
entry in a running Linux guest. The VMbus driver (drivers/hv/vmbus_drv.c)
establishes the VMbus control path with the Hyper-V host, then
registers itself as a Linux bus driver. It implements the standard
bus functions for adding and removing devices to/from the bus.
Most synthetic devices offered by Hyper-V have a corresponding Linux
device driver. These devices include:
* SCSI controller
* NIC
* Graphics frame buffer
* Keyboard
* Mouse
* PCI device pass-thru
* Heartbeat
* Time Sync
* Shutdown
* Memory balloon
* Key/Value Pair (KVP) exchange with Hyper-V
* Hyper-V online backup (a.k.a. VSS)
Guest VMs may have multiple instances of the synthetic SCSI
controller, synthetic NIC, and PCI pass-thru devices. Other
synthetic devices are limited to a single instance per VM. Not
listed above are a small number of synthetic devices offered by
Hyper-V that are used only by Windows guests and for which Linux
does not have a driver.
Hyper-V uses the terms "VSP" and "VSC" in describing synthetic
devices. "VSP" refers to the Hyper-V code that implements a
particular synthetic device, while "VSC" refers to the driver for
the device in the guest VM. For example, the Linux driver for the
synthetic NIC is referred to as "netvsc" and the Linux driver for
the synthetic SCSI controller is "storvsc". These drivers contain
functions with names like "storvsc_connect_to_vsp".
VMbus channels
--------------
An instance of a synthetic device uses VMbus channels to communicate
between the VSP and the VSC. Channels are bi-directional and used
for passing messages. Most synthetic devices use a single channel,
but the synthetic SCSI controller and synthetic NIC may use multiple
channels to achieve higher performance and greater parallelism.
Each channel consists of two ring buffers. These are classic ring
buffers from a university data structures textbook. If the read
and write pointers are equal, the ring buffer is considered to be
empty, so a full ring buffer always has at least one byte unused.
The "in" ring buffer is for messages from the Hyper-V host to the
guest, and the "out" ring buffer is for messages from the guest to
the Hyper-V host. In Linux, the "in" and "out" designations are as
viewed by the guest side. The ring buffers are memory that is
shared between the guest and the host, and they follow the standard
paradigm where the memory is allocated by the guest, with the list
of GPAs that make up the ring buffer communicated to the host. Each
ring buffer consists of a header page (4 Kbytes) with the read and
write indices and some control flags, followed by the memory for the
actual ring. The size of the ring is determined by the VSC in the
guest and is specific to each synthetic device. The list of GPAs
making up the ring is communicated to the Hyper-V host over the
VMbus control path as a GPA Descriptor List (GPADL). See function
vmbus_establish_gpadl().
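
The empty/full convention can be sketched with plain index arithmetic. This is a textbook model in Python with hypothetical function names; the kernel's accounting works on whole packets and differs in detail.

```python
def bytes_to_read(read_idx: int, write_idx: int, size: int) -> int:
    """Bytes queued in the ring; zero when the two indices are equal."""
    return (write_idx - read_idx) % size


def bytes_to_write(read_idx: int, write_idx: int, size: int) -> int:
    """Free space, keeping one byte unused so a full ring never looks empty."""
    return size - bytes_to_read(read_idx, write_idx, size) - 1
```

For a 4096-byte ring with both indices at 0, ``bytes_to_read()`` is 0 and ``bytes_to_write()`` is 4095.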
Each ring buffer is mapped into contiguous Linux kernel virtual
space in three parts: 1) the 4 Kbyte header page, 2) the memory
that makes up the ring itself, and 3) a second mapping of the memory
that makes up the ring itself. Because (2) and (3) are contiguous
in kernel virtual space, the code that copies data to and from the
ring buffer need not be concerned with ring buffer wrap-around.
Once a copy operation has completed, the read or write index may
need to be reset to point back into the first mapping, but the
actual data copy does not need to be broken into two parts. This
approach also allows complex data structures to be easily accessed
directly in the ring without handling wrap-around.
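
The benefit of the second mapping can be shown with a toy model. In the Python sketch below, ``ring + ring`` stands in for the double mapping; it is a copy rather than a true alias of the same memory, so it only demonstrates the read side. All function names are hypothetical, and ``n`` is assumed not to exceed the ring size.

```python
def read_two_part(ring: bytes, idx: int, n: int) -> bytes:
    """Copy n bytes starting at idx, splitting the copy at the wrap point."""
    first = min(n, len(ring) - idx)
    return ring[idx:idx + first] + ring[:n - first]


def read_mirrored(mirrored: bytes, idx: int, n: int) -> bytes:
    """Same read against a view in which the ring appears twice back to back."""
    return mirrored[idx:idx + n]


def advance(idx: int, n: int, size: int) -> int:
    """After the copy, fold the index back into the first mapping."""
    return (idx + n) % size


ring = bytes(range(16))
wrapped = read_mirrored(ring + ring, 12, 8)  # one slice, no wrap handling
```

Both read functions return the same bytes; only the mirrored version avoids breaking the copy into two parts.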
On arm64 with page sizes > 4 Kbytes, the header page must still be
passed to Hyper-V as a 4 Kbyte area. But the memory for the actual
ring must be aligned to PAGE_SIZE and have a size that is a multiple
of PAGE_SIZE so that the duplicate mapping trick can be done. Hence
a portion of the header page is unused and not communicated to
Hyper-V. This case is handled by vmbus_establish_gpadl().
Hyper-V enforces a limit on the aggregate amount of guest memory
that can be shared with the host via GPADLs. This limit ensures
that a rogue guest can't force the consumption of excessive host
resources. For Windows Server 2019 and later, this limit is
approximately 1280 Mbytes. For versions prior to Windows Server
2019, the limit is approximately 384 Mbytes.
VMbus messages
--------------
All VMbus messages have a standard header that includes the message
length, the offset of the message payload, some flags, and a
transactionID. The portion of the message after the header is
unique to each VSP/VSC pair.
Messages follow one of two patterns:
* Unidirectional: Either side sends a message and does not
expect a response message
* Request/response: One side (usually the guest) sends a message
and expects a response
The transactionID (a.k.a. "requestID") is for matching requests &
responses. Some synthetic devices allow multiple requests to be in-
flight simultaneously, so the guest specifies a transactionID when
sending a request. Hyper-V sends back the same transactionID in the
matching response.
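
Request/response matching by transactionID can be sketched as a table of pending requests. The class below is a toy Python model with hypothetical names; it does not reflect how any particular VSC driver stores its requests.

```python
import itertools


class RequestTracker:
    """Toy model of matching responses to in-flight requests by transaction ID."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._pending = {}

    def send(self, request):
        """Record the request and return the transaction ID sent with it."""
        txid = next(self._next_id)
        self._pending[txid] = request
        return txid

    def complete(self, txid):
        """Return the request that txid answers, or None if the ID is unknown."""
        return self._pending.pop(txid, None)
```

Because each response carries the ID of its request, responses may arrive in any order and still be paired correctly.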
Messages passed between the VSP and VSC are control messages. For
example, a message sent from the storvsc driver might be "execute
this SCSI command". If a message also implies some data transfer
between the guest and the Hyper-V host, the actual data to be
transferred may be embedded with the control message, or it may be
specified as a separate data buffer that the Hyper-V host will
access as a DMA operation. The former case is used when the size of
the data is small and the cost of copying the data to and from the
ring buffer is minimal. For example, time sync messages from the
Hyper-V host to the guest contain the actual time value. When the
data is larger, a separate data buffer is used. In this case, the
control message contains a list of GPAs that describe the data
buffer. For example, the storvsc driver uses this approach to
specify the data buffers to/from which disk I/O is done.
Three functions exist to send VMbus messages:
1. vmbus_sendpacket(): Control-only messages and messages with
embedded data -- no GPAs
2. vmbus_sendpacket_pagebuffer(): Message with list of GPAs
identifying data to transfer. An offset and length is
associated with each GPA so that multiple discontinuous areas
of guest memory can be targeted.
3. vmbus_sendpacket_mpb_desc(): Message with list of GPAs
identifying data to transfer. A single offset and length is
associated with a list of GPAs. The GPAs must describe a
single logical area of guest memory to be targeted.
Historically, Linux guests have trusted Hyper-V to send well-formed
and valid messages, and Linux drivers for synthetic devices did not
fully validate messages. With the introduction of processor
technologies that fully encrypt guest memory and that allow the
guest to not trust the hypervisor (AMD SEV-SNP, Intel TDX), trusting
the Hyper-V host is no longer a valid assumption. The drivers for
VMbus synthetic devices are being updated to fully validate any
values read from memory that is shared with Hyper-V, which includes
messages from VMbus devices. To facilitate such validation,
messages read by the guest from the "in" ring buffer are copied to a
temporary buffer that is not shared with Hyper-V. Validation is
performed in this temporary buffer without the risk of Hyper-V
maliciously modifying the message after it is validated but before
it is used.
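
The copy-then-validate pattern can be sketched as follows. The two-field header here is hypothetical and far simpler than the real VMbus packet header; the point is only that every check runs against the private copy, never against the shared memory.

```python
import struct

# Hypothetical header: payload offset and total length, both little-endian u16.
HEADER = struct.Struct("<HH")


def receive_validated(shared: bytearray, max_len: int) -> bytes:
    """Snapshot a message from shared memory, then validate the snapshot."""
    # Copy first: later changes to `shared` cannot affect `private`.
    private = bytes(shared[:max_len])
    if len(private) < HEADER.size:
        raise ValueError("message shorter than header")
    offset, total = HEADER.unpack_from(private)
    if not (HEADER.size <= offset <= total <= len(private)):
        raise ValueError("inconsistent header fields")
    return private[offset:total]
```

If the checks were run on the shared buffer instead, the other side could change a length field between the check and the use.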
VMbus interrupts
----------------
VMbus provides a mechanism for the guest to interrupt the host when
the guest has queued new messages in a ring buffer. The host
expects that the guest will send an interrupt only when an "out"
ring buffer transitions from empty to non-empty. If the guest sends
interrupts at other times, the host deems such interrupts to be
unnecessary. If a guest sends an excessive number of unnecessary
interrupts, the host may throttle that guest by suspending its
execution for a few seconds to prevent a denial-of-service attack.
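
The signaling rule can be expressed as a check made at write time. The sketch below is a simplified Python model with a hypothetical function name; it ignores memory ordering and races with the host, which a real implementation must handle.

```python
def write_and_check_signal(read_idx: int, write_idx: int, nbytes: int, size: int):
    """Return (new_write_idx, must_signal) for a write of nbytes to the ring."""
    was_empty = read_idx == write_idx
    new_write_idx = (write_idx + nbytes) % size
    # Signal only when this write moved the ring from empty to non-empty.
    return new_write_idx, was_empty and nbytes > 0
```

A second write made before the host has consumed the first one returns ``must_signal`` as False, so no extra interrupt is sent.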
Similarly, the host will interrupt the guest when it sends a new
message on the VMbus control path, or when a VMbus channel "in" ring
buffer transitions from empty to non-empty. Each CPU in the guest
may receive VMbus interrupts, so they are best modeled as per-CPU
interrupts in Linux. This model works well on arm64 where a single
per-CPU IRQ is allocated for VMbus. Since x86/x64 lacks support for
per-CPU IRQs, an x86 interrupt vector is statically allocated (see
HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
call the VMbus interrupt service routine. These interrupts are
visible in /proc/interrupts on the "HYP" line.
The guest CPU that a VMbus channel will interrupt is selected by the
guest when the channel is created, and the host is informed of that
selection. VMbus devices are broadly grouped into two categories:
1. "Slow" devices that need only one VMbus channel. The devices
(such as keyboard, mouse, heartbeat, and timesync) generate
relatively few interrupts. Their VMbus channels are all
assigned to interrupt the VMBUS_CONNECT_CPU, which is always
CPU 0.
2. "High speed" devices that may use multiple VMbus channels for
higher parallelism and performance. These devices include the
synthetic SCSI controller and synthetic NIC. Their VMbus
channel interrupts are assigned to CPUs that are spread out
among the available CPUs in the VM so that interrupts on
multiple channels can be processed in parallel.
The assignment of VMbus channel interrupts to CPUs is done in the
function init_vp_index(). This assignment is done outside of the
normal Linux interrupt affinity mechanism, so the interrupts are
neither "unmanaged" nor "managed" interrupts.
The CPU that a VMbus channel will interrupt can be seen in
/sys/bus/vmbus/devices/<deviceGUID>/channels/<channelRelID>/cpu.
When running on later versions of Hyper-V, the CPU can be changed
by writing a new value to this sysfs entry. Because the interrupt
assignment is done outside of the normal Linux affinity mechanism,
there are no entries in /proc/irq corresponding to individual
VMbus channel interrupts.
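
The per-channel sysfs entries can be collected with a short script. The sketch below assumes the directory layout given above; the function name is hypothetical, and the ``root`` parameter is there so the code can be exercised against a different directory.

```python
from pathlib import Path


def channel_cpus(root: Path = Path("/sys/bus/vmbus/devices")) -> dict:
    """Map (device GUID, channel relid) to the CPU that the channel interrupts."""
    result = {}
    for cpu_file in root.glob("*/channels/*/cpu"):
        relid = cpu_file.parent.name
        device = cpu_file.parent.parent.parent.name
        result[(device, relid)] = int(cpu_file.read_text())
    return result
```

Listing the channels that target a given CPU this way shows which ones would need to be reassigned before that CPU is taken offline.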
An online CPU in a Linux guest may not be taken offline if it has
VMbus channel interrupts assigned to it. Any such channel
interrupts must first be manually reassigned to another CPU as
described above. When no channel interrupts are assigned to the
CPU, it can be taken offline.
When a guest CPU receives a VMbus interrupt from the host, the
function vmbus_isr() handles the interrupt. It first checks for
channel interrupts by calling vmbus_chan_sched(), which looks at a
bitmap set up by the host to determine which channels have pending
interrupts on this CPU. If multiple channels have pending
interrupts for this CPU, they are processed sequentially. When all
channel interrupts have been processed, vmbus_isr() checks for and
processes any message received on the VMbus control path.
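
The bitmap scan can be modeled in a few lines. This Python sketch uses hypothetical names, treats the bitmap as a plain integer, and omits the atomic test-and-clear that real interrupt code needs.

```python
def pending_channels(bitmap: int, nbits: int) -> list:
    """Return the set bit positions, lowest first; each stands for one channel."""
    return [bit for bit in range(nbits) if (bitmap >> bit) & 1]


def dispatch(bitmap: int, nbits: int, handlers: dict) -> list:
    """Run the handler of each pending channel in turn; return those handled."""
    handled = []
    for relid in pending_channels(bitmap, nbits):
        handler = handlers.get(relid)
        if handler is not None:
            handler(relid)
            handled.append(relid)
    return handled
```

Channels with pending bits are processed one after another, matching the sequential handling described above.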
The VMbus channel interrupt handling code is designed to work
correctly even if an interrupt is received on a CPU other than the
CPU assigned to the channel. Specifically, the code does not use
CPU-based exclusion for correctness. In normal operation, Hyper-V
will interrupt the assigned CPU. But when the CPU assigned to a
channel is being changed via sysfs, the guest doesn't know exactly
when Hyper-V will make the transition. The code must work correctly
even if there is a time lag before Hyper-V starts interrupting the
new CPU. See comments in target_cpu_store().
VMbus device creation/deletion
------------------------------
Hyper-V and the Linux guest have a separate message-passing path
that is used for synthetic device creation and deletion. This
path does not use a VMbus channel. See vmbus_post_msg() and
vmbus_on_msg_dpc().
The first step is for the guest to connect to the generic
Hyper-V VMbus mechanism. As part of establishing this connection,
the guest and Hyper-V agree on a VMbus protocol version they will
use. This negotiation allows newer Linux kernels to run on older
Hyper-V versions, and vice versa.
The guest then tells Hyper-V to "send offers". Hyper-V sends an
offer message to the guest for each synthetic device that the VM
is configured to have. Each VMbus device type has a fixed GUID
known as the "class ID", and each VMbus device instance is also
identified by a GUID. The offer message from Hyper-V contains
both GUIDs to uniquely (within the VM) identify the device.
There is one offer message for each device instance, so a VM with
two synthetic NICs will get two offer messages with the NIC
class ID. The ordering of offer messages can vary from boot-to-boot
and must not be assumed to be consistent in Linux code. Offer
messages may also arrive long after Linux has initially booted
because Hyper-V supports adding devices, such as synthetic NICs,
to running VMs. A new offer message is processed by
vmbus_process_offer(), which indirectly invokes vmbus_add_channel_work().
Upon receipt of an offer message, the guest identifies the device
type based on the class ID, and invokes the correct driver to set up
the device. Driver/device matching is performed using the standard
Linux mechanism.
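
Matching on the class ID can be pictured as a table lookup that does not depend on the order in which offers arrive. The GUIDs below are placeholders invented for this sketch, not the real Hyper-V class IDs, and the function is a Python model rather than the Linux bus-matching code.

```python
# Placeholder class IDs; the real ones are fixed GUIDs defined by Hyper-V.
NIC_CLASS = "00000000-0000-0000-0000-0000000000a1"
SCSI_CLASS = "00000000-0000-0000-0000-0000000000b2"

DRIVERS = {NIC_CLASS: "netvsc", SCSI_CLASS: "storvsc"}


def match_offers(offers) -> dict:
    """Map each device instance ID to a driver name, or None if none matches."""
    return {instance: DRIVERS.get(class_id) for class_id, instance in offers}


# Two NICs and one SCSI controller, offered in an arbitrary order.
offers = [(NIC_CLASS, "nic-1"), (SCSI_CLASS, "scsi-0"), (NIC_CLASS, "nic-0")]
matched = match_offers(offers)
```

Reordering ``offers`` produces the same mapping, which is why code must key on the GUIDs rather than on arrival order.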
The device driver probe function opens the primary VMbus channel to
the corresponding VSP. It allocates guest memory for the channel
ring buffers and shares the ring buffer with the Hyper-V host by
giving the host a list of GPAs for the ring buffer memory. See
vmbus_establish_gpadl().
Once the ring buffer is set up, the device driver and VSP exchange
setup messages via the primary channel. These messages may include
negotiating the device protocol version to be used between the Linux
VSC and the VSP on the Hyper-V host. The setup messages may also
include creating additional VMbus channels, which are somewhat
mis-named as "sub-channels" since they are functionally
equivalent to the primary channel once they are created.
Finally, the device driver may create entries in /dev as with
any device driver.
The Hyper-V host can send a "rescind" message to the guest to
remove a device that was previously offered. Linux drivers must
handle such a rescind message at any time. Rescinding a device
invokes the device driver "remove" function to cleanly shut
down the device and remove it. Once a synthetic device is
rescinded, neither Hyper-V nor Linux retains any state about
its previous existence. Such a device might be re-added later,
in which case it is treated as an entirely new device. See
vmbus_onoffer_rescind().
......@@ -14,6 +14,7 @@ Linux Virtualization Support
ne_overview
acrn/index
coco/sev-guest
hyperv/index
.. only:: html and subproject
......
......@@ -4667,7 +4667,7 @@ encrypted VMs.
Currently, this ioctl is used for issuing Secure Encrypted Virtualization
(SEV) commands on AMD Processors. The SEV commands are defined in
Documentation/virt/kvm/amd-memory-encryption.rst.
Documentation/virt/kvm/x86/amd-memory-encryption.rst.
4.111 KVM_MEMORY_ENCRYPT_REG_REGION
-----------------------------------
......@@ -7679,7 +7679,7 @@ architecture-specific interfaces. This capability and the architecture-
specific interfaces must be consistent, i.e. if one says the feature
is supported, than the other should as well and vice versa. For arm64
see Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL".
For x86 see Documentation/virt/kvm/msr.rst "MSR_KVM_STEAL_TIME".
For x86 see Documentation/virt/kvm/x86/msr.rst "MSR_KVM_STEAL_TIME".
8.25 KVM_CAP_S390_DIAG318
-------------------------
......
......@@ -10,7 +10,7 @@ The memory of Protected Virtual Machines (PVMs) is not accessible to
I/O or the hypervisor. In those cases where the hypervisor needs to
access the memory of a PVM, that memory must be made accessible.
Memory made accessible to the hypervisor will be encrypted. See
Documentation/virt/kvm/s390-pv.rst for details."
Documentation/virt/kvm/s390/s390-pv.rst for details."
On IPL (boot) a small plaintext bootloader is started, which provides
information about the encrypted components and necessary metadata to
......
......@@ -22,7 +22,7 @@ S390:
number in R1.
For further information on the S390 diagnose call as supported by KVM,
refer to Documentation/virt/kvm/s390-diag.rst.
refer to Documentation/virt/kvm/s390/s390-diag.rst.
PowerPC:
It uses R3-R10 and hypercall number in R11. R4-R11 are used as output registers.
......
......@@ -322,7 +322,7 @@ Shared Options
* ``v6=[0,1]`` to specify if a v6 connection is desired for all
transports which operate over IP. Additionally, for transports that
have some differences in the way they operate over v4 and v6 (for example
EoL2TPv3), sets the correct mode of operation. In the absense of this
EoL2TPv3), sets the correct mode of operation. In the absence of this
option, the socket type is determined based on what do the src and dst
arguments resolve/parse to.
......
.. _overcommit_accounting:
=====================
Overcommit Accounting
=====================
......
......@@ -140,7 +140,7 @@ Unwinder implementation details
Objtool generates the ORC data by integrating with the compile-time
stack metadata validation feature, which is described in detail in
tools/objtool/Documentation/stack-validation.txt. After analyzing all
tools/objtool/Documentation/objtool.txt. After analyzing all
the code paths of a .o file, it creates an array of orc_entry structs,
and a parallel array of instruction addresses associated with those
structs, and writes them to the .orc_unwind and .orc_unwind_ip sections
......
......@@ -6842,7 +6842,7 @@ L: dri-devel@lists.freedesktop.org
L: linux-tegra@vger.kernel.org
S: Supported
T: git git://anongit.freedesktop.org/tegra/linux.git
F: Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt
F: Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
F: Documentation/devicetree/bindings/gpu/host1x/
F: drivers/gpu/drm/tegra/
F: drivers/gpu/host1x/
......@@ -9326,6 +9326,7 @@ S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
F: Documentation/ABI/stable/sysfs-bus-vmbus
F: Documentation/ABI/testing/debugfs-hyperv
F: Documentation/virt/hyperv
F: Documentation/networking/device_drivers/ethernet/microsoft/netvsc.rst
F: arch/arm64/hyperv
F: arch/arm64/include/asm/hyperv-tlfs.h
......@@ -19845,7 +19846,7 @@ M: Sowjanya Komatineni <skomatineni@nvidia.com>
L: linux-media@vger.kernel.org
L: linux-tegra@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt
F: Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
F: drivers/staging/media/tegra-video/
TEGRA XUSB PADCTL DRIVER
......@@ -20443,7 +20444,7 @@ F: tools/tracing/rtla/
TRADITIONAL CHINESE DOCUMENTATION
M: Hu Haowen <src.res@email.cn>
L: linux-doc-tw-discuss@lists.sourceforge.net
L: linux-doc-tw-discuss@lists.sourceforge.net (moderated for non-subscribers)
S: Maintained
W: https://github.com/srcres258/linux-doc
T: git git://github.com/srcres258/linux-doc.git doc-zh-tw
......
......@@ -30,11 +30,10 @@ drivers used by the Q40, apart from the very obvious (console etc.):
genrtc.c # RTC
char/joystick/* # most of this should work, not
# in default config.in
block/q40ide.c # startup for ide
ide* # see Documentation/ide/ide.rst
floppy.c # normal PC driver, DMA emu in asm/floppy.h
block/floppy.c # normal PC driver, DMA emu in asm/floppy.h
# and arch/m68k/kernel/entry.S
# see drivers/block/README.fd
ata/pata_falcon.c
net/ne.c
video/q40fb.c
parport/*
......
......@@ -379,7 +379,7 @@ void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
*
* Request that the size of an object be changed.
*
* See Documentation/filesystems/caching/netfs-api.txt for a complete
* See Documentation/filesystems/caching/netfs-api.rst for a complete
* description.
*/
static inline
......
......@@ -67,7 +67,7 @@ struct unwind_hint {
* It should only be used in special cases where you're 100% sure it won't
* affect the reliability of frame pointers and kernel stack traces.
*
* For more information, see tools/objtool/Documentation/stack-validation.txt.
* For more information, see tools/objtool/Documentation/objtool.txt.
*/
#define STACK_FRAME_NON_STANDARD(func) \
static void __used __section(".discard.func_stack_frame_non_standard") \
......
......@@ -4,7 +4,7 @@
* Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
* Written by David Howells (dhowells@redhat.com)
*
* See Documentation/watch_queue.rst
* See Documentation/core-api/watch_queue.rst
*/
#ifndef _LINUX_WATCH_QUEUE_H
......
......@@ -414,7 +414,7 @@ config WATCH_QUEUE
with watches for key/keyring change notifications and device
notifications.
See Documentation/watch_queue.rst
See Documentation/core-api/watch_queue.rst
config CROSS_MEMORY_ATTACH
bool "Enable process_vm_readv/writev syscalls"
......
......@@ -4,7 +4,7 @@
* Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
* Written by David Howells (dhowells@redhat.com)
*
* See Documentation/watch_queue.rst
* See Documentation/core-api/watch_queue.rst
*/
#define pr_fmt(fmt) "watchq: " fmt
......
......@@ -498,7 +498,7 @@ config STACK_VALIDATION
runtime stack traces are more reliable.
For more information, see
tools/objtool/Documentation/stack-validation.txt.
tools/objtool/Documentation/objtool.txt.
config NOINSTR_VALIDATION
bool
......
#!/usr/bin/perl
#!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
use strict;
......
......@@ -427,6 +427,13 @@ sub print_lineno {
print ".. LINENO " . $lineno . "\n";
}
}
sub emit_warning {
my $location = shift;
my $msg = shift;
print STDERR "$location: warning: $msg";
++$warnings;
}
##
# dumps section contents to arrays/hashes intended for that purpose.
#
......@@ -451,8 +458,7 @@ sub dump_section {
if (defined($sections{$name}) && ($sections{$name} ne "")) {
# Only warn on user specified duplicate section names.
if ($name ne $section_default) {
print STDERR "${file}:$.: warning: duplicate section name '$name'\n";
++$warnings;
emit_warning("${file}:$.", "duplicate section name '$name'\n");
}
$sections{$name} .= $contents;
} else {
......@@ -1094,7 +1100,7 @@ sub dump_struct($$) {
if ($members) {
if ($identifier ne $declaration_name) {
print STDERR "${file}:$.: warning: expecting prototype for $decl_type $identifier. Prototype was for $decl_type $declaration_name instead\n";
emit_warning("${file}:$.", "expecting prototype for $decl_type $identifier. Prototype was for $decl_type $declaration_name instead\n");
return;
}
......@@ -1298,9 +1304,9 @@ sub dump_enum($$) {
if ($members) {
if ($identifier ne $declaration_name) {
if ($identifier eq "") {
print STDERR "${file}:$.: warning: wrong kernel-doc identifier on line:\n";
emit_warning("${file}:$.", "wrong kernel-doc identifier on line:\n");
} else {
print STDERR "${file}:$.: warning: expecting prototype for enum $identifier. Prototype was for enum $declaration_name instead\n";
emit_warning("${file}:$.", "expecting prototype for enum $identifier. Prototype was for enum $declaration_name instead\n");
}
return;
}
......@@ -1316,7 +1322,7 @@ sub dump_enum($$) {
if (!$parameterdescs{$arg}) {
$parameterdescs{$arg} = $undescribed;
if (show_warnings("enum", $declaration_name)) {
print STDERR "${file}:$.: warning: Enum value '$arg' not described in enum '$declaration_name'\n";
emit_warning("${file}:$.", "Enum value '$arg' not described in enum '$declaration_name'\n");
}
}
$_members{$arg} = 1;
......@@ -1325,7 +1331,7 @@ sub dump_enum($$) {
while (my ($k, $v) = each %parameterdescs) {
if (!exists($_members{$k})) {
if (show_warnings("enum", $declaration_name)) {
print STDERR "${file}:$.: warning: Excess enum value '$k' description in '$declaration_name'\n";
emit_warning("${file}:$.", "Excess enum value '$k' description in '$declaration_name'\n");
}
}
}
......@@ -1367,7 +1373,7 @@ sub dump_typedef($$) {
$return_type =~ s/^\s+//;
if ($identifier ne $declaration_name) {
print STDERR "${file}:$.: warning: expecting prototype for typedef $identifier. Prototype was for typedef $declaration_name instead\n";
emit_warning("${file}:$.", "expecting prototype for typedef $identifier. Prototype was for typedef $declaration_name instead\n");
return;
}
......@@ -1398,7 +1404,7 @@ sub dump_typedef($$) {
$declaration_name = $1;
if ($identifier ne $declaration_name) {
print STDERR "${file}:$.: warning: expecting prototype for typedef $identifier. Prototype was for typedef $declaration_name instead\n";
emit_warning("${file}:$.", "expecting prototype for typedef $identifier. Prototype was for typedef $declaration_name instead\n");
return;
}
......@@ -1554,9 +1560,7 @@ sub push_parameter($$$$$) {
$parameterdescs{$param} = $undescribed;
if (show_warnings($type, $declaration_name) && $param !~ /\./) {
print STDERR
"${file}:$.: warning: Function parameter or member '$param' not described in '$declaration_name'\n";
++$warnings;
emit_warning("${file}:$.", "Function parameter or member '$param' not described in '$declaration_name'\n");
}
}
......@@ -1604,11 +1608,10 @@ sub check_sections($$$$$) {
}
if ($err) {
if ($decl_type eq "function") {
print STDERR "${file}:$.: warning: " .
emit_warning("${file}:$.",
"Excess function parameter " .
"'$sects[$sx]' " .
"description in '$decl_name'\n";
++$warnings;
"description in '$decl_name'\n");
}
}
}
......@@ -1629,10 +1632,9 @@ sub check_return_section {
if (!defined($sections{$section_return}) ||
$sections{$section_return} eq "") {
print STDERR "${file}:$.: warning: " .
emit_warning("${file}:$.",
"No description found for return value of " .
"'$declaration_name'\n";
++$warnings;
"'$declaration_name'\n");
}
}
......@@ -1714,12 +1716,12 @@ sub dump_function($$) {
create_parameterlist($args, ',', $file, $declaration_name);
} else {
print STDERR "${file}:$.: warning: cannot understand function prototype: '$prototype'\n";
emit_warning("${file}:$.", "cannot understand function prototype: '$prototype'\n");
return;
}
if ($identifier ne $declaration_name) {
print STDERR "${file}:$.: warning: expecting prototype for $identifier(). Prototype was for $declaration_name() instead\n";
emit_warning("${file}:$.", "expecting prototype for $identifier(). Prototype was for $declaration_name() instead\n");
return;
}
......@@ -1801,8 +1803,8 @@ sub tracepoint_munge($) {
$tracepointargs = $1;
}
if (($tracepointname eq 0) || ($tracepointargs eq 0)) {
print STDERR "${file}:$.: warning: Unrecognized tracepoint format: \n".
"$prototype\n";
emit_warning("${file}:$.", "Unrecognized tracepoint format: \n".
"$prototype\n");
} else {
$prototype = "static inline void trace_$tracepointname($tracepointargs)";
$identifier = "trace_$identifier";
......@@ -2027,22 +2029,16 @@ sub process_name($$) {
}
if (!$is_kernel_comment) {
print STDERR "${file}:$.: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst\n";
print STDERR $_;
++$warnings;
emit_warning("${file}:$.", "This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst\n$_");
$state = STATE_NORMAL;
}
if (($declaration_purpose eq "") && $verbose) {
print STDERR "${file}:$.: warning: missing initial short description on line:\n";
print STDERR $_;
++$warnings;
emit_warning("${file}:$.", "missing initial short description on line:\n$_");
}
if ($identifier eq "" && $decl_type ne "enum") {
print STDERR "${file}:$.: warning: wrong kernel-doc identifier on line:\n";
print STDERR $_;
++$warnings;
emit_warning("${file}:$.", "wrong kernel-doc identifier on line:\n$_");
$state = STATE_NORMAL;
}
......@@ -2050,9 +2046,7 @@ sub process_name($$) {
print STDERR "${file}:$.: info: Scanning doc for $decl_type $identifier\n";
}
} else {
print STDERR "${file}:$.: warning: Cannot understand $_ on line $.",
" - I thought it was a doc line\n";
++$warnings;
emit_warning("${file}:$.", "Cannot understand $_ on line $. - I thought it was a doc line\n");
$state = STATE_NORMAL;
}
}
......@@ -2071,8 +2065,7 @@ sub process_body($$) {
$section =~ s/\.\.\.$//;
if ($verbose) {
print STDERR "${file}:$.: warning: Variable macro arguments should be documented without dots\n";
++$warnings;
emit_warning("${file}:$.", "Variable macro arguments should be documented without dots\n");
}
}
......@@ -2101,8 +2094,7 @@ sub process_body($$) {
if (($contents ne "") && ($contents ne "\n")) {
if (!$in_doc_sect && $verbose) {
print STDERR "${file}:$.: warning: contents before sections\n";
++$warnings;
emit_warning("${file}:$.", "contents before sections\n");
}
dump_section($file, $section, $contents);
$section = $section_default;
......@@ -2128,8 +2120,7 @@ sub process_body($$) {
}
# look for doc_com + <text> + doc_end:
if ($_ =~ m'\s*\*\s*[a-zA-Z_0-9:\.]+\*/') {
print STDERR "${file}:$.: warning: suspicious ending line: $_";
++$warnings;
emit_warning("${file}:$.", "suspicious ending line: $_");
}
$prototype = "";
......@@ -2173,8 +2164,7 @@ sub process_body($$) {
}
} else {
# i dont know - bad line? ignore.
print STDERR "${file}:$.: warning: bad line: $_";
++$warnings;
emit_warning("${file}:$.", "bad line: $_");
}
}
......@@ -2268,9 +2258,7 @@ sub process_inline($$) {
}
} elsif ($inline_doc_state == STATE_INLINE_NAME) {
$inline_doc_state = STATE_INLINE_ERROR;
print STDERR "${file}:$.: warning: ";
print STDERR "Incorrect use of kernel-doc format: $_";
++$warnings;
emit_warning("${file}:$.", "Incorrect use of kernel-doc format: $_");
}
}
}
......@@ -2319,11 +2307,11 @@ sub process_file($) {
if ($initial_section_counter == $section_counter && $
output_mode ne "none") {
if ($output_selection == OUTPUT_INCLUDE) {
print STDERR "${file}:1: warning: '$_' not found\n"
emit_warning("${file}:1", "'$_' not found\n")
for keys %function_table;
}
else {
print STDERR "${file}:1: warning: no structured comments found\n";
emit_warning("${file}:1", "no structured comments found\n");
}
}
close IN_FILE;
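
Note on the kernel-doc hunks above: they route every warning through the new `emit_warning` helper, which prints `<location>: warning: <msg>` to STDERR and increments `$warnings`. Several of the replaced `print STDERR` calls (for example in `dump_struct` and `dump_enum`) did not increment the counter before, so one apparent effect is that those warnings are now counted too. The following is a minimal Python sketch of the same pattern, not the script's actual code; the `WarningSink` class and its method names are hypothetical and chosen only for illustration.

```python
import io
import sys


class WarningSink:
    """Hypothetical stand-in for kernel-doc's emit_warning() plus $warnings."""

    def __init__(self, stream=None):
        # Default to stderr, as the Perl helper does; a custom stream eases testing.
        self.stream = stream if stream is not None else sys.stderr
        self.count = 0

    def emit(self, location, msg):
        # Like the Perl helper, msg is written as-is: the caller supplies
        # the trailing newline.
        self.stream.write(f"{location}: warning: {msg}")
        self.count += 1


# Usage: every call site goes through one function, so none can forget
# to bump the counter.
buf = io.StringIO()
sink = WarningSink(buf)
sink.emit("foo.c:12", "duplicate section name 'Return'\n")
sink.emit("foo.c:1", "no structured comments found\n")
```

Keeping the formatting and the counting in one place is what lets a later "treat warnings as errors" check trust the counter.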
......
......@@ -25,6 +25,7 @@ my $need_sphinx = 0;
my $need_pip = 0;
my $need_virtualenv = 0;
my $rec_sphinx_upgrade = 0;
my $verbose_warn_install = 1;
my $install = "";
my $virtenv_dir = "";
my $python_cmd = "";
......@@ -103,11 +104,13 @@ sub check_missing(%)
next;
}
if ($verbose_warn_install) {
if ($is_optional) {
print "Warning: better to also install \"$prog\".\n";
} else {
print "ERROR: please install \"$prog\", otherwise, build won't work.\n";
}
}
if (defined($map{$prog})) {
$install .= " " . $map{$prog};
} else {
......@@ -386,7 +389,8 @@ sub give_debian_hints()
check_missing(\%map);
return if (!$need && !$optional);
printf("You should run:\n\n\tsudo apt-get install $install\n");
printf("You should run:\n") if ($verbose_warn_install);
printf("\n\tsudo apt-get install $install\n");
}
sub give_redhat_hints()
......@@ -458,10 +462,12 @@ sub give_redhat_hints()
if (!$old) {
# dnf, for Fedora 18+
printf("You should run:\n\n\tsudo dnf install -y $install\n");
printf("You should run:\n") if ($verbose_warn_install);
printf("\n\tsudo dnf install -y $install\n");
} else {
# yum, for RHEL (and clones) or Fedora version < 18
printf("You should run:\n\n\tsudo yum install -y $install\n");
printf("You should run:\n") if ($verbose_warn_install);
printf("\n\tsudo yum install -y $install\n");
}
}
......@@ -509,7 +515,8 @@ sub give_opensuse_hints()
check_missing(\%map);
return if (!$need && !$optional);
printf("You should run:\n\n\tsudo zypper install --no-recommends $install\n");
printf("You should run:\n") if ($verbose_warn_install);
printf("\n\tsudo zypper install --no-recommends $install\n");
}
sub give_mageia_hints()
......@@ -553,7 +560,8 @@ sub give_mageia_hints()
check_missing(\%map);
return if (!$need && !$optional);
printf("You should run:\n\n\tsudo $packager_cmd $install\n");
printf("You should run:\n") if ($verbose_warn_install);
printf("\n\tsudo $packager_cmd $install\n");
}
sub give_arch_linux_hints()
......@@ -583,7 +591,8 @@ sub give_arch_linux_hints()
check_missing(\%map);
return if (!$need && !$optional);
printf("You should run:\n\n\tsudo pacman -S $install\n");
printf("You should run:\n") if ($verbose_warn_install);
printf("\n\tsudo pacman -S $install\n");
}
sub give_gentoo_hints()
......@@ -610,7 +619,8 @@ sub give_gentoo_hints()
return if (!$need && !$optional);
printf("You should run:\n\n");
printf("You should run:\n") if ($verbose_warn_install);
printf("\n");
my $imagemagick = "media-gfx/imagemagick svg png";
my $cairo = "media-gfx/graphviz cairo pdf";
......@@ -700,7 +710,7 @@ sub check_distros()
sub deactivate_help()
{
printf "\nIf you want to exit the virtualenv, you can use:\n";
printf "\n If you want to exit the virtualenv, you can use:\n";
printf "\tdeactivate\n";
}
......@@ -720,6 +730,12 @@ sub get_virtenv()
next if (! -f $sphinx_cmd);
my $ver = get_sphinx_version($sphinx_cmd);
if (!$ver) {
$f =~ s#/bin/activate##;
print("Warning: virtual environment $f is not working.\nPython version upgrade? Remove it with:\n\n\trm -rf $f\n\n");
}
if ($need_sphinx && ($ver ge $min_version)) {
return ($f, $ver);
} elsif ($ver gt $cur_version) {
......@@ -741,7 +757,7 @@ sub recommend_sphinx_upgrade()
# Get the highest version from sphinx_*/bin/sphinx-build and the
# corresponding command to activate the venv/virtenv
$activate_cmd = get_virtenv();
($activate_cmd, $venv_ver) = get_virtenv();
# Store the highest version from Sphinx existing virtualenvs
if (($activate_cmd ne "") && ($venv_ver gt $cur_version)) {
......@@ -759,10 +775,14 @@ sub recommend_sphinx_upgrade()
# Either there are already a virtual env or a new one should be created
$need_pip = 1;
return if (!$latest_avail_ver);
# Return if the reason is due to an upgrade or not
if ($latest_avail_ver lt $rec_version) {
$rec_sphinx_upgrade = 1;
}
return $latest_avail_ver;
}
#
......@@ -775,12 +795,13 @@ sub recommend_sphinx_version($)
{
my $virtualenv_cmd = shift;
if ($latest_avail_ver lt $min_pdf_version) {
# Version is OK. Nothing to do.
if ($cur_version && ($cur_version ge $rec_version)) {
if ($cur_version lt $min_pdf_version) {
print "note: If you want pdf, you need at least Sphinx $min_pdf_version.\n";
}
# Version is OK. Nothing to do.
return if ($cur_version && ($cur_version ge $rec_version));
return;
};
if (!$need_sphinx) {
# sphinx-build is present and its version is >= $min_version
......@@ -820,13 +841,17 @@ sub recommend_sphinx_version($)
}
# Suggest newer versions if current ones are too old
if ($latest_avail_ver && $cur_version ge $min_version) {
if ($latest_avail_ver && $latest_avail_ver ge $min_version) {
# If there's a good enough version, ask the user to enable it
if ($latest_avail_ver ge $rec_version) {
printf "\nNeed to activate Sphinx (version $latest_avail_ver) on virtualenv with:\n";
printf "\t. $activate_cmd\n";
deactivate_help();
if ($latest_avail_ver lt $min_pdf_version) {
print "note: If you want pdf, you need at least Sphinx $min_pdf_version.\n";
}
return;
}
......@@ -848,7 +873,7 @@ sub recommend_sphinx_version($)
print "To upgrade Sphinx, use:\n\n";
}
} else {
print "Sphinx needs to be installed either as a package or via pip/pypi with:\n";
print "\nSphinx needs to be installed either:\n1) via pip/pypi with:\n\n";
}
$python_cmd = find_python_no_venv();
......@@ -858,6 +883,29 @@ sub recommend_sphinx_version($)
printf "\t. $virtenv_dir/bin/activate\n";
printf "\tpip install -r $requirement_file\n";
deactivate_help();
printf "\n2) As a package with:\n";
my $old_need = $need;
my $old_optional = $optional;
%missing = ();
$pdf = 0;
$optional = 0;
$install = "";
$verbose_warn_install = 0;
add_package("python-sphinx", 0);
check_python_module("sphinx_rtd_theme", 1);
check_distros();
$need = $old_need;
$optional = $old_optional;
printf "\n Please note that Sphinx >= 3.0 will currently produce false-positive\n";
printf " warning when the same name is used for more than one type (functions,\n";
printf " structs, enums,...). This is known Sphinx bug. For more details, see:\n";
printf "\thttps://github.com/sphinx-doc/sphinx/pull/8313\n";
}
sub check_needs()
......@@ -897,7 +945,7 @@ sub check_needs()
}
}
recommend_sphinx_upgrade();
my $venv_ver = recommend_sphinx_upgrade();
my $virtualenv_cmd;
......
......@@ -67,7 +67,7 @@ struct unwind_hint {
* It should only be used in special cases where you're 100% sure it won't
* affect the reliability of frame pointers and kernel stack traces.
*
* For more information, see tools/objtool/Documentation/stack-validation.txt.
* For more information, see tools/objtool/Documentation/objtool.txt.
*/
#define STACK_FRAME_NON_STANDARD(func) \
static void __used __section(".discard.func_stack_frame_non_standard") \
......
......@@ -3297,7 +3297,7 @@ static struct instruction *next_insn_to_validate(struct objtool_file *file,
* Follow the branch starting at the given instruction, and recursively follow
* any other branches (jumps). Meanwhile, track the frame pointer state at
* each instruction and validate all the rules described in
* tools/objtool/Documentation/stack-validation.txt.
* tools/objtool/Documentation/objtool.txt.
*/
static int validate_branch(struct objtool_file *file, struct symbol *func,
struct instruction *insn, struct insn_state state)
......