Commit 88a61892 authored by Linus Torvalds

Merge tag 'docs-5.19' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It was a moderately busy cycle for documentation; highlights include:

   - After a long period of inactivity, the Japanese translations are
     seeing some much-needed maintenance and updating.

   - Reworked IOMMU documentation

   - Some new documentation for static-analysis tools

   - A new overall structure for the memory-management documentation.
     This is an LSFMM outcome that, it is hoped, will help encourage
     developers to fill in the many gaps. Optimism is eternal...but
     hopefully it will work.

   - More Chinese translations.

  Plus the usual typo fixes, updates, etc"

* tag 'docs-5.19' of git://git.lwn.net/linux: (70 commits)
  docs: pdfdocs: Add space for chapter counts >= 100 in TOC
  docs/zh_CN: Add dev-tools/gdb-kernel-debugging.rst Chinese translation
  input: Docs: correct ntrig.rst typo
  input: Docs: correct atarikbd.rst typos
  MAINTAINERS: Become the docs/zh_CN maintainer
  docs/zh_CN: fix devicetree usage-model translation
  mm,doc: Add new documentation structure
  Documentation: drop more IDE boot options and ide-cd.rst
  Documentation/process: use scripts/get_maintainer.pl on patches
  MAINTAINERS: Add entry for DOCUMENTATION/JAPANESE
  docs/trans/ja_JP/howto: Don't mention specific kernel versions
  docs/ja_JP/SubmittingPatches: Request summaries for commit references
  docs/ja_JP/SubmittingPatches: Add Suggested-by as a standard signature
  docs/ja_JP/SubmittingPatches: Randy has moved
  docs/ja_JP/SubmittingPatches: Suggest the use of scripts/get_maintainer.pl
  docs/ja_JP/SubmittingPatches: Update GregKH links
  Documentation/sysctl: document max_rcu_stall_to_panic
  Documentation: add missing angle bracket in cgroup-v2 doc
  Documentation: dev-tools: use literal block instead of code-block
  docs/zh_CN: add vm numa translation
  ...
parents 537e62c8 b86f46d5
......@@ -1881,7 +1881,7 @@ IO Latency Interface Files
io.latency
This takes a similar format as the other controllers.
"MAJOR:MINOR target=<target time in microseconds"
"MAJOR:MINOR target=<target time in microseconds>"
io.stat
If the controller is enabled you will see extra stats in io.stat in
......
......@@ -99,6 +99,7 @@ parameter is applicable::
ALSA ALSA sound support is enabled.
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
APPARMOR AppArmor support is enabled.
ARM ARM architecture is enabled.
ARM64 ARM64 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
......@@ -108,15 +109,15 @@ parameter is applicable::
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
EIDE EIDE/ATAPI support is enabled.
EVM Extended Verification Module
FB The frame buffer device is enabled.
FTRACE Function tracing enabled.
GCOV GCOV profiling is enabled.
HIBERNATION HIBERNATION is enabled.
HW Appropriate hardware is enabled.
HYPER_V HYPERV support is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
IOSCHED More than one I/O scheduler is enabled.
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
......@@ -140,7 +141,6 @@ parameter is applicable::
NUMA NUMA support is enabled.
NFS Appropriate NFS support is enabled.
OF Devicetree is enabled.
OSS OSS sound support is enabled.
PV_OPS A paravirtualized kernel is enabled.
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
PARISC The PA-RISC architecture is enabled.
......@@ -160,7 +160,6 @@ parameter is applicable::
the Documentation/scsi/ sub-directory.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
APPARMOR AppArmor support is enabled.
SERIAL Serial support is enabled.
SH SuperH architecture is enabled.
SMP The kernel is an SMP kernel.
......@@ -168,7 +167,6 @@ parameter is applicable::
SWSUSP Software suspend (hibernation) is enabled.
SUSPEND System suspend states are enabled.
TPM TPM drivers are enabled.
TS Appropriate touchscreen support is enabled.
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
......@@ -177,7 +175,6 @@ parameter is applicable::
VGA The VGA console has been enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
XT IBM PC/XT MFM hard disk support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
More X86-64 boot options can be found in
......@@ -211,7 +208,7 @@ The number of kernel parameters is not limited, but the length of the
complete command line (parameters including spaces etc.) is limited to
a fixed number of characters. This limit depends on the architecture
and is between 256 and 4096 characters. It is defined in the file
./include/asm/setup.h as COMMAND_LINE_SIZE.
./include/uapi/asm-generic/setup.h as COMMAND_LINE_SIZE.
Finally, the [KMG] suffix is commonly described after a number of kernel
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
......
......@@ -783,6 +783,13 @@ is useful to define the root cause of RCU stalls using a vmcore.
1 panic() after printing RCU stall messages.
= ============================================================
max_rcu_stall_to_panic
======================
When ``panic_on_rcu_stall`` is set to 1, this value determines the
number of times that RCU can stall before panic() is called.
When ``panic_on_rcu_stall`` is set to 0, this value has no effect.
perf_cpu_time_max_percent
=========================
......
......@@ -8,7 +8,6 @@ cdrom
:maxdepth: 1
cdrom-standard
ide-cd
packet-writing
.. only:: subproject and html
......
......@@ -18,6 +18,7 @@ it.
kernel-api
workqueue
watch_queue
printk-basics
printk-formats
printk-index
......
......@@ -115,34 +115,32 @@ The diagnostic data field is optional, and results which have neither a
directive nor any diagnostic data do not need to include the "#" field
separator.
Example result lines include:
.. code-block:: none
Example result lines include::
ok 1 test_case_name
The test "test_case_name" passed.
.. code-block:: none
::
not ok 1 test_case_name
The test "test_case_name" failed.
.. code-block:: none
::
ok 1 test # SKIP necessary dependency unavailable
The test "test" was SKIPPED with the diagnostic message "necessary dependency
unavailable".
.. code-block:: none
::
not ok 1 test # TIMEOUT 30 seconds
The test "test" timed out, with diagnostic data "30 seconds".
.. code-block:: none
::
ok 5 check return code # rcode=0
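The example result lines above can be read mechanically. The following is a minimal sketch in Python, not part of the KTAP specification or of any kernel tool: the names ``KtapResult`` and ``parse_result_line`` are hypothetical, the regular expression covers only the simple forms shown here, and treating an all-uppercase first word after ``#`` as the directive is an assumption based on the examples.

```python
import re
from typing import NamedTuple, Optional


class KtapResult(NamedTuple):
    passed: bool
    number: int
    description: str
    directive: Optional[str]
    diagnostic: Optional[str]


# "<ok|not ok> <number> [<description>][ # [<directive>] [<diagnostic data>]]"
_RESULT_RE = re.compile(r"^(ok|not ok) (\d+)(?: ([^#]*?))?\s*(?:# ?(.*))?$")


def parse_result_line(line: str) -> KtapResult:
    """Split one KTAP result line into its fields (hypothetical helper)."""
    match = _RESULT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a KTAP result line: {line!r}")
    status, number, description, tail = match.groups()

    directive = diagnostic = None
    if tail:
        first, _, rest = tail.partition(" ")
        # Assumption: directives such as SKIP or TIMEOUT are all uppercase.
        if first.isupper():
            directive = first
            diagnostic = rest or None
        else:
            diagnostic = tail

    return KtapResult(
        passed=(status == "ok"),
        number=int(number),
        description=(description or "").strip(),
        directive=directive,
        diagnostic=diagnostic,
    )
```

For instance, ``parse_result_line("ok 1 test # SKIP necessary dependency unavailable")`` yields a passing result whose directive is ``SKIP``, while ``ok 5 check return code # rcode=0`` has no directive and only the diagnostic data ``rcode=0``.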
......@@ -202,7 +200,7 @@ allowed to be either indented or not indented.
An example of a test with two nested subtests:
.. code-block:: none
::
KTAP version 1
1..1
......@@ -215,7 +213,7 @@ An example of a test with two nested subtests:
An example format with multiple levels of nested testing:
.. code-block:: none
::
KTAP version 1
1..2
......@@ -250,7 +248,7 @@ nested version line, uses a line of the form
Example KTAP output
--------------------
.. code-block:: none
::
KTAP version 1
1..1
......
......@@ -125,7 +125,7 @@ All expectations/assertions are formatted as:
``void __noreturn kunit_try_catch_throw(struct kunit_try_catch *try_catch)``.
- ``kunit_try_catch_throw`` calls function:
``void complete_and_exit(struct completion *, long) __noreturn;``
``void kthread_complete_and_exit(struct completion *, long) __noreturn;``
and terminates the special thread context.
- ``<op>`` denotes a check with options: ``TRUE`` (supplied property
......
......@@ -115,3 +115,66 @@ that none of these errors are occurring during the test.
Some of these tools integrate with KUnit or kselftest and will
automatically fail tests if an issue is detected.
Static Analysis Tools
=====================
In addition to testing a running kernel, one can also analyze kernel source code
directly (**at compile time**) using **static analysis** tools. The tools
commonly used in the kernel allow one to inspect the whole source tree or just
specific files within it. They make it easier to detect and fix problems during
the development process.
Sparse can help test the kernel by performing type-checking, lock checking, and
value range checking, in addition to reporting various errors and warnings while
examining the code. See the Documentation/dev-tools/sparse.rst documentation
page for details on how to use it.
Smatch extends Sparse and provides additional checks for programming logic
mistakes such as missing breaks in switch statements, unused return values on
error checking, forgetting to set an error code in the return of an error path,
etc. Smatch also has tests against more serious issues such as integer
overflows, null pointer dereferences, and memory leaks. See the project page at
http://smatch.sourceforge.net/.
Coccinelle is another static analyzer at our disposal. Coccinelle is often used
to aid refactoring and collateral evolution of source code, but it can also help
to avoid certain bugs that occur in common code patterns. The types of tests
available include API tests, tests for correct usage of kernel iterators, checks
for the soundness of free operations, analysis of locking behavior, and further
tests known to help keep consistent kernel usage. See the
Documentation/dev-tools/coccinelle.rst documentation page for details.
Beware, though, that static analysis tools suffer from **false positives**.
Errors and warnings need to be evaluated carefully before attempting to fix them.
When to use Sparse and Smatch
-----------------------------
Sparse does type checking, such as verifying that annotated variables do not
cause endianness bugs, detecting places that use ``__user`` pointers improperly,
and analyzing the compatibility of symbol initializers.
Smatch does flow analysis and, if allowed to build the function database, it
also does cross function analysis. Smatch tries to answer questions like where
is this buffer allocated? How big is it? Can this index be controlled by the
user? Is this variable larger than that variable?
It's generally easier to write checks in Smatch than it is to write checks in
Sparse. Nevertheless, there are some overlaps between Sparse and Smatch checks.
Strong points of Smatch and Coccinelle
--------------------------------------
Coccinelle is probably the easiest for writing checks. It works before the
pre-processor so it's easier to check for bugs in macros using Coccinelle.
Coccinelle also creates patches for you, which no other tool does.
For example, with Coccinelle you can do a mass conversion from
``kmalloc(x * size, GFP_KERNEL)`` to ``kmalloc_array(x, size, GFP_KERNEL)``, and
that's really useful. If you just created a Smatch warning and tried to push the
work of converting on to the maintainers, they would be annoyed. You'd have to
argue about whether each warning can really overflow or not.
Coccinelle does no analysis of variable values, which is the strong point of
Smatch. On the other hand, Coccinelle allows you to do simple things in a simple
way.
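The reason the ``kmalloc(x * size, GFP_KERNEL)`` to ``kmalloc_array(x, size, GFP_KERNEL)`` conversion is worthwhile comes down to arithmetic: an unchecked multiplication can wrap around and request far fewer bytes than intended, whereas ``kmalloc_array()`` checks the multiplication and fails instead. The sketch below only models that arithmetic in Python. The 64-bit ``size_t`` width is an assumption (it depends on the architecture), and ``wrapped_size`` and ``checked_size`` are hypothetical helper names, not kernel functions.

```python
SIZE_MAX = 2**64 - 1  # assumption: size_t is 64 bits wide


def wrapped_size(n: int, size: int) -> int:
    """Byte count that an unchecked size_t multiplication n * size produces."""
    return (n * size) & SIZE_MAX


def checked_size(n: int, size: int):
    """Return n * size, or None when the product does not fit in size_t.

    This models the kind of overflow check an array allocator performs
    before allocating; it is not the kernel implementation.
    """
    total = n * size
    return None if total > SIZE_MAX else total


# A user-controlled count can make the unchecked product wrap to a tiny value.
count = 2**61 + 1
element_size = 8
print(wrapped_size(count, element_size))   # small request despite the huge count
print(checked_size(count, element_size))   # overflow detected, allocation refused
```

A tool that can rewrite every such call site mechanically sidesteps the per-warning argument about whether a given multiplication can actually overflow.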
......@@ -79,8 +79,9 @@ simplistic idea of what C comment blocks look like. This problem had been
present since that comment was added in 2016 — a full four years. Fixing
it was a matter of adding the missing asterisks. A quick look at the
history for that file showed what the normal format for subject lines is,
and ``scripts/get_maintainer.pl`` told me who should receive it. The
resulting patch looked like this::
and ``scripts/get_maintainer.pl`` told me who should receive it (pass paths to
your patches as arguments to scripts/get_maintainer.pl). The resulting patch
looked like this::
[PATCH] PM / devfreq: Fix two malformed kerneldoc comments
......
===========================
Writing kernel-doc comments
===========================
......@@ -436,6 +437,7 @@ The title following ``DOC:`` acts as a heading within the source file, but also
as an identifier for extracting the documentation comment. Thus, the title must
be unique within the file.
=============================
Including kernel-doc comments
=============================
......
.. _sphinxdoc:
Introduction
============
=====================================
Using Sphinx for kernel documentation
=====================================
The Linux kernel uses `Sphinx`_ to generate pretty documentation from
`reStructuredText`_ files under ``Documentation``. To build the documentation in
......
......@@ -249,7 +249,7 @@ CLOCK
devm_clk_bulk_get()
devm_clk_bulk_get_all()
devm_clk_bulk_get_optional()
devm_get_clk_from_childl()
devm_get_clk_from_child()
devm_clk_hw_register()
devm_of_clk_add_hw_provider()
devm_clk_hw_register_clkdev()
......
......@@ -4,7 +4,7 @@
Intel(R) Dynamic Platform and Thermal Framework Sysfs Interface
===============================================================
:Copyright: |copy| 2022 Intel Corporation
:Copyright: © 2022 Intel Corporation
:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
......
......@@ -132,16 +132,16 @@ configuration of fault-injection capabilities.
Format: { 'Y' | 'N' }
default is 'N', setting it to 'Y' won't inject failures into
highmem/user allocations.
default is 'Y', setting it to 'N' will also inject failures into
highmem/user allocations (__GFP_HIGHMEM allocations).
- /sys/kernel/debug/failslab/ignore-gfp-wait:
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:
Format: { 'Y' | 'N' }
default is 'N', setting it to 'Y' will inject failures
only into non-sleep allocations (GFP_ATOMIC allocations).
default is 'Y', setting it to 'N' will also inject failures
into allocations that can sleep (__GFP_DIRECT_RECLAIM allocations).
- /sys/kernel/debug/fail_page_alloc/min-order:
......@@ -280,7 +280,7 @@ Application Examples
printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times
echo 0 > /sys/kernel/debug/$FAILTYPE/space
echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
faulty_system()
{
......@@ -334,8 +334,8 @@ Application Examples
printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times
echo 0 > /sys/kernel/debug/$FAILTYPE/space
echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth
trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
......
/*
* Many thanks to Lode Leroy <Lode.Leroy@www.ibase.be>, who tested so many
* ALPHA patches to this driver on an EASYSTOR LS-120 ATAPI floppy drive.
*
* Ver 0.1 Oct 17 96 Initial test version, mostly based on ide-tape.c.
* Ver 0.2 Oct 31 96 Minor changes.
* Ver 0.3 Dec 2 96 Fixed error recovery bug.
* Ver 0.4 Jan 26 97 Add support for the HDIO_GETGEO ioctl.
* Ver 0.5 Feb 21 97 Add partitions support.
* Use the minimum of the LBA and CHS capacities.
* Avoid hwgroup->rq == NULL on the last irq.
* Fix potential null dereferencing with DEBUG_LOG.
* Ver 0.8 Dec 7 97 Increase irq timeout from 10 to 50 seconds.
* Add media write-protect detection.
* Issue START command only if TEST UNIT READY fails.
* Add work-around for IOMEGA ZIP revision 21.D.
* Remove idefloppy_get_capabilities().
* Ver 0.9 Jul 4 99 Fix a bug which might have caused the number of
* bytes requested on each interrupt to be zero.
* Thanks to <shanos@es.co.nz> for pointing this out.
* Ver 0.9.sv Jan 6 01 Sam Varshavchik <mrsam@courier-mta.com>
* Implement low level formatting. Reimplemented
* IDEFLOPPY_CAPABILITIES_PAGE, since we need the srfp
* bit. My LS-120 drive barfs on
* IDEFLOPPY_CAPABILITIES_PAGE, but maybe it's just me.
* Compromise by not reporting a failure to get this
* mode page. Implemented four IOCTLs in order to
* implement formatting. IOCTls begin with 0x4600,
* 0x46 is 'F' as in Format.
* Jan 9 01 Userland option to select format verify.
* Added PC_SUPPRESS_ERROR flag - some idefloppy drives
* do not implement IDEFLOPPY_CAPABILITIES_PAGE, and
* return a sense error. Suppress error reporting in
* this particular case in order to avoid spurious
* errors in syslog. The culprit is
* idefloppy_get_capability_page(), so move it to
* idefloppy_begin_format() so that it's not used
* unless absolutely necessary.
* If drive does not support format progress indication
* monitor the dsc bit in the status register.
* Also, O_NDELAY on open will allow the device to be
* opened without a disk available. This can be used to
* open an unformatted disk, or get the device capacity.
* Ver 0.91 Dec 11 99 Added IOMEGA Clik! drive support by
* <paul@paulbristow.net>
* Ver 0.92 Oct 22 00 Paul Bristow became official maintainer for this
* driver. Included Powerbook internal zip kludge.
* Ver 0.93 Oct 24 00 Fixed bugs for Clik! drive
* no disk on insert and disk change now works
* Ver 0.94 Oct 27 00 Tidied up to remove strstr(Clik) everywhere
* Ver 0.95 Nov 7 00 Brought across to kernel 2.4
* Ver 0.96 Jan 7 01 Actually in line with release version of 2.4.0
* including set_bit patch from Rusty Russell
* Ver 0.97 Jul 22 01 Merge 0.91-0.96 onto 0.9.sv for ac series
* Ver 0.97.sv Aug 3 01 Backported from 2.4.7-ac3
* Ver 0.98 Oct 26 01 Split idefloppy_transfer_pc into two pieces to
* fix a lost interrupt problem. It appears the busy
* bit was being deasserted by my IOMEGA ATAPI ZIP 100
* drive before the drive was actually ready.
* Ver 0.98a Oct 29 01 Expose delay value so we can play.
* Ver 0.99 Feb 24 02 Remove duplicate code, modify clik! detection code
* to support new PocketZip drives
*/
Changelog for ide cd
--------------------
.. include:: ChangeLog.ide-cd.1994-2004
:literal:
Changelog for ide floppy
------------------------
.. include:: ChangeLog.ide-floppy.1996-2002
:literal:
Changelog for ide tape
----------------------
.. include:: ChangeLog.ide-tape.1995-2002
:literal:
===============================
IDE ATAPI streaming tape driver
===============================
This driver is a part of the Linux ide driver.
The driver, in co-operation with ide.c, basically traverses the
request-list for the block device interface. The character device
interface, on the other hand, creates new requests, adds them
to the request-list of the block device, and waits for their completion.
The block device major and minor numbers are determined from the
tape's relative position in the ide interfaces, as explained in ide.c.
The character device interface consists of the following devices::
ht0 major 37, minor 0 first IDE tape, rewind on close.
ht1 major 37, minor 1 second IDE tape, rewind on close.
...
nht0 major 37, minor 128 first IDE tape, no rewind on close.
nht1 major 37, minor 129 second IDE tape, no rewind on close.
...
The general magnetic tape commands compatible interface, as defined by
include/linux/mtio.h, is accessible through the character device.
General ide driver configuration options, such as the interrupt-unmask
flag, can be configured by issuing an ioctl to the block device interface,
as any other ide device.
Our own ide-tape ioctl's can be issued to either the block device or
the character device interface.
Maximal throughput with minimal bus load will usually be achieved in the
following scenario:
1. ide-tape is operating in the pipelined operation mode.
2. No buffering is performed by the user backup program.
Testing was done with a 2 GB CONNER CTMA 4000 IDE ATAPI Streaming Tape Drive.
Here are some words from the first releases of hd.c, which are quoted
in ide.c and apply here as well:
* Special care is recommended. Have Fun!
Possible improvements
=====================
1. Support for the ATAPI overlap protocol.
In order to maximize bus throughput, we currently use the DSC
overlap method which enables ide.c to service requests from the
other device while the tape is busy executing a command. The
DSC overlap method involves polling the tape's status register
for the DSC bit, and servicing the other device while the tape
isn't ready.
In the current QIC development standard (December 1995),
it is recommended that new tape drives will *in addition*
implement the ATAPI overlap protocol, which is used for the
same purpose - efficient use of the IDE bus, but is interrupt
driven and thus has much less CPU overhead.
ATAPI overlap is likely to be supported in most new ATAPI
devices, including new ATAPI cdroms, and thus provides us
a method by which we can achieve higher throughput when
sharing a (fast) ATA-2 disk with any (slow) new ATAPI device.
.. SPDX-License-Identifier: GPL-2.0
==================================
Integrated Drive Electronics (IDE)
==================================
.. toctree::
:maxdepth: 1
ide
ide-tape
warm-plug-howto
changelogs
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
===================
IDE warm-plug HOWTO
===================
To warm-plug devices on a port 'idex'::
# echo -n "1" > /sys/class/ide_port/idex/delete_devices
unplug old device(s) and plug new device(s)::
# echo -n "1" > /sys/class/ide_port/idex/scan
done
NOTE: please make sure that partitions are unmounted and that there are
no other active references to devices before doing "delete_devices" step,
also do not attempt "scan" step on devices currently in use -- otherwise
results may be unpredictable and lead to data loss if you're unlucky
......@@ -103,7 +103,6 @@ needed).
block/index
cdrom/index
cpu-freq/index
ide/index
fb/index
fpga/index
hid/index
......@@ -169,7 +168,6 @@ to ReStructured Text format, or are simply too old.
tools/index
staging/index
watch_queue
Translations
......
......@@ -288,7 +288,7 @@ between 0 and large positive numbers. Excess motion below 0 is ignored. The
command sets the maximum positive value that can be attained in the scaled
coordinate system. Motion beyond that value is also ignored.
SET MOUSE KEYCODE MOSE
SET MOUSE KEYCODE MODE
----------------------
::
......@@ -333,7 +333,7 @@ occur before the internally maintained coordinate is changed by one
(independently scaled for each axis). Remember that the mouse position
information is available only by interrogating the ikbd in the ABSOLUTE MOUSE
POSITIONING mode unless the ikbd has been commanded to report on button press
or release (see SET MOSE BUTTON ACTION).
or release (see SET MOUSE BUTTON ACTION).
INTERROGATE MOUSE POSITION
--------------------------
......
......@@ -32,7 +32,7 @@ The following parameters are used to configure filters to reduce noise:
|activation_height, |size threshold to activate immediately |
|activation_width | |
+-----------------------+-----------------------------------------------------+
|min_height, |size threshold bellow which fingers are ignored |
|min_height, |size threshold below which fingers are ignored |
|min_width |both to decide activation and during activity |
+-----------------------+-----------------------------------------------------+
|deactivate_slack |the number of "no contact" frames to ignore before |
......
......@@ -112,8 +112,7 @@ time, although different tasklets can run simultaneously.
.. warning::
The name 'tasklet' is misleading: they have nothing to do with
'tasks', and probably more to do with some bad vodka Alexey
Kuznetsov had at the time.
'tasks'.
You can tell you are in a softirq (or tasklet) using the
:c:func:`in_softirq()` macro (``include/linux/preempt.h``).
......@@ -290,8 +289,8 @@ userspace.
Unlike :c:func:`put_user()` and :c:func:`get_user()`, they
return the amount of uncopied data (ie. 0 still means success).
[Yes, this moronic interface makes me cringe. The flamewar comes up
every year or so. --RR.]
[Yes, this objectionable interface makes me cringe. The flamewar comes
up every year or so. --RR.]
The functions may sleep implicitly. This should never be called outside
user context (it makes no sense), with interrupts disabled, or a
......@@ -645,8 +644,9 @@ names in development kernels; this is not done just to keep everyone on
their toes: it reflects a fundamental change (eg. can no longer be
called with interrupts on, or does extra checks, or doesn't do checks
which were caught before). Usually this is accompanied by a fairly
complete note to the linux-kernel mailing list; search the archive.
Simply doing a global replace on the file usually makes things **worse**.
complete note to the appropriate kernel development mailing list; search
the archives. Simply doing a global replace on the file usually makes
things **worse**.
Initializing structure members
------------------------------
......@@ -723,14 +723,14 @@ Putting Your Stuff in the Kernel
In order to get your stuff into shape for official inclusion, or even to
make a neat patch, there's administrative work to be done:
- Figure out whose pond you've been pissing in. Look at the top of the
source files, inside the ``MAINTAINERS`` file, and last of all in the
``CREDITS`` file. You should coordinate with this person to make sure
you're not duplicating effort, or trying something that's already
been rejected.
- Figure out who are the owners of the code you've been modifying. Look
at the top of the source files, inside the ``MAINTAINERS`` file, and
last of all in the ``CREDITS`` file. You should coordinate with these
people to make sure you're not duplicating effort, or trying something
that's already been rejected.
Make sure you put your name and EMail address at the top of any files
you create or mangle significantly. This is the first place people
Make sure you put your name and email address at the top of any files
you create or modify significantly. This is the first place people
will look when they find a bug, or when **they** want to make a change.
- Usually you want a configuration option for your kernel hack. Edit
......@@ -748,11 +748,11 @@ make a neat patch, there's administrative work to be done:
can usually just add a "obj-$(CONFIG_xxx) += xxx.o" line. The syntax
is documented in ``Documentation/kbuild/makefiles.rst``.
- Put yourself in ``CREDITS`` if you've done something noteworthy,
usually beyond a single file (your name should be at the top of the
source files anyway). ``MAINTAINERS`` means you want to be consulted
when changes are made to a subsystem, and hear about bugs; it implies
a more-than-passing commitment to some part of the code.
- Put yourself in ``CREDITS`` if you consider what you've done
noteworthy, usually beyond a single file (your name should be at the
top of the source files anyway). ``MAINTAINERS`` means you want to be
consulted when changes are made to a subsystem, and hear about bugs;
it implies a more-than-passing commitment to some part of the code.
- Finally, don't forget to read
``Documentation/process/submitting-patches.rst`` and possibly
......
......@@ -941,8 +941,7 @@ lock.
A classic problem here is when you provide callbacks or hooks: if you
call these with the lock held, you risk simple deadlock, or a deadly
embrace (who knows what the callback will do?). Remember, the other
programmers are out to get you, so don't do this.
embrace (who knows what the callback will do?).
Overzealous Prevention Of Deadlocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -952,8 +951,6 @@ grabs a read lock, searches a list, fails to find what it wants, drops
the read lock, grabs a write lock and inserts the object has a race
condition.
If you don't see why, please stay away from my code.
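The race comes from the window between dropping the read lock and taking the write lock: another thread can insert the same object in that window, so the result of the earlier search is stale. One common fix is to do the search again once the write lock is held. The sketch below illustrates the idea in Python rather than kernel C, with a single ``threading.Lock`` standing in for the read/write lock; the ``Registry`` class and its methods are hypothetical names used only for illustration.

```python
import threading


class Registry:
    """Toy object list guarded by one lock (a stand-in for a kernel rwlock)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: list = []

    def insert_if_absent(self, item) -> bool:
        """Insert item unless it is already present.

        The search and the insert happen inside the same critical section,
        so no other thread can add the item between the two steps.
        """
        with self._lock:
            if item in self._items:
                return False
            self._items.append(item)
            return True

    def snapshot(self) -> list:
        """Return a copy of the current contents."""
        with self._lock:
            return list(self._items)
```

If several threads call ``insert_if_absent()`` with the same object, only one of them inserts it; the others find it already present.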
Racing Timers: A Kernel Pastime
-------------------------------
......
......@@ -154,10 +154,11 @@ that the kernel developers have added a script to ease the process:
This script will return the current maintainer(s) for a given file or
directory when given the "-f" option. If passed a patch on the
command line, it will list the maintainers who should probably receive
copies of the patch. There are a number of options regulating how hard
get_maintainer.pl will search for maintainers; please be careful about
using the more aggressive options as you may end up including developers
who have no real interest in the code you are modifying.
copies of the patch. This is the preferred way (unlike the "-f" option) to get the
list of people to Cc for your patches. There are a number of options
regulating how hard get_maintainer.pl will search for maintainers; please be
careful about using the more aggressive options as you may end up including
developers who have no real interest in the code you are modifying.
If all else fails, talking to Andrew Morton can be an effective way to
track down a maintainer for a specific piece of code.
......
......@@ -7,7 +7,7 @@ Intro
=====
This document is designed to provide a list of the minimum levels of
software necessary to run the 4.x kernels.
software necessary to run the current kernel version.
This document is originally based on my "Changes" file for 2.0.x kernels
and therefore owes credit to the same people as that file (Jared Mauch,
......@@ -56,6 +56,7 @@ iptables 1.4.2 iptables -V
openssl & libcrypto 1.0.0 openssl version
bc 1.06.95 bc --version
Sphinx\ [#f1]_ 1.7 sphinx-build --version
cpio any cpio --version
====================== =============== ========================================
.. [#f1] Sphinx is needed only to build the Kernel documentation
......@@ -458,6 +459,11 @@ mcelog
- <http://www.mcelog.org/>
cpio
----
- <https://www.gnu.org/software/cpio/>
Networking
**********
......
......@@ -77,7 +77,7 @@ as you intend it to.
The maintainer will thank you if you write your patch description in a
form which can be easily pulled into Linux's source code management
system, ``git``, as a "commit log". See :ref:`explicit_in_reply_to`.
system, ``git``, as a "commit log". See :ref:`the_canonical_patch_format`.
Solve only one problem per patch. If your description starts to get
long, that's a sign that you probably need to split up your patch.
......@@ -227,9 +227,10 @@ Select the recipients for your patch
You should always copy the appropriate subsystem maintainer(s) on any patch
to code that they maintain; look through the MAINTAINERS file and the
source code revision history to see who those maintainers are. The
script scripts/get_maintainer.pl can be very useful at this step. If you
cannot find a maintainer for the subsystem you are working on, Andrew
Morton (akpm@linux-foundation.org) serves as a maintainer of last resort.
script scripts/get_maintainer.pl can be very useful at this step (pass paths to
your patches as arguments to scripts/get_maintainer.pl). If you cannot find a
maintainer for the subsystem you are working on, Andrew Morton
(akpm@linux-foundation.org) serves as a maintainer of last resort.
You should also normally choose at least one mailing list to receive a copy
of your patch set. linux-kernel@vger.kernel.org should be used by default
......@@ -318,7 +319,10 @@ understands what is going on.
Be sure to tell the reviewers what changes you are making and to thank them
for their time. Code review is a tiring and time-consuming process, and
reviewers sometimes get grumpy. Even in that case, though, respond
politely and address the problems they have pointed out.
politely and address the problems they have pointed out. When sending the next
version, add a ``patch changelog`` to the cover letter or to individual patches
explaining the differences from the previous submission (see
:ref:`the_canonical_patch_format`).
See Documentation/process/email-clients.rst for recommendations on email
clients and mailing list etiquette.
......
......@@ -56,9 +56,9 @@ Next two are try_to_wake_up() statistics:
Next three are statistics describing scheduling latency:
7) sum of all time spent running by tasks on this processor (in jiffies)
7) sum of all time spent running by tasks on this processor (in nanoseconds)
8) sum of all time spent waiting to run by tasks on this processor (in
jiffies)
nanoseconds)
9) # of timeslices run on this cpu
......@@ -155,8 +155,8 @@ schedstats also adds a new /proc/<pid>/schedstat file to include some of
the same information on a per-process level. There are three fields in
this file correlating for that process to:
1) time spent on the cpu
2) time spent waiting on a runqueue
1) time spent on the cpu (in nanoseconds)
2) time spent waiting on a runqueue (in nanoseconds)
3) # of timeslices run on this cpu
A program could be easily written to make use of these extra fields to
......
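As a rough illustration of the per-process fields listed above, the following Python sketch parses the three values found in /proc/<pid>/schedstat. The names ``SchedStat``, ``parse_schedstat`` and ``read_schedstat`` are hypothetical helpers introduced here, and the sample line is made up; only the field order and the nanosecond units come from the text above.

```python
from typing import NamedTuple


class SchedStat(NamedTuple):
    """The three per-process schedstat fields, in file order."""
    run_ns: int      # 1) time spent on the cpu (in nanoseconds)
    wait_ns: int     # 2) time spent waiting on a runqueue (in nanoseconds)
    timeslices: int  # 3) number of timeslices run on this cpu


def parse_schedstat(text: str) -> SchedStat:
    """Parse the contents of a /proc/<pid>/schedstat file."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return SchedStat(*(int(field) for field in fields))


def read_schedstat(pid: int) -> SchedStat:
    """Read and parse /proc/<pid>/schedstat on a kernel that provides it."""
    with open(f"/proc/{pid}/schedstat") as f:
        return parse_schedstat(f.read())


if __name__ == "__main__":
    # Made-up sample line, not real measurement data.
    sample = parse_schedstat("2500000000 150000000 1200\n")
    print(f"ran {sample.run_ns / 1e9:.2f}s, waited {sample.wait_ns / 1e9:.2f}s")
```

Sampling ``read_schedstat()`` twice and subtracting the results gives the run and wait time accumulated over the interval.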
......@@ -20,13 +20,13 @@
% - Indent of 2 chars is preserved for ease of comparison.
% Summary of changes from default params:
% Width of page number (\@pnumwidth): 1.55em -> 2.7em
% Width of chapter number: 1.5em -> 1.8em
% Indent of section number: 1.5em -> 1.8em
% Width of chapter number: 1.5em -> 2.4em
% Indent of section number: 1.5em -> 2.4em
% Width of section number: 2.6em -> 3.2em
% Indent of sebsection number: 4.1em -> 5em
% Indent of subsection number: 4.1em -> 5.6em
% Width of subsection number: 3.5em -> 4.3em
%
% These params can have 4 digit page counts, 2 digit chapter counts,
% These params can have 4 digit page counts, 3 digit chapter counts,
% section counts of 4 digits + 1 period (e.g., 18.10), and subsection counts
% of 5 digits + 2 periods (e.g., 18.7.13).
\makeatletter
......@@ -37,7 +37,7 @@
\ifnum \c@tocdepth >\m@ne
\addpenalty{-\@highpenalty}%
\vskip 1.0em \@plus\p@
\setlength\@tempdima{1.8em}%
\setlength\@tempdima{2.4em}%
\begingroup
\parindent \z@ \rightskip \@pnumwidth
\parfillskip -\@pnumwidth
......@@ -51,8 +51,8 @@
\endgroup
\fi}
%% Redefine \l@section and \l@subsection
\renewcommand*\l@section{\@dottedtocline{1}{1.8em}{3.2em}}
\renewcommand*\l@subsection{\@dottedtocline{2}{5em}{4.3em}}
\renewcommand*\l@section{\@dottedtocline{1}{2.4em}{3.2em}}
\renewcommand*\l@subsection{\@dottedtocline{2}{5.6em}{4.3em}}
\makeatother
%% Sphinx < 1.8 doesn't have \sphinxtableofcontentshook
\providecommand{\sphinxtableofcontentshook}{}
......
REPORTING BUGS
==============
Report bugs to <lkml@vger.kernel.org>
Report bugs to <linux-kernel@vger.kernel.org>
and <linux-trace-devel@vger.kernel.org>
LICENSE
=======
......
......@@ -81,9 +81,7 @@ Linux カーネルに対する全ての変更は diff(1) コマンドによる
The dontdiff file lists the files that are generated during the Linux kernel
build process; the diff(1) command that produces a patch should ignore
them. The dontdiff file is included in Linux kernel source trees from version
2.6.12 onward. For Linux kernel source trees older than
that, a dontdiff file can be obtained from
<http://www.xenotime.net/linux/doc/dontdiff>.
2.6.12 onward.
Make sure that the patch you submit contains no unrelated extra
files. Whether the patch generated by the diff(1) command is as you intended
......@@ -125,6 +123,17 @@ http://savannah.nongnu.org/projects/quilt
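If the patch fixes a registered bug entry, state the bug ID or URL
that identifies that bug entry.
If you want to refer to a specific commit, include not only its SHA-1 ID but
also its one-line summary. That makes it easier for reviewers to see what the
commit is about.
Example (in the original English):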
Commit e21d2170f36602ae2708 ("video: remove unnecessary
platform_set_drvdata()") removed the unnecessary
platform_set_drvdata(), but left the variable "dev" unused,
delete it.
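The citation style shown in the example above can be produced mechanically. The Python sketch below is only an illustration: ``format_commit_ref`` is a hypothetical helper, and its default abbreviation length of 12 characters is an assumption of this sketch (the example above happens to quote 20 characters).

```python
def format_commit_ref(sha1: str, summary: str, abbrev: int = 12) -> str:
    """Return a reference of the form: Commit <abbreviated id> ("<summary>")."""
    if len(sha1) < abbrev:
        raise ValueError("SHA-1 ID is shorter than the requested abbreviation")
    return f'Commit {sha1[:abbrev]} ("{summary}")'


if __name__ == "__main__":
    # Reproduces the reference quoted in the example above.
    print(format_commit_ref(
        "e21d2170f36602ae2708",
        "video: remove unnecessary platform_set_drvdata()",
        abbrev=20,
    ))
```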
3) Splitting patches
Separate each meaningful, self-contained change into its own patch file.
......@@ -162,7 +171,8 @@ http://savannah.nongnu.org/projects/quilt
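Look through the MAINTAINERS file and the source code. If you find that the
change applies to a specific subsystem that has a maintainer,
send e-mail to that person.
send e-mail to that person. For this step, the
./scripts/get_maintainer.pl script is useful.
If no maintainer is listed, or the maintainer does not respond,
send the patch to LKML ( linux-kernel@vger.kernel.org ). Most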
......@@ -400,7 +410,7 @@ Acked-by: が必ずしもパッチ全体の承認を示しているわけでは
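This tag documents that people who are likely to be interested in the patch
were included in the discussion of it.
14) Using Reported-by, Tested-by: and Reviewed-by:
14) Using Reported-by:, Tested-by:, Reviewed-by: and Suggested-by:
If the patch fixes a problem reported by someone else, consider adding a
Reported-by: tag to credit the reporter for that contribution.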
......@@ -449,6 +459,13 @@ Reviewd-by タグはそのパッチがカーネルに対して適切な修正で
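When supplied by a reviewer who has carried out a review, a Reviewed-by: tag
will increase the likelihood of your patch being merged into the kernel.
A Suggested-by: tag indicates that the patch idea is based on a suggestion from
the person named and gives credit for the idea. Take care not to add this tag
without the suggester's explicit permission, especially if the idea was not
posted in a public forum. That said, if we diligently credit the people who
provide ideas, they will, hopefully, be inspired to help us again in the
future.
15) The canonical patch format
The canonical patch subject line is: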
......@@ -681,10 +698,11 @@ Jeff Garzik, "Linux kernel patch submission format".
<https://web.archive.org/web/20180829112450/http://linux.yyz.us/patch-format.html>
Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer".
<http://www.kroah.com/log/2005/03/31/>
<http://www.kroah.com/log/2005/07/08/>
<http://www.kroah.com/log/2005/10/19/>
<http://www.kroah.com/log/2006/01/11/>
<http://www.kroah.com/log/linux/maintainer.html>
<http://www.kroah.com/log/linux/maintainer-02.html>
<http://www.kroah.com/log/linux/maintainer-03.html>
<http://www.kroah.com/log/linux/maintainer-04.html>
<http://www.kroah.com/log/linux/maintainer-05.html>
NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
<https://lore.kernel.org/r/20050711.125305.08322243.davem@davemloft.net>
......
......@@ -262,21 +262,21 @@ Linux カーネルの開発プロセスは現在幾つかの異なるメイン
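branches" and a number of per-subsystem kernel branches. These
branches are -
- The main 4.x kernel tree
- The 4.x.y -stable kernel tree
- Per-subsystem kernel trees and patches
- The 4.x -next kernel tree for integration testing
- Linus's mainline tree
- Several stable trees spanning multiple major numbers
- Per-subsystem kernel trees
- The linux-next kernel tree for integration testing
4.x kernel tree
Mainline tree
~~~~~~~~~~~~~~~~~~
The 4.x kernel is maintained by Linus Torvalds and can be found at
https://kernel.org in the pub/linux/kernel/v4.x/ directory.
The mainline tree is maintained by Linus Torvalds and can be found in the
repository at https://kernel.org.
Its development process is as follows -
- As soon as a new kernel is released, a special two-week period opens,
during which maintainers can send big diffs to Linus.
Such diffs are usually patches that have been included in the -next kernel for a few weeks.
Such diffs are usually patches that have been included in the linux-next kernel for a few weeks.
The preferred way to send big changes is with git (the kernel's source
management tool; see http://git-scm.com/ for details), but sending them
as plain patch files is also fine.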
......@@ -303,20 +303,18 @@ Andrew Morton が Linux-kernel メーリングリストにカーネルリリー
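because it is not released according to a schedule decided in
advance."*
4.x.y -stable kernel tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Several stable trees spanning multiple major numbers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kernels whose version number has three parts are -stable kernels.
They contain relatively small and critical fixes for security problems or
significant regressions discovered in a 4.x kernel.
They contain relatively small and critical fixes for security problems or
significant regressions discovered in the mainline release that corresponds
to the first two numbers of the version.
This is the recommended branch for users who want the most recent stable
kernel and are not interested in helping test development/experimental versions.
If no 4.x.y kernel is available, the highest-numbered 4.x kernel is the
current stable kernel.
4.x.y is maintained by the "stable" team <stable@vger.kernel.org> and
Stable trees are maintained by the "stable" team <stable@vger.kernel.org> and
are released as needs dictate. The normal release period is two weeks, but it
can be longer if there are no pressing problems. A security-related
problem, in contrast, usually causes a release to happen right away.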
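The numbering rule described above (three numbers for a -stable release, whose first two numbers name the mainline release it is based on) can be sketched in Python. ``classify_release`` and ``mainline_base`` are hypothetical helper names, and the sketch deliberately ignores suffixes such as release-candidate tags.

```python
def _parts(version: str) -> list:
    """Split a dotted version string and check that every part is numeric."""
    parts = version.split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"unsupported version string: {version!r}")
    return parts


def classify_release(version: str) -> str:
    """Return 'stable' for three-part versions and 'mainline' for two-part ones."""
    parts = _parts(version)
    if len(parts) == 3:
        return "stable"
    if len(parts) == 2:
        return "mainline"
    raise ValueError(f"unsupported version string: {version!r}")


def mainline_base(version: str) -> str:
    """Return the mainline release named by the first two version numbers."""
    parts = _parts(version)
    if len(parts) < 2:
        raise ValueError(f"unsupported version string: {version!r}")
    return ".".join(parts[:2])
```

For example, ``classify_release("5.18.3")`` yields ``"stable"`` and ``mainline_base("5.18.3")`` yields ``"5.18"``.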
......@@ -326,7 +324,7 @@ Documentation/process/stable-kernel-rules.rst ファイルにはどのような
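kinds of changes are acceptable for the -stable tree, and how the release
process works.
Per-subsystem kernel trees and patches
Per-subsystem kernel trees
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The maintainers of the various kernel subsystems --- and also many kernel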
......@@ -351,19 +349,19 @@ quilt シリーズとして公開されているパッチキューも使われ
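be found. Most of these patchwork sites are listed at
https://patchwork.kernel.org/.
The 4.x -next kernel tree for integration testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The linux-next kernel tree for integration testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before updates from subsystem trees are merged into the mainline 4.x tree,
Before updates from subsystem trees are merged into the mainline tree,
they need to be integration-tested. For this purpose, a special testing
repository exists into which virtually all subsystem trees are pulled on an
almost daily basis -
https://git.kernel.org/?p=linux/kernel/git/next/linux-next.git
This way, the -next kernel gives a rough outlook on what will be merged into
the mainline kernel at the next merge opportunity. Adventurous testers are
very welcome to runtime-test the -next kernel.
This way, linux-next gives a rough outlook on what will be merged into
mainline at the next merge opportunity.
Adventurous testers are very welcome to runtime-test linux-next.
Bug Reporting
-------------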
......
......@@ -5,7 +5,7 @@
\kerneldocCJKon
\kerneldocBeginJP{
Japanese translations
日本語訳
=====================
.. toctree::
......
......@@ -53,8 +53,8 @@ DAMON_RECLAIM找到在特定时间内没有被访问的内存区域并分页。
Below are the descriptions of each parameter.
enable
------
enabled
-------
Enable or disable DAMON_RECLAIM.
......
.. highlight:: none
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/dev-tools/gdb-kernel-debugging.rst
:Translator: 高超 gao chao <gaochao49@huawei.com>
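Debugging kernel and modules via gdb
====================================
The kernel debugger kgdb, hypervisors like QEMU, or JTAG-based hardware interfaces allow
debugging the Linux kernel and its modules at runtime using gdb. Gdb comes with a powerful
scripting interface for python, and the kernel provides a collection of helper scripts that
simplify typical kernel debugging steps. This document is a short tutorial on how to enable and use them.
It focuses on QEMU/KVM virtual machines, but the examples also apply to other gdb stubs.
Requirements
------------
- gdb 7.2+ (recommended: 7.4+) with python support enabled (typically true for distributions)
Setup
-----
- Create a virtual Linux machine for QEMU/KVM (see www.linux-kvm.org and www.qemu.org for details).
For cross-development, https://landley.net/aboriginal/bin provides machine images and toolchains
that can help with setting up a cross-development environment.
- Build the kernel with CONFIG_GDB_SCRIPTS enabled and CONFIG_DEBUG_INFO_REDUCED off.
If your architecture supports CONFIG_FRAME_POINTER, keep it enabled.
- Install that kernel on the guest. If necessary, turn off KASLR by adding "nokaslr" to the kernel command line.
Alternatively, QEMU can boot the kernel directly using the -kernel, -append and -initrd command line switches.
This is generally only useful if you do not depend on kernel modules. See the QEMU documentation for more details on this mode.
In this case, if the architecture supports KASLR, build the kernel with CONFIG_RANDOMIZE_BASE disabled.
- Enable the gdb stub of QEMU/KVM, either
- at VM startup time, by appending "-s" to the QEMU command line
- during runtime, by issuing "gdbserver" from the QEMU monitor console
- Change to the /path/to/linux-build (kernel build) directory
- Start gdb: gdb vmlinux
Note: Some distros may restrict auto-loading of gdb scripts to known safe directories.
If gdb reports that it refuses to load vmlinux-gdb.py (the related commands are not found), add::
add-auto-load-safe-path /path/to/linux-build
to ~/.gdbinit. See the gdb help for more details.
- Attach to the booted guest::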
(gdb) target remote :1234
Examples of using the Linux-provided gdb helpers
------------------------------------------------
- Load module (and main kernel) symbols::
(gdb) lx-symbols
loading vmlinux
scanning for modules in /home/user/linux/build
loading @0xffffffffa0020000: /home/user/linux/build/net/netfilter/xt_tcpudp.ko
loading @0xffffffffa0016000: /home/user/linux/build/net/netfilter/xt_pkttype.ko
loading @0xffffffffa0002000: /home/user/linux/build/net/netfilter/xt_limit.ko
loading @0xffffffffa00ca000: /home/user/linux/build/net/packet/af_packet.ko
loading @0xffffffffa003c000: /home/user/linux/build/fs/fuse/fuse.ko
...
loading @0xffffffffa0000000: /home/user/linux/build/drivers/ata/ata_generic.ko
- Set a breakpoint on a function in a module that is not loaded yet, e.g.::
(gdb) b btrfs_init_sysfs
Function "btrfs_init_sysfs" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (btrfs_init_sysfs) pending.
- Continue the target::
(gdb) c
- Load the module on the target and watch the symbols being loaded as well as the breakpoint hit::
loading @0xffffffffa0034000: /home/user/linux/build/lib/libcrc32c.ko
loading @0xffffffffa0050000: /home/user/linux/build/lib/lzo/lzo_compress.ko
loading @0xffffffffa006e000: /home/user/linux/build/lib/zlib_deflate/zlib_deflate.ko
loading @0xffffffffa01b1000: /home/user/linux/build/fs/btrfs/btrfs.ko
Breakpoint 1, btrfs_init_sysfs () at /home/user/linux/fs/btrfs/sysfs.c:36
36 btrfs_kset = kset_create_and_add("btrfs", NULL, fs_kobj);
- Dump the log buffer of the target kernel::
(gdb) lx-dmesg
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.8.0-rc4-dbg+ (...
[ 0.000000] Command line: root=/dev/sda2 resume=/dev/sda1 vga=0x314
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
....
- Examine fields of the current task struct (supported by x86 and arm64 only)::
(gdb) p $lx_current().pid
$1 = 4998
(gdb) p $lx_current().comm
$2 = "modprobe\000\000\000\000\000\000\000"
- Use the per-cpu function for the current or a specified CPU::
(gdb) p $lx_per_cpu("runqueues").nr_running
$3 = 1
(gdb) p $lx_per_cpu("runqueues", 2).nr_running
$4 = 0
- Dig into hrtimers using the container_of helper::
(gdb) set $next = $lx_per_cpu("hrtimer_bases").clock_base[0].active.next
(gdb) p *$container_of($next, "struct hrtimer", "node")
$5 = {
node = {
node = {
__rb_parent_color = 18446612133355256072,
rb_right = 0x0 <irq_stack_union>,
rb_left = 0x0 <irq_stack_union>
},
expires = {
tv64 = 1835268000000
}
},
_softexpires = {
tv64 = 1835268000000
},
function = 0xffffffff81078232 <tick_sched_timer>,
base = 0xffff88003fd0d6f0,
state = 1,
start_pid = 0,
start_site = 0xffffffff81055c1f <hrtimer_start_range_ns+20>,
start_comm = "swapper/2\000\000\000\000\000\000"
}
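List of commands and functions
------------------------------
The set of commands and convenience functions may evolve over time; this is just a partial snapshot of the initial version::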
(gdb) apropos lx
function lx_current -- Return current task
function lx_module -- Find module by name and return the module variable
function lx_per_cpu -- Return per-cpu variable
function lx_task_by_pid -- Find Linux task by PID and return the task_struct variable
function lx_thread_info -- Calculate Linux thread_info from task variable
lx-dmesg -- Print Linux kernel log buffer
lx-lsmod -- List currently loaded modules
lx-symbols -- (Re-)load symbols of Linux kernel and currently loaded modules
Detailed help for a given command or convenience function can be obtained via
"help <command-name>" or "help function <function-name>".
......@@ -25,6 +25,7 @@ Documentation/translations/zh_CN/dev-tools/testing-overview.rst
sparse
gcov
kasan
gdb-kernel-debugging
Todolist:
......@@ -34,7 +35,6 @@ Todolist:
- kmemleak
- kcsan
- kfence
- gdb-kernel-debugging
- kgdb
- kselftest
- kunit/index
......@@ -120,24 +120,24 @@ dt_compat列表(如果你好奇,该列表定义在arch/arm/include/asm/mach/
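means. Add documentation for compatible strings in Documentation/devicetree/bindings.
Again on ARM, for each machine_desc, the kernel looks to see if any of the dt_compat list
entries appear in the compatible property. If one does, then that 机器_desc is a candidate for driving the machine. After searching
entries appear in the compatible property. If one does, then that machine_desc is a candidate for driving the machine. After searching
the entire table of machine_descs, setup_machine_fdt() returns the "most compatible"
machine_desc based on which entry in the compatible property each machine_desc matches. If no
matching machine_desc is found, then it returns NULL.
The reasoning behind this scheme is the observation that in the majority of cases, if they all use the same SoC or the same
family of SoCs, a single 机器_desc can support a large number of boards. However, invariably there will be some
family of SoCs, a single machine_desc can support a large number of boards. However, invariably there will be some
exceptions where a specific board requires special setup code that is not useful in the generic case. Special cases
could be handled by explicitly checking for the troublesome board(s) in generic setup code, but once there are more than a few cases,
doing so very quickly becomes ugly and/or unmaintainable.
Instead, the compatible list allows a generic 机器_desc, by specifying "less compatible" values in the dt_compat list,
Instead, the compatible list allows a generic machine_desc, by specifying "less compatible" values in the dt_compat list,
to provide support for a wide common set of boards. In the example above, generic board support can claim compatibility with "ti,omap3"
or "ti,omap3450". If a bug was discovered on the original beagleboard that required
special workaround code during early boot, then a new machine_desc could be added which implements the workarounds
and only matches on "ti,omap3-beagleboard".
PowerPC uses a slightly different scheme where it calls the .probe() hook from each 机器_desc,
PowerPC uses a slightly different scheme where it calls the .probe() hook from each machine_desc,
and the first one returning TRUE is used. However, this approach does not take into account the priority of the compatible list, and
probably should be avoided for new architecture support.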
......
......@@ -108,6 +108,7 @@ TODOList:
:maxdepth: 2
core-api/index
locking/index
accounting/index
cpu-freq/index
iio/index
......@@ -123,7 +124,6 @@ TODOList:
TODOList:
* driver-api/index
* locking/index
* block/index
* cdrom/index
* ide/index
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/locking/index.rst
:Translator:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
=======
Locking
=======
.. toctree::
:maxdepth: 1
TODOList:
* locktypes
* lockdep-design
* lockstat
* locktorture
* mutex-design
* rt-mutex-design
* rt-mutex
* seqlock
* spinlocks
* ww-mutex-design
* preempt-locking
* pi-futex
* futex-requeue-pi
* hwspinlock
* percpu-rw-semaphore
* robust-futexes
* robust-futex-ABI
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/locking/spinlocks.rst
:Translator:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
===============
Locking lessons
===============
Lesson 1: Spin locks
====================
The most basic primitive for locking is the spinlock::
static DEFINE_SPINLOCK(xxx_lock);
unsigned long flags;
spin_lock_irqsave(&xxx_lock, flags);
... critical section here ..
spin_unlock_irqrestore(&xxx_lock, flags);
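The above is always safe. It will disable interrupts _locally_, but the spinlock itself will guarantee the global lock, so it
will guarantee that there is only one thread of control within the region(s) protected by that lock. This works well even under UP,
so the code does _not_ need to worry about UP vs SMP issues: the spinlocks work correctly under both.
NOTE! Implications of spinlocks for memory are further described in:
Documentation/memory-barriers.txt
(5) ACQUIRE operations.
(6) RELEASE operations.
The above is usually pretty simple (you usually need and want only one spinlock for most things - using more than one
spinlock can make things a lot more complex and even slower, and is usually worth it only for sequences that you **know**
need to be split up: avoid it at all cost if you aren't sure).
This is really the only hard part about spinlocks: once you start using spinlocks they tend to expand to areas you
might not have noticed before, because you have to make sure the spinlocks correctly protect the shared data structures
**everywhere** they are used. Spinlocks are most easily added to places that are completely independent of other code
(for example, internal driver data structures that nobody else ever touches).
NOTE! A spinlock is safe only when the **same** lock is used for all accesses across CPUs.
This means that all code that touches a shared variable has to agree on the spinlock it wants to use.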
----
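Lesson 2: reader-writer spinlocks
=================================
If your data accesses have a very natural pattern where you mostly read from the shared variables, the reader-writer
spinlocks (rw_lock) are sometimes useful. They allow multiple readers to be in the same critical region at once, but if
somebody wants to change the variables, it has to get an exclusive write lock.
NOTE! Reader-writer spinlocks require more atomic memory operations than plain spinlocks. Unless the reader critical
section is long, you are better off just using plain spinlocks.
The routines look the same as above::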
rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
unsigned long flags;
read_lock_irqsave(&xxx_lock, flags);
.. read only critical section ...
read_unlock_irqrestore(&xxx_lock, flags);
write_lock_irqsave(&xxx_lock, flags);
.. read and write exclusive access to the info ...
write_unlock_irqrestore(&xxx_lock, flags);
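The above kind of lock may be useful for complex data structures like linked lists, especially for searching for
entries without changing the list itself. The read lock allows many concurrent readers. Anything that **changes** the list will have to get the write lock.
NOTE! RCU is better for list traversal, but requires careful attention to design detail (see Documentation/RCU/listRCU.rst).
Also, you cannot "upgrade" a read lock to a write lock, so if you at _any_ time need to make any changes
(even if you don't do it every time), you have to get the write lock at the very beginning.
NOTE! We are working hard to remove reader-writer spinlocks in most cases, so please don't add a new one without
consensus (instead, see Documentation/RCU/rcu.rst for complete
information).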
----
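Lesson 3: spinlocks revisited
=============================
The spinlock primitives above are by no means the only ones. They are the safest ones, and they work under all
circumstances, but partly **because** they are safe they are also fairly slow. They are slower than they would need to be, because they have to
disable interrupts (which is just a single instruction on x86, but an expensive one - and on other architectures it
can be worse).
If you have to protect a data structure across several CPUs and you want to use spinlocks, you can potentially use
cheaper versions of the spinlocks. If and only if you know that a spinlock is never used in interrupt handlers, you can use the non-irq
versions::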
spin_lock(&lock);
...
spin_unlock(&lock);
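(and the equivalent read-write versions too, of course). The spinlock will guarantee the same kind of exclusive access, and
it will be much faster. This is useful if you know that the data in question is only ever accessed from a "process context",
i.e. no interrupts are involved.
The reason you must not use these versions when interrupts touch the spinlock is that you can get deadlocks::
spin_lock(&lock);
...
<- interrupt comes in: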
spin_lock(&lock);
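where an interrupt tries to lock an already locked variable. This is ok if the interrupt happens on another CPU,
but it is _not_ ok if the interrupt happens on the same CPU that already holds the lock, because the lock will obviously never
be released (the interrupt is waiting for the lock, and the lock holder is interrupted by the interrupt and cannot continue
until the interrupt has been processed).
(This is also the reason why the irq versions of the spinlocks only need to disable the _local_ interrupts - it is ok to
use the same spinlock in interrupts on other CPUs, because an interrupt on another CPU does not interrupt the CPU that holds the lock, so
the lock holder can continue and eventually release the lock).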
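The same-CPU deadlock described above can be imitated in user space. The Python sketch below is only an analogy and not kernel code: a non-reentrant ``threading.Lock`` stands in for the spinlock, and a plain function call stands in for an interrupt arriving on the CPU that already holds the lock. The names ``process_context`` and ``interrupt_handler`` are hypothetical. A real handler would spin forever; the sketch uses a non-blocking attempt so that it terminates.

```python
import threading

lock = threading.Lock()  # non-reentrant, standing in for a spinlock


def interrupt_handler() -> bool:
    """Stand-in for an interrupt handler that wants the same lock.

    Returns True if the lock could be taken. A real handler would spin
    instead of giving up, which is exactly the deadlock in the text.
    """
    acquired = lock.acquire(blocking=False)
    if acquired:
        lock.release()
    return acquired


def process_context() -> bool:
    """Take the lock without 'disabling interrupts', then get 'interrupted'."""
    with lock:                      # like spin_lock(&lock)
        return interrupt_handler()  # the "interrupt" arrives on the same CPU


if __name__ == "__main__":
    print("handler got lock while holder was interrupted:", process_context())
    print("handler got lock when nobody holds it:", interrupt_handler())
```

The handler can never obtain the lock while the interrupted holder still owns it, which is why the irq-disabling variants are required whenever a handler may take the lock.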
Linus
----
Reference information
=====================
For dynamic initialization, use spin_lock_init() or rwlock_init() as appropriate::
spinlock_t xxx_lock;
rwlock_t xxx_rw_lock;
static int __init xxx_init(void)
{
spin_lock_init(&xxx_lock);
rwlock_init(&xxx_rw_lock);
...
}
module_init(xxx_init);
For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or
__SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate.
......@@ -252,7 +252,7 @@ Linux-next 集成测试树
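Before updates from subsystem trees are merged into the mainline tree, they need to be integration-tested. For this purpose, a
special testing repository exists into which virtually all subsystem trees are pulled on an almost daily basis:
https://git.kernel.org/p=linux/kernel/git/next/linux-next.git
https://git.kernel.org/?p=linux/kernel/git/next/linux-next.git
This way, Linux-next gives a summary outlook on what is expected to go into the mainline kernel at the next merge period.
Adventurous testers are very welcome to runtime-test Linux-next.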
......
......@@ -25,8 +25,10 @@ Linux调度器
sched-domains
sched-capacity
sched-energy
schedutil
sched-nice-design
sched-stats
sched-debug
TODOList:
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/sched-debug.rst
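:Translator:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
=================
Scheduler debugfs
=================
Booting a kernel with CONFIG_SCHED_DEBUG=y gives access to scheduler-specific debug files under
/sys/kernel/debug/sched. Some of those files are described below.
numa_balancing
==============
The `numa_balancing` directory holds the files that control the Non-Uniform Memory Access (NUMA) balancing feature.
If the system overhead from the feature is too high, the rate at which the kernel samples NUMA hinting faults can be
controlled through the `scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb` files.
scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb
-------------------------------------------------------------------
Automatic NUMA balancing scans a task's address space and unmaps pages to detect whether pages are properly placed or
whether the data should be migrated to a memory node local to where the task is running. Every "scan delay",
the task scans the next "scan size" number of pages in its address space. When the end of the
address space is reached, the scanner restarts from the beginning.
In combination, the "scan delay" and "scan size" determine the scan rate. When the "scan delay" decreases, the scan rate
increases. The "scan delay" and hence the scan rate of every task is adaptive and depends on historical behaviour. If pages are
properly placed, the scan delay increases; otherwise the scan delay decreases. The "scan size" is not adaptive, but
the higher the "scan size", the higher the scan rate.
Higher scan rates incur higher system overhead, as page faults must be trapped and data potentially must be
migrated. However, the higher the scan rate, the more quickly a task's memory is migrated to a
local node if the workload pattern changes, which minimises the performance impact of remote memory accesses. These files control the
thresholds for scan delays and the number of pages scanned.
``scan_period_min_ms`` is the minimum time in milliseconds to scan a task's virtual memory. It effectively
controls the maximum scanning rate for each task.
``scan_delay_ms`` is the starting "scan delay" used for a task when it is initially forked.
``scan_period_max_ms`` is the maximum time in milliseconds to scan a task's virtual memory. It effectively
controls the minimum scanning rate for each task.
``scan_size_mb`` is how many megabytes worth of pages are scanned for a given scan.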
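The relation between "scan delay", "scan size" and scan rate described above can be put into numbers. The Python sketch below is a simplified reading of the text, not the kernel's actual accounting: it assumes one batch of ``scan_size_mb`` is scanned every ``scan_delay_ms``. The function name and the sample values are hypothetical.

```python
def scan_rate_mb_per_s(scan_size_mb: float, scan_delay_ms: float) -> float:
    """Approximate scan rate when one batch is scanned per scan delay."""
    if scan_delay_ms <= 0:
        raise ValueError("scan_delay_ms must be positive")
    return scan_size_mb / (scan_delay_ms / 1000.0)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the text above.
    for delay_ms in (2000, 1000, 500):
        rate = scan_rate_mb_per_s(256, delay_ms)
        print(f"scan delay {delay_ms} ms -> about {rate:.0f} MB/s")
```

Halving the scan delay or doubling the scan size doubles the rate, matching the statement that a shorter delay or a larger size means a higher scan rate.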
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/schedutil.rst
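:Translator:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
=========
Schedutil
=========
.. note::
All of this assumes a linear relation between frequency and work capacity. We know this is flawed,
but it is the best workable approximation.
PELT (Per Entity Load Tracking)
===============================
With PELT we track some metrics across the various scheduler entities, from individual tasks to task-group slices to CPU
runqueues. As the basis for this we use an Exponentially Weighted Moving Average
(EWMA); each period (1024us) is decayed such that y^32 = 0.5.
That is, the most recent 32ms contribute half, while the rest of history contributes the other half.
Specifically:
ewma_sum(u) := u_0 + u_1*y + u_2*y^2 + ...
ewma(u) = ewma_sum(u) / ewma_sum(1)
Since this is essentially a progression of an infinite geometric series, the results are composable, that is, ewma(A) + ewma(B) = ewma(A+B).
This property is key, since it gives the ability to recompose the averages when tasks move around.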
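The two formulas above can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the kernel implementation in kernel/sched/pelt.c: it uses floating point and a finite history, with index 0 as the most recent period, and the function names simply mirror the formulas.

```python
Y = 0.5 ** (1 / 32)  # per-period decay factor, chosen so that Y**32 == 0.5


def ewma_sum(u):
    """u_0 + u_1*y + u_2*y^2 + ... with u[0] as the most recent period."""
    return sum(u_i * Y ** i for i, u_i in enumerate(u))


def ewma(u):
    """ewma_sum(u) normalised by ewma_sum(1) over the same history length."""
    return ewma_sum(u) / ewma_sum([1.0] * len(u))


if __name__ == "__main__":
    a = [1.0, 0.0, 1.0, 1.0]
    b = [0.0, 1.0, 0.0, 0.5]
    combined = [x + y for x, y in zip(a, b)]
    # Composability: both lines print the same value up to rounding.
    print(ewma(a) + ewma(b))
    print(ewma(combined))
```

With a long history in which only the most recent 32 periods are busy, ``ewma()`` comes out very close to 0.5, which is the "most recent 32ms contribute half" statement.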
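Note that blocked tasks still contribute to the aggregates (task-group slices and CPU runqueues), which reflects
their expected contribution when they resume running.
Using this we track 2 key metrics: 'running' and 'runnable'. 'Running' reflects the time an entity
spends on the CPU, while 'runnable' reflects the time an entity spends on the runqueue. When there is only
a single task these two metrics are the same, but once there is contention for the CPU, 'running' will decrease to reflect the
fraction of time each task spends on the CPU, while 'runnable' will increase to reflect the amount of contention.
For more detail see: kernel/sched/pelt.c
Frequency / CPU Invariance
==========================
Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU for 50% at 2GHz, nor is
running 50% on a LITTLE CPU the same as running 50% on a big CPU, we allow architectures
to scale the time delta with two ratios: one Dynamic Voltage and
Frequency Scaling (DVFS) ratio and one microarchitecture ratio.
For simple DVFS architectures (where software is in full control) we trivially compute the ratio as::
            f_cur
  r_dvfs := -----
            f_max
For more dynamic systems where the hardware is in control of DVFS we use hardware counters (Intel APERF/MPERF,
ARMv8.4-AMU) to provide us this ratio. For Intel specifically, we use::
           APERF
  f_cur := ----- * P0
           MPERF
             4C-turbo;  if available and turbo enabled
  f_max := { 1C-turbo;  if turbo enabled
             P0;        otherwise
                    f_cur
  r_dvfs := min( 1, ----- )
                    f_max
We pick 4C turbo over 1C turbo to make it slightly more sustainable.
r_cpu is determined as the ratio of the highest performance level of the current CPU vs the highest performance level of any other CPU in the system.
  r_tot = r_dvfs * r_cpu
The result is that the above 'running' and 'runnable' metrics become invariant of DVFS and CPU type. In other words,
we can transfer and compare them between CPUs.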
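To make the ratios above concrete, here is a small Python sketch. It only restates the formulas from this section; the function names and the sample frequencies are hypothetical, and the real computation lives in the architecture code referenced in this section.

```python
def intel_f_cur(aperf_delta: float, mperf_delta: float, p0: float) -> float:
    """f_cur := APERF / MPERF * P0, using counter deltas over an interval."""
    return aperf_delta / mperf_delta * p0


def r_dvfs(f_cur: float, f_max: float) -> float:
    """r_dvfs := min(1, f_cur / f_max)."""
    return min(1.0, f_cur / f_max)


def r_tot(r_dvfs_value: float, r_cpu: float) -> float:
    """r_tot = r_dvfs * r_cpu."""
    return r_dvfs_value * r_cpu


if __name__ == "__main__":
    # Made-up example: a CPU running at 1 GHz out of a 2 GHz maximum, on a
    # core whose top performance level is half that of the biggest core.
    dvfs = r_dvfs(1.0e9, 2.0e9)
    print("r_dvfs =", dvfs)
    print("r_tot  =", r_tot(dvfs, 0.5))
```

Scaling a time delta by ``r_tot`` is what lets 50% at 1GHz and 50% at 2GHz be told apart, as described at the start of this section.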
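For more detail see:
- kernel/sched/pelt.h:update_rq_clock_pelt()
- arch/x86/kernel/smpboot.c:"APERF/MPERF frequency ratio computation."
- Documentation/translations/zh_CN/scheduler/sched-capacity.rst:"1. CPU Capacity + 2. Task utilization"
UTIL_EST / UTIL_EST_FASTUP
==========================
Because periodic tasks have their averages decayed while they sleep, even though their expected utilization when
running will be the same, they suffer a (DVFS) ramp-up after they start running again.
To alleviate this, UTIL_EST (a default-enabled option) drives an Infinite Impulse Response
(IIR) EWMA with the 'running' value on dequeue, which is when it is highest.
A further default-enabled option, UTIL_EST_FASTUP, modifies the IIR filter to increase instantly
and only decay when utilization decreases.
In addition, a runqueue-wide sum (over runnable tasks) is maintained of:
util_est := \Sum_t max( t_running, t_util_est_ewma )
For more detail see: kernel/sched/fair.c:util_est_dequeue()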
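The runqueue-wide sum above is simple to state in code. The Python sketch below is a direct transcription of the formula, with a hypothetical function name and made-up utilization values; it does not model how the kernel updates the per-task estimates.

```python
def rq_util_est(tasks) -> float:
    """Sum max(t_running, t_util_est_ewma) over the runnable tasks.

    `tasks` is an iterable of (t_running, t_util_est_ewma) pairs.
    """
    return sum(max(running, est) for running, est in tasks)


if __name__ == "__main__":
    # Made-up values: the first task just woke up, so its decayed 'running'
    # value (100) is well below its estimate (300).
    print(rq_util_est([(100, 300), (250, 200)]))
```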
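UCLAMP
======
It is possible to set effective u_min and u_max clamps on each CFS or RT task (translator's note: a clamp can be understood
as a filter-like capability that defines the maximum and minimum of the valid value range); the runqueue keeps a max
aggregate of these clamps for all running tasks.
For more detail see: include/uapi/linux/sched/types.h
Schedutil / DVFS
================
Every time the scheduler load tracking is updated (task wakeup, task migration, time progression) we call out to
schedutil to update the hardware DVFS state.
The basis is the CPU runqueue's 'running' metric, which per the above is the frequency-invariant utilization
estimate of the CPU. From this we compute a desired frequency like::
             max( running, util_est );  if UTIL_EST
  u_cfs := { running;                   otherwise
               clamp( u_cfs + u_rt, u_min, u_max );  if UCLAMP_TASK
  u_clamp := { u_cfs + u_rt;                         otherwise
  u := u_clamp + u_irq + u_dl;  [approx. see source for more detail]
  f_des := min( f_max, 1.25 u * f_max )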
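The frequency selection above can be followed step by step in code. The Python sketch below transcribes the formulas with utilizations expressed as fractions of CPU capacity (0.0 to 1.0); the function name, the keyword arguments and the sample values are hypothetical, and, as the original notes, the real computation in the source is only approximated by these formulas.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the range [low, high]."""
    return max(low, min(value, high))


def desired_freq(running, f_max, util_est=None, u_rt=0.0, u_irq=0.0,
                 u_dl=0.0, u_min=None, u_max=None):
    """Compute f_des from the 'running' metric following the formulas above.

    util_est is None when UTIL_EST is not enabled; u_min and u_max are None
    when UCLAMP_TASK is not enabled.
    """
    u_cfs = max(running, util_est) if util_est is not None else running
    u_clamp = u_cfs + u_rt
    if u_min is not None and u_max is not None:
        u_clamp = clamp(u_clamp, u_min, u_max)
    u = u_clamp + u_irq + u_dl
    return min(f_max, 1.25 * u * f_max)


if __name__ == "__main__":
    f_max = 2.0e9  # made-up maximum frequency in Hz
    print(desired_freq(0.4, f_max))                # 40% utilization
    print(desired_freq(0.2, f_max, util_est=0.6))  # estimate dominates
    print(desired_freq(0.9, f_max))                # capped at f_max
```

The factor 1.25 leaves headroom: the desired frequency reaches f_max once the utilization reaches 80%.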
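Note on IO-wait: when the update is due to a task waking up from IO completion, we boost 'u' above.
This frequency is then used to select a P-state/OPP, or is directly munged into a CPPC-style request
to the hardware.
Note on deadline tasks: deadline tasks (Sporadic Task Model) allow us to calculate a hard f_min required to
satisfy the workload.
Because these callbacks come directly from the scheduler, the DVFS hardware interaction should be 'fast' and non-blocking.
Schedutil supports rate-limiting DVFS requests for when hardware interaction is slow and expensive, although this reduces effectiveness.
For more information see: kernel/sched/cpufreq_schedutil.c
NOTES
=====
- In low-load scenarios, where DVFS is most relevant, the 'running' numbers will closely reflect utilization.
- In saturated scenarios task movement will cause some transient dips. Suppose we have a
CPU saturated with 4 tasks; when we then migrate a task to an idle CPU,
the old CPU will have a 'running' value of 0.75 while the new CPU will gain 0.25. This is inevitable, and
time progression will correct it. Open question: do we still guarantee f_max due to no idle time?
- Much of the above is about avoiding DVFS dips, and about independent DVFS domains having to
re-learn / ramp up when load shifts.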
......@@ -77,7 +77,7 @@ DAMON目前为物理和虚拟地址空间提供了基元的实现。下面两个
========================
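The four sections below describe each of the DAMON core mechanisms and the five monitoring attributes: ``sampling interval``, ``aggregation interval``,
``regions update interval``, ``minimum number of regions``, and ``maximum number of regions``.
``update interval``, ``minimum number of regions``, and ``maximum number of regions``.
Access Frequency Monitoring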
......@@ -135,5 +135,6 @@ DAMON的输出显示了在给定的时间内哪些页面的访问频率是多少
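The monitoring target address range can change dynamically. For example, virtual memory can be dynamically mapped and unmapped. Physical memory can be
hot-plugged.
As the changes could be quite frequent in some cases, DAMON checks the dynamic memory mapping changes and applies them to the abstracted
target regions only once per user-specified time interval (``regions update interval``).
As the changes could be quite frequent in some cases, DAMON allows the monitoring operations to check dynamic changes, including memory mapping changes,
and applies them to the data structures related to the monitoring operations, such as the abstracted monitoring target
memory regions, only once per user-specified time interval (``update interval``).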
\ No newline at end of file
:Original: Documentation/vm/hwpoison.rst
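:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
========
hwpoison
========
What is hwpoison?
=================
Upcoming Intel CPUs have support for recovering from some memory errors (``MCA recovery``). This requires the OS to declare
a page "poisoned", kill the processes associated with it, and avoid using it in the future.
This patchkit implements the necessary infrastructure in the VM.
To quote the overview comment::
High level machine check handler. Handles pages reported by the hardware as being corrupted, usually due to a 2bit ECC
memory or cache failure.
This focuses on pages detected as corrupted in the background. When the current CPU tries to consume corruption, the currently running process
can just be killed directly instead. This implies that if the error cannot be handled for some reason, it is safe to
just ignore it, because no corruption has been consumed yet. Instead, when that happens, another machine check will happen.
Handles page cache pages in various states. The tricky part here is that we can access any page asynchronously to
other VM users, because memory failures could happen anytime and anywhere, possibly violating some of their assumptions. This is
why this code has to be extremely careful. Generally it tries to use normal locking rules, as in getting the standard locks, even if
that means the error handling potentially takes a long time.
Some of the operations here are somewhat inefficient and have non-linear algorithmic complexity, because the data structures have not been
optimized for this case. This is in particular the case for the mapping from a vma to a process. Since this case is expected to be rare,
we hope we can get away with this.
The code consists of the high level handler in mm/memory-failure.c, a new page poison bit, and
various checks in the VM to handle poisoned pages.
The main target right now is KVM guests, but it works for all kinds of applications. KVM support requires a recent qemu-kvm
release.
For the KVM use there was a need for a new signal type so that KVM can inject the machine check into the guest with the proper
address. This in theory allows other applications to handle memory failures too. The expectation is that nearly all applications will not
do that, but some very specialized ones might.
Failure recovery modes
======================
There are two (actually three) modes memory failure recovery can be in:
vm.memory_failure_recovery sysctl set to zero:
All memory failures cause a panic. Do not attempt recovery.
early kill
(can be controlled globally and per process) Send SIGBUS to the application as soon as the error is detected. This allows
applications to process memory errors in a gentle way (e.g. drop the affected object). This is the mode used by
KVM qemu.
late kill
Send SIGBUS when the application runs into the corrupted page. This is best for applications that are unaware of memory errors,
and it is the default. Note that some pages are always handled as late kill.
User control
============
vm.memory_failure_recovery
See sysctl.txt
vm.memory_failure_early_kill
Enable early kill mode globally
PR_MCE_KILL
Set early/late kill mode/revert to system default
arg1: PR_MCE_KILL_CLEAR:
Revert to system default
arg1: PR_MCE_KILL_SET:
arg2 defines thread specific mode
PR_MCE_KILL_EARLY:
Early kill
PR_MCE_KILL_LATE:
Late kill
PR_MCE_KILL_DEFAULT
Use system global default
Note that if you want to have a dedicated thread which handles the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
call prctl(PR_MCE_KILL_EARLY) on the designated thread. Otherwise, the SIGBUS is sent to the main thread.
PR_MCE_KILL_GET
Return current mode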
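The prctl() controls listed above can be exercised from user space. The Python sketch below calls prctl() through ctypes on Linux; the helper names are hypothetical, and the numeric constants are copied from include/uapi/linux/prctl.h as the author of this sketch understands them, so verify them against your kernel headers. Note that the shorthand prctl(PR_MCE_KILL_EARLY) in the text corresponds to option PR_MCE_KILL with arg2 PR_MCE_KILL_SET and arg3 PR_MCE_KILL_EARLY.

```python
import ctypes
import os

# Values as found in include/uapi/linux/prctl.h (verify against your headers).
PR_MCE_KILL = 33
PR_MCE_KILL_GET = 34
PR_MCE_KILL_CLEAR = 0
PR_MCE_KILL_SET = 1
PR_MCE_KILL_LATE = 0
PR_MCE_KILL_EARLY = 1
PR_MCE_KILL_DEFAULT = 2


def _prctl(option: int, arg2: int = 0, arg3: int = 0) -> int:
    """Call prctl(2) for the calling thread and raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.prctl(option, ctypes.c_ulong(arg2), ctypes.c_ulong(arg3),
                     ctypes.c_ulong(0), ctypes.c_ulong(0))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def set_early_kill() -> None:
    """Select early kill for the calling thread."""
    _prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY)


def get_kill_mode() -> int:
    """Return the current mode, one of the PR_MCE_KILL_* mode values."""
    return _prctl(PR_MCE_KILL_GET)
```

A thread dedicated to handling SIGBUS would call ``set_early_kill()`` itself, since the setting is per thread.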
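Testing
=======
* madvise(MADV_HWPOISON, ....) (as root) - Poison a page in the process for testing
* hwpoison-inject module through debugfs ``/sys/kernel/debug/hwpoison/``
corrupt-pfn
Inject a hwpoison fault at the PFN echoed into this file. This does some early filtering to
avoid corrupting unintended pages in test suites.
unpoison-pfn
Software-unpoison the page at the PFN echoed into this file. This way a page can be
reused again. This only works for Linux-injected failures, not for real memory failures.
Note that these injection interfaces are not stable and might change between kernel versions
corrupt-filter-dev-major, corrupt-filter-dev-minor
Only handle memory failures for pages associated with the file system defined by the block device major/minor. -1U is the
wildcard value. This should only be used for testing with artificial injection.
corrupt-filter-memcg
Limit injection to pages owned by the memgroup. Specified by the inode number of the memcg.
Example::
mkdir /sys/fs/cgroup/mem/hwpoison
usemem -m 100 -s 1000 &
echo `jobs -p` > /sys/fs/cgroup/mem/hwpoison/tasks
memcg_ino=$(ls -id /sys/fs/cgroup/mem/hwpoison | cut -f1 -d' ')
echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg
page-types -p `pidof init` --hwpoison # shall do nothing
page-types -p `pidof usemem` --hwpoison # poison its pages
corrupt-filter-flags-mask, corrupt-filter-flags-value
When specified, only poison pages if ((page_flags & mask) == value).
This allows stress testing of many kinds of pages. The page_flags are the same as in
/proc/kpageflags. The flag bits are defined in include/linux/kernel-page-flags.h and
documented in Documentation/admin-guide/mm/pagemap.rst.
* Architecture-specific MCE injector
x86 has mce-inject, mce-test
Some portable hwpoison test programs are in mce-test, see below.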
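The corrupt-filter-flags-mask / corrupt-filter-flags-value condition described in the list above, ((page_flags & mask) == value), is easy to get wrong by one bit, so a small Python sketch may help. The function name and the bit positions used in the example are hypothetical; the real bit numbers are the ones defined in include/linux/kernel-page-flags.h.

```python
def should_poison(page_flags: int, mask: int, value: int) -> bool:
    """Return True if the page passes the flags filter."""
    return (page_flags & mask) == value


if __name__ == "__main__":
    # Hypothetical bit positions, for illustration only.
    FLAG_A = 1 << 5
    FLAG_B = 1 << 4
    mask = FLAG_A | FLAG_B   # look at both bits
    value = FLAG_A           # require A set and B clear
    print(should_poison(FLAG_A, mask, value))           # A only
    print(should_poison(FLAG_A | FLAG_B, mask, value))  # A and B
    print(should_poison(FLAG_A | 1 << 2, mask, value))  # unmasked bits ignored
```

Bits outside the mask never affect the result, while every bit inside the mask must match the value exactly.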
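References
==========
http://halobates.de/mce-lc09-2.pdf
Overview presentation from LinuxCon 09
git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
Test suite (hwpoison-specific portable tests in tsrc)
git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
x86-specific injector
Limitations
===========
- Not all page types are supported, and never will be. Most kernel-internal objects cannot be
recovered; only LRU pages for now.
---
Andi Kleen, Oct 2009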
.. SPDX-License-Identifier: GPL-2.0
===========
Boot Memory
===========