Commit c76c2230 authored by David S. Miller

Merge branch 'net-ReST-convert'

Mauro Carvalho Chehab says:

====================
net: manually convert files to ReST format - part 1

There are very few documents upstream that haven't been converted to ReST yet.

This series converts part of the networking text files into ReST.
It is part of a bigger set of patches, which was split into parts
in order to make the reviewing task easier.

The full series (including these patches) is at:

	https://git.linuxtv.org/mchehab/experimental.git/log/?h=net-docs

And the documents, converted to HTML via the build system,
are at:

	https://www.infradead.org/~mchehab/kernel_docs/networking/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 790ab249 b9dd2bea
...@@ -356,7 +356,7 @@ ...@@ -356,7 +356,7 @@
shot down by NMI shot down by NMI
autoconf= [IPV6] autoconf= [IPV6]
See Documentation/networking/ipv6.txt. See Documentation/networking/ipv6.rst.
show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
Limit apic dumping. The parameter defines the maximal Limit apic dumping. The parameter defines the maximal
...@@ -831,7 +831,7 @@ ...@@ -831,7 +831,7 @@
decnet.addr= [HW,NET] decnet.addr= [HW,NET]
Format: <area>[,<node>] Format: <area>[,<node>]
See also Documentation/networking/decnet.txt. See also Documentation/networking/decnet.rst.
default_hugepagesz= default_hugepagesz=
[same as hugepagesz=] The size of the default [same as hugepagesz=] The size of the default
...@@ -872,7 +872,7 @@ ...@@ -872,7 +872,7 @@
miss to occur. miss to occur.
disable= [IPV6] disable= [IPV6]
See Documentation/networking/ipv6.txt. See Documentation/networking/ipv6.rst.
hardened_usercopy= hardened_usercopy=
[KNL] Under CONFIG_HARDENED_USERCOPY, whether [KNL] Under CONFIG_HARDENED_USERCOPY, whether
...@@ -912,7 +912,7 @@ ...@@ -912,7 +912,7 @@
to workaround buggy firmware. to workaround buggy firmware.
disable_ipv6= [IPV6] disable_ipv6= [IPV6]
See Documentation/networking/ipv6.txt. See Documentation/networking/ipv6.rst.
disable_mtrr_cleanup [X86] disable_mtrr_cleanup [X86]
The kernel tries to adjust MTRR layout from continuous The kernel tries to adjust MTRR layout from continuous
...@@ -4910,7 +4910,7 @@ ...@@ -4910,7 +4910,7 @@
Set the number of tcp_metrics_hash slots. Set the number of tcp_metrics_hash slots.
Default value is 8192 or 16384 depending on total Default value is 8192 or 16384 depending on total
ram pages. This is used to specify the TCP metrics ram pages. This is used to specify the TCP metrics
cache size. See Documentation/networking/ip-sysctl.txt cache size. See Documentation/networking/ip-sysctl.rst
"tcp_no_metrics_save" section for more details. "tcp_no_metrics_save" section for more details.
tdfx= [HW,DRM] tdfx= [HW,DRM]
......
...@@ -353,8 +353,8 @@ socket's buffer. It will not take effect unless PF_UNIX flag is specified. ...@@ -353,8 +353,8 @@ socket's buffer. It will not take effect unless PF_UNIX flag is specified.
3. /proc/sys/net/ipv4 - IPV4 settings 3. /proc/sys/net/ipv4 - IPV4 settings
------------------------------------- -------------------------------------
Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for Please see: Documentation/networking/ip-sysctl.rst and
descriptions of these entries. Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
4. Appletalk 4. Appletalk
......
...@@ -7,7 +7,7 @@ Filter) facility, with a focus on the extended BPF version (eBPF). ...@@ -7,7 +7,7 @@ Filter) facility, with a focus on the extended BPF version (eBPF).
This kernel side documentation is still work in progress. The main This kernel side documentation is still work in progress. The main
textual documentation is (for historical reasons) described in textual documentation is (for historical reasons) described in
`Documentation/networking/filter.txt`_, which describe both classical `Documentation/networking/filter.rst`_, which describe both classical
and extended BPF instruction-set. and extended BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_ The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture. that goes into great technical depth about the BPF Architecture.
...@@ -59,7 +59,7 @@ Testing and debugging BPF ...@@ -59,7 +59,7 @@ Testing and debugging BPF
.. Links: .. Links:
.. _Documentation/networking/filter.txt: ../networking/filter.txt .. _Documentation/networking/filter.rst: ../networking/filter.txt
.. _man-pages: https://www.kernel.org/doc/man-pages/ .. _man-pages: https://www.kernel.org/doc/man-pages/
.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html .. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html
.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/ .. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/
.. SPDX-License-Identifier: GPL-2.0
==============
6pack Protocol
==============
This is the 6pack-mini-HOWTO, written by This is the 6pack-mini-HOWTO, written by
Andreas Könsgen DG3KQ Andreas Könsgen DG3KQ
Internet: ajk@comnets.uni-bremen.de
AMPR-net: dg3kq@db0pra.ampr.org :Internet: ajk@comnets.uni-bremen.de
AX.25: dg3kq@db0ach.#nrw.deu.eu :AMPR-net: dg3kq@db0pra.ampr.org
:AX.25: dg3kq@db0ach.#nrw.deu.eu
Last update: April 7, 1998 Last update: April 7, 1998
1. What is 6pack, and what are the advantages to KISS? 1. What is 6pack, and what are the advantages to KISS?
======================================================
6pack is a transmission protocol for data exchange between the PC and 6pack is a transmission protocol for data exchange between the PC and
the TNC over a serial line. It can be used as an alternative to KISS. the TNC over a serial line. It can be used as an alternative to KISS.
6pack has two major advantages: 6pack has two major advantages:
- The PC is given full control over the radio - The PC is given full control over the radio
channel. Special control data is exchanged between the PC and the TNC so channel. Special control data is exchanged between the PC and the TNC so
that the PC knows at any time if the TNC is receiving data, if a TNC that the PC knows at any time if the TNC is receiving data, if a TNC
buffer underrun or overrun has occurred, if the PTT is buffer underrun or overrun has occurred, if the PTT is
set and so on. This control data is processed at a higher priority than set and so on. This control data is processed at a higher priority than
normal data, so a data stream can be interrupted at any time to issue an normal data, so a data stream can be interrupted at any time to issue an
important event. This helps to improve the channel access and timing important event. This helps to improve the channel access and timing
algorithms as everything is computed in the PC. It would even be possible algorithms as everything is computed in the PC. It would even be possible
to experiment with something completely different from the known CSMA and to experiment with something completely different from the known CSMA and
DAMA channel access methods. DAMA channel access methods.
This kind of real-time control is especially important to supply several This kind of real-time control is especially important to supply several
TNCs that are connected between each other and the PC by a daisy chain TNCs that are connected between each other and the PC by a daisy chain
...@@ -36,6 +45,7 @@ More details about 6pack are described in the file 6pack.ps that is located ...@@ -36,6 +45,7 @@ More details about 6pack are described in the file 6pack.ps that is located
in the doc directory of the AX.25 utilities package. in the doc directory of the AX.25 utilities package.
2. Who has developed the 6pack protocol? 2. Who has developed the 6pack protocol?
========================================
The 6pack protocol has been developed by Ekki Plicht DF4OR, Henning Rech The 6pack protocol has been developed by Ekki Plicht DF4OR, Henning Rech
DF9IC and Gunter Jost DK7WJ. A driver for 6pack, written by Gunter Jost and DF9IC and Gunter Jost DK7WJ. A driver for 6pack, written by Gunter Jost and
...@@ -44,12 +54,14 @@ They have also written a firmware for TNCs to perform the 6pack ...@@ -44,12 +54,14 @@ They have also written a firmware for TNCs to perform the 6pack
protocol (see section 4 below). protocol (see section 4 below).
3. Where can I get the latest version of 6pack for LinuX? 3. Where can I get the latest version of 6pack for LinuX?
=========================================================
At the moment, the 6pack stuff can obtained via anonymous ftp from At the moment, the 6pack stuff can obtained via anonymous ftp from
db0bm.automation.fh-aachen.de. In the directory /incoming/dg3kq, db0bm.automation.fh-aachen.de. In the directory /incoming/dg3kq,
there is a file named 6pack.tgz. there is a file named 6pack.tgz.
4. Preparing the TNC for 6pack operation 4. Preparing the TNC for 6pack operation
========================================
To be able to use 6pack, a special firmware for the TNC is needed. The EPROM To be able to use 6pack, a special firmware for the TNC is needed. The EPROM
of a newly bought TNC does not contain 6pack, so you will have to of a newly bought TNC does not contain 6pack, so you will have to
...@@ -75,12 +87,14 @@ and the status LED are lit for about a second if the firmware initialises ...@@ -75,12 +87,14 @@ and the status LED are lit for about a second if the firmware initialises
the TNC correctly. the TNC correctly.
5. Building and installing the 6pack driver 5. Building and installing the 6pack driver
===========================================
The driver has been tested with kernel version 2.1.90. Use with older The driver has been tested with kernel version 2.1.90. Use with older
kernels may lead to a compilation error because the interface to a kernel kernels may lead to a compilation error because the interface to a kernel
function has been changed in the 2.1.8x kernels. function has been changed in the 2.1.8x kernels.
How to turn on 6pack support: How to turn on 6pack support:
=============================
- In the linux kernel configuration program, select the code maturity level - In the linux kernel configuration program, select the code maturity level
options menu and turn on the prompting for development drivers. options menu and turn on the prompting for development drivers.
...@@ -94,27 +108,28 @@ To use the driver, the kissattach program delivered with the AX.25 utilities ...@@ -94,27 +108,28 @@ To use the driver, the kissattach program delivered with the AX.25 utilities
has to be modified. has to be modified.
- Do a cd to the directory that holds the kissattach sources. Edit the - Do a cd to the directory that holds the kissattach sources. Edit the
kissattach.c file. At the top, insert the following lines: kissattach.c file. At the top, insert the following lines::
#ifndef N_6PACK
#define N_6PACK (N_AX25+1)
#endif
#ifndef N_6PACK Then find the line:
#define N_6PACK (N_AX25+1)
#endif
Then find the line int disc = N_AX25;
int disc = N_AX25;
and replace N_AX25 by N_6PACK. and replace N_AX25 by N_6PACK.
- Recompile kissattach. Rename it to spattach to avoid confusions. - Recompile kissattach. Rename it to spattach to avoid confusions.
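The two source edits described above (insert the ``N_6PACK`` guard at the top, then switch ``disc`` from ``N_AX25`` to ``N_6PACK``) could be applied mechanically. The following is only a minimal sketch: ``patch_kissattach`` is a hypothetical helper name, and it assumes the source contains the line exactly as quoted in the HOWTO. Recompiling and renaming the binary to spattach remain manual steps.

```python
"""Sketch: apply the two kissattach.c edits from the HOWTO to source text."""

# Lines the HOWTO says to insert at the top of kissattach.c.
N_6PACK_GUARD = (
    "#ifndef N_6PACK\n"
    "#define N_6PACK (N_AX25+1)\n"
    "#endif\n"
)


def patch_kissattach(source: str) -> str:
    """Return kissattach.c text with the 6pack line discipline selected.

    Hypothetical helper; assumes the source contains the exact line
    ``int disc = N_AX25;`` as quoted in the HOWTO.
    """
    old_line = "int disc = N_AX25;"
    if old_line not in source:
        raise ValueError("expected line not found: " + old_line)
    # Replace only the first occurrence so unrelated uses of N_AX25 survive.
    patched = source.replace(old_line, "int disc = N_6PACK;", 1)
    if "#define N_6PACK" in patched:
        return patched  # guard already present, do not insert it twice
    return N_6PACK_GUARD + patched
```

Raising an error when the expected line is missing is a deliberate choice here: a silent no-op would leave a binary that still attaches the KISS line discipline.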
Installing the driver: Installing the driver:
----------------------
- Do an insmod 6pack. Look at your /var/log/messages file to check if the - Do an insmod 6pack. Look at your /var/log/messages file to check if the
module has printed its initialization message. module has printed its initialization message.
- Do a spattach as you would launch kissattach when starting a KISS port. - Do a spattach as you would launch kissattach when starting a KISS port.
Check if the kernel prints the message '6pack: TNC found'. Check if the kernel prints the message '6pack: TNC found'.
- From here, everything should work as if you were setting up a KISS port. - From here, everything should work as if you were setting up a KISS port.
The only difference is that the network device that represents The only difference is that the network device that represents
...@@ -138,6 +153,7 @@ from the PC to the TNC over the serial line, the status LED if data is ...@@ -138,6 +153,7 @@ from the PC to the TNC over the serial line, the status LED if data is
sent to the PC. sent to the PC.
6. Known problems 6. Known problems
=================
When testing the driver with 2.0.3x kernels and When testing the driver with 2.0.3x kernels and
operating with data rates on the radio channel of 9600 Baud or higher, operating with data rates on the radio channel of 9600 Baud or higher,
......
Altera Triple-Speed Ethernet MAC driver .. SPDX-License-Identifier: GPL-2.0
Copyright (C) 2008-2014 Altera Corporation .. include:: <isonum.txt>
=======================================
Altera Triple-Speed Ethernet MAC driver
=======================================
Copyright |copy| 2008-2014 Altera Corporation
This is the driver for the Altera Triple-Speed Ethernet (TSE) controllers This is the driver for the Altera Triple-Speed Ethernet (TSE) controllers
using the SGDMA and MSGDMA soft DMA IP components. The driver uses the using the SGDMA and MSGDMA soft DMA IP components. The driver uses the
...@@ -46,23 +52,33 @@ Jumbo frames are not supported at this time. ...@@ -46,23 +52,33 @@ Jumbo frames are not supported at this time.
The driver limits PHY operations to 10/100Mbps, and has not yet been fully The driver limits PHY operations to 10/100Mbps, and has not yet been fully
tested for 1Gbps. This support will be added in a future maintenance update. tested for 1Gbps. This support will be added in a future maintenance update.
1) Kernel Configuration 1. Kernel Configuration
=======================
The kernel configuration option is ALTERA_TSE: The kernel configuration option is ALTERA_TSE:
Device Drivers ---> Network device support ---> Ethernet driver support ---> Device Drivers ---> Network device support ---> Ethernet driver support --->
Altera Triple-Speed Ethernet MAC support (ALTERA_TSE) Altera Triple-Speed Ethernet MAC support (ALTERA_TSE)
2) Driver parameters list: 2. Driver parameters list
debug: message level (0: no output, 16: all); =========================
dma_rx_num: Number of descriptors in the RX list (default is 64);
dma_tx_num: Number of descriptors in the TX list (default is 64). - debug: message level (0: no output, 16: all);
- dma_rx_num: Number of descriptors in the RX list (default is 64);
- dma_tx_num: Number of descriptors in the TX list (default is 64).
3. Command line options
=======================
Driver parameters can be also passed in command line by using::
3) Command line options
Driver parameters can be also passed in command line by using:
altera_tse=dma_rx_num:128,dma_tx_num:512 altera_tse=dma_rx_num:128,dma_tx_num:512
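To make the shape of that option string concrete, here is a rough sketch that splits it into name/value pairs. It illustrates the ``name:value,name:value`` layout only. ``parse_altera_tse`` is a hypothetical helper, and the driver's own parsing code is not part of this diff.

```python
def parse_altera_tse(option: str) -> dict:
    """Split an ``altera_tse=`` value into a {name: int} mapping.

    Hypothetical helper for illustration only; it mirrors the layout of
    the documented example, not the driver's actual parser.
    """
    prefix = "altera_tse="
    if option.startswith(prefix):
        option = option[len(prefix):]
    params = {}
    for item in option.split(","):
        if not item:
            continue  # tolerate empty fields such as a trailing comma
        name, sep, value = item.partition(":")
        if not sep:
            raise ValueError("expected name:value, got " + repr(item))
        params[name] = int(value, 0)  # base 0 accepts decimal and 0x.. hex
    return params
```

Applied to the example above, this yields ``dma_rx_num`` set to 128 and ``dma_tx_num`` set to 512, overriding the documented defaults of 64.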
4) Driver information and notes 4. Driver information and notes
===============================
4.1) Transmit process 4.1. Transmit process
---------------------
When the driver's transmit routine is called by the kernel, it sets up a When the driver's transmit routine is called by the kernel, it sets up a
transmit descriptor by calling the underlying DMA transmit routine (SGDMA or transmit descriptor by calling the underlying DMA transmit routine (SGDMA or
MSGDMA), and initiates a transmit operation. Once the transmit is complete, an MSGDMA), and initiates a transmit operation. Once the transmit is complete, an
...@@ -70,7 +86,8 @@ interrupt is driven by the transmit DMA logic. The driver handles the transmit ...@@ -70,7 +86,8 @@ interrupt is driven by the transmit DMA logic. The driver handles the transmit
completion in the context of the interrupt handling chain by recycling completion in the context of the interrupt handling chain by recycling
resource required to send and track the requested transmit operation. resource required to send and track the requested transmit operation.
4.2) Receive process 4.2. Receive process
--------------------
The driver will post receive buffers to the receive DMA logic during driver The driver will post receive buffers to the receive DMA logic during driver
initialization. Receive buffers may or may not be queued depending upon the initialization. Receive buffers may or may not be queued depending upon the
underlying DMA logic (MSGDMA is able queue receive buffers, SGDMA is not able underlying DMA logic (MSGDMA is able queue receive buffers, SGDMA is not able
...@@ -79,34 +96,39 @@ received, the DMA logic generates an interrupt. The driver handles a receive ...@@ -79,34 +96,39 @@ received, the DMA logic generates an interrupt. The driver handles a receive
interrupt by obtaining the DMA receive logic status, reaping receive interrupt by obtaining the DMA receive logic status, reaping receive
completions until no more receive completions are available. completions until no more receive completions are available.
4.3) Interrupt Mitigation 4.3. Interrupt Mitigation
-------------------------
The driver is able to mitigate the number of its DMA interrupts The driver is able to mitigate the number of its DMA interrupts
using NAPI for receive operations. Interrupt mitigation is not yet supported using NAPI for receive operations. Interrupt mitigation is not yet supported
for transmit operations, but will be added in a future maintenance release. for transmit operations, but will be added in a future maintenance release.
4.4) Ethtool support 4.4) Ethtool support
--------------------
Ethtool is supported. Driver statistics and internal errors can be taken using: Ethtool is supported. Driver statistics and internal errors can be taken using:
ethtool -S ethX command. It is possible to dump registers etc. ethtool -S ethX command. It is possible to dump registers etc.
4.5) PHY Support 4.5) PHY Support
----------------
The driver is compatible with PAL to work with PHY and GPHY devices. The driver is compatible with PAL to work with PHY and GPHY devices.
4.7) List of source files: 4.7) List of source files:
o Kconfig --------------------------
o Makefile - Kconfig
o altera_tse_main.c: main network device driver - Makefile
o altera_tse_ethtool.c: ethtool support - altera_tse_main.c: main network device driver
o altera_tse.h: private driver structure and common definitions - altera_tse_ethtool.c: ethtool support
o altera_msgdma.h: MSGDMA implementation function definitions - altera_tse.h: private driver structure and common definitions
o altera_sgdma.h: SGDMA implementation function definitions - altera_msgdma.h: MSGDMA implementation function definitions
o altera_msgdma.c: MSGDMA implementation - altera_sgdma.h: SGDMA implementation function definitions
o altera_sgdma.c: SGDMA implementation - altera_msgdma.c: MSGDMA implementation
o altera_sgdmahw.h: SGDMA register and descriptor definitions - altera_sgdma.c: SGDMA implementation
o altera_msgdmahw.h: MSGDMA register and descriptor definitions - altera_sgdmahw.h: SGDMA register and descriptor definitions
o altera_utils.c: Driver utility functions - altera_msgdmahw.h: MSGDMA register and descriptor definitions
o altera_utils.h: Driver utility function definitions - altera_utils.c: Driver utility functions
- altera_utils.h: Driver utility function definitions
5) Debug Information
5. Debug Information
====================
The driver exports debug information such as internal statistics, The driver exports debug information such as internal statistics,
debug information, MAC and DMA registers etc. debug information, MAC and DMA registers etc.
...@@ -118,17 +140,18 @@ or sees the MAC registers: e.g. using: ethtool -d ethX ...@@ -118,17 +140,18 @@ or sees the MAC registers: e.g. using: ethtool -d ethX
The developer can also use the "debug" module parameter to get The developer can also use the "debug" module parameter to get
further debug information. further debug information.
6) Statistics Support 6. Statistics Support
=====================
The controller and driver support a mix of IEEE standard defined statistics, The controller and driver support a mix of IEEE standard defined statistics,
RFC defined statistics, and driver or Altera defined statistics. The four RFC defined statistics, and driver or Altera defined statistics. The four
specifications containing the standard definitions for these statistics are specifications containing the standard definitions for these statistics are
as follows: as follows:
o IEEE 802.3-2012 - IEEE Standard for Ethernet. - IEEE 802.3-2012 - IEEE Standard for Ethernet.
o RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt. - RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt.
o RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt. - RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt.
o Altera Triple Speed Ethernet User Guide, found at http://www.altera.com - Altera Triple Speed Ethernet User Guide, found at http://www.altera.com
The statistics supported by the TSE and the device driver are as follows: The statistics supported by the TSE and the device driver are as follows:
......
---------------------------------------------------------------------------- .. SPDX-License-Identifier: GPL-2.0
NOTE: See also arcnet-hardware.txt in this directory for jumper-setting
and cabling information if you're like many of us and didn't happen to get a ======
manual with your ARCnet card. ARCnet
---------------------------------------------------------------------------- ======
.. note::
See also arcnet-hardware.txt in this directory for jumper-setting
and cabling information if you're like many of us and didn't happen to get a
manual with your ARCnet card.
Since no one seems to listen to me otherwise, perhaps a poem will get your Since no one seems to listen to me otherwise, perhaps a poem will get your
attention: attention::
This driver's getting fat and beefy, This driver's getting fat and beefy,
But my cat is still named Fifi. But my cat is still named Fifi.
...@@ -24,28 +31,21 @@ Come on, be a sport! Send me a success report! ...@@ -24,28 +31,21 @@ Come on, be a sport! Send me a success report!
(hey, that was even better than my original poem... this is getting bad!) (hey, that was even better than my original poem... this is getting bad!)
-------- .. warning::
WARNING:
--------
If you don't e-mail me about your success/failure soon, I may be forced to
start SINGING. And we don't want that, do we?
(You know, it might be argued that I'm pushing this point a little too much. If you don't e-mail me about your success/failure soon, I may be forced to
If you think so, why not flame me in a quick little e-mail? Please also start SINGING. And we don't want that, do we?
include the type of card(s) you're using, software, size of network, and
whether it's working or not.)
My e-mail address is: apenwarr@worldvisions.ca (You know, it might be argued that I'm pushing this point a little too much.
If you think so, why not flame me in a quick little e-mail? Please also
include the type of card(s) you're using, software, size of network, and
whether it's working or not.)
My e-mail address is: apenwarr@worldvisions.ca
---------------------------------------------------------------------------
These are the ARCnet drivers for Linux. These are the ARCnet drivers for Linux.
This new release (2.91) has been put together by David Woodhouse
This new release (2.91) has been put together by David Woodhouse
<dwmw2@infradead.org>, in an attempt to tidy up the driver after adding support <dwmw2@infradead.org>, in an attempt to tidy up the driver after adding support
for yet another chipset. Now the generic support has been separated from the for yet another chipset. Now the generic support has been separated from the
individual chipset drivers, and the source files aren't quite so packed with individual chipset drivers, and the source files aren't quite so packed with
...@@ -62,12 +62,13 @@ included and seems to be working fine! ...@@ -62,12 +62,13 @@ included and seems to be working fine!
Where do I discuss these drivers? Where do I discuss these drivers?
--------------------------------- ---------------------------------
Tomasz has been so kind as to set up a new and improved mailing list. Tomasz has been so kind as to set up a new and improved mailing list.
Subscribe by sending a message with the BODY "subscribe linux-arcnet YOUR Subscribe by sending a message with the BODY "subscribe linux-arcnet YOUR
REAL NAME" to listserv@tichy.ch.uj.edu.pl. Then, to submit messages to the REAL NAME" to listserv@tichy.ch.uj.edu.pl. Then, to submit messages to the
list, mail to linux-arcnet@tichy.ch.uj.edu.pl. list, mail to linux-arcnet@tichy.ch.uj.edu.pl.
There are archives of the mailing list at: There are archives of the mailing list at:
http://epistolary.org/mailman/listinfo.cgi/arcnet http://epistolary.org/mailman/listinfo.cgi/arcnet
The people on linux-net@vger.kernel.org (now defunct, replaced by The people on linux-net@vger.kernel.org (now defunct, replaced by
...@@ -80,17 +81,20 @@ Other Drivers and Info ...@@ -80,17 +81,20 @@ Other Drivers and Info
---------------------- ----------------------
You can try my ARCNET page on the World Wide Web at: You can try my ARCNET page on the World Wide Web at:
http://www.qis.net/~jschmitz/arcnet/
http://www.qis.net/~jschmitz/arcnet/
Also, SMC (one of the companies that makes ARCnet cards) has a WWW site you Also, SMC (one of the companies that makes ARCnet cards) has a WWW site you
might be interested in, which includes several drivers for various cards might be interested in, which includes several drivers for various cards
including ARCnet. Try: including ARCnet. Try:
http://www.smc.com/ http://www.smc.com/
Performance Technologies makes various network software that supports Performance Technologies makes various network software that supports
ARCnet: ARCnet:
http://www.perftech.com/ or ftp to ftp.perftech.com. http://www.perftech.com/ or ftp to ftp.perftech.com.
Novell makes a networking stack for DOS which includes ARCnet drivers. Try Novell makes a networking stack for DOS which includes ARCnet drivers. Try
FTPing to ftp.novell.com. FTPing to ftp.novell.com.
...@@ -99,19 +103,20 @@ one you'll want to use with ARCnet cards) from ...@@ -99,19 +103,20 @@ one you'll want to use with ARCnet cards) from
oak.oakland.edu:/simtel/msdos/pktdrvr. It won't work perfectly on a 386+ oak.oakland.edu:/simtel/msdos/pktdrvr. It won't work perfectly on a 386+
without patches, though, and also doesn't like several cards. Fixed without patches, though, and also doesn't like several cards. Fixed
versions are available on my WWW page, or via e-mail if you don't have WWW versions are available on my WWW page, or via e-mail if you don't have WWW
access. access.
Installing the Driver Installing the Driver
--------------------- ---------------------
All you will need to do in order to install the driver is: All you will need to do in order to install the driver is::
make config make config
(be sure to choose ARCnet in the network devices (be sure to choose ARCnet in the network devices
and at least one chipset driver.) and at least one chipset driver.)
make clean make clean
make zImage make zImage
If you obtained this ARCnet package as an upgrade to the ARCnet driver in If you obtained this ARCnet package as an upgrade to the ARCnet driver in
your current kernel, you will need to first copy arcnet.c over the one in your current kernel, you will need to first copy arcnet.c over the one in
the linux/drivers/net directory. the linux/drivers/net directory.
...@@ -125,10 +130,12 @@ There are four chipset options: ...@@ -125,10 +130,12 @@ There are four chipset options:
This is the normal ARCnet card, which you've probably got. This is the only This is the normal ARCnet card, which you've probably got. This is the only
chipset driver which will autoprobe if not told where the card is. chipset driver which will autoprobe if not told where the card is.
It following options on the command line: It following options on the command line::
com90xx=[<io>[,<irq>[,<shmem>]]][,<name>] | <name> com90xx=[<io>[,<irq>[,<shmem>]]][,<name>] | <name>
If you load the chipset support as a module, the options are: If you load the chipset support as a module, the options are::
io=<io> irq=<irq> shmem=<shmem> device=<name> io=<io> irq=<irq> shmem=<shmem> device=<name>
To disable the autoprobe, just specify "com90xx=" on the kernel command line. To disable the autoprobe, just specify "com90xx=" on the kernel command line.
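The bracketed ``com90xx=`` syntax above can be read as "up to three numeric fields, then an optional name". The sketch below encodes that reading. It is an assumption drawn from the syntax line, not from the driver source, and ``parse_com90xx`` is a hypothetical helper.

```python
def parse_com90xx(value: str) -> dict:
    """Parse the part after ``com90xx=`` into io/irq/shmem/name fields.

    Assumption: leading numeric fields are io, irq and shmem in that
    order, and a trailing non-numeric field is the device name.
    """
    result = {"io": None, "irq": None, "shmem": None, "name": None}
    fields = [f for f in value.split(",") if f]
    numeric_keys = ["io", "irq", "shmem"]
    for index, field in enumerate(fields):
        try:
            number = int(field, 0)  # base 0 accepts decimal and 0x.. hex
        except ValueError:
            # A non-numeric field is taken as the name; it must be last.
            if index != len(fields) - 1:
                raise ValueError("name must be the last field")
            result["name"] = field
            break
        if index >= len(numeric_keys):
            raise ValueError("too many numeric fields")
        result[numeric_keys[index]] = number
    return result
```

An empty value (the ``com90xx=`` case) leaves every field unset, which matches the documented way to disable autoprobing. A lone name leaves the numeric fields unset so the card can still be probed.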
...@@ -136,14 +143,17 @@ To specify the name alone, but allow autoprobe, just put "com90xx=<name>" ...@@ -136,14 +143,17 @@ To specify the name alone, but allow autoprobe, just put "com90xx=<name>"
2. ARCnet COM20020 chipset. 2. ARCnet COM20020 chipset.
This is the new chipset from SMC with support for promiscuous mode (packet This is the new chipset from SMC with support for promiscuous mode (packet
sniffing), extra diagnostic information, etc. Unfortunately, there is no sniffing), extra diagnostic information, etc. Unfortunately, there is no
sensible method of autoprobing for these cards. You must specify the I/O sensible method of autoprobing for these cards. You must specify the I/O
address on the kernel command line. address on the kernel command line.
The command line options are:
The command line options are::
com20020=<io>[,<irq>[,<node_ID>[,backplane[,CKP[,timeout]]]]][,name] com20020=<io>[,<irq>[,<node_ID>[,backplane[,CKP[,timeout]]]]][,name]
If you load the chipset support as a module, the options are: If you load the chipset support as a module, the options are::
io=<io> irq=<irq> node=<node_ID> backplane=<backplane> clock=<CKP> io=<io> irq=<irq> node=<node_ID> backplane=<backplane> clock=<CKP>
timeout=<timeout> device=<name> timeout=<timeout> device=<name>
...@@ -160,8 +170,10 @@ you have a card which doesn't support shared memory, or (strangely) in case ...@@ -160,8 +170,10 @@ you have a card which doesn't support shared memory, or (strangely) in case
you have so many ARCnet cards in your machine that you run out of shmem slots. you have so many ARCnet cards in your machine that you run out of shmem slots.
If you don't give the IO address on the kernel command line, then the driver If you don't give the IO address on the kernel command line, then the driver
will not find the card. will not find the card.
The command line options are:
com90io=<io>[,<irq>][,<name>] The command line options are::
com90io=<io>[,<irq>][,<name>]
If you load the chipset support as a module, the options are: If you load the chipset support as a module, the options are:
io=<io> irq=<irq> device=<name> io=<io> irq=<irq> device=<name>
...@@ -169,44 +181,49 @@ If you load the chipset support as a module, the options are: ...@@ -169,44 +181,49 @@ If you load the chipset support as a module, the options are:
4. ARCnet RIM I cards. 4. ARCnet RIM I cards.
These are COM90xx chips which are _completely_ memory mapped. The support for These are COM90xx chips which are _completely_ memory mapped. The support for
these is not tested. If you have one, please mail the author with a success these is not tested. If you have one, please mail the author with a success
report. All options must be specified, except the device name. report. All options must be specified, except the device name.
Command line options: Command line options::
arcrimi=<shmem>,<irq>,<node_ID>[,<name>] arcrimi=<shmem>,<irq>,<node_ID>[,<name>]
If you load the chipset support as a module, the options are: If you load the chipset support as a module, the options are::
shmem=<shmem> irq=<irq> node=<node_ID> device=<name> shmem=<shmem> irq=<irq> node=<node_ID> device=<name>
Loadable Module Support
-----------------------

Configure and rebuild Linux. When asked, answer 'm' to "Generic ARCnet
support" and to support for your ARCnet chipset if you want to use the
loadable module. You can also say 'y' to "Generic ARCnet support" and 'm'
to the chipset support if you wish.

::

    make config
    make clean
    make zImage
    make modules

If you're using a loadable module, you need to use insmod to load it, and
you can specify various characteristics of your card on the command
line. (In recent versions of the driver, autoprobing is much more reliable
and works as a module, so most of this is now unnecessary.)

For example::

    cd /usr/src/linux/modules
    insmod arcnet.o
    insmod com90xx.o
    insmod com20020.o io=0x2e0 device=eth1
Using the Driver
----------------

If you build your kernel with ARCnet COM90xx support included, it should
probe for your card automatically when you boot. If you use a different
chipset driver compiled into the kernel, you must give the necessary options
on the kernel command line, as detailed above.
...@@ -224,69 +241,78 @@ Multiple Cards in One Computer ...@@ -224,69 +241,78 @@ Multiple Cards in One Computer
------------------------------

Linux has pretty good support for this now, but since I've been busy, the
ARCnet driver has somewhat suffered in this respect. COM90xx support, if
compiled into the kernel, will (try to) autodetect all the installed cards.

If you have other cards, with support compiled into the kernel, then you can
just repeat the options on the kernel command line, e.g.::

    LILO: linux com20020=0x2e0 com20020=0x380 com90io=0x260

If you have the chipset support built as a loadable module, then you need to
do something like this::

    insmod -o arc0 com90xx
    insmod -o arc1 com20020 io=0x2e0
    insmod -o arc2 com90xx

The ARCnet drivers will now sort out their names automatically.
How do I get it to work with...?
--------------------------------

NFS:
    Should be fine linux->linux, just pretend you're using Ethernet cards.
    oak.oakland.edu:/simtel/msdos/nfs has some nice DOS clients. There
    is also a DOS-based NFS server called SOSS. It doesn't multitask
    quite the way Linux does (actually, it doesn't multitask AT ALL) but
    you never know what you might need.

    With AmiTCP (and possibly others), you may need to set the following
    options in your Amiga nfstab: MD 1024 MR 1024 MW 1024
    (Thanks to Christian Gottschling <ferksy@indigo.tng.oche.de>
    for this.)

    Probably these refer to maximum NFS data/read/write block sizes. I
    don't know why the defaults on the Amiga didn't work; write to me if
    you know more.

DOS:
    If you're using the freeware arcether.com, you might want to install
    the driver patch from my web page. It helps with PC/TCP, and also
    can get arcether to load if it timed out too quickly during
    initialization. In fact, if you use it on a 386+ you REALLY need
    the patch, really.

Windows:
    See DOS :) Trumpet Winsock works fine with either the Novell or
    Arcether client, assuming you remember to load winpkt of course.

LAN Manager and Windows for Workgroups:
    These programs use protocols that
    are incompatible with the Internet standard. They try to pretend
    the cards are Ethernet, and confuse everyone else on the network.

    However, v2.00 and higher of the Linux ARCnet driver supports this
    protocol via the 'arc0e' device. See the section on "Multiprotocol
    Support" for more information.

    Using the freeware Samba server and clients for Linux, you can now
    interface quite nicely with TCP/IP-based WfWg or Lan Manager
    networks.

Windows 95:
    Tools are included with Win95 that let you use either the LANMAN
    style network drivers (NDIS) or Novell drivers (ODI) to handle your
    ARCnet packets. If you use ODI, you'll need to use the 'arc0'
    device with Linux. If you use NDIS, then try the 'arc0e' device.
    See the "Multiprotocol Support" section below if you need arc0e,
    you're completely insane, and/or you need to build some kind of
    hybrid network that uses both encapsulation types.

OS/2:
    I've been told it works under Warp Connect with an ARCnet driver from
    SMC. You need to use the 'arc0e' interface for this. If you get
    the SMC driver to work with the TCP/IP stuff included in the
    "normal" Warp Bonus Pack, let me know.
...@@ -295,7 +321,8 @@ OS/2: I've been told it works under Warp Connect with an ARCnet driver from ...@@ -295,7 +321,8 @@ OS/2: I've been told it works under Warp Connect with an ARCnet driver from
    which should use the same protocol as WfWg does. I had no luck
    installing it under Warp, however. Please mail me with any results.

NetBSD/AmiTCP:
    These use an old version of the Internet standard ARCnet
    protocol (RFC1051) which is compatible with the Linux driver v2.10
    ALPHA and above using the arc0s device. (See "Multiprotocol ARCnet"
    below.) ** Newer versions of NetBSD apparently support RFC1201.
...@@ -307,16 +334,17 @@ Using Multiprotocol ARCnet ...@@ -307,16 +334,17 @@ Using Multiprotocol ARCnet
The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
"virtual network device":

    ======  ===============================================================
    arc0    RFC1201 protocol, the official Internet standard which just
            happens to be 100% compatible with Novell's TRXNET driver.
            Version 1.00 of the ARCnet driver supported _only_ this
            protocol. arc0 is the fastest of the three protocols (for
            whatever reason), and allows larger packets to be used
            because it supports RFC1201 "packet splitting" operations.
            Unless you have a specific need to use a different protocol,
            I strongly suggest that you stick with this one.

    arc0e   "Ethernet-Encapsulation" which sends packets over ARCnet
            that are actually a lot like Ethernet packets, including the
            6-byte hardware addresses. This protocol is compatible with
            Microsoft's NDIS ARCnet driver, like the one in WfWg and
...@@ -328,8 +356,8 @@ The ARCnet driver v2.10 ALPHA supports three protocols, each on its own ...@@ -328,8 +356,8 @@ The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
            fit. arc0e also works slightly more slowly than arc0, for
            reasons yet to be determined. (Probably it's the smaller
            MTU that does it.)

    arc0s   The "[s]imple" RFC1051 protocol is the "previous" Internet
            standard that is completely incompatible with the new
            standard. Some software today, however, continues to
            support the old standard (and only the old standard)
...@@ -338,9 +366,10 @@ The ARCnet driver v2.10 ALPHA supports three protocols, each on its own ...@@ -338,9 +366,10 @@ The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
            smaller than the Internet "requirement," so it's quite
            possible that you may run into problems. It's also slower
            than RFC1201 by about 25%, for the same reason as arc0e.

            The arc0s support was contributed by Tomasz Motylewski
            and modified somewhat by me. Bugs are probably my fault.
    ======  ===============================================================
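
As a rough aid to choosing among the three devices, the table above and the per-system notes earlier can be condensed into a lookup. This is only a sketch: the dictionary keys and the function name are hypothetical labels, and the fallback to arc0 reflects the recommendation above to stick with RFC1201 unless there is a specific need:

```python
# Protocol spoken by each virtual device, as listed in the table above.
DEVICE_PROTOCOL = {
    "arc0": "RFC1201",
    "arc0e": "Ethernet-Encapsulation",
    "arc0s": "RFC1051",
}

# Hypothetical labels for the peer stacks mentioned in this document.
PEER_DEVICE = {
    "novell-odi": "arc0",    # Novell TRXNET/ODI stacks speak RFC1201
    "ms-ndis": "arc0e",      # WfWg, LAN Manager, Win95 NDIS drivers
    "netbsd-old": "arc0s",   # older NetBSD and AmiTCP (RFC1051)
}


def device_for_peer(peer):
    """Return the virtual device to use for a peer; default to arc0."""
    return PEER_DEVICE.get(peer, "arc0")


for peer in ("novell-odi", "ms-ndis", "netbsd-old", "linux"):
    dev = device_for_peer(peer)
    print(f"{peer}: {dev} ({DEVICE_PROTOCOL[dev]})")
```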

You can choose not to compile arc0e and arc0s into the driver if you want -
this will save you a bit of memory and avoid confusion when eg. trying to
...@@ -358,19 +387,21 @@ can set up your network then: ...@@ -358,19 +387,21 @@ can set up your network then:
two available protocols. As mentioned above, it's a good idea to use
only arc0 unless you have a good reason (like some other software, ie.
WfWg, that only works with arc0e).

If you need only arc0, then the following commands should get you going::

    ifconfig arc0 MY.IP.ADD.RESS
    route add MY.IP.ADD.RESS arc0
    route add -net SUB.NET.ADD.RESS arc0
    [add other local routes here]

If you need arc0e (and only arc0e), it's a little different::

    ifconfig arc0 MY.IP.ADD.RESS
    ifconfig arc0e MY.IP.ADD.RESS
    route add MY.IP.ADD.RESS arc0e
    route add -net SUB.NET.ADD.RESS arc0e

arc0s works much the same way as arc0e.
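
The single-protocol recipes above differ only in the device name and in the extra `ifconfig arc0` line for the encapsulated devices. A small sketch that generates the command list makes the pattern explicit; the function name is hypothetical, and treating arc0s exactly like arc0e is an assumption based on the sentence above:

```python
def single_protocol_commands(device, ip, subnet):
    """Return the ifconfig/route lines for a one-protocol setup."""
    if device not in ("arc0", "arc0e", "arc0s"):
        raise ValueError(f"unknown ARCnet device: {device}")
    cmds = []
    if device != "arc0":
        # The arc0e example configures arc0 first, then the virtual device.
        cmds.append(f"ifconfig arc0 {ip}")
    cmds.append(f"ifconfig {device} {ip}")
    cmds.append(f"route add {ip} {device}")
    cmds.append(f"route add -net {subnet} {device}")
    return cmds


for line in single_protocol_commands("arc0e", "MY.IP.ADD.RESS", "SUB.NET.ADD.RESS"):
    print(line)
```

The output is meant to be reviewed and pasted into a startup script, not executed blindly.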
...@@ -391,29 +422,32 @@ can set up your network then: ...@@ -391,29 +422,32 @@ can set up your network then:
XT (patience), however, does not have its own Internet IP address and so
I assigned it one on a "private subnet" (as defined by RFC1597).

To start with, take a simple network with just insight and freedom.
Insight needs to:

- talk to freedom via RFC1201 (arc0) protocol, because I like it
  more and it's faster.
- use freedom as its Internet gateway.

That's pretty easy to do. Set up insight like this::

    ifconfig arc0 insight
    route add insight arc0
    route add freedom arc0  /* I would use the subnet here (like I said
                                to in "single protocol" above),
                                but the rest of the subnet
                                unfortunately lies across the PPP
                                link on freedom, which confuses
                                things. */
    route add default gw freedom

And freedom gets configured like so::

    ifconfig arc0 freedom
    route add freedom arc0
    route add insight arc0
    /* and default gateway is configured by pppd */

Great, now insight talks to freedom directly on arc0, and sends packets
to the Internet through freedom. If you didn't know how to do the above,
you should probably stop reading this section now because it only gets
...@@ -425,7 +459,7 @@ can set up your network then: ...@@ -425,7 +459,7 @@ can set up your network then:
Internet. (Recall that patience has a "private IP address" which won't
work on the Internet; that's okay, I configured Linux IP masquerading on
freedom for this subnet).

So patience (necessarily; I don't have another IP number from my
provider) has an IP address on a different subnet than freedom and
insight, but needs to use freedom as an Internet gateway. Worse, most
...@@ -435,53 +469,54 @@ can set up your network then: ...@@ -435,53 +469,54 @@ can set up your network then:
insight, patience WILL send through its default gateway, regardless of
the fact that both freedom and insight (courtesy of the arc0e device)
could understand a direct transmission.

I compensate by giving freedom an extra IP address - aliased 'gatekeeper' -
that is on my private subnet, the same subnet that patience is on. I
then define gatekeeper to be the default gateway for patience.

To configure freedom (in addition to the commands above)::

    ifconfig arc0e gatekeeper
    route add gatekeeper arc0e
    route add patience arc0e

This way, freedom will send all packets for patience through arc0e,
giving its IP address as gatekeeper (on the private subnet). When it
talks to insight or the Internet, it will use its "freedom" Internet IP
address.

You will notice that we haven't configured the arc0e device on insight.
This would work, but is not really necessary, and would require me to
assign insight another special IP number from my private subnet. Since
both insight and patience are using freedom as their default gateway, the
two can already talk to each other.

It's quite fortunate that I set things up like this the first time (cough
cough) because it's really handy when I boot insight into DOS. There, it
runs the Novell ODI protocol stack, which only works with RFC1201 ARCnet.
In this mode it would be impossible for insight to communicate directly
with patience, since the Novell stack is incompatible with Microsoft's
Ethernet-Encap. Without changing any settings on freedom or patience, I
simply set freedom as the default gateway for insight (now in DOS,
remember) and all the forwarding happens "automagically" between the two
hosts that would normally not be able to communicate at all.

For those who like diagrams, I have created two "virtual subnets" on the
same physical ARCnet wire. You can picture it like this::

         [RFC1201 NETWORK]                   [ETHER-ENCAP NETWORK]
     (registered Internet subnet)           (RFC1597 private subnet)

                            (IP Masquerade)
         /---------------\         *            /---------------\
         |               |         *            |               |
         |               +-Freedom-*-Gatekeeper-+               |
         |               |    |    *            |               |
         \-------+-------/    |    *            \-------+-------/
                 |            |                         |
              Insight         |                      Patience
                          (Internet)
...@@ -491,6 +526,7 @@ It works: what now? ...@@ -491,6 +526,7 @@ It works: what now?
Send mail describing your setup, preferably including driver version, kernel
version, ARCnet card model, CPU type, number of systems on your network, and
list of software in use to me at the following address:

    apenwarr@worldvisions.ca

I do send (sometimes automated) replies to all messages I receive. My email
...@@ -525,7 +561,7 @@ this, you should grab the pertinent RFCs. (some are listed near the top of ...@@ -525,7 +561,7 @@ this, you should grab the pertinent RFCs. (some are listed near the top of
arcnet.c). arcdump assumes your card is at 0xD0000. If it isn't, edit the
script.

Buffers 0 and 1 are used for receiving, and Buffers 2 and 3 are for sending.
Ping-pong buffers are implemented both ways.

If your debug level includes D_DURING and you did NOT define SLOW_XMIT_COPY,
...@@ -535,9 +571,11 @@ decides that the driver is broken). During a transmit, unused parts of the ...@@ -535,9 +571,11 @@ decides that the driver is broken). During a transmit, unused parts of the
buffer will be cleared to 0x42 as well. This is to make it easier to figure
out which bytes are being used by a packet.

You can change the debug level without recompiling the kernel by typing::

    ifconfig arc0 down metric 1xxx
    /etc/rc.d/rc.inet1

where "xxx" is the debug level you want. For example, "metric 1015" would put
you at debug level 15. Debug level 7 is currently the default.
...@@ -546,7 +584,7 @@ combination of different debug flags; so debug level 7 is really 1+2+4 or ...@@ -546,7 +584,7 @@ combination of different debug flags; so debug level 7 is really 1+2+4 or
D_NORMAL+D_EXTRA+D_INIT. To include D_DURING, you would add 16 to this,
resulting in debug level 23.

If you don't understand that, you probably don't want to know anyway.
E-mail me about your problem.
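
For readers who do want to know, the debug level arithmetic above can be worked through in a few lines. This is a sketch only: the flag values are taken from the text (1+2+4 for the default, 16 for D_DURING), assigning 1, 2 and 4 to D_NORMAL, D_EXTRA and D_INIT in that order is an assumption, and the helper name is hypothetical:

```python
# Flag values as described in the text; the 1/2/4 ordering is assumed.
D_NORMAL = 1
D_EXTRA = 2
D_INIT = 4
D_DURING = 16


def metric_for_debug_level(level):
    """Encode a debug level as the 'metric 1xxx' value used with ifconfig."""
    if not 0 <= level <= 999:
        raise ValueError("debug level must fit in three digits")
    return 1000 + level


default_level = D_NORMAL | D_EXTRA | D_INIT   # 1 + 2 + 4
with_during = default_level | D_DURING        # add 16 for D_DURING

print(default_level, with_during)
print(f"ifconfig arc0 down metric {metric_for_debug_level(with_during)}")
```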
......
.. SPDX-License-Identifier: GPL-2.0
===
ATM
===

In order to use anything but the most primitive functions of ATM,
several user-mode programs are required to assist the kernel. These
programs and related material can be found via the ATM on Linux Web
......
.. SPDX-License-Identifier: GPL-2.0
=====
AX.25
=====

To use the amateur radio protocols within Linux you will need to get a
suitable copy of the AX.25 Utilities. More detailed information about
AX.25, NET/ROM and ROSE, associated programs and utilities can be
......
.. SPDX-License-Identifier: GPL-2.0

===============================
Linux Drivers for Baycom Modems
===============================

Thomas M. Sailer, HB9JNX/AE4WA, <sailer@ife.ee.ethz.ch>

The drivers for the baycom modems have been split into
separate drivers as they did not share any code, and the driver
and device names have changed.

This document describes the Linux Kernel Drivers for simple Baycom style
amateur radio modems.

The following drivers are available:
====================================

baycom_ser_fdx:
  This driver supports the SER12 modems either full or half duplex.
  Its baud rate may be changed via the ``baud`` module parameter,
  therefore it supports just about every bit bang modem on a
  serial port. Its devices are called bcsf0 through bcsf3.
  This is the recommended driver for SER12 type modems,
  however if you have a broken UART clone that does not have working
  delta status bits, you may try baycom_ser_hdx.

baycom_ser_hdx:
  This is an alternative driver for SER12 type modems.
  It only supports half duplex, and only 1200 baud. Its devices
  are called bcsh0 through bcsh3. Use this driver only if baycom_ser_fdx
...@@ -37,45 +42,48 @@ baycom_epp: ...@@ -37,45 +42,48 @@ baycom_epp:
The following modems are supported:

======= ========================================================================
ser12   This is a very simple 1200 baud AFSK modem. The modem consists only
        of a modulator/demodulator chip, usually a TI TCM3105. The computer
        is responsible for regenerating the receiver bit clock, as well as
        for handling the HDLC protocol. The modem connects to a serial port,
        hence the name. Since the serial port is not used as an async serial
        port, the kernel driver for serial ports cannot be used, and this
        driver only supports standard serial hardware (8250, 16450, 16550)

par96   This is a modem for 9600 baud FSK compatible to the G3RUH standard.
        The modem does all the filtering and regenerates the receiver clock.
        Data is transferred from and to the PC via a shift register.
        The shift register is filled with 16 bits and an interrupt is signalled.
        The PC then empties the shift register in a burst. This modem connects
        to the parallel port, hence the name. The modem leaves the
        implementation of the HDLC protocol and the scrambler polynomial to
        the PC.

picpar  This is a redesign of the par96 modem by Henning Rech, DF9IC. The modem
        is protocol compatible to par96, but uses only three low power ICs
        and can therefore be fed from the parallel port and does not require
        an additional power supply. Furthermore, it incorporates a carrier
        detect circuitry.

EPP     This is a high-speed modem adaptor that connects to an enhanced parallel
        port.

        Its target audience is users working over a high speed hub (76.8kbit/s).

eppfpga This is a redesign of the EPP adaptor.
======= ========================================================================

All of the above modems only support half duplex communications. However,
the driver supports the KISS (see below) fullduplex command. It then simply
starts to send as soon as there's a packet to transmit and does not care
about DCD, i.e. it starts to send even if there's someone else on the channel.
This command is required by some implementations of the DAMA channel
access protocol.

The Interface of the drivers
============================

Unlike previous drivers, these drivers are no longer character devices,
but they are now true kernel network interfaces. Installation is therefore
...@@ -88,20 +96,22 @@ me for WAMPES which allows attaching a kernel network interface directly. ...@@ -88,20 +96,22 @@ me for WAMPES which allows attaching a kernel network interface directly.
Configuring the driver Configuring the driver
======================
Every time a driver is inserted into the kernel, it has to know which Every time a driver is inserted into the kernel, it has to know which
modems it should access at which ports. This can be done with the setbaycom modems it should access at which ports. This can be done with the setbaycom
utility. If you are only using one modem, you can also configure the utility. If you are only using one modem, you can also configure the
driver from the insmod command line (or by means of an option line in driver from the insmod command line (or by means of an option line in
/etc/modprobe.d/*.conf). ``/etc/modprobe.d/*.conf``).
Examples::
Examples:
modprobe baycom_ser_fdx mode="ser12*" iobase=0x3f8 irq=4 modprobe baycom_ser_fdx mode="ser12*" iobase=0x3f8 irq=4
sethdlc -i bcsf0 -p mode "ser12*" io 0x3f8 irq 4 sethdlc -i bcsf0 -p mode "ser12*" io 0x3f8 irq 4
Both lines configure the first port to drive a ser12 modem at the first Both lines configure the first port to drive a ser12 modem at the first
serial port (COM1 under DOS). The * in the mode parameter instructs the driver to use serial port (COM1 under DOS). The * in the mode parameter instructs the driver
the software DCD algorithm (see below). to use the software DCD algorithm (see below)::
insmod baycom_par mode="picpar" iobase=0x378 insmod baycom_par mode="picpar" iobase=0x378
sethdlc -i bcp0 -p mode "picpar" io 0x378 sethdlc -i bcp0 -p mode "picpar" io 0x378
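The trailing ``*`` convention in the mode strings above can be illustrated in user space. This is a minimal sketch, not the driver's own parsing code: ``parse_mode()`` and ``struct modem_mode`` are hypothetical names, and the only rule taken from the text is that a trailing ``*`` selects the software DCD algorithm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the baycom drivers: splits a mode
 * string such as "ser12*" into the modem name and a software-DCD flag. */
struct modem_mode {
	char name[16];     /* modem type without the trailing '*' */
	bool software_dcd; /* true when the mode string ended in '*' */
};

/* Returns 0 on success, -1 if the name is empty or does not fit. */
int parse_mode(const char *mode, struct modem_mode *out)
{
	size_t len = strlen(mode);

	out->software_dcd = (len > 0 && mode[len - 1] == '*');
	if (out->software_dcd)
		len--; /* drop the '*' marker from the name */
	if (len == 0 || len >= sizeof(out->name))
		return -1;
	memcpy(out->name, mode, len);
	out->name[len] = '\0';
	return 0;
}
```

With this sketch, ``"ser12*"`` yields the name ``ser12`` with software DCD selected, while ``"picpar"`` leaves the flag clear.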
...@@ -115,29 +125,33 @@ Note that both utilities interpret the values slightly differently. ...@@ -115,29 +125,33 @@ Note that both utilities interpret the values slightly differently.
Hardware DCD versus Software DCD Hardware DCD versus Software DCD
================================
To avoid collisions on the air, the driver must know when the channel is To avoid collisions on the air, the driver must know when the channel is
busy. This is the task of the DCD circuitry/software. The driver may either busy. This is the task of the DCD circuitry/software. The driver may either
utilise a software DCD algorithm (options=1) or use a DCD signal from utilise a software DCD algorithm (options=1) or use a DCD signal from
the hardware (options=0). the hardware (options=0).
ser12: if software DCD is utilised, the radio's squelch should always be ======= =================================================================
open. It is highly recommended to use the software DCD algorithm, ser12 if software DCD is utilised, the radio's squelch should always be
as it is much faster than most hardware squelch circuitry. The open. It is highly recommended to use the software DCD algorithm,
disadvantage is a slightly higher load on the system. as it is much faster than most hardware squelch circuitry. The
disadvantage is a slightly higher load on the system.
par96: the software DCD algorithm for this type of modem is rather poor. par96 the software DCD algorithm for this type of modem is rather poor.
The modem simply does not provide enough information to implement The modem simply does not provide enough information to implement
a reasonable DCD algorithm in software. Therefore, if your radio a reasonable DCD algorithm in software. Therefore, if your radio
feeds the DCD input of the PAR96 modem, the use of the hardware feeds the DCD input of the PAR96 modem, the use of the hardware
DCD circuitry is recommended. DCD circuitry is recommended.
picpar: the picpar modem features built-in DCD hardware, which is highly picpar the picpar modem features built-in DCD hardware, which is highly
recommended. recommended.
======= =================================================================
Compatibility with the rest of the Linux kernel Compatibility with the rest of the Linux kernel
===============================================
The serial driver and the baycom serial drivers compete The serial driver and the baycom serial drivers compete
for the same hardware resources. Of course only one driver can access a given for the same hardware resources. Of course only one driver can access a given
...@@ -154,5 +168,7 @@ The parallel port drivers (baycom_par, baycom_epp) now use the parport subsystem ...@@ -154,5 +168,7 @@ The parallel port drivers (baycom_par, baycom_epp) now use the parport subsystem
to arbitrate the ports between different client drivers. to arbitrate the ports between different client drivers.
vy 73s de vy 73s de
Tom Sailer, sailer@ife.ee.ethz.ch Tom Sailer, sailer@ife.ee.ethz.ch
hb9jnx @ hb9w.ampr.org hb9jnx @ hb9w.ampr.org
:orphan:
.. SPDX-License-Identifier: GPL-2.0 .. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt> .. include:: <isonum.txt>
......
.. SPDX-License-Identifier: GPL-2.0
CAIF
====
Contents:
.. toctree::
:maxdepth: 2
linux_caif
caif
spi_porting
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
==========
Linux CAIF Linux CAIF
=========== ==========
copyright (C) ST-Ericsson AB 2010
Author: Sjur Brendeland/ sjur.brandeland@stericsson.com Copyright |copy| ST-Ericsson AB 2010
License terms: GNU General Public License (GPL) version 2
:Author: Sjur Brendeland/ sjur.brandeland@stericsson.com
:License terms: GNU General Public License (GPL) version 2
Introduction Introduction
------------ ============
CAIF is a MUX protocol used by ST-Ericsson cellular modems for CAIF is a MUX protocol used by ST-Ericsson cellular modems for
communication between Modem and host. The host processes can open virtual AT communication between Modem and host. The host processes can open virtual AT
channels, initiate GPRS Data connections, Video channels and Utility Channels. channels, initiate GPRS Data connections, Video channels and Utility Channels.
...@@ -16,13 +23,16 @@ ST-Ericsson modems support a number of transports between modem ...@@ -16,13 +23,16 @@ ST-Ericsson modems support a number of transports between modem
and host. Currently, UART and Loopback are available for Linux. and host. Currently, UART and Loopback are available for Linux.
Architecture: Architecture
------------ ============
The implementation of CAIF is divided into: The implementation of CAIF is divided into:
* CAIF Socket Layer and GPRS IP Interface. * CAIF Socket Layer and GPRS IP Interface.
* CAIF Core Protocol Implementation * CAIF Core Protocol Implementation
* CAIF Link Layer, implemented as NET devices. * CAIF Link Layer, implemented as NET devices.
::
RTNL RTNL
! !
...@@ -46,12 +56,12 @@ The implementation of CAIF is divided into: ...@@ -46,12 +56,12 @@ The implementation of CAIF is divided into:
I M P L E M E N T A T I O N Implementation
=========================== ==============
CAIF Core Protocol Layer CAIF Core Protocol Layer
========================================= ------------------------
CAIF Core layer implements the CAIF protocol as defined by ST-Ericsson. CAIF Core layer implements the CAIF protocol as defined by ST-Ericsson.
It implements the CAIF protocol stack in a layered approach, where It implements the CAIF protocol stack in a layered approach, where
...@@ -59,8 +69,11 @@ each layer described in the specification is implemented as a separate layer. ...@@ -59,8 +69,11 @@ each layer described in the specification is implemented as a separate layer.
The architecture is inspired by the design patterns "Protocol Layer" and The architecture is inspired by the design patterns "Protocol Layer" and
"Protocol Packet". "Protocol Packet".
== CAIF structure == CAIF structure
^^^^^^^^^^^^^^
The Core CAIF implementation contains: The Core CAIF implementation contains:
- Simple implementation of CAIF. - Simple implementation of CAIF.
- Layered architecture (a la Streams), each layer in the CAIF - Layered architecture (a la Streams), each layer in the CAIF
specification is implemented in a separate c-file. specification is implemented in a separate c-file.
...@@ -73,7 +86,8 @@ The Core CAIF implementation contains: ...@@ -73,7 +86,8 @@ The Core CAIF implementation contains:
to the called function (except for framing layers' receive function) to the called function (except for framing layers' receive function)
Layered Architecture Layered Architecture
-------------------- ====================
The CAIF protocol can be divided into two parts: Support functions and Protocol The CAIF protocol can be divided into two parts: Support functions and Protocol
Implementation. The support functions include: Implementation. The support functions include:
...@@ -112,7 +126,7 @@ The CAIF Protocol implementation contains: ...@@ -112,7 +126,7 @@ The CAIF Protocol implementation contains:
- CFSERL CAIF Serial layer. Handles concatenation/split of frames - CFSERL CAIF Serial layer. Handles concatenation/split of frames
into CAIF Frames with correct length. into CAIF Frames with correct length.
::
+---------+ +---------+
| Config | | Config |
...@@ -143,18 +157,24 @@ The CAIF Protocol implementation contains: ...@@ -143,18 +157,24 @@ The CAIF Protocol implementation contains:
In this layered approach the following "rules" apply. In this layered approach the following "rules" apply.
- All layers embed the same structure "struct cflayer" - All layers embed the same structure "struct cflayer"
- A layer does not depend on any other layer's private data. - A layer does not depend on any other layer's private data.
- Layers are stacked by setting the pointers - Layers are stacked by setting the pointers::
layer->up , layer->dn layer->up , layer->dn
- In order to send data upwards, each layer should do
- In order to send data upwards, each layer should do::
layer->up->receive(layer->up, packet); layer->up->receive(layer->up, packet);
- In order to send data downwards, each layer should do
- In order to send data downwards, each layer should do::
layer->dn->transmit(layer->dn, packet); layer->dn->transmit(layer->dn, packet);
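The stacking rules above can be tried out in user space. The following is a minimal sketch under stated assumptions: ``struct cflayer`` is reduced to the four members the rules mention, while ``struct packet``, ``passthru_receive()``, ``passthru_transmit()`` and ``stack_layers()`` are hypothetical names introduced for illustration and are not the kernel's definitions.

```c
#include <stddef.h>

/* User-space sketch only; the kernel's struct cflayer has more fields. */
struct packet {
	int hops; /* counts how many layers have handled the packet */
};

struct cflayer {
	struct cflayer *up; /* layer above */
	struct cflayer *dn; /* layer below */
	int (*receive)(struct cflayer *layer, struct packet *pkt);
	int (*transmit)(struct cflayer *layer, struct packet *pkt);
};

/* Pass-through receive: count the packet, then send it upwards. */
int passthru_receive(struct cflayer *layer, struct packet *pkt)
{
	pkt->hops++;
	if (layer->up == NULL)
		return 0; /* top of the stack reached */
	return layer->up->receive(layer->up, pkt);
}

/* Pass-through transmit: count the packet, then send it downwards. */
int passthru_transmit(struct cflayer *layer, struct packet *pkt)
{
	pkt->hops++;
	if (layer->dn == NULL)
		return 0; /* bottom of the stack reached */
	return layer->dn->transmit(layer->dn, pkt);
}

/* Stack 'upper' on top of 'lower' by setting the up/dn pointers. */
void stack_layers(struct cflayer *upper, struct cflayer *lower)
{
	upper->dn = lower;
	lower->up = upper;
}
```

Because each layer only touches its own ``up`` and ``dn`` pointers, layers can be inserted or removed without any layer depending on another layer's private data.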
CAIF Socket and IP interface CAIF Socket and IP interface
=========================== ============================
The IP interface and CAIF socket API are implemented on top of the The IP interface and CAIF socket API are implemented on top of the
CAIF Core protocol. The IP Interface and CAIF socket have an instance of CAIF Core protocol. The IP Interface and CAIF socket have an instance of
......
- CAIF SPI porting - .. SPDX-License-Identifier: GPL-2.0
- CAIF SPI basics: ================
CAIF SPI porting
================
CAIF SPI basics
===============
Running CAIF over SPI needs some extra setup, owing to the nature of SPI. Running CAIF over SPI needs some extra setup, owing to the nature of SPI.
Two extra GPIOs have been added in order to negotiate the transfers Two extra GPIOs have been added in order to negotiate the transfers
between the master and the slave. The minimum requirement for running between the master and the slave. The minimum requirement for running
CAIF over SPI is a SPI slave chip and two GPIOs (more details below). CAIF over SPI is a SPI slave chip and two GPIOs (more details below).
Please note that running as a slave implies that you need to keep up Please note that running as a slave implies that you need to keep up
with the master clock. An overrun or underrun event is fatal. with the master clock. An overrun or underrun event is fatal.
- CAIF SPI framework: CAIF SPI framework
==================
To make porting as easy as possible, the CAIF SPI has been divided in To make porting as easy as possible, the CAIF SPI has been divided in
two parts. The first part (called the interface part) deals with all two parts. The first part (called the interface part) deals with all
...@@ -27,7 +33,9 @@ the physical hardware, both with regard to SPI and to GPIOs. ...@@ -27,7 +33,9 @@ the physical hardware, both with regard to SPI and to GPIOs.
need to implement the following need to implement the following
functions: functions:
int (*init_xfer) (struct cfspi_xfer * xfer, struct cfspi_dev *dev): ::
int (*init_xfer) (struct cfspi_xfer * xfer, struct cfspi_dev *dev):
This function is called by the CAIF SPI interface to give This function is called by the CAIF SPI interface to give
you a chance to set up your hardware to be ready to receive you a chance to set up your hardware to be ready to receive
...@@ -36,7 +44,9 @@ the physical hardware, both with regard to SPI and to GPIOs. ...@@ -36,7 +44,9 @@ the physical hardware, both with regard to SPI and to GPIOs.
of the transfer in both directions. The dev parameter can be used of the transfer in both directions. The dev parameter can be used
to map to different CAIF SPI slave devices. to map to different CAIF SPI slave devices.
void (*sig_xfer) (bool xfer, struct cfspi_dev *dev): ::
void (*sig_xfer) (bool xfer, struct cfspi_dev *dev):
This function is called by the CAIF SPI interface when the output This function is called by the CAIF SPI interface when the output
(SPI_INT) GPIO needs to change state. The boolean value of the xfer (SPI_INT) GPIO needs to change state. The boolean value of the xfer
...@@ -46,7 +56,9 @@ the physical hardware, both with regard to SPI and to GPIOs. ...@@ -46,7 +56,9 @@ the physical hardware, both with regard to SPI and to GPIOs.
- Functionality provided by the CAIF SPI interface: - Functionality provided by the CAIF SPI interface:
void (*ss_cb) (bool assert, struct cfspi_ifc *ifc); ::
void (*ss_cb) (bool assert, struct cfspi_ifc *ifc);
This function is called by the CAIF SPI slave device in order to This function is called by the CAIF SPI slave device in order to
signal a change of state of the input GPIO (SS) to the interface. signal a change of state of the input GPIO (SS) to the interface.
...@@ -55,7 +67,9 @@ the physical hardware, both with regard to SPI and to GPIOs. ...@@ -55,7 +67,9 @@ the physical hardware, both with regard to SPI and to GPIOs.
not to introduce latency). The ifc parameter should be the pointer not to introduce latency). The ifc parameter should be the pointer
returned from the platform probe function in the SPI device structure. returned from the platform probe function in the SPI device structure.
void (*xfer_done_cb) (struct cfspi_ifc *ifc); ::
void (*xfer_done_cb) (struct cfspi_ifc *ifc);
This function is called by the CAIF SPI slave device in order to This function is called by the CAIF SPI slave device in order to
report that a transfer is completed. This function should only be report that a transfer is completed. This function should only be
...@@ -68,17 +82,24 @@ the physical hardware, both with regard to SPI and to GPIOs. ...@@ -68,17 +82,24 @@ the physical hardware, both with regard to SPI and to GPIOs.
- Filling in the SPI slave device structure: - Filling in the SPI slave device structure:
Connect the necessary callback functions. Connect the necessary callback functions.
Indicate clock speed (used to calculate toggle delays).
Choose a suitable name (helps debugging if you use several CAIF Indicate clock speed (used to calculate toggle delays).
SPI slave devices).
Assign your private data (can be used to map to your structure). Choose a suitable name (helps debugging if you use several CAIF
SPI slave devices).
Assign your private data (can be used to map to your
structure).
- Filling in the SPI slave platform device structure: - Filling in the SPI slave platform device structure:
Add name of driver to connect to ("cfspi_sspi").
Assign the SPI slave device structure as platform data.
- Padding: Add name of driver to connect to ("cfspi_sspi").
Assign the SPI slave device structure as platform data.
Padding
=======
In order to optimize throughput, a number of SPI padding options are provided. In order to optimize throughput, a number of SPI padding options are provided.
Padding can be enabled independently for uplink and downlink transfers. Padding can be enabled independently for uplink and downlink transfers.
...@@ -87,122 +108,122 @@ The padding needs to be correctly configured on both sides of the link. ...@@ -87,122 +108,122 @@ The padding needs to be correctly configured on both sides of the link.
The padding can be changed via module parameters in cfspi_sspi.c or via The padding can be changed via module parameters in cfspi_sspi.c or via
the sysfs directory of the cfspi_sspi driver (before device registration). the sysfs directory of the cfspi_sspi driver (before device registration).
- CAIF SPI device template: - CAIF SPI device template::
/* /*
* Copyright (C) ST-Ericsson AB 2010 * Copyright (C) ST-Ericsson AB 2010
* Author: Daniel Martensson / Daniel.Martensson@stericsson.com * Author: Daniel Martensson / Daniel.Martensson@stericsson.com
* License terms: GNU General Public License (GPL), version 2. * License terms: GNU General Public License (GPL), version 2.
* *
*/ */
#include <linux/init.h> #include <linux/init.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/device.h> #include <linux/device.h>
#include <linux/wait.h> #include <linux/wait.h>
#include <linux/interrupt.h> #include <linux/interrupt.h>
#include <linux/dma-mapping.h> #include <linux/dma-mapping.h>
#include <net/caif/caif_spi.h> #include <net/caif/caif_spi.h>
MODULE_LICENSE("GPL"); MODULE_LICENSE("GPL");
struct sspi_struct { struct sspi_struct {
struct cfspi_dev sdev; struct cfspi_dev sdev;
struct cfspi_xfer *xfer; struct cfspi_xfer *xfer;
}; };
static struct sspi_struct slave; static struct sspi_struct slave;
static struct platform_device slave_device; static struct platform_device slave_device;
static irqreturn_t sspi_irq(int irq, void *arg) static irqreturn_t sspi_irq(int irq, void *arg)
{ {
/* You only need to trigger on an edge to the active state of the /* You only need to trigger on an edge to the active state of the
 * SS signal. Once an edge is detected, the ss_cb() function should be * SS signal. Once an edge is detected, the ss_cb() function should be
* called with the parameter assert set to true. It is OK * called with the parameter assert set to true. It is OK
* (and even advised) to call the ss_cb() function in IRQ context in * (and even advised) to call the ss_cb() function in IRQ context in
* order not to add any delay. */ * order not to add any delay. */
return IRQ_HANDLED; return IRQ_HANDLED;
} }
static void sspi_complete(void *context) static void sspi_complete(void *context)
{ {
/* Normally the DMA or the SPI framework will call you back /* Normally the DMA or the SPI framework will call you back
* in something similar to this. The only thing you need to * in something similar to this. The only thing you need to
* do is to call the xfer_done_cb() function, providing the pointer * do is to call the xfer_done_cb() function, providing the pointer
* to the CAIF SPI interface. It is OK to call this function * to the CAIF SPI interface. It is OK to call this function
* from IRQ context. */ * from IRQ context. */
} }
static int sspi_init_xfer(struct cfspi_xfer *xfer, struct cfspi_dev *dev) static int sspi_init_xfer(struct cfspi_xfer *xfer, struct cfspi_dev *dev)
{ {
/* Store transfer info. For a normal implementation you should /* Store transfer info. For a normal implementation you should
* set up your DMA here and make sure that you are ready to * set up your DMA here and make sure that you are ready to
* receive the data from the master SPI. */ * receive the data from the master SPI. */
struct sspi_struct *sspi = (struct sspi_struct *)dev->priv; struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
sspi->xfer = xfer; sspi->xfer = xfer;
return 0; return 0;
} }
void sspi_sig_xfer(bool xfer, struct cfspi_dev *dev) void sspi_sig_xfer(bool xfer, struct cfspi_dev *dev)
{ {
/* If xfer is true then you should assert the SPI_INT to indicate to /* If xfer is true then you should assert the SPI_INT to indicate to
* the master that you are ready to receive the data from the master * the master that you are ready to receive the data from the master
* SPI. If xfer is false then you should de-assert SPI_INT to indicate * SPI. If xfer is false then you should de-assert SPI_INT to indicate
* that the transfer is done. * that the transfer is done.
*/ */
struct sspi_struct *sspi = (struct sspi_struct *)dev->priv; struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
} }
static void sspi_release(struct device *dev) static void sspi_release(struct device *dev)
{ {
/* /*
* Here you should release your SPI device resources. * Here you should release your SPI device resources.
*/ */
} }
static int __init sspi_init(void) static int __init sspi_init(void)
{ {
/* Here you should initialize your SPI device by providing the /* Here you should initialize your SPI device by providing the
* necessary functions, clock speed, name and private data. Once * necessary functions, clock speed, name and private data. Once
* done, you can register your device with the * done, you can register your device with the
* platform_device_register() function. This function will return * platform_device_register() function. This function will return
* with the CAIF SPI interface initialized. This is probably also * with the CAIF SPI interface initialized. This is probably also
* the place where you should set up your GPIOs, interrupts and SPI * the place where you should set up your GPIOs, interrupts and SPI
* resources. */ * resources. */
int res = 0; int res = 0;
/* Initialize slave device. */ /* Initialize slave device. */
slave.sdev.init_xfer = sspi_init_xfer; slave.sdev.init_xfer = sspi_init_xfer;
slave.sdev.sig_xfer = sspi_sig_xfer; slave.sdev.sig_xfer = sspi_sig_xfer;
slave.sdev.clk_mhz = 13; slave.sdev.clk_mhz = 13;
slave.sdev.priv = &slave; slave.sdev.priv = &slave;
slave.sdev.name = "spi_sspi"; slave.sdev.name = "spi_sspi";
slave_device.dev.release = sspi_release; slave_device.dev.release = sspi_release;
/* Initialize platform device. */ /* Initialize platform device. */
slave_device.name = "cfspi_sspi"; slave_device.name = "cfspi_sspi";
slave_device.dev.platform_data = &slave.sdev; slave_device.dev.platform_data = &slave.sdev;
/* Register platform device. */ /* Register platform device. */
res = platform_device_register(&slave_device); res = platform_device_register(&slave_device);
if (res) { if (res) {
printk(KERN_WARNING "sspi_init: failed to register dev.\n"); printk(KERN_WARNING "sspi_init: failed to register dev.\n");
return -ENODEV; return -ENODEV;
} }
return res; return res;
} }
static void __exit sspi_exit(void) static void __exit sspi_exit(void)
{ {
platform_device_del(&slave_device); platform_device_del(&slave_device);
} }
module_init(sspi_init); module_init(sspi_init);
module_exit(sspi_exit); module_exit(sspi_exit);
cdc_mbim - Driver for CDC MBIM Mobile Broadband modems .. SPDX-License-Identifier: GPL-2.0
========================================================
======================================================
cdc_mbim - Driver for CDC MBIM Mobile Broadband modems
======================================================
The cdc_mbim driver supports USB devices conforming to the "Universal The cdc_mbim driver supports USB devices conforming to the "Universal
Serial Bus Communications Class Subclass Specification for Mobile Serial Bus Communications Class Subclass Specification for Mobile
...@@ -19,9 +22,9 @@ by a cdc_ncm driver parameter: ...@@ -19,9 +22,9 @@ by a cdc_ncm driver parameter:
prefer_mbim prefer_mbim
----------- -----------
Type: Boolean :Type: Boolean
Valid Range: N/Y (0-1) :Valid Range: N/Y (0-1)
Default Value: Y (MBIM is preferred) :Default Value: Y (MBIM is preferred)
This parameter sets the system policy for NCM/MBIM functions. Such This parameter sets the system policy for NCM/MBIM functions. Such
functions will be handled by either the cdc_ncm driver or the cdc_mbim functions will be handled by either the cdc_ncm driver or the cdc_mbim
...@@ -44,11 +47,13 @@ userspace MBIM management application always is required to enable a ...@@ -44,11 +47,13 @@ userspace MBIM management application always is required to enable a
MBIM function. MBIM function.
Such userspace applications include, but are not limited to: Such userspace applications include, but are not limited to:
- mbimcli (included with the libmbim [3] library), and - mbimcli (included with the libmbim [3] library), and
- ModemManager [4] - ModemManager [4]
Establishing a MBIM IP session requires at least these actions by the Establishing a MBIM IP session requires at least these actions by the
management application: management application:
- open the control channel - open the control channel
- configure network connection settings - configure network connection settings
- connect to network - connect to network
...@@ -76,7 +81,7 @@ complies with all the control channel requirements in [1]. ...@@ -76,7 +81,7 @@ complies with all the control channel requirements in [1].
The cdc-wdmX device is created as a child of the MBIM control The cdc-wdmX device is created as a child of the MBIM control
interface USB device. The character device associated with a specific interface USB device. The character device associated with a specific
MBIM function can be looked up using sysfs. For example: MBIM function can be looked up using sysfs. For example::
bjorn@nemi:~$ ls /sys/bus/usb/drivers/cdc_mbim/2-4:2.12/usbmisc bjorn@nemi:~$ ls /sys/bus/usb/drivers/cdc_mbim/2-4:2.12/usbmisc
cdc-wdm0 cdc-wdm0
...@@ -119,13 +124,15 @@ negotiated control message size. ...@@ -119,13 +124,15 @@ negotiated control message size.
/dev/cdc-wdmX ioctl() /dev/cdc-wdmX ioctl()
-------------------- ---------------------
IOCTL_WDM_MAX_COMMAND: Get Maximum Command Size IOCTL_WDM_MAX_COMMAND: Get Maximum Command Size
This ioctl returns the wMaxControlMessage field of the CDC MBIM This ioctl returns the wMaxControlMessage field of the CDC MBIM
functional descriptor for MBIM devices. This is intended as a functional descriptor for MBIM devices. This is intended as a
convenience, eliminating the need to parse the USB descriptors from convenience, eliminating the need to parse the USB descriptors from
userspace. userspace.
::
#include <stdio.h> #include <stdio.h>
#include <fcntl.h> #include <fcntl.h>
#include <sys/ioctl.h> #include <sys/ioctl.h>
...@@ -178,7 +185,7 @@ VLAN links prior to establishing MBIM IP sessions where the SessionId ...@@ -178,7 +185,7 @@ VLAN links prior to establishing MBIM IP sessions where the SessionId
is greater than 0. These links can be added by using the normal VLAN is greater than 0. These links can be added by using the normal VLAN
kernel interfaces, either ioctl or netlink. kernel interfaces, either ioctl or netlink.
For example, adding a link for a MBIM IP session with SessionId 3: For example, adding a link for a MBIM IP session with SessionId 3::
ip link add link wwan0 name wwan0.3 type vlan id 3 ip link add link wwan0 name wwan0.3 type vlan id 3
...@@ -207,6 +214,7 @@ the stream to the end user in an appropriate way for the stream type. ...@@ -207,6 +214,7 @@ the stream to the end user in an appropriate way for the stream type.
The network device ABI requires a dummy ethernet header for every DSS The network device ABI requires a dummy ethernet header for every DSS
data frame being transported. The contents of this header are data frame being transported. The contents of this header are
arbitrary, with the following exceptions: arbitrary, with the following exceptions:
- TX frames using an IP protocol (0x0800 or 0x86dd) will be dropped - TX frames using an IP protocol (0x0800 or 0x86dd) will be dropped
- RX frames will have the protocol field set to ETH_P_802_3 (but will - RX frames will have the protocol field set to ETH_P_802_3 (but will
not be properly formatted 802.3 frames) not be properly formatted 802.3 frames)
...@@ -218,7 +226,7 @@ adding the dummy ethernet header on TX and stripping it on RX. ...@@ -218,7 +226,7 @@ adding the dummy ethernet header on TX and stripping it on RX.
This is a simple example using tools commonly available, exporting This is a simple example using tools commonly available, exporting
DssSessionId 5 as a pty character device pointed to by a /dev/nmea DssSessionId 5 as a pty character device pointed to by a /dev/nmea
symlink: symlink::
ip link add link wwan0 name wwan0.dss5 type vlan id 261 ip link add link wwan0 name wwan0.dss5 type vlan id 261
ip link set dev wwan0.dss5 up ip link set dev wwan0.dss5 up
...@@ -236,7 +244,7 @@ map frames to the correct DSS session and adding 18 byte VLAN ethernet ...@@ -236,7 +244,7 @@ map frames to the correct DSS session and adding 18 byte VLAN ethernet
headers with the appropriate tag on TX. In this case using a socket headers with the appropriate tag on TX. In this case using a socket
filter is recommended, matching only the DSS VLAN subset. This avoids filter is recommended, matching only the DSS VLAN subset. This avoids
unnecessary copying of unrelated IP session data to userspace. For unnecessary copying of unrelated IP session data to userspace. For
example: example::
static struct sock_filter dssfilter[] = { static struct sock_filter dssfilter[] = {
/* use special negative offsets to get VLAN tag */ /* use special negative offsets to get VLAN tag */
...@@ -249,11 +257,11 @@ example: ...@@ -249,11 +257,11 @@ example:
BPF_JUMP(BPF_JMP|BPF_JGE|BPF_K, 512, 3, 0), /* 511 is last DSS VLAN */ BPF_JUMP(BPF_JMP|BPF_JGE|BPF_K, 512, 3, 0), /* 511 is last DSS VLAN */
/* verify ethertype */ /* verify ethertype */
BPF_STMT(BPF_LD|BPF_H|BPF_ABS, 2 * ETH_ALEN), BPF_STMT(BPF_LD|BPF_H|BPF_ABS, 2 * ETH_ALEN),
BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, ETH_P_802_3, 0, 1), BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, ETH_P_802_3, 0, 1),
BPF_STMT(BPF_RET|BPF_K, (u_int)-1), /* accept */ BPF_STMT(BPF_RET|BPF_K, (u_int)-1), /* accept */
BPF_STMT(BPF_RET|BPF_K, 0), /* ignore */ BPF_STMT(BPF_RET|BPF_K, 0), /* ignore */
}; };
...@@ -266,6 +274,7 @@ network device. ...@@ -266,6 +274,7 @@ network device.
This mapping implies a few restrictions on multiplexed IPS and DSS This mapping implies a few restrictions on multiplexed IPS and DSS
sessions, which may not always be practical: sessions, which may not always be practical:
- no IPS or DSS session can use a frame size greater than the MTU on - no IPS or DSS session can use a frame size greater than the MTU on
IP session 0 IP session 0
- no IPS or DSS session can be in the up state unless the network - no IPS or DSS session can be in the up state unless the network
...@@ -280,7 +289,7 @@ device. ...@@ -280,7 +289,7 @@ device.
Tip: It might be less confusing to the end user to name this VLAN Tip: It might be less confusing to the end user to name this VLAN
subdevice after the MBIM SessionID instead of the VLAN ID. For subdevice after the MBIM SessionID instead of the VLAN ID. For
example: example::
ip link add link wwan0 name wwan0.0 type vlan id 4094 ip link add link wwan0 name wwan0.0 type vlan id 4094
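The examples in this document (VLAN ID 3 for IP session 3, VLAN ID 261 for DssSessionId 5, VLAN ID 4094 for IP session 0) suggest a simple arithmetic mapping. The following is a minimal user-space sketch, not driver code. It assumes that VLAN IDs 1-255 map to the IP session of the same number and that VLAN IDs 256-511 map to DSS sessions offset by 256, as the 261/5 example and the "511 is last DSS VLAN" filter comment imply. ``vlan_to_session()`` and its types are hypothetical names, and untagged frames are deliberately left out.

```c
/* Hypothetical types for illustration; not part of the cdc_mbim driver. */
enum mbim_type { MBIM_NONE, MBIM_IPS, MBIM_DSS };

struct mbim_session {
	enum mbim_type type;
	int session_id; /* -1 when the VLAN ID maps to no session */
};

/* Map a VLAN ID on the wwanY device to an MBIM session (assumed ranges). */
struct mbim_session vlan_to_session(int vlan_id)
{
	struct mbim_session s = { MBIM_NONE, -1 };

	if (vlan_id >= 1 && vlan_id <= 255) {
		s.type = MBIM_IPS;             /* e.g. VLAN 3 -> IPS 3 */
		s.session_id = vlan_id;
	} else if (vlan_id >= 256 && vlan_id <= 511) {
		s.type = MBIM_DSS;             /* e.g. VLAN 261 -> DSS 5 */
		s.session_id = vlan_id - 256;
	} else if (vlan_id == 4094) {
		s.type = MBIM_IPS;             /* tagged alias for IPS 0 */
		s.session_id = 0;
	}
	return s;
}
```

Any other VLAN ID returns ``MBIM_NONE``, which keeps the sketch from guessing at ranges this text does not describe.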
...@@ -290,7 +299,7 @@ VLAN mapping ...@@ -290,7 +299,7 @@ VLAN mapping
Summarizing the cdc_mbim driver mapping described above, we have this Summarizing the cdc_mbim driver mapping described above, we have this
relationship between VLAN tags on the wwanY network device and MBIM relationship between VLAN tags on the wwanY network device and MBIM
sessions on the shared USB data channel: sessions on the shared USB data channel::
VLAN ID MBIM type MBIM SessionID Notes VLAN ID MBIM type MBIM SessionID Notes
--------------------------------------------------------- ---------------------------------------------------------
...@@ -310,30 +319,37 @@ sessions on the shared USB data channel: ...@@ -310,30 +319,37 @@ sessions on the shared USB data channel:
References References
========== ==========
[1] USB Implementers Forum, Inc. - "Universal Serial Bus 1) USB Implementers Forum, Inc. - "Universal Serial Bus
Communications Class Subclass Specification for Mobile Broadband Communications Class Subclass Specification for Mobile Broadband
Interface Model", Revision 1.0 (Errata 1), May 1, 2013 Interface Model", Revision 1.0 (Errata 1), May 1, 2013
- http://www.usb.org/developers/docs/devclass_docs/ - http://www.usb.org/developers/docs/devclass_docs/
[2] USB Implementers Forum, Inc. - "Universal Serial Bus 2) USB Implementers Forum, Inc. - "Universal Serial Bus
Communications Class Subclass Specifications for Network Control Communications Class Subclass Specifications for Network Control
Model Devices", Revision 1.0 (Errata 1), November 24, 2010 Model Devices", Revision 1.0 (Errata 1), November 24, 2010
- http://www.usb.org/developers/docs/devclass_docs/ - http://www.usb.org/developers/docs/devclass_docs/
[3] libmbim - "a glib-based library for talking to WWAN modems and 3) libmbim - "a glib-based library for talking to WWAN modems and
devices which speak the Mobile Interface Broadband Model (MBIM) devices which speak the Mobile Interface Broadband Model (MBIM)
protocol" protocol"
- http://www.freedesktop.org/wiki/Software/libmbim/ - http://www.freedesktop.org/wiki/Software/libmbim/
[4] ModemManager - "a DBus-activated daemon which controls mobile 4) ModemManager - "a DBus-activated daemon which controls mobile
broadband (2G/3G/4G) devices and connections" broadband (2G/3G/4G) devices and connections"
- http://www.freedesktop.org/wiki/Software/ModemManager/ - http://www.freedesktop.org/wiki/Software/ModemManager/
[5] "MBIM (Mobile Broadband Interface Model) Registry" 5) "MBIM (Mobile Broadband Interface Model) Registry"
- http://compliance.usb.org/mbim/ - http://compliance.usb.org/mbim/
[6] "/sys/kernel/debug/usb/devices output format" 6) "/sys/kernel/debug/usb/devices output format"
- Documentation/driver-api/usb/usb.rst - Documentation/driver-api/usb/usb.rst
[7] "/sys/bus/usb/devices/.../descriptors" 7) "/sys/bus/usb/devices/.../descriptors"
- Documentation/ABI/stable/sysfs-bus-usb - Documentation/ABI/stable/sysfs-bus-usb
Text File for the COPS LocalTalk Linux driver (cops.c). .. SPDX-License-Identifier: GPL-2.0
By Jay Schulist <jschlst@samba.org>
========================================
The COPS LocalTalk Linux driver (cops.c)
========================================
By Jay Schulist <jschlst@samba.org>
This driver has two modes and they are: Dayna mode and Tangent mode. This driver has two modes and they are: Dayna mode and Tangent mode.
Each mode corresponds with the type of card. It has been found Each mode corresponds with the type of card. It has been found
that there are 2 main types of cards and all other cards are that there are 2 main types of cards and all other cards are
the same and just have different names or only have minor differences the same and just have different names or only have minor differences
such as more IO ports. As this driver is tested it will such as more IO ports. As this driver is tested it will
become more clear exactly what cards are supported. become more clear exactly what cards are supported.
Right now these cards are known to work with the COPS driver. The Right now these cards are known to work with the COPS driver. The
LT-200 cards work in a somewhat more limited capacity than the LT-200 cards work in a somewhat more limited capacity than the
DL200 cards, which work very well and are in use by many people. DL200 cards, which work very well and are in use by many people.
TANGENT driver mode: TANGENT driver mode:
Tangent ATB-II, Novell NL-1000, Daystar Digital LT-200 - Tangent ATB-II, Novell NL-1000, Daystar Digital LT-200
DAYNA driver mode: DAYNA driver mode:
Dayna DL2000/DaynaTalk PC (Half Length), COPS LT-95, - Dayna DL2000/DaynaTalk PC (Half Length), COPS LT-95,
Farallon PhoneNET PC III, Farallon PhoneNET PC II - Farallon PhoneNET PC III, Farallon PhoneNET PC II
Other cards possibly supported mode unknown though: Other cards possibly supported mode unknown though:
Dayna DL2000 (Full length) - Dayna DL2000 (Full length)
The COPS driver defaults to using Dayna mode. To change the driver's The COPS driver defaults to using Dayna mode. To change the driver's
mode if you built a driver with dual support use board_type=1 or mode if you built a driver with dual support use board_type=1 or
board_type=2 for Dayna or Tangent with insmod. board_type=2 for Dayna or Tangent with insmod.
** Operation/loading of the driver. Operation/loading of the driver
===============================
Use modprobe like this: /sbin/modprobe cops.o (IO #) (IRQ #) Use modprobe like this: /sbin/modprobe cops.o (IO #) (IRQ #)
If you do not specify any options the driver will try and use the IO = 0x240, If you do not specify any options the driver will try and use the IO = 0x240,
IRQ = 5. As of right now I would only use IRQ 5 for the card, if autoprobing. IRQ = 5. As of right now I would only use IRQ 5 for the card, if autoprobing.
To load multiple COPS driver Localtalk cards you can do one of the following. To load multiple COPS driver Localtalk cards you can do one of the following::
insmod cops io=0x240 irq=5
insmod -o cops2 cops io=0x260 irq=3
insmod cops io=0x240 irq=5 Or in lilo.conf put something like this::
insmod -o cops2 cops io=0x260 irq=3
Or in lilo.conf put something like this:
append="ether=5,0x240,lt0 ether=3,0x260,lt1" append="ether=5,0x240,lt0 ether=3,0x260,lt1"
Then bring up the interface with ifconfig. It will look something like this: Then bring up the interface with ifconfig. It will look something like this::
lt0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-F7-00-00-00-00-00-00-00-00
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0 lt0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-F7-00-00-00-00-00-00-00-00
UP BROADCAST RUNNING NOARP MULTICAST MTU:600 Metric:1 inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
RX packets:0 errors:0 dropped:0 overruns:0 frame:0 UP BROADCAST RUNNING NOARP MULTICAST MTU:600 Metric:1
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 coll:0 RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 coll:0
Netatalk Configuration
======================
** Netatalk Configuration
You will need to configure atalkd with something like the following to make You will need to configure atalkd with something like the following to make
it work with the cops.c driver. it work with the cops.c driver.
* For single LTalk card use. * For single LTalk card use::
dummy -seed -phase 2 -net 2000 -addr 2000.10 -zone "1033"
lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033" dummy -seed -phase 2 -net 2000 -addr 2000.10 -zone "1033"
lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033"
* For multiple cards, Ethernet and LocalTalk. * For multiple cards, Ethernet and LocalTalk::
eth0 -seed -phase 2 -net 3000 -addr 3000.20 -zone "1033"
lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033" eth0 -seed -phase 2 -net 3000 -addr 3000.20 -zone "1033"
lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033"
* For multiple LocalTalk cards, and an Ethernet card. * For multiple LocalTalk cards, and an Ethernet card.
* Order seems to matter here, Ethernet last.
lt0 -seed -phase 1 -net 1000 -addr 1000.10 -zone "LocalTalk1" * Order seems to matter here, Ethernet last::
lt1 -seed -phase 1 -net 2000 -addr 2000.20 -zone "LocalTalk2"
eth0 -seed -phase 2 -net 3000 -addr 3000.30 -zone "EtherTalk" lt0 -seed -phase 1 -net 1000 -addr 1000.10 -zone "LocalTalk1"
lt1 -seed -phase 1 -net 2000 -addr 2000.20 -zone "LocalTalk2"
eth0 -seed -phase 2 -net 3000 -addr 3000.30 -zone "EtherTalk"
.. SPDX-License-Identifier: GPL-2.0
========================
ATM cxacru device driver
========================
Firmware is required for this device: http://accessrunner.sourceforge.net/ Firmware is required for this device: http://accessrunner.sourceforge.net/
While it is capable of managing/maintaining the ADSL connection without the While it is capable of managing/maintaining the ADSL connection without the
...@@ -19,29 +25,35 @@ several sysfs attribute files for retrieving device statistics: ...@@ -19,29 +25,35 @@ several sysfs attribute files for retrieving device statistics:
* adsl_headend * adsl_headend
* adsl_headend_environment * adsl_headend_environment
Information about the remote headend.
- Information about the remote headend.
* adsl_config * adsl_config
Configuration writing interface.
Write parameters in hexadecimal format <index>=<value>, - Configuration writing interface.
separated by whitespace, e.g.: - Write parameters in hexadecimal format <index>=<value>,
separated by whitespace, e.g.:
"1=0 a=5" "1=0 a=5"
Up to 7 parameters at a time will be sent and the modem will restart
the ADSL connection when any value is set. These are logged for future - Up to 7 parameters at a time will be sent and the modem will restart
reference. the ADSL connection when any value is set. These are logged for future
reference.
* downstream_attenuation (dB) * downstream_attenuation (dB)
* downstream_bits_per_frame * downstream_bits_per_frame
* downstream_rate (kbps) * downstream_rate (kbps)
* downstream_snr_margin (dB) * downstream_snr_margin (dB)
Downstream stats.
- Downstream stats.
* upstream_attenuation (dB) * upstream_attenuation (dB)
* upstream_bits_per_frame * upstream_bits_per_frame
* upstream_rate (kbps) * upstream_rate (kbps)
* upstream_snr_margin (dB) * upstream_snr_margin (dB)
* transmitter_power (dBm/Hz) * transmitter_power (dBm/Hz)
Upstream stats.
- Upstream stats.
* downstream_crc_errors * downstream_crc_errors
* downstream_fec_errors * downstream_fec_errors
...@@ -49,48 +61,56 @@ several sysfs attribute files for retrieving device statistics: ...@@ -49,48 +61,56 @@ several sysfs attribute files for retrieving device statistics:
* upstream_crc_errors * upstream_crc_errors
* upstream_fec_errors * upstream_fec_errors
* upstream_hec_errors * upstream_hec_errors
Error counts.
- Error counts.
* line_startable * line_startable
Indicates that ADSL support on the device
is/can be enabled, see adsl_start. - Indicates that ADSL support on the device
is/can be enabled, see adsl_start.
* line_status * line_status
"initialising"
"down" - "initialising"
"attempting to activate" - "down"
"training" - "attempting to activate"
"channel analysis" - "training"
"exchange" - "channel analysis"
"waiting" - "exchange"
"up" - "waiting"
- "up"
Changes between "down" and "attempting to activate" Changes between "down" and "attempting to activate"
if there is no signal. if there is no signal.
* link_status * link_status
"not connected"
"connected" - "not connected"
"lost" - "connected"
- "lost"
* mac_address * mac_address
* modulation * modulation
"" (when not connected)
"ANSI T1.413" - "" (when not connected)
"ITU-T G.992.1 (G.DMT)" - "ANSI T1.413"
"ITU-T G.992.2 (G.LITE)" - "ITU-T G.992.1 (G.DMT)"
- "ITU-T G.992.2 (G.LITE)"
* startup_attempts * startup_attempts
Count of total attempts to initialise ADSL.
- Count of total attempts to initialise ADSL.
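As a rough illustration of the ``adsl_config`` format described in the list above (hexadecimal ``<index>=<value>`` pairs separated by whitespace), the following sketch builds such a string. The helper names are hypothetical, and the attribute path is passed in by the caller because the sysfs location is not assumed here.

```python
def format_adsl_config(params):
    """Render {index: value} as a whitespace-separated "<index>=<value>" string.

    Both numbers are written in hexadecimal without a "0x" prefix, matching
    the "1=0 a=5" example from the text.
    """
    parts = []
    for index, value in params.items():
        if index < 0 or value < 0:
            raise ValueError("index and value must be non-negative")
        parts.append(f"{index:x}={value:x}")
    return " ".join(parts)


def write_adsl_config(attr_path, params):
    """Write the formatted parameters to the adsl_config attribute file.

    attr_path is supplied by the caller (hypothetical helper; the exact
    sysfs path depends on the device).
    """
    with open(attr_path, "w") as attr:
        attr.write(format_adsl_config(params))
```

For instance, ``format_adsl_config({0x1: 0x0, 0xa: 0x5})`` yields the string used as the example in the list. Note that, per the text, setting any value makes the modem restart the ADSL connection.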
To enable/disable ADSL, the following can be written to the adsl_state file: To enable/disable ADSL, the following can be written to the adsl_state file:
"start"
"stop
"restart" (stops, waits 1.5s, then starts)
"poll" (used to resume status polling if it was disabled due to failure)
Changes in adsl/line state are reported via kernel log messages: - "start"
- "stop"
- "restart" (stops, waits 1.5s, then starts)
- "poll" (used to resume status polling if it was disabled due to failure)
Changes in adsl/line state are reported via kernel log messages::
[4942145.150704] ATM dev 0: ADSL state: running [4942145.150704] ATM dev 0: ADSL state: running
[4942243.663766] ATM dev 0: ADSL line: down [4942243.663766] ATM dev 0: ADSL line: down
[4942249.665075] ATM dev 0: ADSL line: attempting to activate [4942249.665075] ATM dev 0: ADSL line: attempting to activate
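A monitoring script could pick these state changes out of the kernel log. The sketch below assumes the messages keep exactly the shape shown in the sample lines above; the function name is hypothetical.

```python
import re

# Matches lines such as:
#   [4942243.663766] ATM dev 0: ADSL line: down
LOG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+ATM dev (?P<dev>\d+): "
    r"ADSL (?P<kind>state|line): (?P<value>.+)$"
)


def parse_adsl_event(line):
    """Return (timestamp, device, kind, value), or None if the line differs."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    return (
        float(match["ts"]),
        int(match["dev"]),
        match["kind"],      # "state" or "line"
        match["value"],     # e.g. "running", "down", "attempting to activate"
    )
```

Lines that do not match the pattern return ``None``, so the function can be applied to the whole kernel log without pre-filtering.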
......
.. SPDX-License-Identifier: GPL-2.0
=============
DCCP protocol DCCP protocol
============= =============
Contents .. Contents
======== - Introduction
- Introduction - Missing features
- Missing features - Socket options
- Socket options - Sysctl variables
- Sysctl variables - IOCTLs
- IOCTLs - Other tunables
- Other tunables - Notes
- Notes
Introduction Introduction
...@@ -38,6 +40,7 @@ The Linux DCCP implementation does not currently support all the features that a ...@@ -38,6 +40,7 @@ The Linux DCCP implementation does not currently support all the features that a
specified in RFCs 4340...42. specified in RFCs 4340...42.
The known bugs are at: The known bugs are at:
http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP
For more up-to-date versions of the DCCP implementation, please consider using For more up-to-date versions of the DCCP implementation, please consider using
...@@ -54,7 +57,8 @@ defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special, ...@@ -54,7 +57,8 @@ defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special,
and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an
u32 priority value as ancillary data to sendmsg(), where higher numbers indicate u32 priority value as ancillary data to sendmsg(), where higher numbers indicate
a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to
be formatted using a cmsg(3) message header filled in as follows: be formatted using a cmsg(3) message header filled in as follows::
cmsg->cmsg_level = SOL_DCCP; cmsg->cmsg_level = SOL_DCCP;
cmsg->cmsg_type = DCCP_SCM_PRIORITY; cmsg->cmsg_type = DCCP_SCM_PRIORITY;
cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */ cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */
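The same ancillary data can be sketched from Python, where ``socket.sendmsg()`` fills in the cmsg header length itself. This is only an illustration: the numeric values of ``SOL_DCCP`` and ``DCCP_SCM_PRIORITY`` below are assumptions that should be checked against ``<linux/socket.h>`` and ``<linux/dccp.h>``, and the helper names are hypothetical.

```python
import struct

# Assumed numeric values; Python's socket module does not export them.
# Verify against <linux/socket.h> and <linux/dccp.h> before relying on them.
SOL_DCCP = 269
DCCP_SCM_PRIORITY = 1


def priority_ancdata(priority):
    """Build the ancillary-data list carrying one u32 priority value."""
    if not 0 <= priority <= 0xFFFFFFFF:
        raise ValueError("priority must fit in an unsigned 32-bit integer")
    # "=I" packs a native-endian 4-byte unsigned int, matching uint32_t.
    return [(SOL_DCCP, DCCP_SCM_PRIORITY, struct.pack("=I", priority))]


def send_with_priority(sock, payload, priority):
    """Send payload on a connected DCCP socket with the given priority.

    The socket is expected to use the priority-based queuing policy
    (DCCPQ_POLICY_PRIO) already.
    """
    return sock.sendmsg([payload], priority_ancdata(priority))
```

Keeping the ancillary data in a separate function makes it possible to check the packed value without opening a DCCP socket.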
...@@ -94,7 +98,7 @@ must be registered on the socket before calling connect() or listen(). ...@@ -94,7 +98,7 @@ must be registered on the socket before calling connect() or listen().
DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID. the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
Please note that the getsockopt argument type here is `int', not uint8_t. Please note that the getsockopt argument type here is ``int``, not uint8_t.
DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID. DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.
...@@ -113,6 +117,7 @@ be enabled at the receiver, too with suitable choice of CsCov. ...@@ -113,6 +117,7 @@ be enabled at the receiver, too with suitable choice of CsCov.
DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
range 0..15 are acceptable. The default setting is 0 (full coverage), range 0..15 are acceptable. The default setting is 0 (full coverage),
values between 1..15 indicate partial coverage. values between 1..15 indicate partial coverage.
DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
sets a threshold, where again values 0..15 are acceptable. The default sets a threshold, where again values 0..15 are acceptable. The default
of 0 means that all packets with a partial coverage will be discarded. of 0 means that all packets with a partial coverage will be discarded.
...@@ -123,11 +128,13 @@ DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it ...@@ -123,11 +128,13 @@ DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
The following two options apply to CCID 3 exclusively and are getsockopt()-only. The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned. In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.
DCCP_SOCKOPT_CCID_RX_INFO DCCP_SOCKOPT_CCID_RX_INFO
Returns a `struct tfrc_rx_info' in optval; the buffer for optval and Returns a ``struct tfrc_rx_info`` in optval; the buffer for optval and
optlen must be set to at least sizeof(struct tfrc_rx_info). optlen must be set to at least sizeof(struct tfrc_rx_info).
DCCP_SOCKOPT_CCID_TX_INFO DCCP_SOCKOPT_CCID_TX_INFO
Returns a `struct tfrc_tx_info' in optval; the buffer for optval and Returns a ``struct tfrc_tx_info`` in optval; the buffer for optval and
optlen must be set to at least sizeof(struct tfrc_tx_info). optlen must be set to at least sizeof(struct tfrc_tx_info).
On unidirectional connections it is useful to close the unused half-connection On unidirectional connections it is useful to close the unused half-connection
...@@ -182,7 +189,7 @@ sync_ratelimit = 125 ms ...@@ -182,7 +189,7 @@ sync_ratelimit = 125 ms
IOCTLS IOCTLS
====== ======
FIONREAD FIONREAD
Works as in udp(7): returns in the `int' argument pointer the size of Works as in udp(7): returns in the ``int`` argument pointer the size of
the next pending datagram in bytes, or 0 when no datagram is pending. the next pending datagram in bytes, or 0 when no datagram is pending.
...@@ -191,10 +198,12 @@ Other tunables ...@@ -191,10 +198,12 @@ Other tunables
Per-route rto_min support Per-route rto_min support
CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value
of the RTO timer. This setting can be modified via the 'rto_min' option of the RTO timer. This setting can be modified via the 'rto_min' option
of iproute2; for example: of iproute2; for example::
> ip route change 10.0.0.0/24 rto_min 250j dev wlan0 > ip route change 10.0.0.0/24 rto_min 250j dev wlan0
> ip route add 10.0.0.254/32 rto_min 800j dev wlan0 > ip route add 10.0.0.254/32 rto_min 800j dev wlan0
> ip route show dev wlan0 > ip route show dev wlan0
CCID-3 also supports the rto_min setting: it is used to define the lower CCID-3 also supports the rto_min setting: it is used to define the lower
bound for the expiry of the nofeedback timer. This can be useful on LANs bound for the expiry of the nofeedback timer. This can be useful on LANs
with very low RTTs (e.g., loopback, Gbit ethernet). with very low RTTs (e.g., loopback, Gbit ethernet).
......
.. SPDX-License-Identifier: GPL-2.0
======================
DCTCP (DataCenter TCP) DCTCP (DataCenter TCP)
---------------------- ======================
DCTCP is an enhancement to the TCP congestion control algorithm for data DCTCP is an enhancement to the TCP congestion control algorithm for data
center networks and leverages Explicit Congestion Notification (ECN) in center networks and leverages Explicit Congestion Notification (ECN) in
the data center network to provide multi-bit feedback to the end hosts. the data center network to provide multi-bit feedback to the end hosts.
To enable it on end hosts: To enable it on end hosts::
sysctl -w net.ipv4.tcp_congestion_control=dctcp sysctl -w net.ipv4.tcp_congestion_control=dctcp
sysctl -w net.ipv4.tcp_ecn_fallback=0 (optional) sysctl -w net.ipv4.tcp_ecn_fallback=0 (optional)
...@@ -25,14 +28,19 @@ SIGCOMM/SIGMETRICS papers: ...@@ -25,14 +28,19 @@ SIGCOMM/SIGMETRICS papers:
i) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye, i) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye,
Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan: Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan:
"Data Center TCP (DCTCP)", Data Center Networks session
"Data Center TCP (DCTCP)", Data Center Networks session
Proc. ACM SIGCOMM, New Delhi, 2010. Proc. ACM SIGCOMM, New Delhi, 2010.
http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
http://www.sigcomm.org/ccr/papers/2010/October/1851275.1851192 http://www.sigcomm.org/ccr/papers/2010/October/1851275.1851192
ii) Mohammad Alizadeh, Adel Javanmard, and Balaji Prabhakar: ii) Mohammad Alizadeh, Adel Javanmard, and Balaji Prabhakar:
"Analysis of DCTCP: Stability, Convergence, and Fairness" "Analysis of DCTCP: Stability, Convergence, and Fairness"
Proc. ACM SIGMETRICS, San Jose, 2011. Proc. ACM SIGMETRICS, San Jose, 2011.
http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
IETF informational draft: IETF informational draft:
......
Linux DECnet Networking Layer Information .. SPDX-License-Identifier: GPL-2.0
===========================================
1) Other documentation.... =========================================
Linux DECnet Networking Layer Information
=========================================
o Project Home Pages 1. Other documentation....
http://www.chygwyn.com/ - Kernel info ==========================
http://linux-decnet.sourceforge.net/ - Userland tools
http://www.sourceforge.net/projects/linux-decnet/ - Status page
2) Configuring the kernel - Project Home Pages
- http://www.chygwyn.com/ - Kernel info
- http://linux-decnet.sourceforge.net/ - Userland tools
- http://www.sourceforge.net/projects/linux-decnet/ - Status page
2. Configuring the kernel
=========================
Be sure to turn on the following options: Be sure to turn on the following options:
CONFIG_DECNET (obviously) - CONFIG_DECNET (obviously)
CONFIG_PROC_FS (to see what's going on) - CONFIG_PROC_FS (to see what's going on)
CONFIG_SYSCTL (for easy configuration) - CONFIG_SYSCTL (for easy configuration)
if you want to try out router support (not properly debugged yet) if you want to try out router support (not properly debugged yet)
you'll need the following options as well... you'll need the following options as well...
CONFIG_DECNET_ROUTER (to be able to add/delete routes) - CONFIG_DECNET_ROUTER (to be able to add/delete routes)
CONFIG_NETFILTER (will be required for the DECnet routing daemon) - CONFIG_NETFILTER (will be required for the DECnet routing daemon)
Don't turn on SIOCGIFCONF support for DECnet unless you are really sure Don't turn on SIOCGIFCONF support for DECnet unless you are really sure
that you need it, in general you won't and it can cause ifconfig to that you need it, in general you won't and it can cause ifconfig to
...@@ -29,7 +34,7 @@ malfunction. ...@@ -29,7 +34,7 @@ malfunction.
Run time configuration has changed slightly from the 2.4 system. If you Run time configuration has changed slightly from the 2.4 system. If you
want to configure an endnode, then the simplified procedure is as follows: want to configure an endnode, then the simplified procedure is as follows:
o Set the MAC address on your ethernet card before starting _any_ other - Set the MAC address on your ethernet card before starting _any_ other
network protocols. network protocols.
As soon as your network card is brought into the UP state, DECnet should As soon as your network card is brought into the UP state, DECnet should
...@@ -37,7 +42,8 @@ start working. If you need something more complicated or are unsure how ...@@ -37,7 +42,8 @@ start working. If you need something more complicated or are unsure how
to set the MAC address, see the next section. Also all configurations which to set the MAC address, see the next section. Also all configurations which
worked with 2.4 will work under 2.5 with no change. worked with 2.4 will work under 2.5 with no change.
3) Command line options 3. Command line options
=======================
You can set a DECnet address on the kernel command line for compatibility You can set a DECnet address on the kernel command line for compatibility
with the 2.4 configuration procedure, but in general it's not needed any more. with the 2.4 configuration procedure, but in general it's not needed any more.
...@@ -56,7 +62,7 @@ interface then you won't see any entries in /proc/net/neigh for the local ...@@ -56,7 +62,7 @@ interface then you won't see any entries in /proc/net/neigh for the local
host until such time as you start a connection. This doesn't affect the host until such time as you start a connection. This doesn't affect the
operation of the local communications in any other way though. operation of the local communications in any other way though.
The kernel command line takes options looking like the following: The kernel command line takes options looking like the following::
decnet.addr=1,2 decnet.addr=1,2
...@@ -82,7 +88,7 @@ address of the node in order for it to be autoconfigured (and then appear in ...@@ -82,7 +88,7 @@ address of the node in order for it to be autoconfigured (and then appear in
FTP sites called dn2ethaddr which can compute the correct ethernet FTP sites called dn2ethaddr which can compute the correct ethernet
address to use. The address can be set by ifconfig either before or address to use. The address can be set by ifconfig either before or
at the time the device is brought up. If you are using RedHat you can at the time the device is brought up. If you are using RedHat you can
add the line: add the line::
MACADDR=AA:00:04:00:03:04 MACADDR=AA:00:04:00:03:04
...@@ -95,7 +101,7 @@ verify with iproute2). ...@@ -95,7 +101,7 @@ verify with iproute2).
The default device for routing can be set through the /proc filesystem The default device for routing can be set through the /proc filesystem
by setting /proc/sys/net/decnet/default_device to the by setting /proc/sys/net/decnet/default_device to the
device you want DECnet to route packets out of when no specific route device you want DECnet to route packets out of when no specific route
is available. Usually this will be eth0, for example: is available. Usually this will be eth0, for example::
echo -n "eth0" >/proc/sys/net/decnet/default_device echo -n "eth0" >/proc/sys/net/decnet/default_device
...@@ -106,7 +112,9 @@ confirm that by looking in the default_device file of course. ...@@ -106,7 +112,9 @@ confirm that by looking in the default_device file of course.
There is a list of what the other files under /proc/sys/net/decnet/ do There is a list of what the other files under /proc/sys/net/decnet/ do
on the kernel patch web site (shown above). on the kernel patch web site (shown above).
4) Run time kernel configuration 4. Run time kernel configuration
================================
This is either done through the sysctl/proc interface (see the kernel web This is either done through the sysctl/proc interface (see the kernel web
pages for details on what the various options do) or through the iproute2 pages for details on what the various options do) or through the iproute2
...@@ -122,20 +130,21 @@ since its the _only_ way to add and delete routes currently. Eventually ...@@ -122,20 +130,21 @@ since its the _only_ way to add and delete routes currently. Eventually
there will be a routing daemon to send and receive routing messages for there will be a routing daemon to send and receive routing messages for
each interface and update the kernel routing tables accordingly. The each interface and update the kernel routing tables accordingly. The
routing daemon will use netfilter to listen to routing packets, and routing daemon will use netfilter to listen to routing packets, and
rtnetlink to update the kernels routing tables. rtnetlink to update the kernel's routing tables.
The DECnet raw socket layer has been removed since it was there purely The DECnet raw socket layer has been removed since it was there purely
for use by the routing daemon which will now use netfilter (a much cleaner for use by the routing daemon which will now use netfilter (a much cleaner
and more generic solution) instead. and more generic solution) instead.
5) How can I tell if its working ? 5. How can I tell if it's working?
==================================
Here is a quick guide of what to look for in order to know if your DECnet Here is a quick guide of what to look for in order to know if your DECnet
kernel subsystem is working. kernel subsystem is working.
- Is the node address set (see /proc/sys/net/decnet/node_address) - Is the node address set (see /proc/sys/net/decnet/node_address)
- Is the node of the correct type - Is the node of the correct type
(see /proc/sys/net/decnet/conf/<dev>/forwarding) (see /proc/sys/net/decnet/conf/<dev>/forwarding)
- Is the Ethernet MAC address of each Ethernet card set to match - Is the Ethernet MAC address of each Ethernet card set to match
the DECnet address. If in doubt use the dn2ethaddr utility available the DECnet address. If in doubt use the dn2ethaddr utility available
at the ftp archive. at the ftp archive.
...@@ -160,7 +169,8 @@ kernel subsystem is working. ...@@ -160,7 +169,8 @@ kernel subsystem is working.
network, and see if you can obtain the same results. network, and see if you can obtain the same results.
- At this point you are on your own... :-) - At this point you are on your own... :-)
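For the MAC address item in the checklist above, the relationship between a DECnet address and the Ethernet MAC address can be sketched as follows. This assumes the usual DECnet Phase IV convention (a 16-bit address built from a 6-bit area and a 10-bit node, appended low byte first to the AA:00:04:00 prefix); the function name is hypothetical and this is not the dn2ethaddr utility itself.

```python
def decnet_to_mac(area, node):
    """Return the Ethernet MAC address for DECnet address area.node.

    Assumes the Phase IV layout: 16-bit address = (area << 10) | node,
    stored little-endian after the fixed prefix AA:00:04:00.
    """
    if not 1 <= area <= 63:
        raise ValueError("area must be in the range 1..63")
    if not 1 <= node <= 1023:
        raise ValueError("node must be in the range 1..1023")
    addr = (area << 10) | node
    return "AA:00:04:00:%02X:%02X" % (addr & 0xFF, addr >> 8)
```

Under this convention, address 1.3 maps to AA:00:04:00:03:04, which matches the MACADDR example shown earlier in this document.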
6) How to send a bug report 6. How to send a bug report
===========================
If you've found a bug and want to report it, then there are several things If you've found a bug and want to report it, then there are several things
you can do to help me work out exactly what it is that is wrong. Useful you can do to help me work out exactly what it is that is wrong. Useful
...@@ -175,18 +185,19 @@ information (_most_ of which _is_ _essential_) includes: ...@@ -175,18 +185,19 @@ information (_most_ of which _is_ _essential_) includes:
- How much data was being transferred ? - How much data was being transferred ?
- Was the network congested ? - Was the network congested ?
- How can the problem be reproduced ? - How can the problem be reproduced ?
- Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of - Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of
tcpdump don't understand how to dump DECnet properly, so including tcpdump don't understand how to dump DECnet properly, so including
the hex listing of the packet contents is _essential_, usually the -x flag. the hex listing of the packet contents is _essential_, usually the -x flag.
You may also need to increase the length grabbed with the -s flag. The You may also need to increase the length grabbed with the -s flag. The
-e flag also provides very useful information (ethernet MAC addresses)) -e flag also provides very useful information (ethernet MAC addresses))
7) MAC FAQ 7. MAC FAQ
==========
A quick FAQ on ethernet MAC addresses to explain how Linux and DECnet A quick FAQ on ethernet MAC addresses to explain how Linux and DECnet
interact and how to get the best performance from your hardware. interact and how to get the best performance from your hardware.
Ethernet cards are designed to normally only pass received network frames Ethernet cards are designed to normally only pass received network frames
to a host computer when they are addressed to it, or to the broadcast address. to a host computer when they are addressed to it, or to the broadcast address.
Linux has an interface which allows the setting of extra addresses for Linux has an interface which allows the setting of extra addresses for
...@@ -197,8 +208,8 @@ significant processor time and bus bandwidth can be used up on a busy ...@@ -197,8 +208,8 @@ significant processor time and bus bandwidth can be used up on a busy
network (see the NAPI documentation for a longer explanation of these network (see the NAPI documentation for a longer explanation of these
effects). effects).
DECnet makes use of this interface to allow running DECnet on an ethernet DECnet makes use of this interface to allow running DECnet on an ethernet
card which has already been configured using TCP/IP (presumably using the card which has already been configured using TCP/IP (presumably using the
built in MAC address of the card, as usual) and/or to allow multiple DECnet built in MAC address of the card, as usual) and/or to allow multiple DECnet
addresses on each physical interface. If you do this, be aware that if your addresses on each physical interface. If you do this, be aware that if your
ethernet card doesn't support perfect hashing in its MAC address filter ethernet card doesn't support perfect hashing in its MAC address filter
...@@ -210,7 +221,8 @@ to gain the best efficiency. Better still is to use a card which supports ...@@ -210,7 +221,8 @@ to gain the best efficiency. Better still is to use a card which supports
NAPI as well. NAPI as well.
8) Mailing list 8. Mailing list
===============
If you are keen to get involved in development, or want to ask questions If you are keen to get involved in development, or want to ask questions
about configuration, or even just report bugs, then there is a mailing about configuration, or even just report bugs, then there is a mailing
...@@ -218,7 +230,8 @@ list that you can join, details are at: ...@@ -218,7 +230,8 @@ list that you can join, details are at:
http://sourceforge.net/mail/?group_id=4993 http://sourceforge.net/mail/?group_id=4993
9) Legal Info 9. Legal Info
=============
The Linux DECnet project team have placed their code under the GPL. The The Linux DECnet project team have placed their code under the GPL. The
software is provided "as is" and without warranty express or implied. software is provided "as is" and without warranty express or implied.
......
Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver v.1.1.4. .. SPDX-License-Identifier: GPL-2.0
=====================================================
Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver
=====================================================
:Version: v.1.1.4
DEC FDDIcontroller 700 is DEC's first-generation TURBOchannel FDDI DEC FDDIcontroller 700 is DEC's first-generation TURBOchannel FDDI
......
...@@ -33,7 +33,7 @@ The following features are now available in supported kernels: ...@@ -33,7 +33,7 @@ The following features are now available in supported kernels:
- SNMP - SNMP
Channel Bonding documentation can be found in the Linux kernel source: Channel Bonding documentation can be found in the Linux kernel source:
/Documentation/networking/bonding.txt /Documentation/networking/bonding.rst
Identifying Your Adapter Identifying Your Adapter
......
...@@ -37,7 +37,7 @@ The following features are available in this kernel: ...@@ -37,7 +37,7 @@ The following features are available in this kernel:
- SNMP - SNMP
Channel Bonding documentation can be found in the Linux kernel source: Channel Bonding documentation can be found in the Linux kernel source:
/Documentation/networking/bonding.txt /Documentation/networking/bonding.rst
The driver information previously displayed in the /proc filesystem is not The driver information previously displayed in the /proc filesystem is not
supported in this release. Alternatively, you can use ethtool (version 1.6 supported in this release. Alternatively, you can use ethtool (version 1.6
......
=================== .. SPDX-License-Identifier: GPL-2.0
DNS Resolver Module
===================
Contents: ===================
DNS Resolver Module
===================
.. Contents:
- Overview. - Overview.
- Compilation. - Compilation.
...@@ -12,8 +14,7 @@ Contents: ...@@ -12,8 +14,7 @@ Contents:
- Debugging. - Debugging.
======== Overview
OVERVIEW
======== ========
The DNS resolver module provides a way for kernel services to make DNS queries The DNS resolver module provides a way for kernel services to make DNS queries
...@@ -33,50 +34,50 @@ It does not yet support the following AFS features: ...@@ -33,50 +34,50 @@ It does not yet support the following AFS features:
This code is extracted from the CIFS filesystem. This code is extracted from the CIFS filesystem.
=========== Compilation
COMPILATION
=========== ===========
The module should be enabled by turning on the kernel configuration options: The module should be enabled by turning on the kernel configuration options::
CONFIG_DNS_RESOLVER - tristate "DNS Resolver support" CONFIG_DNS_RESOLVER - tristate "DNS Resolver support"
========== Setting up
SETTING UP
========== ==========
To set up this facility, the /etc/request-key.conf file must be altered so that To set up this facility, the /etc/request-key.conf file must be altered so that
/sbin/request-key can appropriately direct the upcalls. For example, to handle /sbin/request-key can appropriately direct the upcalls. For example, to handle
basic dname to IPv4/IPv6 address resolution, the following line should be basic dname to IPv4/IPv6 address resolution, the following line should be
added: added::
#OP TYPE DESC CO-INFO PROGRAM ARG1 ARG2 ARG3 ... #OP TYPE DESC CO-INFO PROGRAM ARG1 ARG2 ARG3 ...
#====== ============ ======= ======= ========================== #====== ============ ======= ======= ==========================
create dns_resolver * * /usr/sbin/cifs.upcall %k create dns_resolver * * /usr/sbin/cifs.upcall %k
To direct a query for query type 'foo', a line of the following should be added To direct a query for query type 'foo', a line of the following should be added
before the more general line given above as the first match is the one taken. before the more general line given above as the first match is the one taken::
create dns_resolver foo:* * /usr/sbin/dns.foo %k create dns_resolver foo:* * /usr/sbin/dns.foo %k
===== Usage
USAGE
===== =====
To make use of this facility, one of the following functions that are To make use of this facility, one of the following functions that are
implemented in the module can be called after doing: implemented in the module can be called after doing::
#include <linux/dns_resolver.h> #include <linux/dns_resolver.h>
(1) int dns_query(const char *type, const char *name, size_t namelen, ::
const char *options, char **_result, time_t *_expiry);
int dns_query(const char *type, const char *name, size_t namelen,
const char *options, char **_result, time_t *_expiry);
This is the basic access function. It looks for a cached DNS query and if This is the basic access function. It looks for a cached DNS query and if
it doesn't find it, it upcalls to userspace to make a new DNS query, which it doesn't find it, it upcalls to userspace to make a new DNS query, which
may then be cached. The key description is constructed as a string of the may then be cached. The key description is constructed as a string of the
form: form::
[<type>:]<name> [<type>:]<name>
...@@ -107,16 +108,14 @@ This can be cleared by any process that has the CAP_SYS_ADMIN capability by ...@@ -107,16 +108,14 @@ This can be cleared by any process that has the CAP_SYS_ADMIN capability by
the use of KEYCTL_KEYRING_CLEAR on the keyring ID. the use of KEYCTL_KEYRING_CLEAR on the keyring ID.
=============================== Reading DNS Keys from Userspace
READING DNS KEYS FROM USERSPACE
=============================== ===============================
Keys of dns_resolver type can be read from userspace using keyctl_read() or Keys of dns_resolver type can be read from userspace using keyctl_read() or
"keyctl read/print/pipe". "keyctl read/print/pipe".
========= Mechanism
MECHANISM
========= =========
The dnsresolver module registers a key type called "dns_resolver". Keys of The dnsresolver module registers a key type called "dns_resolver". Keys of
...@@ -147,11 +146,10 @@ See <file:Documentation/security/keys/request-key.rst> for further ...@@ -147,11 +146,10 @@ See <file:Documentation/security/keys/request-key.rst> for further
information about request-key function. information about request-key function.
========= Debugging
DEBUGGING
========= =========
Debugging messages can be turned on dynamically by writing a 1 into the Debugging messages can be turned on dynamically by writing a 1 into the
following file: following file::
/sys/module/dnsresolver/parameters/debug /sys/module/dnsresolver/parameters/debug
Document about softnet driver issues .. SPDX-License-Identifier: GPL-2.0
=====================
Softnet Driver Issues
=====================
Transmit path guidelines: Transmit path guidelines:
...@@ -8,7 +12,7 @@ Transmit path guidelines: ...@@ -8,7 +12,7 @@ Transmit path guidelines:
transmit function will become busy. transmit function will become busy.
Instead it must maintain the queue properly. For example, Instead it must maintain the queue properly. For example,
for a driver implementing scatter-gather this means: for a driver implementing scatter-gather this means::
static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb, static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev) struct net_device *dev)
...@@ -38,25 +42,25 @@ Transmit path guidelines: ...@@ -38,25 +42,25 @@ Transmit path guidelines:
return NETDEV_TX_OK; return NETDEV_TX_OK;
} }
And then at the end of your TX reclamation event handling: And then at the end of your TX reclamation event handling::
if (netif_queue_stopped(dp->dev) && if (netif_queue_stopped(dp->dev) &&
TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1)) TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
netif_wake_queue(dp->dev); netif_wake_queue(dp->dev);
For a non-scatter-gather supporting card, the three tests simply become: For a non-scatter-gather supporting card, the three tests simply become::
/* This is a hard error, log it. */
if (TX_BUFFS_AVAIL(dp) <= 0) if (TX_BUFFS_AVAIL(dp) <= 0)
and: and::
if (TX_BUFFS_AVAIL(dp) == 0) if (TX_BUFFS_AVAIL(dp) == 0)
and: and::
if (netif_queue_stopped(dp->dev) && if (netif_queue_stopped(dp->dev) &&
TX_BUFFS_AVAIL(dp) > 0) TX_BUFFS_AVAIL(dp) > 0)
netif_wake_queue(dp->dev); netif_wake_queue(dp->dev);
2) An ndo_start_xmit method must not modify the shared parts of a 2) An ndo_start_xmit method must not modify the shared parts of a
...@@ -86,7 +90,7 @@ Close/stop guidelines: ...@@ -86,7 +90,7 @@ Close/stop guidelines:
1) After the ndo_stop routine has been called, the hardware must 1) After the ndo_stop routine has been called, the hardware must
not receive or transmit any data. All in flight packets must not receive or transmit any data. All in flight packets must
be aborted. If necessary, poll or wait for completion of be aborted. If necessary, poll or wait for completion of
any reset commands. any reset commands.
2) The ndo_stop routine will be called by unregister_netdevice 2) The ndo_stop routine will be called by unregister_netdevice
......
EQL Driver: Serial IP Load Balancing HOWTO .. SPDX-License-Identifier: GPL-2.0
==========================================
EQL Driver: Serial IP Load Balancing HOWTO
==========================================
Simon "Guru Aleph-Null" Janes, simon@ncm.com Simon "Guru Aleph-Null" Janes, simon@ncm.com
v1.1, February 27, 1995 v1.1, February 27, 1995
This is the manual for the EQL device driver. EQL is a software device This is the manual for the EQL device driver. EQL is a software device
...@@ -12,7 +18,8 @@ ...@@ -12,7 +18,8 @@
which was only created to patch cleanly in the very latest kernel which was only created to patch cleanly in the very latest kernel
source trees. (Yes, it worked fine.) source trees. (Yes, it worked fine.)
1. Introduction 1. Introduction
===============
Which is worse? A huge fee for a 56K leased line or two phone lines? Which is worse? A huge fee for a 56K leased line or two phone lines?
It's probably the former. If you find yourself craving more bandwidth, It's probably the former. If you find yourself craving more bandwidth,
...@@ -41,47 +48,40 @@ ...@@ -41,47 +48,40 @@
Hey, we can all dream you know... Hey, we can all dream you know...
2. Kernel Configuration 2. Kernel Configuration
=======================
Here I describe the general steps of getting a kernel up and working Here I describe the general steps of getting a kernel up and working
with the eql driver. From patching, building, to installing. with the eql driver. From patching, building, to installing.
2.1. Patching The Kernel 2.1. Patching The Kernel
------------------------
If you do not have or cannot get a copy of the kernel with the eql If you do not have or cannot get a copy of the kernel with the eql
driver folded into it, get your copy of the driver from driver folded into it, get your copy of the driver from
ftp://slaughter.ncm.com/pub/Linux/LOAD_BALANCING/eql-1.1.tar.gz. ftp://slaughter.ncm.com/pub/Linux/LOAD_BALANCING/eql-1.1.tar.gz.
Unpack this archive someplace obvious like /usr/local/src/. It will Unpack this archive someplace obvious like /usr/local/src/. It will
create the following files: create the following files::
______________________________________________________________________
-rw-r--r-- guru/ncm 198 Jan 19 18:53 1995 eql-1.1/NO-WARRANTY -rw-r--r-- guru/ncm 198 Jan 19 18:53 1995 eql-1.1/NO-WARRANTY
-rw-r--r-- guru/ncm 30620 Feb 27 21:40 1995 eql-1.1/eql-1.1.patch -rw-r--r-- guru/ncm 30620 Feb 27 21:40 1995 eql-1.1/eql-1.1.patch
-rwxr-xr-x guru/ncm 16111 Jan 12 22:29 1995 eql-1.1/eql_enslave -rwxr-xr-x guru/ncm 16111 Jan 12 22:29 1995 eql-1.1/eql_enslave
-rw-r--r-- guru/ncm 2195 Jan 10 21:48 1995 eql-1.1/eql_enslave.c -rw-r--r-- guru/ncm 2195 Jan 10 21:48 1995 eql-1.1/eql_enslave.c
______________________________________________________________________
Unpack a recent kernel (something after 1.1.92) someplace convenient Unpack a recent kernel (something after 1.1.92) someplace convenient
like say /usr/src/linux-1.1.92.eql. Use symbolic links to point like say /usr/src/linux-1.1.92.eql. Use symbolic links to point
/usr/src/linux to this development directory. /usr/src/linux to this development directory.
Apply the patch by running the commands: Apply the patch by running the commands::
______________________________________________________________________
cd /usr/src cd /usr/src
patch </usr/local/src/eql-1.1/eql-1.1.patch patch </usr/local/src/eql-1.1/eql-1.1.patch
______________________________________________________________________
2.2. Building The Kernel
2.2. Building The Kernel ------------------------
After patching the kernel, run make config and configure the kernel After patching the kernel, run make config and configure the kernel
for your hardware. for your hardware.
...@@ -90,7 +90,8 @@ ...@@ -90,7 +90,8 @@
After configuration, make and install according to your habit. After configuration, make and install according to your habit.
3. Network Configuration 3. Network Configuration
========================
So far, I have only used the eql device with the DSLIP SLIP connection So far, I have only used the eql device with the DSLIP SLIP connection
manager by Matt Dillon (-- "The man who sold his soul to code so much manager by Matt Dillon (-- "The man who sold his soul to code so much
...@@ -100,37 +101,27 @@ ...@@ -100,37 +101,27 @@
connection. connection.
3.1. /etc/rc.d/rc.inet1 3.1. /etc/rc.d/rc.inet1
-----------------------
In rc.inet1, ifconfig the eql device to the IP address you usually use In rc.inet1, ifconfig the eql device to the IP address you usually use
for your machine, and the MTU you prefer for your SLIP lines. One for your machine, and the MTU you prefer for your SLIP lines. One
could argue that MTU should be roughly half the usual size for two could argue that MTU should be roughly half the usual size for two
modems, one-third for three, one-fourth for four, etc... But going modems, one-third for three, one-fourth for four, etc... But going
too far below 296 is probably overkill. Here is an example ifconfig too far below 296 is probably overkill. Here is an example ifconfig
command that sets up the eql device: command that sets up the eql device::
______________________________________________________________________
ifconfig eql 198.67.33.239 mtu 1006 ifconfig eql 198.67.33.239 mtu 1006
______________________________________________________________________
Once the eql device is up and running, add a static default route to Once the eql device is up and running, add a static default route to
it in the routing table using the cool new route syntax that makes it in the routing table using the cool new route syntax that makes
life so much easier: life so much easier::
______________________________________________________________________
route add default eql route add default eql
______________________________________________________________________
3.2. Enslaving Devices By Hand 3.2. Enslaving Devices By Hand
------------------------------
Enslaving devices by hand requires two utility programs: eql_enslave Enslaving devices by hand requires two utility programs: eql_enslave
and eql_emancipate (-- eql_emancipate hasn't been written because when and eql_emancipate (-- eql_emancipate hasn't been written because when
...@@ -140,87 +131,56 @@ ...@@ -140,87 +131,56 @@
The syntax for enslaving a device is "eql_enslave <master-name> The syntax for enslaving a device is "eql_enslave <master-name>
<slave-name> <estimated-bps>". Here are some example enslavings: <slave-name> <estimated-bps>". Here are some example enslavings::
______________________________________________________________________
eql_enslave eql sl0 28800 eql_enslave eql sl0 28800
eql_enslave eql ppp0 14400 eql_enslave eql ppp0 14400
eql_enslave eql sl1 57600 eql_enslave eql sl1 57600
______________________________________________________________________
When you want to free a device from its life of slavery, you can When you want to free a device from its life of slavery, you can
either down the device with ifconfig (eql will automatically bury the either down the device with ifconfig (eql will automatically bury the
dead slave and remove it from its queue) or use eql_emancipate to free dead slave and remove it from its queue) or use eql_emancipate to free
it. (-- Or just ifconfig it down, and the eql driver will take it out it. (-- Or just ifconfig it down, and the eql driver will take it out
for you.--) for you.--)::
______________________________________________________________________
eql_emancipate eql sl0 eql_emancipate eql sl0
eql_emancipate eql ppp0 eql_emancipate eql ppp0
eql_emancipate eql sl1 eql_emancipate eql sl1
______________________________________________________________________
3.3. DSLIP Configuration for the eql Device
-------------------------------------------
3.3. DSLIP Configuration for the eql Device
The general idea is to bring up and keep up as many SLIP connections The general idea is to bring up and keep up as many SLIP connections
as you need, automatically. as you need, automatically.
3.3.1. /etc/slip/runslip.conf
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example runslip.conf::

  name sl-line-1
  enabled
  baud 38400
  mtu 576
  ducmd -e /etc/slip/dialout/cua2-288.xp -t 9
  command eql_enslave eql $interface 28800
  address 198.67.33.239
  line /dev/cua2

  name sl-line-2
  enabled
  baud 38400
  mtu 576
  ducmd -e /etc/slip/dialout/cua3-288.xp -t 9
  command eql_enslave eql $interface 28800
  address 198.67.33.239
  line /dev/cua3

3.4. Using PPP and the eql Device
---------------------------------

I have not yet done any load-balancing testing for PPP devices, mainly I have not yet done any load-balancing testing for PPP devices, mainly
because I don't have a PPP-connection manager like SLIP has with because I don't have a PPP-connection manager like SLIP has with
...@@ -235,7 +195,8 @@ ...@@ -235,7 +195,8 @@
year. year.
4. About the Slave Scheduler Algorithm 4. About the Slave Scheduler Algorithm
======================================
The slave scheduler probably could be replaced with a dozen other The slave scheduler probably could be replaced with a dozen other
things and push traffic much faster. The formula in the current set things and push traffic much faster. The formula in the current set
...@@ -254,7 +215,8 @@ ...@@ -254,7 +215,8 @@
traffic and the "slower" modem starved. traffic and the "slower" modem starved.
5. Testers' Reports 5. Testers' Reports
===================
Some people have experimented with the eql device with newer Some people have experimented with the eql device with newer
kernels (than 1.1.75). I have since updated the driver to patch kernels (than 1.1.75). I have since updated the driver to patch
...@@ -262,87 +224,29 @@ ...@@ -262,87 +224,29 @@
balancing" driver config option. balancing" driver config option.
o icee from LinuxNET patched 1.1.86 without any rejects and was able - icee from LinuxNET patched 1.1.86 without any rejects and was able
to boot the kernel and enslave a couple of ISDN PPP links. to boot the kernel and enslave a couple of ISDN PPP links.
5.1. Randolph Bentson's Test Report
-----------------------------------

::

  From bentson@grieg.seaslug.org Wed Feb 8 19:08:09 1995
  Date: Tue, 7 Feb 95 22:57 PST
  From: Randolph Bentson <bentson@grieg.seaslug.org>
  To: guru@ncm.com
  Subject: EQL driver tests

  I have been checking out your eql driver. (Nice work, that!)
  Although you may already done this performance testing, here
  are some data I've discovered.

  Randolph Bentson
  bentson@grieg.seaslug.org

  ------------------------------------------------------------------

A pseudo-device driver, EQL, written by Simon Janes, can be used A pseudo-device driver, EQL, written by Simon Janes, can be used
...@@ -363,7 +267,7 @@ ...@@ -363,7 +267,7 @@
Once a link was established, I timed a binary ftp transfer of Once a link was established, I timed a binary ftp transfer of
289284 bytes of data. If there were no overhead (packet headers, 289284 bytes of data. If there were no overhead (packet headers,
inter-character and inter-packet delays, etc.) the transfers inter-character and inter-packet delays, etc.) the transfers
would take the following times: would take the following times::
bits/sec seconds bits/sec seconds
345600 8.3 345600 8.3
...@@ -388,141 +292,82 @@ ...@@ -388,141 +292,82 @@
that the connection establishment seemed fragile for the higher that the connection establishment seemed fragile for the higher
speeds. Once established, the connection seemed robust enough.) speeds. Once established, the connection seemed robust enough.)

====== ======== === ======== ======= ======= ===
#lines speed    mtu seconds  theory  actual  %of
       kbit/sec     duration speed   speed   max
====== ======== === ======== ======= ======= ===
3      115200   900 _        345600
3      115200   400 18.1     345600  159825  46
2      115200   900 _        230400
2      115200   600 18.1     230400  159825  69
2      115200   400 19.3     230400  149888  65
4      57600    900 _        234600
4      57600    600 _        234600
4      57600    400 _        234600
3      57600    600 20.9     172800  138413  80
3      57600    900 21.2     172800  136455  78
3      115200   600 21.7     345600  133311  38
3      57600    400 22.5     172800  128571  74
4      38400    900 25.2     153600  114795  74
4      38400    600 26.4     153600  109577  71
4      38400    400 27.3     153600  105965  68
2      57600    900 29.1     115200  99410.3 86
1      115200   900 30.7     115200  94229.3 81
2      57600    600 30.2     115200  95789.4 83
3      38400    900 30.3     115200  95473.3 82
3      38400    600 31.2     115200  92719.2 80
1      115200   600 31.3     115200  92423   80
2      57600    400 32.3     115200  89561.6 77
1      115200   400 32.8     115200  88196.3 76
3      38400    400 33.5     115200  86353.4 74
2      38400    900 43.7     76800   66197.7 86
2      38400    600 44       76800   65746.4 85
2      38400    400 47.2     76800   61289   79
4      19200    900 50.8     76800   56945.7 74
4      19200    400 53.2     76800   54376.7 70
4      19200    600 53.7     76800   53870.4 70
1      57600    900 54.6     57600   52982.4 91
1      57600    600 56.2     57600   51474   89
3      19200    900 60.5     57600   47815.5 83
1      57600    400 60.2     57600   48053.8 83
3      19200    600 62       57600   46658.7 81
3      19200    400 64.7     57600   44711.6 77
1      38400    900 79.4     38400   36433.8 94
1      38400    600 82.4     38400   35107.3 91
2      19200    900 84.4     38400   34275.4 89
1      38400    400 86.8     38400   33327.6 86
2      19200    600 87.6     38400   33023.3 85
2      19200    400 91.2     38400   31719.7 82
4      9600     900 94.7     38400   30547.4 79
4      9600     400 106      38400   27290.9 71
4      9600     600 110      38400   26298.5 68
3      9600     900 118      28800   24515.6 85
3      9600     600 120      28800   24107   83
3      9600     400 131      28800   22082.7 76
1      19200    900 155      19200   18663.5 97
1      19200    600 161      19200   17968   93
1      19200    400 170      19200   17016.7 88
2      9600     600 176      19200   16436.6 85
2      9600     900 180      19200   16071.3 83
2      9600     400 181      19200   15982.5 83
1      9600     900 305      9600    9484.72 98
1      9600     600 314      9600    9212.87 95
1      9600     400 332      9600    8713.37 90
====== ======== === ======== ======= ======= ===

5.2. Anthony Healy's Report
---------------------------

::

  Date: Mon, 13 Feb 1995 16:17:29 +1100 (EST)
  From: Antony Healey <ahealey@st.nepean.uws.edu.au>
  To: Simon Janes <guru@ncm.com>
  Subject: Re: Load Balancing

  Hi Simon,

  I've installed your patch and it works great. I have trialed
  it over twin SL/IP lines, just over null modems, but I was
  able to data at over 48Kb/s [ISDN link -Simon]. I managed a
  transfer of up to 7.5 Kbyte/s on one go, but averaged around
  6.4 Kbyte/s, which I think is pretty cool. :)

LC-trie implementation notes. .. SPDX-License-Identifier: GPL-2.0
============================
LC-trie implementation notes
============================
Node types Node types
---------- ----------
leaf leaf
An end node with data. This has a copy of the relevant key, along An end node with data. This has a copy of the relevant key, along
with 'hlist' with routing table entries sorted by prefix length. with 'hlist' with routing table entries sorted by prefix length.
See struct leaf and struct leaf_info. See struct leaf and struct leaf_info.
...@@ -13,7 +17,7 @@ trie node or tnode ...@@ -13,7 +17,7 @@ trie node or tnode
A few concepts explained A few concepts explained
------------------------ ------------------------
Bits (tnode) Bits (tnode)
The number of bits in the key segment used for indexing into the The number of bits in the key segment used for indexing into the
child array - the "child index". See Level Compression. child array - the "child index". See Level Compression.
...@@ -23,7 +27,7 @@ Pos (tnode) ...@@ -23,7 +27,7 @@ Pos (tnode)
Path Compression / skipped bits Path Compression / skipped bits
Any given tnode is linked to from the child array of its parent, using Any given tnode is linked to from the child array of its parent, using
a segment of the key specified by the parent's "pos" and "bits" a segment of the key specified by the parent's "pos" and "bits"
In certain cases, this tnode's own "pos" will not be immediately In certain cases, this tnode's own "pos" will not be immediately
adjacent to the parent (pos+bits), but there will be some bits adjacent to the parent (pos+bits), but there will be some bits
in the key skipped over because they represent a single path with no in the key skipped over because they represent a single path with no
...@@ -56,8 +60,8 @@ full_children ...@@ -56,8 +60,8 @@ full_children
Comments Comments
--------- ---------
We have tried to keep the structure of the code as close to fib_hash as
possible to allow verification and help with reviewing.
fib_find_node() fib_find_node()
A good start for understanding this code. This function implements a A good start for understanding this code. This function implements a
......
.. SPDX-License-Identifier: GPL-2.0
=======================================================
Linux Socket Filtering aka Berkeley Packet Filter (BPF) Linux Socket Filtering aka Berkeley Packet Filter (BPF)
======================================================= =======================================================
...@@ -42,10 +45,10 @@ displays what is being placed into this structure. ...@@ -42,10 +45,10 @@ displays what is being placed into this structure.
Although we were only speaking about sockets here, BPF in Linux is used Although we were only speaking about sockets here, BPF in Linux is used
in many more places. There's xt_bpf for netfilter, cls_bpf in the kernel in many more places. There's xt_bpf for netfilter, cls_bpf in the kernel
qdisc layer, SECCOMP-BPF (SECure COMPuting [1]), and lots of other places qdisc layer, SECCOMP-BPF (SECure COMPuting [1]_), and lots of other places
such as team driver, PTP code, etc where BPF is being used. such as team driver, PTP code, etc where BPF is being used.
[1] Documentation/userspace-api/seccomp_filter.rst .. [1] Documentation/userspace-api/seccomp_filter.rst
Original BPF paper: Original BPF paper:
@@ -59,23 +62,23 @@ Structure
---------

User space applications include <linux/filter.h> which contains the
following relevant structures::

    struct sock_filter {    /* Filter block */
        __u16   code;   /* Actual filter code */
        __u8    jt;     /* Jump true */
        __u8    jf;     /* Jump false */
        __u32   k;      /* Generic multiuse field */
    };

Such structures are assembled as an array of 4-tuples, each containing
a code, jt, jf and k value. jt and jf are jump offsets and k is a generic
value to be used for a provided code::

    struct sock_fprog {                     /* Required for SO_ATTACH_FILTER. */
        unsigned short             len;     /* Number of filter blocks */
        struct sock_filter __user *filter;
    };

For socket filtering, a pointer to this structure (as shown in
follow-up example) is passed to the kernel through setsockopt(2).
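As a rough illustration of how such an array of 4-tuples can be written by hand, the sketch below builds a four-instruction filter that accepts only ARP frames, using the BPF_STMT() and BPF_JUMP() initializer macros from <linux/filter.h>. The names ``arp_code``, ``arp_prog`` and ``arp_prog_len()`` are made up for this example and are not part of any kernel API.

```c
#include <assert.h>
#include <linux/filter.h>   /* struct sock_filter, struct sock_fprog, BPF_* */

/* Hypothetical example filter: accept ARP frames, drop everything else. */
static struct sock_filter arp_code[] = {
    /* A = half-word at byte offset 12 (the EtherType field) */
    BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),
    /* if (A == 0x0806) continue (jt = 0), else skip one instruction (jf = 1) */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x0806, 0, 1),
    /* accept: return the largest possible snapshot length */
    BPF_STMT(BPF_RET | BPF_K, 0xffffffff),
    /* drop: return 0 bytes */
    BPF_STMT(BPF_RET | BPF_K, 0),
};

/* The program descriptor that would be handed to SO_ATTACH_FILTER. */
static struct sock_fprog arp_prog = {
    .len    = sizeof(arp_code) / sizeof(arp_code[0]),
    .filter = arp_code,
};

/* Returns the number of filter blocks after checking the 8-byte tuple size. */
static unsigned short arp_prog_len(void)
{
    assert(sizeof(struct sock_filter) == 8);
    return arp_prog.len;
}
```

Each initializer expands to one ``{ code, jt, jf, k }`` tuple, so the first entry ends up with code 0x28 and k 12.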
@@ -83,55 +86,57 @@ follow-up example) is being passed to the kernel through setsockopt(2).
Example
-------

::

    #include <sys/socket.h>
    #include <sys/types.h>
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    /* ... */

    /* From the example above: tcpdump -i em1 port 22 -dd */
    struct sock_filter code[] = {
        { 0x28,  0,  0, 0x0000000c },
        { 0x15,  0,  8, 0x000086dd },
        { 0x30,  0,  0, 0x00000014 },
        { 0x15,  2,  0, 0x00000084 },
        { 0x15,  1,  0, 0x00000006 },
        { 0x15,  0, 17, 0x00000011 },
        { 0x28,  0,  0, 0x00000036 },
        { 0x15, 14,  0, 0x00000016 },
        { 0x28,  0,  0, 0x00000038 },
        { 0x15, 12, 13, 0x00000016 },
        { 0x15,  0, 12, 0x00000800 },
        { 0x30,  0,  0, 0x00000017 },
        { 0x15,  2,  0, 0x00000084 },
        { 0x15,  1,  0, 0x00000006 },
        { 0x15,  0,  8, 0x00000011 },
        { 0x28,  0,  0, 0x00000014 },
        { 0x45,  6,  0, 0x00001fff },
        { 0xb1,  0,  0, 0x0000000e },
        { 0x48,  0,  0, 0x0000000e },
        { 0x15,  2,  0, 0x00000016 },
        { 0x48,  0,  0, 0x00000010 },
        { 0x15,  0,  1, 0x00000016 },
        { 0x06,  0,  0, 0x0000ffff },
        { 0x06,  0,  0, 0x00000000 },
    };

    struct sock_fprog bpf = {
        .len = ARRAY_SIZE(code),
        .filter = code,
    };

    sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (sock < 0)
        /* ... bail out ... */

    ret = setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
    if (ret < 0)
        /* ... bail out ... */

    /* ... */
    close(sock);

The above example code attaches a socket filter for a PF_PACKET socket
in order to let all IPv4/IPv6 packets with port 22 pass. The rest will
@@ -178,15 +183,17 @@ closely modelled after Steven McCanne's and Van Jacobson's BPF paper.

The BPF architecture consists of the following basic elements:

  =======          ====================================================
  Element          Description
  =======          ====================================================
  A                32 bit wide accumulator
  X                32 bit wide X register
  M[]              16 x 32 bit wide misc registers aka "scratch memory
                   store", addressable from 0 to 15
  =======          ====================================================

A program that is translated by bpf_asm into "opcodes" is an array that
consists of the following elements (as already mentioned)::

  op:16, jt:8, jf:8, k:32
@@ -201,8 +208,9 @@ and return instructions that are also represented in bpf_asm syntax. This
table lists all available bpf_asm instructions and what their underlying
opcodes, as defined in linux/filter.h, stand for:

  ===========      ===================  =====================
  Instruction      Addressing mode      Description
  ===========      ===================  =====================
  ld               1, 2, 3, 4, 12       Load word into A
  ldi              4                    Load word into A
  ldh              1, 2                 Load half-word into A
@@ -241,11 +249,13 @@ opcodes as defined in linux/filter.h stand for:
  txa                                   Copy X into A

  ret              4, 11                Return
  ===========      ===================  =====================

The next table shows addressing formats from the 2nd column:

  ===============  ===================  ===============================================
  Addressing mode  Syntax               Description
  ===============  ===================  ===============================================
   0               x/%x                 Register X
   1               [k]                  BHW at byte offset k in the packet
   2               [x + k]              BHW at the offset X + k in the packet
@@ -259,6 +269,7 @@ The next table shows addressing formats from the 2nd column:
  10               x/%x,Lt              Jump to Lt if predicate is true
  11               a/%a                 Accumulator A
  12               extension            BPF extension
  ===============  ===================  ===============================================

The Linux kernel also has a couple of BPF extensions that are used along
with the class of load instructions by "overloading" the k argument with
@@ -267,8 +278,9 @@ extensions are loaded into A.

Possible BPF extensions are shown in the following table:

  ===================================   =================================================
  Extension                             Description
  ===================================   =================================================
  len                                   skb->len
  proto                                 skb->protocol
  type                                  skb->pkt_type
@@ -285,18 +297,19 @@ Possible BPF extensions are shown in the following table:
  vlan_avail                            skb_vlan_tag_present(skb)
  vlan_tpid                             skb->vlan_proto
  rand                                  prandom_u32()
  ===================================   =================================================

These extensions can also be prefixed with '#'.
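To make the "overloading" of k more concrete, here is a small sketch of what a load of the vlan_tci extension could look like when written with the constants from <linux/filter.h>. It assumes that vlan_tci corresponds to SKF_AD_VLAN_TAG and that the extension is addressed as SKF_AD_OFF plus that constant; ``load_vlan_tci()`` is a hypothetical helper name used only here.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/filter.h>   /* BPF_STMT, SKF_AD_OFF, SKF_AD_VLAN_TAG */

/*
 * Hypothetical helper: build the classic BPF instruction that loads the
 * vlan_tci extension into A. The opcode is an ordinary absolute word load;
 * the extension is selected purely through the value placed in k.
 */
static struct sock_filter load_vlan_tci(void)
{
    struct sock_filter insn =
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_VLAN_TAG);

    /* Ancillary offsets sit in the negative range of k when read as signed. */
    assert((int32_t)insn.k < 0);
    return insn;
}
```

Because no real packet offset is negative, such a k value cannot be confused with a load from the packet itself.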

Examples for low-level BPF:

**ARP packets**::

  ldh [12]
  jne #0x806, drop
  ret #-1
  drop: ret #0

**IPv4 TCP packets**::

  ldh [12]
  jne #0x800, drop
@@ -305,14 +318,15 @@ Examples for low-level BPF:
  ret #-1
  drop: ret #0

**(Accelerated) VLAN w/ id 10**::

  ld vlan_tci
  jneq #10, drop
  ret #-1
  drop: ret #0

**icmp random packet sampling, 1 in 4**::

  ldh [12]
  jne #0x800, drop
  ldb [23]
@@ -324,7 +338,7 @@ Examples for low-level BPF:
  ret #-1
  drop: ret #0

**SECCOMP filter example**::

  ld [4]                  /* offsetof(struct seccomp_data, arch) */
  jne #0xc000003e, bad    /* AUDIT_ARCH_X86_64 */
@@ -345,18 +359,18 @@ Examples for low-level BPF:
The above example code can be placed into a file (here called "foo"), and
then be passed to the bpf_asm tool to generate opcodes, output that xt_bpf
and cls_bpf understand and that can be loaded directly. Example with above
ARP code::

    $ ./bpf_asm foo
    4,40 0 0 12,21 0 1 2054,6 0 0 4294967295,6 0 0 0,

In copy and paste C-like output::

    $ ./bpf_asm -c foo
    { 0x28,  0,  0, 0x0000000c },
    { 0x15,  0,  1, 0x00000806 },
    { 0x06,  0,  0, 0xffffffff },
    { 0x06,  0,  0, 0000000000 },
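The comma-separated form printed by plain ``bpf_asm`` is simply the instruction count followed by one decimal ``code jt jf k`` group per instruction. The following sketch shows one way to read that form back into an array; ``struct insn`` is a local stand-in for ``struct sock_filter`` and ``parse_bpf_asm()`` is a hypothetical helper, neither of which comes from the kernel tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local mirror of struct sock_filter so the sketch builds without kernel headers. */
struct insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/*
 * Hypothetical helper: parse "len,code jt jf k,code jt jf k,..." into out[].
 * Returns the number of instructions parsed, or -1 on malformed input or
 * when the program does not fit into max entries.
 */
static int parse_bpf_asm(const char *s, struct insn *out, int max)
{
    unsigned int len, code, jt, jf, k;
    int used = 0, n = 0;

    assert(s != NULL && out != NULL);
    /* Leading instruction count; %n stays 0 if the comma is missing. */
    if (sscanf(s, "%u,%n", &len, &used) != 1 || used == 0)
        return -1;
    if (max < 0 || len > (unsigned int)max)
        return -1;
    s += used;

    for (unsigned int i = 0; i < len; i++) {
        used = 0;
        if (sscanf(s, "%u %u %u %u%n", &code, &jt, &jf, &k, &used) != 4)
            return -1;
        out[n++] = (struct insn){ (uint16_t)code, (uint8_t)jt, (uint8_t)jf, k };
        s += used;
        if (*s == ',')      /* separator; also tolerates the trailing comma */
            s++;
    }
    return n;
}
```

Feeding it the ARP output line above yields four instructions, the second of which carries k = 2054, i.e. 0x806.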

In particular, as usage with xt_bpf or cls_bpf can result in more complex BPF
filters that might not be obvious at first, it's good to test filters before
@@ -365,9 +379,9 @@ bpf_dbg under tools/bpf/ in the kernel source directory. This debugger allows
for testing BPF filters against given pcap files, single stepping through the
BPF code on the pcap's packets and to do BPF machine register dumps.

Starting bpf_dbg is trivial and just requires issuing::

    # ./bpf_dbg

In case input and output do not equal stdin/stdout, bpf_dbg takes an
alternative stdin source as a first argument, and an alternative stdout
@@ -381,84 +395,100 @@ Interaction in bpf_dbg happens through a shell that also has auto-completion
support (follow-up example commands starting with '>' denote bpf_dbg shell).
The usual workflow would be to ...

* load bpf 6,40 0 0 12,21 0 3 2048,48 0 0 23,21 0 1 1,6 0 0 65535,6 0 0 0

  Loads a BPF filter from standard output of bpf_asm, or transformed via
  e.g. ``tcpdump -iem1 -ddd port 22 | tr '\n' ','``. Note that for JIT
  debugging (next section), this command creates a temporary socket and
  loads the BPF code into the kernel. Thus, this will also be useful for
  JIT developers.

* load pcap foo.pcap

  Loads standard tcpdump pcap file.

* run [<n>]::

    bpf passes:1 fails:9

  Runs through all packets from a pcap to account how many passes and fails
  the filter will generate. A limit of packets to traverse can be given.

* disassemble::

    l0:     ldh [12]
    l1:     jeq #0x800, l2, l5
    l2:     ldb [23]
    l3:     jeq #0x1, l4, l5
    l4:     ret #0xffff
    l5:     ret #0

  Prints out BPF code disassembly.

* dump::

    /* { op, jt, jf, k }, */
    { 0x28,  0,  0, 0x0000000c },
    { 0x15,  0,  3, 0x00000800 },
    { 0x30,  0,  0, 0x00000017 },
    { 0x15,  0,  1, 0x00000001 },
    { 0x06,  0,  0, 0x0000ffff },
    { 0x06,  0,  0, 0000000000 },

  Prints out C-style BPF code dump.

* breakpoint 0::

    breakpoint at: l0:      ldh [12]

* breakpoint 1::

    breakpoint at: l1:      jeq #0x800, l2, l5

  ...

  Sets breakpoints at particular BPF instructions. Issuing a ``run`` command
  will walk through the pcap file continuing from the current packet and
  break when a breakpoint is being hit (another ``run`` will continue from
  the currently active breakpoint executing next instructions):

* run::

    -- register dump --
    pc:       [0]                       <-- program counter
    code:     [40] jt[0] jf[0] k[12]    <-- plain BPF code of current instruction
    curr:     l0:   ldh [12]            <-- disassembly of current instruction
    A:        [00000000][0]             <-- content of A (hex, decimal)
    X:        [00000000][0]             <-- content of X (hex, decimal)
    M[0,15]:  [00000000][0]             <-- folded content of M (hex, decimal)
    -- packet dump --                   <-- Current packet from pcap (hex)
    len: 42
      0: 00 19 cb 55 55 a4 00 14 a4 43 78 69 08 06 00 01
     16: 08 00 06 04 00 01 00 14 a4 43 78 69 0a 3b 01 26
     32: 00 00 00 00 00 00 0a 3b 01 01
    (breakpoint)
    >

* breakpoint::

    breakpoints: 0 1

  Prints currently set breakpoints.

* step [-<n>, +<n>]

  Performs single stepping through the BPF program from the current pc
  offset. Thus, on each step invocation, above register dump is issued.
  This can go forwards and backwards in time, a plain ``step`` will break
  on the next BPF instruction, thus +1. (No ``run`` needs to be issued here.)

* select <n>

  Selects a given packet from the pcap file to continue from. Thus, on
  the next ``run`` or ``step``, the BPF program is being evaluated against
  the user pre-selected packet. Numbering starts just as in Wireshark
  with index 1.

* quit

  Exits bpf_dbg.
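The register dump above reflects a very small machine: an accumulator, a program counter and the current packet. As a sketch of what stepping through such a program amounts to, the toy interpreter below handles only the four opcodes that occur in the six-instruction filter from the ``dump`` output. It is an illustration under that restriction, not the kernel's or bpf_dbg's implementation, and the names ``struct insn``, ``OP_*`` and ``run_filter()`` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct sock_filter so the sketch builds without kernel headers. */
struct insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* Opcode values as they appear in the dumps of this document. */
enum {
    OP_LDH_ABS = 0x28,  /* ldh [k]        */
    OP_LDB_ABS = 0x30,  /* ldb [k]        */
    OP_JEQ_K   = 0x15,  /* jeq #k, jt, jf */
    OP_RET_K   = 0x06,  /* ret #k         */
};

/*
 * Toy interpreter for the four opcodes above. Returns the filter verdict;
 * out-of-bounds loads and unknown opcodes yield 0, i.e. "drop".
 */
static uint32_t run_filter(const struct insn *prog, size_t len,
                           const uint8_t *pkt, size_t pkt_len)
{
    uint32_t A = 0;     /* accumulator */
    size_t pc = 0;      /* program counter */

    assert(prog != NULL && pkt != NULL);
    while (pc < len) {
        const struct insn *i = &prog[pc++];

        switch (i->code) {
        case OP_LDH_ABS:    /* big-endian half-word at offset k */
            if (pkt_len < 2 || i->k > pkt_len - 2)
                return 0;
            A = (uint32_t)pkt[i->k] << 8 | pkt[i->k + 1];
            break;
        case OP_LDB_ABS:    /* single byte at offset k */
            if (pkt_len < 1 || i->k > pkt_len - 1)
                return 0;
            A = pkt[i->k];
            break;
        case OP_JEQ_K:      /* offsets are relative to the next instruction */
            pc += (A == i->k) ? i->jt : i->jf;
            break;
        case OP_RET_K:
            return i->k;
        default:
            return 0;
        }
    }
    return 0;
}
```

Run against the 42-byte frame shown in the register dump, whose bytes 12 and 13 are 08 06, the filter takes the jf branch of its first comparison and returns 0.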

JIT compiler
@@ -468,23 +498,23 @@ The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
PowerPC, ARM, ARM64, MIPS, RISC-V and s390 and can be enabled through
CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
attached filter from user space or for internal kernel users if it has
been previously enabled by root::

  echo 1 > /proc/sys/net/core/bpf_jit_enable

For JIT developers, doing audits etc., each compile run can output the generated
opcode image into the kernel log via::

  echo 2 > /proc/sys/net/core/bpf_jit_enable

Example output from dmesg::

    [ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f
    [ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
    [ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
    [ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
    [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
    [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3

When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
setting any other value than that will result in failure. This is even the case for
@@ -493,78 +523,78 @@ is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is t
generally recommended approach instead.

In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
generating disassembly out of the kernel log's hexdump::

    # ./bpf_jit_disasm
    70 bytes emitted from JIT compiler (pass:3, flen:6)
    ffffffffa0069c8f + <x>:
       0:   push   %rbp
       1:   mov    %rsp,%rbp
       4:   sub    $0x60,%rsp
       8:   mov    %rbx,-0x8(%rbp)
       c:   mov    0x68(%rdi),%r9d
      10:   sub    0x6c(%rdi),%r9d
      14:   mov    0xd8(%rdi),%r8
      1b:   mov    $0xc,%esi
      20:   callq  0xffffffffe0ff9442
      25:   cmp    $0x800,%eax
      2a:   jne    0x0000000000000042
      2c:   mov    $0x17,%esi
      31:   callq  0xffffffffe0ff945e
      36:   cmp    $0x1,%eax
      39:   jne    0x0000000000000042
      3b:   mov    $0xffff,%eax
      40:   jmp    0x0000000000000044
      42:   xor    %eax,%eax
      44:   leaveq
      45:   retq

Issuing option ``-o`` will "annotate" opcodes to resulting assembler
instructions, which can be very useful for JIT developers::

    # ./bpf_jit_disasm -o
    70 bytes emitted from JIT compiler (pass:3, flen:6)
    ffffffffa0069c8f + <x>:
       0:   push   %rbp
            55
       1:   mov    %rsp,%rbp
            48 89 e5
       4:   sub    $0x60,%rsp
            48 83 ec 60
       8:   mov    %rbx,-0x8(%rbp)
            48 89 5d f8
       c:   mov    0x68(%rdi),%r9d
            44 8b 4f 68
      10:   sub    0x6c(%rdi),%r9d
            44 2b 4f 6c
      14:   mov    0xd8(%rdi),%r8
            4c 8b 87 d8 00 00 00
      1b:   mov    $0xc,%esi
            be 0c 00 00 00
      20:   callq  0xffffffffe0ff9442
            e8 1d 94 ff e0
      25:   cmp    $0x800,%eax
            3d 00 08 00 00
      2a:   jne    0x0000000000000042
            75 16
      2c:   mov    $0x17,%esi
            be 17 00 00 00
      31:   callq  0xffffffffe0ff945e
            e8 28 94 ff e0
      36:   cmp    $0x1,%eax
            83 f8 01
      39:   jne    0x0000000000000042
            75 07
      3b:   mov    $0xffff,%eax
            b8 ff ff 00 00
      40:   jmp    0x0000000000000044
            eb 02
      42:   xor    %eax,%eax
            31 c0
      44:   leaveq
            c9
      45:   retq
            c3

For BPF JIT developers, bpf_jit_disasm, bpf_asm and bpf_dbg provide a useful
toolchain for developing and testing the kernel's JIT compiler.
@@ -663,9 +693,9 @@ Some core changes of the new internal format:

- Conditional jt/jf targets replaced with jt/fall-through:

  While the original design has constructs such as ``if (cond) jump_true;
  else jump_false;``, they are being replaced with alternative constructs like
  ``if (cond) jump_true; /* else fall-through */``.

- Introduces bpf_call insn and register passing convention for zero overhead
  calls from/to other kernel functions:

@@ -684,32 +714,32 @@ Some core changes of the new internal format:
  a return value of the function. Since R6 - R9 are callee saved, their state
  is preserved across the call.

  For example, consider three C functions::

    u64 f1() { return (*_f2)(1); }
    u64 f2(u64 a) { return f3(a + 1, a); }
    u64 f3(u64 a, u64 b) { return a - b; }

  GCC can compile f1, f3 into x86_64::

    f1:
        movl $1, %edi
        movq _f2(%rip), %rax
        jmp  *%rax
    f3:
        movq %rdi, %rax
        subq %rsi, %rax
        ret

  Function f2 in eBPF may look like::

    f2:
        bpf_mov R2, R1
        bpf_add R1, 1
        bpf_call f3
        bpf_exit

  If f2 is JITed and the pointer is stored to ``_f2``, the calls f1 -> f2 -> f3
  and returns will be seamless. Without JIT, the __bpf_prog_run() interpreter
  needs to be used to call into f2.
@@ -722,6 +752,8 @@ Some core changes of the new internal format:
  On 64-bit architectures all registers map to HW registers one to one. For
  example, x86_64 JIT compiler can map them as ...

  ::

    R0 - rax
    R1 - rdi
    R2 - rsi
@@ -737,7 +769,7 @@ Some core changes of the new internal format:
  ... since x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing
  and rbx, r12 - r15 are callee saved.

  Then the following internal BPF pseudo-program::

    bpf_mov R6, R1 /* save ctx */
    bpf_mov R2, 2
@@ -755,7 +787,7 @@ Some core changes of the new internal format:
    bpf_add R0, R7
    bpf_exit

  After JIT to x86_64 may look like::

    push %rbp
    mov %rsp,%rbp
@@ -781,21 +813,21 @@ Some core changes of the new internal format:
    leaveq
    retq

  Which is in this example equivalent in C to::

    u64 bpf_filter(u64 ctx)
    {
        return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9);
    }

  In-kernel functions foo() and bar() with prototype: u64 (*)(u64 arg1, u64
  arg2, u64 arg3, u64 arg4, u64 arg5); will receive arguments in proper
  registers and place their return value into ``%rax`` which is R0 in eBPF.
  Prologue and epilogue are emitted by JIT and are implicit in the
  interpreter. R0-R5 are scratch registers, so eBPF program needs to preserve
  them across the calls as defined by calling convention.

  For example the following program is invalid::

    bpf_mov R1, 1
    bpf_call foo
@@ -814,7 +846,7 @@ The input context pointer for invoking the interpreter function is generic,
its content is defined by a specific use case. For seccomp register R1 points
to seccomp_data, for converted BPF filters R1 points to a skb.

A program that is translated internally consists of the following elements::

  op:16, jt:8, jf:8, k:32    ==>    op:8, dst_reg:4, src_reg:4, off:16, imm:32
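The field widths on the right-hand side add up to 64 bits, the same size as a classic instruction. A minimal sketch of that layout as a C structure follows, assuming a compiler such as GCC or Clang that accepts 8-bit bit-fields. The kernel's own definition is ``struct bpf_insn``; ``struct ebpf_insn`` and ``make_insn()`` below are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only; field order follows the element list above. */
struct ebpf_insn {
    uint8_t code;           /* op:8      */
    uint8_t dst_reg : 4;    /* dst_reg:4 */
    uint8_t src_reg : 4;    /* src_reg:4 */
    int16_t off;            /* off:16    */
    int32_t imm;            /* imm:32    */
};

/* Hypothetical helper: fill in one instruction from its five parts. */
static struct ebpf_insn make_insn(uint8_t code, uint8_t dst, uint8_t src,
                                  int16_t off, int32_t imm)
{
    /* Each register field is only 4 bits wide. */
    assert(dst < 16 && src < 16);
    return (struct ebpf_insn){
        .code = code, .dst_reg = dst, .src_reg = src, .off = off, .imm = imm,
    };
}
```

With this layout the two 4-bit register numbers share a single byte, which is how the whole instruction stays at 8 bytes.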
@@ -824,7 +856,7 @@ instructions must be multiple of 8 bytes to preserve backward compatibility.

Internal BPF is a general purpose RISC instruction set. Not every register and
every instruction are used during translation from original BPF to new format.
For example, socket filters do not use the ``exclusive add`` instruction, but
tracing filters may do so to maintain counters of events. Register R9
is not used by socket filters either, but more complex filters may be running
out of registers and would have to resort to spill/fill to stack.
@@ -849,7 +881,7 @@ eBPF opcode encoding

eBPF is reusing most of the opcode encoding from classic to simplify conversion
of classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code'
field is divided into three parts::

  +----------------+--------+--------------------+
  |   4 bits       |  1 bit |   3 bits           |
@@ -859,8 +891,9 @@ field is divided into three parts:

Three LSB bits store instruction class which is one of:

  ===================     ===============
  Classic BPF classes     eBPF classes
  ===================     ===============
  BPF_LD    0x00          BPF_LD    0x00
  BPF_LDX   0x01          BPF_LDX   0x01
  BPF_ST    0x02          BPF_ST    0x02
@@ -869,25 +902,28 @@ Three LSB bits store instruction class which is one of:
  BPF_JMP   0x05          BPF_JMP   0x05
  BPF_RET   0x06          BPF_JMP32 0x06
  BPF_MISC  0x07          BPF_ALU64 0x07
  ===================     ===============

When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...

    ::

        BPF_K     0x00
        BPF_X     0x08

 * in classic BPF, this means::

        BPF_SRC(code) == BPF_X - use register X as source operand
        BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand

 * in eBPF, this means::

        BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
        BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand

... and four MSB bits store operation code.

If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of::

  BPF_ADD   0x00
  BPF_SUB   0x10
@@ -904,7 +940,7 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
  BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
  BPF_END   0xd0  /* eBPF only: endianness conversion */

If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of::

  BPF_JA    0x00  /* BPF_JMP only */
  BPF_JEQ   0x10
@@ -934,7 +970,7 @@ exactly the same operations as BPF_ALU, but with 64-bit wide operands
instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.:
dst_reg = dst_reg + src_reg
Classic BPF wastes the whole BPF_RET class to represent a single 'ret' Classic BPF wastes the whole BPF_RET class to represent a single ``ret``
operation. Classic BPF_RET | BPF_K means copy imm32 into return register operation. Classic BPF_RET | BPF_K means copy imm32 into return register
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
in eBPF means function exit only. The eBPF program needs to store return in eBPF means function exit only. The eBPF program needs to store return
...@@ -942,7 +978,7 @@ value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as ...@@ -942,7 +978,7 @@ value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
operands for the comparisons instead. operands for the comparisons instead.
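The arithmetic/jump layout described above (4-bit operation code, 1-bit source, 3-bit class) can be sketched as a small decoder. This is an illustrative sketch only: the helper name `split_alu_jmp_code` is hypothetical, and the class values 0x03 (BPF_STX) and 0x04 (BPF_ALU), which fall in a part of the table not shown in this hunk, are assumed from the kernel uapi headers.

```python
# Masks for the 8-bit 'code' field of arithmetic and jump instructions.
CLASS_MASK = 0x07  # three LSB bits: instruction class
SRC_MASK = 0x08    # 4th bit: source operand (BPF_K or BPF_X)
OP_MASK = 0xF0     # four MSB bits: operation code

# eBPF class names. 0x03 and 0x04 are assumed from the kernel uapi headers.
EBPF_CLASSES = {
    0x00: "BPF_LD", 0x01: "BPF_LDX", 0x02: "BPF_ST", 0x03: "BPF_STX",
    0x04: "BPF_ALU", 0x05: "BPF_JMP", 0x06: "BPF_JMP32", 0x07: "BPF_ALU64",
}


def split_alu_jmp_code(code):
    """Split an ALU/JMP 'code' byte into (class name, source, operation)."""
    cls = EBPF_CLASSES[code & CLASS_MASK]
    src = "BPF_X" if code & SRC_MASK else "BPF_K"
    return cls, src, code & OP_MASK


# BPF_ADD (0x00) | BPF_X (0x08) | BPF_ALU64 (0x07): 64-bit dst_reg += src_reg
print(split_alu_jmp_code(0x00 | 0x08 | 0x07))
```

The same split applies to the opcode bytes that the verifier prints in parentheses in its log, such as `(95)` for BPF_JMP | BPF_EXIT.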
For load and store instructions the 8-bit 'code' field is divided as: For load and store instructions the 8-bit 'code' field is divided as::
+--------+--------+-------------------+ +--------+--------+-------------------+
| 3 bits | 2 bits | 3 bits | | 3 bits | 2 bits | 3 bits |
...@@ -952,19 +988,21 @@ For load and store instructions the 8-bit 'code' field is divided as: ...@@ -952,19 +988,21 @@ For load and store instructions the 8-bit 'code' field is divided as:
Size modifier is one of ... Size modifier is one of ...
::
BPF_W 0x00 /* word */ BPF_W 0x00 /* word */
BPF_H 0x08 /* half word */ BPF_H 0x08 /* half word */
BPF_B 0x10 /* byte */ BPF_B 0x10 /* byte */
BPF_DW 0x18 /* eBPF only, double word */ BPF_DW 0x18 /* eBPF only, double word */
... which encodes size of load/store operation: ... which encodes size of load/store operation::
B - 1 byte B - 1 byte
H - 2 byte H - 2 byte
W - 4 byte W - 4 byte
DW - 8 byte (eBPF only) DW - 8 byte (eBPF only)
Mode modifier is one of: Mode modifier is one of::
BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */ BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
BPF_ABS 0x20 BPF_ABS 0x20
...@@ -979,7 +1017,7 @@ eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and ...@@ -979,7 +1017,7 @@ eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
They had to be carried over from classic to have strong performance of They had to be carried over from classic to have strong performance of
socket filters running in eBPF interpreter. These instructions can only socket filters running in eBPF interpreter. These instructions can only
be used when interpreter context is a pointer to 'struct sk_buff' and be used when interpreter context is a pointer to ``struct sk_buff`` and
have seven implicit operands. Register R6 is an implicit input that must have seven implicit operands. Register R6 is an implicit input that must
contain pointer to sk_buff. Register R0 is an implicit output which contains contain pointer to sk_buff. Register R0 is an implicit output which contains
the data fetched from the packet. Registers R1-R5 are scratch registers the data fetched from the packet. Registers R1-R5 are scratch registers
...@@ -992,26 +1030,26 @@ the interpreter will abort the execution of the program. JIT compilers ...@@ -992,26 +1030,26 @@ the interpreter will abort the execution of the program. JIT compilers
therefore must preserve this property. src_reg and imm32 fields are therefore must preserve this property. src_reg and imm32 fields are
explicit inputs to these instructions. explicit inputs to these instructions.
For example: For example::
BPF_IND | BPF_W | BPF_LD means: BPF_IND | BPF_W | BPF_LD means:
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32)) R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
and R1 - R5 were scratched. and R1 - R5 were scratched.
Unlike classic BPF instruction set, eBPF has generic load/store operations: Unlike classic BPF instruction set, eBPF has generic load/store operations::
BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32 BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32
BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off) BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
BPF_XADD | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg BPF_XADD | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
2 byte atomic increments are not supported. 2 byte atomic increments are not supported.
eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
of two consecutive 'struct bpf_insn' 8-byte blocks and interpreted as single of two consecutive ``struct bpf_insn`` 8-byte blocks and interpreted as single
instruction that loads 64-bit immediate value into a dst_reg. instruction that loads 64-bit immediate value into a dst_reg.
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
32-bit immediate value into a register. 32-bit immediate value into a register.
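The load/store layout (3-bit mode, 2-bit size, 3-bit class) can be decoded the same way. This is a hedged sketch, not kernel code: `split_ldst_code` is a hypothetical helper, and the mode values for BPF_IND (0x40), BPF_MEM (0x60) and BPF_XADD (0xc0) lie in a part of the table not shown in this hunk, so they are assumed from the kernel uapi headers.

```python
# Masks for the 8-bit 'code' field of load and store instructions.
MODE_MASK = 0xE0   # three MSB bits: mode modifier
SIZE_MASK = 0x18   # two middle bits: size modifier
CLASS_MASK = 0x07  # three LSB bits: instruction class

# Size modifier -> width of the access in bytes (BPF_W, BPF_H, BPF_B, BPF_DW).
SIZE_BYTES = {0x00: 4, 0x08: 2, 0x10: 1, 0x18: 8}

# Mode names. Values other than BPF_IMM and BPF_ABS are assumed from the
# kernel uapi headers.
MODES = {0x00: "BPF_IMM", 0x20: "BPF_ABS", 0x40: "BPF_IND",
         0x60: "BPF_MEM", 0xC0: "BPF_XADD"}

LDST_CLASSES = {0x00: "BPF_LD", 0x01: "BPF_LDX", 0x02: "BPF_ST", 0x03: "BPF_STX"}


def split_ldst_code(code):
    """Split a load/store 'code' byte into (mode name, size in bytes, class name)."""
    mode_bits = code & MODE_MASK
    mode = MODES.get(mode_bits, hex(mode_bits))  # fall back to the raw value
    return mode, SIZE_BYTES[code & SIZE_MASK], LDST_CLASSES[code & CLASS_MASK]


# BPF_LD | BPF_DW | BPF_IMM: the 16-byte, 64-bit immediate load
print(split_ldst_code(0x00 | 0x18 | 0x00))
# BPF_IND | BPF_W | BPF_LD: the 4-byte indirect packet load from the example
print(split_ldst_code(0x40 | 0x00 | 0x00))
```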
...@@ -1037,38 +1075,48 @@ since addition of two valid pointers makes invalid pointer. ...@@ -1037,38 +1075,48 @@ since addition of two valid pointers makes invalid pointer.
(In 'secure' mode verifier will reject any type of pointer arithmetic to make (In 'secure' mode verifier will reject any type of pointer arithmetic to make
sure that kernel addresses don't leak to unprivileged users) sure that kernel addresses don't leak to unprivileged users)
If register was never written to, it's not readable: If register was never written to, it's not readable::
bpf_mov R0 = R2 bpf_mov R0 = R2
bpf_exit bpf_exit
will be rejected, since R2 is unreadable at the start of the program. will be rejected, since R2 is unreadable at the start of the program.
After kernel function call, R1-R5 are reset to unreadable and After kernel function call, R1-R5 are reset to unreadable and
R0 has a return type of the function. R0 has a return type of the function.
Since R6-R9 are callee saved, their state is preserved across the call. Since R6-R9 are callee saved, their state is preserved across the call.
::
bpf_mov R6 = 1 bpf_mov R6 = 1
bpf_call foo bpf_call foo
bpf_mov R0 = R6 bpf_mov R0 = R6
bpf_exit bpf_exit
is a correct program. If there was R1 instead of R6, it would have is a correct program. If there was R1 instead of R6, it would have
been rejected. been rejected.
load/store instructions are allowed only with registers of valid types, which load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked. are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
For example: For example::
bpf_mov R1 = 1 bpf_mov R1 = 1
bpf_mov R2 = 2 bpf_mov R2 = 2
bpf_xadd *(u32 *)(R1 + 3) += R2 bpf_xadd *(u32 *)(R1 + 3) += R2
bpf_exit bpf_exit
will be rejected, since R1 doesn't have a valid pointer type at the time of will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of instruction bpf_xadd. execution of instruction bpf_xadd.
At the start R1 type is PTR_TO_CTX (a pointer to generic 'struct bpf_context') At the start R1 type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``)
A callback is used to customize verifier to restrict eBPF program access to only A callback is used to customize verifier to restrict eBPF program access to only
certain fields within ctx structure with specified size and alignment. certain fields within ctx structure with specified size and alignment.
For example, the following insn: For example, the following insn::
bpf_ld R0 = *(u32 *)(R6 + 8) bpf_ld R0 = *(u32 *)(R6 + 8)
intends to load a word from address R6 + 8 and store it into R0 intends to load a word from address R6 + 8 and store it into R0
If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading, otherwise that offset 8 of size 4 bytes can be accessed for reading, otherwise
...@@ -1079,10 +1127,13 @@ so it will fail verification, since it's out of bounds. ...@@ -1079,10 +1127,13 @@ so it will fail verification, since it's out of bounds.
The verifier will allow eBPF program to read data from stack only after The verifier will allow eBPF program to read data from stack only after
it wrote into it. it wrote into it.
Classic BPF verifier does similar check with M[0-15] memory slots. Classic BPF verifier does similar check with M[0-15] memory slots.
For example: For example::
bpf_ld R0 = *(u32 *)(R10 - 4) bpf_ld R0 = *(u32 *)(R10 - 4)
bpf_exit bpf_exit
is invalid program. is invalid program.
Though R10 is correct read-only register and has type PTR_TO_STACK Though R10 is correct read-only register and has type PTR_TO_STACK
and R10 - 4 is within stack bounds, there were no stores into that location. and R10 - 4 is within stack bounds, there were no stores into that location.
...@@ -1113,48 +1164,61 @@ Register value tracking ...@@ -1113,48 +1164,61 @@ Register value tracking
----------------------- -----------------------
In order to determine the safety of an eBPF program, the verifier must track In order to determine the safety of an eBPF program, the verifier must track
the range of possible values in each register and also in each stack slot. the range of possible values in each register and also in each stack slot.
This is done with 'struct bpf_reg_state', defined in include/linux/ This is done with ``struct bpf_reg_state``, defined in include/linux/
bpf_verifier.h, which unifies tracking of scalar and pointer values. Each bpf_verifier.h, which unifies tracking of scalar and pointer values. Each
register state has a type, which is either NOT_INIT (the register has not been register state has a type, which is either NOT_INIT (the register has not been
written to), SCALAR_VALUE (some value which is not usable as a pointer), or a written to), SCALAR_VALUE (some value which is not usable as a pointer), or a
pointer type. The types of pointers describe their base, as follows: pointer type. The types of pointers describe their base, as follows:
PTR_TO_CTX Pointer to bpf_context.
CONST_PTR_TO_MAP Pointer to struct bpf_map. "Const" because arithmetic
on these pointers is forbidden. PTR_TO_CTX
PTR_TO_MAP_VALUE Pointer to the value stored in a map element. Pointer to bpf_context.
CONST_PTR_TO_MAP
Pointer to struct bpf_map. "Const" because arithmetic
on these pointers is forbidden.
PTR_TO_MAP_VALUE
Pointer to the value stored in a map element.
PTR_TO_MAP_VALUE_OR_NULL PTR_TO_MAP_VALUE_OR_NULL
Either a pointer to a map value, or NULL; map accesses Either a pointer to a map value, or NULL; map accesses
(see section 'eBPF maps', below) return this type, (see section 'eBPF maps', below) return this type,
which becomes a PTR_TO_MAP_VALUE when checked != NULL. which becomes a PTR_TO_MAP_VALUE when checked != NULL.
Arithmetic on these pointers is forbidden. Arithmetic on these pointers is forbidden.
PTR_TO_STACK Frame pointer. PTR_TO_STACK
PTR_TO_PACKET skb->data. Frame pointer.
PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden. PTR_TO_PACKET
PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted. skb->data.
PTR_TO_PACKET_END
skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET
Pointer to struct bpf_sock_ops, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL PTR_TO_SOCKET_OR_NULL
Either a pointer to a socket, or NULL; socket lookup Either a pointer to a socket, or NULL; socket lookup
returns this type, which becomes a PTR_TO_SOCKET when returns this type, which becomes a PTR_TO_SOCKET when
checked != NULL. PTR_TO_SOCKET is reference-counted, checked != NULL. PTR_TO_SOCKET is reference-counted,
so programs must release the reference through the so programs must release the reference through the
socket release function before the end of the program. socket release function before the end of the program.
Arithmetic on these pointers is forbidden. Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate offset'. The former is used when an exactly-known value (e.g. an immediate
operand) is added to a pointer, while the latter is used for values which are operand) is added to a pointer, while the latter is used for values which are
not exactly known. The variable offset is also used in SCALAR_VALUEs, to track not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
the range of possible values in the register. the range of possible values in the register.
The verifier's knowledge about the variable offset consists of: The verifier's knowledge about the variable offset consists of:
* minimum and maximum values as unsigned * minimum and maximum values as unsigned
* minimum and maximum values as signed * minimum and maximum values as signed
* knowledge of the values of individual bits, in the form of a 'tnum': a u64 * knowledge of the values of individual bits, in the form of a 'tnum': a u64
'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown; 'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both 1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
mask and value; no bit should ever be 1 in both. For example, if a byte is read mask and value; no bit should ever be 1 in both. For example, if a byte is read
into a register from memory, the register's top 56 bits are known zero, while into a register from memory, the register's top 56 bits are known zero, while
the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0; then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
0x1ff), because of potential carries. 0x1ff), because of potential carries.
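The tnum example above can be reproduced with a short sketch. It models a tnum as a `(value, mask)` pair, the same order in which the text writes it. The functions are modelled on the kernel's tnum helpers as best understood here and should be read as an illustration under that assumption, not as the kernel implementation.

```python
U64 = (1 << 64) - 1  # tnums track 64-bit registers


def tnum_const(value):
    """A fully known value: no unknown bits in the mask."""
    return (value & U64, 0)


def tnum_or(a, b):
    """Bitwise OR: a bit known to be 1 on either side is known to be 1."""
    value = a[0] | b[0]
    mask = (a[1] | b[1]) & ~value & U64
    return (value, mask)


def tnum_add(a, b):
    """Addition: every bit that a carry could reach becomes unknown."""
    sum_mask = (a[1] + b[1]) & U64
    sum_value = (a[0] + b[0]) & U64
    sigma = (sum_mask + sum_value) & U64
    chi = sigma ^ sum_value          # bits that differ because of carries
    mask = chi | a[1] | b[1]
    return (sum_value & ~mask & U64, mask)


byte_read = (0x0, 0xFF)                         # low 8 bits unknown
step1 = tnum_or(byte_read, tnum_const(0x40))    # OR with 0x40
step2 = tnum_add(step1, tnum_const(1))          # then add 1
print([hex(x) for x in step1], [hex(x) for x in step2])
```

In both results no bit is set in value and mask at the same time, which is the invariant stated in the text.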
Besides arithmetic, the register state can also be updated by conditional Besides arithmetic, the register state can also be updated by conditional
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
...@@ -1188,7 +1252,7 @@ The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common ...@@ -1188,7 +1252,7 @@ The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding 'struct sock'. To ensure that the represents a reference to the corresponding ``struct sock``. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and, in reference is not leaked, it is imperative to NULL-check the reference and, in
the non-NULL case, pass the valid reference to the socket release function. the non-NULL case, pass the valid reference to the socket release function.
...@@ -1196,17 +1260,18 @@ Direct packet access ...@@ -1196,17 +1260,18 @@ Direct packet access
-------------------- --------------------
In cls_bpf and act_bpf programs the verifier allows direct access to the packet In cls_bpf and act_bpf programs the verifier allows direct access to the packet
data via skb->data and skb->data_end pointers. data via skb->data and skb->data_end pointers.
Ex: Ex::
1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
2: r3 = *(u32 *)(r1 +76) /* load skb->data */ 1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
3: r5 = r3 2: r3 = *(u32 *)(r1 +76) /* load skb->data */
4: r5 += 14 3: r5 = r3
5: if r5 > r4 goto pc+16 4: r5 += 14
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp 5: if r5 > r4 goto pc+16
6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */ R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
this 2byte load from the packet is safe to do, since the program author this 2byte load from the packet is safe to do, since the program author
did check 'if (skb->data + 14 > skb->data_end) goto err' at insn #5 which did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5 which
means that in the fall-through case the register R3 (which points to skb->data) means that in the fall-through case the register R3 (which points to skb->data)
has at least 14 directly accessible bytes. The verifier marks it has at least 14 directly accessible bytes. The verifier marks it
as R3=pkt(id=0,off=0,r=14). as R3=pkt(id=0,off=0,r=14).
...@@ -1215,52 +1280,58 @@ off=0 means that no additional constants were added. ...@@ -1215,52 +1280,58 @@ off=0 means that no additional constants were added.
r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok. r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
to the packet data, but constant 14 was added to the register, so to the packet data, but constant 14 was added to the register, so
it now points to 'skb->data + 14' and accessible range is [R5, R5 + 14 - 14) it now points to ``skb->data + 14`` and accessible range is [R5, R5 + 14 - 14)
which is zero bytes. which is zero bytes.
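The `off` and `r` fields of a packet pointer state can be read as a simple bounds rule: an access is fine when it starts at or after the packet base and ends within the first `r` bytes. The toy model below illustrates that rule only; `packet_access_ok` is a hypothetical helper and the real verifier check considers more state than this.

```python
def packet_access_ok(reg_off, reg_range, insn_off, size):
    """Toy model of a direct packet access check.

    reg_off   -- constant already added to the packet pointer (the 'off' field)
    reg_range -- bytes proven accessible from the packet base (the 'r' field)
    insn_off  -- offset encoded in the load/store instruction
    size      -- width of the access in bytes
    """
    start = reg_off + insn_off
    return start >= 0 and size > 0 and start + size <= reg_range


# R3=pkt(id=0,off=0,r=14): insn 6 loads 2 bytes at r3 + 12
print(packet_access_ok(reg_off=0, reg_range=14, insn_off=12, size=2))
# R5=pkt(id=0,off=14,r=14): even a 1-byte load through r5 is out of range
print(packet_access_ok(reg_off=14, reg_range=14, insn_off=0, size=1))
```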
More complex packet access may look like: More complex packet access may look like::
R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
7: r4 = *(u8 *)(r3 +12) R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
8: r4 *= 14 6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
9: r3 = *(u32 *)(r1 +76) /* load skb->data */ 7: r4 = *(u8 *)(r3 +12)
10: r3 += r4 8: r4 *= 14
11: r2 = r1 9: r3 = *(u32 *)(r1 +76) /* load skb->data */
12: r2 <<= 48 10: r3 += r4
13: r2 >>= 48 11: r2 = r1
14: r3 += r2 12: r2 <<= 48
15: r2 = r3 13: r2 >>= 48
16: r2 += 8 14: r3 += r2
17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */ 15: r2 = r3
18: if r2 > r1 goto pc+2 16: r2 += 8
R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp 17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
19: r1 = *(u8 *)(r3 +4) 18: if r2 > r1 goto pc+2
R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
19: r1 = *(u8 *)(r3 +4)
The state of the register R3 is R3=pkt(id=2,off=0,r=8) The state of the register R3 is R3=pkt(id=2,off=0,r=8)
id=2 means that two 'r3 += rX' instructions were seen, so r3 points to some id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
offset within a packet and since the program author did offset within a packet and since the program author did
'if (r3 + 8 > r1) goto err' at insn #18, the safe range is [R3, R3 + 8). ``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
The verifier only allows 'add'/'sub' operations on packet registers. Any other The verifier only allows 'add'/'sub' operations on packet registers. Any other
operation will set the register state to 'SCALAR_VALUE' and it won't be operation will set the register state to 'SCALAR_VALUE' and it won't be
available for direct packet access. available for direct packet access.
Operation 'r3 += rX' may overflow and become less than original skb->data,
therefore the verifier has to prevent that. So when it sees 'r3 += rX' Operation ``r3 += rX`` may overflow and become less than original skb->data,
therefore the verifier has to prevent that. So when it sees ``r3 += rX``
instruction and rX is more than 16-bit value, any subsequent bounds-check of r3 instruction and rX is more than 16-bit value, any subsequent bounds-check of r3
against skb->data_end will not give us 'range' information, so attempts to read against skb->data_end will not give us 'range' information, so attempts to read
through the pointer will give "invalid access to packet" error. through the pointer will give "invalid access to packet" error.
Ex. after insn 'r4 = *(u8 *)(r3 +12)' (insn #7 above) the state of r4 is
Ex. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits
of the register are guaranteed to be zero, and nothing is known about the lower of the register are guaranteed to be zero, and nothing is known about the lower
8 bits. After insn 'r4 *= 14' the state becomes 8 bits. After insn ``r4 *= 14`` the state becomes
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
value by constant 14 will keep upper 52 bits as zero, also the least significant value by constant 14 will keep upper 52 bits as zero, also the least significant
bit will be zero as 14 is even. Similarly 'r2 >>= 48' will make bit will be zero as 14 is even. Similarly ``r2 >>= 48`` will make
R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign
extending. This logic is implemented in adjust_reg_min_max_vals() function, extending. This logic is implemented in adjust_reg_min_max_vals() function,
which calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice which calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice
versa) and adjust_scalar_min_max_vals() for operations on two scalars. versa) and adjust_scalar_min_max_vals() for operations on two scalars.
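The `umax_value` figures quoted above follow from plain unsigned arithmetic. The sketch below shows only that arithmetic; the helper names are hypothetical, and the kernel functions named in the text also maintain signed bounds and the tnum, which this sketch ignores.

```python
U64_MAX = (1 << 64) - 1


def umax_after_mul(umax, k):
    """Upper bound after 'reg *= k' for a non-negative constant k.

    If the product could overflow 64 bits, nothing useful is known any more,
    so the bound falls back to U64_MAX.
    """
    product = umax * k
    return product if product <= U64_MAX else U64_MAX


def umax_after_rsh(umax, shift):
    """Upper bound after a logical (not sign extending) right shift."""
    return umax >> shift


# r4 holds a byte (umax 255); 'r4 *= 14' gives the umax_value shown in the log
print(umax_after_mul(255, 14))
# after 'r2 <<= 48' the bound is unknown; 'r2 >>= 48' brings it back to 16 bits
print(umax_after_rsh(U64_MAX, 48))
```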
The end result is that bpf program author can access packet directly The end result is that bpf program author can access packet directly
using normal C code as: using normal C code as::
void *data = (void *)(long)skb->data; void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end; void *data_end = (void *)(long)skb->data_end;
struct eth_hdr *eth = data; struct eth_hdr *eth = data;
...@@ -1268,13 +1339,14 @@ using normal C code as: ...@@ -1268,13 +1339,14 @@ using normal C code as:
struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph); struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end) if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
return 0; return 0;
if (eth->h_proto != htons(ETH_P_IP)) if (eth->h_proto != htons(ETH_P_IP))
return 0; return 0;
if (iph->protocol != IPPROTO_UDP || iph->ihl != 5) if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
return 0; return 0;
if (udp->dest == 53 || udp->source == 9) if (udp->dest == 53 || udp->source == 9)
...; ...;
which makes such programs easier to write compared to LD_ABS insn which makes such programs easier to write compared to LD_ABS insn
and significantly faster. and significantly faster.
...@@ -1284,23 +1356,24 @@ eBPF maps ...@@ -1284,23 +1356,24 @@ eBPF maps
and userspace. and userspace.
The maps are accessed from user space via BPF syscall, which has commands: The maps are accessed from user space via BPF syscall, which has commands:
- create a map with given type and attributes - create a map with given type and attributes
map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size) ``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
using attr->map_type, attr->key_size, attr->value_size, attr->max_entries using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
returns process-local file descriptor or negative error returns process-local file descriptor or negative error
- lookup key in a given map - lookup key in a given map
err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size) ``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key, attr->value using attr->map_fd, attr->key, attr->value
returns zero and stores found elem into value or negative error returns zero and stores found elem into value or negative error
- create or update key/value pair in a given map - create or update key/value pair in a given map
err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size) ``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key, attr->value using attr->map_fd, attr->key, attr->value
returns zero or negative error returns zero or negative error
- find and delete element by key in a given map - find and delete element by key in a given map
err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size) ``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key using attr->map_fd, attr->key
- to delete map: close(fd) - to delete map: close(fd)
...@@ -1312,10 +1385,11 @@ are concurrently updating. ...@@ -1312,10 +1385,11 @@ are concurrently updating.
maps can have different types: hash, array, bloom filter, radix-tree, etc. maps can have different types: hash, array, bloom filter, radix-tree, etc.
The map is defined by: The map is defined by:
. type
. max number of elements - type
. key size in bytes - max number of elements
. value size in bytes - key size in bytes
- value size in bytes
Pruning Pruning
------- -------
...@@ -1339,57 +1413,75 @@ Understanding eBPF verifier messages ...@@ -1339,57 +1413,75 @@ Understanding eBPF verifier messages
The following are few examples of invalid eBPF programs and verifier error The following are few examples of invalid eBPF programs and verifier error
messages as seen in the log: messages as seen in the log:
Program with unreachable instructions: Program with unreachable instructions::
static struct bpf_insn prog[] = {
static struct bpf_insn prog[] = {
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
}; };
Error: Error:
unreachable insn 1 unreachable insn 1
Program that reads uninitialized register: Program that reads uninitialized register::
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (bf) r0 = r2 0: (bf) r0 = r2
R2 !read_ok R2 !read_ok
Program that doesn't initialize R0 before exiting: Program that doesn't initialize R0 before exiting::
BPF_MOV64_REG(BPF_REG_2, BPF_REG_1), BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (bf) r2 = r1 0: (bf) r2 = r1
1: (95) exit 1: (95) exit
R0 !read_ok R0 !read_ok
Program that accesses stack out of bounds: Program that accesses stack out of bounds::
BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
BPF_EXIT_INSN(), BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
Error: BPF_EXIT_INSN(),
0: (7a) *(u64 *)(r10 +8) = 0
invalid stack off=8 size=8 Error::
0: (7a) *(u64 *)(r10 +8) = 0
invalid stack off=8 size=8
Program that doesn't initialize stack before passing its address into function::
Program that doesn't initialize stack before passing its address into function:
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0), BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (bf) r2 = r10 0: (bf) r2 = r10
1: (07) r2 += -8 1: (07) r2 += -8
2: (b7) r1 = 0x0 2: (b7) r1 = 0x0
3: (85) call 1 3: (85) call 1
invalid indirect read from stack off -8+0 size 8 invalid indirect read from stack off -8+0 size 8
Program that uses invalid map_fd=0 while calling to map_lookup_elem() function: Program that uses invalid map_fd=0 while calling to map_lookup_elem() function::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0), BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (7a) *(u64 *)(r10 -8) = 0 0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10 1: (bf) r2 = r10
2: (07) r2 += -8 2: (07) r2 += -8
...@@ -1398,7 +1490,8 @@ Error: ...@@ -1398,7 +1490,8 @@ Error:
fd 0 is not pointing to valid bpf_map fd 0 is not pointing to valid bpf_map
Program that doesn't check return value of map_lookup_elem() before accessing Program that doesn't check return value of map_lookup_elem() before accessing
map element: map element::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
...@@ -1406,7 +1499,9 @@ map element: ...@@ -1406,7 +1499,9 @@ map element:
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0), BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (7a) *(u64 *)(r10 -8) = 0 0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10 1: (bf) r2 = r10
2: (07) r2 += -8 2: (07) r2 += -8
...@@ -1416,7 +1511,8 @@ Error: ...@@ -1416,7 +1511,8 @@ Error:
R0 invalid mem access 'map_value_or_null' R0 invalid mem access 'map_value_or_null'
Program that correctly checks map_lookup_elem() returned value for NULL, but Program that correctly checks map_lookup_elem() returned value for NULL, but
accesses the memory with incorrect alignment: accesses the memory with incorrect alignment::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
...@@ -1425,7 +1521,9 @@ accesses the memory with incorrect alignment: ...@@ -1425,7 +1521,9 @@ accesses the memory with incorrect alignment:
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (7a) *(u64 *)(r10 -8) = 0 0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10 1: (bf) r2 = r10
2: (07) r2 += -8 2: (07) r2 += -8
...@@ -1438,7 +1536,8 @@ Error: ...@@ -1438,7 +1536,8 @@ Error:
Program that correctly checks map_lookup_elem() returned value for NULL and Program that correctly checks map_lookup_elem() returned value for NULL and
accesses memory with correct alignment in one side of 'if' branch, but fails accesses memory with correct alignment in one side of 'if' branch, but fails
to do so in the other side of 'if' branch: to do so in the other side of 'if' branch::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
...@@ -1449,7 +1548,9 @@ to do so in the other side of 'if' branch: ...@@ -1449,7 +1548,9 @@ to do so in the other side of 'if' branch:
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1), BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (7a) *(u64 *)(r10 -8) = 0 0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10 1: (bf) r2 = r10
2: (07) r2 += -8 2: (07) r2 += -8
...@@ -1465,8 +1566,8 @@ Error: ...@@ -1465,8 +1566,8 @@ Error:
R0 invalid mem access 'imm' R0 invalid mem access 'imm'
Program that performs a socket lookup then sets the pointer to NULL without Program that performs a socket lookup then sets the pointer to NULL without
checking it: checking it::
value:
BPF_MOV64_IMM(BPF_REG_2, 0), BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
...@@ -1477,7 +1578,9 @@ value: ...@@ -1477,7 +1578,9 @@ value:
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0), BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (b7) r2 = 0 0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2 1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10 2: (bf) r2 = r10
...@@ -1491,7 +1594,8 @@ Error: ...@@ -1491,7 +1594,8 @@ Error:
Unreleased reference id=1, alloc_insn=7 Unreleased reference id=1, alloc_insn=7
Program that performs a socket lookup but does not NULL-check the returned Program that performs a socket lookup but does not NULL-check the returned
value: value::
BPF_MOV64_IMM(BPF_REG_2, 0), BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
...@@ -1501,7 +1605,9 @@ value: ...@@ -1501,7 +1605,9 @@ value:
BPF_MOV64_IMM(BPF_REG_5, 0), BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
Error:
Error::
0: (b7) r2 = 0 0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2 1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10 2: (bf) r2 = r10
...@@ -1519,7 +1625,7 @@ Testing ...@@ -1519,7 +1625,7 @@ Testing
Next to the BPF toolchain, the kernel also ships a test module that contains Next to the BPF toolchain, the kernel also ships a test module that contains
various test cases for classic and internal BPF that can be executed against various test cases for classic and internal BPF that can be executed against
the BPF interpreter and JIT compiler. It can be found in lib/test_bpf.c and the BPF interpreter and JIT compiler. It can be found in lib/test_bpf.c and
enabled via Kconfig: enabled via Kconfig::
CONFIG_TEST_BPF=m CONFIG_TEST_BPF=m
...@@ -1540,6 +1646,6 @@ The document was written in the hope that it is found useful and in order ...@@ -1540,6 +1646,6 @@ The document was written in the hope that it is found useful and in order
to give potential BPF hackers or security auditors a better overview of to give potential BPF hackers or security auditors a better overview of
the underlying architecture. the underlying architecture.
Jay Schulist <jschlst@samba.org> - Jay Schulist <jschlst@samba.org>
Daniel Borkmann <daniel@iogearbox.net> - Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov <ast@kernel.org> - Alexei Starovoitov <ast@kernel.org>
.. SPDX-License-Identifier: GPL-2.0
=============================================
FORE Systems PCA-200E/SBA-200E ATM NIC driver FORE Systems PCA-200E/SBA-200E ATM NIC driver
--------------------------------------------- =============================================
This driver adds support for the FORE Systems 200E-series ATM adapters This driver adds support for the FORE Systems 200E-series ATM adapters
to the Linux operating system. It is based on the earlier PCA-200E driver to the Linux operating system. It is based on the earlier PCA-200E driver
...@@ -27,8 +29,8 @@ in the linux/drivers/atm directory for details and restrictions. ...@@ -27,8 +29,8 @@ in the linux/drivers/atm directory for details and restrictions.
Firmware Updates Firmware Updates
---------------- ----------------
The FORE Systems 200E-series driver is shipped with firmware data being The FORE Systems 200E-series driver is shipped with firmware data being
uploaded to the ATM adapters at system boot time or at module loading time. uploaded to the ATM adapters at system boot time or at module loading time.
The supplied firmware images should work with all adapters. The supplied firmware images should work with all adapters.
However, if you encounter problems (the firmware doesn't start or the driver However, if you encounter problems (the firmware doesn't start or the driver
......
Frame Relay (FR) support for linux is built into a two tiered system of device .. SPDX-License-Identifier: GPL-2.0
================
Frame Relay (FR)
================
Frame Relay (FR) support for linux is built into a two tiered system of device
drivers. The upper layer implements RFC1490 FR specification, and uses the drivers. The upper layer implements RFC1490 FR specification, and uses the
Data Link Connection Identifier (DLCI) as its hardware address. Usually these Data Link Connection Identifier (DLCI) as its hardware address. Usually these
are assigned by your network supplier, they give you the number/numbers of are assigned by your network supplier, they give you the number/numbers of
...@@ -7,18 +13,18 @@ the Virtual Connections (VC) assigned to you. ...@@ -7,18 +13,18 @@ the Virtual Connections (VC) assigned to you.
Each DLCI is a point-to-point link between your machine and a remote one. Each DLCI is a point-to-point link between your machine and a remote one.
As such, a separate device is needed to accommodate the routing. Within the As such, a separate device is needed to accommodate the routing. Within the
net-tools archives is 'dlcicfg'. This program will communicate with the net-tools archives is 'dlcicfg'. This program will communicate with the
base "DLCI" device, and create new net devices named 'dlci00', 'dlci01'... base "DLCI" device, and create new net devices named 'dlci00', 'dlci01'...
The configuration script will ask you how many DLCIs you need, as well as The configuration script will ask you how many DLCIs you need, as well as
how many DLCIs you want to assign to each Frame Relay Access Device (FRAD). how many DLCIs you want to assign to each Frame Relay Access Device (FRAD).
The DLCI uses a number of function calls to communicate with the FRAD, all The DLCI uses a number of function calls to communicate with the FRAD, all
of which are stored in the FRAD's private data area. assoc/deassoc, of which are stored in the FRAD's private data area. assoc/deassoc,
activate/deactivate and dlci_config. The DLCI supplies a receive function activate/deactivate and dlci_config. The DLCI supplies a receive function
to the FRAD to accept incoming packets. to the FRAD to accept incoming packets.
With this initial offering, only 1 FRAD driver is available. With many thanks With this initial offering, only 1 FRAD driver is available. With many thanks
to Sangoma Technologies, David Mandelstam & Gene Kozin, the S502A, S502E & to Sangoma Technologies, David Mandelstam & Gene Kozin, the S502A, S502E &
S508 are supported. This driver is currently set up for only FR, but as S508 are supported. This driver is currently set up for only FR, but as
Sangoma makes more firmware modules available, it can be updated to provide Sangoma makes more firmware modules available, it can be updated to provide
them as well. them as well.
...@@ -32,8 +38,7 @@ an initial configuration. ...@@ -32,8 +38,7 @@ an initial configuration.
Additional FRAD device drivers can be added as hardware is available. Additional FRAD device drivers can be added as hardware is available.
At this time, the dlcicfg and fradcfg programs have not been incorporated into At this time, the dlcicfg and fradcfg programs have not been incorporated into
the net-tools distribution. They can be found at ftp.invlogic.com, in the net-tools distribution. They can be found at ftp.invlogic.com, in
/pub/linux. Note that with OS/2 FTPD, you end up in /pub by default, so just /pub/linux. Note that with OS/2 FTPD, you end up in /pub by default, so just
use 'cd linux'. v0.10 is for use on pre-2.0.3 and earlier, v0.15 is for use 'cd linux'. v0.10 is for use on pre-2.0.3 and earlier, v0.15 is for
pre-2.0.4 and later. pre-2.0.4 and later.
.. SPDX-License-Identifier: GPL-2.0
===============================================
Generic networking statistics for netlink users Generic networking statistics for netlink users
====================================================================== ===============================================
Statistic counters are grouped into structs: Statistic counters are grouped into structs:
==================== ===================== =====================
Struct TLV type Description Struct TLV type Description
---------------------------------------------------------------------- ==================== ===================== =====================
gnet_stats_basic TCA_STATS_BASIC Basic statistics gnet_stats_basic TCA_STATS_BASIC Basic statistics
gnet_stats_rate_est TCA_STATS_RATE_EST Rate estimator gnet_stats_rate_est TCA_STATS_RATE_EST Rate estimator
gnet_stats_queue TCA_STATS_QUEUE Queue statistics gnet_stats_queue TCA_STATS_QUEUE Queue statistics
none TCA_STATS_APP Application specific none TCA_STATS_APP Application specific
==================== ===================== =====================
Collecting: Collecting:
----------- -----------
Declare the statistic structs you need: Declare the statistic structs you need::
struct mystruct {
struct gnet_stats_basic bstats; struct mystruct {
struct gnet_stats_queue qstats; struct gnet_stats_basic bstats;
... struct gnet_stats_queue qstats;
}; ...
};
Update statistics, in dequeue() methods only, (while owning qdisc->running)::
Update statistics, in dequeue() methods only, (while owning qdisc->running) mystruct->bstats.packets++;
mystruct->bstats.packets++; mystruct->qstats.backlog += skb->len;
mystruct->qstats.backlog += skb->len;
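The counter bookkeeping can be tried out in user space with mock structures. The sketch below is only an illustration: `mock_stats_basic` and `mock_stats_queue` are simplified stand-ins for `gnet_stats_basic` and `gnet_stats_queue`, the helper names are hypothetical, and the split between enqueue and dequeue accounting is an assumption about a conventional queue, not something taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structs; only the fields used here. */
struct mock_stats_basic {
	uint64_t bytes;
	uint32_t packets;
};

struct mock_stats_queue {
	uint32_t qlen;
	uint32_t backlog;
};

struct mystruct {
	struct mock_stats_basic bstats;
	struct mock_stats_queue qstats;
};

/* Hypothetical helper: a packet of pkt_len bytes enters the queue. */
static void account_enqueue(struct mystruct *m, uint32_t pkt_len)
{
	assert(m != NULL);
	m->qstats.qlen++;
	m->qstats.backlog += pkt_len;
}

/* Hypothetical helper: the packet leaves the queue and counts as sent. */
static void account_dequeue(struct mystruct *m, uint32_t pkt_len)
{
	assert(m != NULL);
	m->qstats.qlen--;
	m->qstats.backlog -= pkt_len;
	m->bstats.packets++;
	m->bstats.bytes += pkt_len;
}
```

In kernel code the same updates would have to happen under the locking rule stated above; the mock omits locking entirely.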
Export to userspace (Dump): Export to userspace (Dump):
--------------------------- ---------------------------
my_dumping_routine(struct sk_buff *skb, ...) ::
{
struct gnet_dump dump;
if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump, my_dumping_routine(struct sk_buff *skb, ...)
TCA_PAD) < 0) {
goto rtattr_failure; struct gnet_dump dump;
if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 || if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump,
gnet_stats_copy_queue(&dump, &mystruct->qstats) < 0 || TCA_PAD) < 0)
gnet_stats_copy_app(&dump, &xstats, sizeof(xstats)) < 0) goto rtattr_failure;
goto rtattr_failure;
if (gnet_stats_finish_copy(&dump) < 0) if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 ||
goto rtattr_failure; gnet_stats_copy_queue(&dump, &mystruct->qstats) < 0 ||
... gnet_stats_copy_app(&dump, &xstats, sizeof(xstats)) < 0)
} goto rtattr_failure;
if (gnet_stats_finish_copy(&dump) < 0)
goto rtattr_failure;
...
}
TCA_STATS/TCA_XSTATS backward compatibility: TCA_STATS/TCA_XSTATS backward compatibility:
-------------------------------------------- --------------------------------------------
Prior users of struct tc_stats and xstats can maintain backward Prior users of struct tc_stats and xstats can maintain backward
compatibility by calling the compat wrappers to keep providing the compatibility by calling the compat wrappers to keep providing the
existing TLV types. existing TLV types::
my_dumping_routine(struct sk_buff *skb, ...) my_dumping_routine(struct sk_buff *skb, ...)
{ {
if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS, if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS,
TCA_XSTATS, &mystruct->lock, &dump, TCA_XSTATS, &mystruct->lock, &dump,
TCA_PAD) < 0) TCA_PAD) < 0)
goto rtattr_failure; goto rtattr_failure;
... ...
} }
A struct tc_stats will be filled out during gnet_stats_copy_* calls A struct tc_stats will be filled out during gnet_stats_copy_* calls
and appended to the skb. TCA_XSTATS is provided if gnet_stats_copy_app and appended to the skb. TCA_XSTATS is provided if gnet_stats_copy_app
...@@ -77,7 +86,7 @@ are responsible for making sure that the lock is initialized. ...@@ -77,7 +86,7 @@ are responsible for making sure that the lock is initialized.
Rate Estimator: Rate Estimator:
-------------- ---------------
0) Prepare an estimator attribute. Most likely this would be in user 0) Prepare an estimator attribute. Most likely this would be in user
space. The value of this TLV should contain a tc_estimator structure. space. The value of this TLV should contain a tc_estimator structure.
...@@ -92,18 +101,19 @@ Rate Estimator: ...@@ -92,18 +101,19 @@ Rate Estimator:
TCA_RATE to your code in the kernel. TCA_RATE to your code in the kernel.
In the kernel when setting up: In the kernel when setting up:
1) make sure you have basic stats and rate stats setup first. 1) make sure you have basic stats and rate stats setup first.
2) make sure you have initialized stats lock that is used to setup such 2) make sure you have initialized stats lock that is used to setup such
stats. stats.
3) Now initialize a new estimator: 3) Now initialize a new estimator::
int ret = gen_new_estimator(my_basicstats,my_rate_est_stats, int ret = gen_new_estimator(my_basicstats,my_rate_est_stats,
mystats_lock, attr_with_tcestimator_struct); mystats_lock, attr_with_tcestimator_struct);
if ret == 0 if ret == 0
success success
else else
failed failed
From now on, every time you dump my_rate_est_stats it will contain From now on, every time you dump my_rate_est_stats it will contain
up-to-date info. up-to-date info.
...@@ -115,5 +125,5 @@ are still valid (i.e still exist) at the time of making this call. ...@@ -115,5 +125,5 @@ are still valid (i.e still exist) at the time of making this call.
Authors: Authors:
-------- --------
Thomas Graf <tgraf@suug.ch> - Thomas Graf <tgraf@suug.ch>
Jamal Hadi Salim <hadi@cyberus.ca> - Jamal Hadi Salim <hadi@cyberus.ca>
.. SPDX-License-Identifier: GPL-2.0
==================
Generic HDLC layer Generic HDLC layer
==================
Krzysztof Halasa <khc@pm.waw.pl> Krzysztof Halasa <khc@pm.waw.pl>
Generic HDLC layer currently supports: Generic HDLC layer currently supports:
1. Frame Relay (ANSI, CCITT, Cisco and no LMI) 1. Frame Relay (ANSI, CCITT, Cisco and no LMI)
- Normal (routed) and Ethernet-bridged (Ethernet device emulation) - Normal (routed) and Ethernet-bridged (Ethernet device emulation)
interfaces can share a single PVC. interfaces can share a single PVC.
- ARP support (no InARP support in the kernel - there is an - ARP support (no InARP support in the kernel - there is an
experimental InARP user-space daemon available on: experimental InARP user-space daemon available on:
http://www.kernel.org/pub/linux/utils/net/hdlc/). http://www.kernel.org/pub/linux/utils/net/hdlc/).
2. raw HDLC - either IP (IPv4) interface or Ethernet device emulation 2. raw HDLC - either IP (IPv4) interface or Ethernet device emulation
3. Cisco HDLC 3. Cisco HDLC
4. PPP 4. PPP
...@@ -24,19 +32,24 @@ with IEEE 802.1Q (VLANs) and 802.1D (Ethernet bridging). ...@@ -24,19 +32,24 @@ with IEEE 802.1Q (VLANs) and 802.1D (Ethernet bridging).
Make sure the hdlc.o and the hardware driver are loaded. It should Make sure the hdlc.o and the hardware driver are loaded. It should
create a number of "hdlc" (hdlc0 etc) network devices, one for each create a number of "hdlc" (hdlc0 etc) network devices, one for each
WAN port. You'll need the "sethdlc" utility, get it from: WAN port. You'll need the "sethdlc" utility, get it from:
http://www.kernel.org/pub/linux/utils/net/hdlc/ http://www.kernel.org/pub/linux/utils/net/hdlc/
Compile sethdlc.c utility: Compile sethdlc.c utility::
gcc -O2 -Wall -o sethdlc sethdlc.c gcc -O2 -Wall -o sethdlc sethdlc.c
Make sure you're using a correct version of sethdlc for your kernel. Make sure you're using a correct version of sethdlc for your kernel.
Use sethdlc to set physical interface, clock rate, HDLC mode used, Use sethdlc to set physical interface, clock rate, HDLC mode used,
and add any required PVCs if using Frame Relay. and add any required PVCs if using Frame Relay.
Usually you want something like: Usually you want something like::
sethdlc hdlc0 clock int rate 128000 sethdlc hdlc0 clock int rate 128000
sethdlc hdlc0 cisco interval 10 timeout 25 sethdlc hdlc0 cisco interval 10 timeout 25
or
or::
sethdlc hdlc0 rs232 clock ext sethdlc hdlc0 rs232 clock ext
sethdlc hdlc0 fr lmi ansi sethdlc hdlc0 fr lmi ansi
sethdlc hdlc0 create 99 sethdlc hdlc0 create 99
...@@ -49,46 +62,63 @@ any IP address to it) before using pvc devices. ...@@ -49,46 +62,63 @@ any IP address to it) before using pvc devices.
Setting interface: Setting interface:
* v35 | rs232 | x21 | t1 | e1 - sets physical interface for a given port * v35 | rs232 | x21 | t1 | e1
if the card has software-selectable interfaces - sets physical interface for a given port
loopback - activate hardware loopback (for testing only) if the card has software-selectable interfaces
* clock ext - both RX clock and TX clock external loopback
* clock int - both RX clock and TX clock internal - activate hardware loopback (for testing only)
* clock txint - RX clock external, TX clock internal * clock ext
* clock txfromrx - RX clock external, TX clock derived from RX clock - both RX clock and TX clock external
* rate - sets clock rate in bps (for "int" or "txint" clock only) * clock int
- both RX clock and TX clock internal
* clock txint
- RX clock external, TX clock internal
* clock txfromrx
- RX clock external, TX clock derived from RX clock
* rate
- sets clock rate in bps (for "int" or "txint" clock only)
Setting protocol: Setting protocol:
* hdlc - sets raw HDLC (IP-only) mode * hdlc - sets raw HDLC (IP-only) mode
nrz / nrzi / fm-mark / fm-space / manchester - sets transmission code nrz / nrzi / fm-mark / fm-space / manchester - sets transmission code
no-parity / crc16 / crc16-pr0 (CRC16 with preset zeros) / crc32-itu no-parity / crc16 / crc16-pr0 (CRC16 with preset zeros) / crc32-itu
crc16-itu (CRC16 with ITU-T polynomial) / crc16-itu-pr0 - sets parity crc16-itu (CRC16 with ITU-T polynomial) / crc16-itu-pr0 - sets parity
* hdlc-eth - Ethernet device emulation using HDLC. Parity and encoding * hdlc-eth - Ethernet device emulation using HDLC. Parity and encoding
as above. as above.
* cisco - sets Cisco HDLC mode (IP, IPv6 and IPX supported) * cisco - sets Cisco HDLC mode (IP, IPv6 and IPX supported)
interval - time in seconds between keepalive packets interval - time in seconds between keepalive packets
timeout - time in seconds after last received keepalive packet before timeout - time in seconds after last received keepalive packet before
we assume the link is down we assume the link is down
* ppp - sets synchronous PPP mode * ppp - sets synchronous PPP mode
* x25 - sets X.25 mode * x25 - sets X.25 mode
* fr - Frame Relay mode * fr - Frame Relay mode
lmi ansi / ccitt / cisco / none - LMI (link management) type lmi ansi / ccitt / cisco / none - LMI (link management) type
dce - Frame Relay DCE (network) side LMI instead of default DTE (user). dce - Frame Relay DCE (network) side LMI instead of default DTE (user).
It has nothing to do with clocks! It has nothing to do with clocks!
t391 - link integrity verification polling timer (in seconds) - user
t392 - polling verification timer (in seconds) - network - t391 - link integrity verification polling timer (in seconds) - user
n391 - full status polling counter - user - t392 - polling verification timer (in seconds) - network
n392 - error threshold - both user and network - n391 - full status polling counter - user
n393 - monitored events count - both user and network - n392 - error threshold - both user and network
- n393 - monitored events count - both user and network
Frame-Relay only: Frame-Relay only:
* create n | delete n - adds / deletes PVC interface with DLCI #n. * create n | delete n - adds / deletes PVC interface with DLCI #n.
Newly created interface will be named pvc0, pvc1 etc. Newly created interface will be named pvc0, pvc1 etc.
...@@ -101,26 +131,34 @@ Frame-Relay only: ...@@ -101,26 +131,34 @@ Frame-Relay only:
Board-specific issues Board-specific issues
--------------------- ---------------------
n2.o and c101.o need parameters to work: n2.o and c101.o need parameters to work::
insmod n2 hw=io,irq,ram,ports[:io,irq,...] insmod n2 hw=io,irq,ram,ports[:io,irq,...]
example:
example::
insmod n2 hw=0x300,10,0xD0000,01 insmod n2 hw=0x300,10,0xD0000,01
or or::
insmod c101 hw=irq,ram[:irq,...] insmod c101 hw=irq,ram[:irq,...]
example:
example::
insmod c101 hw=9,0xdc000 insmod c101 hw=9,0xdc000
If built into the kernel, these drivers need kernel (command line) parameters: If built into the kernel, these drivers need kernel (command line) parameters::
n2.hw=io,irq,ram,ports:... n2.hw=io,irq,ram,ports:...
or
or::
c101.hw=irq,ram:... c101.hw=irq,ram:...
If you have a problem with N2, C101 or PLX200SYN card, you can issue the If you have a problem with N2, C101 or PLX200SYN card, you can issue the
"private" command to see port's packet descriptor rings (in kernel logs): "private" command to see port's packet descriptor rings (in kernel logs)::
sethdlc hdlc0 private sethdlc hdlc0 private
......
.. SPDX-License-Identifier: GPL-2.0
===============
Generic Netlink
===============
A wiki document on how to use Generic Netlink can be found here: A wiki document on how to use Generic Netlink can be found here:
* http://www.linuxfoundation.org/collaborate/workgroups/networking/generic_netlink_howto * http://www.linuxfoundation.org/collaborate/workgroups/networking/generic_netlink_howto
.. SPDX-License-Identifier: GPL-2.0
=====================================
The Linux kernel GTP tunneling module The Linux kernel GTP tunneling module
====================================================================== =====================================
Documentation by Harald Welte <laforge@gnumonks.org> and
Andreas Schultz <aschultz@tpip.net> Documentation by
Harald Welte <laforge@gnumonks.org> and
Andreas Schultz <aschultz@tpip.net>
In 'drivers/net/gtp.c' you will find a kernel-level implementation In 'drivers/net/gtp.c' you will find a kernel-level implementation
of a GTP tunnel endpoint. of a GTP tunnel endpoint.
== What is GTP == What is GTP
===========
GTP is the GPRS Tunnelling Protocol, which is a 3GPP protocol used for GTP is the GPRS Tunnelling Protocol, which is a 3GPP protocol used for
tunneling User-IP payload between a mobile station (phone, modem) tunneling User-IP payload between a mobile station (phone, modem)
...@@ -41,7 +47,8 @@ publicly via the 3GPP website at http://www.3gpp.org/DynaReport/29060.htm ...@@ -41,7 +47,8 @@ publicly via the 3GPP website at http://www.3gpp.org/DynaReport/29060.htm
A direct PDF link to v13.6.0 is provided for convenience below: A direct PDF link to v13.6.0 is provided for convenience below:
http://www.etsi.org/deliver/etsi_ts/129000_129099/129060/13.06.00_60/ts_129060v130600p.pdf http://www.etsi.org/deliver/etsi_ts/129000_129099/129060/13.06.00_60/ts_129060v130600p.pdf
== The Linux GTP tunnelling module == The Linux GTP tunnelling module
===============================
The module implements the function of a tunnel endpoint, i.e. it is The module implements the function of a tunnel endpoint, i.e. it is
able to decapsulate tunneled IP packets in the uplink originated by able to decapsulate tunneled IP packets in the uplink originated by
...@@ -70,7 +77,8 @@ Userspace :) ...@@ -70,7 +77,8 @@ Userspace :)
The official homepage of the module is at The official homepage of the module is at
https://osmocom.org/projects/linux-kernel-gtp-u/wiki https://osmocom.org/projects/linux-kernel-gtp-u/wiki
== Userspace Programs with Linux Kernel GTP-U support == Userspace Programs with Linux Kernel GTP-U support
==================================================
At the time of this writing, there are at least two Free Software At the time of this writing, there are at least two Free Software
implementations that implement GTP-C and can use the netlink interface implementations that implement GTP-C and can use the netlink interface
...@@ -82,7 +90,8 @@ to make use of the Linux kernel GTP-U support: ...@@ -82,7 +90,8 @@ to make use of the Linux kernel GTP-U support:
* ergw (GGSN + P-GW in Erlang): * ergw (GGSN + P-GW in Erlang):
https://github.com/travelping/ergw https://github.com/travelping/ergw
== Userspace Library / Command Line Utilities == Userspace Library / Command Line Utilities
==========================================
There is a userspace library called 'libgtpnl' which is based on There is a userspace library called 'libgtpnl' which is based on
libmnl and which implements a C-language API towards the netlink libmnl and which implements a C-language API towards the netlink
...@@ -90,7 +99,8 @@ interface provided by the Kernel GTP module: ...@@ -90,7 +99,8 @@ interface provided by the Kernel GTP module:
http://git.osmocom.org/libgtpnl/ http://git.osmocom.org/libgtpnl/
== Protocol Versions == Protocol Versions
=================
There are two different versions of GTP-U: v0 [GSM TS 09.60] and v1 There are two different versions of GTP-U: v0 [GSM TS 09.60] and v1
[3GPP TS 29.281]. Both are implemented in the Kernel GTP module. [3GPP TS 29.281]. Both are implemented in the Kernel GTP module.
...@@ -105,7 +115,8 @@ doesn't implement GTP-C, we don't have to worry about this. It's the ...@@ -105,7 +115,8 @@ doesn't implement GTP-C, we don't have to worry about this. It's the
responsibility of the control plane implementation in userspace to responsibility of the control plane implementation in userspace to
implement that. implement that.
== IPv6 == IPv6
====
The 3GPP specifications indicate either IPv4 or IPv6 can be used both The 3GPP specifications indicate either IPv4 or IPv6 can be used both
on the inner (user) IP layer, or on the outer (transport) layer. on the inner (user) IP layer, or on the outer (transport) layer.
...@@ -114,22 +125,25 @@ Unfortunately, the Kernel module currently supports IPv6 neither for ...@@ -114,22 +125,25 @@ Unfortunately, the Kernel module currently supports IPv6 neither for
the User IP payload, nor for the outer IP layer. Patches or other the User IP payload, nor for the outer IP layer. Patches or other
Contributions to fix this are most welcome! Contributions to fix this are most welcome!
== Mailing List == Mailing List
============
If yo have questions regarding how to use the Kernel GTP module from If you have questions regarding how to use the Kernel GTP module from
your own software, or want to contribute to the code, please use the your own software, or want to contribute to the code, please use the
osmocom-net-gprs mailing list for related discussion. The list can be osmocom-net-gprs mailing list for related discussion. The list can be
reached at osmocom-net-gprs@lists.osmocom.org and the mailman reached at osmocom-net-gprs@lists.osmocom.org and the mailman
interface for managing your subscription is at interface for managing your subscription is at
https://lists.osmocom.org/mailman/listinfo/osmocom-net-gprs https://lists.osmocom.org/mailman/listinfo/osmocom-net-gprs
== Issue Tracker == Issue Tracker
=============
The Osmocom project maintains an issue tracker for the Kernel GTP-U The Osmocom project maintains an issue tracker for the Kernel GTP-U
module at module at
https://osmocom.org/projects/linux-kernel-gtp-u/issues https://osmocom.org/projects/linux-kernel-gtp-u/issues
== History / Acknowledgements == History / Acknowledgements
==========================
The Module was originally created in 2012 by Harald Welte, but never The Module was originally created in 2012 by Harald Welte, but never
completed. Pablo came in to finish the mess Harald left behind. But completed. Pablo came in to finish the mess Harald left behind. But
...@@ -139,9 +153,11 @@ In 2015, Andreas Schultz came to the rescue and fixed lots more bugs, ...@@ -139,9 +153,11 @@ In 2015, Andreas Schultz came to the rescue and fixed lots more bugs,
extended it with new features and finally pushed all of us to get it extended it with new features and finally pushed all of us to get it
mainline, where it was merged in 4.7.0. mainline, where it was merged in 4.7.0.
== Architectural Details == Architectural Details
=====================
=== Local GTP-U entity and tunnel identification === Local GTP-U entity and tunnel identification
--------------------------------------------
GTP-U uses UDP for transporting PDUs. The receiving UDP port is 2152 GTP-U uses UDP for transporting PDUs. The receiving UDP port is 2152
for GTPv1-U and 3386 for GTPv0-U. for GTPv1-U and 3386 for GTPv0-U.
...@@ -164,15 +180,15 @@ Therefore: ...@@ -164,15 +180,15 @@ Therefore:
destination IP and the tunnel endpoint id. The source IP and port destination IP and the tunnel endpoint id. The source IP and port
have no meaning and can change at any time. have no meaning and can change at any time.
[3GPP TS 29.281] Section 4.3.0 defines this so::

  The TEID in the GTP-U header is used to de-multiplex traffic
  incoming from remote tunnel endpoints so that it is delivered to the
  User plane entities in a way that allows multiplexing of different
  users, different packet protocols and different QoS levels.
  Therefore no two remote GTP-U endpoints shall send traffic to a
  GTP-U protocol entity using the same TEID value except
  for data forwarding as part of mobility procedures.
The definition above only defines that two remote GTP-U endpoints The definition above only defines that two remote GTP-U endpoints
*should not* send to the same TEID, it *does not* forbid or exclude *should not* send to the same TEID, it *does not* forbid or exclude
...@@ -183,7 +199,8 @@ multiple or unknown peers. ...@@ -183,7 +199,8 @@ multiple or unknown peers.
Therefore, the receiving side identifies tunnels exclusively based on Therefore, the receiving side identifies tunnels exclusively based on
TEIDs, not based on the source IP! TEIDs, not based on the source IP!
== APN vs. Network Device == APN vs. Network Device
======================
The GTP-U driver creates a Linux network device for each Gi/SGi The GTP-U driver creates a Linux network device for each Gi/SGi
interface. interface.
...@@ -201,29 +218,33 @@ number of Gi/SGi interfaces implemented by a GGSN/P-GW. ...@@ -201,29 +218,33 @@ number of Gi/SGi interfaces implemented by a GGSN/P-GW.
[3GPP TS 29.061] Section 11.3 makes it clear that the selection of a
specific Gi/SGi interface is made through the Access Point Name
(APN)::

  2. each private network manages its own addressing. In general this
     will result in different private networks having overlapping
     address ranges. A logically separate connection (e.g. an IP in IP
     tunnel or layer 2 virtual circuit) is used between the GGSN/P-GW
     and each private network.

     In this case the IP address alone is not necessarily unique. The
     pair of values, Access Point Name (APN) and IPv4 address and/or
     IPv6 prefixes, is unique.
In order to support the overlapping address range use case, each APN In order to support the overlapping address range use case, each APN
is mapped to a separate Gi/SGi interface (network device). is mapped to a separate Gi/SGi interface (network device).
.. note::

   The Access Point Name is purely a control plane (GTP-C) concept.
   At the GTP-U level, only Tunnel Endpoint Identifiers are present in
   GTP-U packets and network devices are known.
Therefore for a given UE the mapping in IP to PDN network is: Therefore for a given UE the mapping in IP to PDN network is:
* network device + MS IP -> Peer IP + Peer TEID, * network device + MS IP -> Peer IP + Peer TEID,
and from PDN to IP network: and from PDN to IP network:
* local GTP-U IP + TEID -> network device * local GTP-U IP + TEID -> network device
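As a rough sketch of the two mappings listed above, the lookups can be modelled as two dictionaries. All device names, addresses and TEIDs below are made up for illustration, and the helper names are hypothetical. Note how the same MS IP can appear on two network devices, which is the overlapping address range case.

```python
# Transmit direction: (network device, MS IP) -> (peer IP, peer TEID)
tx_map = {
    ("gtp0", "10.0.0.5"): ("203.0.113.9", 0xABCD),
    ("gtp1", "10.0.0.5"): ("198.51.100.20", 0x0042),  # same MS IP, other APN
}

# Receive direction: (local GTP-U IP, TEID) -> network device
rx_map = {
    ("192.0.2.1", 0x1234): "gtp0",
    ("192.0.2.1", 0x5678): "gtp1",
}

def tx_lookup(dev, ms_ip):
    """Return (peer IP, peer TEID) for a packet leaving via dev, or None."""
    return tx_map.get((dev, ms_ip))

def rx_lookup(local_ip, teid):
    """Return the network device for a received T-PDU, or None."""
    return rx_map.get((local_ip, teid))
```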
Furthermore, before a received T-PDU is injected into the network Furthermore, before a received T-PDU is injected into the network
......
.. SPDX-License-Identifier: GPL-2.0
============================================================
Linux Kernel Driver for Huawei Intelligent NIC(HiNIC) family Linux Kernel Driver for Huawei Intelligent NIC(HiNIC) family
============================================================ ============================================================
...@@ -110,7 +113,7 @@ hinic_dev - de/constructs the Logical Tx and Rx Queues. ...@@ -110,7 +113,7 @@ hinic_dev - de/constructs the Logical Tx and Rx Queues.
(hinic_main.c, hinic_dev.h) (hinic_main.c, hinic_dev.h)
Miscellaneous: Miscellaneous
============= =============
Common functions that are used by HW and Logical Device. Common functions that are used by HW and Logical Device.
......
.. SPDX-License-Identifier: GPL-2.0
===================================
Identifier Locator Addressing (ILA) Identifier Locator Addressing (ILA)
===================================
Introduction Introduction
...@@ -26,11 +30,13 @@ The ILA protocol is described in Internet-Draft draft-herbert-intarea-ila. ...@@ -26,11 +30,13 @@ The ILA protocol is described in Internet-Draft draft-herbert-intarea-ila.
ILA terminology ILA terminology
=============== ===============
- Identifier A number that identifies an addressable node in the network - Identifier
A number that identifies an addressable node in the network
independent of its location. ILA identifiers are sixty-four independent of its location. ILA identifiers are sixty-four
bit values. bit values.
- Locator A network prefix that routes to a physical host. Locators - Locator
A network prefix that routes to a physical host. Locators
provide the topological location of an addressed node. ILA provide the topological location of an addressed node. ILA
locators are sixty-four bit prefixes. locators are sixty-four bit prefixes.
...@@ -51,17 +57,20 @@ ILA terminology ...@@ -51,17 +57,20 @@ ILA terminology
bits) and an identifier (low order sixty-four bits). ILA bits) and an identifier (low order sixty-four bits). ILA
addresses are never visible to an application. addresses are never visible to an application.
- ILA host An end host that is capable of performing ILA translations - ILA host
An end host that is capable of performing ILA translations
on transmit or receive. on transmit or receive.
- ILA router A network node that performs ILA translation and forwarding - ILA router
A network node that performs ILA translation and forwarding
of translated packets. of translated packets.
- ILA forwarding cache - ILA forwarding cache
A type of ILA router that only maintains a working set A type of ILA router that only maintains a working set
cache of mappings. cache of mappings.
- ILA node A network node capable of performing ILA translations. This - ILA node
A network node capable of performing ILA translations. This
can be an ILA router, ILA forwarding cache, or ILA host. can be an ILA router, ILA forwarding cache, or ILA host.
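To make the locator/identifier split concrete, here is a small Python sketch that treats an IPv6 address as a 64-bit upper half and a 64-bit identifier, and overwrites the upper half with a locator. It is a simplification under stated assumptions: the helper names are hypothetical, and checksum-neutral adjustment, the C-bit and the identifier type field are all ignored.

```python
import ipaddress

MASK64 = (1 << 64) - 1

def split_ila(addr: str):
    """Split an IPv6 address into (upper 64 bits, identifier)."""
    value = int(ipaddress.IPv6Address(addr))
    return value >> 64, value & MASK64

def with_locator(addr: str, locator: str) -> ipaddress.IPv6Address:
    """Overwrite the upper 64 bits of addr with locator.

    locator is given as four hextets, e.g. "2001:0:87:0".
    """
    loc = int(ipaddress.IPv6Address(locator + "::")) >> 64
    _, identifier = split_ila(addr)
    return ipaddress.IPv6Address((loc << 64) | identifier)
```

For example, `with_locator("3333:0:0:1:2000:0:1:87", "2001:0:87:0")` keeps the low-order identifier `2000:0:1:87` and replaces only the upper sixty-four bits.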
...@@ -82,18 +91,18 @@ Configuration and datapath for these two points of deployment is somewhat ...@@ -82,18 +91,18 @@ Configuration and datapath for these two points of deployment is somewhat
different. different.
The diagram below illustrates the flow of packets through ILA as well The diagram below illustrates the flow of packets through ILA as well
as showing ILA hosts and routers. as showing ILA hosts and routers::
+--------+ +--------+ +--------+ +--------+
| Host A +-+ +--->| Host B | | Host A +-+ +--->| Host B |
| | | (2) ILA (') | | | | | (2) ILA (') | |
+--------+ | ...addressed.... ( ) +--------+ +--------+ | ...addressed.... ( ) +--------+
V +---+--+ . packet . +---+--+ (_) V +---+--+ . packet . +---+--+ (_)
(1) SIR | | ILA |----->-------->---->| ILA | | (3) SIR (1) SIR | | ILA |----->-------->---->| ILA | | (3) SIR
addressed +->|router| . . |router|->-+ addressed addressed +->|router| . . |router|->-+ addressed
packet +---+--+ . IPv6 . +---+--+ packet packet +---+--+ . IPv6 . +---+--+ packet
/ . Network . / . Network .
/ . . +--+-++--------+ / . . +--+-++--------+
+--------+ / . . |ILA || Host | +--------+ / . . |ILA || Host |
| Host +--+ . .- -|host|| | | Host +--+ . .- -|host|| |
| | . . +--+-++--------+ | | . . +--+-++--------+
...@@ -173,7 +182,7 @@ ILA address, never a SIR address. ...@@ -173,7 +182,7 @@ ILA address, never a SIR address.
In the simplest format the identifier types, C-bit, and checksum In the simplest format the identifier types, C-bit, and checksum
adjustment value are not present so an identifier is considered an adjustment value are not present so an identifier is considered an
unstructured sixty-four bit value. unstructured sixty-four bit value::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier | | Identifier |
...@@ -184,7 +193,7 @@ unstructured sixty-four bit value. ...@@ -184,7 +193,7 @@ unstructured sixty-four bit value.
The checksum neutral adjustment may be configured to always be The checksum neutral adjustment may be configured to always be
present using neutral-map-auto. In this case there is no C-bit, but the present using neutral-map-auto. In this case there is no C-bit, but the
checksum adjustment is in the low order 16 bits. The identifier is checksum adjustment is in the low order 16 bits. The identifier is
still sixty-four bits. still sixty-four bits::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier | | Identifier |
...@@ -193,7 +202,7 @@ still sixty-four bits. ...@@ -193,7 +202,7 @@ still sixty-four bits.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The C-bit may be used to explicitly indicate that checksum neutral
mapping has been applied to an ILA address. The format is::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |C| Identifier | | |C| Identifier |
...@@ -204,7 +213,7 @@ mapping has been applied to an ILA address. The format is: ...@@ -204,7 +213,7 @@ mapping has been applied to an ILA address. The format is:
The identifier type field may be present to indicate the identifier
type. If it is not present then the type is inferred based on mapping
configuration. The checksum neutral adjustment may be automatically
used with the identifier type as illustrated below::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type| Identifier | | Type| Identifier |
...@@ -213,7 +222,7 @@ used with the identifier type as illustrated below. ...@@ -213,7 +222,7 @@ used with the identifier type as illustrated below.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The identifier type and the C-bit can be present simultaneously, in
which case the identifier format would be::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type|C| Identifier | | Type|C| Identifier |
...@@ -258,28 +267,30 @@ same meanings as described above. ...@@ -258,28 +267,30 @@ same meanings as described above.
Some examples Some examples
============= =============
::

     # Configure an ILA route that uses checksum neutral mapping as well
     # as type field. Note that the type field is set in the SIR address
     # (the 2000 implies type is 1 which is LUID).
     ip route add 3333:0:0:1:2000:0:1:87/128 encap ila 2001:0:87:0 \
          csum-mode neutral-map ident-type use-format

     # Configure an ILA LWT route that uses auto checksum neutral mapping
     # (no C-bit) and configure identifier type to be LUID so that the
     # identifier type field will not be present.
     ip route add 3333:0:0:1:2000:0:2:87/128 encap ila 2001:0:87:1 \
          csum-mode neutral-map-auto ident-type luid

     # ila_xlat configuration

     # Configure an ILA to SIR mapping that matches a locator and overwrites
     # it with a SIR address (3333:0:0:1 in this example). The C-bit and
     # identifier field are used.
     ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
         csum-mode neutral-map-auto ident-type use-format

     # Configure an ILA to SIR mapping where checksum neutral is automatically
     # set without the C-bit and the identifier type is configured to be LUID
     # so that the identifier type field is not present.
     ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
         csum-mode neutral-map-auto ident-type use-format
...@@ -15,6 +15,7 @@ Contents: ...@@ -15,6 +15,7 @@ Contents:
device_drivers/index device_drivers/index
dsa/index dsa/index
devlink/index devlink/index
caif/index
ethtool-netlink ethtool-netlink
ieee802154 ieee802154
j1939 j1939
...@@ -36,6 +37,43 @@ Contents: ...@@ -36,6 +37,43 @@ Contents:
tls-offload tls-offload
nfc nfc
6lowpan 6lowpan
6pack
altera_tse
arcnet-hardware
arcnet
atm
ax25
baycom
bonding
cdc_mbim
cops
cxacru
dccp
dctcp
decnet
defza
dns_resolver
driver
eql
fib_trie
filter
fore200e
framerelay
generic-hdlc
generic_netlink
gen_stats
gtp
hinic
ila
ipddp
ip_dynaddr
iphase
ipsec
ip-sysctl
ipv6
ipvlan
ipvs-sysctl
kcm
.. only:: subproject and html .. only:: subproject and html
......
/proc/sys/net/ipv4/* Variables: .. SPDX-License-Identifier: GPL-2.0
=========
IP Sysctl
=========
/proc/sys/net/ipv4/* Variables
==============================
ip_forward - BOOLEAN ip_forward - BOOLEAN
0 - disabled (default) - 0 - disabled (default)
not 0 - enabled - not 0 - enabled
Forward Packets between interfaces. Forward Packets between interfaces.
...@@ -38,6 +45,7 @@ ip_no_pmtu_disc - INTEGER ...@@ -38,6 +45,7 @@ ip_no_pmtu_disc - INTEGER
could break other protocols. could break other protocols.
Possible values: 0-3 Possible values: 0-3
Default: FALSE Default: FALSE
min_pmtu - INTEGER min_pmtu - INTEGER
...@@ -51,16 +59,20 @@ ip_forward_use_pmtu - BOOLEAN ...@@ -51,16 +59,20 @@ ip_forward_use_pmtu - BOOLEAN
which tries to discover path mtus by itself and depends on the which tries to discover path mtus by itself and depends on the
kernel honoring this information. This is normally not the kernel honoring this information. This is normally not the
case. case.
Default: 0 (disabled)

Possible values:

- 0 - disabled
- 1 - enabled
fwmark_reflect - BOOLEAN fwmark_reflect - BOOLEAN
Controls the fwmark of kernel-generated IPv4 reply packets that are not
associated with a socket (for example, TCP RSTs or ICMP echo replies).
If unset, these packets have a fwmark of zero. If set, they have the
fwmark of the packet they are replying to.

Default: 0
fib_multipath_use_neigh - BOOLEAN fib_multipath_use_neigh - BOOLEAN
...@@ -68,63 +80,80 @@ fib_multipath_use_neigh - BOOLEAN ...@@ -68,63 +80,80 @@ fib_multipath_use_neigh - BOOLEAN
multipath routes. If disabled, neighbor information is not used and multipath routes. If disabled, neighbor information is not used and
packets could be directed to a failed nexthop. Only valid for kernels packets could be directed to a failed nexthop. Only valid for kernels
built with CONFIG_IP_ROUTE_MULTIPATH enabled. built with CONFIG_IP_ROUTE_MULTIPATH enabled.
Default: 0 (disabled)

Possible values:

- 0 - disabled
- 1 - enabled
fib_multipath_hash_policy - INTEGER fib_multipath_hash_policy - INTEGER
Controls which hash policy to use for multipath routes. Only valid Controls which hash policy to use for multipath routes. Only valid
for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled. for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled.
Default: 0 (Layer 3)

Possible values:

- 0 - Layer 3
- 1 - Layer 4
- 2 - Layer 3 or inner Layer 3 if present
fib_sync_mem - UNSIGNED INTEGER fib_sync_mem - UNSIGNED INTEGER
Amount of dirty memory from fib entries that can be backlogged before Amount of dirty memory from fib entries that can be backlogged before
synchronize_rcu is forced. synchronize_rcu is forced.
Default: 512kB Minimum: 64kB Maximum: 64MB
Default: 512kB Minimum: 64kB Maximum: 64MB
ip_forward_update_priority - INTEGER ip_forward_update_priority - INTEGER
Whether to update SKB priority from "TOS" field in IPv4 header after it Whether to update SKB priority from "TOS" field in IPv4 header after it
is forwarded. The new SKB priority is mapped from TOS field value is forwarded. The new SKB priority is mapped from TOS field value
according to an rt_tos2priority table (see e.g. man tc-prio). according to an rt_tos2priority table (see e.g. man tc-prio).
Default: 1 (Update priority.)

Possible values:

- 0 - Do not update priority.
- 1 - Update priority.
route/max_size - INTEGER route/max_size - INTEGER
Maximum number of routes allowed in the kernel. Increase Maximum number of routes allowed in the kernel. Increase
this when using large numbers of interfaces and/or routes. this when using large numbers of interfaces and/or routes.
From linux kernel 3.6 onwards, this is deprecated for ipv4 From linux kernel 3.6 onwards, this is deprecated for ipv4
as route cache is no longer used. as route cache is no longer used.
neigh/default/gc_thresh1 - INTEGER neigh/default/gc_thresh1 - INTEGER
Minimum number of entries to keep. Garbage collector will not Minimum number of entries to keep. Garbage collector will not
purge entries if there are fewer than this number. purge entries if there are fewer than this number.
Default: 128 Default: 128
neigh/default/gc_thresh2 - INTEGER neigh/default/gc_thresh2 - INTEGER
Threshold when garbage collector becomes more aggressive about Threshold when garbage collector becomes more aggressive about
purging entries. Entries older than 5 seconds will be cleared purging entries. Entries older than 5 seconds will be cleared
when over this number. when over this number.
Default: 512 Default: 512
neigh/default/gc_thresh3 - INTEGER neigh/default/gc_thresh3 - INTEGER
Maximum number of non-PERMANENT neighbor entries allowed. Increase Maximum number of non-PERMANENT neighbor entries allowed. Increase
this when using large numbers of interfaces and when communicating this when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers. with large numbers of directly-connected peers.
Default: 1024 Default: 1024
neigh/default/unres_qlen_bytes - INTEGER neigh/default/unres_qlen_bytes - INTEGER
The maximum number of bytes which may be used by packets The maximum number of bytes which may be used by packets
queued for each unresolved address by other network layers. queued for each unresolved address by other network layers.
(added in linux 3.3) (added in linux 3.3)
Setting negative value is meaningless and will return error. Setting negative value is meaningless and will return error.
Default: SK_WMEM_MAX, (same as net.core.wmem_default). Default: SK_WMEM_MAX, (same as net.core.wmem_default).
Exact value depends on architecture and kernel options, Exact value depends on architecture and kernel options,
but should be enough to allow queuing 256 packets but should be enough to allow queuing 256 packets
of medium size. of medium size.
...@@ -132,11 +161,14 @@ neigh/default/unres_qlen_bytes - INTEGER ...@@ -132,11 +161,14 @@ neigh/default/unres_qlen_bytes - INTEGER
neigh/default/unres_qlen - INTEGER neigh/default/unres_qlen - INTEGER
The maximum number of packets which may be queued for each The maximum number of packets which may be queued for each
unresolved address by other network layers. unresolved address by other network layers.
(deprecated in linux 3.3) : use unres_qlen_bytes instead. (deprecated in linux 3.3) : use unres_qlen_bytes instead.
Prior to linux 3.3, the default value is 3 which may cause Prior to linux 3.3, the default value is 3 which may cause
unexpected packet loss. The current default value is calculated unexpected packet loss. The current default value is calculated
according to default value of unres_qlen_bytes and true size of according to default value of unres_qlen_bytes and true size of
packet. packet.
Default: 101 Default: 101
mtu_expires - INTEGER mtu_expires - INTEGER
...@@ -183,7 +215,8 @@ ipfrag_max_dist - INTEGER ...@@ -183,7 +215,8 @@ ipfrag_max_dist - INTEGER
from different IP datagrams, which could result in data corruption. from different IP datagrams, which could result in data corruption.
Default: 64 Default: 64
INET peer storage: INET peer storage
=================
inet_peer_threshold - INTEGER inet_peer_threshold - INTEGER
The approximate size of the storage. Starting from this threshold The approximate size of the storage. Starting from this threshold
...@@ -203,7 +236,8 @@ inet_peer_maxttl - INTEGER ...@@ -203,7 +236,8 @@ inet_peer_maxttl - INTEGER
when the number of entries in the pool is very small). when the number of entries in the pool is very small).
Measured in seconds. Measured in seconds.
TCP variables: TCP variables
=============
somaxconn - INTEGER somaxconn - INTEGER
Limit of socket listen() backlog, known in userspace as SOMAXCONN. Limit of socket listen() backlog, known in userspace as SOMAXCONN.
...@@ -222,18 +256,22 @@ tcp_adv_win_scale - INTEGER ...@@ -222,18 +256,22 @@ tcp_adv_win_scale - INTEGER
Count buffering overhead as bytes/2^tcp_adv_win_scale Count buffering overhead as bytes/2^tcp_adv_win_scale
(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
if it is <= 0. if it is <= 0.
Possible values are [-31, 31], inclusive. Possible values are [-31, 31], inclusive.
Default: 1 Default: 1
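The overhead formula above can be written out as a short Python sketch. It only restates the arithmetic given in the text using integer division; the function name is hypothetical and this is not how the kernel exposes the calculation.

```python
def buffering_overhead(nbytes: int, tcp_adv_win_scale: int) -> int:
    """Bytes of a buffer counted as buffering overhead, per the formula above."""
    if not -31 <= tcp_adv_win_scale <= 31:
        raise ValueError("tcp_adv_win_scale must be in [-31, 31]")
    if tcp_adv_win_scale > 0:
        # bytes / 2^tcp_adv_win_scale
        return nbytes >> tcp_adv_win_scale
    # bytes - bytes / 2^(-tcp_adv_win_scale)
    return nbytes - (nbytes >> -tcp_adv_win_scale)
```

With the default of 1, half of the buffer is counted as overhead; with -2, three quarters of it is.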
tcp_allowed_congestion_control - STRING tcp_allowed_congestion_control - STRING
Show/set the congestion control choices available to non-privileged Show/set the congestion control choices available to non-privileged
processes. The list is a subset of those listed in processes. The list is a subset of those listed in
tcp_available_congestion_control. tcp_available_congestion_control.
Default is "reno" and the default setting (tcp_congestion_control). Default is "reno" and the default setting (tcp_congestion_control).
tcp_app_win - INTEGER tcp_app_win - INTEGER
Reserve max(window/2^tcp_app_win, mss) of window for application Reserve max(window/2^tcp_app_win, mss) of window for application
buffer. Value 0 is special, it means that nothing is reserved. buffer. Value 0 is special, it means that nothing is reserved.
Default: 31 Default: 31
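A minimal sketch of the reservation rule, restating the expression from the text with integer arithmetic. The function name is hypothetical and the sample window and MSS values are only examples.

```python
def app_reserved(window: int, mss: int, tcp_app_win: int) -> int:
    """Bytes of the window reserved for the application buffer."""
    if tcp_app_win == 0:
        # Value 0 is special: nothing is reserved.
        return 0
    # max(window / 2^tcp_app_win, mss)
    return max(window >> tcp_app_win, mss)
```

With the default of 31, `window >> 31` is zero for any realistic window, so the reservation falls back to one MSS.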
tcp_autocorking - BOOLEAN tcp_autocorking - BOOLEAN
...@@ -244,6 +282,7 @@ tcp_autocorking - BOOLEAN ...@@ -244,6 +282,7 @@ tcp_autocorking - BOOLEAN
packet for the flow is waiting in Qdisc queues or device transmit packet for the flow is waiting in Qdisc queues or device transmit
queue. Applications can still use TCP_CORK for optimal behavior queue. Applications can still use TCP_CORK for optimal behavior
when they know how/when to uncork their sockets. when they know how/when to uncork their sockets.
Default : 1 Default : 1
tcp_available_congestion_control - STRING tcp_available_congestion_control - STRING
...@@ -265,6 +304,7 @@ tcp_mtu_probe_floor - INTEGER ...@@ -265,6 +304,7 @@ tcp_mtu_probe_floor - INTEGER
tcp_min_snd_mss - INTEGER tcp_min_snd_mss - INTEGER
TCP SYN and SYNACK messages usually advertise an ADVMSS option, TCP SYN and SYNACK messages usually advertise an ADVMSS option,
as described in RFC 1122 and RFC 6691. as described in RFC 1122 and RFC 6691.
If this ADVMSS option is smaller than tcp_min_snd_mss, If this ADVMSS option is smaller than tcp_min_snd_mss,
it is silently capped to tcp_min_snd_mss. it is silently capped to tcp_min_snd_mss.
...@@ -277,6 +317,7 @@ tcp_congestion_control - STRING ...@@ -277,6 +317,7 @@ tcp_congestion_control - STRING
Default is set as part of kernel configuration. Default is set as part of kernel configuration.
For passive connections, the listener congestion control choice For passive connections, the listener congestion control choice
is inherited. is inherited.
[see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ] [see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ]
tcp_dsack - BOOLEAN tcp_dsack - BOOLEAN
...@@ -286,9 +327,12 @@ tcp_early_retrans - INTEGER ...@@ -286,9 +327,12 @@ tcp_early_retrans - INTEGER
Tail loss probe (TLP) converts RTOs occurring due to tail Tail loss probe (TLP) converts RTOs occurring due to tail
losses into fast recovery (draft-ietf-tcpm-rack). Note that losses into fast recovery (draft-ietf-tcpm-rack). Note that
TLP requires RACK to function properly (see tcp_recovery below) TLP requires RACK to function properly (see tcp_recovery below)
Possible values:

- 0 disables TLP
- 3 or 4 enables TLP

Default: 3
tcp_ecn - INTEGER tcp_ecn - INTEGER
...@@ -297,12 +341,17 @@ tcp_ecn - INTEGER ...@@ -297,12 +341,17 @@ tcp_ecn - INTEGER
support for it. This feature is useful in avoiding losses due support for it. This feature is useful in avoiding losses due
to congestion by allowing supporting routers to signal to congestion by allowing supporting routers to signal
congestion before having to drop packets. congestion before having to drop packets.
Possible values are:

=  =====================================================
0  Disable ECN.  Neither initiate nor accept ECN.
1  Enable ECN when requested by incoming connections and
   also request ECN on outgoing connection attempts.
2  Enable ECN when requested by incoming connections
   but do not request ECN on outgoing connections.
=  =====================================================

Default: 2
tcp_ecn_fallback - BOOLEAN tcp_ecn_fallback - BOOLEAN
...@@ -312,6 +361,7 @@ tcp_ecn_fallback - BOOLEAN ...@@ -312,6 +361,7 @@ tcp_ecn_fallback - BOOLEAN
additional detection mechanisms could be implemented under this additional detection mechanisms could be implemented under this
knob. The value is not used, if tcp_ecn or per route (or congestion knob. The value is not used, if tcp_ecn or per route (or congestion
control) ECN settings are disabled. control) ECN settings are disabled.
Default: 1 (fallback enabled) Default: 1 (fallback enabled)
tcp_fack - BOOLEAN tcp_fack - BOOLEAN
...@@ -324,7 +374,9 @@ tcp_fin_timeout - INTEGER ...@@ -324,7 +374,9 @@ tcp_fin_timeout - INTEGER
valid "receive only" state for an un-orphaned connection, an valid "receive only" state for an un-orphaned connection, an
orphaned connection in FIN_WAIT_2 state could otherwise wait orphaned connection in FIN_WAIT_2 state could otherwise wait
forever for the remote to close its end of the connection. forever for the remote to close its end of the connection.
Cf. tcp_max_orphans Cf. tcp_max_orphans
Default: 60 seconds Default: 60 seconds
tcp_frto - INTEGER tcp_frto - INTEGER
...@@ -390,7 +442,8 @@ tcp_l3mdev_accept - BOOLEAN ...@@ -390,7 +442,8 @@ tcp_l3mdev_accept - BOOLEAN
derived from the listen socket to be bound to the L3 domain in derived from the listen socket to be bound to the L3 domain in
which the packets originated. Only valid when the kernel was which the packets originated. Only valid when the kernel was
compiled with CONFIG_NET_L3_MASTER_DEV. compiled with CONFIG_NET_L3_MASTER_DEV.
Default: 0 (disabled)
Default: 0 (disabled)
tcp_low_latency - BOOLEAN tcp_low_latency - BOOLEAN
This is a legacy option, it has no effect anymore. This is a legacy option, it has no effect anymore.
...@@ -410,10 +463,14 @@ tcp_max_orphans - INTEGER ...@@ -410,10 +463,14 @@ tcp_max_orphans - INTEGER
tcp_max_syn_backlog - INTEGER tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests (SYN_RECV), Maximal number of remembered connection requests (SYN_RECV),
which have not received an acknowledgment from connecting client. which have not received an acknowledgment from connecting client.
This is a per-listener limit. This is a per-listener limit.
The minimal value is 128 for low memory machines, and it will The minimal value is 128 for low memory machines, and it will
increase in proportion to the memory of machine. increase in proportion to the memory of machine.
If server suffers from overload, try increasing this number. If server suffers from overload, try increasing this number.
Remember to also check /proc/sys/net/core/somaxconn Remember to also check /proc/sys/net/core/somaxconn
A SYN_RECV request socket consumes about 304 bytes of memory. A SYN_RECV request socket consumes about 304 bytes of memory.
...@@ -445,7 +502,9 @@ tcp_min_rtt_wlen - INTEGER ...@@ -445,7 +502,9 @@ tcp_min_rtt_wlen - INTEGER
minimum RTT when it is moved to a longer path (e.g., due to traffic minimum RTT when it is moved to a longer path (e.g., due to traffic
engineering). A longer window makes the filter more resistant to RTT engineering). A longer window makes the filter more resistant to RTT
inflations such as transient congestion. The unit is seconds. inflations such as transient congestion. The unit is seconds.
Possible values: 0 - 86400 (1 day) Possible values: 0 - 86400 (1 day)
Default: 300 Default: 300
tcp_moderate_rcvbuf - BOOLEAN tcp_moderate_rcvbuf - BOOLEAN
...@@ -457,9 +516,10 @@ tcp_moderate_rcvbuf - BOOLEAN ...@@ -457,9 +516,10 @@ tcp_moderate_rcvbuf - BOOLEAN
tcp_mtu_probing - INTEGER tcp_mtu_probing - INTEGER
Controls TCP Packetization-Layer Path MTU Discovery. Takes three
values:

- 0 - Disabled
- 1 - Disabled by default, enabled when an ICMP black hole detected
- 2 - Always enabled, use initial MSS of tcp_base_mss.
tcp_probe_interval - UNSIGNED INTEGER tcp_probe_interval - UNSIGNED INTEGER
Controls how often to start TCP Packetization-Layer Path MTU Controls how often to start TCP Packetization-Layer Path MTU
...@@ -481,6 +541,7 @@ tcp_no_metrics_save - BOOLEAN ...@@ -481,6 +541,7 @@ tcp_no_metrics_save - BOOLEAN
tcp_no_ssthresh_metrics_save - BOOLEAN tcp_no_ssthresh_metrics_save - BOOLEAN
Controls whether TCP saves ssthresh metrics in the route cache. Controls whether TCP saves ssthresh metrics in the route cache.
Default is 1, which disables ssthresh metrics. Default is 1, which disables ssthresh metrics.
tcp_orphan_retries - INTEGER tcp_orphan_retries - INTEGER
...@@ -489,6 +550,7 @@ tcp_orphan_retries - INTEGER ...@@ -489,6 +550,7 @@ tcp_orphan_retries - INTEGER
See tcp_retries2 for more details. See tcp_retries2 for more details.
The default value is 8. The default value is 8.
If your machine is a loaded WEB server,
you should think about lowering this value, since such sockets
may consume significant resources. Cf. tcp_max_orphans.
...@@ -497,11 +559,15 @@ tcp_recovery - INTEGER ...@@ -497,11 +559,15 @@ tcp_recovery - INTEGER
This value is a bitmap to enable various experimental loss recovery
features.

=========  =============================================================
RACK: 0x1  enables the RACK loss detection for fast detection of lost
           retransmissions and tail drops. It also subsumes and disables
           RFC6675 recovery for SACK connections.

RACK: 0x2  makes RACK's reordering window static (min_rtt/4).

RACK: 0x4  disables RACK's DUPACK threshold heuristic
=========  =============================================================

Default: 0x1
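To make the bitmap concrete, here is a minimal sketch that decodes a tcp_recovery value into the features listed in the table. The bit values come from the table above; the dictionary, the labels, and the function name are illustrative and are not kernel identifiers.

```python
# Bit values taken from the tcp_recovery table; labels are illustrative only.
TCP_RECOVERY_BITS = {
    0x1: "RACK loss detection",
    0x2: "static RACK reordering window (min_rtt/4)",
    0x4: "RACK DUPACK threshold heuristic disabled",
}


def decode_tcp_recovery(value):
    """Return the labels of all bits that are set in a tcp_recovery value."""
    return [label for bit, label in TCP_RECOVERY_BITS.items() if value & bit]


# The documented default (0x1) enables RACK loss detection only.
print(decode_tcp_recovery(0x1))
```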
...@@ -509,12 +575,14 @@ tcp_reordering - INTEGER ...@@ -509,12 +575,14 @@ tcp_reordering - INTEGER
Initial reordering level of packets in a TCP stream. Initial reordering level of packets in a TCP stream.
TCP stack can then dynamically adjust flow reordering level TCP stack can then dynamically adjust flow reordering level
between this initial value and tcp_max_reordering between this initial value and tcp_max_reordering
Default: 3 Default: 3
tcp_max_reordering - INTEGER tcp_max_reordering - INTEGER
Maximal reordering level of packets in a TCP stream. Maximal reordering level of packets in a TCP stream.
300 is a fairly conservative value, but you might increase it 300 is a fairly conservative value, but you might increase it
if paths are using per packet load balancing (like bonding rr mode) if paths are using per packet load balancing (like bonding rr mode)
Default: 300 Default: 300
tcp_retrans_collapse - BOOLEAN tcp_retrans_collapse - BOOLEAN
...@@ -550,12 +618,14 @@ tcp_rfc1337 - BOOLEAN ...@@ -550,12 +618,14 @@ tcp_rfc1337 - BOOLEAN
If set, the TCP stack behaves conforming to RFC1337. If unset, If set, the TCP stack behaves conforming to RFC1337. If unset,
we are not conforming to RFC, but prevent TCP TIME_WAIT we are not conforming to RFC, but prevent TCP TIME_WAIT
assassination. assassination.
Default: 0 Default: 0
tcp_rmem - vector of 3 INTEGERs: min, default, max tcp_rmem - vector of 3 INTEGERs: min, default, max
min: Minimal size of receive buffer used by TCP sockets. min: Minimal size of receive buffer used by TCP sockets.
It is guaranteed to each TCP socket, even under moderate memory It is guaranteed to each TCP socket, even under moderate memory
pressure. pressure.
Default: 4K Default: 4K
default: initial size of receive buffer used by TCP sockets. default: initial size of receive buffer used by TCP sockets.
...@@ -592,12 +662,14 @@ tcp_slow_start_after_idle - BOOLEAN ...@@ -592,12 +662,14 @@ tcp_slow_start_after_idle - BOOLEAN
window after an idle period. An idle period is defined at window after an idle period. An idle period is defined at
the current RTO. If unset, the congestion window will not the current RTO. If unset, the congestion window will not
be timed out after an idle period. be timed out after an idle period.
Default: 1 Default: 1
tcp_stdurg - BOOLEAN tcp_stdurg - BOOLEAN
Use the Host requirements interpretation of the TCP urgent pointer field. Use the Host requirements interpretation of the TCP urgent pointer field.
Most hosts use the older BSD interpretation, so if you turn this on Most hosts use the older BSD interpretation, so if you turn this on
Linux might not communicate correctly with them. Linux might not communicate correctly with them.
Default: FALSE Default: FALSE
tcp_synack_retries - INTEGER tcp_synack_retries - INTEGER
...@@ -646,15 +718,18 @@ tcp_fastopen - INTEGER ...@@ -646,15 +718,18 @@ tcp_fastopen - INTEGER
the option value being the length of the syn-data backlog. the option value being the length of the syn-data backlog.
The values (bitmap) are

=====  ======== ======================================================
0x1    (client) enables sending data in the opening SYN on the client.
0x2    (server) enables the server support, i.e., allowing data in
                a SYN packet to be accepted and passed to the
                application before 3-way handshake finishes.
0x4    (client) send data in the opening SYN regardless of cookie
                availability and without a cookie option.
0x200  (server) accept data-in-SYN w/o any cookie option present.
0x400  (server) enable all listeners to support Fast Open by
                default without explicit TCP_FASTOPEN socket option.
=====  ======== ======================================================

Default: 0x1
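Because tcp_fastopen is a bitmap, several features are combined by OR-ing their bits. The sketch below composes a value from the entries in the table above; the enum member names are hypothetical labels chosen for this example, not the kernel's own constants.

```python
from enum import IntFlag


class FastOpen(IntFlag):
    """Bits from the tcp_fastopen table; member names are illustrative."""
    CLIENT = 0x1
    SERVER = 0x2
    CLIENT_NO_COOKIE = 0x4
    SERVER_NO_COOKIE = 0x200
    SERVER_ALL_LISTENERS = 0x400


def fastopen_value(*flags):
    """OR the given flags together and return the integer sysctl value."""
    value = 0
    for flag in flags:
        value |= int(flag)
    return value


# Client and server support together give 0x3.
print(hex(fastopen_value(FastOpen.CLIENT, FastOpen.SERVER)))
```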
...@@ -668,6 +743,7 @@ tcp_fastopen_blackhole_timeout_sec - INTEGER ...@@ -668,6 +743,7 @@ tcp_fastopen_blackhole_timeout_sec - INTEGER
get detected right after Fastopen is re-enabled and will reset to get detected right after Fastopen is re-enabled and will reset to
initial value when the blackhole issue goes away. initial value when the blackhole issue goes away.
0 to disable the blackhole detection. 0 to disable the blackhole detection.
By default, it is set to 1hr. By default, it is set to 1hr.
tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs
...@@ -698,20 +774,24 @@ tcp_syn_retries - INTEGER ...@@ -698,20 +774,24 @@ tcp_syn_retries - INTEGER
for an active TCP connection attempt will happen after 127 seconds.
tcp_timestamps - INTEGER
Enable timestamps as defined in RFC1323.

- 0: Disabled.
- 1: Enable timestamps as defined in RFC1323 and use random offset for
  each connection rather than only using the current time.
- 2: Like 1, but without random offsets.

Default: 1
tcp_min_tso_segs - INTEGER tcp_min_tso_segs - INTEGER
Minimal number of segments per TSO frame. Minimal number of segments per TSO frame.
Since linux-3.12, TCP does an automatic sizing of TSO frames, Since linux-3.12, TCP does an automatic sizing of TSO frames,
depending on flow rate, instead of filling 64Kbytes packets. depending on flow rate, instead of filling 64Kbytes packets.
For specific usages, it's possible to force TCP to build big For specific usages, it's possible to force TCP to build big
TSO frames. Note that TCP stack might split too big TSO packets TSO frames. Note that TCP stack might split too big TSO packets
if available window is too small. if available window is too small.
Default: 2 Default: 2
tcp_pacing_ss_ratio - INTEGER tcp_pacing_ss_ratio - INTEGER
...@@ -720,6 +800,7 @@ tcp_pacing_ss_ratio - INTEGER ...@@ -720,6 +800,7 @@ tcp_pacing_ss_ratio - INTEGER
If TCP is in slow start, tcp_pacing_ss_ratio is applied If TCP is in slow start, tcp_pacing_ss_ratio is applied
to let TCP probe for bigger speeds, assuming cwnd can be to let TCP probe for bigger speeds, assuming cwnd can be
doubled every other RTT. doubled every other RTT.
Default: 200 Default: 200
tcp_pacing_ca_ratio - INTEGER tcp_pacing_ca_ratio - INTEGER
...@@ -727,6 +808,7 @@ tcp_pacing_ca_ratio - INTEGER ...@@ -727,6 +808,7 @@ tcp_pacing_ca_ratio - INTEGER
to current rate. (current_rate = cwnd * mss / srtt) to current rate. (current_rate = cwnd * mss / srtt)
If TCP is in congestion avoidance phase, tcp_pacing_ca_ratio If TCP is in congestion avoidance phase, tcp_pacing_ca_ratio
is applied to conservatively probe for bigger throughput. is applied to conservatively probe for bigger throughput.
Default: 120 Default: 120
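The relationship between the current rate and the pacing ratios can be sketched as below. The formula current_rate = cwnd * mss / srtt is from the text; treating the ratio as a percentage (so 200 means 2x and 120 means 1.2x) is an assumption of this sketch, and the function name and sample numbers are illustrative.

```python
def pacing_rate(cwnd_packets, mss_bytes, srtt_seconds, ratio_percent):
    """Pacing rate in bytes/second, assuming the ratio is a percentage."""
    # current_rate = cwnd * mss / srtt, as given in the text above.
    current_rate = cwnd_packets * mss_bytes / srtt_seconds
    return current_rate * ratio_percent / 100


# 10 segments of 1460 bytes over a 250 ms smoothed RTT:
print(pacing_rate(10, 1460, 0.25, 200))  # slow start, tcp_pacing_ss_ratio
print(pacing_rate(10, 1460, 0.25, 120))  # congestion avoidance, tcp_pacing_ca_ratio
```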
tcp_tso_win_divisor - INTEGER tcp_tso_win_divisor - INTEGER
...@@ -734,16 +816,20 @@ tcp_tso_win_divisor - INTEGER ...@@ -734,16 +816,20 @@ tcp_tso_win_divisor - INTEGER
can be consumed by a single TSO frame. can be consumed by a single TSO frame.
The setting of this parameter is a choice between burstiness and The setting of this parameter is a choice between burstiness and
building larger TSO frames. building larger TSO frames.
Default: 3 Default: 3
tcp_tw_reuse - INTEGER
Enable reuse of TIME-WAIT sockets for new connections when it is
safe from protocol viewpoint.

- 0 - disable
- 1 - global enable
- 2 - enable for loopback traffic only

It should not be changed without advice/request of technical
experts.

Default: 2
tcp_window_scaling - BOOLEAN tcp_window_scaling - BOOLEAN
...@@ -752,11 +838,14 @@ tcp_window_scaling - BOOLEAN ...@@ -752,11 +838,14 @@ tcp_window_scaling - BOOLEAN
tcp_wmem - vector of 3 INTEGERs: min, default, max tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets. min: Amount of memory reserved for send buffers for TCP sockets.
Each TCP socket has rights to use it due to fact of its birth. Each TCP socket has rights to use it due to fact of its birth.
Default: 4K Default: 4K
default: initial size of send buffer used by TCP sockets. This default: initial size of send buffer used by TCP sockets. This
value overrides net.core.wmem_default used by other protocols. value overrides net.core.wmem_default used by other protocols.
It is usually lower than net.core.wmem_default. It is usually lower than net.core.wmem_default.
Default: 16K Default: 16K
max: Maximal amount of memory allowed for automatically tuned max: Maximal amount of memory allowed for automatically tuned
...@@ -764,6 +853,7 @@ tcp_wmem - vector of 3 INTEGERs: min, default, max ...@@ -764,6 +853,7 @@ tcp_wmem - vector of 3 INTEGERs: min, default, max
net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
automatic tuning of that socket's send buffer size, in which case automatic tuning of that socket's send buffer size, in which case
this value is ignored. this value is ignored.
Default: between 64K and 4MB, depending on RAM size. Default: between 64K and 4MB, depending on RAM size.
tcp_notsent_lowat - UNSIGNED INTEGER tcp_notsent_lowat - UNSIGNED INTEGER
...@@ -784,6 +874,7 @@ tcp_workaround_signed_windows - BOOLEAN ...@@ -784,6 +874,7 @@ tcp_workaround_signed_windows - BOOLEAN
remote TCP is broken and treats the window as a signed quantity. remote TCP is broken and treats the window as a signed quantity.
If unset, assume the remote TCP is not broken even if we do If unset, assume the remote TCP is not broken even if we do
not receive a window scaling option from them. not receive a window scaling option from them.
Default: 0 Default: 0
tcp_thin_linear_timeouts - BOOLEAN tcp_thin_linear_timeouts - BOOLEAN
...@@ -796,6 +887,7 @@ tcp_thin_linear_timeouts - BOOLEAN ...@@ -796,6 +887,7 @@ tcp_thin_linear_timeouts - BOOLEAN
non-aggressive thin streams, often found to be time-dependent. non-aggressive thin streams, often found to be time-dependent.
For more information on thin streams, see For more information on thin streams, see
Documentation/networking/tcp-thin.txt Documentation/networking/tcp-thin.txt
Default: 0 Default: 0
tcp_limit_output_bytes - INTEGER tcp_limit_output_bytes - INTEGER
...@@ -807,6 +899,7 @@ tcp_limit_output_bytes - INTEGER ...@@ -807,6 +899,7 @@ tcp_limit_output_bytes - INTEGER
flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes
limits the number of bytes on qdisc or device to reduce artificial limits the number of bytes on qdisc or device to reduce artificial
RTT/cwnd and reduce bufferbloat. RTT/cwnd and reduce bufferbloat.
Default: 1048576 (16 * 65536) Default: 1048576 (16 * 65536)
tcp_challenge_ack_limit - INTEGER tcp_challenge_ack_limit - INTEGER
...@@ -822,7 +915,8 @@ tcp_rx_skb_cache - BOOLEAN ...@@ -822,7 +915,8 @@ tcp_rx_skb_cache - BOOLEAN
Default: 0 (disabled) Default: 0 (disabled)
UDP variables
=============
udp_l3mdev_accept - BOOLEAN udp_l3mdev_accept - BOOLEAN
Enabling this option allows a "global" bound socket to work Enabling this option allows a "global" bound socket to work
...@@ -830,7 +924,8 @@ udp_l3mdev_accept - BOOLEAN ...@@ -830,7 +924,8 @@ udp_l3mdev_accept - BOOLEAN
being received regardless of the L3 domain in which they being received regardless of the L3 domain in which they
originated. Only valid when the kernel was compiled with originated. Only valid when the kernel was compiled with
CONFIG_NET_L3_MASTER_DEV. CONFIG_NET_L3_MASTER_DEV.
Default: 0 (disabled)
udp_mem - vector of 3 INTEGERs: min, pressure, max udp_mem - vector of 3 INTEGERs: min, pressure, max
Number of pages allowed for queueing by all UDP sockets. Number of pages allowed for queueing by all UDP sockets.
...@@ -849,15 +944,18 @@ udp_rmem_min - INTEGER ...@@ -849,15 +944,18 @@ udp_rmem_min - INTEGER
Minimal size of receive buffer used by UDP sockets in moderation. Minimal size of receive buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for receiving data, even if Each UDP socket is able to use the size for receiving data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte. total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
Default: 4K Default: 4K
udp_wmem_min - INTEGER udp_wmem_min - INTEGER
Minimal size of send buffer used by UDP sockets in moderation. Minimal size of send buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for sending data, even if Each UDP socket is able to use the size for sending data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte. total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
Default: 4K Default: 4K
RAW variables
=============
raw_l3mdev_accept - BOOLEAN raw_l3mdev_accept - BOOLEAN
Enabling this option allows a "global" bound socket to work Enabling this option allows a "global" bound socket to work
...@@ -865,9 +963,11 @@ raw_l3mdev_accept - BOOLEAN ...@@ -865,9 +963,11 @@ raw_l3mdev_accept - BOOLEAN
being received regardless of the L3 domain in which they being received regardless of the L3 domain in which they
originated. Only valid when the kernel was compiled with originated. Only valid when the kernel was compiled with
CONFIG_NET_L3_MASTER_DEV. CONFIG_NET_L3_MASTER_DEV.
Default: 1 (enabled) Default: 1 (enabled)
CIPSOv4 Variables
=================
cipso_cache_enable - BOOLEAN cipso_cache_enable - BOOLEAN
If set, enable additions to and lookups from the CIPSO label mapping If set, enable additions to and lookups from the CIPSO label mapping
...@@ -875,6 +975,7 @@ cipso_cache_enable - BOOLEAN ...@@ -875,6 +975,7 @@ cipso_cache_enable - BOOLEAN
miss. However, regardless of the setting the cache is still
invalidated when required, which means you can safely toggle this on and
off and the cache will always be "safe".
Default: 1 Default: 1
cipso_cache_bucket_size - INTEGER cipso_cache_bucket_size - INTEGER
...@@ -884,6 +985,7 @@ cipso_cache_bucket_size - INTEGER ...@@ -884,6 +985,7 @@ cipso_cache_bucket_size - INTEGER
more CIPSO label mappings that can be cached. When the number of more CIPSO label mappings that can be cached. When the number of
entries in a given hash bucket reaches this limit adding new entries entries in a given hash bucket reaches this limit adding new entries
causes the oldest entry in the bucket to be removed to make room. causes the oldest entry in the bucket to be removed to make room.
Default: 10 Default: 10
cipso_rbm_optfmt - BOOLEAN cipso_rbm_optfmt - BOOLEAN
...@@ -891,6 +993,7 @@ cipso_rbm_optfmt - BOOLEAN ...@@ -891,6 +993,7 @@ cipso_rbm_optfmt - BOOLEAN
the CIPSO draft specification (see Documentation/netlabel for details). the CIPSO draft specification (see Documentation/netlabel for details).
This means that when set the CIPSO tag will be padded with empty This means that when set the CIPSO tag will be padded with empty
categories in order to make the packet data 32-bit aligned. categories in order to make the packet data 32-bit aligned.
Default: 0 Default: 0
cipso_rbm_structvalid - BOOLEAN cipso_rbm_structvalid - BOOLEAN
...@@ -900,9 +1003,11 @@ cipso_rbm_structvalid - BOOLEAN ...@@ -900,9 +1003,11 @@ cipso_rbm_structvalid - BOOLEAN
where in the CIPSO processing code but setting this to 0 (False) should where in the CIPSO processing code but setting this to 0 (False) should
result in less work (i.e. it should be faster) but could cause problems result in less work (i.e. it should be faster) but could cause problems
with other implementations that require strict checking. with other implementations that require strict checking.
Default: 0 Default: 0
IP Variables
============
ip_local_port_range - 2 INTEGERS ip_local_port_range - 2 INTEGERS
Defines the local port range that is used by TCP and UDP to Defines the local port range that is used by TCP and UDP to
...@@ -931,12 +1036,12 @@ ip_local_reserved_ports - list of comma separated ranges ...@@ -931,12 +1036,12 @@ ip_local_reserved_ports - list of comma separated ranges
assignments. assignments.
You can reserve ports which are not in the current
ip_local_port_range, e.g.::

    $ cat /proc/sys/net/ipv4/ip_local_port_range
    32000 60999
    $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
    8080,9148

although this is redundant. However such a setting is useful although this is redundant. However such a setting is useful
if later the port range is changed to a value that will if later the port range is changed to a value that will
...@@ -956,6 +1061,7 @@ ip_unprivileged_port_start - INTEGER ...@@ -956,6 +1061,7 @@ ip_unprivileged_port_start - INTEGER
ip_nonlocal_bind - BOOLEAN ip_nonlocal_bind - BOOLEAN
If set, allows processes to bind() to non-local IP addresses, If set, allows processes to bind() to non-local IP addresses,
which can be quite useful - but may break some applications. which can be quite useful - but may break some applications.
Default: 0 Default: 0
ip_autobind_reuse - BOOLEAN ip_autobind_reuse - BOOLEAN
...@@ -972,6 +1078,7 @@ ip_dynaddr - BOOLEAN ...@@ -972,6 +1078,7 @@ ip_dynaddr - BOOLEAN
If set to a non-zero value larger than 1, a kernel log If set to a non-zero value larger than 1, a kernel log
message will be printed when dynamic address rewriting message will be printed when dynamic address rewriting
occurs. occurs.
Default: 0 Default: 0
ip_early_demux - BOOLEAN ip_early_demux - BOOLEAN
...@@ -981,6 +1088,7 @@ ip_early_demux - BOOLEAN ...@@ -981,6 +1088,7 @@ ip_early_demux - BOOLEAN
It may add an additional cost for pure routing workloads that It may add an additional cost for pure routing workloads that
reduces overall throughput, in such case you should disable it. reduces overall throughput, in such case you should disable it.
Default: 1 Default: 1
ping_group_range - 2 INTEGERS ping_group_range - 2 INTEGERS
...@@ -992,21 +1100,25 @@ ping_group_range - 2 INTEGERS ...@@ -992,21 +1100,25 @@ ping_group_range - 2 INTEGERS
tcp_early_demux - BOOLEAN tcp_early_demux - BOOLEAN
Enable early demux for established TCP sockets. Enable early demux for established TCP sockets.
Default: 1 Default: 1
udp_early_demux - BOOLEAN udp_early_demux - BOOLEAN
Enable early demux for connected UDP sockets. Disable this if Enable early demux for connected UDP sockets. Disable this if
your system could experience more unconnected load. your system could experience more unconnected load.
Default: 1 Default: 1
icmp_echo_ignore_all - BOOLEAN icmp_echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it. requests sent to it.
Default: 0 Default: 0
icmp_echo_ignore_broadcasts - BOOLEAN icmp_echo_ignore_broadcasts - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO and If set non-zero, then the kernel will ignore all ICMP ECHO and
TIMESTAMP requests sent to it via broadcast/multicast. TIMESTAMP requests sent to it via broadcast/multicast.
Default: 1 Default: 1
icmp_ratelimit - INTEGER icmp_ratelimit - INTEGER
...@@ -1016,46 +1128,55 @@ icmp_ratelimit - INTEGER ...@@ -1016,46 +1128,55 @@ icmp_ratelimit - INTEGER
otherwise the minimal space between responses in milliseconds. otherwise the minimal space between responses in milliseconds.
Note that another sysctl, icmp_msgs_per_sec limits the number Note that another sysctl, icmp_msgs_per_sec limits the number
of ICMP packets sent on all targets. of ICMP packets sent on all targets.
Default: 1000 Default: 1000
icmp_msgs_per_sec - INTEGER icmp_msgs_per_sec - INTEGER
Limit maximal number of ICMP packets sent per second from this host. Limit maximal number of ICMP packets sent per second from this host.
Only messages whose type matches icmp_ratemask (see below) are Only messages whose type matches icmp_ratemask (see below) are
controlled by this limit. controlled by this limit.
Default: 1000 Default: 1000
icmp_msgs_burst - INTEGER icmp_msgs_burst - INTEGER
icmp_msgs_per_sec controls number of ICMP packets sent per second, icmp_msgs_per_sec controls number of ICMP packets sent per second,
while icmp_msgs_burst controls the burst size of these packets. while icmp_msgs_burst controls the burst size of these packets.
Default: 50 Default: 50
icmp_ratemask - INTEGER
Mask made of ICMP types for which rates are being limited.

Significant bits: IHGFEDCBA9876543210

Default mask:     0000001100000011000 (6168)

Bit definitions (see include/linux/icmp.h):

= =========================
0 Echo Reply
3 Destination Unreachable [1]_
4 Source Quench [1]_
5 Redirect
8 Echo Request
B Time Exceeded [1]_
C Parameter Problem [1]_
D Timestamp Request
E Timestamp Reply
F Info Request
G Info Reply
H Address Mask Request
I Address Mask Reply
= =========================

.. [1] These are rate limited by default (see default mask above)
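The default mask value can be checked by hand: each ICMP type in the table is one bit position, with the letters A to I continuing after the digit 9. The following minimal sketch rebuilds the mask from the four types marked as rate limited by default; the helper name and the digit string are illustrative.

```python
# Bit labels in ascending order, matching "Significant bits" read right to left.
BIT_DIGITS = "0123456789ABCDEFGHI"


def ratemask(*types):
    """Build an icmp_ratemask value from the bit labels used in the table."""
    mask = 0
    for label in types:
        mask |= 1 << BIT_DIGITS.index(label)
    return mask


# Destination Unreachable, Source Quench, Time Exceeded, Parameter Problem.
default_mask = ratemask("3", "4", "B", "C")
print(default_mask)                  # 6168
print(format(default_mask, "019b"))  # 0000001100000011000
```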
icmp_ignore_bogus_error_responses - BOOLEAN icmp_ignore_bogus_error_responses - BOOLEAN
Some routers violate RFC1122 by sending bogus responses to broadcast Some routers violate RFC1122 by sending bogus responses to broadcast
frames. Such violations are normally logged via a kernel warning. frames. Such violations are normally logged via a kernel warning.
If this is set to TRUE, the kernel will not give such warnings, which If this is set to TRUE, the kernel will not give such warnings, which
will avoid log file clutter. will avoid log file clutter.
Default: 1 Default: 1
icmp_errors_use_inbound_ifaddr - BOOLEAN icmp_errors_use_inbound_ifaddr - BOOLEAN
...@@ -1100,32 +1221,39 @@ igmp_max_memberships - INTEGER ...@@ -1100,32 +1221,39 @@ igmp_max_memberships - INTEGER
igmp_max_msf - INTEGER igmp_max_msf - INTEGER
Maximum number of addresses allowed in the source filter list for a Maximum number of addresses allowed in the source filter list for a
multicast group. multicast group.
Default: 10 Default: 10
igmp_qrv - INTEGER igmp_qrv - INTEGER
Controls the IGMP query robustness variable (see RFC2236 8.1). Controls the IGMP query robustness variable (see RFC2236 8.1).
Default: 2 (as specified by RFC2236 8.1) Default: 2 (as specified by RFC2236 8.1)
Minimum: 1 (as specified by RFC6636 4.5) Minimum: 1 (as specified by RFC6636 4.5)
force_igmp_version - INTEGER force_igmp_version - INTEGER
0 - (default) No enforcement of a IGMP version, IGMPv1/v2 fallback - 0 - (default) No enforcement of a IGMP version, IGMPv1/v2 fallback
allowed. Will back to IGMPv3 mode again if all IGMPv1/v2 Querier allowed. Will back to IGMPv3 mode again if all IGMPv1/v2 Querier
Present timer expires. Present timer expires.
1 - Enforce to use IGMP version 1. Will also reply IGMPv1 report if - 1 - Enforce to use IGMP version 1. Will also reply IGMPv1 report if
receive IGMPv2/v3 query. receive IGMPv2/v3 query.
2 - Enforce to use IGMP version 2. Will fallback to IGMPv1 if receive - 2 - Enforce to use IGMP version 2. Will fallback to IGMPv1 if receive
IGMPv1 query message. Will reply report if receive IGMPv3 query. IGMPv1 query message. Will reply report if receive IGMPv3 query.
3 - Enforce to use IGMP version 3. The same react with default 0. - 3 - Enforce to use IGMP version 3. The same react with default 0.
.. note::
Note: this is not the same with force_mld_version because IGMPv3 RFC3376 this is not the same with force_mld_version because IGMPv3 RFC3376
Security Considerations does not have clear description that we could Security Considerations does not have clear description that we could
ignore other version messages completely as MLDv2 RFC3810. So make ignore other version messages completely as MLDv2 RFC3810. So make
this value as default 0 is recommended. this value as default 0 is recommended.
conf/interface/* changes special settings per interface (where ``conf/interface/*``
"interface" is the name of your network interface) changes special settings per interface (where
interface" is the name of your network interface)
conf/all/* is special, changes the settings for all interfaces ``conf/all/*``
is special, changes the settings for all interfaces
log_martians - BOOLEAN log_martians - BOOLEAN
Log packets with impossible addresses to kernel log. Log packets with impossible addresses to kernel log.
...@@ -1136,14 +1264,21 @@ log_martians - BOOLEAN ...@@ -1136,14 +1264,21 @@ log_martians - BOOLEAN
accept_redirects - BOOLEAN
Accept ICMP redirect messages.
accept_redirects for the interface will be enabled if:

- both conf/{all,interface}/accept_redirects are TRUE in the case
  forwarding for the interface is enabled

or

- at least one of conf/{all,interface}/accept_redirects is TRUE in the
  case forwarding for the interface is disabled

accept_redirects for the interface will be disabled otherwise

default:

- TRUE (host)
- FALSE (router)

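The rule above for combining the "all" and per-interface settings can be sketched as a small truth function. This only restates the documented rule; the function and parameter names are illustrative and do not correspond to kernel symbols.

```python
def accept_redirects_effective(conf_all, conf_interface, forwarding):
    """Effective accept_redirects for one interface, per the rule above."""
    if forwarding:
        # Forwarding enabled (router): both settings must be TRUE.
        return conf_all and conf_interface
    # Forwarding disabled (host): either setting being TRUE is enough.
    return conf_all or conf_interface


# A host with only the per-interface setting enabled still accepts redirects.
print(accept_redirects_effective(False, True, forwarding=False))
# A router with only one of the two settings enabled does not.
print(accept_redirects_effective(True, False, forwarding=True))
```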
forwarding - BOOLEAN forwarding - BOOLEAN
Enable IP forwarding on this interface. This controls whether packets Enable IP forwarding on this interface. This controls whether packets
...@@ -1168,12 +1303,14 @@ medium_id - INTEGER ...@@ -1168,12 +1303,14 @@ medium_id - INTEGER
proxy_arp - BOOLEAN proxy_arp - BOOLEAN
Do proxy arp. Do proxy arp.
proxy_arp for the interface will be enabled if at least one of proxy_arp for the interface will be enabled if at least one of
conf/{all,interface}/proxy_arp is set to TRUE, conf/{all,interface}/proxy_arp is set to TRUE,
it will be disabled otherwise it will be disabled otherwise
proxy_arp_pvlan - BOOLEAN proxy_arp_pvlan - BOOLEAN
Private VLAN proxy arp. Private VLAN proxy arp.
Basically allow proxy arp replies back to the same interface Basically allow proxy arp replies back to the same interface
(from which the ARP request/solicitation was received). (from which the ARP request/solicitation was received).
...@@ -1186,6 +1323,7 @@ proxy_arp_pvlan - BOOLEAN ...@@ -1186,6 +1323,7 @@ proxy_arp_pvlan - BOOLEAN
proxy_arp. proxy_arp.
This technology is known by different names: This technology is known by different names:
In RFC 3069 it is called VLAN Aggregation. In RFC 3069 it is called VLAN Aggregation.
Cisco and Allied Telesyn call it Private VLAN. Cisco and Allied Telesyn call it Private VLAN.
Hewlett-Packard call it Source-Port filtering or port-isolation. Hewlett-Packard call it Source-Port filtering or port-isolation.
...@@ -1194,26 +1332,33 @@ proxy_arp_pvlan - BOOLEAN ...@@ -1194,26 +1332,33 @@ proxy_arp_pvlan - BOOLEAN
shared_media - BOOLEAN shared_media - BOOLEAN
Send(router) or accept(host) RFC1620 shared media redirects. Send(router) or accept(host) RFC1620 shared media redirects.
Overrides secure_redirects. Overrides secure_redirects.
shared_media for the interface will be enabled if at least one of shared_media for the interface will be enabled if at least one of
conf/{all,interface}/shared_media is set to TRUE, conf/{all,interface}/shared_media is set to TRUE,
it will be disabled otherwise it will be disabled otherwise
default TRUE default TRUE
secure_redirects - BOOLEAN secure_redirects - BOOLEAN
Accept ICMP redirect messages only to gateways listed in the Accept ICMP redirect messages only to gateways listed in the
interface's current gateway list. Even if disabled, RFC1122 redirect interface's current gateway list. Even if disabled, RFC1122 redirect
rules still apply. rules still apply.
Overridden by shared_media. Overridden by shared_media.
secure_redirects for the interface will be enabled if at least one of secure_redirects for the interface will be enabled if at least one of
conf/{all,interface}/secure_redirects is set to TRUE, conf/{all,interface}/secure_redirects is set to TRUE,
it will be disabled otherwise it will be disabled otherwise
default TRUE default TRUE
send_redirects - BOOLEAN send_redirects - BOOLEAN
Send redirects, if router. Send redirects, if router.
send_redirects for the interface will be enabled if at least one of send_redirects for the interface will be enabled if at least one of
conf/{all,interface}/send_redirects is set to TRUE, conf/{all,interface}/send_redirects is set to TRUE,
it will be disabled otherwise it will be disabled otherwise
Default: TRUE Default: TRUE
bootp_relay - BOOLEAN bootp_relay - BOOLEAN
...@@ -1222,15 +1367,20 @@ bootp_relay - BOOLEAN ...@@ -1222,15 +1367,20 @@ bootp_relay - BOOLEAN
BOOTP relay daemon will catch and forward such packets. BOOTP relay daemon will catch and forward such packets.
conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay
for the interface for the interface
default FALSE default FALSE
Not Implemented Yet. Not Implemented Yet.
accept_source_route - BOOLEAN accept_source_route - BOOLEAN
Accept packets with SRR option. Accept packets with SRR option.
conf/all/accept_source_route must also be set to TRUE to accept packets conf/all/accept_source_route must also be set to TRUE to accept packets
with SRR option on the interface with SRR option on the interface
default

- TRUE (router)
- FALSE (host)
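The entries above combine ``conf/all`` and ``conf/{interface}`` in two different ways: shared_media, secure_redirects and send_redirects are enabled if at least one of the two is TRUE, while bootp_relay and accept_source_route also require ``conf/all`` to be TRUE. A minimal sketch of that rule follows; the names ``OR_KEYS``, ``AND_KEYS`` and ``effective_bool`` are hypothetical and are not kernel identifiers.

```python
# Hypothetical illustration only; these names are not kernel identifiers.
OR_KEYS = {"shared_media", "secure_redirects", "send_redirects"}
AND_KEYS = {"bootp_relay", "accept_source_route"}


def effective_bool(key, all_value, iface_value):
    """Combine conf/all/<key> and conf/<interface>/<key> as described above."""
    if key in OR_KEYS:
        # Enabled if at least one of the two settings is TRUE.
        return bool(all_value) or bool(iface_value)
    if key in AND_KEYS:
        # conf/all/<key> must also be TRUE for the interface setting to apply.
        return bool(all_value) and bool(iface_value)
    raise KeyError(f"combination rule not covered by this sketch: {key}")
```

For example, ``effective_bool("send_redirects", 0, 1)`` is true, whereas ``effective_bool("bootp_relay", 0, 1)`` is false.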
accept_local - BOOLEAN accept_local - BOOLEAN
Accept packets with local source addresses. In combination with Accept packets with local source addresses. In combination with
...@@ -1241,18 +1391,19 @@ accept_local - BOOLEAN ...@@ -1241,18 +1391,19 @@ accept_local - BOOLEAN
route_localnet - BOOLEAN route_localnet - BOOLEAN
Do not consider loopback addresses as martian source or destination Do not consider loopback addresses as martian source or destination
while routing. This enables the use of 127/8 for local routing purposes. while routing. This enables the use of 127/8 for local routing purposes.
default FALSE default FALSE
rp_filter - INTEGER rp_filter - INTEGER
- 0 - No source validation.
- 1 - Strict mode as defined in RFC3704 Strict Reverse Path
  Each incoming packet is tested against the FIB and if the interface
  is not the best reverse path the packet check will fail.
  By default failed packets are discarded.
- 2 - Loose mode as defined in RFC3704 Loose Reverse Path
  Each incoming packet's source address is also tested against the FIB
  and if the source address is not reachable via any interface
  the packet check will fail.

Current recommended practice in RFC3704 is to enable strict mode
to prevent IP spoofing from DDoS attacks. If using asymmetric routing
...@@ -1265,19 +1416,19 @@ rp_filter - INTEGER ...@@ -1265,19 +1416,19 @@ rp_filter - INTEGER
in startup scripts. in startup scripts.
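The difference between the rp_filter modes can be sketched with a toy routing table. This is an illustration under stated assumptions, not the kernel's implementation: ``FIB`` is a hypothetical list of (prefix, interface) pairs, and "best reverse path" is modelled as a simple longest-prefix match.

```python
import ipaddress

# Hypothetical toy FIB: (prefix, outgoing interface). Not a kernel structure.
FIB = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth1"),
]


def best_reverse_iface(src):
    """Return the interface of the longest-prefix route back to src, or None."""
    addr = ipaddress.ip_address(src)
    matches = [(net, dev) for net, dev in FIB if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rp_filter_pass(mode, src, in_iface):
    """Return True if a packet from src arriving on in_iface passes the check."""
    if mode == 0:
        return True  # no source validation
    best = best_reverse_iface(src)
    if mode == 1:
        return best == in_iface  # strict: must arrive on the best reverse path
    if mode == 2:
        return best is not None  # loose: source reachable via any interface
    raise ValueError(f"unknown rp_filter mode: {mode}")
```

In this model a packet from 10.1.2.3 arriving on eth0 fails strict mode, because the best route back is via eth1, but passes loose mode.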
arp_filter - BOOLEAN arp_filter - BOOLEAN
- 1 - Allows you to have multiple network interfaces on the same
  subnet, and have the ARPs for each interface be answered
  based on whether or not the kernel would route a packet from
  the ARP'd IP out that interface (therefore you must use source
  based routing for this to work). In other words it allows control
  of which cards (usually 1) will respond to an arp request.
- 0 - (default) The kernel can respond to arp requests with addresses
  from other interfaces. This may seem wrong but it usually makes
  sense, because it increases the chance of successful communication.
  IP addresses are owned by the complete host on Linux, not by
  particular interfaces. Only for more complex setups like
  load-balancing, does this behaviour cause problems.
arp_filter for the interface will be enabled if at least one of arp_filter for the interface will be enabled if at least one of
conf/{all,interface}/arp_filter is set to TRUE, conf/{all,interface}/arp_filter is set to TRUE,
...@@ -1287,26 +1438,27 @@ arp_announce - INTEGER ...@@ -1287,26 +1438,27 @@ arp_announce - INTEGER
Define different restriction levels for announcing the local Define different restriction levels for announcing the local
source IP address from IP packets in ARP requests sent on source IP address from IP packets in ARP requests sent on
interface: interface:

- 0 - (default) Use any local address, configured on any interface
- 1 - Try to avoid local addresses that are not in the target's
  subnet for this interface. This mode is useful when target
  hosts reachable via this interface require the source IP
  address in ARP requests to be part of their logical network
  configured on the receiving interface. When we generate the
  request we will check all our subnets that include the
  target IP and will preserve the source address if it is from
  such subnet. If there is no such subnet we select source
  address according to the rules for level 2.
- 2 - Always use the best local address for this target.
  In this mode we ignore the source address in the IP packet
  and try to select local address that we prefer for talks with
  the target host. Such local address is selected by looking
  for primary IP addresses on all our subnets on the outgoing
  interface that include the target IP address. If no suitable
  local address is found we select the first local address
  we have on the outgoing interface or on all other interfaces,
  with the hope we will receive reply for our request and
  even sometimes no matter the source IP address we announce.

The max value from conf/{all,interface}/arp_announce is used. The max value from conf/{all,interface}/arp_announce is used.
...@@ -1317,32 +1469,37 @@ arp_announce - INTEGER ...@@ -1317,32 +1469,37 @@ arp_announce - INTEGER
arp_ignore - INTEGER arp_ignore - INTEGER
Define different modes for sending replies in response to Define different modes for sending replies in response to
received ARP requests that resolve local target IP addresses: received ARP requests that resolve local target IP addresses:

- 0 - (default): reply for any local target IP address, configured
  on any interface
- 1 - reply only if the target IP address is local address
  configured on the incoming interface
- 2 - reply only if the target IP address is local address
  configured on the incoming interface and both with the
  sender's IP address are part from same subnet on this interface
- 3 - do not reply for local addresses configured with scope host,
  only resolutions for global and link addresses are replied
- 4-7 - reserved
- 8 - do not reply for all local addresses

The max value from conf/{all,interface}/arp_ignore is used The max value from conf/{all,interface}/arp_ignore is used
when ARP request is received on the {interface} when ARP request is received on the {interface}
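Unlike the boolean settings, arp_announce and arp_ignore take the larger of the ``conf/all`` and ``conf/{interface}`` values. A small sketch of that rule, using the hypothetical helper name ``effective_level``:

```python
def effective_level(all_value, iface_value):
    """Return the value in effect for an integer setting such as arp_ignore
    or arp_announce: the max of conf/all and conf/<interface>."""
    return max(all_value, iface_value)


# Example: conf/all/arp_ignore = 1 and conf/eth0/arp_ignore = 2
# means mode 2 is applied to requests received on eth0.
mode_on_eth0 = effective_level(1, 2)
```

One consequence is that a per-interface value can only make the behaviour stricter than ``conf/all``, never looser.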
arp_notify - BOOLEAN arp_notify - BOOLEAN
Define mode for notification of address and device changes. Define mode for notification of address and device changes.

== ==========================================================
0  (default): do nothing
1  Generate gratuitous arp requests when device is brought up
   or hardware address changes.
== ==========================================================

arp_accept - BOOLEAN arp_accept - BOOLEAN
Define behavior for gratuitous ARP frames whose IP is not
already present in the ARP table:

- 0 - don't create new entries in the ARP table
- 1 - create new entries in the ARP table

Both replies and requests type gratuitous arp will trigger the Both replies and requests type gratuitous arp will trigger the
ARP table to be updated, if this setting is on. ARP table to be updated, if this setting is on.
...@@ -1378,11 +1535,13 @@ disable_xfrm - BOOLEAN ...@@ -1378,11 +1535,13 @@ disable_xfrm - BOOLEAN
igmpv2_unsolicited_report_interval - INTEGER igmpv2_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited The interval in milliseconds in which the next unsolicited
IGMPv1 or IGMPv2 report retransmit will take place. IGMPv1 or IGMPv2 report retransmit will take place.
Default: 10000 (10 seconds) Default: 10000 (10 seconds)
igmpv3_unsolicited_report_interval - INTEGER igmpv3_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
IGMPv3 report retransmit will take place.

Default: 1000 (1 second)
promote_secondaries - BOOLEAN promote_secondaries - BOOLEAN
...@@ -1393,19 +1552,23 @@ promote_secondaries - BOOLEAN ...@@ -1393,19 +1552,23 @@ promote_secondaries - BOOLEAN
drop_unicast_in_l2_multicast - BOOLEAN drop_unicast_in_l2_multicast - BOOLEAN
Drop any unicast IP packets that are received in link-layer Drop any unicast IP packets that are received in link-layer
multicast (or broadcast) frames. multicast (or broadcast) frames.
This behavior (for multicast) is actually a SHOULD in RFC This behavior (for multicast) is actually a SHOULD in RFC
1122, but is disabled by default for compatibility reasons. 1122, but is disabled by default for compatibility reasons.
Default: off (0) Default: off (0)
drop_gratuitous_arp - BOOLEAN drop_gratuitous_arp - BOOLEAN
Drop all gratuitous ARP frames, for example if there's a known Drop all gratuitous ARP frames, for example if there's a known
good ARP proxy on the network and such frames need not be used good ARP proxy on the network and such frames need not be used
(or in the case of 802.11, must not be used to prevent attacks.) (or in the case of 802.11, must not be used to prevent attacks.)
Default: off (0) Default: off (0)
tag - INTEGER tag - INTEGER
Allows you to write a number, which can be used as required. Allows you to write a number, which can be used as required.
Default value is 0. Default value is 0.
xfrm4_gc_thresh - INTEGER xfrm4_gc_thresh - INTEGER
...@@ -1417,21 +1580,24 @@ xfrm4_gc_thresh - INTEGER ...@@ -1417,21 +1580,24 @@ xfrm4_gc_thresh - INTEGER
igmp_link_local_mcast_reports - BOOLEAN igmp_link_local_mcast_reports - BOOLEAN
Enable IGMP reports for link local multicast groups in the Enable IGMP reports for link local multicast groups in the
224.0.0.X range. 224.0.0.X range.
Default TRUE Default TRUE
Alexey Kuznetsov. Alexey Kuznetsov.
kuznet@ms2.inr.ac.ru kuznet@ms2.inr.ac.ru
Updated by: Updated by:

- Andi Kleen
  ak@muc.de
- Nicolas Delon
  delon.nicolas@wanadoo.fr

/proc/sys/net/ipv6/* Variables
==============================

IPv6 has no global variables such as tcp_*. tcp_* settings under ipv4/ also IPv6 has no global variables such as tcp_*. tcp_* settings under ipv4/ also
apply to IPv6 [XXX?]. apply to IPv6 [XXX?].
...@@ -1440,8 +1606,9 @@ bindv6only - BOOLEAN ...@@ -1440,8 +1606,9 @@ bindv6only - BOOLEAN
Default value for IPV6_V6ONLY socket option, Default value for IPV6_V6ONLY socket option,
which restricts use of the IPv6 socket to IPv6 communication which restricts use of the IPv6 socket to IPv6 communication
only. only.

- TRUE: disable IPv4-mapped address feature
- FALSE: enable IPv4-mapped address feature

Default: FALSE (as specified in RFC3493) Default: FALSE (as specified in RFC3493)
...@@ -1449,8 +1616,10 @@ flowlabel_consistency - BOOLEAN ...@@ -1449,8 +1616,10 @@ flowlabel_consistency - BOOLEAN
Protect the consistency (and unicity) of flow label. Protect the consistency (and unicity) of flow label.
You have to disable it to use IPV6_FL_F_REFLECT flag on the You have to disable it to use IPV6_FL_F_REFLECT flag on the
flow label manager. flow label manager.

- TRUE: enabled
- FALSE: disabled

Default: TRUE Default: TRUE
auto_flowlabels - INTEGER auto_flowlabels - INTEGER
...@@ -1458,22 +1627,28 @@ auto_flowlabels - INTEGER ...@@ -1458,22 +1627,28 @@ auto_flowlabels - INTEGER
packet. This allows intermediate devices, such as routers, to packet. This allows intermediate devices, such as routers, to
identify packet flows for mechanisms like Equal Cost Multipath identify packet flows for mechanisms like Equal Cost Multipath
Routing (see RFC 6438). Routing (see RFC 6438).

= ===========================================================
0 automatic flow labels are completely disabled
1 automatic flow labels are enabled by default, they can be
  disabled on a per socket basis using the IPV6_AUTOFLOWLABEL
  socket option
2 automatic flow labels are allowed, they may be enabled on a
  per socket basis using the IPV6_AUTOFLOWLABEL socket option
3 automatic flow labels are enabled and enforced, they cannot
  be disabled by the socket option
= ===========================================================

Default: 1 Default: 1
flowlabel_state_ranges - BOOLEAN flowlabel_state_ranges - BOOLEAN
Split the flow label number space into two ranges. 0-0x7FFFF is Split the flow label number space into two ranges. 0-0x7FFFF is
reserved for the IPv6 flow manager facility, 0x80000-0xFFFFF reserved for the IPv6 flow manager facility, 0x80000-0xFFFFF
is reserved for stateless flow labels as described in RFC6437. is reserved for stateless flow labels as described in RFC6437.

- TRUE: enabled
- FALSE: disabled

Default: true Default: true
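The split described for flowlabel_state_ranges can be written out as a range check. This is only a sketch: ``flow_label_range`` and the returned strings are illustrative names, and the 20-bit bound is taken from the 0xFFFFF upper limit quoted above.

```python
FLOW_LABEL_MAX = 0xFFFFF   # upper bound of the flow label space quoted above
STATELESS_START = 0x80000  # first label reserved for stateless flow labels


def flow_label_range(label):
    """Classify a flow label when flowlabel_state_ranges is enabled."""
    if not 0 <= label <= FLOW_LABEL_MAX:
        raise ValueError(f"flow label out of range: {label:#x}")
    if label >= STATELESS_START:
        return "stateless"       # 0x80000-0xFFFFF, see RFC6437
    return "flow_manager"        # 0-0x7FFFF, IPv6 flow manager facility
```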
flowlabel_reflect - INTEGER flowlabel_reflect - INTEGER
...@@ -1483,49 +1658,59 @@ flowlabel_reflect - INTEGER ...@@ -1483,49 +1658,59 @@ flowlabel_reflect - INTEGER
https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01 https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
This is a bitmask. This is a bitmask.

- 1: enabled for established flows

  Note that this prevents automatic flowlabel changes, as done
  in "tcp: change IPv6 flow-label upon receiving spurious retransmission"
  and "tcp: Change txhash on every SYN and RTO retransmit"

- 2: enabled for TCP RESET packets (no active listener)
  If set, a RST packet sent in response to a SYN packet on a closed
  port will reflect the incoming flow label.

- 4: enabled for ICMPv6 echo reply messages.

Default: 0 Default: 0
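Because flowlabel_reflect is a bitmask, the listed values can be combined by adding them. The following sketch decodes a value into the cases it enables; ``REFLECT_FLAGS`` and ``decode_flowlabel_reflect`` are hypothetical names used only for illustration.

```python
# Bit values as listed above; the dictionary itself is illustrative.
REFLECT_FLAGS = {
    1: "established flows",
    2: "TCP RESET packets (no active listener)",
    4: "ICMPv6 echo reply messages",
}


def decode_flowlabel_reflect(value):
    """Return the reflection cases enabled by a flowlabel_reflect value."""
    return [name for bit, name in REFLECT_FLAGS.items() if value & bit]
```

A value of 5, for instance, enables reflection for established flows and for ICMPv6 echo replies, but not for TCP RESET packets.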
fib_multipath_hash_policy - INTEGER fib_multipath_hash_policy - INTEGER
Controls which hash policy to use for multipath routes. Controls which hash policy to use for multipath routes.
Default: 0 (Layer 3) Default: 0 (Layer 3)
Possible values: Possible values:

- 0 - Layer 3 (source and destination addresses plus flow label)
- 1 - Layer 4 (standard 5-tuple)
- 2 - Layer 3 or inner Layer 3 if present

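As a rough illustration of what each policy feeds into the multipath hash, the sketch below selects fields from a packet modelled as a plain dictionary. The dictionary keys, the ``inner`` entry for an encapsulated packet, and the function name ``hash_fields`` are all assumptions made for this example; the kernel's actual hash computation is not shown.

```python
def hash_fields(policy, pkt):
    """Return the tuple of fields a given policy would hash on (toy model)."""
    if policy == 0:
        # Layer 3: source and destination addresses plus flow label.
        return (pkt["src"], pkt["dst"], pkt.get("flowlabel", 0))
    if policy == 1:
        # Layer 4: standard 5-tuple.
        return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
    if policy == 2:
        # Layer 3 of the inner packet if one is present, else the outer one.
        l3 = pkt.get("inner") or pkt
        return (l3["src"], l3["dst"], l3.get("flowlabel", 0))
    raise ValueError(f"unknown fib_multipath_hash_policy: {policy}")
```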
anycast_src_echo_reply - BOOLEAN anycast_src_echo_reply - BOOLEAN
Controls the use of anycast addresses as source addresses for ICMPv6 Controls the use of anycast addresses as source addresses for ICMPv6
echo reply echo reply

- TRUE: enabled
- FALSE: disabled

Default: FALSE Default: FALSE
idgen_delay - INTEGER idgen_delay - INTEGER
Controls the delay in seconds after which time to retry Controls the delay in seconds after which time to retry
privacy stable address generation if a DAD conflict is privacy stable address generation if a DAD conflict is
detected. detected.
Default: 1 (as specified in RFC7217) Default: 1 (as specified in RFC7217)
idgen_retries - INTEGER idgen_retries - INTEGER
Controls the number of retries to generate a stable privacy Controls the number of retries to generate a stable privacy
address if a DAD conflict is detected. address if a DAD conflict is detected.
Default: 3 (as specified in RFC7217) Default: 3 (as specified in RFC7217)
mld_qrv - INTEGER mld_qrv - INTEGER
Controls the MLD query robustness variable (see RFC3810 9.1). Controls the MLD query robustness variable (see RFC3810 9.1).
Default: 2 (as specified by RFC3810 9.1) Default: 2 (as specified by RFC3810 9.1)
Minimum: 1 (as specified by RFC6636 4.5) Minimum: 1 (as specified by RFC6636 4.5)
max_dst_opts_number - INTEGER max_dst_opts_number - INTEGER
...@@ -1533,6 +1718,7 @@ max_dst_opts_number - INTEGER ...@@ -1533,6 +1718,7 @@ max_dst_opts_number - INTEGER
options extension header. If this value is less than zero options extension header. If this value is less than zero
then unknown options are disallowed and the number of known then unknown options are disallowed and the number of known
TLVs allowed is the absolute value of this number. TLVs allowed is the absolute value of this number.
Default: 8 Default: 8
max_hbh_opts_number - INTEGER max_hbh_opts_number - INTEGER
...@@ -1540,16 +1726,19 @@ max_hbh_opts_number - INTEGER ...@@ -1540,16 +1726,19 @@ max_hbh_opts_number - INTEGER
options extension header. If this value is less than zero options extension header. If this value is less than zero
then unknown options are disallowed and the number of known then unknown options are disallowed and the number of known
TLVs allowed is the absolute value of this number. TLVs allowed is the absolute value of this number.
Default: 8 Default: 8
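The sign convention shared by max_dst_opts_number and max_hbh_opts_number can be sketched as follows. The negative case follows the text above; the behaviour for non-negative values (the limit applies and unknown options are allowed) is an assumption, since that part of the description is not shown in this excerpt. ``tlv_policy`` is a hypothetical name.

```python
def tlv_policy(limit):
    """Return (max_tlvs, unknown_options_allowed) for a max_*_opts_number value.

    A negative limit disallows unknown options and allows abs(limit) known TLVs.
    """
    return abs(limit), limit >= 0
```

Under these assumptions the default of 8 gives ``(8, True)``, and -8 gives ``(8, False)``.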
max_dst_opts_length - INTEGER max_dst_opts_length - INTEGER
Maximum length allowed for a Destination options extension Maximum length allowed for a Destination options extension
header. header.
Default: INT_MAX (unlimited) Default: INT_MAX (unlimited)
max_hbh_length - INTEGER max_hbh_length - INTEGER
Maximum length allowed for a Hop-by-Hop options extension Maximum length allowed for a Hop-by-Hop options extension
header. header.
Default: INT_MAX (unlimited) Default: INT_MAX (unlimited)
skip_notify_on_dev_down - BOOLEAN skip_notify_on_dev_down - BOOLEAN
...@@ -1558,6 +1747,7 @@ skip_notify_on_dev_down - BOOLEAN ...@@ -1558,6 +1747,7 @@ skip_notify_on_dev_down - BOOLEAN
generate this message; IPv6 does by default. Setting this sysctl generate this message; IPv6 does by default. Setting this sysctl
to true skips the message, making IPv4 and IPv6 on par in relying to true skips the message, making IPv4 and IPv6 on par in relying
on userspace caches to track link events and evict routes. on userspace caches to track link events and evict routes.
Default: false (generate message) Default: false (generate message)
nexthop_compat_mode - BOOLEAN nexthop_compat_mode - BOOLEAN
...@@ -1592,18 +1782,20 @@ seg6_flowlabel - INTEGER ...@@ -1592,18 +1782,20 @@ seg6_flowlabel - INTEGER
Controls the behaviour of computing the flowlabel of outer Controls the behaviour of computing the flowlabel of outer
IPv6 header in case of SR T.encaps IPv6 header in case of SR T.encaps

== =======================================================
-1 set flowlabel to zero.
0  copy flowlabel from Inner packet in case of Inner IPv6
   (Set flowlabel to 0 in case IPv4/L2)
1  Compute the flowlabel using seg6_make_flowlabel()
== =======================================================

Default is 0. Default is 0.
``conf/default/*``:
Change the interface-specific default settings.

``conf/all/*``:
Change all the interface-specific settings.
[XXX: Other special features than forwarding?] [XXX: Other special features than forwarding?]
...@@ -1627,9 +1819,10 @@ fwmark_reflect - BOOLEAN ...@@ -1627,9 +1819,10 @@ fwmark_reflect - BOOLEAN
associated with a socket for example, TCP RSTs or ICMPv6 echo replies). associated with a socket for example, TCP RSTs or ICMPv6 echo replies).
If unset, these packets have a fwmark of zero. If set, they have the If unset, these packets have a fwmark of zero. If set, they have the
fwmark of the packet they are replying to. fwmark of the packet they are replying to.
Default: 0 Default: 0
``conf/interface/*``:
Change special settings per interface.
The functional behaviour for certain settings is different The functional behaviour for certain settings is different
...@@ -1644,31 +1837,40 @@ accept_ra - INTEGER ...@@ -1644,31 +1837,40 @@ accept_ra - INTEGER
transmitted. transmitted.
Possible values are: Possible values are:

== ===========================================================
0  Do not accept Router Advertisements.
1  Accept Router Advertisements if forwarding is disabled.
2  Overrule forwarding behaviour. Accept Router Advertisements
   even if forwarding is enabled.
== ===========================================================

Functional default:

- enabled if local forwarding is disabled.
- disabled if local forwarding is enabled.

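The interaction between accept_ra and forwarding in the table above can be expressed as a small decision function. This is a sketch of the documented rule only; ``accepts_router_advertisements`` is an illustrative name.

```python
def accepts_router_advertisements(accept_ra, forwarding_enabled):
    """Return True if Router Advertisements are accepted on the interface."""
    if accept_ra == 0:
        return False                   # never accept
    if accept_ra == 1:
        return not forwarding_enabled  # accept only while acting as a host
    if accept_ra == 2:
        return True                    # overrule forwarding behaviour
    raise ValueError(f"unexpected accept_ra value: {accept_ra}")
```

A router that still needs to learn addresses or routes from Router Advertisements would therefore use the value 2.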
accept_ra_defrtr - BOOLEAN accept_ra_defrtr - BOOLEAN
Learn default router in Router Advertisement. Learn default router in Router Advertisement.
Functional default:

- enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.

accept_ra_from_local - BOOLEAN accept_ra_from_local - BOOLEAN
Accept RA with source-address that is found on local machine Accept RA with source-address that is found on local machine
if the RA is otherwise proper and able to be accepted. if the RA is otherwise proper and able to be accepted.

Default is to NOT accept these as it may be an un-intended
network loop.

Functional default:

- enabled if accept_ra_from_local is enabled
  on a specific interface.
- disabled if accept_ra_from_local is disabled
  on a specific interface.

accept_ra_min_hop_limit - INTEGER accept_ra_min_hop_limit - INTEGER
Minimum hop limit Information in Router Advertisement. Minimum hop limit Information in Router Advertisement.
...@@ -1681,8 +1883,10 @@ accept_ra_min_hop_limit - INTEGER ...@@ -1681,8 +1883,10 @@ accept_ra_min_hop_limit - INTEGER
accept_ra_pinfo - BOOLEAN accept_ra_pinfo - BOOLEAN
Learn Prefix Information in Router Advertisement. Learn Prefix Information in Router Advertisement.
Functional default:

- enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.

accept_ra_rt_info_min_plen - INTEGER accept_ra_rt_info_min_plen - INTEGER
Minimum prefix length of Route Information in RA. Minimum prefix length of Route Information in RA.
...@@ -1690,8 +1894,10 @@ accept_ra_rt_info_min_plen - INTEGER ...@@ -1690,8 +1894,10 @@ accept_ra_rt_info_min_plen - INTEGER
Route Information w/ prefix smaller than this variable shall Route Information w/ prefix smaller than this variable shall
be ignored. be ignored.
Functional default:

* 0 if accept_ra_rtr_pref is enabled.
* -1 if accept_ra_rtr_pref is disabled.

accept_ra_rt_info_max_plen - INTEGER accept_ra_rt_info_max_plen - INTEGER
Maximum prefix length of Route Information in RA. Maximum prefix length of Route Information in RA.
...@@ -1699,33 +1905,41 @@ accept_ra_rt_info_max_plen - INTEGER ...@@ -1699,33 +1905,41 @@ accept_ra_rt_info_max_plen - INTEGER
Route Information w/ prefix larger than this variable shall Route Information w/ prefix larger than this variable shall
be ignored. be ignored.
Functional default:

* 0 if accept_ra_rtr_pref is enabled.
* -1 if accept_ra_rtr_pref is disabled.

accept_ra_rtr_pref - BOOLEAN accept_ra_rtr_pref - BOOLEAN
Accept Router Preference in RA. Accept Router Preference in RA.
Functional default:

- enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.

accept_ra_mtu - BOOLEAN accept_ra_mtu - BOOLEAN
Apply the MTU value specified in RA option 5 (RFC4861). If Apply the MTU value specified in RA option 5 (RFC4861). If
disabled, the MTU specified in the RA will be ignored. disabled, the MTU specified in the RA will be ignored.
Functional default:

- enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.

accept_redirects - BOOLEAN accept_redirects - BOOLEAN
Accept Redirects. Accept Redirects.
Functional default:

- enabled if local forwarding is disabled.
- disabled if local forwarding is enabled.

accept_source_route - INTEGER accept_source_route - INTEGER
Accept source routing (routing extension header).

- >= 0: Accept only routing header type 2.
- < 0: Do not accept routing header.

Default: 0 Default: 0
...@@ -1733,24 +1947,30 @@ autoconf - BOOLEAN ...@@ -1733,24 +1947,30 @@ autoconf - BOOLEAN
Autoconfigure addresses using Prefix Information in Router Autoconfigure addresses using Prefix Information in Router
Advertisements. Advertisements.
Functional default:

- enabled if accept_ra_pinfo is enabled.
- disabled if accept_ra_pinfo is disabled.

dad_transmits - INTEGER dad_transmits - INTEGER
The amount of Duplicate Address Detection probes to send. The amount of Duplicate Address Detection probes to send.
Default: 1 Default: 1
forwarding - INTEGER forwarding - INTEGER
Configure interface-specific Host/Router behaviour. Configure interface-specific Host/Router behaviour.
.. note::

   It is recommended to have the same setting on all
   interfaces; mixed router/host scenarios are rather uncommon.

Possible values are:

- 0 Forwarding disabled
- 1 Forwarding enabled

**FALSE (0)**:

By default, Host behaviour is assumed. This means: By default, Host behaviour is assumed. This means:
...@@ -1761,7 +1981,7 @@ forwarding - INTEGER ...@@ -1761,7 +1981,7 @@ forwarding - INTEGER
Advertisements (and do autoconfiguration). Advertisements (and do autoconfiguration).
4. If accept_redirects is TRUE (default), accept Redirects. 4. If accept_redirects is TRUE (default), accept Redirects.
**TRUE (1)**:

If local forwarding is enabled, Router behaviour is assumed. If local forwarding is enabled, Router behaviour is assumed.
This means exactly the reverse from the above: This means exactly the reverse from the above:
...@@ -1772,19 +1992,22 @@ forwarding - INTEGER ...@@ -1772,19 +1992,22 @@ forwarding - INTEGER
4. Redirects are ignored. 4. Redirects are ignored.
Default: 0 (disabled) if global forwarding is disabled (default), Default: 0 (disabled) if global forwarding is disabled (default),
otherwise 1 (enabled). otherwise 1 (enabled).
hop_limit - INTEGER hop_limit - INTEGER
Default Hop Limit to set. Default Hop Limit to set.
Default: 64 Default: 64
mtu - INTEGER mtu - INTEGER
Default Maximum Transfer Unit Default Maximum Transfer Unit
Default: 1280 (IPv6 required minimum) Default: 1280 (IPv6 required minimum)
ip_nonlocal_bind - BOOLEAN ip_nonlocal_bind - BOOLEAN
If set, allows processes to bind() to non-local IPv6 addresses, If set, allows processes to bind() to non-local IPv6 addresses,
which can be quite useful - but may break some applications. which can be quite useful - but may break some applications.
Default: 0 Default: 0
router_probe_interval - INTEGER router_probe_interval - INTEGER
...@@ -1796,15 +2019,18 @@ router_probe_interval - INTEGER ...@@ -1796,15 +2019,18 @@ router_probe_interval - INTEGER
router_solicitation_delay - INTEGER router_solicitation_delay - INTEGER
Number of seconds to wait after interface is brought up Number of seconds to wait after interface is brought up
before sending Router Solicitations. before sending Router Solicitations.
Default: 1 Default: 1
router_solicitation_interval - INTEGER router_solicitation_interval - INTEGER
Number of seconds to wait between Router Solicitations. Number of seconds to wait between Router Solicitations.
Default: 4 Default: 4
router_solicitations - INTEGER router_solicitations - INTEGER
Number of Router Solicitations to send until assuming no Number of Router Solicitations to send until assuming no
routers are present. routers are present.
Default: 3 Default: 3
use_oif_addrs_only - BOOLEAN use_oif_addrs_only - BOOLEAN
...@@ -1816,28 +2042,35 @@ use_oif_addrs_only - BOOLEAN ...@@ -1816,28 +2042,35 @@ use_oif_addrs_only - BOOLEAN
use_tempaddr - INTEGER use_tempaddr - INTEGER
Preference for Privacy Extensions (RFC3041). Preference for Privacy Extensions (RFC3041).
<= 0 : disable Privacy Extensions
== 1 : enable Privacy Extensions, but prefer public * <= 0 : disable Privacy Extensions
addresses over temporary addresses. * == 1 : enable Privacy Extensions, but prefer public
> 1 : enable Privacy Extensions and prefer temporary addresses over temporary addresses.
addresses over public addresses. * > 1 : enable Privacy Extensions and prefer temporary
Default: 0 (for most devices) addresses over public addresses.
-1 (for point-to-point devices and loopback devices)
Default:
* 0 (for most devices)
* -1 (for point-to-point devices and loopback devices)
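The three use_tempaddr ranges above can be summarized as a small classifier. This is an illustrative sketch only, not kernel code; the function name and the returned labels are hypothetical and chosen for this example.

```python
def tempaddr_preference(use_tempaddr: int) -> str:
    """Map a use_tempaddr value to the behaviour described in the text."""
    if use_tempaddr <= 0:
        # <= 0: Privacy Extensions are disabled
        return "disabled"
    if use_tempaddr == 1:
        # == 1: enabled, but public addresses are preferred
        return "enabled, prefer public"
    # > 1: enabled, temporary addresses are preferred
    return "enabled, prefer temporary"


if __name__ == "__main__":
    for value in (-1, 0, 1, 2):
        print(value, tempaddr_preference(value))
```

Under this reading, the documented defaults (0 for most devices, -1 for point-to-point and loopback devices) both fall into the "disabled" case.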
temp_valid_lft - INTEGER temp_valid_lft - INTEGER
valid lifetime (in seconds) for temporary addresses. valid lifetime (in seconds) for temporary addresses.
Default: 604800 (7 days) Default: 604800 (7 days)
temp_prefered_lft - INTEGER temp_prefered_lft - INTEGER
Preferred lifetime (in seconds) for temporary addresses. Preferred lifetime (in seconds) for temporary addresses.
Default: 86400 (1 day) Default: 86400 (1 day)
keep_addr_on_down - INTEGER keep_addr_on_down - INTEGER
Keep all IPv6 addresses on an interface down event. If set static Keep all IPv6 addresses on an interface down event. If set static
global addresses with no expiration time are not flushed. global addresses with no expiration time are not flushed.
>0 : enabled
0 : system default * >0 : enabled
<0 : disabled * 0 : system default
* <0 : disabled
Default: 0 (addresses are removed) Default: 0 (addresses are removed)
...@@ -1846,11 +2079,13 @@ max_desync_factor - INTEGER ...@@ -1846,11 +2079,13 @@ max_desync_factor - INTEGER
that ensures that clients don't synchronize with each that ensures that clients don't synchronize with each
other and generate new addresses at exactly the same time. other and generate new addresses at exactly the same time.
value is in seconds. value is in seconds.
Default: 600 Default: 600
regen_max_retry - INTEGER regen_max_retry - INTEGER
Number of attempts before giving up on generating Number of attempts before giving up on generating
valid temporary addresses. valid temporary addresses.
Default: 5 Default: 5
max_addresses - INTEGER max_addresses - INTEGER
...@@ -1858,12 +2093,14 @@ max_addresses - INTEGER ...@@ -1858,12 +2093,14 @@ max_addresses - INTEGER
to zero disables the limitation. It is not recommended to set this to zero disables the limitation. It is not recommended to set this
value too large (or to zero) because it would be an easy way to value too large (or to zero) because it would be an easy way to
crash the kernel by allowing too many addresses to be created. crash the kernel by allowing too many addresses to be created.
Default: 16 Default: 16
disable_ipv6 - BOOLEAN disable_ipv6 - BOOLEAN
Disable IPv6 operation. If accept_dad is set to 2, this value Disable IPv6 operation. If accept_dad is set to 2, this value
will be dynamically set to TRUE if DAD fails for the link-local will be dynamically set to TRUE if DAD fails for the link-local
address. address.
Default: FALSE (enable IPv6 operation) Default: FALSE (enable IPv6 operation)
When this value is changed from 1 to 0 (IPv6 is being enabled), When this value is changed from 1 to 0 (IPv6 is being enabled),
...@@ -1877,10 +2114,13 @@ disable_ipv6 - BOOLEAN ...@@ -1877,10 +2114,13 @@ disable_ipv6 - BOOLEAN
accept_dad - INTEGER accept_dad - INTEGER
Whether to accept DAD (Duplicate Address Detection). Whether to accept DAD (Duplicate Address Detection).
0: Disable DAD
1: Enable DAD (default) == ==============================================================
2: Enable DAD, and disable IPv6 operation if MAC-based duplicate 0 Disable DAD
link-local address has been found. 1 Enable DAD (default)
2 Enable DAD, and disable IPv6 operation if MAC-based duplicate
link-local address has been found.
== ==============================================================
DAD operation and mode on a given interface will be selected according DAD operation and mode on a given interface will be selected according
to the maximum value of conf/{all,interface}/accept_dad. to the maximum value of conf/{all,interface}/accept_dad.
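The "maximum value" rule above can be sketched as follows. This is a minimal illustration, assuming the two inputs are the integers read from conf/all/accept_dad and conf/<interface>/accept_dad; the function and dictionary names are hypothetical.

```python
# Descriptions taken from the accept_dad value table above.
ACCEPT_DAD_MODES = {
    0: "Disable DAD",
    1: "Enable DAD",
    2: "Enable DAD, and disable IPv6 operation if MAC-based duplicate "
       "link-local address has been found",
}


def effective_accept_dad(conf_all: int, conf_interface: int) -> int:
    """Return the DAD mode used on an interface: the larger of the two values."""
    return max(conf_all, conf_interface)


if __name__ == "__main__":
    mode = effective_accept_dad(conf_all=1, conf_interface=2)
    print(mode, ACCEPT_DAD_MODES[mode])
```

One consequence of taking the maximum is that setting the per-interface value to 0 does not disable DAD while conf/all/accept_dad is still 1.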
...@@ -1888,6 +2128,7 @@ accept_dad - INTEGER ...@@ -1888,6 +2128,7 @@ accept_dad - INTEGER
force_tllao - BOOLEAN force_tllao - BOOLEAN
Enable sending the target link-layer address option even when Enable sending the target link-layer address option even when
responding to a unicast neighbor solicitation. responding to a unicast neighbor solicitation.
Default: FALSE Default: FALSE
Quoting from RFC 2461, section 4.4, Target link-layer address: Quoting from RFC 2461, section 4.4, Target link-layer address:
...@@ -1905,9 +2146,10 @@ force_tllao - BOOLEAN ...@@ -1905,9 +2146,10 @@ force_tllao - BOOLEAN
ndisc_notify - BOOLEAN ndisc_notify - BOOLEAN
Define mode for notification of address and device changes. Define mode for notification of address and device changes.
0 - (default): do nothing
1 - Generate unsolicited neighbour advertisements when device is brought * 0 - (default): do nothing
up or hardware address changes. * 1 - Generate unsolicited neighbour advertisements when device is brought
up or hardware address changes.
ndisc_tclass - INTEGER ndisc_tclass - INTEGER
The IPv6 Traffic Class to use by default when sending IPv6 Neighbor The IPv6 Traffic Class to use by default when sending IPv6 Neighbor
...@@ -1916,33 +2158,38 @@ ndisc_tclass - INTEGER ...@@ -1916,33 +2158,38 @@ ndisc_tclass - INTEGER
These 8 bits can be interpreted as 6 high order bits holding the DSCP These 8 bits can be interpreted as 6 high order bits holding the DSCP
value and 2 low order bits representing ECN (which you probably want value and 2 low order bits representing ECN (which you probably want
to leave cleared). to leave cleared).
0 - (default)
* 0 - (default)
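The bit layout described above (6 high order bits of DSCP, 2 low order bits of ECN) can be illustrated with a short sketch. The helper names are hypothetical; only the bit arithmetic comes from the text.

```python
def make_tclass(dscp: int, ecn: int = 0) -> int:
    """Combine a 6-bit DSCP value and a 2-bit ECN value into one 8-bit Traffic Class."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits (0-3)")
    # DSCP occupies the high order 6 bits, ECN the low order 2 bits.
    return (dscp << 2) | ecn


def split_tclass(tclass: int) -> tuple[int, int]:
    """Split an 8-bit Traffic Class back into (dscp, ecn)."""
    if not 0 <= tclass <= 255:
        raise ValueError("Traffic Class must fit in 8 bits (0-255)")
    return tclass >> 2, tclass & 0x3


if __name__ == "__main__":
    # DSCP 48 with the ECN bits left cleared gives a Traffic Class of 192.
    print(make_tclass(48))
    print(split_tclass(192))
```

Leaving `ecn` at its default of 0 matches the advice above to keep the ECN bits cleared.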
mldv1_unsolicited_report_interval - INTEGER mldv1_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited The interval in milliseconds in which the next unsolicited
MLDv1 report retransmit will take place. MLDv1 report retransmit will take place.
Default: 10000 (10 seconds) Default: 10000 (10 seconds)
mldv2_unsolicited_report_interval - INTEGER mldv2_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited The interval in milliseconds in which the next unsolicited
MLDv2 report retransmit will take place. MLDv2 report retransmit will take place.
Default: 1000 (1 second) Default: 1000 (1 second)
force_mld_version - INTEGER force_mld_version - INTEGER
0 - (default) No enforcement of a MLD version, MLDv1 fallback allowed * 0 - (default) No enforcement of a MLD version, MLDv1 fallback allowed
1 - Enforce to use MLD version 1 * 1 - Enforce to use MLD version 1
2 - Enforce to use MLD version 2 * 2 - Enforce to use MLD version 2
suppress_frag_ndisc - INTEGER suppress_frag_ndisc - INTEGER
Control RFC 6980 (Security Implications of IPv6 Fragmentation Control RFC 6980 (Security Implications of IPv6 Fragmentation
with IPv6 Neighbor Discovery) behavior: with IPv6 Neighbor Discovery) behavior:
1 - (default) discard fragmented neighbor discovery packets
0 - allow fragmented neighbor discovery packets * 1 - (default) discard fragmented neighbor discovery packets
* 0 - allow fragmented neighbor discovery packets
optimistic_dad - BOOLEAN optimistic_dad - BOOLEAN
Whether to perform Optimistic Duplicate Address Detection (RFC 4429). Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
0: disabled (default)
1: enabled * 0: disabled (default)
* 1: enabled
Optimistic Duplicate Address Detection for the interface will be enabled Optimistic Duplicate Address Detection for the interface will be enabled
if at least one of conf/{all,interface}/optimistic_dad is set to 1, if at least one of conf/{all,interface}/optimistic_dad is set to 1,
...@@ -1953,8 +2200,9 @@ use_optimistic - BOOLEAN ...@@ -1953,8 +2200,9 @@ use_optimistic - BOOLEAN
source address selection. Preferred addresses will still be chosen source address selection. Preferred addresses will still be chosen
before optimistic addresses, subject to other ranking in the source before optimistic addresses, subject to other ranking in the source
address selection algorithm. address selection algorithm.
0: disabled (default)
1: enabled * 0: disabled (default)
* 1: enabled
This will be enabled if at least one of This will be enabled if at least one of
conf/{all,interface}/use_optimistic is set to 1, disabled otherwise. conf/{all,interface}/use_optimistic is set to 1, disabled otherwise.
...@@ -1976,12 +2224,14 @@ stable_secret - IPv6 address ...@@ -1976,12 +2224,14 @@ stable_secret - IPv6 address
addr_gen_mode - INTEGER addr_gen_mode - INTEGER
Defines how link-local and autoconf addresses are generated. Defines how link-local and autoconf addresses are generated.
0: generate address based on EUI64 (default) = =================================================================
1: do not generate a link-local address, use EUI64 for addresses generated 0 generate address based on EUI64 (default)
from autoconf 1 do not generate a link-local address, use EUI64 for addresses
2: generate stable privacy addresses, using the secret from generated from autoconf
2 generate stable privacy addresses, using the secret from
stable_secret (RFC7217) stable_secret (RFC7217)
3: generate stable privacy addresses, using a random secret if unset 3 generate stable privacy addresses, using a random secret if unset
= =================================================================
drop_unicast_in_l2_multicast - BOOLEAN drop_unicast_in_l2_multicast - BOOLEAN
Drop any unicast IPv6 packets that are received in link-layer Drop any unicast IPv6 packets that are received in link-layer
...@@ -2003,13 +2253,18 @@ enhanced_dad - BOOLEAN ...@@ -2003,13 +2253,18 @@ enhanced_dad - BOOLEAN
detection of duplicates due to loopback of the NS messages that we send. detection of duplicates due to loopback of the NS messages that we send.
The nonce option will be sent on an interface unless both of The nonce option will be sent on an interface unless both of
conf/{all,interface}/enhanced_dad are set to FALSE. conf/{all,interface}/enhanced_dad are set to FALSE.
Default: TRUE Default: TRUE
icmp/*: ``icmp/*``:
===========
ratelimit - INTEGER ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 messages. Limit the maximal rates for sending ICMPv6 messages.
0 to disable any limiting, 0 to disable any limiting,
otherwise the minimal space between responses in milliseconds. otherwise the minimal space between responses in milliseconds.
Default: 1000 Default: 1000
ratemask - list of comma separated ranges ratemask - list of comma separated ranges
...@@ -2030,16 +2285,19 @@ ratemask - list of comma separated ranges ...@@ -2030,16 +2285,19 @@ ratemask - list of comma separated ranges
echo_ignore_all - BOOLEAN echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol. requests sent to it over the IPv6 protocol.
Default: 0 Default: 0
echo_ignore_multicast - BOOLEAN echo_ignore_multicast - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol via multicast. requests sent to it over the IPv6 protocol via multicast.
Default: 0 Default: 0
echo_ignore_anycast - BOOLEAN echo_ignore_anycast - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol destined to anycast address. requests sent to it over the IPv6 protocol destined to anycast address.
Default: 0 Default: 0
xfrm6_gc_thresh - INTEGER xfrm6_gc_thresh - INTEGER
...@@ -2055,43 +2313,52 @@ YOSHIFUJI Hideaki / USAGI Project <yoshfuji@linux-ipv6.org> ...@@ -2055,43 +2313,52 @@ YOSHIFUJI Hideaki / USAGI Project <yoshfuji@linux-ipv6.org>
/proc/sys/net/bridge/* Variables: /proc/sys/net/bridge/* Variables:
=================================
bridge-nf-call-arptables - BOOLEAN bridge-nf-call-arptables - BOOLEAN
1 : pass bridged ARP traffic to arptables' FORWARD chain. - 1 : pass bridged ARP traffic to arptables' FORWARD chain.
0 : disable this. - 0 : disable this.
Default: 1 Default: 1
bridge-nf-call-iptables - BOOLEAN bridge-nf-call-iptables - BOOLEAN
1 : pass bridged IPv4 traffic to iptables' chains. - 1 : pass bridged IPv4 traffic to iptables' chains.
0 : disable this. - 0 : disable this.
Default: 1 Default: 1
bridge-nf-call-ip6tables - BOOLEAN bridge-nf-call-ip6tables - BOOLEAN
1 : pass bridged IPv6 traffic to ip6tables' chains. - 1 : pass bridged IPv6 traffic to ip6tables' chains.
0 : disable this. - 0 : disable this.
Default: 1 Default: 1
bridge-nf-filter-vlan-tagged - BOOLEAN bridge-nf-filter-vlan-tagged - BOOLEAN
1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables. - 1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables.
0 : disable this. - 0 : disable this.
Default: 0 Default: 0
bridge-nf-filter-pppoe-tagged - BOOLEAN bridge-nf-filter-pppoe-tagged - BOOLEAN
1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables. - 1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables.
0 : disable this. - 0 : disable this.
Default: 0 Default: 0
bridge-nf-pass-vlan-input-dev - BOOLEAN bridge-nf-pass-vlan-input-dev - BOOLEAN
1: if bridge-nf-filter-vlan-tagged is enabled, try to find a vlan - 1: if bridge-nf-filter-vlan-tagged is enabled, try to find a vlan
interface on the bridge and set the netfilter input device to the vlan. interface on the bridge and set the netfilter input device to the
This allows use of e.g. "iptables -i br0.1" and makes the REDIRECT vlan. This allows use of e.g. "iptables -i br0.1" and makes the
target work with vlan-on-top-of-bridge interfaces. When no matching REDIRECT target work with vlan-on-top-of-bridge interfaces. When no
vlan interface is found, or this switch is off, the input device is matching vlan interface is found, or this switch is off, the input
set to the bridge interface. device is set to the bridge interface.
0: disable bridge netfilter vlan interface lookup.
- 0: disable bridge netfilter vlan interface lookup.
Default: 0 Default: 0
/proc/sys/net/sctp/* Variables: ``/proc/sys/net/sctp/*`` Variables:
===================================
addip_enable - BOOLEAN addip_enable - BOOLEAN
Enable or disable extension of Dynamic Address Reconfiguration Enable or disable extension of Dynamic Address Reconfiguration
...@@ -2156,11 +2423,13 @@ addip_noauth_enable - BOOLEAN ...@@ -2156,11 +2423,13 @@ addip_noauth_enable - BOOLEAN
we provide this variable to control the enforcement of the we provide this variable to control the enforcement of the
authentication requirement. authentication requirement.
1: Allow ADD-IP extension to be used without authentication. This == ===============================================================
1 Allow ADD-IP extension to be used without authentication. This
should only be set in a closed environment for interoperability should only be set in a closed environment for interoperability
with older implementations. with older implementations.
0: Enforce the authentication requirement 0 Enforce the authentication requirement
== ===============================================================
Default: 0 Default: 0
...@@ -2170,8 +2439,8 @@ auth_enable - BOOLEAN ...@@ -2170,8 +2439,8 @@ auth_enable - BOOLEAN
required for secure operation of Dynamic Address Reconfiguration required for secure operation of Dynamic Address Reconfiguration
(ADD-IP) extension. (ADD-IP) extension.
1: Enable this extension. - 1: Enable this extension.
0: Disable this extension. - 0: Disable this extension.
Default: 0 Default: 0
...@@ -2179,8 +2448,8 @@ prsctp_enable - BOOLEAN ...@@ -2179,8 +2448,8 @@ prsctp_enable - BOOLEAN
Enable or disable the Partial Reliability extension (RFC3758) which Enable or disable the Partial Reliability extension (RFC3758) which
is used to notify peers that a given DATA should no longer be expected. is used to notify peers that a given DATA should no longer be expected.
1: Enable extension - 1: Enable extension
0: Disable - 0: Disable
Default: 1 Default: 1
...@@ -2282,8 +2551,8 @@ cookie_preserve_enable - BOOLEAN ...@@ -2282,8 +2551,8 @@ cookie_preserve_enable - BOOLEAN
Enable or disable the ability to extend the lifetime of the SCTP cookie Enable or disable the ability to extend the lifetime of the SCTP cookie
that is used during the establishment phase of SCTP association that is used during the establishment phase of SCTP association
1: Enable cookie lifetime extension. - 1: Enable cookie lifetime extension.
0: Disable - 0: Disable
Default: 1 Default: 1
...@@ -2291,9 +2560,11 @@ cookie_hmac_alg - STRING ...@@ -2291,9 +2560,11 @@ cookie_hmac_alg - STRING
Select the hmac algorithm used when generating the cookie value sent by Select the hmac algorithm used when generating the cookie value sent by
a listening sctp socket to a connecting client in the INIT-ACK chunk. a listening sctp socket to a connecting client in the INIT-ACK chunk.
Valid values are: Valid values are:
* md5 * md5
* sha1 * sha1
* none * none
Ability to assign md5 or sha1 as the selected alg is predicated on the Ability to assign md5 or sha1 as the selected alg is predicated on the
configuration of those algorithms at build time (CONFIG_CRYPTO_MD5 and configuration of those algorithms at build time (CONFIG_CRYPTO_MD5 and
CONFIG_CRYPTO_SHA1). CONFIG_CRYPTO_SHA1).
...@@ -2312,16 +2583,16 @@ rcvbuf_policy - INTEGER ...@@ -2312,16 +2583,16 @@ rcvbuf_policy - INTEGER
to each association instead of the socket. This prevents the described to each association instead of the socket. This prevents the described
blocking. blocking.
1: rcvbuf space is per association - 1: rcvbuf space is per association
0: rcvbuf space is per socket - 0: rcvbuf space is per socket
Default: 0 Default: 0
sndbuf_policy - INTEGER sndbuf_policy - INTEGER
Similar to rcvbuf_policy above, this applies to send buffer space. Similar to rcvbuf_policy above, this applies to send buffer space.
1: Send buffer is tracked per association - 1: Send buffer is tracked per association
0: Send buffer is tracked per socket. - 0: Send buffer is tracked per socket.
Default: 0 Default: 0
...@@ -2354,19 +2625,23 @@ sctp_wmem - vector of 3 INTEGERs: min, default, max ...@@ -2354,19 +2625,23 @@ sctp_wmem - vector of 3 INTEGERs: min, default, max
addr_scope_policy - INTEGER addr_scope_policy - INTEGER
Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00 Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
0 - Disable IPv4 address scoping - 0 - Disable IPv4 address scoping
1 - Enable IPv4 address scoping - 1 - Enable IPv4 address scoping
2 - Follow draft but allow IPv4 private addresses - 2 - Follow draft but allow IPv4 private addresses
3 - Follow draft but allow IPv4 link local addresses - 3 - Follow draft but allow IPv4 link local addresses
Default: 1 Default: 1
/proc/sys/net/core/* ``/proc/sys/net/core/*``
========================
Please see: Documentation/admin-guide/sysctl/net.rst for descriptions of these entries. Please see: Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
/proc/sys/net/unix/* ``/proc/sys/net/unix/*``
========================
max_dgram_qlen - INTEGER max_dgram_qlen - INTEGER
The maximum length of dgram socket receive queue The maximum length of dgram socket receive queue
......
.. SPDX-License-Identifier: GPL-2.0
==================================
IP dynamic address hack-port v0.03 IP dynamic address hack-port v0.03
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ==================================
This stuff allows diald ONESHOT connections to get established by This stuff allows diald ONESHOT connections to get established by
dynamically changing packet source address (and socket's if local procs). dynamically changing packet source address (and socket's if local procs).
It is implemented for TCP diald-box connections(1) and IP_MASQuerading(2). It is implemented for TCP diald-box connections(1) and IP_MASQuerading(2).
If enabled[*] and forwarding interface has changed: If enabled\ [#]_ and forwarding interface has changed:
1) Socket (and packet) source address is rewritten ON RETRANSMISSIONS 1) Socket (and packet) source address is rewritten ON RETRANSMISSIONS
while in SYN_SENT state (diald-box processes). while in SYN_SENT state (diald-box processes).
2) Out-bounded MASQueraded source address changes ON OUTPUT (when 2) Out-bounded MASQueraded source address changes ON OUTPUT (when
...@@ -12,18 +17,24 @@ If enabled[*] and forwarding interface has changed: ...@@ -12,18 +17,24 @@ If enabled[*] and forwarding interface has changed:
received by the tunnel. received by the tunnel.
This is specially helpful for auto dialup links (diald), where the This is specially helpful for auto dialup links (diald), where the
``actual'' outgoing address is unknown at the moment the link is ``actual`` outgoing address is unknown at the moment the link is
going up. So, the *same* (local AND masqueraded) connections requests that going up. So, the *same* (local AND masqueraded) connections requests that
bring the link up will be able to get established. bring the link up will be able to get established.
[*] At boot, by default no address rewriting is attempted. .. [#] At boot, by default no address rewriting is attempted.
To enable:
To enable::
# echo 1 > /proc/sys/net/ipv4/ip_dynaddr # echo 1 > /proc/sys/net/ipv4/ip_dynaddr
To enable verbose mode:
# echo 2 > /proc/sys/net/ipv4/ip_dynaddr To enable verbose mode::
To disable (default)
# echo 2 > /proc/sys/net/ipv4/ip_dynaddr
To disable (default)::
# echo 0 > /proc/sys/net/ipv4/ip_dynaddr # echo 0 > /proc/sys/net/ipv4/ip_dynaddr
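The same three settings can also be read and written from a script. The following is a sketch under stated assumptions: the helper names are hypothetical, the mode labels paraphrase the text above, and writing to the real sysctl file requires root privileges. The `path` parameter exists only so the helpers can be tried against an ordinary file.

```python
from pathlib import Path

# Sysctl file named in the commands above.
IP_DYNADDR = Path("/proc/sys/net/ipv4/ip_dynaddr")

# Values documented above: 0 = disabled (default), 1 = enabled, 2 = verbose.
MODES = {0: "disabled", 1: "enabled", 2: "enabled, verbose"}


def read_mode(path: Path = IP_DYNADDR) -> str:
    """Return a label for the current ip_dynaddr setting."""
    value = int(path.read_text().strip())
    return MODES.get(value, f"unknown ({value})")


def write_mode(value: int, path: Path = IP_DYNADDR) -> None:
    """Write one of the documented values (0, 1 or 2) to the sysctl file."""
    if value not in MODES:
        raise ValueError(f"expected one of {sorted(MODES)}, got {value}")
    path.write_text(f"{value}\n")
```

For example, `write_mode(1)` run as root would correspond to the first `echo` command above, and `read_mode()` would then report "enabled".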
Enjoy! Enjoy!
-- Juanjo <jjciarla@raiz.uncu.edu.ar> Juanjo <jjciarla@raiz.uncu.edu.ar>
Text file for ipddp.c: .. SPDX-License-Identifier: GPL-2.0
AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation
This text file is written by Jay Schulist <jschlst@samba.org> =========================================================
AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation
=========================================================
Documentation ipddp.c
This file is written by Jay Schulist <jschlst@samba.org>
Introduction Introduction
------------ ------------
...@@ -21,7 +26,7 @@ kernel AppleTalk layer and drivers are available. ...@@ -21,7 +26,7 @@ kernel AppleTalk layer and drivers are available.
Each mode requires its own user space software. Each mode requires its own user space software.
Compiling AppleTalk-IP Decapsulation/Encapsulation Compiling AppleTalk-IP Decapsulation/Encapsulation
================================================= ==================================================
AppleTalk-IP decapsulation needs to be compiled into your kernel. You AppleTalk-IP decapsulation needs to be compiled into your kernel. You
will need to turn on AppleTalk-IP driver support. Then you will need to will need to turn on AppleTalk-IP driver support. Then you will need to
......
.. SPDX-License-Identifier: GPL-2.0
==================================
ATM (i)Chip IA Linux Driver Source
==================================
READ ME FIRST
READ ME FIRST
ATM (i)Chip IA Linux Driver Source
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
Read This Before You Begin!
Read This Before You Begin!
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
Description Description
----------- ===========
This is the README file for the Interphase PCI ATM (i)Chip IA Linux driver This is the README file for the Interphase PCI ATM (i)Chip IA Linux driver
source release. source release.
The features and limitations of this driver are as follows: The features and limitations of this driver are as follows:
- A single VPI (VPI value of 0) is supported. - A single VPI (VPI value of 0) is supported.
- Supports 4K VCs for the server board (with 512K control memory) and 1K - Supports 4K VCs for the server board (with 512K control memory) and 1K
VCs for the client board (with 128K control memory). VCs for the client board (with 128K control memory).
- UBR, ABR and CBR service categories are supported. - UBR, ABR and CBR service categories are supported.
- Only AAL5 is supported. - Only AAL5 is supported.
- Supports setting of PCR on the VCs. - Supports setting of PCR on the VCs.
- Multiple adapters in a system are supported. - Multiple adapters in a system are supported.
- All variants of Interphase ATM PCI (i)Chip adapter cards are supported, - All variants of Interphase ATM PCI (i)Chip adapter cards are supported,
including x575 (OC3, control memory 128K , 512K and packet memory 128K, including x575 (OC3, control memory 128K , 512K and packet memory 128K,
512K and 1M), x525 (UTP25) and x531 (DS3 and E3). See 512K and 1M), x525 (UTP25) and x531 (DS3 and E3). See
http://www.iphase.com/ http://www.iphase.com/
for details. for details.
- Only x86 platforms are supported. - Only x86 platforms are supported.
...@@ -29,128 +37,155 @@ The features and limitations of this driver are as follows: ...@@ -29,128 +37,155 @@ The features and limitations of this driver are as follows:
Before You Start Before You Start
---------------- ================
Installation Installation
------------ ------------
1. Installing the adapters in the system 1. Installing the adapters in the system
To install the ATM adapters in the system, follow the steps below. To install the ATM adapters in the system, follow the steps below.
a. Login as root. a. Login as root.
b. Shut down the system and power off the system. b. Shut down the system and power off the system.
c. Install one or more ATM adapters in the system. c. Install one or more ATM adapters in the system.
d. Connect each adapter to a port on an ATM switch. The green 'Link' d. Connect each adapter to a port on an ATM switch. The green 'Link'
LED on the front panel of the adapter will be on if the adapter is LED on the front panel of the adapter will be on if the adapter is
connected to the switch properly when the system is powered up. connected to the switch properly when the system is powered up.
e. Power on and boot the system. e. Power on and boot the system.
2. [ Removed ] 2. [ Removed ]
3. Rebuild kernel with ABR support 3. Rebuild kernel with ABR support
[ a. and b. removed ] [ a. and b. removed ]
c. Reconfigure the kernel, choose the Interphase ia driver through "make
c. Reconfigure the kernel, choose the Interphase ia driver through "make
menuconfig" or "make xconfig". menuconfig" or "make xconfig".
d. Rebuild the kernel, loadable modules and the atm tools. d. Rebuild the kernel, loadable modules and the atm tools.
e. Install the new built kernel and modules and reboot. e. Install the new built kernel and modules and reboot.
4. Load the adapter hardware driver (ia driver) if it is built as a module 4. Load the adapter hardware driver (ia driver) if it is built as a module
a. Login as root. a. Login as root.
b. Change directory to /lib/modules/<kernel-version>/atm. b. Change directory to /lib/modules/<kernel-version>/atm.
c. Run "insmod suni.o;insmod iphase.o" c. Run "insmod suni.o;insmod iphase.o"
The yellow 'status' LED on the front panel of the adapter will blink The yellow 'status' LED on the front panel of the adapter will blink
while the driver is loaded in the system. while the driver is loaded in the system.
d. To verify that the 'ia' driver is loaded successfully, run the d. To verify that the 'ia' driver is loaded successfully, run the
following command: following command::
cat /proc/atm/devices cat /proc/atm/devices
If the driver is loaded successfully, the output of the command will If the driver is loaded successfully, the output of the command will
be similar to the following lines: be similar to the following lines::
Itf Type ESI/"MAC"addr AAL(TX,err,RX,err,drop) ... Itf Type ESI/"MAC"addr AAL(TX,err,RX,err,drop) ...
0 ia xxxxxxxxx 0 ( 0 0 0 0 0 ) 5 ( 0 0 0 0 0 ) 0 ia xxxxxxxxx 0 ( 0 0 0 0 0 ) 5 ( 0 0 0 0 0 )
You can also check the system log file /var/log/messages for messages You can also check the system log file /var/log/messages for messages
related to the ATM driver. related to the ATM driver.
5. Ia Driver Configuration 5. Ia Driver Configuration
5.1 Configuration of adapter buffers 5.1 Configuration of adapter buffers
The (i)Chip boards have 3 different packet RAM size variants: 128K, 512K and The (i)Chip boards have 3 different packet RAM size variants: 128K, 512K and
1M. The RAM size decides the number of buffers and buffer size. The default 1M. The RAM size decides the number of buffers and buffer size. The default
size and number of buffers are set as follows: size and number of buffers are set as follows:
Total Rx RAM Tx RAM Rx Buf Tx Buf Rx buf Tx buf ========= ======= ====== ====== ====== ====== ======
RAM size size size size size cnt cnt Total Rx RAM Tx RAM Rx Buf Tx Buf Rx buf Tx buf
-------- ------ ------ ------ ------ ------ ------ RAM size size size size size cnt cnt
128K 64K 64K 10K 10K 6 6 ========= ======= ====== ====== ====== ====== ======
512K 256K 256K 10K 10K 25 25 128K 64K 64K 10K 10K 6 6
1M 512K 512K 10K 10K 51 51 512K 256K 256K 10K 10K 25 25
1M 512K 512K 10K 10K 51 51
========= ======= ====== ====== ====== ====== ======
These settings should work well in most environments, but can be These settings should work well in most environments, but can be
changed by typing the following command: changed by typing the following command::
insmod <IA_DIR>/ia.o IA_RX_BUF=<RX_CNT> IA_RX_BUF_SZ=<RX_SIZE> \ insmod <IA_DIR>/ia.o IA_RX_BUF=<RX_CNT> IA_RX_BUF_SZ=<RX_SIZE> \
IA_TX_BUF=<TX_CNT> IA_TX_BUF_SZ=<TX_SIZE> IA_TX_BUF=<TX_CNT> IA_TX_BUF_SZ=<TX_SIZE>
Where: Where:
RX_CNT = number of receive buffers in the range (1-128)
RX_SIZE = size of receive buffers in the range (48-64K)
TX_CNT = number of transmit buffers in the range (1-128)
TX_SIZE = size of transmit buffers in the range (48-64K)
1. Transmit and receive buffer size must be a multiple of 4. - RX_CNT = number of receive buffers in the range (1-128)
2. Care should be taken so that the memory required for the - RX_SIZE = size of receive buffers in the range (48-64K)
transmit and receive buffers is less than or equal to the - TX_CNT = number of transmit buffers in the range (1-128)
total adapter packet memory. - TX_SIZE = size of transmit buffers in the range (48-64K)
1. Transmit and receive buffer size must be a multiple of 4.
2. Care should be taken so that the memory required for the
transmit and receive buffers is less than or equal to the
total adapter packet memory.
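The constraints listed above can be checked before loading the module. The following is only a sketch: the function name ``check_ia_buffers`` is hypothetical, and it assumes that sizes are given in bytes and that "K" means 1024 bytes, neither of which is stated in the text.

```python
KIB = 1024  # assumption: "K" in the text means 1024 bytes


def check_ia_buffers(total_ram, rx_cnt, rx_size, tx_cnt, tx_size):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    # Buffer counts must be in the range 1-128.
    for name, cnt in (("RX_CNT", rx_cnt), ("TX_CNT", tx_cnt)):
        if not 1 <= cnt <= 128:
            problems.append(f"{name}={cnt} is outside 1-128")
    # Buffer sizes must be in the range 48-64K and a multiple of 4.
    for name, size in (("RX_SIZE", rx_size), ("TX_SIZE", tx_size)):
        if not 48 <= size <= 64 * KIB:
            problems.append(f"{name}={size} is outside 48-64K")
        if size % 4 != 0:
            problems.append(f"{name}={size} is not a multiple of 4")
    # All buffers together must fit in the adapter packet memory.
    needed = rx_cnt * rx_size + tx_cnt * tx_size
    if needed > total_ram:
        problems.append(f"buffers need {needed} bytes, adapter has {total_ram}")
    return problems


# Default settings for the 128K board from the table above.
print(check_ia_buffers(128 * KIB, 6, 10 * KIB, 6, 10 * KIB))
```

An empty list for the 128K defaults is consistent with the table: 6 + 6 buffers of 10K need 120K of packet memory.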
5.2 Turn on ia debug trace

When the ia driver is built with the CONFIG_ATM_IA_DEBUG flag, the driver
can provide more debug trace if needed. There is a bit mask variable,
IADebugFlag, which controls the output of the traces. You can find the bit
map of the IADebugFlag in iphase.h.
The debug trace can be turned on through the insmod command line option, for
example, "insmod iphase.o IADebugFlag=0xffffffff" can turn on all the debug
traces together with loading the driver.

6. Ia Driver Test Using ttcp_atm and PVC

For the PVC setup, the test machines can either be connected back-to-back or
through a switch. If connected through the switch, the switch must be
configured for the PVC(s).

a. For UBR test:

   At the test machine intended to receive data, type::

      ttcp_atm -r -a -s 0.100

   At the other test machine, type::

      ttcp_atm -t -a -s 0.100 -n 10000

   Run "ttcp_atm -h" to display more options of the ttcp_atm tool.

b. For ABR test:

   It is the same as the UBR testing, but with an extra command option::

      -Pabr:max_pcr=<xxx>

   where:

      xxx = the maximum peak cell rate, from 170 - 353207.

   This option must be set on both the machines.

c. For CBR test:

   It is the same as the UBR testing, but with an extra command option::

      -Pcbr:max_pcr=<xxx>

   where:

      xxx = the maximum peak cell rate, from 170 - 353207.

   This option may only be set on the transmit machine.

Outstanding Issues
==================

Contact Information
-------------------

::

     Customer Support:
         United States: Telephone:  (214) 654-5555
                        Fax:        (214) 654-5500
                        E-Mail:     intouch@iphase.com
         Europe:        Telephone:  33 (0)1 41 15 44 00
                        Fax:        33 (0)1 41 15 12 13
......
.. SPDX-License-Identifier: GPL-2.0
=====
IPsec
=====
This document describes known IPsec corner cases that need to be kept in mind
when deploying various IPsec configurations in real-world production
environments.

1. IPcomp:
           Small IP packets won't get compressed at the sender, and fail the
           policy check at the receiver.

Quote from RFC3173::

  2.2. Non-Expansion Policy

  If the total size of a compressed payload and the IPComp header, as
  defined in section 3, is not smaller than the size of the original
......
.. SPDX-License-Identifier: GPL-2.0
====
IPv6
====
Options for the ipv6 module are supplied as parameters at load time.

Module options may be given as command line arguments to the insmod
or modprobe command, but are usually specified in either
``/etc/modules.d/*.conf`` configuration files, or in a distro-specific
configuration file.

The available ipv6 module parameters are listed below. If a parameter
......
.. SPDX-License-Identifier: GPL-2.0
===================
IPVLAN Driver HOWTO
===================

Initial Release:
        Mahesh Bandewar <maheshb AT google.com>

1. Introduction:
================
This is conceptually very similar to the macvlan driver with one major
exception of using L3 for mux-ing /demux-ing among slaves. This property makes
the master device share the L2 with its slave devices. I have developed this
driver in conjunction with network namespaces and am not sure if there is a use case
@@ -13,34 +17,48 @@ outside of it.

2. Building and Installation:
=============================

In order to build the driver, please select the config item CONFIG_IPVLAN.
The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
(CONFIG_IPVLAN=m).

3. Configuration:
=================

There are no module parameters for this driver and it can be configured
using the IProute2/ip utility.
::

    ip link add link <master> name <slave> type ipvlan [ mode MODE ] [ FLAGS ]
       where
         MODE: l3 (default) | l3s | l2
         FLAGS: bridge (default) | private | vepa

e.g.

    (a) The following will create an IPvlan link with eth0 as master in
        L3 bridge mode::

          bash# ip link add link eth0 name ipvl0 type ipvlan
    (b) This command will create an IPvlan link in L2 bridge mode::

          bash# ip link add link eth0 name ipvl0 type ipvlan mode l2 bridge

    (c) This command will create an IPvlan device in L2 private mode::

          bash# ip link add link eth0 name ipvlan type ipvlan mode l2 private

    (d) This command will create an IPvlan device in L2 vepa mode::

          bash# ip link add link eth0 name ipvlan type ipvlan mode l2 vepa

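The command syntax shown in this section can be assembled programmatically. The sketch below only builds the argument list and does not run anything; the helper name ``ipvlan_add_cmd`` is hypothetical, and the accepted modes and flags are taken from the syntax summary above.

```python
MODES = ("l3", "l3s", "l2")            # l3 is the default when omitted
FLAGS = ("bridge", "private", "vepa")  # bridge is the default when omitted


def ipvlan_add_cmd(master, slave, mode=None, flag=None):
    """Build the 'ip link add' argument list for an IPvlan slave device."""
    if mode is not None and mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if flag is not None and flag not in FLAGS:
        raise ValueError(f"unknown flag: {flag}")
    cmd = ["ip", "link", "add", "link", master,
           "name", slave, "type", "ipvlan"]
    if mode is not None:
        cmd += ["mode", mode]
    if flag is not None:
        cmd.append(flag)
    return cmd


# Example (b) from the list above: L2 bridge mode on eth0.
print(" ".join(ipvlan_add_cmd("eth0", "ipvl0", "l2", "bridge")))
```

The resulting list could be passed to ``subprocess.run()`` with root privileges; that step is left out so the sketch stays side-effect free.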
4. Operating modes:
===================

IPvlan has two modes of operation - L2 and L3. For a given master device,
you can select one of these two modes and all slaves on that master will
operate in the same (selected) mode. The RX mode is almost identical except
that in L3 mode the slaves won't receive any multicast / broadcast traffic.
@@ -48,39 +66,50 @@ L3 mode is more restrictive since routing is controlled from the other (mostly)
default namespace.

4.1 L2 mode:
------------

In this mode TX processing happens on the stack instance attached to the
slave device and packets are switched and queued to the master device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
as well.

4.2 L3 mode:
------------

In this mode TX processing up to L3 happens on the stack instance attached
to the slave device and packets are switched to the stack instance of the
master device for the L2 processing and routing from that instance will be
used before packets are queued on the outbound device. In this mode the slaves
will not receive nor can send multicast / broadcast traffic.

4.3 L3S mode:
-------------

This is very similar to the L3 mode except that iptables (conn-tracking)
works in this mode and hence it is L3-symmetric (L3s). This will have slightly less
performance but that shouldn't matter since you are choosing this mode over plain-L3
mode to make conn-tracking work.

5. Mode flags:
==============

At this time the following mode flags are available:

5.1 bridge:
-----------
This is the default option. To configure the IPvlan port in this mode,
the user can either add this option on the command line or not specify
anything. This is the traditional mode where slaves can cross-talk among
themselves apart from talking through the master device.

5.2 private:
------------
If this option is added to the command line, the port is set in private
mode, i.e. the port won't allow cross communication between slaves.

5.3 vepa:
---------
If this is added to the command line, the port is set in VEPA mode,
i.e. the port will offload switching functionality to the external entity as
described in 802.1Qbg.
Note: VEPA mode in IPvlan has limitations. IPvlan uses the mac-address of the
@@ -89,18 +118,25 @@ neighbor will have source and destination mac same. This will make the switch /
router send the redirect message.

6. What to choose (macvlan vs. ipvlan)?
=======================================

These two devices are very similar in many regards and the specific use
case could very well define which device to choose. If one of the following
situations defines your use case then you can choose to use ipvlan:

(a) The Linux host that is connected to the external switch / router has
    policy configured that allows only one mac per port.
(b) The number of virtual devices created on a master exceeds the mac
    capacity and puts the NIC in promiscuous mode, and degraded performance
    is a concern.
(c) If the slave device is to be put into the hostile / untrusted network
    namespace where L2 on the slave could be changed / misused.

6. Example configuration:
=========================

::

  +=============================================================+
  |  Host: host1                                                |
@@ -117,30 +153,37 @@ namespace where L2 on the slave could be changed / misused.
  +==============================#==============================+


(a) Create two network namespaces - ns0, ns1::

        ip netns add ns0
        ip netns add ns1

(b) Create two ipvlan slaves on eth0 (master device)::

        ip link add link eth0 ipvl0 type ipvlan mode l2
        ip link add link eth0 ipvl1 type ipvlan mode l2

(c) Assign slaves to the respective network namespaces::

        ip link set dev ipvl0 netns ns0
        ip link set dev ipvl1 netns ns1

(d) Now switch to the namespace (ns0 or ns1) to configure the slave devices

        - For ns0::

                (1) ip netns exec ns0 bash
                (2) ip link set dev ipvl0 up
                (3) ip link set dev lo up
                (4) ip -4 addr add 127.0.0.1 dev lo
                (5) ip -4 addr add $IPADDR dev ipvl0
                (6) ip -4 route add default via $ROUTER dev ipvl0

        - For ns1::

                (1) ip netns exec ns1 bash
                (2) ip link set dev ipvl1 up
                (3) ip link set dev lo up
                (4) ip -4 addr add 127.0.0.1 dev lo
                (5) ip -4 addr add $IPADDR dev ipvl1
                (6) ip -4 route add default via $ROUTER dev ipvl1

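Step (d) repeats the same commands for each namespace, so it lends itself to a small generator. This is a sketch under assumptions: the helper name ``ns_setup_commands`` is hypothetical, the addresses are placeholders from the documentation range, and each command is prefixed with ``ip netns exec <ns>`` instead of starting an interactive shell as in step (1).

```python
def ns_setup_commands(ns, dev, ipaddr, router):
    """Return the commands of step (d) for one namespace, as strings."""
    prefix = f"ip netns exec {ns} "
    steps = [
        f"ip link set dev {dev} up",
        "ip link set dev lo up",
        "ip -4 addr add 127.0.0.1 dev lo",
        f"ip -4 addr add {ipaddr} dev {dev}",
        f"ip -4 route add default via {router} dev {dev}",
    ]
    return [prefix + step for step in steps]


# Placeholder addresses; substitute the real $IPADDR and $ROUTER values.
for ns, dev, addr in (("ns0", "ipvl0", "192.0.2.10/24"),
                      ("ns1", "ipvl1", "192.0.2.11/24")):
    for line in ns_setup_commands(ns, dev, addr, "192.0.2.1"):
        print(line)
```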
.. SPDX-License-Identifier: GPL-2.0
===========
IPvs-sysctl
===========
/proc/sys/net/ipv4/vs/* Variables:
==================================

am_droprate - INTEGER
        default 10

        It sets the always mode drop rate, which is used in mode 3
        of the drop_packet defense.

amemthresh - INTEGER
        default 1024

        It sets the available memory threshold (in pages), which is
        used in the automatic modes of defense. When there is not
        enough available memory, the respective strategy will be
        enabled and the variable is automatically set to 2, otherwise
        the strategy is disabled and the variable is set to 1.

backup_only - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        If set, disable the director function while the server is
        in backup mode to avoid packet loops for DR/TUN methods.
@@ -44,8 +51,8 @@ conn_reuse_mode - INTEGER
        real servers to a very busy cluster.

conntrack - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        If set, maintain connection tracking entries for
        connections handled by IPVS.
@@ -61,28 +68,28 @@ conntrack - BOOLEAN
        Only available when IPVS is compiled with CONFIG_IP_VS_NFCT enabled.

cache_bypass - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        If it is enabled, forward packets to the original destination
        directly when no cache server is available and destination
        address is not local (iph->daddr is RTN_UNICAST). It is mostly
        used in transparent web cache cluster.

debug_level - INTEGER
        - 0          - transmission error messages (default)
        - 1          - non-fatal error messages
        - 2          - configuration
        - 3          - destination trash
        - 4          - drop entry
        - 5          - service lookup
        - 6          - scheduling
        - 7          - connection new/expire, lookup and synchronization
        - 8          - state transition
        - 9          - binding destination, template checks and applications
        - 10         - IPVS packet transmission
        - 11         - IPVS packet handling (ip_vs_in/ip_vs_out)
        - 12 or more - packet traversal

        Only available when IPVS is compiled with CONFIG_IP_VS_DEBUG enabled.
@@ -92,58 +99,58 @@ debug_level - INTEGER
        the level.

drop_entry - INTEGER
        - 0 - disabled (default)

        The drop_entry defense is to randomly drop entries in the
        connection hash table, just in order to collect back some
        memory for new connections. In the current code, the
        drop_entry procedure can be activated every second, then it
        randomly scans 1/32 of the whole and drops entries that are in
        the SYN-RECV/SYNACK state, which should be effective against
        syn-flooding attack.

        The valid values of drop_entry are from 0 to 3, where 0 means
        that this strategy is always disabled, 1 and 2 mean automatic
        modes (when there is not enough available memory, the strategy
        is enabled and the variable is automatically set to 2,
        otherwise the strategy is disabled and the variable is set to
        1), and 3 means that the strategy is always enabled.

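The four modes described above amount to a small state machine. The sketch below models it in user space for illustration only: the function name ``update_defense_mode`` is hypothetical, and "not enough available memory" is assumed to mean that available memory (in pages) is below ``amemthresh``, as stated in the drop_packet description.

```python
def update_defense_mode(mode, available_memory, amemthresh=1024):
    """Return (new_mode, active) for a defense strategy with modes 0-3."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2 or 3")
    if mode == 0:
        return 0, False   # always disabled
    if mode == 3:
        return 3, True    # always enabled
    # Automatic modes: 2 while memory is short, 1 otherwise.
    if available_memory < amemthresh:
        return 2, True
    return 1, False


print(update_defense_mode(1, available_memory=512))   # memory is short
print(update_defense_mode(2, available_memory=4096))  # memory recovered
```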
drop_packet - INTEGER
        - 0 - disabled (default)

        The drop_packet defense is designed to drop 1/rate packets
        before forwarding them to real servers. If the rate is 1, then
        drop all the incoming packets.

        The value definition is the same as that of the drop_entry. In
        the automatic mode, the rate is determined by the following
        formula: rate = amemthresh / (amemthresh - available_memory)
        when available memory is less than the available memory
        threshold. When mode 3 is set, the always mode drop rate
        is controlled by the /proc/sys/net/ipv4/vs/am_droprate.

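The rate formula above can be worked through with a few numbers. This is a sketch, not the kernel code: the function name ``drop_packet_rate`` is hypothetical, and integer division is an assumption, since the text does not say how the quotient is rounded.

```python
def drop_packet_rate(mode, available_memory, amemthresh=1024, am_droprate=10):
    """Return N such that 1 of every N packets is dropped, or None if idle."""
    if mode == 0:
        return None               # defense disabled
    if mode == 3:
        return am_droprate        # always mode: fixed rate from am_droprate
    if available_memory >= amemthresh:
        return None               # automatic mode, memory is sufficient
    # Automatic mode with memory below the threshold (integer division
    # is an assumption made for this sketch).
    return amemthresh // (amemthresh - available_memory)


for avail in (1000, 512, 0):
    print(avail, drop_packet_rate(1, avail))
```

With the default threshold of 1024 pages, the rate falls toward 1 (drop everything) as available memory approaches zero.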
expire_nodest_conn - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        The default value is 0, the load balancer will silently drop
        packets when its destination server is not available. It may
        be useful when a user-space monitoring program deletes the
        destination server (because of server overload or wrong
        detection) and adds the server back later, so that the
        connections to the server can continue.

        If this feature is enabled, the load balancer will expire the
        connection immediately when a packet arrives and its
        destination server is not available, then the client program
        will be notified that the connection is closed. This is
        equivalent to the feature some people require to flush
        connections when their destination is not available.

expire_quiescent_template - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        When set to a non-zero value, the load balancer will expire
        persistent templates when the destination server is quiescent.
@@ -158,8 +165,8 @@ expire_quiescent_template - BOOLEAN
        connection and the destination server is quiescent.

ignore_tunneled - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        If set, ipvs will set the ipvs_property on all packets which are of
        unrecognized protocols. This prevents us from routing tunneled
@@ -168,30 +175,30 @@ ignore_tunneled - BOOLEAN
        ipvs routing loops when ipvs is also acting as a real server).

nat_icmp_send - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        It controls sending icmp error messages (ICMP_DEST_UNREACH)
        for VS/NAT when the load balancer receives packets from real
        servers but the connection entries don't exist.

pmtu_disc - BOOLEAN
        - 0 - disabled
        - not 0 - enabled (default)

        By default, reject with FRAG_NEEDED all DF packets that exceed
        the PMTU, irrespective of the forwarding method. For TUN method
        the flag can be disabled to fragment such packets.

secure_tcp - INTEGER
        - 0 - disabled (default)

        The secure_tcp defense is to use a more complicated TCP state
        transition table. For VS/NAT, it also delays entering the
        TCP ESTABLISHED state until the three way handshake is completed.

        The value definition is the same as that of drop_entry and
        drop_packet.

sync_threshold - vector of 2 INTEGERs: sync_threshold, sync_period
        default 3 50
@@ -248,8 +255,8 @@ sync_ports - INTEGER
        8848+sync_ports-1.

snat_reroute - BOOLEAN
        - 0 - disabled
        - not 0 - enabled (default)

        If enabled, recalculate the route of SNATed packets from
        realservers so that they are routed as if they originate from the
@@ -270,6 +277,7 @@ sync_persist_mode - INTEGER
        Controls the synchronisation of connections when using persistence

        0: All types of connections are synchronised

        1: Attempt to reduce the synchronisation traffic depending on
        the connection type. For persistent services avoid synchronisation
        for normal connections, do it only for persistence templates.
......
.. SPDX-License-Identifier: GPL-2.0
=============================
Kernel Connection Multiplexor
=============================

Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based
interface over TCP for generic application protocols. With KCM an application
can efficiently send and receive application protocol messages over TCP using
datagram sockets.

KCM implements an NxM multiplexor in the kernel as diagrammed below::

    +------------+   +------------+   +------------+   +------------+
    | KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
    +------------+   +------------+   +------------+   +------------+
        |                 |               |                |
        +-----------+     |               |     +----------+
                    |     |               |     |
                +----------------------------------+
                |           Multiplexor            |
                +----------------------------------+
                    |   |           |           |  |
        +---------+   |           |           |  ------------+
        |             |           |           |              |
    +----------+  +----------+  +----------+  +----------+ +----------+
    |  Psock   |  |  Psock   |  |  Psock   |  |  Psock   | |  Psock   |
    +----------+  +----------+  +----------+  +----------+ +----------+
        |              |           |            |             |
    +----------+  +----------+  +----------+  +----------+ +----------+
    | TCP sock |  | TCP sock |  | TCP sock |  | TCP sock | | TCP sock |
    +----------+  +----------+  +----------+  +----------+ +----------+

KCM sockets
===========

The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
bound to a multiplexor are considered to have equivalent function, and I/O
@@ -37,7 +40,7 @@ operations in different sockets may be done in parallel without the need for
synchronization between threads in userspace.

Multiplexor
===========

The multiplexor provides the message steering. In the transmit path, messages
written on a KCM socket are sent atomically on an appropriate TCP socket.
@@ -45,14 +48,14 @@ Similarly, in the receive path, messages are constructed on each TCP socket
(Psock) and complete messages are steered to a KCM socket.

TCP sockets & Psocks
====================

TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
for each bound TCP socket; this structure holds the state for constructing
messages on receive as well as other connection specific information for KCM.

Connected mode semantics
========================

Each multiplexor assumes that all attached TCP connections are to the same
destination and can use the different connections for load balancing when
@@ -60,7 +63,7 @@ transmitting. The normal send and recv calls (include sendmmsg and recvmmsg)
can be used to send and receive messages from the KCM socket.

Socket types
============

KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.
@@ -110,23 +113,23 @@ User interface
Creating a multiplexor
----------------------

A new multiplexor and initial KCM socket are created by a socket call::

  socket(AF_KCM, type, protocol)

- type is either SOCK_DGRAM or SOCK_SEQPACKET
- protocol is KCMPROTO_CONNECTED

Cloning KCM sockets Cloning KCM sockets
------------------- -------------------
After the first KCM socket is created using the socket call as described After the first KCM socket is created using the socket call as described
above, additional sockets for the multiplexor can be created by cloning above, additional sockets for the multiplexor can be created by cloning
a KCM socket. This is accomplished by an ioctl on a KCM socket: a KCM socket. This is accomplished by an ioctl on a KCM socket::
/* From linux/kcm.h */ /* From linux/kcm.h */
struct kcm_clone { struct kcm_clone {
int fd; int fd;
}; };
struct kcm_clone info; struct kcm_clone info;
...@@ -142,11 +145,11 @@ Attach transport sockets ...@@ -142,11 +145,11 @@ Attach transport sockets
------------------------ ------------------------
Attaching of transport sockets to a multiplexor is performed by calling an Attaching of transport sockets to a multiplexor is performed by calling an
ioctl on a KCM socket for the multiplexor. e.g.: ioctl on a KCM socket for the multiplexor. e.g.::
/* From linux/kcm.h */ /* From linux/kcm.h */
struct kcm_attach { struct kcm_attach {
int fd; int fd;
int bpf_fd; int bpf_fd;
}; };
...@@ -160,18 +163,19 @@ ioctl on a KCM socket for the multiplexor. e.g.: ...@@ -160,18 +163,19 @@ ioctl on a KCM socket for the multiplexor. e.g.:
ioctl(kcmfd, SIOCKCMATTACH, &info); ioctl(kcmfd, SIOCKCMATTACH, &info);
The kcm_attach structure contains: The kcm_attach structure contains:
fd: file descriptor for TCP socket being attached
bpf_prog_fd: file descriptor for compiled BPF program downloaded - fd: file descriptor for TCP socket being attached
- bpf_prog_fd: file descriptor for compiled BPF program downloaded
Unattach transport sockets Unattach transport sockets
-------------------------- --------------------------
Unattaching a transport socket from a multiplexor is straightforward. An Unattaching a transport socket from a multiplexor is straightforward. An
"unattach" ioctl is done with the kcm_unattach structure as the argument: "unattach" ioctl is done with the kcm_unattach structure as the argument::
/* From linux/kcm.h */ /* From linux/kcm.h */
struct kcm_unattach { struct kcm_unattach {
int fd; int fd;
}; };
struct kcm_unattach info; struct kcm_unattach info;
...@@ -190,7 +194,7 @@ When receive is disabled, any pending messages in the socket's ...@@ -190,7 +194,7 @@ When receive is disabled, any pending messages in the socket's
receive buffer are moved to other sockets. This feature is useful receive buffer are moved to other sockets. This feature is useful
if an application thread knows that it will be doing a lot of if an application thread knows that it will be doing a lot of
work on a request and won't be able to service new messages for a work on a request and won't be able to service new messages for a
while. Example use: while. Example use::
int val = 1; int val = 1;
...@@ -200,7 +204,7 @@ BFP programs for message delineation ...@@ -200,7 +204,7 @@ BFP programs for message delineation
------------------------------------ ------------------------------------
BPF programs can be compiled using the BPF LLVM backend. For example, BPF programs can be compiled using the BPF LLVM backend. For example,
the BPF program for parsing Thrift is: the BPF program for parsing Thrift is::
#include "bpf.h" /* for __sk_buff */ #include "bpf.h" /* for __sk_buff */
#include "bpf_helpers.h" /* for load_word intrinsic */ #include "bpf_helpers.h" /* for load_word intrinsic */
...@@ -250,6 +254,7 @@ based on groups, or batches of messages, can be beneficial for performance. ...@@ -250,6 +254,7 @@ based on groups, or batches of messages, can be beneficial for performance.
On transmit, there are three ways an application can batch (pipeline) On transmit, there are three ways an application can batch (pipeline)
messages on a KCM socket. messages on a KCM socket.
1) Send multiple messages in a single sendmmsg. 1) Send multiple messages in a single sendmmsg.
2) Send a group of messages each with a sendmsg call, where all messages 2) Send a group of messages each with a sendmsg call, where all messages
except the last have MSG_BATCH in the flags of sendmsg call. except the last have MSG_BATCH in the flags of sendmsg call.
......
...@@ -99,7 +99,7 @@ treat the LocalTalk device like an ordinary Ethernet device, even if ...@@ -99,7 +99,7 @@ treat the LocalTalk device like an ordinary Ethernet device, even if
that's what it looks like to Netatalk. that's what it looks like to Netatalk.
Instead, you follow the same procedure as for doing IP in EtherTalk. Instead, you follow the same procedure as for doing IP in EtherTalk.
See Documentation/networking/ipddp.txt for more information about the See Documentation/networking/ipddp.rst for more information about the
kernel driver and userspace tools needed. kernel driver and userspace tools needed.
-------------------------------------- --------------------------------------
......
...@@ -1051,7 +1051,7 @@ for more information on hardware timestamps. ...@@ -1051,7 +1051,7 @@ for more information on hardware timestamps.
------------------------------------------------------------------------------- -------------------------------------------------------------------------------
- Packet sockets work well together with Linux socket filters, thus you also - Packet sockets work well together with Linux socket filters, thus you also
might want to have a look at Documentation/networking/filter.txt might want to have a look at Documentation/networking/filter.rst
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
+ THANKS + THANKS
......
...@@ -792,7 +792,7 @@ counters to indicate the ACK is skipped in which scenario. The ACK ...@@ -792,7 +792,7 @@ counters to indicate the ACK is skipped in which scenario. The ACK
would only be skipped if the received packet is either a SYN packet or would only be skipped if the received packet is either a SYN packet or
it has no data. it has no data.
.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt .. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst
* TcpExtTCPACKSkippedSynRecv * TcpExtTCPACKSkippedSynRecv
......
...@@ -3192,7 +3192,7 @@ Q: https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147 ...@@ -3192,7 +3192,7 @@ Q: https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147
T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
F: Documentation/bpf/ F: Documentation/bpf/
F: Documentation/networking/filter.txt F: Documentation/networking/filter.rst
F: arch/*/net/* F: arch/*/net/*
F: include/linux/bpf* F: include/linux/bpf*
F: include/linux/filter.h F: include/linux/filter.h
...@@ -4728,7 +4728,7 @@ DECnet NETWORK LAYER ...@@ -4728,7 +4728,7 @@ DECnet NETWORK LAYER
L: linux-decnet-user@lists.sourceforge.net L: linux-decnet-user@lists.sourceforge.net
S: Orphan S: Orphan
W: http://linux-decnet.sourceforge.net W: http://linux-decnet.sourceforge.net
F: Documentation/networking/decnet.txt F: Documentation/networking/decnet.rst
F: net/decnet/ F: net/decnet/
DECSTATION PLATFORM SUPPORT DECSTATION PLATFORM SUPPORT
...@@ -7815,7 +7815,7 @@ HUAWEI ETHERNET DRIVER ...@@ -7815,7 +7815,7 @@ HUAWEI ETHERNET DRIVER
M: Aviad Krawczyk <aviad.krawczyk@huawei.com> M: Aviad Krawczyk <aviad.krawczyk@huawei.com>
L: netdev@vger.kernel.org L: netdev@vger.kernel.org
S: Supported S: Supported
F: Documentation/networking/hinic.txt F: Documentation/networking/hinic.rst
F: drivers/net/ethernet/huawei/hinic/ F: drivers/net/ethernet/huawei/hinic/
HUGETLB FILESYSTEM HUGETLB FILESYSTEM
...@@ -8934,7 +8934,7 @@ L: lvs-devel@vger.kernel.org ...@@ -8934,7 +8934,7 @@ L: lvs-devel@vger.kernel.org
S: Maintained S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git T: git git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git T: git git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git
F: Documentation/networking/ipvs-sysctl.txt F: Documentation/networking/ipvs-sysctl.rst
F: include/net/ip_vs.h F: include/net/ip_vs.h
F: include/uapi/linux/ip_vs.h F: include/uapi/linux/ip_vs.h
F: net/netfilter/ipvs/ F: net/netfilter/ipvs/
......
...@@ -306,7 +306,7 @@ config ATM_IA ...@@ -306,7 +306,7 @@ config ATM_IA
for more info about the cards. Say Y (or M to compile as a module for more info about the cards. Say Y (or M to compile as a module
named iphase) here if you have one of these cards. named iphase) here if you have one of these cards.
See the file <file:Documentation/networking/iphase.txt> for further See the file <file:Documentation/networking/iphase.rst> for further
details. details.
config ATM_IA_DEBUG config ATM_IA_DEBUG
...@@ -336,7 +336,7 @@ config ATM_FORE200E ...@@ -336,7 +336,7 @@ config ATM_FORE200E
on PCI and SBUS hosts. Say Y (or M to compile as a module on PCI and SBUS hosts. Say Y (or M to compile as a module
named fore_200e) here if you have one of these ATM adapters. named fore_200e) here if you have one of these ATM adapters.
See the file <file:Documentation/networking/fore200e.txt> for See the file <file:Documentation/networking/fore200e.rst> for
further details. further details.
config ATM_FORE200E_USE_TASKLET config ATM_FORE200E_USE_TASKLET
......
...@@ -50,7 +50,7 @@ config BONDING ...@@ -50,7 +50,7 @@ config BONDING
The driver supports multiple bonding modes to allow for both high The driver supports multiple bonding modes to allow for both high
performance and high availability operation. performance and high availability operation.
Refer to <file:Documentation/networking/bonding.txt> for more Refer to <file:Documentation/networking/bonding.rst> for more
information. information.
To compile this driver as a module, choose M here: the module To compile this driver as a module, choose M here: the module
...@@ -126,7 +126,7 @@ config EQUALIZER ...@@ -126,7 +126,7 @@ config EQUALIZER
Linux driver or with a Livingston Portmaster 2e. Linux driver or with a Livingston Portmaster 2e.
Say Y if you want this and read Say Y if you want this and read
<file:Documentation/networking/eql.txt>. You may also want to read <file:Documentation/networking/eql.rst>. You may also want to read
section 6.2 of the NET-3-HOWTO, available from section 6.2 of the NET-3-HOWTO, available from
<http://www.tldp.org/docs.html#howto>. <http://www.tldp.org/docs.html#howto>.
......
...@@ -59,7 +59,7 @@ config COPS ...@@ -59,7 +59,7 @@ config COPS
package. This driver is experimental, which means that it may not package. This driver is experimental, which means that it may not
work. This driver will only work if you choose "AppleTalk DDP" work. This driver will only work if you choose "AppleTalk DDP"
networking support, above. networking support, above.
Please read the file <file:Documentation/networking/cops.txt>. Please read the file <file:Documentation/networking/cops.rst>.
config COPS_DAYNA config COPS_DAYNA
bool "Dayna firmware support" bool "Dayna firmware support"
...@@ -86,7 +86,7 @@ config IPDDP ...@@ -86,7 +86,7 @@ config IPDDP
box is stuck on an AppleTalk only network) or decapsulate (e.g. if box is stuck on an AppleTalk only network) or decapsulate (e.g. if
you want your Linux box to act as an Internet gateway for a zoo of you want your Linux box to act as an Internet gateway for a zoo of
AppleTalk connected Macs). Please see the file AppleTalk connected Macs). Please see the file
<file:Documentation/networking/ipddp.txt> for more information. <file:Documentation/networking/ipddp.rst> for more information.
If you say Y here, the AppleTalk-IP support will be compiled into If you say Y here, the AppleTalk-IP support will be compiled into
the kernel. In this case, you can either use encapsulation or the kernel. In this case, you can either use encapsulation or
...@@ -107,4 +107,4 @@ config IPDDP_ENCAP ...@@ -107,4 +107,4 @@ config IPDDP_ENCAP
IP packets inside AppleTalk frames; this is useful if your Linux box IP packets inside AppleTalk frames; this is useful if your Linux box
is stuck on an AppleTalk network (which hopefully contains a is stuck on an AppleTalk network (which hopefully contains a
decapsulator somewhere). Please see decapsulator somewhere). Please see
<file:Documentation/networking/ipddp.txt> for more information. <file:Documentation/networking/ipddp.rst> for more information.
...@@ -9,7 +9,7 @@ menuconfig ARCNET ...@@ -9,7 +9,7 @@ menuconfig ARCNET
---help--- ---help---
If you have a network card of this type, say Y and check out the If you have a network card of this type, say Y and check out the
(arguably) beautiful poetry in (arguably) beautiful poetry in
<file:Documentation/networking/arcnet.txt>. <file:Documentation/networking/arcnet.rst>.
You need both this driver, and the driver for the particular ARCnet You need both this driver, and the driver for the particular ARCnet
chipset of your card. If you don't know, then it's probably a chipset of your card. If you don't know, then it's probably a
...@@ -28,7 +28,7 @@ config ARCNET_1201 ...@@ -28,7 +28,7 @@ config ARCNET_1201
arc0 device. You need to say Y here to communicate with arc0 device. You need to say Y here to communicate with
industry-standard RFC1201 implementations, like the arcether.com industry-standard RFC1201 implementations, like the arcether.com
packet driver or most DOS/Windows ODI drivers. Please read the packet driver or most DOS/Windows ODI drivers. Please read the
ARCnet documentation in <file:Documentation/networking/arcnet.txt> ARCnet documentation in <file:Documentation/networking/arcnet.rst>
for more information about using arc0. for more information about using arc0.
config ARCNET_1051 config ARCNET_1051
...@@ -42,7 +42,7 @@ config ARCNET_1051 ...@@ -42,7 +42,7 @@ config ARCNET_1051
industry-standard RFC1201 implementations, like the arcether.com industry-standard RFC1201 implementations, like the arcether.com
packet driver or most DOS/Windows ODI drivers. RFC1201 is included packet driver or most DOS/Windows ODI drivers. RFC1201 is included
automatically as the arc0 device. Please read the ARCnet automatically as the arc0 device. Please read the ARCnet
documentation in <file:Documentation/networking/arcnet.txt> for more documentation in <file:Documentation/networking/arcnet.rst> for more
information about using arc0e and arc0s. information about using arc0e and arc0s.
config ARCNET_RAW config ARCNET_RAW
......
...@@ -28,7 +28,7 @@ config CAIF_SPI_SLAVE ...@@ -28,7 +28,7 @@ config CAIF_SPI_SLAVE
The CAIF Link layer SPI Protocol driver for Slave SPI interface. The CAIF Link layer SPI Protocol driver for Slave SPI interface.
This driver implements a platform driver to accommodate for a This driver implements a platform driver to accommodate for a
platform specific SPI device. A sample CAIF SPI Platform device is platform specific SPI device. A sample CAIF SPI Platform device is
provided in <file:Documentation/networking/caif/spi_porting.txt>. provided in <file:Documentation/networking/caif/spi_porting.rst>.
config CAIF_SPI_SYNC config CAIF_SPI_SYNC
bool "Next command and length in start of frame" bool "Next command and length in start of frame"
......
...@@ -30,7 +30,7 @@ config 6PACK ...@@ -30,7 +30,7 @@ config 6PACK
Note that this driver is still experimental and might cause Note that this driver is still experimental and might cause
problems. For details about the features and the usage of the problems. For details about the features and the usage of the
driver, read <file:Documentation/networking/6pack.txt>. driver, read <file:Documentation/networking/6pack.rst>.
To compile this driver as a module, choose M here: the module To compile this driver as a module, choose M here: the module
will be called 6pack. will be called 6pack.
...@@ -127,7 +127,7 @@ config BAYCOM_SER_FDX ...@@ -127,7 +127,7 @@ config BAYCOM_SER_FDX
your serial interface chip. To configure the driver, use the sethdlc your serial interface chip. To configure the driver, use the sethdlc
utility available in the standard ax25 utilities package. For utility available in the standard ax25 utilities package. For
information on the modems, see <http://www.baycom.de/> and information on the modems, see <http://www.baycom.de/> and
<file:Documentation/networking/baycom.txt>. <file:Documentation/networking/baycom.rst>.
To compile this driver as a module, choose M here: the module To compile this driver as a module, choose M here: the module
will be called baycom_ser_fdx. This is recommended. will be called baycom_ser_fdx. This is recommended.
...@@ -145,7 +145,7 @@ config BAYCOM_SER_HDX ...@@ -145,7 +145,7 @@ config BAYCOM_SER_HDX
the driver, use the sethdlc utility available in the standard ax25 the driver, use the sethdlc utility available in the standard ax25
utilities package. For information on the modems, see utilities package. For information on the modems, see
<http://www.baycom.de/> and <http://www.baycom.de/> and
<file:Documentation/networking/baycom.txt>. <file:Documentation/networking/baycom.rst>.
To compile this driver as a module, choose M here: the module To compile this driver as a module, choose M here: the module
will be called baycom_ser_hdx. This is recommended. will be called baycom_ser_hdx. This is recommended.
...@@ -160,7 +160,7 @@ config BAYCOM_PAR ...@@ -160,7 +160,7 @@ config BAYCOM_PAR
par96 designs. To configure the driver, use the sethdlc utility par96 designs. To configure the driver, use the sethdlc utility
available in the standard ax25 utilities package. For information on available in the standard ax25 utilities package. For information on
the modems, see <http://www.baycom.de/> and the file the modems, see <http://www.baycom.de/> and the file
<file:Documentation/networking/baycom.txt>. <file:Documentation/networking/baycom.rst>.
To compile this driver as a module, choose M here: the module To compile this driver as a module, choose M here: the module
will be called baycom_par. This is recommended. will be called baycom_par. This is recommended.
...@@ -175,7 +175,7 @@ config BAYCOM_EPP ...@@ -175,7 +175,7 @@ config BAYCOM_EPP
designs. To configure the driver, use the sethdlc utility available designs. To configure the driver, use the sethdlc utility available
in the standard ax25 utilities package. For information on the in the standard ax25 utilities package. For information on the
modems, see <http://www.baycom.de/> and the file modems, see <http://www.baycom.de/> and the file
<file:Documentation/networking/baycom.txt>. <file:Documentation/networking/baycom.rst>.
To compile this driver as a module, choose M here: the module To compile this driver as a module, choose M here: the module
will be called baycom_epp. This is recommended. will be called baycom_epp. This is recommended.
......
...@@ -336,7 +336,7 @@ config DLCI ...@@ -336,7 +336,7 @@ config DLCI
To use frame relay, you need supporting hardware (called FRAD) and To use frame relay, you need supporting hardware (called FRAD) and
certain programs from the net-tools package as explained in certain programs from the net-tools package as explained in
<file:Documentation/networking/framerelay.txt>. <file:Documentation/networking/framerelay.rst>.
To compile this driver as a module, choose M here: the To compile this driver as a module, choose M here: the
module will be called dlci. module will be called dlci.
...@@ -361,7 +361,7 @@ config SDLA ...@@ -361,7 +361,7 @@ config SDLA
These are multi-protocol cards, but only Frame Relay is supported These are multi-protocol cards, but only Frame Relay is supported
by the driver at this time. Please read by the driver at this time. Please read
<file:Documentation/networking/framerelay.txt>. <file:Documentation/networking/framerelay.rst>.
To compile this driver as a module, choose M here: the To compile this driver as a module, choose M here: the
module will be called sdla. module will be called sdla.
......
...@@ -86,7 +86,7 @@ config INET ...@@ -86,7 +86,7 @@ config INET
"Sysctl support" below, you can change various aspects of the "Sysctl support" below, you can change various aspects of the
behavior of the TCP/IP code by writing to the (virtual) files in behavior of the TCP/IP code by writing to the (virtual) files in
/proc/sys/net/ipv4/*; the options are explained in the file /proc/sys/net/ipv4/*; the options are explained in the file
<file:Documentation/networking/ip-sysctl.txt>. <file:Documentation/networking/ip-sysctl.rst>.
Short answer: say Y. Short answer: say Y.
......
...@@ -16,7 +16,7 @@ config ATM ...@@ -16,7 +16,7 @@ config ATM
of your ATM card below. of your ATM card below.
Note that you need a set of user-space programs to actually make use Note that you need a set of user-space programs to actually make use
of ATM. See the file <file:Documentation/networking/atm.txt> for of ATM. See the file <file:Documentation/networking/atm.rst> for
further details. further details.
config ATM_CLIP config ATM_CLIP
......
...@@ -40,7 +40,7 @@ config AX25 ...@@ -40,7 +40,7 @@ config AX25
radio as well as information about how to configure an AX.25 port is radio as well as information about how to configure an AX.25 port is
contained in the AX25-HOWTO, available from contained in the AX25-HOWTO, available from
<http://www.tldp.org/docs.html#howto>. You might also want to <http://www.tldp.org/docs.html#howto>. You might also want to
check out the file <file:Documentation/networking/ax25.txt> in the check out the file <file:Documentation/networking/ax25.rst> in the
kernel source. More information about digital amateur radio in kernel source. More information about digital amateur radio in
general is on the WWW at general is on the WWW at
<http://www.tapr.org/>. <http://www.tapr.org/>.
...@@ -88,7 +88,7 @@ config NETROM ...@@ -88,7 +88,7 @@ config NETROM
users as well as information about how to configure an AX.25 port is users as well as information about how to configure an AX.25 port is
contained in the Linux Ham Wiki, available from contained in the Linux Ham Wiki, available from
<http://www.linux-ax25.org>. You also might want to check out the <http://www.linux-ax25.org>. You also might want to check out the
file <file:Documentation/networking/ax25.txt>. More information about file <file:Documentation/networking/ax25.rst>. More information about
digital amateur radio in general is on the WWW at digital amateur radio in general is on the WWW at
<http://www.tapr.org/>. <http://www.tapr.org/>.
...@@ -107,7 +107,7 @@ config ROSE ...@@ -107,7 +107,7 @@ config ROSE
users as well as information about how to configure an AX.25 port is users as well as information about how to configure an AX.25 port is
contained in the Linux Ham Wiki, available from contained in the Linux Ham Wiki, available from
<http://www.linux-ax25.org>. You also might want to check out the <http://www.linux-ax25.org>. You also might want to check out the
file <file:Documentation/networking/ax25.txt>. More information about file <file:Documentation/networking/ax25.rst>. More information about
digital amateur radio in general is on the WWW at digital amateur radio in general is on the WWW at
<http://www.tapr.org/>. <http://www.tapr.org/>.
......
...@@ -39,6 +39,6 @@ config CEPH_LIB_USE_DNS_RESOLVER ...@@ -39,6 +39,6 @@ config CEPH_LIB_USE_DNS_RESOLVER
be resolved using the CONFIG_DNS_RESOLVER facility. be resolved using the CONFIG_DNS_RESOLVER facility.
For information on how to use CONFIG_DNS_RESOLVER consult For information on how to use CONFIG_DNS_RESOLVER consult
Documentation/networking/dns_resolver.txt Documentation/networking/dns_resolver.rst
If unsure, say N. If unsure, say N.
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
* Jamal Hadi Salim * Jamal Hadi Salim
* Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru> * Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
* *
* See Documentation/networking/gen_stats.txt * See Documentation/networking/gen_stats.rst
*/ */
#include <linux/types.h> #include <linux/types.h>
......
...@@ -15,7 +15,7 @@ config DECNET ...@@ -15,7 +15,7 @@ config DECNET
<http://linux-decnet.sourceforge.net/>. <http://linux-decnet.sourceforge.net/>.
More detailed documentation is available in More detailed documentation is available in
<file:Documentation/networking/decnet.txt>. <file:Documentation/networking/decnet.rst>.
Be sure to say Y to "/proc file system support" and "Sysctl support" Be sure to say Y to "/proc file system support" and "Sysctl support"
below when using DECnet, since you will need sysctl support to aid below when using DECnet, since you will need sysctl support to aid
...@@ -40,4 +40,4 @@ config DECNET_ROUTER ...@@ -40,4 +40,4 @@ config DECNET_ROUTER
filtering" option will be required for the forthcoming routing daemon filtering" option will be required for the forthcoming routing daemon
to work. to work.
See <file:Documentation/networking/decnet.txt> for more information. See <file:Documentation/networking/decnet.rst> for more information.
...@@ -19,7 +19,7 @@ config DNS_RESOLVER ...@@ -19,7 +19,7 @@ config DNS_RESOLVER
SMB2 later. DNS Resolver is supported by the userspace upcall SMB2 later. DNS Resolver is supported by the userspace upcall
helper "/sbin/dns.resolver" via /etc/request-key.conf. helper "/sbin/dns.resolver" via /etc/request-key.conf.
See <file:Documentation/networking/dns_resolver.txt> for further See <file:Documentation/networking/dns_resolver.rst> for further
information. information.
To compile this as a module, choose M here: the module will be called To compile this as a module, choose M here: the module will be called
......
/* Key type used to cache DNS lookups made by the kernel /* Key type used to cache DNS lookups made by the kernel
* *
* See Documentation/networking/dns_resolver.txt * See Documentation/networking/dns_resolver.rst
* *
* Copyright (c) 2007 Igor Mammedov * Copyright (c) 2007 Igor Mammedov
* Author(s): Igor Mammedov (niallain@gmail.com) * Author(s): Igor Mammedov (niallain@gmail.com)
......
/* Upcall routine, designed to work as a key type and working through /* Upcall routine, designed to work as a key type and working through
* /sbin/request-key to contact userspace when handling DNS queries. * /sbin/request-key to contact userspace when handling DNS queries.
* *
* See Documentation/networking/dns_resolver.txt * See Documentation/networking/dns_resolver.rst
* *
* Copyright (c) 2007 Igor Mammedov * Copyright (c) 2007 Igor Mammedov
* Author(s): Igor Mammedov (niallain@gmail.com) * Author(s): Igor Mammedov (niallain@gmail.com)
......
...@@ -49,7 +49,7 @@ config IP_ADVANCED_ROUTER ...@@ -49,7 +49,7 @@ config IP_ADVANCED_ROUTER
Note that some distributions enable it in startup scripts. Note that some distributions enable it in startup scripts.
For details about rp_filter strict and loose mode read For details about rp_filter strict and loose mode read
<file:Documentation/networking/ip-sysctl.txt>. <file:Documentation/networking/ip-sysctl.rst>.
If unsure, say N here. If unsure, say N here.
......
...@@ -853,7 +853,7 @@ static bool icmp_unreach(struct sk_buff *skb) ...@@ -853,7 +853,7 @@ static bool icmp_unreach(struct sk_buff *skb)
case ICMP_FRAG_NEEDED: case ICMP_FRAG_NEEDED:
/* for documentation of the ip_no_pmtu_disc /* for documentation of the ip_no_pmtu_disc
* values please see * values please see
* Documentation/networking/ip-sysctl.txt * Documentation/networking/ip-sysctl.rst
*/ */
switch (net->ipv4.sysctl_ip_no_pmtu_disc) { switch (net->ipv4.sysctl_ip_no_pmtu_disc) {
default: default:
......
...@@ -13,7 +13,7 @@ menuconfig IPV6 ...@@ -13,7 +13,7 @@ menuconfig IPV6
For general information about IPv6, see For general information about IPv6, see
<https://en.wikipedia.org/wiki/IPv6>. <https://en.wikipedia.org/wiki/IPv6>.
For specific information about IPv6 under Linux, see For specific information about IPv6 under Linux, see
Documentation/networking/ipv6.txt and read the HOWTO at Documentation/networking/ipv6.rst and read the HOWTO at
<http://www.tldp.org/HOWTO/Linux+IPv6-HOWTO/> <http://www.tldp.org/HOWTO/Linux+IPv6-HOWTO/>
To compile this protocol support as a module, choose M here: the To compile this protocol support as a module, choose M here: the
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
* *
* How to get into it: * How to get into it:
* *
* 1) read Documentation/networking/filter.txt * 1) read Documentation/networking/filter.rst
* 2) Run `bpf_asm [-c] <filter-prog file>` to translate into binary * 2) Run `bpf_asm [-c] <filter-prog file>` to translate into binary
* blob that is loadable with xt_bpf, cls_bpf et al. Note: -c will * blob that is loadable with xt_bpf, cls_bpf et al. Note: -c will
* pretty print a C-like construct. * pretty print a C-like construct.
......
...@@ -13,7 +13,7 @@ ...@@ -13,7 +13,7 @@
* for making a verdict when multiple simple BPF programs are combined * for making a verdict when multiple simple BPF programs are combined
* into one in order to prevent parsing same headers multiple times. * into one in order to prevent parsing same headers multiple times.
* *
* More on how to debug BPF opcodes see Documentation/networking/filter.txt * More on how to debug BPF opcodes see Documentation/networking/filter.rst
* which is the main document on BPF. Mini howto for getting started: * which is the main document on BPF. Mini howto for getting started:
* *
* 1) `./bpf_dbg` to enter the shell (shell cmds denoted with '>'): * 1) `./bpf_dbg` to enter the shell (shell cmds denoted with '>'):
......