Commit b4f94e3f authored by David S. Miller's avatar David S. Miller

Merge ssh://nuts.ninka.net//home/davem/src/BK/net-2.5

into kernel.bkbits.net:/home/davem/net-2.5
parents 52a150d8 545ac82c
@@ -17,6 +17,23 @@ extreme-linux and beowulf sites will not work with this version of the driver.
For new versions of the driver, patches for older kernels and the updated
userspace tools, please follow the links at the end of this file.
Table of Contents
=================
Installation
Bond Configuration
Module Parameters
Configuring Multiple Bonds
Switch Configuration
Verifying Bond Configuration
Frequently Asked Questions
High Availability
Promiscuous Sniffing notes
Limitations
Resources and Links
Installation
============
@@ -51,16 +68,21 @@ To install ifenslave.c, do:
# gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
# cp ifenslave /sbin/ifenslave
Bond Configuration
==================

You will need to add at least the following line to /etc/modules.conf
so the bonding driver will automatically load when the bond0 interface is
configured. Refer to the modules.conf manual page for specific modules.conf
syntax details. The Module Parameters section of this document describes each
bonding driver parameter.

alias bond0 bonding

Use standard distribution techniques to define the bond0 network interface. For
example, on modern Red Hat distributions, create an ifcfg-bond0 file in
the /etc/sysconfig/network-scripts directory that resembles the following:

DEVICE=bond0
IPADDR=192.168.1.1
@@ -71,12 +93,12 @@ ONBOOT=yes
BOOTPROTO=none
USERCTL=no

(use appropriate values for your network above)

All interfaces that are part of a bond should have SLAVE and MASTER
definitions. For example, in the case of Red Hat, if you wish to make eth0 and
eth1 a part of the bonding interface bond0, their config files (ifcfg-eth0 and
ifcfg-eth1) should resemble the following:

DEVICE=eth0
USERCTL=no
@@ -85,89 +107,261 @@ MASTER=bond0
SLAVE=yes
BOOTPROTO=none

Use DEVICE=eth1 in the ifcfg-eth1 config file. If you configure a second bonding
interface (bond1), use MASTER=bond1 in the config file to make the network
interface a slave of bond1.
Restart the networking subsystem or just bring up the bonding device if your
administration tools allow it. Otherwise, reboot. On Red Hat distros you can
issue `ifup bond0' or `/etc/rc.d/init.d/network restart'.

If the administration tools of your distribution do not support master/slave
notation in configuring network interfaces, you will need to manually configure
the bonding device with the following commands:

# /sbin/ifconfig bond0 192.168.1.1 up
# /sbin/ifenslave bond0 eth0
# /sbin/ifenslave bond0 eth1

(use appropriate values for your network above)

You can then create a script containing these commands and place it in the
appropriate rc directory.

If you specifically need all network drivers loaded before the bonding driver,
adding the following line to modules.conf will cause the network drivers for
eth0 and eth1 to be loaded before the bonding driver.

probeall bond0 eth0 eth1 bonding

Be careful not to reference bond0 itself at the end of the line, or modprobe
will die in an endless recursive loop.

To have device characteristics (such as MTU size) propagate to the slave devices,
set the bond characteristics before enslaving the device. The characteristics
are propagated during the enslave process.
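For example, a larger MTU could be applied to the bond before any slaves are
attached, so that the slaves inherit it (the MTU value and interface names
below are illustrative only, and assume hardware that supports them):

# ifconfig bond0 mtu 9000 up
# ifenslave bond0 eth0 eth1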
If running SNMP agents, the bonding driver should be loaded before any network
drivers participating in a bond. This requirement is due to the interface
index (ipAdEntIfIndex) being associated to the first interface found with a
given IP address. That is, there is only one ipAdEntIfIndex for each IP
address. For example, if eth0 and eth1 are slaves of bond0 and the driver for
eth0 is loaded before the bonding driver, the interface for the IP address
will be associated with the eth0 interface. In the configuration shown below,
the IP address 192.168.1.1 has an interface index of 2, which indexes to eth0
in the ifDescr table (ifDescr.2).

interfaces.ifTable.ifEntry.ifDescr.1 = lo
interfaces.ifTable.ifEntry.ifDescr.2 = eth0
interfaces.ifTable.ifEntry.ifDescr.3 = eth1
interfaces.ifTable.ifEntry.ifDescr.4 = eth2
interfaces.ifTable.ifEntry.ifDescr.5 = eth3
interfaces.ifTable.ifEntry.ifDescr.6 = bond0
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

This problem is avoided by loading the bonding driver before any network
drivers participating in a bond. Below is an example of loading the bonding
driver first; the IP address 192.168.1.1 is correctly associated with ifDescr.2.

interfaces.ifTable.ifEntry.ifDescr.1 = lo
interfaces.ifTable.ifEntry.ifDescr.2 = bond0
interfaces.ifTable.ifEntry.ifDescr.3 = eth0
interfaces.ifTable.ifEntry.ifDescr.4 = eth1
interfaces.ifTable.ifEntry.ifDescr.5 = eth2
interfaces.ifTable.ifEntry.ifDescr.6 = eth3
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

While some distributions may not report the interface name in ifDescr,
the association between the IP address and IfIndex remains and SNMP
functions such as Interface_Scan_Next will report that association.
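For reference, output like the tables above can typically be obtained by
walking the relevant MIB branches with the ucd-snmp/net-snmp snmpwalk tool;
the exact option syntax depends on the version installed, and the host and
community string here are only placeholders:

snmpwalk -v1 -c public localhost interfaces.ifTable.ifEntry.ifDescr
snmpwalk -v1 -c public localhost ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex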
Module Parameters
=================
Optional parameters for the bonding driver can be supplied as command line
arguments to the insmod command. Typically, these parameters are specified in
the file /etc/modules.conf (see the manual page for modules.conf). The
available bonding driver parameters are listed below. If a parameter is not
specified, the default value is used. When initially configuring a bond, it
is recommended "tail -f /var/log/messages" be run in a separate window to
watch for bonding driver error messages.
It is critical that either the miimon or arp_interval and arp_ip_target
parameters be specified; otherwise, serious network degradation will occur
during link failures.
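For example, a minimal options line that satisfies this for an active-backup
bond (the values here are only a starting point, not a recommendation) is:

alias bond0 bonding
options bond0 mode=1 miimon=100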
mode
Specifies one of four bonding policies. The default is round-robin.
Possible values are:
0 Round-robin policy: Transmit in a sequential order from the
first available slave through the last. This mode provides
load balancing and fault tolerance.
1 Active-backup policy: Only one slave in the bond is active. A
different slave becomes active if, and only if, the active slave
fails. The bond's MAC address is externally visible on only
one port (network adapter) to avoid confusing the switch.
This mode provides fault tolerance.
2 XOR policy: Transmit based on [(source MAC address XOR'd with
destination MAC address) modulo slave count]. This selects the
same slave for each destination MAC address. This mode provides
load balancing and fault tolerance. (A worked example of this
hash follows the list of modes.)
3 Broadcast policy: transmits everything on all slave interfaces.
This mode provides fault tolerance.
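As a worked illustration of the XOR hash (using just the low-order octet of
each address for brevity): with 3 slaves, a source MAC ending in 0x04 and a
destination MAC ending in 0x0e give 0x04 XOR 0x0e = 0x0a, and 0x0a modulo 3
= 1, so the second slave (counting from zero) carries all traffic to that
destination.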
miimon
Specifies the frequency in milli-seconds that MII link monitoring will
occur. A value of zero disables MII link monitoring. A value of
100 is a good starting point. See High Availability section for
additional information. The default value is 0.
downdelay
Specifies the delay time in milli-seconds to disable a link after a
link failure has been detected. This should be a multiple of the miimon
value; otherwise, the value will be rounded. The default value is 0.
updelay
Specifies the delay time in milli-seconds to enable a link after a
link up status has been detected. This should be a multiple of the miimon
value; otherwise, the value will be rounded. The default value is 0.
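As an illustration only, polling every 100 milli-seconds with both delays set
to two polling intervals could be requested with a line such as:

options bond0 miimon=100 downdelay=200 updelay=200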
arp_interval
Specifies the ARP monitoring frequency in milli-seconds.
If ARP monitoring is used in a load-balancing mode (mode 0 or 2), the
switch should be configured in a mode that evenly distributes packets
across all links - such as round-robin. If the switch is configured to
distribute the packets in an XOR fashion, all replies from the ARP
targets will be received on the same link which could cause the other
team members to fail. ARP monitoring should not be used in conjunction
with miimon. A value of 0 disables ARP monitoring. The default value
is 0.
arp_ip_target
Specifies the IP addresses to use when arp_interval is > 0. These are
the targets of the ARP request sent to determine the health of the link
to the targets. Specify these values in ddd.ddd.ddd.ddd format.
Multiple IP addresses must be separated by a comma. At least one IP
address needs to be given for ARP monitoring to work. The maximum number
of targets that can be specified is set at 16.
primary
A string (eth0, eth2, etc) to equate to a primary device. If this
value is entered, and the device is on-line, it will be used first as
the output media. Only when this device is off-line, will alternate
devices be used. Otherwise, once a failover is detected and a new
default output is chosen, it will remain the output media until it too
fails. This is useful when one slave is preferred over another, e.g.
when one slave is 1000Mbps and another is 100Mbps. If the 1000Mbps
slave fails and is later restored, it may be preferred that the faster slave
gracefully become the active slave - without deliberately failing the
100Mbps slave. Specifying a primary is only valid in active-backup mode.
multicast
Integer value for the mode of operation for multicast support.
Possible values are:
0 Disabled (no multicast support)
1 Enabled on active slave only, useful in active-backup mode
2 Enabled on all slaves, this is the default
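As an example of the last two parameters (the interface name and values are
illustrative only), an active-backup bond that prefers eth0 and programs
multicast addresses only on the active slave could be loaded with:

options bond0 mode=1 miimon=100 primary=eth0 multicast=1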
Configuring Multiple Bonds
==========================
If several bonding interfaces are required, the driver must be loaded
multiple times. For example, to configure two bonding interfaces with link
monitoring performed every 100 milli-seconds, the /etc/modules.conf should
resemble the following:
alias bond0 bonding
alias bond1 bonding

options bond0 miimon=100
options bond1 -o bonding1 miimon=100
Configuring Multiple ARP Targets
================================

While ARP monitoring can be done with just one target, it can be useful
in a High Availability setup to have several targets to monitor. In the
case of just one target, the target itself may go down or have a problem
making it unresponsive to ARP requests. Having an additional target (or
several) would increase the reliability of the ARP monitoring.

Multiple ARP targets must be separated by commas as follows:

# example options for ARP monitoring with three targets
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9

For just a single target the options would resemble:

# example options for ARP monitoring with one target
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.100
Switch Configuration
====================
While the switch does not need to be configured when the active-backup
policy is used (mode=1), it does need to be configured for the round-robin,
XOR, and broadcast policies (mode=0, mode=2, and mode=3).
Verifying Bond Configuration
============================
1) Bonding information files
----------------------------
The bonding driver information files reside in the /proc/net/bond* directories.
Sample contents of /proc/net/bond0/info after the driver is loaded with
parameters of mode=0 and miimon=1000 are shown below.
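For example, the file can be displayed with:

# cat /proc/net/bond0/info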
Bonding Mode: load balancing (round-robin)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Slave Interface: eth0
MII Status: up
Link Failure Count: 1
2) Network verification
-----------------------
The network configuration can be verified using the ifconfig command. In
the example below, the bond0 interface is the master (MASTER) while eth0 and
eth1 are slaves (SLAVE). Notice all slaves of bond0 have the same MAC address
(HWaddr) as bond0.
[root]# /sbin/ifconfig
bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
@@ -193,12 +387,13 @@ eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
collisions:0 txqueuelen:100
Interrupt:9 Base address:0x1400

Frequently Asked Questions
==========================
1. Is it SMP safe?

Yes. The old 2.0.xx channel bonding patch was not SMP safe.
The new driver was designed to be SMP safe from the start.

2. What type of cards will work with it?
@@ -209,31 +404,30 @@ Questions :
3. How many bonding devices can I have?

One for each module you load. See section on Module Parameters for how
to accomplish this.

4. How many slaves can a bonding device have?

Limited by the number of network interfaces Linux supports and/or the
number of network cards you can place in your system.

5. What happens when a slave link dies?

If your ethernet cards support MII or ETHTOOL link status monitoring
and the MII monitoring has been enabled in the driver (see description
of module parameters), there will be no adverse consequences. This
release of the bonding driver knows how to get the MII information and
enables or disables its slaves according to their link status.
See section on High Availability for additional information.
For ethernet cards not supporting MII status, the arp_interval and
arp_ip_target parameters must be specified for bonding to work
correctly. If packets have not been sent or received during the
specified arp_interval duration, an ARP request is sent to the targets
to generate send and receive traffic. If after this interval, either
the successful send and/or receive count has not incremented, the next
slave in the sequence will become the active slave.

If neither miimon nor arp_interval is configured, the bonding
driver will not handle this situation very well. The driver will
@@ -245,11 +439,12 @@ Questions :
6. Can bonding be used for High Availability?

Yes, if you use MII monitoring and ALL your cards support MII link
status reporting. See section on High Availability for more information.

7. Which switches/systems does it work with?

In round-robin and XOR mode, it works with systems that support
trunking:

* Cisco 5500 series (look for EtherChannel support).
* SunTrunking software.
@@ -259,7 +454,8 @@ Questions :
units.
* Linux bonding, of course !

In active-backup mode, it should work with any Layer-II switch.
8. Where does a bonding device get its MAC address from?
@@ -297,55 +493,68 @@ Questions :
9. Which transmit policies can be used?

Round-robin, based on the order of enslaving, the output device
is selected based on the next available slave, regardless of
the source and/or destination of the packet.

Active-backup policy that ensures that one and only one device will
transmit at any given moment. Active-backup policy is useful for
implementing high availability solutions using two hubs (see
section on High Availability).

XOR, based on (src hw addr XOR dst hw addr) % slave count. This
policy selects the same slave for each destination hw address.

Broadcast policy transmits everything on all slave interfaces.
High Availability
=================

To implement high availability using the bonding driver, the driver needs to be
compiled as a module, because currently it is the only way to pass parameters
to the driver. This may change in the future.

High availability is achieved by using MII or ETHTOOL status reporting. You
need to verify that all your interfaces support MII or ETHTOOL link status
reporting. On Linux kernel 2.2.17, all the 100 Mbps capable drivers and the
yellowfin gigabit driver support MII. To determine if ETHTOOL link reporting
is available for interface eth0, type "ethtool eth0" and the "Link detected:"
line should contain the correct link status. If your system has an interface
that does not support MII or ETHTOOL status reporting, a failure of its link
will not be detected! A message indicating MII and ETHTOOL is not supported by
a network driver is logged when the bonding driver is loaded with a non-zero
miimon value.

The bonding driver can regularly check all its slaves' links using the ETHTOOL
IOCTL (ETHTOOL_GLINK command) or by checking the MII status registers. The
check interval is specified by the module argument "miimon" (MII monitoring).
It takes an integer that represents the checking time in milliseconds. It
should not come too close to (1000/HZ) (10 milli-seconds on i386) because it
may then reduce the system interactivity. A value of 100 seems to be a good
starting point. It means that a dead link will be detected at most 100
milli-seconds after it goes down.
Example:

# modprobe bonding miimon=100

Or, put the following lines in /etc/modules.conf:

alias bond0 bonding
options bond0 miimon=100

There are currently two policies for high availability. They are dependent on
whether:

a) hosts are connected to a single host or switch that supports trunking

b) hosts are connected to several different switches or a single switch that
   does not support trunking

1) High Availability on a single switch or host - load balancing
----------------------------------------------------------------

It is the easiest to set up and to understand. Simply configure the
remote equipment (host or switch) to aggregate traffic over several
ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
@@ -356,7 +565,7 @@ encounter problems on some buggy switches that disable the trunk for a
long time if all ports in a trunk go down. This is not Linux, but really
the switch (reboot it to ensure).

Example 1 : host to host at twice the speed
+----------+ +----------+
| |eth0 eth0| |
@@ -370,7 +579,7 @@ Example 1 : host to host at double speed
# ifconfig bond0 addr
# ifenslave bond0 eth0 eth1
Example 2 : host to switch at twice the speed
+----------+ +----------+
| |eth0 port1| |
@@ -384,7 +593,9 @@ Example 2 : host to switch at double speed
# ifconfig bond0 addr and port2
# ifenslave bond0 eth0 eth1
2) High Availability on two or more switches (or a single switch without
   trunking support)
---------------------------------------------------------------------------
This mode is more problematic because it relies on the fact that there
are multiple ports and the host's MAC address should be visible on one
@@ -423,14 +634,14 @@ point of failure" solution.
+--------------+ host2 +----------------+
eth0 +-------+ eth1

In this configuration, there is an ISL - Inter Switch Link (could be a trunk),
several servers (host1, host2 ...) attached to both switches each, and one or
more ports to the outside world (port3...). One and only one slave on each host
is active at a time, while all links are still monitored (the system can
detect a failure of active and backup links).
Each time a host changes its active interface, it sticks to the new one until
it goes down. In this example, the hosts are negligibly affected by the
expiration time of the switches' forwarding tables.
If host1 and host2 have the same functionality and are used in load balancing
@@ -460,6 +671,7 @@ Each time the host changes its active interface, it sticks to the new one until
it goes down. In this example, the host is strongly affected by the expiration
time of the switch forwarding table.

3) Adapting to your switches' timing
------------------------------------

If your switches take a long time to go into backup mode, it may be
@@ -488,8 +700,34 @@ Examples :
# modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
# modprobe bonding miimon=100 mode=0 downdelay=0 updelay=5000
Promiscuous Sniffing notes
==========================
If you wish to bond channels together for a network sniffing
application --- you wish to run tcpdump, or ethereal, or an IDS like
snort, with its input aggregated from multiple interfaces using the
bonding driver --- then you need to handle the Promiscuous interface
setting by hand. Specifically, when you "ifconfig bond0 up" you
must add the promisc flag there; it will be propagated down to the
slave interfaces at ifenslave time; a full example might look like:
grep bond0 /etc/modules.conf || echo alias bond0 bonding >>/etc/modules.conf
ifconfig bond0 promisc up
for if in eth1 eth2 ...;do
ifconfig $if up
ifenslave bond0 $if
done
snort ... -i bond0 ...
Ifenslave also wants to propagate addresses from interface to
interface, appropriately for its design functions in HA and channel
capacity aggregating; but it works fine for unnumbered interfaces;
just ignore all the warnings it emits.
Limitations
===========
The main limitations are :
- only the link status is monitored. If the switch on the other side is
partially down (e.g. doesn't forward anymore, but the link is OK), the link
@@ -500,7 +738,13 @@ The main limitations are :
Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
frames.
- A Transmit Load Balancing policy is not currently available. This mode
allows every slave in the bond to transmit while only one receives. If
the "receiving" slave fails, another slave takes over the MAC address of
the failed receiving slave.
Resources and Links
===================
Current development on this driver is posted to:
...
@@ -41,6 +41,16 @@
* - 2002/02/18 Erik Habbinga <erik_habbinga @ hp dot com> :
* - ifr2.ifr_flags was not initialized in the hwaddr_notset case,
* SIOCGIFFLAGS now called before hwaddr_notset test
*
* - 2002/10/31 Tony Cureington <tony.cureington * hp_com> :
* - If the master does not have a hardware address when the first slave
* is enslaved, the master is assigned the hardware address of that
* slave - there is a comment in bonding.c stating "ifenslave takes
* care of this now." This corrects the problem of slaves having
* different hardware addresses in active-backup mode when
* multiple interfaces are specified on a single ifenslave command
* (ifenslave bond0 eth0 eth1).
*
*/

static char *version =
@@ -131,6 +141,7 @@ main(int argc, char **argv)
sa_family_t master_family;
char **spp, *master_ifname, *slave_ifname;
int hwaddr_notset;
int master_up;
while ((c = getopt_long(argc, argv, "acdfrvV?h", longopts, 0)) != EOF)
switch (c) {
@@ -300,10 +311,86 @@ main(int argc, char **argv)
return 1;
}

if (hwaddr_notset) {
/* assign the slave hw address to the
* master since it currently does not
* have one; otherwise, slaves may
* have different hw addresses in
* active-backup mode as seen when enslaving
* using "ifenslave bond0 eth0 eth1" because
* hwaddr_notset is set outside this loop.
* TODO: put this and the "else" portion in
* a function.
*/
goterr = 0;
master_up = 0;
if (if_flags.ifr_flags & IFF_UP) {
if_flags.ifr_flags &= ~IFF_UP;
if (ioctl(skfd, SIOCSIFFLAGS,
&if_flags) < 0) {
goterr = 1;
fprintf(stderr,
"Shutting down "
"interface %s failed: "
"%s\n",
master_ifname,
strerror(errno));
} else {
/* we took the master down,
* so we must bring it up
*/
master_up = 1;
}
}
if (!goterr) {
/* get the slaves MAC address */
strncpy(if_hwaddr.ifr_name,
slave_ifname, IFNAMSIZ);
if (ioctl(skfd, SIOCGIFHWADDR,
&if_hwaddr) < 0) {
fprintf(stderr,
"Could not get MAC "
"address of %s: %s\n",
slave_ifname,
strerror(errno));
strncpy(if_hwaddr.ifr_name,
master_ifname,
IFNAMSIZ);
goterr=1;
}
}
if (!goterr) {
strncpy(if_hwaddr.ifr_name,
master_ifname, IFNAMSIZ);
if (ioctl(skfd, SIOCSIFHWADDR,
&if_hwaddr) < 0) {
fprintf(stderr,
"Could not set MAC "
"address of %s: %s\n",
master_ifname,
strerror(errno));
goterr=1;
} else {
hwaddr_notset = 0;
}
}
if (master_up) {
if_flags.ifr_flags |= IFF_UP;
if (ioctl(skfd, SIOCSIFFLAGS,
&if_flags) < 0) {
fprintf(stderr,
"Bringing up interface "
"%s failed: %s\n",
master_ifname,
strerror(errno));
}
}
} else {
/* we'll assign master's hwaddr to this slave */
if (ifr2.ifr_flags & IFF_UP) {
ifr2.ifr_flags &= ~IFF_UP;
if (ioctl(skfd, SIOCSIFFLAGS, &ifr2) < 0) {
...
@@ -1521,6 +1521,12 @@ M: Kai.Makisara@metla.fi
L: linux-scsi@vger.kernel.org
S: Maintained
SCTP PROTOCOL
P: Jon Grimm
M: jgrimm2@us.ibm.com
L: lksctp-developers@lists.sourceforge.net
S: Supported
SCx200 CPU SUPPORT
P: Christer Weinigel
M: christer@weinigel.se
...
@@ -2028,9 +2028,11 @@ config PPPOE
help
Support for PPP over Ethernet.
This driver requires the latest version of pppd from the CVS
repository at cvs.samba.org. Alternatively, see the
RoaringPenguin package (http://www.roaringpenguin.com/pppoe)
which contains instructions on how to use this driver (under
the heading "Kernel mode PPPoE").
config PPPOATM
tristate "PPP over ATM"
...
@@ -177,20 +177,91 @@
* - Port Gleb Natapov's multicast support patches from 2.4.12
* to 2.4.18 adding support for multicast.
*
* 2002/06/10 - Tony Cureington <tony.cureington * hp_com>
* - corrected uninitialized pointer (ifr.ifr_data) in bond_check_dev_link;
* actually changed function to use MIIPHY, then MIIREG, and finally
* ETHTOOL to determine the link status
* - fixed bad ifr_data pointer assignments in bond_ioctl
* - corrected mode 1 being reported as active-backup in bond_get_info;
* also added text to distinguish type of load balancing (rr or xor)
* - change arp_ip_target module param from "1-12s" (array of 12 ptrs)
* to "s" (a single ptr)
*
* 2002/08/30 - Jay Vosburgh <fubar at us dot ibm dot com>
* - Removed acquisition of xmit_lock in set_multicast_list; caused
* deadlock on SMP (lock is held by caller).
* - Revamped SIOCGMIIPHY, SIOCGMIIREG portion of bond_check_dev_link().
*
* 2002/09/18 - Jay Vosburgh <fubar at us dot ibm dot com>
* - Fixed up bond_check_dev_link() (and callers): removed some magic
* numbers, banished local MII_ defines, wrapped ioctl calls to
* prevent EFAULT errors
*
* 2002/9/30 - Jay Vosburgh <fubar at us dot ibm dot com>
* - make sure the ip target matches the arp_target before saving the
* hw address.
*
* 2002/9/30 - Dan Eisner <eisner at 2robots dot com>
* - make sure my_ip is set before taking down the link, since
* not all switches respond if the source ip is not set.
*
* 2002/10/8 - Janice Girouard <girouard at us dot ibm dot com>
* - read in the local ip address when enslaving a device
* - add primary support
* - make sure 2*arp_interval has passed when a new device
* is brought on-line before taking it down.
*
* 2002/09/11 - Philippe De Muyter <phdm at macqel dot be>
* - Added bond_xmit_broadcast logic.
* - Added bond_mode() support function.
*
* 2002/10/26 - Laurent Deniel <laurent.deniel at free.fr>
* - allow to register multicast addresses only on active slave
* (useful in active-backup mode)
* - add multicast module parameter
* - fix deletion of multicast groups after unloading module
*
* 2002/11/06 - Kameshwara Rayaprolu <kameshwara.rao * wipro_com>
* - Changes to prevent panic from closing the device twice; if we close
* the device in bond_release, we must set the original_flags to down
* so it won't be closed again by the network layer.
*
* 2002/11/07 - Tony Cureington <tony.cureington * hp_com>
* - Fix arp_target_hw_addr memory leak
* - Created activebackup_arp_monitor function to handle arp monitoring
* in active backup mode - the bond_arp_monitor had several problems...
* such as allowing slaves to tx arps sequentially without any delay
* for a response
* - Renamed bond_arp_monitor to loadbalance_arp_monitor and re-wrote
* this function to just handle arp monitoring in load-balancing mode;
* it is a lot more compact now
* - Changes to ensure one and only one slave transmits in active-backup
* mode
* - Robustesize parameters; warn users about bad combinations of
* parameters; also if miimon is specified and a network driver does
* not support MII or ETHTOOL, inform the user of this
* - Changes to support link_failure_count when in arp monitoring mode
* - Fix up/down delay reported in /proc
* - Added version; log version; make version available from "modinfo -d"
* - Fixed problem in bond_check_dev_link - if the first IOCTL (SIOCGMIIPH)
* failed, the ETHTOOL ioctl never got a chance
*
* 2002/11/16 - Laurent Deniel <laurent.deniel at free.fr>
* - fix multicast handling in activebackup_arp_monitor
* - remove one unnecessary and confusing current_slave == slave test
* in activebackup_arp_monitor
*
* 2002/11/17 - Laurent Deniel <laurent.deniel at free.fr>
* - fix bond_slave_info_query when slave_id = num_slaves
*
* 2002/11/19 - Janice Girouard <girouard at us dot ibm dot com>
* - correct ifr_data reference. Update ifr_data reference
* to mii_ioctl_data struct values to avoid confusion.
*
*
* 2002/11/22 - Bert Barbe <bert.barbe at oracle dot com>
* - Add support for multiple arp_ip_target
*
*/ */
#include <linux/config.h>
@@ -201,6 +272,7 @@
#include <linux/interrupt.h>
#include <linux/ioport.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/init.h>
@@ -208,6 +280,7 @@
#include <linux/socket.h>
#include <linux/errno.h>
#include <linux/netdevice.h>
#include <linux/inetdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <net/sock.h>
@@ -225,6 +298,13 @@
#include <asm/dma.h>
#include <asm/uaccess.h>
#define DRV_VERSION "2.4.20-20021210"
#define DRV_RELDATE "December 10, 2002"
#define DRV_NAME "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"
static const char *version =
DRV_NAME ".c:v" DRV_VERSION " (" DRV_RELDATE ")\n";
/* monitor all links that often (in milliseconds). <=0 disables monitoring */
#ifndef BOND_LINK_MON_INTERV
@@ -235,20 +315,31 @@
#define BOND_LINK_ARP_INTERV 0
#endif
#ifndef MAX_ARP_IP_TARGETS
#define MAX_ARP_IP_TARGETS 16
#endif
static int arp_interval = BOND_LINK_ARP_INTERV;
static char *arp_ip_target[MAX_ARP_IP_TARGETS] = { NULL, };
static unsigned long arp_target[MAX_ARP_IP_TARGETS] = { 0, };
static int arp_ip_count = 0;
static u32 my_ip = 0;
char *arp_target_hw_addr = NULL;
static char *primary= NULL;
static int max_bonds = BOND_DEFAULT_MAX_BONDS;
static int miimon = BOND_LINK_MON_INTERV;
static int mode = BOND_MODE_ROUNDROBIN;
static int updelay = 0;
static int downdelay = 0;
#define BOND_MULTICAST_DISABLED 0
#define BOND_MULTICAST_ACTIVE 1
#define BOND_MULTICAST_ALL 2
static int multicast = BOND_MULTICAST_ALL;
static int first_pass = 1;
int bond_cnt;
static struct bonding *these_bonds = NULL;
static struct net_device *dev_bonds = NULL;
@@ -259,13 +350,17 @@ MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
MODULE_PARM(mode, "i");
MODULE_PARM(arp_interval, "i");
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
MODULE_PARM(arp_ip_target, "1-" __MODULE_STRING(MAX_ARP_IP_TARGETS) "s");
MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
MODULE_PARM_DESC(mode, "Mode of operation : 0 for round robin, 1 for active-backup, 2 for xor");
MODULE_PARM(updelay, "i");
MODULE_PARM_DESC(updelay, "Delay before considering link up, in milliseconds");
MODULE_PARM(downdelay, "i");
MODULE_PARM_DESC(downdelay, "Delay before considering link down, in milliseconds");
MODULE_PARM(primary, "s");
MODULE_PARM_DESC(primary, "Primary network device to use");
MODULE_PARM(multicast, "i");
MODULE_PARM_DESC(multicast, "Mode for multicast support : 0 for none, 1 for active slave, 2 for all slaves (default)");
extern void arp_send( int type, int ptype, u32 dest_ip, struct net_device *dev,
u32 src_ip, unsigned char *dest_hw, unsigned char *src_hw,
@@ -276,7 +371,8 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *dev);
static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *dev);
static struct net_device_stats *bond_get_stats(struct net_device *dev);
static void bond_mii_monitor(struct net_device *dev);
static void loadbalance_arp_monitor(struct net_device *dev);
static void activebackup_arp_monitor(struct net_device *dev);
static int bond_event(struct notifier_block *this, unsigned long event, void *ptr);
static void bond_restore_slave_flags(slave_t *slave);
static void bond_mc_list_destroy(struct bonding *bond);
@@ -287,6 +383,7 @@ static inline int dmi_same(struct dev_mc_list *dmi1, struct dev_mc_list *dmi2);
static void bond_set_promiscuity(bonding_t *bond, int inc);
static void bond_set_allmulti(bonding_t *bond, int inc);
static struct dev_mc_list* bond_mc_list_find_dmi(struct dev_mc_list *dmi, struct dev_mc_list *mc_list);
static void bond_mc_update(bonding_t *bond, slave_t *new, slave_t *old);
static void bond_set_slave_inactive_flags(slave_t *slave);
static void bond_set_slave_active_flags(slave_t *slave);
static int bond_enslave(struct net_device *master, struct net_device *slave);
@@ -308,6 +405,47 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length);
#define IS_UP(dev) ((((dev)->flags & (IFF_UP)) == (IFF_UP)) && \
(netif_running(dev) && netif_carrier_ok(dev)))
static void arp_send_all(slave_t *slave)
{
int i;
for ( i=0; (i<MAX_ARP_IP_TARGETS) && arp_target[i]; i++) {
arp_send(ARPOP_REQUEST, ETH_P_ARP, arp_target[i], slave->dev,
my_ip, arp_target_hw_addr, slave->dev->dev_addr,
arp_target_hw_addr);
}
}
static const char *bond_mode(void)
{
switch (mode) {
case BOND_MODE_ROUNDROBIN :
return "load balancing (round-robin)";
case BOND_MODE_ACTIVEBACKUP :
return "fault-tolerance (active-backup)";
case BOND_MODE_XOR :
return "load balancing (xor)";
case BOND_MODE_BROADCAST :
return "fault-tolerance (broadcast)";
default :
return "unknown";
}
}
static const char *multicast_mode(void)
{
switch(multicast) {
case BOND_MULTICAST_DISABLED :
return "disabled";
case BOND_MULTICAST_ACTIVE :
return "active slave only";
case BOND_MULTICAST_ALL :
return "all slaves";
default :
return "unknown";
}
}
static void bond_restore_slave_flags(slave_t *slave)
{
@@ -415,11 +553,24 @@ static u16 bond_check_dev_link(struct net_device *dev)
/* call it and not the others for that team */
/* member. */
/*
* We cannot assume that SIOCGMIIPHY will also read a
* register; not all network drivers (e.g., e100)
* support that.
*/
/* Yes, the mii is overlaid on the ifreq.ifr_ifru */
mii = (struct mii_ioctl_data *)&ifr.ifr_data;
if (IOCTL(dev, &ifr, SIOCGMIIPHY) == 0) {
mii->reg_num = MII_BMSR;
if (IOCTL(dev, &ifr, SIOCGMIIREG) == 0) {
return mii->val_out & BMSR_LSTATUS;
}
}
/* try SIOCETHTOOL ioctl, some drivers cache ETHTOOL_GLINK */
/* for a period of time so we attempt to get link status */
/* from it last if the above MII ioctls fail... */
etool.cmd = ETHTOOL_GLINK;
ifr.ifr_data = (char*)&etool;
if (IOCTL(dev, &ifr, SIOCETHTOOL) == 0) {
@@ -427,26 +578,14 @@ static u16 bond_check_dev_link(struct net_device *dev)
return BMSR_LSTATUS;
}
else {
#ifdef BONDING_DEBUG
printk(KERN_INFO
":: SIOCETHTOOL shows failure \n");
#endif
return(0);
}
}
}
return BMSR_LSTATUS; /* spoof link up ( we can't check it) */
}
@@ -483,7 +622,11 @@ static int bond_open(struct net_device *dev)
init_timer(arp_timer);
arp_timer->expires = jiffies + (arp_interval * HZ / 1000);
arp_timer->data = (unsigned long)dev;
if (mode == BOND_MODE_ACTIVEBACKUP) {
arp_timer->function = (void *)&activebackup_arp_monitor;
} else {
arp_timer->function = (void *)&loadbalance_arp_monitor;
}
add_timer(arp_timer);
}
return 0;
@@ -501,6 +644,10 @@ static int bond_close(struct net_device *master)
}
if (arp_interval> 0) { /* arp interval, in milliseconds. */
del_timer(&bond->arp_timer);
if (arp_target_hw_addr != NULL) {
kfree(arp_target_hw_addr);
arp_target_hw_addr = NULL;
}
}
/* Release the bonded slaves */
@@ -545,9 +692,18 @@ static void bond_mc_list_destroy(struct bonding *bond)
static void bond_mc_add(bonding_t *bond, void *addr, int alen)
{
slave_t *slave;
switch (multicast) {
case BOND_MULTICAST_ACTIVE :
/* write lock already acquired */
if (bond->current_slave != NULL)
dev_mc_add(bond->current_slave->dev, addr, alen, 0);
break;
case BOND_MULTICAST_ALL :
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev)
dev_mc_add(slave->dev, addr, alen, 0);
break;
case BOND_MULTICAST_DISABLED :
break;
}
}
@@ -557,9 +713,19 @@ static void bond_mc_add(bonding_t *bond, void *addr, int alen)
static void bond_mc_delete(bonding_t *bond, void *addr, int alen)
{
slave_t *slave;
switch (multicast) {
case BOND_MULTICAST_ACTIVE :
/* write lock already acquired */
if (bond->current_slave != NULL)
dev_mc_delete(bond->current_slave->dev, addr, alen, 0);
break;
case BOND_MULTICAST_ALL :
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev)
dev_mc_delete(slave->dev, addr, alen, 0);
break;
case BOND_MULTICAST_DISABLED :
break;
}
}
/*
@@ -603,9 +769,19 @@ static inline int dmi_same(struct dev_mc_list *dmi1, struct dev_mc_list *dmi2)
static void bond_set_promiscuity(bonding_t *bond, int inc)
{
slave_t *slave;
switch (multicast) {
case BOND_MULTICAST_ACTIVE :
/* write lock already acquired */
if (bond->current_slave != NULL)
dev_set_promiscuity(bond->current_slave->dev, inc);
break;
case BOND_MULTICAST_ALL :
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev)
dev_set_promiscuity(slave->dev, inc);
break;
case BOND_MULTICAST_DISABLED :
break;
}
}
/*
@@ -614,9 +790,19 @@ static void bond_set_promiscuity(bonding_t *bond, int inc)
static void bond_set_allmulti(bonding_t *bond, int inc)
{
slave_t *slave;
switch (multicast) {
case BOND_MULTICAST_ACTIVE :
/* write lock already acquired */
if (bond->current_slave != NULL)
dev_set_allmulti(bond->current_slave->dev, inc);
break;
case BOND_MULTICAST_ALL :
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev)
dev_set_allmulti(slave->dev, inc);
break;
case BOND_MULTICAST_DISABLED :
break;
}
} }
/* /*
...@@ -641,6 +827,8 @@ static void set_multicast_list(struct net_device *master)
	struct dev_mc_list *dmi;
	unsigned long flags = 0;

	if (multicast == BOND_MULTICAST_DISABLED)
		return;

	/*
	 * Lock the private data for the master
	 */
...@@ -682,6 +870,43 @@ static void set_multicast_list(struct net_device *master)
	write_unlock_irqrestore(&bond->lock, flags);
}
/*
* Update the mc list and multicast-related flags for the new and
* old active slaves (if any) according to the multicast mode
*/
static void bond_mc_update(bonding_t *bond, slave_t *new, slave_t *old)
{
struct dev_mc_list *dmi;
switch(multicast) {
case BOND_MULTICAST_ACTIVE :
if (bond->device->flags & IFF_PROMISC) {
if (old != NULL && new != old)
dev_set_promiscuity(old->dev, -1);
dev_set_promiscuity(new->dev, 1);
}
if (bond->device->flags & IFF_ALLMULTI) {
if (old != NULL && new != old)
dev_set_allmulti(old->dev, -1);
dev_set_allmulti(new->dev, 1);
}
/* first remove all mc addresses from old slave if any,
and _then_ add them to new active slave */
if (old != NULL && new != old) {
for (dmi = bond->device->mc_list; dmi != NULL; dmi = dmi->next)
dev_mc_delete(old->dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
}
for (dmi = bond->device->mc_list; dmi != NULL; dmi = dmi->next)
dev_mc_add(new->dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
break;
case BOND_MULTICAST_ALL :
/* nothing to do: mc list is already up-to-date on all slaves */
break;
case BOND_MULTICAST_DISABLED :
break;
}
}
/*
 * This function counts the number of attached
 * slaves for use by bond_xmit_xor.
...@@ -703,9 +928,16 @@ static int bond_enslave(struct net_device *master_dev,
	bonding_t *bond = NULL;
	slave_t *new_slave = NULL;
	unsigned long flags = 0;
	unsigned long rflags = 0;
	int ndx = 0;
	int err = 0;
	struct dev_mc_list *dmi;
	struct in_ifaddr **ifap;
	struct in_ifaddr *ifa;
	static int (* ioctl)(struct net_device *, struct ifreq *, int);
	struct ifreq ifr;
	struct ethtool_value etool;
	int link_reporting = 0;

	if (master_dev == NULL || slave_dev == NULL) {
		return -ENODEV;
...@@ -758,17 +990,19 @@ static int bond_enslave(struct net_device *master_dev,
	new_slave->dev = slave_dev;

	if (multicast == BOND_MULTICAST_ALL) {
		/* set promiscuity level to new slave */
		if (master_dev->flags & IFF_PROMISC)
			dev_set_promiscuity(slave_dev, 1);

		/* set allmulti level to new slave */
		if (master_dev->flags & IFF_ALLMULTI)
			dev_set_allmulti(slave_dev, 1);

		/* upload master's mc_list to new slave */
		for (dmi = master_dev->mc_list; dmi != NULL; dmi = dmi->next)
			dev_mc_add (slave_dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
	}
/* /*
* queue to the end of the slaves list, make the first element its * queue to the end of the slaves list, make the first element its
...@@ -799,6 +1033,56 @@ static int bond_enslave(struct net_device *master_dev, ...@@ -799,6 +1033,56 @@ static int bond_enslave(struct net_device *master_dev,
new_slave->delay = 0; new_slave->delay = 0;
new_slave->link_failure_count = 0; new_slave->link_failure_count = 0;
if (miimon > 0) {
/* if the network driver for the slave does not support
* ETHTOOL/MII link status reporting, warn the user of this
*/
if ((ioctl = slave_dev->do_ioctl) != NULL) {
etool.cmd = ETHTOOL_GLINK;
ifr.ifr_data = (char*)&etool;
if (IOCTL(slave_dev, &ifr, SIOCETHTOOL) == 0) {
link_reporting = 1;
}
else {
if (IOCTL(slave_dev, &ifr, SIOCGMIIPHY) == 0) {
/* Yes, the mii is overlaid on the
* ifreq.ifr_ifru
*/
((struct mii_ioctl_data*)
(&ifr.ifr_data))->reg_num = 1;
if (IOCTL(slave_dev, &ifr, SIOCGMIIREG)
== 0) {
link_reporting = 1;
}
}
}
}
if ((link_reporting == 0) && (arp_interval == 0)) {
/* miimon is set but a bonded network driver does
* not support ETHTOOL/MII and arp_interval is
* not set
*/
printk(KERN_ERR
"bond_enslave(): MII and ETHTOOL support not "
"available for interface %s, and "
"arp_interval/arp_ip_target module parameters "
"not specified, thus bonding will not detect "
"link failures! see bonding.txt for details.\n",
slave_dev->name);
}
else if (link_reporting == 0) {
/* unable get link status using mii/ethtool */
printk(KERN_WARNING
"bond_enslave: can't get link status from "
"interface %s; the network driver associated "
"with this interface does not support "
"MII or ETHTOOL link status reporting, thus "
"miimon has no effect on this interface.\n",
slave_dev->name);
}
}
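The block above probes the slave driver's do_ioctl hook, trying ETHTOOL_GLINK first and falling back to SIOCGMIIPHY/SIOCGMIIREG, so that miimon can warn when neither reporting path exists. For reference, a minimal user-space sketch of the same ETHTOOL_GLINK query; the interface name eth0 and the error handling are illustrative assumptions, not part of this patch:

	/* sketch: ask a driver for its link state the same way miimon does */
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <net/if.h>
	#include <linux/sockios.h>
	#include <linux/ethtool.h>

	int main(void)
	{
		struct ethtool_value edata = { .cmd = ETHTOOL_GLINK };
		struct ifreq ifr;
		int fd = socket(AF_INET, SOCK_DGRAM, 0);

		memset(&ifr, 0, sizeof(ifr));
		strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);	/* assumed slave name */
		ifr.ifr_data = (char *)&edata;

		if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
			perror("ETHTOOL_GLINK");	/* driver lacks ethtool support */
			return 1;
		}
		printf("link is %s\n", edata.data ? "up" : "down");
		return 0;
	}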
	/* check for initial state */
	if ((miimon <= 0) ||
	    (bond_check_dev_link(slave_dev) == BMSR_LSTATUS)) {
...@@ -806,6 +1090,7 @@ static int bond_enslave(struct net_device *master_dev,
		printk(KERN_CRIT "Initial state of slave_dev is BOND_LINK_UP\n");
#endif
		new_slave->link = BOND_LINK_UP;
		new_slave->jiffies = jiffies;
	}
	else {
#ifdef BONDING_DEBUG
...@@ -832,6 +1117,7 @@ static int bond_enslave(struct net_device *master_dev,
			   is OK, so make this interface the active one */
			bond->current_slave = new_slave;
			bond_set_slave_active_flags(new_slave);
			bond_mc_update(bond, new_slave, NULL);
		}
		else {
#ifdef BONDING_DEBUG
...@@ -839,15 +1125,24 @@ static int bond_enslave(struct net_device *master_dev,
#endif
			bond_set_slave_inactive_flags(new_slave);
		}
read_lock_irqsave(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
ifap= &(((struct in_device *)slave_dev->ip_ptr)->ifa_list);
ifa = *ifap;
my_ip = ifa->ifa_address;
read_unlock_irqrestore(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
/* if there is a primary slave, remember it */
if (primary != NULL)
if( strcmp(primary, new_slave->dev->name) == 0)
bond->primary_slave = new_slave;
	} else {
#ifdef BONDING_DEBUG
		printk(KERN_CRIT "This slave is always active in trunk mode\n");
#endif
		/* always active in trunk mode */
		new_slave->state = BOND_STATE_ACTIVE;
-		if (bond->current_slave == NULL) {
		if (bond->current_slave == NULL)
			bond->current_slave = new_slave;
-		}
	}
	update_slave_cnt(bond);
...@@ -938,6 +1233,7 @@ static int bond_change_active(struct net_device *master_dev, struct net_device *
	    IS_UP(newactive->dev)) {
		bond_set_slave_inactive_flags(oldactive);
		bond_set_slave_active_flags(newactive);
		bond_mc_update(bond, newactive, oldactive);
		bond->current_slave = newactive;
		printk("%s : activate %s(old : %s)\n",
			master_dev->name, newactive->dev->name,
...@@ -978,6 +1274,7 @@ slave_t *change_active_interface(bonding_t *bond) ...@@ -978,6 +1274,7 @@ slave_t *change_active_interface(bonding_t *bond)
newslave = bond->current_slave = bond->next; newslave = bond->current_slave = bond->next;
write_unlock(&bond->ptrlock); write_unlock(&bond->ptrlock);
} else { } else {
printk (" but could not find any %s interface.\n", printk (" but could not find any %s interface.\n",
(mode == BOND_MODE_ACTIVEBACKUP) ? "backup":"other"); (mode == BOND_MODE_ACTIVEBACKUP) ? "backup":"other");
write_lock(&bond->ptrlock); write_lock(&bond->ptrlock);
...@@ -985,16 +1282,38 @@ slave_t *change_active_interface(bonding_t *bond) ...@@ -985,16 +1282,38 @@ slave_t *change_active_interface(bonding_t *bond)
write_unlock(&bond->ptrlock); write_unlock(&bond->ptrlock);
return NULL; /* still no slave, return NULL */ return NULL; /* still no slave, return NULL */
} }
} else if (mode == BOND_MODE_ACTIVEBACKUP) {
/* make sure oldslave doesn't send arps - this could
* cause a ping-pong effect between interfaces since they
* would be able to tx arps - in active backup only one
* slave should be able to tx arps, and that should be
* the current_slave; the only exception is when all
* slaves have gone down, then only one non-current slave can
* send arps at a time; clearing oldslaves' mc list is handled
* later in this function.
*/
bond_set_slave_inactive_flags(oldslave);
} }
mintime = updelay; mintime = updelay;
/* first try the primary link; if arping, a link must tx/rx traffic
* before it can be considered the current_slave - also, we would skip
* slaves between the current_slave and primary_slave that may be up
* and able to arp
*/
if ((bond->primary_slave != NULL) && (arp_interval == 0)) {
if (IS_UP(bond->primary_slave->dev))
newslave = bond->primary_slave;
}
do { do {
if (IS_UP(newslave->dev)) { if (IS_UP(newslave->dev)) {
if (newslave->link == BOND_LINK_UP) { if (newslave->link == BOND_LINK_UP) {
/* this one is immediately usable */ /* this one is immediately usable */
if (mode == BOND_MODE_ACTIVEBACKUP) { if (mode == BOND_MODE_ACTIVEBACKUP) {
bond_set_slave_active_flags(newslave); bond_set_slave_active_flags(newslave);
bond_mc_update(bond, newslave, oldslave);
printk (" and making interface %s the active one.\n", printk (" and making interface %s the active one.\n",
newslave->dev->name); newslave->dev->name);
} }
...@@ -1030,14 +1349,30 @@ slave_t *change_active_interface(bonding_t *bond) ...@@ -1030,14 +1349,30 @@ slave_t *change_active_interface(bonding_t *bond)
bestslave->delay = 0; bestslave->delay = 0;
bestslave->link = BOND_LINK_UP; bestslave->link = BOND_LINK_UP;
bestslave->jiffies = jiffies;
bond_set_slave_active_flags(bestslave); bond_set_slave_active_flags(bestslave);
bond_mc_update(bond, bestslave, oldslave);
write_lock(&bond->ptrlock); write_lock(&bond->ptrlock);
bond->current_slave = bestslave; bond->current_slave = bestslave;
write_unlock(&bond->ptrlock); write_unlock(&bond->ptrlock);
return bestslave; return bestslave;
} }
if ((mode == BOND_MODE_ACTIVEBACKUP) &&
(multicast == BOND_MULTICAST_ACTIVE) &&
(oldslave != NULL)) {
/* flush bonds (master's) mc_list from oldslave since it wasn't
* updated (and deleted) above
*/
bond_mc_list_flush(oldslave->dev, bond->device);
if (bond->device->flags & IFF_PROMISC) {
dev_set_promiscuity(oldslave->dev, -1);
}
if (bond->device->flags & IFF_ALLMULTI) {
dev_set_allmulti(oldslave->dev, -1);
}
}
printk (" but could not find any %s interface.\n", printk (" but could not find any %s interface.\n",
(mode == BOND_MODE_ACTIVEBACKUP) ? "backup":"other"); (mode == BOND_MODE_ACTIVEBACKUP) ? "backup":"other");
...@@ -1081,6 +1416,7 @@ static int bond_release(struct net_device *master, struct net_device *slave)
		return -EINVAL;
	}

	bond->current_arp_slave = NULL;
	our_slave = (slave_t *)bond;
	old_current = bond->current_slave;
	while ((our_slave = our_slave->prev) != (slave_t *)bond) {
...@@ -1101,16 +1437,18 @@ static int bond_release(struct net_device *master, struct net_device *slave)
			/* release the slave from its bond */

			if (multicast == BOND_MULTICAST_ALL) {
				/* flush master's mc_list from slave */
				bond_mc_list_flush (slave, master);

				/* unset promiscuity level from slave */
				if (master->flags & IFF_PROMISC)
					dev_set_promiscuity(slave, -1);

				/* unset allmulti level from slave */
				if (master->flags & IFF_ALLMULTI)
					dev_set_allmulti(slave, -1);
			}

			netdev_set_master(slave, NULL);
...@@ -1122,6 +1460,7 @@ static int bond_release(struct net_device *master, struct net_device *slave) ...@@ -1122,6 +1460,7 @@ static int bond_release(struct net_device *master, struct net_device *slave)
if (slave->flags & IFF_NOARP || if (slave->flags & IFF_NOARP ||
bond->current_slave != NULL) { bond->current_slave != NULL) {
dev_close(slave); dev_close(slave);
our_slave->original_flags &= ~IFF_UP;
} }
bond_restore_slave_flags(our_slave); bond_restore_slave_flags(our_slave);
...@@ -1135,6 +1474,10 @@ static int bond_release(struct net_device *master, struct net_device *slave) ...@@ -1135,6 +1474,10 @@ static int bond_release(struct net_device *master, struct net_device *slave)
update_slave_cnt(bond); update_slave_cnt(bond);
if (bond->primary_slave == our_slave) {
bond->primary_slave = NULL;
}
write_unlock_irqrestore(&bond->lock, flags); write_unlock_irqrestore(&bond->lock, flags);
return 0; /* deletion OK */ return 0; /* deletion OK */
} }
...@@ -1166,12 +1509,28 @@ static int bond_release_all(struct net_device *master)
	}

	bond = (struct bonding *) master->priv;
-	bond->current_slave = NULL;
	bond->current_arp_slave = NULL;

	while ((our_slave = bond->prev) != (slave_t *)bond) {
		slave_dev = our_slave->dev;
		bond->prev = our_slave->prev;
if (multicast == BOND_MULTICAST_ALL
|| (multicast == BOND_MULTICAST_ACTIVE
&& bond->current_slave == our_slave)) {
/* flush master's mc_list from slave */
bond_mc_list_flush (slave_dev, master);
/* unset promiscuity level from slave */
if (master->flags & IFF_PROMISC)
dev_set_promiscuity(slave_dev, -1);
/* unset allmulti level from slave */
if (master->flags & IFF_ALLMULTI)
dev_set_allmulti(slave_dev, -1);
}
kfree(our_slave); kfree(our_slave);
netdev_set_master(slave_dev, NULL); netdev_set_master(slave_dev, NULL);
...@@ -1183,9 +1542,12 @@ static int bond_release_all(struct net_device *master)
		if (slave_dev->flags & IFF_NOARP)
			dev_close(slave_dev);
	}

	bond->current_slave = NULL;
	bond->next = (slave_t *)bond;
	bond->slave_cnt = 0;
	bond->primary_slave = NULL;
-	printk (KERN_INFO "%s: releases all slaves\n", master->name);
	printk (KERN_INFO "%s: released all slaves\n", master->name);

	return 0;
}
...@@ -1291,6 +1653,7 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1291,6 +1653,7 @@ static void bond_mii_monitor(struct net_device *master)
} else { } else {
/* link up again */ /* link up again */
slave->link = BOND_LINK_UP; slave->link = BOND_LINK_UP;
slave->jiffies = jiffies;
printk(KERN_INFO printk(KERN_INFO
"%s: link status up again after %d ms " "%s: link status up again after %d ms "
"for interface %s.\n", "for interface %s.\n",
...@@ -1343,8 +1706,10 @@ static void bond_mii_monitor(struct net_device *master)
			if (slave->delay == 0) {
				/* now the link has been up for long time enough */
				slave->link = BOND_LINK_UP;
				slave->jiffies = jiffies;

-				if (mode == BOND_MODE_ACTIVEBACKUP) {
				if ( (mode == BOND_MODE_ACTIVEBACKUP)
				     || (slave != bond->primary_slave) ) {
					/* prevent it from being the active one */
					slave->state = BOND_STATE_BACKUP;
				}
...@@ -1358,15 +1723,26 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1358,15 +1723,26 @@ static void bond_mii_monitor(struct net_device *master)
"for interface %s.\n", "for interface %s.\n",
master->name, master->name,
dev->name); dev->name);
if ( (bond->primary_slave != NULL)
&& (slave == bond->primary_slave) )
change_active_interface(bond);
} }
else else
slave->delay--; slave->delay--;
/* we'll also look for the mostly eligible slave */ /* we'll also look for the mostly eligible slave */
if (IS_UP(dev) && (slave->delay < mindelay)) { if (bond->primary_slave == NULL) {
if (IS_UP(dev) && (slave->delay < mindelay)) {
mindelay = slave->delay;
bestslave = slave;
}
} else if ( (IS_UP(bond->primary_slave->dev)) ||
( (!IS_UP(bond->primary_slave->dev)) &&
(IS_UP(dev) && (slave->delay < mindelay)) ) ) {
mindelay = slave->delay; mindelay = slave->delay;
bestslave = slave; bestslave = slave;
} }
} }
break; break;
} /* end of switch */ } /* end of switch */
...@@ -1380,7 +1756,8 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1380,7 +1756,8 @@ static void bond_mii_monitor(struct net_device *master)
oldcurrent = bond->current_slave; oldcurrent = bond->current_slave;
read_unlock(&bond->ptrlock); read_unlock(&bond->ptrlock);
if (oldcurrent == NULL) { /* no active interface at the moment */ /* no active interface at the moment or need to bring up the primary */
if (oldcurrent == NULL) { /* no active interface at the moment */
if (bestslave != NULL) { /* last chance to find one ? */ if (bestslave != NULL) { /* last chance to find one ? */
if (bestslave->link == BOND_LINK_UP) { if (bestslave->link == BOND_LINK_UP) {
printk (KERN_INFO printk (KERN_INFO
...@@ -1395,10 +1772,12 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1395,10 +1772,12 @@ static void bond_mii_monitor(struct net_device *master)
bestslave->delay = 0; bestslave->delay = 0;
bestslave->link = BOND_LINK_UP; bestslave->link = BOND_LINK_UP;
bestslave->jiffies = jiffies;
} }
if (mode == BOND_MODE_ACTIVEBACKUP) { if (mode == BOND_MODE_ACTIVEBACKUP) {
bond_set_slave_active_flags(bestslave); bond_set_slave_active_flags(bestslave);
bond_mc_update(bond, bestslave, NULL);
} else { } else {
bestslave->state = BOND_STATE_ACTIVE; bestslave->state = BOND_STATE_ACTIVE;
} }
...@@ -1420,10 +1799,12 @@ static void bond_mii_monitor(struct net_device *master)

/*
 * this function is called regularly to monitor each slave's link
- * insuring that traffic is being sent and received. If the adapter
- * has been dormant, then an arp is transmitted to generate traffic
 * ensuring that traffic is being sent and received when arp monitoring
 * is used in load-balancing mode. if the adapter has been dormant, then an
 * arp is transmitted to generate traffic. see activebackup_arp_monitor for
 * arp monitoring in active backup mode.
 */
-static void bond_arp_monitor(struct net_device *master)
static void loadbalance_arp_monitor(struct net_device *master)
{
	bonding_t *bond;
	unsigned long flags;
...@@ -1439,147 +1820,358 @@ static void bond_arp_monitor(struct net_device *master)
	read_lock_irqsave(&bond->lock, flags);

-	if (!IS_UP(master)) {
-		mod_timer(&bond->arp_timer, next_timer);
-		goto arp_monitor_out;
-	}
-
-	if (rtnl_shlock_nowait()) {
-		goto arp_monitor_out;
-	}
-
-	if (rtnl_exlock_nowait()) {
-		rtnl_shunlock();
-		goto arp_monitor_out;
-	}
	/* TODO: investigate why rtnl_shlock_nowait and rtnl_exlock_nowait
	 * are called below and add comment why they are required...
	 */
	if ((!IS_UP(master)) || rtnl_shlock_nowait()) {
		mod_timer(&bond->arp_timer, next_timer);
		read_unlock_irqrestore(&bond->lock, flags);
		return;
	}

	if (rtnl_exlock_nowait()) {
		rtnl_shunlock();
		mod_timer(&bond->arp_timer, next_timer);
		read_unlock_irqrestore(&bond->lock, flags);
		return;
	}

-	/* see if any of the previous devices are up now (i.e. they have seen a
-	 * response from an arp request sent by another adapter, since they
-	 * have the same hardware address).
-	 */
	/* see if any of the previous devices are up now (i.e. they have
	 * xmt and rcv traffic). the current_slave does not come into
	 * the picture unless it is null. also, slave->jiffies is not needed
	 * here because we send an arp on each slave and give a slave as
	 * long as it needs to get the tx/rx within the delta.
	 * TODO: what about up/down delay in arp mode? it wasn't here before
	 * so it can wait
	 */
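Put concretely, with the_delta_in_ticks = arp_interval * HZ / 1000, the per-slave test applied in the loop below boils down to the following helper; this is an illustrative condensation, not code taken from the patch:

	/* illustrative only: "has this slave both sent and received recently?" */
	static inline int slave_traffic_is_fresh(struct net_device *dev,
						 int the_delta_in_ticks)
	{
		return ((jiffies - dev->trans_start) <= the_delta_in_ticks) &&
		       ((jiffies - dev->last_rx) <= the_delta_in_ticks);
	}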
slave = (slave_t *)bond; slave = (slave_t *)bond;
while ((slave = slave->prev) != (slave_t *)bond) { while ((slave = slave->prev) != (slave_t *)bond) {
read_lock(&bond->ptrlock); if (slave->link != BOND_LINK_UP) {
if ( (!(slave->link == BOND_LINK_UP))
&& (slave != bond->current_slave) ) {
read_unlock(&bond->ptrlock);
if ( ((jiffies - slave->dev->trans_start) <= if (((jiffies - slave->dev->trans_start) <=
the_delta_in_ticks) && the_delta_in_ticks) &&
((jiffies - slave->dev->last_rx) <= ((jiffies - slave->dev->last_rx) <=
the_delta_in_ticks) ) { the_delta_in_ticks)) {
slave->link = BOND_LINK_UP; slave->link = BOND_LINK_UP;
write_lock(&bond->ptrlock); slave->state = BOND_STATE_ACTIVE;
/* primary_slave has no meaning in round-robin
* mode. the window of a slave being up and
* current_slave being null after enslaving
* is closed.
*/
read_lock(&bond->ptrlock);
if (bond->current_slave == NULL) { if (bond->current_slave == NULL) {
slave->state = BOND_STATE_ACTIVE; read_unlock(&bond->ptrlock);
printk(KERN_INFO
"%s: link status definitely up "
"for interface %s, ",
master->name,
slave->dev->name);
change_active_interface(bond);
} else {
read_unlock(&bond->ptrlock);
printk(KERN_INFO
"%s: interface %s is now up\n",
master->name,
slave->dev->name);
}
}
} else {
/* slave->link == BOND_LINK_UP */
/* not all switches will respond to an arp request
* when the source ip is 0, so don't take the link down
* if we don't know our ip yet
*/
if (((jiffies - slave->dev->trans_start) >=
(2*the_delta_in_ticks)) ||
(((jiffies - slave->dev->last_rx) >=
(2*the_delta_in_ticks)) && my_ip !=0)) {
slave->link = BOND_LINK_DOWN;
slave->state = BOND_STATE_BACKUP;
if (slave->link_failure_count < UINT_MAX) {
slave->link_failure_count++;
}
printk(KERN_INFO
"%s: interface %s is now down.\n",
master->name,
slave->dev->name);
read_lock(&bond->ptrlock);
if (slave == bond->current_slave) {
read_unlock(&bond->ptrlock);
change_active_interface(bond);
} else {
read_unlock(&bond->ptrlock);
}
}
}
/* note: if switch is in round-robin mode, all links
* must tx arp to ensure all links rx an arp - otherwise
* links may oscillate or not come up at all; if switch is
* in something like xor mode, there is nothing we can
* do - all replies will be rx'ed on same link causing slaves
* to be unstable during low/no traffic periods
*/
if (IS_UP(slave->dev)) {
arp_send_all(slave);
}
}
rtnl_exunlock();
rtnl_shunlock();
read_unlock_irqrestore(&bond->lock, flags);
/* re-arm the timer */
mod_timer(&bond->arp_timer, next_timer);
}
/*
 * When using arp monitoring in active-backup mode, this function is
 * called to determine whether any backup slaves have gone down or a new
 * current slave needs to be found.
 * The backup slaves never generate traffic; they are considered up simply by
 * receiving traffic. If the current slave goes down, each backup slave is
 * given the opportunity to tx/rx an arp before being taken down - this
 * prevents all slaves from being taken down due to the current slave not
 * sending any traffic for the backups to receive. The arps are not strictly
 * necessary; any tx and rx traffic will keep the current slave up. While any
 * rx traffic will keep the backup slaves up, the current slave is responsible
 * for generating traffic to keep them up regardless of any other traffic they
 * may have received.
 * see loadbalance_arp_monitor for arp monitoring in load balancing mode
 */
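Ignoring the my_ip == 0 special case, the timeouts described above can be summarized as follows; this is a reader's sketch of the logic, not code from the patch:

	/* sketch only: active-backup arp timeouts, delta = arp_interval in jiffies */
	static int current_slave_looks_down(slave_t *cur, int delta)
	{
		/* the current slave must both tx and rx within 2*delta, and gets
		 * another 2*delta of grace (slave->jiffies) after becoming current */
		return (((jiffies - cur->dev->trans_start) >= 2 * delta) ||
			((jiffies - cur->dev->last_rx) >= 2 * delta)) &&
		       ((jiffies - cur->jiffies) >= 2 * delta);
	}

	static int backup_slave_looks_down(slave_t *backup, int delta)
	{
		/* backups only need to rx, and get 3*delta so the current slave
		 * is always taken out before any backup */
		return (jiffies - backup->dev->last_rx) >= 3 * delta;
	}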
static void activebackup_arp_monitor(struct net_device *master)
{
bonding_t *bond;
unsigned long flags;
slave_t *slave;
int the_delta_in_ticks = arp_interval * HZ / 1000;
int next_timer = jiffies + (arp_interval * HZ / 1000);
bond = (struct bonding *) master->priv;
if (master->priv == NULL) {
mod_timer(&bond->arp_timer, next_timer);
return;
}
read_lock_irqsave(&bond->lock, flags);
if (!IS_UP(master)) {
mod_timer(&bond->arp_timer, next_timer);
read_unlock_irqrestore(&bond->lock, flags);
return;
}
/* determine if any slave has come up or any backup slave has
* gone down
* TODO: what about up/down delay in arp mode? it wasn't here before
* so it can wait
*/
slave = (slave_t *)bond;
while ((slave = slave->prev) != (slave_t *)bond) {
if (slave->link != BOND_LINK_UP) {
if ((jiffies - slave->dev->last_rx) <=
the_delta_in_ticks) {
slave->link = BOND_LINK_UP;
write_lock(&bond->ptrlock);
if ((bond->current_slave == NULL) &&
((jiffies - slave->dev->trans_start) <=
the_delta_in_ticks)) {
bond->current_slave = slave; bond->current_slave = slave;
bond_set_slave_active_flags(slave);
bond_mc_update(bond, slave, NULL);
bond->current_arp_slave = NULL;
} else if (bond->current_slave != slave) {
/* this slave has just come up but we
* already have a current slave; this
* can also happen if bond_enslave adds
* a new slave that is up while we are
* searching for a new slave
*/
bond_set_slave_inactive_flags(slave);
bond->current_arp_slave = NULL;
} }
if (slave != bond->current_slave) {
slave->dev->flags |= IFF_NOARP; if (slave == bond->current_slave) {
printk(KERN_INFO
"%s: %s is up and now the "
"active interface\n",
master->name,
slave->dev->name);
} else {
printk(KERN_INFO
"%s: backup interface %s is "
"now up\n",
master->name,
slave->dev->name);
} }
write_unlock(&bond->ptrlock); write_unlock(&bond->ptrlock);
} else { }
if ((jiffies - slave->dev->last_rx) <= } else {
the_delta_in_ticks) { read_lock(&bond->ptrlock);
arp_send(ARPOP_REQUEST, ETH_P_ARP, if ((slave != bond->current_slave) &&
arp_target, slave->dev, (bond->current_arp_slave == NULL) &&
my_ip, arp_target_hw_addr, (((jiffies - slave->dev->last_rx) >=
slave->dev->dev_addr, 3*the_delta_in_ticks) && (my_ip != 0))) {
arp_target_hw_addr); /* a backup slave has gone down; three times
* the delta allows the current slave to be
* taken out before the backup slave.
* note: a non-null current_arp_slave indicates
* the current_slave went down and we are
* searching for a new one; under this
* condition we only take the current_slave
* down - this gives each slave a chance to
* tx/rx traffic before being taken out
*/
read_unlock(&bond->ptrlock);
slave->link = BOND_LINK_DOWN;
if (slave->link_failure_count < UINT_MAX) {
slave->link_failure_count++;
} }
bond_set_slave_inactive_flags(slave);
printk(KERN_INFO
"%s: backup interface %s is now down\n",
master->name,
slave->dev->name);
} else {
read_unlock(&bond->ptrlock);
} }
} else }
read_unlock(&bond->ptrlock);
} }
read_lock(&bond->ptrlock); read_lock(&bond->ptrlock);
slave = bond->current_slave; slave = bond->current_slave;
read_unlock(&bond->ptrlock); read_unlock(&bond->ptrlock);
if (slave != 0) { if (slave != NULL) {
/* see if you need to take down the current_slave, since
* you haven't seen an arp in 2*arp_intervals
*/
if ( ((jiffies - slave->dev->trans_start) >=
(2*the_delta_in_ticks)) ||
((jiffies - slave->dev->last_rx) >=
(2*the_delta_in_ticks)) ) {
if (slave->link == BOND_LINK_UP) { /* if we have sent traffic in the past 2*arp_intervals but
slave->link = BOND_LINK_DOWN; * haven't xmit and rx traffic in that time interval, select
slave->state = BOND_STATE_BACKUP; * a different slave. slave->jiffies is only updated when
/* * a slave first becomes the current_slave - not necessarily
* we want to see arps, otherwise we couldn't * after every arp; this ensures the slave has a full 2*delta
* bring the adapter back online... * before being taken out. if a primary is being used, check
*/ * if it is up and needs to take over as the current_slave
printk(KERN_INFO "%s: link status definitely " */
"down for interface %s, " if ((((jiffies - slave->dev->trans_start) >=
"disabling it", (2*the_delta_in_ticks)) ||
slave->dev->master->name, (((jiffies - slave->dev->last_rx) >=
slave->dev->name); (2*the_delta_in_ticks)) && (my_ip != 0))) &&
/* find a new interface and be verbose */ ((jiffies - slave->jiffies) >= 2*the_delta_in_ticks)) {
change_active_interface(bond);
read_lock(&bond->ptrlock); slave->link = BOND_LINK_DOWN;
slave = bond->current_slave; if (slave->link_failure_count < UINT_MAX) {
read_unlock(&bond->ptrlock); slave->link_failure_count++;
}
printk(KERN_INFO "%s: link status down for "
"active interface %s, disabling it",
master->name,
slave->dev->name);
slave = change_active_interface(bond);
bond->current_arp_slave = slave;
if (slave != NULL) {
slave->jiffies = jiffies;
} }
}
/* } else if ((bond->primary_slave != NULL) &&
* ok, we know up/down, so just send a arp out if there has (bond->primary_slave != slave) &&
* been no activity for a while (bond->primary_slave->link == BOND_LINK_UP)) {
/* at this point, slave is the current_slave */
printk(KERN_INFO
"%s: changing from interface %s to primary "
"interface %s\n",
master->name,
slave->dev->name,
bond->primary_slave->dev->name);
/* primary is up so switch to it */
bond_set_slave_inactive_flags(slave);
bond_mc_update(bond, bond->primary_slave, slave);
write_lock(&bond->ptrlock);
bond->current_slave = bond->primary_slave;
write_unlock(&bond->ptrlock);
slave = bond->primary_slave;
bond_set_slave_active_flags(slave);
slave->jiffies = jiffies;
} else {
bond->current_arp_slave = NULL;
}
/* the current slave must tx an arp to ensure backup slaves
* rx traffic
*/ */
if ((slave != NULL) &&
(((jiffies - slave->dev->last_rx) >= the_delta_in_ticks) &&
(my_ip != 0))) {
arp_send_all(slave);
}
}
if (slave != NULL ) { /* if we don't have a current_slave, search for the next available
if ( ((jiffies - slave->dev->trans_start) >= * backup slave from the current_arp_slave and make it the candidate
the_delta_in_ticks) || * for becoming the current_slave
((jiffies - slave->dev->last_rx) >= */
the_delta_in_ticks) ) { if (slave == NULL) {
arp_send(ARPOP_REQUEST, ETH_P_ARP,
arp_target, slave->dev, if ((bond->current_arp_slave == NULL) ||
my_ip, arp_target_hw_addr, (bond->current_arp_slave == (slave_t *)bond)) {
slave->dev->dev_addr, bond->current_arp_slave = bond->prev;
arp_target_hw_addr);
}
} }
} if (bond->current_arp_slave != (slave_t *)bond) {
bond_set_slave_inactive_flags(bond->current_arp_slave);
slave = bond->current_arp_slave->next;
/* search for next candidate */
do {
if (IS_UP(slave->dev)) {
slave->link = BOND_LINK_BACK;
bond_set_slave_active_flags(slave);
arp_send_all(slave);
slave->jiffies = jiffies;
bond->current_arp_slave = slave;
break;
}
/* if we have no current slave.. try sending /* if the link state is up at this point, we
* an arp on all of the interfaces * mark it down - this can happen if we have
*/ * simultaneous link failures and
* change_active_interface doesn't make this
* one the current slave so it is still marked
* up when it is actually down
*/
if (slave->link == BOND_LINK_UP) {
slave->link = BOND_LINK_DOWN;
if (slave->link_failure_count <
UINT_MAX) {
slave->link_failure_count++;
}
read_lock(&bond->ptrlock); bond_set_slave_inactive_flags(slave);
if (bond->current_slave == NULL) { printk(KERN_INFO
read_unlock(&bond->ptrlock); "%s: backup interface "
slave = (slave_t *)bond; "%s is now down.\n",
while ((slave = slave->prev) != (slave_t *)bond) { master->name,
arp_send(ARPOP_REQUEST, ETH_P_ARP, arp_target, slave->dev->name);
slave->dev, my_ip, arp_target_hw_addr, }
slave->dev->dev_addr, arp_target_hw_addr); } while ((slave = slave->next) !=
bond->current_arp_slave->next);
} }
} }
else {
read_unlock(&bond->ptrlock);
}
rtnl_exunlock();
rtnl_shunlock();
arp_monitor_out:
read_unlock_irqrestore(&bond->lock, flags);
/* re-arm the timer */
mod_timer(&bond->arp_timer, next_timer); mod_timer(&bond->arp_timer, next_timer);
read_unlock_irqrestore(&bond->lock, flags);
} }
#define isdigit(c) (c >= '0' && c <= '9') #define isdigit(c) (c >= '0' && c <= '9')
__inline static int atoi( char **s) __inline static int atoi( char **s)
{ {
...@@ -1720,7 +2312,7 @@ static int bond_slave_info_query(struct net_device *master,
	}
	read_unlock_irqrestore(&bond->lock, flags);

-	if (cur_ndx == info->slave_id) {
	if (slave != (slave_t *)bond) {
		strcpy(info->slave_name, slave->dev->name);
		info->link = slave->link;
		info->state = slave->state;
...@@ -1737,7 +2329,7 @@ static int bond_ioctl(struct net_device *master_dev, struct ifreq *ifr, int cmd)
	struct net_device *slave_dev = NULL;
	struct ifbond *u_binfo = NULL, k_binfo;
	struct ifslave *u_sinfo = NULL, k_sinfo;
-	u16 *data = NULL;
	struct mii_ioctl_data *mii = NULL;
	int ret = 0;

#ifdef BONDING_DEBUG
...@@ -1747,23 +2339,23 @@ static int bond_ioctl(struct net_device *master_dev, struct ifreq *ifr, int cmd)
	switch (cmd) {
	case SIOCGMIIPHY:
-		data = (u16 *)ifr->ifr_data;
-		if (data == NULL) {
		mii = (struct mii_ioctl_data *)&ifr->ifr_data;
		if (mii == NULL) {
			return -EINVAL;
		}
-		data[0] = 0;
		mii->phy_id = 0;
		/* Fall Through */
	case SIOCGMIIREG:
		/*
		 * We do this again just in case we were called by SIOCGMIIREG
		 * instead of SIOCGMIIPHY.
		 */
-		data = (u16 *)ifr->ifr_data;
-		if (data == NULL) {
		mii = (struct mii_ioctl_data *)&ifr->ifr_data;
		if (mii == NULL) {
			return -EINVAL;
		}
-		if (data[1] == 1) {
-			data[3] = bond_check_mii_link(
		if (mii->reg_num == 1) {
			mii->val_out = bond_check_mii_link(
				(struct bonding *)master_dev->priv);
		}
		return 0;
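These emulated MII registers are what let a user-space tool read the bond's overall link state. A rough sketch of that query against bond0 follows; the interface name and the error handling are assumptions, not part of the patch:

	/* sketch: read the bond's aggregate link via the emulated MII registers */
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <net/if.h>
	#include <linux/sockios.h>
	#include <linux/mii.h>

	int main(void)
	{
		struct ifreq ifr;
		/* same overlay convention the driver uses above */
		struct mii_ioctl_data *mii = (struct mii_ioctl_data *)&ifr.ifr_data;
		int fd = socket(AF_INET, SOCK_DGRAM, 0);

		memset(&ifr, 0, sizeof(ifr));
		strncpy(ifr.ifr_name, "bond0", IFNAMSIZ - 1);

		if (fd < 0 || ioctl(fd, SIOCGMIIPHY, &ifr) < 0)
			return 1;
		mii->reg_num = MII_BMSR;	/* register 1: basic mode status */
		if (ioctl(fd, SIOCGMIIREG, &ifr) < 0)
			return 1;
		printf("bond link is %s\n",
		       (mii->val_out & BMSR_LSTATUS) ? "up" : "down");
		return 0;
	}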
...@@ -1846,6 +2438,65 @@ static int bond_accept_fastpath(struct net_device *dev, struct dst_entry *dst) ...@@ -1846,6 +2438,65 @@ static int bond_accept_fastpath(struct net_device *dev, struct dst_entry *dst)
} }
#endif #endif
/*
* in broadcast mode, we send everything to all usable interfaces.
*/
static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *dev)
{
slave_t *slave, *start_at;
struct bonding *bond = (struct bonding *) dev->priv;
unsigned long flags;
struct net_device *device_we_should_send_to = 0;
if (!IS_UP(dev)) { /* bond down */
dev_kfree_skb(skb);
return 0;
}
read_lock_irqsave(&bond->lock, flags);
read_lock(&bond->ptrlock);
slave = start_at = bond->current_slave;
read_unlock(&bond->ptrlock);
if (slave == NULL) { /* we're at the root, get the first slave */
/* no suitable interface, frame not sent */
read_unlock_irqrestore(&bond->lock, flags);
dev_kfree_skb(skb);
return 0;
}
do {
if (IS_UP(slave->dev)
&& (slave->link == BOND_LINK_UP)
&& (slave->state == BOND_STATE_ACTIVE)) {
if (device_we_should_send_to) {
struct sk_buff *skb2;
if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
printk(KERN_ERR "bond_xmit_broadcast: skb_clone() failed\n");
continue;
}
skb2->dev = device_we_should_send_to;
skb2->priority = 1;
dev_queue_xmit(skb2);
}
device_we_should_send_to = slave->dev;
}
} while ((slave = slave->next) != start_at);
if (device_we_should_send_to) {
skb->dev = device_we_should_send_to;
skb->priority = 1;
dev_queue_xmit(skb);
} else
dev_kfree_skb(skb);
/* frame sent to all suitable interfaces */
read_unlock_irqrestore(&bond->lock, flags);
return 0;
}
static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *dev) static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *dev)
{ {
slave_t *slave, *start_at; slave_t *slave, *start_at;
...@@ -1978,14 +2629,25 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *dev)
	}

	/* if we are sending arp packets and don't know
-	   the target hw address, save it so we don't need
-	   to use a broadcast address */
-	if ( (arp_interval > 0) && (arp_target_hw_addr == NULL) &&
	 * the target hw address, save it so we don't need
	 * to use a broadcast address.
	 * don't do this if in active backup mode because the slaves must
	 * receive packets to stay up, and the only ones they receive are
	 * broadcasts.
	 */
	if ( (mode != BOND_MODE_ACTIVEBACKUP) &&
	     (arp_ip_count == 1) &&
	     (arp_interval > 0) && (arp_target_hw_addr == NULL) &&
	     (skb->protocol == __constant_htons(ETH_P_IP) ) ) {
		struct ethhdr *eth_hdr =
			(struct ethhdr *) (((char *)skb->data));
		struct iphdr *ip_hdr = (struct iphdr *)(eth_hdr + 1);

-		arp_target_hw_addr = kmalloc(ETH_ALEN, GFP_KERNEL);
-		memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
		if (arp_target[0] == ip_hdr->daddr) {
			arp_target_hw_addr = kmalloc(ETH_ALEN, GFP_KERNEL);
			if (arp_target_hw_addr != NULL)
				memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
		}
	}
read_lock_irqsave(&bond->lock, flags); read_lock_irqsave(&bond->lock, flags);
...@@ -2074,29 +2736,7 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length)
	 */
	link = bond_check_mii_link(bond);

-	len += sprintf(buf + len, "Bonding Mode: ");
-
-	switch (mode) {
-	case BOND_MODE_ACTIVEBACKUP:
-		len += sprintf(buf + len, "%s\n", "active-backup");
-		break;
-	case BOND_MODE_ROUNDROBIN:
-		len += sprintf(buf + len, "%s\n", "load balancing (round-robin)");
-		break;
-	case BOND_MODE_XOR:
-		len += sprintf(buf + len, "%s\n", "load balancing (xor)");
-		break;
-	default:
-		len += sprintf(buf + len, "%s\n", "unknown");
-		break;
-	}
	len += sprintf(buf + len, "Bonding Mode: %s\n", bond_mode());

	if (mode == BOND_MODE_ACTIVEBACKUP) {
		read_lock_irqsave(&bond->lock, flags);
...@@ -2115,8 +2755,11 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length)
			link == BMSR_LSTATUS ? "up\n" : "down\n");
	len += sprintf(buf + len, "MII Polling Interval (ms): %d\n",
			miimon);
-	len += sprintf(buf + len, "Up Delay (ms): %d\n", updelay);
-	len += sprintf(buf + len, "Down Delay (ms): %d\n", downdelay);
	len += sprintf(buf + len, "Up Delay (ms): %d\n",
			updelay * miimon);
	len += sprintf(buf + len, "Down Delay (ms): %d\n",
			downdelay * miimon);
	len += sprintf(buf + len, "Multicast Mode: %s\n", multicast_mode());

	read_lock_irqsave(&bond->lock, flags);
	for (slave = bond->prev; slave != (slave_t *)bond;
...@@ -2205,6 +2848,7 @@ static struct notifier_block bond_netdev_notifier = { ...@@ -2205,6 +2848,7 @@ static struct notifier_block bond_netdev_notifier = {
static int __init bond_init(struct net_device *dev) static int __init bond_init(struct net_device *dev)
{ {
bonding_t *bond, *this_bond, *last_bond; bonding_t *bond, *this_bond, *last_bond;
int count;
#ifdef BONDING_DEBUG #ifdef BONDING_DEBUG
printk (KERN_INFO "Begin bond_init for %s\n", dev->name); printk (KERN_INFO "Begin bond_init for %s\n", dev->name);
...@@ -2228,6 +2872,7 @@ static int __init bond_init(struct net_device *dev) ...@@ -2228,6 +2872,7 @@ static int __init bond_init(struct net_device *dev)
bond->next = bond->prev = (slave_t *)bond; bond->next = bond->prev = (slave_t *)bond;
bond->current_slave = NULL; bond->current_slave = NULL;
bond->current_arp_slave = NULL;
bond->device = dev; bond->device = dev;
dev->priv = bond; dev->priv = bond;
...@@ -2238,6 +2883,8 @@ static int __init bond_init(struct net_device *dev) ...@@ -2238,6 +2883,8 @@ static int __init bond_init(struct net_device *dev)
dev->hard_start_xmit = bond_xmit_roundrobin; dev->hard_start_xmit = bond_xmit_roundrobin;
} else if (mode == BOND_MODE_XOR) { } else if (mode == BOND_MODE_XOR) {
dev->hard_start_xmit = bond_xmit_xor; dev->hard_start_xmit = bond_xmit_xor;
} else if (mode == BOND_MODE_BROADCAST) {
dev->hard_start_xmit = bond_xmit_broadcast;
} else { } else {
printk(KERN_ERR "Unknown bonding mode %d\n", mode); printk(KERN_ERR "Unknown bonding mode %d\n", mode);
kfree(bond->stats); kfree(bond->stats);
...@@ -2272,7 +2919,18 @@ static int __init bond_init(struct net_device *dev)
	} else {
		printk("out MII link monitoring");
	}
-	printk(", in %s mode.\n",mode?"active-backup":"bonding");
	printk(", in %s mode.\n", bond_mode());

	printk(KERN_INFO "%s registered with", dev->name);
	if (arp_interval > 0) {
		printk(" ARP monitoring set to %d ms with %d target(s):",
			arp_interval, arp_ip_count);
		for (count=0 ; count<arp_ip_count ; count++)
			printk (" %s", arp_ip_target[count]);
		printk("\n");
	} else {
		printk("out ARP monitoring\n");
	}

#ifdef CONFIG_PROC_FS
	bond->bond_proc_dir = proc_mkdir(dev->name, proc_net);
...@@ -2329,6 +2987,8 @@ static int __init bonding_init(void)
	/* Find a name for this unit */
	static struct net_device *dev_bond = NULL;

	printk(KERN_INFO "%s", version);

	if (max_bonds < 1 || max_bonds > INT_MAX) {
		printk(KERN_WARNING
		       "bonding_init(): max_bonds (%d) not in range %d-%d, "
...@@ -2343,6 +3003,14 @@
	}
	memset(dev_bonds, 0, max_bonds*sizeof(struct net_device));
if (miimon < 0) {
printk(KERN_WARNING
"bonding_init(): miimon module parameter (%d), "
"not in range 0-%d, so it was reset to %d\n",
miimon, INT_MAX, BOND_LINK_MON_INTERV);
miimon = BOND_LINK_MON_INTERV;
}
if (updelay < 0) { if (updelay < 0) {
printk(KERN_WARNING printk(KERN_WARNING
"bonding_init(): updelay module parameter (%d), " "bonding_init(): updelay module parameter (%d), "
...@@ -2359,6 +3027,52 @@ static int __init bonding_init(void) ...@@ -2359,6 +3027,52 @@ static int __init bonding_init(void)
downdelay = 0; downdelay = 0;
} }
if (miimon == 0) {
if ((updelay != 0) || (downdelay != 0)) {
/* just warn the user the up/down delay will have
* no effect since miimon is zero...
*/
printk(KERN_WARNING
"bonding_init(): miimon module parameter not "
"set and updelay (%d) or downdelay (%d) module "
"parameter is set; updelay and downdelay have "
"no effect unless miimon is set\n",
updelay, downdelay);
}
} else {
/* don't allow arp monitoring */
if (arp_interval != 0) {
printk(KERN_WARNING
"bonding_init(): miimon (%d) and arp_interval "
"(%d) can't be used simultaneously, "
"disabling ARP monitoring\n",
miimon, arp_interval);
arp_interval = 0;
}
if ((updelay % miimon) != 0) {
/* updelay will be rounded in bond_init() when it
* is divided by miimon, we just inform user here
*/
printk(KERN_WARNING
"bonding_init(): updelay (%d) is not a multiple "
"of miimon (%d), updelay rounded to %d ms\n",
updelay, miimon, (updelay / miimon) * miimon);
}
if ((downdelay % miimon) != 0) {
/* downdelay will be rounded in bond_init() when it
* is divided by miimon, we just inform user here
*/
printk(KERN_WARNING
"bonding_init(): downdelay (%d) is not a "
"multiple of miimon (%d), downdelay rounded "
"to %d ms\n",
downdelay, miimon,
(downdelay / miimon) * miimon);
}
}
if (arp_interval < 0) { if (arp_interval < 0) {
printk(KERN_WARNING printk(KERN_WARNING
"bonding_init(): arp_interval module parameter (%d), " "bonding_init(): arp_interval module parameter (%d), "
...@@ -2367,11 +3081,63 @@ static int __init bonding_init(void) ...@@ -2367,11 +3081,63 @@ static int __init bonding_init(void)
arp_interval = BOND_LINK_ARP_INTERV; arp_interval = BOND_LINK_ARP_INTERV;
} }
-	if (arp_ip_target) {
-		/* TODO: check and log bad ip address */
-		if (my_inet_aton(arp_ip_target, &arp_target) == 0) {
-			arp_interval = 0;
-		}
-	}
	for (arp_ip_count=0 ;
	     (arp_ip_count < MAX_ARP_IP_TARGETS) && arp_ip_target[arp_ip_count];
	     arp_ip_count++ ) {
		/* TODO: check and log bad ip address */
		if (my_inet_aton(arp_ip_target[arp_ip_count],
				 &arp_target[arp_ip_count]) == 0) {
			printk(KERN_WARNING
			       "bonding_init(): bad arp_ip_target module "
			       "parameter (%s), ARP monitoring will not be "
			       "performed\n",
			       arp_ip_target[arp_ip_count]);
			arp_interval = 0;
		}
	}
if ( (arp_interval > 0) && (arp_ip_count==0)) {
/* don't allow arping if no arp_ip_target given... */
printk(KERN_WARNING
"bonding_init(): arp_interval module parameter "
"(%d) specified without providing an arp_ip_target "
"parameter, arp_interval was reset to 0\n",
arp_interval);
arp_interval = 0;
}
if ((miimon == 0) && (arp_interval == 0)) {
/* miimon and arp_interval not set, we need one so things
* work as expected, see bonding.txt for details
*/
printk(KERN_ERR
"bonding_init(): either miimon or "
"arp_interval and arp_ip_target module parameters "
"must be specified, otherwise bonding will not detect "
"link failures! see bonding.txt for details.\n");
}
	if ((primary != NULL) && (mode != BOND_MODE_ACTIVEBACKUP)) {
		/* currently, using a primary only makes sense
		 * in active backup mode
		 */
		printk(KERN_WARNING
		       "bonding_init(): %s primary device specified but has "
		       "no effect in %s mode\n",
		       primary, bond_mode());
		primary = NULL;
	}
if (multicast != BOND_MULTICAST_DISABLED &&
multicast != BOND_MULTICAST_ACTIVE &&
multicast != BOND_MULTICAST_ALL) {
printk(KERN_WARNING
"bonding_init(): unknown multicast module "
"parameter (%d), multicast reset to %d\n",
multicast, BOND_MULTICAST_ALL);
multicast = BOND_MULTICAST_ALL;
} }
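With the parameter checks above in place, a modules.conf entry exercising the new options could look roughly like the line below; the parameter names follow this patch, while the addresses and the numeric value chosen for multicast are only examples - see the updated bonding.txt for the authoritative names and ranges:

	options bond0 mode=1 arp_interval=1000 arp_ip_target=192.168.0.1,192.168.0.254 primary=eth0 multicast=2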
for (no = 0; no < max_bonds; no++) { for (no = 0; no < max_bonds; no++) {
...@@ -2420,6 +3186,7 @@ static void __exit bonding_exit(void)
module_init(bonding_init);
module_exit(bonding_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION(DRV_DESCRIPTION ", v" DRV_VERSION);

/*
 * Local variables:
......
...@@ -5,15 +5,15 @@ ...@@ -5,15 +5,15 @@
* PPPoE --- PPP over Ethernet (RFC 2516) * PPPoE --- PPP over Ethernet (RFC 2516)
* *
* *
* Version: 0.6.11 * Version: 0.7.0
* *
* 220102 : Fix module use count on failure in pppoe_create, pppox_sk -acme * 220102 : Fix module use count on failure in pppoe_create, pppox_sk -acme
* 030700 : Fixed connect logic to allow for disconnect. * 030700 : Fixed connect logic to allow for disconnect.
* 270700 : Fixed potential SMP problems; we must protect against * 270700 : Fixed potential SMP problems; we must protect against
* simultaneous invocation of ppp_input * simultaneous invocation of ppp_input
* and ppp_unregister_channel. * and ppp_unregister_channel.
* 040800 : Respect reference count mechanisms on net-devices. * 040800 : Respect reference count mechanisms on net-devices.
* 200800 : fix kfree(skb) in pppoe_rcv (acme) * 200800 : fix kfree(skb) in pppoe_rcv (acme)
* Module reference count is decremented in the right spot now, * Module reference count is decremented in the right spot now,
* guards against sock_put not actually freeing the sk * guards against sock_put not actually freeing the sk
* in pppoe_release. * in pppoe_release.
...@@ -30,13 +30,14 @@ ...@@ -30,13 +30,14 @@
* the original skb that was passed in on success, never on * the original skb that was passed in on success, never on
* failure. Delete the copy of the skb on failure to avoid * failure. Delete the copy of the skb on failure to avoid
* a memory leak. * a memory leak.
* 081001 : Misc. cleanup (licence string, non-blocking, prevent * 081001 : Misc. cleanup (licence string, non-blocking, prevent
* reference of device on close). * reference of device on close).
* 121301 : New ppp channels interface; cannot unregister a channel * 121301 : New ppp channels interface; cannot unregister a channel
* from interrupts. Thus, we mark the socket as a ZOMBIE * from interrupts. Thus, we mark the socket as a ZOMBIE
* and do the unregistration later. * and do the unregistration later.
* 081002 : seq_file support for proc stuff -acme * 081002 : seq_file support for proc stuff -acme
* * 111602 : Merge all 2.4 fixes into 2.5/2.6 tree. Label 2.5/2.6
* as version 0.7. Spacing cleanup.
* Author: Michal Ostrowski <mostrows@speakeasy.net> * Author: Michal Ostrowski <mostrows@speakeasy.net>
* Contributors: * Contributors:
* Arnaldo Carvalho de Melo <acme@conectiva.com.br> * Arnaldo Carvalho de Melo <acme@conectiva.com.br>
...@@ -381,8 +382,8 @@ int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb) ...@@ -381,8 +382,8 @@ int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
* *
***********************************************************************/ ***********************************************************************/
static int pppoe_rcv(struct sk_buff *skb, static int pppoe_rcv(struct sk_buff *skb,
struct net_device *dev, struct net_device *dev,
struct packet_type *pt) struct packet_type *pt)
{ {
struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw; struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw;
...@@ -398,7 +399,7 @@ static int pppoe_rcv(struct sk_buff *skb, ...@@ -398,7 +399,7 @@ static int pppoe_rcv(struct sk_buff *skb,
} }
sk = po->sk; sk = po->sk;
bh_lock_sock(sk); bh_lock_sock(sk);
/* Socket state is unknown, must put skb into backlog. */ /* Socket state is unknown, must put skb into backlog. */
if (sock_owned_by_user(sk) != 0) { if (sock_owned_by_user(sk) != 0) {
...@@ -443,8 +444,10 @@ static int pppoe_disc_rcv(struct sk_buff *skb, ...@@ -443,8 +444,10 @@ static int pppoe_disc_rcv(struct sk_buff *skb,
* what kind of SKB it is during backlog rcv. * what kind of SKB it is during backlog rcv.
*/ */
if (sock_owned_by_user(sk) == 0) { if (sock_owned_by_user(sk) == 0) {
/* We're no longer connect at the PPPOE layer,
* and must wait for ppp channel to disconnect us.
*/
sk->state = PPPOX_ZOMBIE; sk->state = PPPOX_ZOMBIE;
pppox_unbind_sock(sk);
} }
bh_unlock_sock(sk); bh_unlock_sock(sk);
...@@ -583,8 +586,7 @@ int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, ...@@ -583,8 +586,7 @@ int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
if ((sk->state & PPPOX_CONNECTED) && sp->sa_addr.pppoe.sid) if ((sk->state & PPPOX_CONNECTED) && sp->sa_addr.pppoe.sid)
goto end; goto end;
/* Check for already disconnected sockets, /* Check for already disconnected sockets, on attempts to disconnect */
on attempts to disconnect */
error = -EALREADY; error = -EALREADY;
if((sk->state & PPPOX_DEAD) && !sp->sa_addr.pppoe.sid ) if((sk->state & PPPOX_DEAD) && !sp->sa_addr.pppoe.sid )
goto end; goto end;
...@@ -596,7 +598,8 @@ int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, ...@@ -596,7 +598,8 @@ int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
/* Delete the old binding */ /* Delete the old binding */
delete_item(po->pppoe_pa.sid,po->pppoe_pa.remote); delete_item(po->pppoe_pa.sid,po->pppoe_pa.remote);
dev_put(po->pppoe_dev); if(po->pppoe_dev)
dev_put(po->pppoe_dev);
memset(po, 0, sizeof(struct pppox_opt)); memset(po, 0, sizeof(struct pppox_opt));
po->sk = sk; po->sk = sk;
...@@ -994,7 +997,7 @@ static int pppoe_seq_show(struct seq_file *seq, void *v) ...@@ -994,7 +997,7 @@ static int pppoe_seq_show(struct seq_file *seq, void *v)
po->pppoe_pa.remote[2], po->pppoe_pa.remote[3], po->pppoe_pa.remote[2], po->pppoe_pa.remote[3],
po->pppoe_pa.remote[4], po->pppoe_pa.remote[5], dev_name); po->pppoe_pa.remote[4], po->pppoe_pa.remote[5], dev_name);
out: out:
return 0; return 0;
} }
static __inline__ struct pppox_opt *pppoe_get_idx(loff_t pos) static __inline__ struct pppox_opt *pppoe_get_idx(loff_t pos)
...@@ -1064,10 +1067,10 @@ static int pppoe_seq_open(struct inode *inode, struct file *file) ...@@ -1064,10 +1067,10 @@ static int pppoe_seq_open(struct inode *inode, struct file *file)
} }
static struct file_operations pppoe_seq_fops = { static struct file_operations pppoe_seq_fops = {
.open = pppoe_seq_open, .open = pppoe_seq_open,
.read = seq_read, .read = seq_read,
.llseek = seq_lseek, .llseek = seq_lseek,
.release = seq_release, .release = seq_release,
}; };
#endif /* CONFIG_PROC_FS */ #endif /* CONFIG_PROC_FS */
......
...@@ -5,9 +5,9 @@
 * PPPoE --- PPP over Ethernet (RFC 2516)
 *
 *
- * Version:	0.5.1
 * Version:	0.5.2
 *
- * Author:	Michal Ostrowski <mostrows@styx.uwaterloo.ca>
 * Author:	Michal Ostrowski <mostrows@speakeasy.net>
 *
 * 051000 :	Initialization cleanup
 *
...@@ -56,8 +56,8 @@ int register_pppox_proto(int proto_num, struct pppox_proto *pp) ...@@ -56,8 +56,8 @@ int register_pppox_proto(int proto_num, struct pppox_proto *pp)
void unregister_pppox_proto(int proto_num) void unregister_pppox_proto(int proto_num)
{ {
if (proto_num >= 0 && proto_num <= PX_MAX_PROTO) { if (proto_num >= 0 && proto_num <= PX_MAX_PROTO) {
proto[proto_num] = NULL; proto[proto_num] = NULL;
MOD_DEC_USE_COUNT; MOD_DEC_USE_COUNT;
} }
} }
...@@ -65,9 +65,9 @@ void pppox_unbind_sock(struct sock *sk)
{
	/* Clear connection to ppp device, if attached. */
-	if (sk->state & PPPOX_BOUND) {
	if (sk->state & (PPPOX_BOUND|PPPOX_ZOMBIE)) {
		ppp_unregister_channel(&pppox_sk(sk)->chan);
-		sk->state &= ~PPPOX_BOUND;
		sk->state = PPPOX_DEAD;
	}
}
...@@ -75,7 +75,7 @@ EXPORT_SYMBOL(register_pppox_proto); ...@@ -75,7 +75,7 @@ EXPORT_SYMBOL(register_pppox_proto);
EXPORT_SYMBOL(unregister_pppox_proto); EXPORT_SYMBOL(unregister_pppox_proto);
EXPORT_SYMBOL(pppox_unbind_sock); EXPORT_SYMBOL(pppox_unbind_sock);
static int pppox_ioctl(struct socket* sock, unsigned int cmd, static int pppox_ioctl(struct socket* sock, unsigned int cmd,
unsigned long arg) unsigned long arg)
{ {
struct sock *sk = sock->sk; struct sock *sk = sock->sk;
...@@ -117,10 +117,10 @@ static int pppox_create(struct socket *sock, int protocol) ...@@ -117,10 +117,10 @@ static int pppox_create(struct socket *sock, int protocol)
int err = 0; int err = 0;
if (protocol < 0 || protocol > PX_MAX_PROTO) if (protocol < 0 || protocol > PX_MAX_PROTO)
return -EPROTOTYPE; return -EPROTOTYPE;
if (proto[protocol] == NULL) if (proto[protocol] == NULL)
return -EPROTONOSUPPORT; return -EPROTONOSUPPORT;
err = (*proto[protocol]->create)(sock); err = (*proto[protocol]->create)(sock);
......
...@@ -37,9 +37,10 @@ ...@@ -37,9 +37,10 @@
#define BOND_CHECK_MII_STATUS (SIOCGMIIPHY) #define BOND_CHECK_MII_STATUS (SIOCGMIIPHY)
#define BOND_MODE_ROUNDROBIN 0 #define BOND_MODE_ROUNDROBIN 0
#define BOND_MODE_ACTIVEBACKUP 1 #define BOND_MODE_ACTIVEBACKUP 1
#define BOND_MODE_XOR 2 #define BOND_MODE_XOR 2
#define BOND_MODE_BROADCAST 3
/* each slave's link has 4 states */ /* each slave's link has 4 states */
#define BOND_LINK_UP 0 /* link is up and running */ #define BOND_LINK_UP 0 /* link is up and running */
...@@ -74,6 +75,7 @@ typedef struct slave { ...@@ -74,6 +75,7 @@ typedef struct slave {
struct slave *prev; struct slave *prev;
struct net_device *dev; struct net_device *dev;
short delay; short delay;
unsigned long jiffies;
char link; /* one of BOND_LINK_XXXX */ char link; /* one of BOND_LINK_XXXX */
char state; /* one of BOND_STATE_XXXX */ char state; /* one of BOND_STATE_XXXX */
unsigned short original_flags; unsigned short original_flags;
...@@ -93,6 +95,8 @@ typedef struct bonding { ...@@ -93,6 +95,8 @@ typedef struct bonding {
slave_t *next; slave_t *next;
slave_t *prev; slave_t *prev;
slave_t *current_slave; slave_t *current_slave;
slave_t *primary_slave;
slave_t *current_arp_slave;
__s32 slave_cnt; __s32 slave_cnt;
rwlock_t lock; rwlock_t lock;
rwlock_t ptrlock; rwlock_t ptrlock;
......
...@@ -544,7 +544,8 @@ enum { ...@@ -544,7 +544,8 @@ enum {
NET_SCTP_PATH_MAX_RETRANS = 8, NET_SCTP_PATH_MAX_RETRANS = 8,
NET_SCTP_MAX_INIT_RETRANSMITS = 9, NET_SCTP_MAX_INIT_RETRANSMITS = 9,
NET_SCTP_HB_INTERVAL = 10, NET_SCTP_HB_INTERVAL = 10,
NET_SCTP_MAX_BURST = 11, NET_SCTP_PRESERVE_ENABLE = 11,
NET_SCTP_MAX_BURST = 12,
}; };
/* CTL_PROC names: */ /* CTL_PROC names: */
......
...@@ -10,6 +10,7 @@ ...@@ -10,6 +10,7 @@
#include <linux/config.h> #include <linux/config.h>
#include <linux/rtnetlink.h> #include <linux/rtnetlink.h>
#include <linux/rcupdate.h>
#include <net/neighbour.h> #include <net/neighbour.h>
#include <asm/processor.h> #include <asm/processor.h>
...@@ -71,6 +72,7 @@ struct dst_entry ...@@ -71,6 +72,7 @@ struct dst_entry
#endif #endif
struct dst_ops *ops; struct dst_ops *ops;
struct rcu_head rcu_head;
char info[0]; char info[0];
}; };
......
...@@ -123,8 +123,8 @@ extern sctp_protocol_t sctp_proto; ...@@ -123,8 +123,8 @@ extern sctp_protocol_t sctp_proto;
extern struct sock *sctp_get_ctl_sock(void); extern struct sock *sctp_get_ctl_sock(void);
extern int sctp_copy_local_addr_list(sctp_protocol_t *, sctp_bind_addr_t *, extern int sctp_copy_local_addr_list(sctp_protocol_t *, sctp_bind_addr_t *,
sctp_scope_t, int priority, int flags); sctp_scope_t, int priority, int flags);
extern sctp_pf_t *sctp_get_pf_specific(int family); extern struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
extern void sctp_set_pf_specific(int family, sctp_pf_t *); extern int sctp_register_pf(struct sctp_pf *, sa_family_t);
/* /*
* sctp_socket.c * sctp_socket.c
......
...@@ -140,6 +140,8 @@ sctp_state_fn_t sctp_sf_do_5_2_2_dupinit; ...@@ -140,6 +140,8 @@ sctp_state_fn_t sctp_sf_do_5_2_2_dupinit;
sctp_state_fn_t sctp_sf_do_5_2_4_dupcook; sctp_state_fn_t sctp_sf_do_5_2_4_dupcook;
sctp_state_fn_t sctp_sf_unk_chunk; sctp_state_fn_t sctp_sf_unk_chunk;
sctp_state_fn_t sctp_sf_do_8_5_1_E_sa; sctp_state_fn_t sctp_sf_do_8_5_1_E_sa;
sctp_state_fn_t sctp_sf_cookie_echoed_err;
sctp_state_fn_t sctp_sf_do_5_2_6_stale;
/* Prototypes for primitive event state functions. */ /* Prototypes for primitive event state functions. */
sctp_state_fn_t sctp_sf_do_prm_asoc; sctp_state_fn_t sctp_sf_do_prm_asoc;
...@@ -175,7 +177,6 @@ sctp_state_fn_t sctp_sf_autoclose_timer_expire; ...@@ -175,7 +177,6 @@ sctp_state_fn_t sctp_sf_autoclose_timer_expire;
*/ */
/* Prototypes for chunk state functions. Not in use. */ /* Prototypes for chunk state functions. Not in use. */
sctp_state_fn_t sctp_sf_do_5_2_6_stale;
sctp_state_fn_t sctp_sf_do_9_2_reshutack; sctp_state_fn_t sctp_sf_do_9_2_reshutack;
sctp_state_fn_t sctp_sf_do_9_2_reshut; sctp_state_fn_t sctp_sf_do_9_2_reshut;
sctp_state_fn_t sctp_sf_do_9_2_shutack; sctp_state_fn_t sctp_sf_do_9_2_shutack;
...@@ -211,7 +212,7 @@ void sctp_populate_tie_tags(__u8 *cookie, __u32 curTag, __u32 hisTag); ...@@ -211,7 +212,7 @@ void sctp_populate_tie_tags(__u8 *cookie, __u32 curTag, __u32 hisTag);
/* Prototypes for chunk-building functions. */ /* Prototypes for chunk-building functions. */
sctp_chunk_t *sctp_make_init(const sctp_association_t *, sctp_chunk_t *sctp_make_init(const sctp_association_t *,
const sctp_bind_addr_t *, const sctp_bind_addr_t *,
int priority); int priority, int vparam_len);
sctp_chunk_t *sctp_make_init_ack(const sctp_association_t *, sctp_chunk_t *sctp_make_init_ack(const sctp_association_t *,
const sctp_chunk_t *, const sctp_chunk_t *,
const int priority, const int priority,
...@@ -322,9 +323,15 @@ sctp_pack_cookie(const sctp_endpoint_t *, const sctp_association_t *, ...@@ -322,9 +323,15 @@ sctp_pack_cookie(const sctp_endpoint_t *, const sctp_association_t *,
const __u8 *, int addrs_len); const __u8 *, int addrs_len);
sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *, sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *,
const sctp_association_t *, const sctp_association_t *,
sctp_chunk_t *, int priority, int *err); sctp_chunk_t *, int priority, int *err,
sctp_chunk_t **err_chk_p);
int sctp_addip_addr_config(sctp_association_t *, sctp_param_t, int sctp_addip_addr_config(sctp_association_t *, sctp_param_t,
struct sockaddr_storage*, int); struct sockaddr_storage*, int);
void sctp_send_stale_cookie_err(const sctp_endpoint_t *ep,
const sctp_association_t *asoc,
const sctp_chunk_t *chunk,
sctp_cmd_seq_t *commands,
sctp_chunk_t *err_chunk);
/* 3rd level prototypes */ /* 3rd level prototypes */
__u32 sctp_generate_tag(const sctp_endpoint_t *); __u32 sctp_generate_tag(const sctp_endpoint_t *);
......
...@@ -42,6 +42,7 @@ ...@@ -42,6 +42,7 @@
* Sridhar Samudrala <sri@us.ibm.com> * Sridhar Samudrala <sri@us.ibm.com>
* Daisy Chang <daisyc@us.ibm.com> * Daisy Chang <daisyc@us.ibm.com>
* Dajiang Zhang <dajiang.zhang@nokia.com> * Dajiang Zhang <dajiang.zhang@nokia.com>
* Ardelle Fan <ardelle.fan@intel.com>
* *
* Any bugs reported given to us we will try to fix... any fixes shared will * Any bugs reported given to us we will try to fix... any fixes shared will
* be incorporated into the next SCTP release. * be incorporated into the next SCTP release.
...@@ -182,6 +183,9 @@ struct SCTP_protocol { ...@@ -182,6 +183,9 @@ struct SCTP_protocol {
/* Valid.Cookie.Life - 60 seconds */ /* Valid.Cookie.Life - 60 seconds */
int valid_cookie_life; int valid_cookie_life;
/* Whether Cookie Preservative is enabled(1) or not(0) */
int cookie_preserve_enable;
/* Association.Max.Retrans - 10 attempts /* Association.Max.Retrans - 10 attempts
* Path.Max.Retrans - 5 attempts (per destination address) * Path.Max.Retrans - 5 attempts (per destination address)
...@@ -234,7 +238,7 @@ struct SCTP_protocol { ...@@ -234,7 +238,7 @@ struct SCTP_protocol {
* Pointers to address related SCTP functions. * Pointers to address related SCTP functions.
* (i.e. things that depend on the address family.) * (i.e. things that depend on the address family.)
*/ */
typedef struct sctp_func { struct sctp_af {
int (*queue_xmit) (struct sk_buff *skb); int (*queue_xmit) (struct sk_buff *skb);
int (*setsockopt) (struct sock *sk, int (*setsockopt) (struct sock *sk,
int level, int level,
...@@ -259,27 +263,34 @@ typedef struct sctp_func { ...@@ -259,27 +263,34 @@ typedef struct sctp_func {
void (*from_skb) (union sctp_addr *, void (*from_skb) (union sctp_addr *,
struct sk_buff *skb, struct sk_buff *skb,
int saddr); int saddr);
void (*from_sk) (union sctp_addr *,
struct sock *sk);
void (*to_sk) (union sctp_addr *,
struct sock *sk);
int (*addr_valid) (union sctp_addr *); int (*addr_valid) (union sctp_addr *);
sctp_scope_t (*scope) (union sctp_addr *); sctp_scope_t (*scope) (union sctp_addr *);
void (*inaddr_any) (union sctp_addr *, unsigned short); void (*inaddr_any) (union sctp_addr *, unsigned short);
int (*is_any) (const union sctp_addr *); int (*is_any) (const union sctp_addr *);
int (*available) (const union sctp_addr *);
__u16 net_header_len; __u16 net_header_len;
int sockaddr_len; int sockaddr_len;
sa_family_t sa_family; sa_family_t sa_family;
struct list_head list; struct list_head list;
} sctp_func_t; };
sctp_func_t *sctp_get_af_specific(sa_family_t); struct sctp_af *sctp_get_af_specific(sa_family_t);
int sctp_register_af(struct sctp_af *);
/* Protocol family functions. */ /* Protocol family functions. */
typedef struct sctp_pf { typedef struct sctp_pf {
void (*event_msgname)(sctp_ulpevent_t *, char *, int *); void (*event_msgname)(sctp_ulpevent_t *, char *, int *);
void (*skb_msgname)(struct sk_buff *, char *, int *); void (*skb_msgname) (struct sk_buff *, char *, int *);
int (*af_supported)(sa_family_t); int (*af_supported) (sa_family_t);
int (*cmp_addr) (const union sctp_addr *, int (*cmp_addr) (const union sctp_addr *,
const union sctp_addr *, const union sctp_addr *,
struct sctp_opt *); struct sctp_opt *);
struct sctp_func *af; int (*bind_verify) (struct sctp_opt *, union sctp_addr *);
struct sctp_af *af;
} sctp_pf_t; } sctp_pf_t;
/* SCTP Socket type: UDP or TCP style. */ /* SCTP Socket type: UDP or TCP style. */
...@@ -623,7 +634,7 @@ struct SCTP_transport { ...@@ -623,7 +634,7 @@ struct SCTP_transport {
union sctp_addr ipaddr; union sctp_addr ipaddr;
/* These are the functions we call to handle LLP stuff. */ /* These are the functions we call to handle LLP stuff. */
sctp_func_t *af_specific; struct sctp_af *af_specific;
/* Which association do we belong to? */ /* Which association do we belong to? */
sctp_association_t *asoc; sctp_association_t *asoc;
...@@ -1271,7 +1282,6 @@ struct SCTP_association { ...@@ -1271,7 +1282,6 @@ struct SCTP_association {
/* The cookie life I award for any cookie. */ /* The cookie life I award for any cookie. */
struct timeval cookie_life; struct timeval cookie_life;
__u32 cookie_preserve;
/* Overall : The overall association error count. /* Overall : The overall association error count.
* Error Count : [Clear this any time I get something.] * Error Count : [Clear this any time I get something.]
...@@ -1350,6 +1360,9 @@ struct SCTP_association { ...@@ -1350,6 +1360,9 @@ struct SCTP_association {
*/ */
__u32 rwnd; __u32 rwnd;
/* This is the last advertised value of rwnd over a SACK chunk. */
__u32 a_rwnd;
/* Number of bytes by which the rwnd has slopped. The rwnd is allowed /* Number of bytes by which the rwnd has slopped. The rwnd is allowed
* to slop over a maximum of the association's frag_point. * to slop over a maximum of the association's frag_point.
*/ */
......
...@@ -574,7 +574,15 @@ void nf_reinject(struct sk_buff *skb, struct nf_info *info, ...@@ -574,7 +574,15 @@ void nf_reinject(struct sk_buff *skb, struct nf_info *info,
/* Release those devices we held, or Alexey will kill me. */ /* Release those devices we held, or Alexey will kill me. */
if (info->indev) dev_put(info->indev); if (info->indev) dev_put(info->indev);
if (info->outdev) dev_put(info->outdev); if (info->outdev) dev_put(info->outdev);
#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
if (skb->nf_bridge) {
if (skb->nf_bridge->physindev)
dev_put(skb->nf_bridge->physindev);
if (skb->nf_bridge->physoutdev)
dev_put(skb->nf_bridge->physoutdev);
}
#endif
kfree(info); kfree(info);
return; return;
} }
......
...@@ -207,13 +207,13 @@ static struct pktgen_info pginfos[MAX_PKTGEN]; ...@@ -207,13 +207,13 @@ static struct pktgen_info pginfos[MAX_PKTGEN];
/** Convert to milliseconds */ /** Convert to milliseconds */
inline __u64 tv_to_ms(const struct timeval* tv) { static inline __u64 tv_to_ms(const struct timeval* tv) {
__u64 ms = tv->tv_usec / 1000; __u64 ms = tv->tv_usec / 1000;
ms += (__u64)tv->tv_sec * (__u64)1000; ms += (__u64)tv->tv_sec * (__u64)1000;
return ms; return ms;
} }
inline __u64 getCurMs(void) { static inline __u64 getCurMs(void) {
struct timeval tv; struct timeval tv;
do_gettimeofday(&tv); do_gettimeofday(&tv);
return tv_to_ms(&tv); return tv_to_ms(&tv);
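For reference, the two helpers above build unchanged outside the kernel; a minimal standalone sketch, assuming uint64_t in place of __u64 and gettimeofday() in place of do_gettimeofday() (the closing brace of getCurMs() falls outside the hunk shown):

#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

static inline uint64_t tv_to_ms(const struct timeval *tv)
{
        uint64_t ms = tv->tv_usec / 1000;        /* microseconds -> milliseconds */

        ms += (uint64_t)tv->tv_sec * 1000;       /* plus the whole seconds */
        return ms;
}

static inline uint64_t getCurMs(void)
{
        struct timeval tv;

        gettimeofday(&tv, NULL);                 /* userspace stand-in for do_gettimeofday() */
        return tv_to_ms(&tv);
}

int main(void)
{
        printf("now: %llu ms since the epoch\n", (unsigned long long)getCurMs());
        return 0;
}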
...@@ -1277,7 +1277,7 @@ static int proc_write(struct file *file, const char *user_buffer, ...@@ -1277,7 +1277,7 @@ static int proc_write(struct file *file, const char *user_buffer,
} }
int create_proc_dir(void) static int create_proc_dir(void)
{ {
int len; int len;
/* does proc_dir already exist */ /* does proc_dir already exist */
...@@ -1295,7 +1295,7 @@ int create_proc_dir(void) ...@@ -1295,7 +1295,7 @@ int create_proc_dir(void)
return 1; return 1;
} }
int remove_proc_dir(void) static int remove_proc_dir(void)
{ {
remove_proc_entry(PG_PROC_DIR, proc_net); remove_proc_entry(PG_PROC_DIR, proc_net);
return 1; return 1;
......
...@@ -86,6 +86,7 @@ ...@@ -86,6 +86,7 @@
#include <linux/mroute.h> #include <linux/mroute.h>
#include <linux/netfilter_ipv4.h> #include <linux/netfilter_ipv4.h>
#include <linux/random.h> #include <linux/random.h>
#include <linux/rcupdate.h>
#include <net/protocol.h> #include <net/protocol.h>
#include <net/ip.h> #include <net/ip.h>
#include <net/route.h> #include <net/route.h>
...@@ -178,7 +179,7 @@ __u8 ip_tos2prio[16] = { ...@@ -178,7 +179,7 @@ __u8 ip_tos2prio[16] = {
/* The locking scheme is rather straightforward: /* The locking scheme is rather straightforward:
* *
* 1) A BH protected rwlocks protect buckets of the central route hash. * 1) Read-Copy Update protects the buckets of the central route hash.
* 2) Only writers remove entries, and they hold the lock * 2) Only writers remove entries, and they hold the lock
* as they look at rtable reference counts. * as they look at rtable reference counts.
* 3) Only readers acquire references to rtable entries, * 3) Only readers acquire references to rtable entries,
...@@ -188,7 +189,7 @@ __u8 ip_tos2prio[16] = { ...@@ -188,7 +189,7 @@ __u8 ip_tos2prio[16] = {
struct rt_hash_bucket { struct rt_hash_bucket {
struct rtable *chain; struct rtable *chain;
rwlock_t lock; spinlock_t lock;
} __attribute__((__aligned__(8))); } __attribute__((__aligned__(8)));
static struct rt_hash_bucket *rt_hash_table; static struct rt_hash_bucket *rt_hash_table;
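The comment and structure change above summarize the new scheme: readers walk a bucket's chain with no per-bucket lock at all (RCU read side plus a dependency barrier), while writers serialize on the bucket's spinlock and defer frees through call_rcu(). Below is a rough userspace analogue of that reader/writer split, a sketch only, using C11 atomics and a pthread spinlock in place of the kernel primitives; the names bucket_init()/bucket_insert()/bucket_lookup() are illustrative and the deferred-free half of RCU is omitted:

#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct entry {
        unsigned int key;
        struct entry *_Atomic next;      /* chain pointer, loaded atomically by readers */
};

struct bucket {
        struct entry *_Atomic chain;     /* head of the hash chain */
        pthread_spinlock_t lock;         /* writers only, like rt_hash_bucket.lock */
};

static void bucket_init(struct bucket *b)
{
        b->chain = NULL;                 /* plain init before any concurrent access */
        pthread_spin_init(&b->lock, PTHREAD_PROCESS_PRIVATE);
}

/* Writer: publish a new entry at the head of the chain under the bucket lock. */
static void bucket_insert(struct bucket *b, struct entry *e)
{
        pthread_spin_lock(&b->lock);
        atomic_store_explicit(&e->next,
                              atomic_load_explicit(&b->chain, memory_order_relaxed),
                              memory_order_relaxed);
        /* The release store makes the fully initialised entry visible to readers. */
        atomic_store_explicit(&b->chain, e, memory_order_release);
        pthread_spin_unlock(&b->lock);
}

/* Reader: walk the chain without taking the bucket lock.  memory_order_acquire
 * is stronger than the data-dependency ordering the kernel code relies on. */
static struct entry *bucket_lookup(struct bucket *b, unsigned int key)
{
        struct entry *e;

        for (e = atomic_load_explicit(&b->chain, memory_order_acquire); e != NULL;
             e = atomic_load_explicit(&e->next, memory_order_acquire))
                if (e->key == key)
                        return e;
        return NULL;
}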
...@@ -220,11 +221,11 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq) ...@@ -220,11 +221,11 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
struct rt_cache_iter_state *st = seq->private; struct rt_cache_iter_state *st = seq->private;
for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) { for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) {
read_lock_bh(&rt_hash_table[st->bucket].lock); rcu_read_lock();
r = rt_hash_table[st->bucket].chain; r = rt_hash_table[st->bucket].chain;
if (r) if (r)
break; break;
read_unlock_bh(&rt_hash_table[st->bucket].lock); rcu_read_unlock();
} }
return r; return r;
} }
...@@ -233,12 +234,13 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r) ...@@ -233,12 +234,13 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
{ {
struct rt_cache_iter_state *st = seq->private; struct rt_cache_iter_state *st = seq->private;
read_barrier_depends();
r = r->u.rt_next; r = r->u.rt_next;
while (!r) { while (!r) {
read_unlock_bh(&rt_hash_table[st->bucket].lock); rcu_read_unlock();
if (--st->bucket < 0) if (--st->bucket < 0)
break; break;
read_lock_bh(&rt_hash_table[st->bucket].lock); rcu_read_lock();
r = rt_hash_table[st->bucket].chain; r = rt_hash_table[st->bucket].chain;
} }
return r; return r;
...@@ -276,7 +278,7 @@ static void rt_cache_seq_stop(struct seq_file *seq, void *v) ...@@ -276,7 +278,7 @@ static void rt_cache_seq_stop(struct seq_file *seq, void *v)
if (v && v != (void *)1) { if (v && v != (void *)1) {
struct rt_cache_iter_state *st = seq->private; struct rt_cache_iter_state *st = seq->private;
read_unlock_bh(&rt_hash_table[st->bucket].lock); rcu_read_unlock();
} }
} }
...@@ -406,13 +408,13 @@ void __init rt_cache_proc_exit(void) ...@@ -406,13 +408,13 @@ void __init rt_cache_proc_exit(void)
static __inline__ void rt_free(struct rtable *rt) static __inline__ void rt_free(struct rtable *rt)
{ {
dst_free(&rt->u.dst); call_rcu(&rt->u.dst.rcu_head, (void (*)(void *))dst_free, &rt->u.dst);
} }
static __inline__ void rt_drop(struct rtable *rt) static __inline__ void rt_drop(struct rtable *rt)
{ {
ip_rt_put(rt); ip_rt_put(rt);
dst_free(&rt->u.dst); call_rcu(&rt->u.dst.rcu_head, (void (*)(void *))dst_free, &rt->u.dst);
} }
static __inline__ int rt_fast_clean(struct rtable *rth) static __inline__ int rt_fast_clean(struct rtable *rth)
...@@ -465,7 +467,7 @@ static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy) ...@@ -465,7 +467,7 @@ static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy)
i = (i + 1) & rt_hash_mask; i = (i + 1) & rt_hash_mask;
rthp = &rt_hash_table[i].chain; rthp = &rt_hash_table[i].chain;
write_lock(&rt_hash_table[i].lock); spin_lock(&rt_hash_table[i].lock);
while ((rth = *rthp) != NULL) { while ((rth = *rthp) != NULL) {
if (rth->u.dst.expires) { if (rth->u.dst.expires) {
/* Entry is expired even if it is in use */ /* Entry is expired even if it is in use */
...@@ -484,7 +486,7 @@ static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy) ...@@ -484,7 +486,7 @@ static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy)
*rthp = rth->u.rt_next; *rthp = rth->u.rt_next;
rt_free(rth); rt_free(rth);
} }
write_unlock(&rt_hash_table[i].lock); spin_unlock(&rt_hash_table[i].lock);
/* Fallback loop breaker. */ /* Fallback loop breaker. */
if ((jiffies - now) > 0) if ((jiffies - now) > 0)
...@@ -507,11 +509,11 @@ static void SMP_TIMER_NAME(rt_run_flush)(unsigned long dummy) ...@@ -507,11 +509,11 @@ static void SMP_TIMER_NAME(rt_run_flush)(unsigned long dummy)
rt_deadline = 0; rt_deadline = 0;
for (i = rt_hash_mask; i >= 0; i--) { for (i = rt_hash_mask; i >= 0; i--) {
write_lock_bh(&rt_hash_table[i].lock); spin_lock_bh(&rt_hash_table[i].lock);
rth = rt_hash_table[i].chain; rth = rt_hash_table[i].chain;
if (rth) if (rth)
rt_hash_table[i].chain = NULL; rt_hash_table[i].chain = NULL;
write_unlock_bh(&rt_hash_table[i].lock); spin_unlock_bh(&rt_hash_table[i].lock);
for (; rth; rth = next) { for (; rth; rth = next) {
next = rth->u.rt_next; next = rth->u.rt_next;
...@@ -635,7 +637,7 @@ static int rt_garbage_collect(void) ...@@ -635,7 +637,7 @@ static int rt_garbage_collect(void)
k = (k + 1) & rt_hash_mask; k = (k + 1) & rt_hash_mask;
rthp = &rt_hash_table[k].chain; rthp = &rt_hash_table[k].chain;
write_lock_bh(&rt_hash_table[k].lock); spin_lock_bh(&rt_hash_table[k].lock);
while ((rth = *rthp) != NULL) { while ((rth = *rthp) != NULL) {
if (!rt_may_expire(rth, tmo, expire)) { if (!rt_may_expire(rth, tmo, expire)) {
tmo >>= 1; tmo >>= 1;
...@@ -646,7 +648,7 @@ static int rt_garbage_collect(void) ...@@ -646,7 +648,7 @@ static int rt_garbage_collect(void)
rt_free(rth); rt_free(rth);
goal--; goal--;
} }
write_unlock_bh(&rt_hash_table[k].lock); spin_unlock_bh(&rt_hash_table[k].lock);
if (goal <= 0) if (goal <= 0)
break; break;
} }
...@@ -714,7 +716,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp) ...@@ -714,7 +716,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
restart: restart:
rthp = &rt_hash_table[hash].chain; rthp = &rt_hash_table[hash].chain;
write_lock_bh(&rt_hash_table[hash].lock); spin_lock_bh(&rt_hash_table[hash].lock);
while ((rth = *rthp) != NULL) { while ((rth = *rthp) != NULL) {
if (compare_keys(&rth->fl, &rt->fl)) { if (compare_keys(&rth->fl, &rt->fl)) {
/* Put it first */ /* Put it first */
...@@ -725,7 +727,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp) ...@@ -725,7 +727,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
rth->u.dst.__use++; rth->u.dst.__use++;
dst_hold(&rth->u.dst); dst_hold(&rth->u.dst);
rth->u.dst.lastuse = now; rth->u.dst.lastuse = now;
write_unlock_bh(&rt_hash_table[hash].lock); spin_unlock_bh(&rt_hash_table[hash].lock);
rt_drop(rt); rt_drop(rt);
*rp = rth; *rp = rth;
...@@ -741,7 +743,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp) ...@@ -741,7 +743,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
if (rt->rt_type == RTN_UNICAST || rt->fl.iif == 0) { if (rt->rt_type == RTN_UNICAST || rt->fl.iif == 0) {
int err = arp_bind_neighbour(&rt->u.dst); int err = arp_bind_neighbour(&rt->u.dst);
if (err) { if (err) {
write_unlock_bh(&rt_hash_table[hash].lock); spin_unlock_bh(&rt_hash_table[hash].lock);
if (err != -ENOBUFS) { if (err != -ENOBUFS) {
rt_drop(rt); rt_drop(rt);
...@@ -782,7 +784,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp) ...@@ -782,7 +784,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
} }
#endif #endif
rt_hash_table[hash].chain = rt; rt_hash_table[hash].chain = rt;
write_unlock_bh(&rt_hash_table[hash].lock); spin_unlock_bh(&rt_hash_table[hash].lock);
*rp = rt; *rp = rt;
return 0; return 0;
} }
...@@ -849,7 +851,7 @@ static void rt_del(unsigned hash, struct rtable *rt) ...@@ -849,7 +851,7 @@ static void rt_del(unsigned hash, struct rtable *rt)
{ {
struct rtable **rthp; struct rtable **rthp;
write_lock_bh(&rt_hash_table[hash].lock); spin_lock_bh(&rt_hash_table[hash].lock);
ip_rt_put(rt); ip_rt_put(rt);
for (rthp = &rt_hash_table[hash].chain; *rthp; for (rthp = &rt_hash_table[hash].chain; *rthp;
rthp = &(*rthp)->u.rt_next) rthp = &(*rthp)->u.rt_next)
...@@ -858,7 +860,7 @@ static void rt_del(unsigned hash, struct rtable *rt) ...@@ -858,7 +860,7 @@ static void rt_del(unsigned hash, struct rtable *rt)
rt_free(rt); rt_free(rt);
break; break;
} }
write_unlock_bh(&rt_hash_table[hash].lock); spin_unlock_bh(&rt_hash_table[hash].lock);
} }
void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
...@@ -897,10 +899,11 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, ...@@ -897,10 +899,11 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
rthp=&rt_hash_table[hash].chain; rthp=&rt_hash_table[hash].chain;
read_lock(&rt_hash_table[hash].lock); rcu_read_lock();
while ((rth = *rthp) != NULL) { while ((rth = *rthp) != NULL) {
struct rtable *rt; struct rtable *rt;
read_barrier_depends();
if (rth->fl.fl4_dst != daddr || if (rth->fl.fl4_dst != daddr ||
rth->fl.fl4_src != skeys[i] || rth->fl.fl4_src != skeys[i] ||
rth->fl.fl4_tos != tos || rth->fl.fl4_tos != tos ||
...@@ -918,7 +921,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, ...@@ -918,7 +921,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
break; break;
dst_clone(&rth->u.dst); dst_clone(&rth->u.dst);
read_unlock(&rt_hash_table[hash].lock); rcu_read_unlock();
rt = dst_alloc(&ipv4_dst_ops); rt = dst_alloc(&ipv4_dst_ops);
if (rt == NULL) { if (rt == NULL) {
...@@ -929,6 +932,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, ...@@ -929,6 +932,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
/* Copy all the information. */ /* Copy all the information. */
*rt = *rth; *rt = *rth;
INIT_RCU_HEAD(&rt->u.dst.rcu_head);
rt->u.dst.__use = 1; rt->u.dst.__use = 1;
atomic_set(&rt->u.dst.__refcnt, 1); atomic_set(&rt->u.dst.__refcnt, 1);
if (rt->u.dst.dev) if (rt->u.dst.dev)
...@@ -964,7 +968,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, ...@@ -964,7 +968,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
ip_rt_put(rt); ip_rt_put(rt);
goto do_next; goto do_next;
} }
read_unlock(&rt_hash_table[hash].lock); rcu_read_unlock();
do_next: do_next:
; ;
} }
...@@ -1144,9 +1148,10 @@ unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu) ...@@ -1144,9 +1148,10 @@ unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu)
for (i = 0; i < 2; i++) { for (i = 0; i < 2; i++) {
unsigned hash = rt_hash_code(daddr, skeys[i], tos); unsigned hash = rt_hash_code(daddr, skeys[i], tos);
read_lock(&rt_hash_table[hash].lock); rcu_read_lock();
for (rth = rt_hash_table[hash].chain; rth; for (rth = rt_hash_table[hash].chain; rth;
rth = rth->u.rt_next) { rth = rth->u.rt_next) {
read_barrier_depends();
if (rth->fl.fl4_dst == daddr && if (rth->fl.fl4_dst == daddr &&
rth->fl.fl4_src == skeys[i] && rth->fl.fl4_src == skeys[i] &&
rth->rt_dst == daddr && rth->rt_dst == daddr &&
...@@ -1182,7 +1187,7 @@ unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu) ...@@ -1182,7 +1187,7 @@ unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu)
} }
} }
} }
read_unlock(&rt_hash_table[hash].lock); rcu_read_unlock();
} }
return est_mtu ? : new_mtu; return est_mtu ? : new_mtu;
} }
...@@ -1736,8 +1741,9 @@ int ip_route_input(struct sk_buff *skb, u32 daddr, u32 saddr, ...@@ -1736,8 +1741,9 @@ int ip_route_input(struct sk_buff *skb, u32 daddr, u32 saddr,
tos &= IPTOS_RT_MASK; tos &= IPTOS_RT_MASK;
hash = rt_hash_code(daddr, saddr ^ (iif << 5), tos); hash = rt_hash_code(daddr, saddr ^ (iif << 5), tos);
read_lock(&rt_hash_table[hash].lock); rcu_read_lock();
for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) { for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {
read_barrier_depends();
if (rth->fl.fl4_dst == daddr && if (rth->fl.fl4_dst == daddr &&
rth->fl.fl4_src == saddr && rth->fl.fl4_src == saddr &&
rth->fl.iif == iif && rth->fl.iif == iif &&
...@@ -1750,12 +1756,12 @@ int ip_route_input(struct sk_buff *skb, u32 daddr, u32 saddr, ...@@ -1750,12 +1756,12 @@ int ip_route_input(struct sk_buff *skb, u32 daddr, u32 saddr,
dst_hold(&rth->u.dst); dst_hold(&rth->u.dst);
rth->u.dst.__use++; rth->u.dst.__use++;
rt_cache_stat[smp_processor_id()].in_hit++; rt_cache_stat[smp_processor_id()].in_hit++;
read_unlock(&rt_hash_table[hash].lock); rcu_read_unlock();
skb->dst = (struct dst_entry*)rth; skb->dst = (struct dst_entry*)rth;
return 0; return 0;
} }
} }
read_unlock(&rt_hash_table[hash].lock); rcu_read_unlock();
/* Multicast recognition logic is moved from route cache to here. /* Multicast recognition logic is moved from route cache to here.
The problem was that too many Ethernet cards have broken/missing The problem was that too many Ethernet cards have broken/missing
...@@ -2100,8 +2106,9 @@ int __ip_route_output_key(struct rtable **rp, const struct flowi *flp) ...@@ -2100,8 +2106,9 @@ int __ip_route_output_key(struct rtable **rp, const struct flowi *flp)
hash = rt_hash_code(flp->fl4_dst, flp->fl4_src ^ (flp->oif << 5), flp->fl4_tos); hash = rt_hash_code(flp->fl4_dst, flp->fl4_src ^ (flp->oif << 5), flp->fl4_tos);
read_lock_bh(&rt_hash_table[hash].lock); rcu_read_lock();
for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) { for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {
read_barrier_depends();
if (rth->fl.fl4_dst == flp->fl4_dst && if (rth->fl.fl4_dst == flp->fl4_dst &&
rth->fl.fl4_src == flp->fl4_src && rth->fl.fl4_src == flp->fl4_src &&
rth->fl.iif == 0 && rth->fl.iif == 0 &&
...@@ -2115,12 +2122,12 @@ int __ip_route_output_key(struct rtable **rp, const struct flowi *flp) ...@@ -2115,12 +2122,12 @@ int __ip_route_output_key(struct rtable **rp, const struct flowi *flp)
dst_hold(&rth->u.dst); dst_hold(&rth->u.dst);
rth->u.dst.__use++; rth->u.dst.__use++;
rt_cache_stat[smp_processor_id()].out_hit++; rt_cache_stat[smp_processor_id()].out_hit++;
read_unlock_bh(&rt_hash_table[hash].lock); rcu_read_unlock();
*rp = rth; *rp = rth;
return 0; return 0;
} }
} }
read_unlock_bh(&rt_hash_table[hash].lock); rcu_read_unlock();
return ip_route_output_slow(rp, flp); return ip_route_output_slow(rp, flp);
} }
...@@ -2328,9 +2335,10 @@ int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb) ...@@ -2328,9 +2335,10 @@ int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb)
if (h < s_h) continue; if (h < s_h) continue;
if (h > s_h) if (h > s_h)
s_idx = 0; s_idx = 0;
read_lock_bh(&rt_hash_table[h].lock); rcu_read_lock();
for (rt = rt_hash_table[h].chain, idx = 0; rt; for (rt = rt_hash_table[h].chain, idx = 0; rt;
rt = rt->u.rt_next, idx++) { rt = rt->u.rt_next, idx++) {
read_barrier_depends();
if (idx < s_idx) if (idx < s_idx)
continue; continue;
skb->dst = dst_clone(&rt->u.dst); skb->dst = dst_clone(&rt->u.dst);
...@@ -2338,12 +2346,12 @@ int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb) ...@@ -2338,12 +2346,12 @@ int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb)
cb->nlh->nlmsg_seq, cb->nlh->nlmsg_seq,
RTM_NEWROUTE, 1) <= 0) { RTM_NEWROUTE, 1) <= 0) {
dst_release(xchg(&skb->dst, NULL)); dst_release(xchg(&skb->dst, NULL));
read_unlock_bh(&rt_hash_table[h].lock); rcu_read_unlock();
goto done; goto done;
} }
dst_release(xchg(&skb->dst, NULL)); dst_release(xchg(&skb->dst, NULL));
} }
read_unlock_bh(&rt_hash_table[h].lock); rcu_read_unlock();
} }
done: done:
...@@ -2627,7 +2635,7 @@ int __init ip_rt_init(void) ...@@ -2627,7 +2635,7 @@ int __init ip_rt_init(void)
rt_hash_mask--; rt_hash_mask--;
for (i = 0; i <= rt_hash_mask; i++) { for (i = 0; i <= rt_hash_mask; i++) {
rt_hash_table[i].lock = RW_LOCK_UNLOCKED; rt_hash_table[i].lock = SPIN_LOCK_UNLOCKED;
rt_hash_table[i].chain = NULL; rt_hash_table[i].chain = NULL;
} }
......
...@@ -2236,6 +2236,7 @@ static void *listening_get_next(struct seq_file *seq, void *cur) ...@@ -2236,6 +2236,7 @@ static void *listening_get_next(struct seq_file *seq, void *cur)
goto get_req; goto get_req;
} }
read_unlock_bh(&tp->syn_wait_lock); read_unlock_bh(&tp->syn_wait_lock);
sk = sk->next;
} }
if (++st->bucket < TCP_LHTABLE_SIZE) { if (++st->bucket < TCP_LHTABLE_SIZE) {
sk = tcp_listening_hash[st->bucket]; sk = tcp_listening_hash[st->bucket];
......
...@@ -871,6 +871,7 @@ static void ndisc_router_discovery(struct sk_buff *skb) ...@@ -871,6 +871,7 @@ static void ndisc_router_discovery(struct sk_buff *skb)
} }
if (!ndisc_parse_options(opt, optlen, &ndopts)) { if (!ndisc_parse_options(opt, optlen, &ndopts)) {
in6_dev_put(in6_dev);
if (net_ratelimit()) if (net_ratelimit())
ND_PRINTK2(KERN_WARNING ND_PRINTK2(KERN_WARNING
"ICMP6 RA: invalid ND option, ignored.\n"); "ICMP6 RA: invalid ND option, ignored.\n");
......
/* /*
* net/key/pfkeyv2.c An implemenation of PF_KEYv2 sockets. * net/key/af_key.c An implementation of PF_KEYv2 sockets.
* *
* This program is free software; you can redistribute it and/or * This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License * modify it under the terms of the GNU General Public License
......
...@@ -128,8 +128,9 @@ sctp_association_t *sctp_association_init(sctp_association_t *asoc, ...@@ -128,8 +128,9 @@ sctp_association_t *sctp_association_init(sctp_association_t *asoc,
asoc->state_timestamp = jiffies; asoc->state_timestamp = jiffies;
/* Set things that have constant value. */ /* Set things that have constant value. */
asoc->cookie_life.tv_sec = SCTP_DEFAULT_COOKIE_LIFE_SEC; asoc->cookie_life.tv_sec = sctp_proto.valid_cookie_life / HZ;
asoc->cookie_life.tv_usec = SCTP_DEFAULT_COOKIE_LIFE_USEC; asoc->cookie_life.tv_usec = (sctp_proto.valid_cookie_life % HZ) *
1000000L / HZ;
asoc->pmtu = 0; asoc->pmtu = 0;
asoc->frag_point = 0; asoc->frag_point = 0;
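The cookie_life initialisation above splits a jiffies count into whole seconds plus a microsecond remainder. The same arithmetic as a standalone sketch, assuming HZ is 100 (the real kernel value may differ) and using a local timeval-like struct purely for illustration:

#include <stdio.h>

#define HZ 100                                   /* assumed; kernel configurations vary */

struct timeval_like {
        long tv_sec;
        long tv_usec;
};

static struct timeval_like jiffies_to_tv(unsigned long j)
{
        struct timeval_like tv;

        tv.tv_sec  = j / HZ;                     /* whole seconds */
        tv.tv_usec = (j % HZ) * 1000000L / HZ;   /* leftover jiffies as microseconds */
        return tv;
}

int main(void)
{
        struct timeval_like tv = jiffies_to_tv(60 * HZ + 25);

        /* 60 * HZ + 25 jiffies -> "60 s, 250000 us" at HZ == 100 */
        printf("%ld s, %ld us\n", tv.tv_sec, tv.tv_usec);
        return 0;
}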
...@@ -185,6 +186,8 @@ sctp_association_t *sctp_association_init(sctp_association_t *asoc, ...@@ -185,6 +186,8 @@ sctp_association_t *sctp_association_init(sctp_association_t *asoc,
else else
asoc->rwnd = sk->rcvbuf; asoc->rwnd = sk->rcvbuf;
asoc->a_rwnd = 0;
asoc->rwnd_over = 0; asoc->rwnd_over = 0;
/* Use my own max window until I learn something better. */ /* Use my own max window until I learn something better. */
...@@ -642,7 +645,7 @@ __u16 __sctp_association_get_next_ssn(sctp_association_t *asoc, __u16 sid) ...@@ -642,7 +645,7 @@ __u16 __sctp_association_get_next_ssn(sctp_association_t *asoc, __u16 sid)
int sctp_cmp_addr_exact(const union sctp_addr *ss1, int sctp_cmp_addr_exact(const union sctp_addr *ss1,
const union sctp_addr *ss2) const union sctp_addr *ss2)
{ {
struct sctp_func *af; struct sctp_af *af;
af = sctp_get_af_specific(ss1->sa.sa_family); af = sctp_get_af_specific(ss1->sa.sa_family);
if (!af) if (!af)
......
...@@ -327,7 +327,7 @@ static int sctp_copy_one_addr(sctp_bind_addr_t *dest, union sctp_addr *addr, ...@@ -327,7 +327,7 @@ static int sctp_copy_one_addr(sctp_bind_addr_t *dest, union sctp_addr *addr,
/* Is this a wildcard address? */ /* Is this a wildcard address? */
int sctp_is_any(const union sctp_addr *addr) int sctp_is_any(const union sctp_addr *addr)
{ {
struct sctp_func *af = sctp_get_af_specific(addr->sa.sa_family); struct sctp_af *af = sctp_get_af_specific(addr->sa.sa_family);
if (!af) if (!af)
return 0; return 0;
return af->is_any(addr); return af->is_any(addr);
...@@ -362,7 +362,7 @@ int sctp_in_scope(const union sctp_addr *addr, sctp_scope_t scope) ...@@ -362,7 +362,7 @@ int sctp_in_scope(const union sctp_addr *addr, sctp_scope_t scope)
/* What is the scope of 'addr'? */ /* What is the scope of 'addr'? */
sctp_scope_t sctp_scope(const union sctp_addr *addr) sctp_scope_t sctp_scope(const union sctp_addr *addr)
{ {
struct sctp_func *af; struct sctp_af *af;
af = sctp_get_af_specific(addr->sa.sa_family); af = sctp_get_af_specific(addr->sa.sa_family);
if (!af) if (!af)
......
...@@ -42,6 +42,7 @@ ...@@ -42,6 +42,7 @@
* Hui Huang <hui.huang@nokia.com> * Hui Huang <hui.huang@nokia.com>
* Daisy Chang <daisyc@us.ibm.com> * Daisy Chang <daisyc@us.ibm.com>
* Sridhar Samudrala <sri@us.ibm.com> * Sridhar Samudrala <sri@us.ibm.com>
* Ardelle Fan <ardelle.fan@intel.com>
* *
* Any bugs reported given to us we will try to fix... any fixes shared will * Any bugs reported given to us we will try to fix... any fixes shared will
* be incorporated into the next SCTP release. * be incorporated into the next SCTP release.
...@@ -96,7 +97,7 @@ int sctp_rcv(struct sk_buff *skb) ...@@ -96,7 +97,7 @@ int sctp_rcv(struct sk_buff *skb)
struct sctphdr *sh; struct sctphdr *sh;
union sctp_addr src; union sctp_addr src;
union sctp_addr dest; union sctp_addr dest;
struct sctp_func *af; struct sctp_af *af;
int ret = 0; int ret = 0;
if (skb->pkt_type!=PACKET_HOST) if (skb->pkt_type!=PACKET_HOST)
...@@ -279,6 +280,7 @@ int sctp_rcv_ootb(struct sk_buff *skb) ...@@ -279,6 +280,7 @@ int sctp_rcv_ootb(struct sk_buff *skb)
{ {
sctp_chunkhdr_t *ch; sctp_chunkhdr_t *ch;
__u8 *ch_end; __u8 *ch_end;
sctp_errhdr_t *err;
ch = (sctp_chunkhdr_t *) skb->data; ch = (sctp_chunkhdr_t *) skb->data;
...@@ -308,8 +310,9 @@ int sctp_rcv_ootb(struct sk_buff *skb) ...@@ -308,8 +310,9 @@ int sctp_rcv_ootb(struct sk_buff *skb)
goto discard; goto discard;
if (ch->type == SCTP_CID_ERROR) { if (ch->type == SCTP_CID_ERROR) {
/* FIXME - Need to check the "Stale cookie" ERROR. */ err = (sctp_errhdr_t *)(ch + sizeof(sctp_chunkhdr_t));
goto discard; if (SCTP_ERROR_STALE_COOKIE == err->cause)
goto discard;
} }
ch = (sctp_chunkhdr_t *) ch_end; ch = (sctp_chunkhdr_t *) ch_end;
......
...@@ -76,8 +76,19 @@ ...@@ -76,8 +76,19 @@
#include <asm/uaccess.h> #include <asm/uaccess.h>
/* FIXME: Cleanup so we don't need TEST_FRAME here. */ extern struct notifier_block sctp_inetaddr_notifier;
#ifndef TEST_FRAME
/* FIXME: This macro needs to be moved to a common header file. */
#define NIP6(addr) \
ntohs((addr)->s6_addr16[0]), \
ntohs((addr)->s6_addr16[1]), \
ntohs((addr)->s6_addr16[2]), \
ntohs((addr)->s6_addr16[3]), \
ntohs((addr)->s6_addr16[4]), \
ntohs((addr)->s6_addr16[5]), \
ntohs((addr)->s6_addr16[6]), \
ntohs((addr)->s6_addr16[7])
/* FIXME: Comments. */ /* FIXME: Comments. */
static inline void sctp_v6_err(struct sk_buff *skb, static inline void sctp_v6_err(struct sk_buff *skb,
struct inet6_skb_parm *opt, struct inet6_skb_parm *opt,
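The NIP6() macro added at the top of this file simply feeds the eight 16-bit groups of an in6_addr through ntohs() for a "%04x:%04x:..." format string. A userspace illustration of the same formatting, copying into a local uint16_t array rather than using the kernel's s6_addr16 accessor:

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        struct in6_addr a;
        uint16_t g[8];

        if (inet_pton(AF_INET6, "2001:db8::1", &a) != 1)
                return 1;

        memcpy(g, &a, sizeof(g));                /* eight network-order 16-bit groups */
        printf("%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n",
               ntohs(g[0]), ntohs(g[1]), ntohs(g[2]), ntohs(g[3]),
               ntohs(g[4]), ntohs(g[5]), ntohs(g[6]), ntohs(g[7]));
        /* prints 2001:0db8:0000:0000:0000:0000:0000:0001 */
        return 0;
}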
...@@ -92,13 +103,38 @@ static inline int sctp_v6_xmit(struct sk_buff *skb) ...@@ -92,13 +103,38 @@ static inline int sctp_v6_xmit(struct sk_buff *skb)
struct sock *sk = skb->sk; struct sock *sk = skb->sk;
struct ipv6_pinfo *np = inet6_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk);
struct flowi fl; struct flowi fl;
struct dst_entry *dst; struct dst_entry *dst = skb->dst;
struct rt6_info *rt6 = (struct rt6_info *)dst;
struct in6_addr saddr; struct in6_addr saddr;
int err = 0; int err;
fl.proto = sk->protocol; fl.proto = sk->protocol;
fl.fl6_dst = &np->daddr; fl.fl6_dst = &rt6->rt6i_dst.addr;
fl.fl6_src = NULL;
/* FIXME: Currently, ip6_route_output() doesn't fill in the source
* address in the returned route entry. So we call ipv6_get_saddr()
* to get an appropriate source address. It is possible that this address
* may not be part of the bind address list of the association.
* Once ip6_route_output() is fixed so that it returns a route entry
* with an appropriate source address, the following if condition can
* be removed. With ip6_route_output() returning a source address filled
* route entry, sctp_transport_route() can do real source address
* selection for v6.
*/
if (ipv6_addr_any(&rt6->rt6i_src.addr)) {
err = ipv6_get_saddr(dst, fl.fl6_dst, &saddr);
if (err) {
printk(KERN_ERR "%s: No saddr available for "
"DST=%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n",
__FUNCTION__, NIP6(fl.fl6_src));
return err;
}
fl.fl6_src = &saddr;
} else {
fl.fl6_src = &rt6->rt6i_src.addr;
}
fl.fl6_flowlabel = np->flow_label; fl.fl6_flowlabel = np->flow_label;
IP6_ECN_flow_xmit(sk, fl.fl6_flowlabel); IP6_ECN_flow_xmit(sk, fl.fl6_flowlabel);
...@@ -111,63 +147,8 @@ static inline int sctp_v6_xmit(struct sk_buff *skb) ...@@ -111,63 +147,8 @@ static inline int sctp_v6_xmit(struct sk_buff *skb)
fl.nl_u.ip6_u.daddr = rt0->addr; fl.nl_u.ip6_u.daddr = rt0->addr;
} }
dst = __sk_dst_check(sk, np->dst_cookie);
if (dst == NULL) {
dst = ip6_route_output(sk, &fl);
if (dst->error) {
sk->err_soft = -dst->error;
dst_release(dst);
return -sk->err_soft;
}
ip6_dst_store(sk, dst, NULL);
}
skb->dst = dst_clone(dst);
/* FIXME: This is all temporary until real source address
* selection is done.
*/
if (ipv6_addr_any(&np->saddr)) {
err = ipv6_get_saddr(dst, fl.fl6_dst, &saddr);
if (err)
printk(KERN_ERR "sctp_v6_xmit: no saddr available\n");
/* FIXME: This is a workaround until we get
* real source address selection done. This is here
* to disallow loopback when the scoping rules have
* not bound loopback to the endpoint.
*/
if (sctp_ipv6_addr_type(&saddr) & IPV6_ADDR_LOOPBACK) {
if (!(sctp_ipv6_addr_type(&np->daddr) &
IPV6_ADDR_LOOPBACK)) {
ipv6_addr_copy(&saddr, &np->daddr);
}
}
fl.fl6_src = &saddr;
} else {
fl.fl6_src = &np->saddr;
}
/* Restore final destination back after routing done */
fl.nl_u.ip6_u.daddr = &np->daddr;
return ip6_xmit(sk, skb, &fl, np->opt); return ip6_xmit(sk, skb, &fl, np->opt);
} }
#endif /* TEST_FRAME */
/* FIXME: This macro needs to be moved to a common header file. */
#define NIP6(addr) \
ntohs((addr)->s6_addr16[0]), \
ntohs((addr)->s6_addr16[1]), \
ntohs((addr)->s6_addr16[2]), \
ntohs((addr)->s6_addr16[3]), \
ntohs((addr)->s6_addr16[4]), \
ntohs((addr)->s6_addr16[5]), \
ntohs((addr)->s6_addr16[6]), \
ntohs((addr)->s6_addr16[7])
/* Returns the dst cache entry for the given source and destination ip /* Returns the dst cache entry for the given source and destination ip
* addresses. * addresses.
...@@ -176,7 +157,7 @@ struct dst_entry *sctp_v6_get_dst(union sctp_addr *daddr, ...@@ -176,7 +157,7 @@ struct dst_entry *sctp_v6_get_dst(union sctp_addr *daddr,
union sctp_addr *saddr) union sctp_addr *saddr)
{ {
struct dst_entry *dst; struct dst_entry *dst;
struct flowi fl = { struct flowi fl = {
.nl_u = { .ip6_u = { .daddr = &daddr->v6.sin6_addr, } } }; .nl_u = { .ip6_u = { .daddr = &daddr->v6.sin6_addr, } } };
...@@ -261,6 +242,20 @@ static void sctp_v6_from_skb(union sctp_addr *addr,struct sk_buff *skb, ...@@ -261,6 +242,20 @@ static void sctp_v6_from_skb(union sctp_addr *addr,struct sk_buff *skb,
ipv6_addr_copy(&addr->v6.sin6_addr, from); ipv6_addr_copy(&addr->v6.sin6_addr, from);
} }
/* Initialize an sctp_addr from a socket. */
static void sctp_v6_from_sk(union sctp_addr *addr, struct sock *sk)
{
addr->v6.sin6_family = AF_INET6;
addr->v6.sin6_port = inet_sk(sk)->num;
addr->v6.sin6_addr = inet6_sk(sk)->rcv_saddr;
}
/* Initialize sk->rcv_saddr from sctp_addr. */
static void sctp_v6_to_sk(union sctp_addr *addr, struct sock *sk)
{
inet6_sk(sk)->rcv_saddr = addr->v6.sin6_addr;
}
/* Initialize a sctp_addr from a dst_entry. */ /* Initialize a sctp_addr from a dst_entry. */
static void sctp_v6_dst_saddr(union sctp_addr *addr, struct dst_entry *dst) static void sctp_v6_dst_saddr(union sctp_addr *addr, struct dst_entry *dst)
{ {
...@@ -270,15 +265,15 @@ static void sctp_v6_dst_saddr(union sctp_addr *addr, struct dst_entry *dst) ...@@ -270,15 +265,15 @@ static void sctp_v6_dst_saddr(union sctp_addr *addr, struct dst_entry *dst)
} }
/* Compare addresses exactly. Well.. almost exactly; ignore scope_id /* Compare addresses exactly. Well.. almost exactly; ignore scope_id
* for now. FIXME. * for now. FIXME: v4-mapped-v6.
*/ */
static int sctp_v6_cmp_addr(const union sctp_addr *addr1, static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
const union sctp_addr *addr2) const union sctp_addr *addr2)
{ {
int match; int match;
if (addr1->sa.sa_family != addr2->sa.sa_family) if (addr1->sa.sa_family != addr2->sa.sa_family)
return 0; return 0;
match = !ipv6_addr_cmp((struct in6_addr *)&addr1->v6.sin6_addr, match = !ipv6_addr_cmp((struct in6_addr *)&addr1->v6.sin6_addr,
(struct in6_addr *)&addr2->v6.sin6_addr); (struct in6_addr *)&addr2->v6.sin6_addr);
return match; return match;
...@@ -300,6 +295,22 @@ static int sctp_v6_is_any(const union sctp_addr *addr) ...@@ -300,6 +295,22 @@ static int sctp_v6_is_any(const union sctp_addr *addr)
return IPV6_ADDR_ANY == type; return IPV6_ADDR_ANY == type;
} }
/* Should this be available for binding? */
static int sctp_v6_available(const union sctp_addr *addr)
{
int type;
struct in6_addr *in6 = (struct in6_addr *)&addr->v6.sin6_addr;
type = ipv6_addr_type(in6);
if (IPV6_ADDR_ANY == type)
return 1;
if (!(type & IPV6_ADDR_UNICAST))
return 0;
return ipv6_chk_addr(in6, NULL);
}
/* This function checks if the address is a valid address to be used for /* This function checks if the address is a valid address to be used for
* SCTP. * SCTP.
* *
...@@ -309,7 +320,7 @@ static int sctp_v6_is_any(const union sctp_addr *addr) ...@@ -309,7 +320,7 @@ static int sctp_v6_is_any(const union sctp_addr *addr)
*/ */
static int sctp_v6_addr_valid(union sctp_addr *addr) static int sctp_v6_addr_valid(union sctp_addr *addr)
{ {
int ret = sctp_ipv6_addr_type(&addr->v6.sin6_addr); int ret = ipv6_addr_type(&addr->v6.sin6_addr);
/* FIXME: v4-mapped-v6 address support. */ /* FIXME: v4-mapped-v6 address support. */
...@@ -442,14 +453,14 @@ static int sctp_inet6_af_supported(sa_family_t family) ...@@ -442,14 +453,14 @@ static int sctp_inet6_af_supported(sa_family_t family)
/* Address matching with wildcards allowed. This extra level /* Address matching with wildcards allowed. This extra level
* of indirection lets us choose whether a PF_INET6 should * of indirection lets us choose whether a PF_INET6 should
* disallow any v4 addresses if we so choose. * disallow any v4 addresses if we so choose.
*/ */
static int sctp_inet6_cmp_addr(const union sctp_addr *addr1, static int sctp_inet6_cmp_addr(const union sctp_addr *addr1,
const union sctp_addr *addr2, const union sctp_addr *addr2,
struct sctp_opt *opt) struct sctp_opt *opt)
{ {
struct sctp_func *af1, *af2; struct sctp_af *af1, *af2;
af1 = sctp_get_af_specific(addr1->sa.sa_family); af1 = sctp_get_af_specific(addr1->sa.sa_family);
af2 = sctp_get_af_specific(addr2->sa.sa_family); af2 = sctp_get_af_specific(addr2->sa.sa_family);
...@@ -461,11 +472,25 @@ static int sctp_inet6_cmp_addr(const union sctp_addr *addr1, ...@@ -461,11 +472,25 @@ static int sctp_inet6_cmp_addr(const union sctp_addr *addr1,
if (addr1->sa.sa_family != addr2->sa.sa_family) if (addr1->sa.sa_family != addr2->sa.sa_family)
return 0; return 0;
return af1->cmp_addr(addr1, addr2); return af1->cmp_addr(addr1, addr2);
} }
/* Verify that the provided sockaddr looks bindable. Common verification
* has already been taken care of.
*/
static int sctp_inet6_bind_verify(struct sctp_opt *opt, union sctp_addr *addr)
{
struct sctp_af *af;
/* ASSERT: address family has already been verified. */
if (addr->sa.sa_family != AF_INET6) {
af = sctp_get_af_specific(addr->sa.sa_family);
} else
af = opt->pf->af;
return af->available(addr);
}
static struct proto_ops inet6_seqpacket_ops = { static struct proto_ops inet6_seqpacket_ops = {
.family = PF_INET6, .family = PF_INET6,
...@@ -501,29 +526,33 @@ static struct inet6_protocol sctpv6_protocol = { ...@@ -501,29 +526,33 @@ static struct inet6_protocol sctpv6_protocol = {
.err_handler = sctp_v6_err, .err_handler = sctp_v6_err,
}; };
static sctp_func_t sctp_ipv6_specific = { static struct sctp_af sctp_ipv6_specific = {
.queue_xmit = sctp_v6_xmit, .queue_xmit = sctp_v6_xmit,
.setsockopt = ipv6_setsockopt, .setsockopt = ipv6_setsockopt,
.getsockopt = ipv6_getsockopt, .getsockopt = ipv6_getsockopt,
.get_dst = sctp_v6_get_dst, .get_dst = sctp_v6_get_dst,
.copy_addrlist = sctp_v6_copy_addrlist, .copy_addrlist = sctp_v6_copy_addrlist,
.from_skb = sctp_v6_from_skb, .from_skb = sctp_v6_from_skb,
.from_sk = sctp_v6_from_sk,
.to_sk = sctp_v6_to_sk,
.dst_saddr = sctp_v6_dst_saddr, .dst_saddr = sctp_v6_dst_saddr,
.cmp_addr = sctp_v6_cmp_addr, .cmp_addr = sctp_v6_cmp_addr,
.scope = sctp_v6_scope, .scope = sctp_v6_scope,
.addr_valid = sctp_v6_addr_valid, .addr_valid = sctp_v6_addr_valid,
.inaddr_any = sctp_v6_inaddr_any, .inaddr_any = sctp_v6_inaddr_any,
.is_any = sctp_v6_is_any, .is_any = sctp_v6_is_any,
.available = sctp_v6_available,
.net_header_len = sizeof(struct ipv6hdr), .net_header_len = sizeof(struct ipv6hdr),
.sockaddr_len = sizeof(struct sockaddr_in6), .sockaddr_len = sizeof(struct sockaddr_in6),
.sa_family = AF_INET6, .sa_family = AF_INET6,
}; };
static sctp_pf_t sctp_pf_inet6_specific = { static struct sctp_pf sctp_pf_inet6_specific = {
.event_msgname = sctp_inet6_event_msgname, .event_msgname = sctp_inet6_event_msgname,
.skb_msgname = sctp_inet6_skb_msgname, .skb_msgname = sctp_inet6_skb_msgname,
.af_supported = sctp_inet6_af_supported, .af_supported = sctp_inet6_af_supported,
.cmp_addr = sctp_inet6_cmp_addr, .cmp_addr = sctp_inet6_cmp_addr,
.bind_verify = sctp_inet6_bind_verify,
.af = &sctp_ipv6_specific, .af = &sctp_ipv6_specific,
}; };
...@@ -538,11 +567,13 @@ int sctp_v6_init(void) ...@@ -538,11 +567,13 @@ int sctp_v6_init(void)
inet6_register_protosw(&sctpv6_protosw); inet6_register_protosw(&sctpv6_protosw);
/* Register the SCTP specific PF_INET6 functions. */ /* Register the SCTP specific PF_INET6 functions. */
sctp_set_pf_specific(PF_INET6, &sctp_pf_inet6_specific); sctp_register_pf(&sctp_pf_inet6_specific, PF_INET6);
/* Register the SCTP specific AF_INET6 functions. */
sctp_register_af(&sctp_ipv6_specific);
/* Fill in address family info. */ /* Register notifier for inet6 address additions/deletions. */
INIT_LIST_HEAD(&sctp_ipv6_specific.list); register_inet6addr_notifier(&sctp_inetaddr_notifier);
list_add_tail(&sctp_ipv6_specific.list, &sctp_proto.address_families);
return 0; return 0;
} }
...@@ -553,4 +584,5 @@ void sctp_v6_exit(void) ...@@ -553,4 +584,5 @@ void sctp_v6_exit(void)
list_del(&sctp_ipv6_specific.list); list_del(&sctp_ipv6_specific.list);
inet6_del_protocol(&sctpv6_protocol, IPPROTO_SCTP); inet6_del_protocol(&sctpv6_protocol, IPPROTO_SCTP);
inet6_unregister_protosw(&sctpv6_protosw); inet6_unregister_protosw(&sctpv6_protosw);
unregister_inet6addr_notifier(&sctp_inetaddr_notifier);
} }
...@@ -40,6 +40,7 @@ ...@@ -40,6 +40,7 @@
* Jon Grimm <jgrimm@us.ibm.com> * Jon Grimm <jgrimm@us.ibm.com>
* Sridhar Samudrala <sri@us.ibm.com> * Sridhar Samudrala <sri@us.ibm.com>
* Daisy Chang <daisyc@us.ibm.com> * Daisy Chang <daisyc@us.ibm.com>
* Ardelle Fan <ardelle.fan@intel.com>
* *
* Any bugs reported given to us we will try to fix... any fixes shared will * Any bugs reported given to us we will try to fix... any fixes shared will
* be incorporated into the next SCTP release. * be incorporated into the next SCTP release.
...@@ -67,8 +68,10 @@ struct sctp_mib sctp_statistics[NR_CPUS * 2]; ...@@ -67,8 +68,10 @@ struct sctp_mib sctp_statistics[NR_CPUS * 2];
*/ */
static struct socket *sctp_ctl_socket; static struct socket *sctp_ctl_socket;
static sctp_pf_t *sctp_pf_inet6_specific; static struct sctp_pf *sctp_pf_inet6_specific;
static sctp_pf_t *sctp_pf_inet_specific; static struct sctp_pf *sctp_pf_inet_specific;
static struct sctp_af *sctp_af_v4_specific;
static struct sctp_af *sctp_af_v6_specific;
extern struct net_proto_family inet_family_ops; extern struct net_proto_family inet_family_ops;
...@@ -140,12 +143,12 @@ static void __sctp_get_local_addr_list(sctp_protocol_t *proto) ...@@ -140,12 +143,12 @@ static void __sctp_get_local_addr_list(sctp_protocol_t *proto)
{ {
struct net_device *dev; struct net_device *dev;
struct list_head *pos; struct list_head *pos;
struct sctp_func *af; struct sctp_af *af;
read_lock(&dev_base_lock); read_lock(&dev_base_lock);
for (dev = dev_base; dev; dev = dev->next) { for (dev = dev_base; dev; dev = dev->next) {
list_for_each(pos, &proto->address_families) { list_for_each(pos, &proto->address_families) {
af = list_entry(pos, sctp_func_t, list); af = list_entry(pos, struct sctp_af, list);
af->copy_addrlist(&proto->local_addr_list, dev); af->copy_addrlist(&proto->local_addr_list, dev);
} }
} }
...@@ -251,7 +254,6 @@ struct dst_entry *sctp_v4_get_dst(union sctp_addr *daddr, ...@@ -251,7 +254,6 @@ struct dst_entry *sctp_v4_get_dst(union sctp_addr *daddr,
return &rt->u.dst; return &rt->u.dst;
} }
/* Initialize a sctp_addr from an incoming skb. */ /* Initialize a sctp_addr from an incoming skb. */
static void sctp_v4_from_skb(union sctp_addr *addr, struct sk_buff *skb, static void sctp_v4_from_skb(union sctp_addr *addr, struct sk_buff *skb,
int is_saddr) int is_saddr)
...@@ -274,6 +276,21 @@ static void sctp_v4_from_skb(union sctp_addr *addr, struct sk_buff *skb, ...@@ -274,6 +276,21 @@ static void sctp_v4_from_skb(union sctp_addr *addr, struct sk_buff *skb,
memcpy(&addr->v4.sin_addr.s_addr, from, sizeof(struct in_addr)); memcpy(&addr->v4.sin_addr.s_addr, from, sizeof(struct in_addr));
} }
/* Initialize an sctp_addr from a socket. */
static void sctp_v4_from_sk(union sctp_addr *addr, struct sock *sk)
{
addr->v4.sin_family = AF_INET;
addr->v4.sin_port = inet_sk(sk)->num;
addr->v4.sin_addr.s_addr = inet_sk(sk)->rcv_saddr;
}
/* Initialize sk->rcv_saddr from sctp_addr. */
static void sctp_v4_to_sk(union sctp_addr *addr, struct sock *sk)
{
inet_sk(sk)->rcv_saddr = addr->v4.sin_addr.s_addr;
}
/* Initialize a sctp_addr from a dst_entry. */ /* Initialize a sctp_addr from a dst_entry. */
static void sctp_v4_dst_saddr(union sctp_addr *saddr, struct dst_entry *dst) static void sctp_v4_dst_saddr(union sctp_addr *saddr, struct dst_entry *dst)
{ {
...@@ -311,7 +328,7 @@ static int sctp_v4_is_any(const union sctp_addr *addr) ...@@ -311,7 +328,7 @@ static int sctp_v4_is_any(const union sctp_addr *addr)
} }
/* This function checks if the address is a valid address to be used for /* This function checks if the address is a valid address to be used for
* SCTP. * SCTP binding.
* *
* Output: * Output:
* Return 0 - If the address is a non-unicast or an illegal address. * Return 0 - If the address is a non-unicast or an illegal address.
...@@ -326,6 +343,18 @@ static int sctp_v4_addr_valid(union sctp_addr *addr) ...@@ -326,6 +343,18 @@ static int sctp_v4_addr_valid(union sctp_addr *addr)
return 1; return 1;
} }
/* Should this be available for binding? */
static int sctp_v4_available(const union sctp_addr *addr)
{
int ret = inet_addr_type(addr->v4.sin_addr.s_addr);
/* FIXME: ip_nonlocal_bind sysctl support. */
if (addr->v4.sin_addr.s_addr != INADDR_ANY && ret != RTN_LOCAL)
return 0;
return 1;
}
/* Checking the loopback, private and other address scopes as defined in /* Checking the loopback, private and other address scopes as defined in
* RFC 1918. The IPv4 scoping is based on the draft for SCTP IPv4 * RFC 1918. The IPv4 scoping is based on the draft for SCTP IPv4
* scoping <draft-stewart-tsvwg-sctp-ipv4-00.txt>. * scoping <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
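The body of sctp_v4_scope() lies outside this hunk; as background for the comment above, here is a small standalone classifier for the address classes it mentions, loopback (127.0.0.0/8) and the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The returned strings are illustrative only, not the scope enum the SCTP code uses:

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

static const char *ipv4_scope(const char *str)
{
        struct in_addr a;
        uint32_t ip;

        if (inet_pton(AF_INET, str, &a) != 1)
                return "invalid";
        ip = ntohl(a.s_addr);

        if ((ip >> 24) == 127)                          /* 127.0.0.0/8 */
                return "loopback";
        if ((ip >> 24) == 10 ||                         /* 10.0.0.0/8 */
            (ip >> 20) == ((172U << 4) | 1) ||          /* 172.16.0.0/12 */
            (ip >> 16) == ((192U << 8) | 168))          /* 192.168.0.0/16 */
                return "private (RFC 1918)";
        return "global";
}

int main(void)
{
        printf("%s\n", ipv4_scope("192.168.1.1"));      /* private (RFC 1918) */
        printf("%s\n", ipv4_scope("8.8.8.8"));          /* global */
        return 0;
}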
...@@ -365,11 +394,11 @@ static sctp_scope_t sctp_v4_scope(union sctp_addr *addr) ...@@ -365,11 +394,11 @@ static sctp_scope_t sctp_v4_scope(union sctp_addr *addr)
return retval; return retval;
} }
/* Event handler for inet device events. /* Event handler for inet address addition/deletion events.
* Basically, whenever there is an event, we re-build our local address list. * Basically, whenever there is an event, we re-build our local address list.
*/ */
static int sctp_netdev_event(struct notifier_block *this, unsigned long event, static int sctp_inetaddr_event(struct notifier_block *this, unsigned long event,
void *ptr) void *ptr)
{ {
long flags __attribute__ ((unused)); long flags __attribute__ ((unused));
...@@ -405,29 +434,42 @@ int sctp_ctl_sock_init(void) ...@@ -405,29 +434,42 @@ int sctp_ctl_sock_init(void)
return 0; return 0;
} }
/* Register address family specific functions. */
int sctp_register_af(struct sctp_af *af)
{
switch (af->sa_family) {
case AF_INET:
if (sctp_af_v4_specific)
return 0;
sctp_af_v4_specific = af;
break;
case AF_INET6:
if (sctp_af_v6_specific)
return 0;
sctp_af_v6_specific = af;
break;
default:
return 0;
}
INIT_LIST_HEAD(&af->list);
list_add_tail(&af->list, &sctp_proto.address_families);
return 1;
}
/* Get the table of functions for manipulating a particular address /* Get the table of functions for manipulating a particular address
* family. * family.
*/ */
sctp_func_t *sctp_get_af_specific(sa_family_t family) struct sctp_af *sctp_get_af_specific(sa_family_t family)
{ {
struct list_head *pos; switch (family) {
sctp_protocol_t *proto = sctp_get_protocol(); case AF_INET:
struct sctp_func *retval, *af; return sctp_af_v4_specific;
case AF_INET6:
retval = NULL; return sctp_af_v6_specific;
default:
/* Cycle through all AF specific functions looking for a return NULL;
* match.
*/
list_for_each(pos, &proto->address_families) {
af = list_entry(pos, sctp_func_t, list);
if (family == af->sa_family) {
retval = af;
break;
}
} }
return retval;
} }
/* Common code to initialize a AF_INET msg_name. */ /* Common code to initialize a AF_INET msg_name. */
...@@ -495,21 +537,28 @@ static int sctp_inet_cmp_addr(const union sctp_addr *addr1, ...@@ -495,21 +537,28 @@ static int sctp_inet_cmp_addr(const union sctp_addr *addr1,
return 0; return 0;
} }
/* Verify that provided sockaddr looks bindable. Common verification has
* already been taken care of.
*/
static int sctp_inet_bind_verify(struct sctp_opt *opt, union sctp_addr *addr)
{
return sctp_v4_available(addr);
}
struct sctp_func sctp_ipv4_specific; struct sctp_af sctp_ipv4_specific;
static sctp_pf_t sctp_pf_inet = { static struct sctp_pf sctp_pf_inet = {
.event_msgname = sctp_inet_event_msgname, .event_msgname = sctp_inet_event_msgname,
.skb_msgname = sctp_inet_skb_msgname, .skb_msgname = sctp_inet_skb_msgname,
.af_supported = sctp_inet_af_supported, .af_supported = sctp_inet_af_supported,
.cmp_addr = sctp_inet_cmp_addr, .cmp_addr = sctp_inet_cmp_addr,
.bind_verify = sctp_inet_bind_verify,
.af = &sctp_ipv4_specific, .af = &sctp_ipv4_specific,
}; };
/* Notifier for inetaddr addition/deletion events. */
/* Registration for netdev events. */ struct notifier_block sctp_inetaddr_notifier = {
struct notifier_block sctp_netdev_notifier = { .notifier_call = sctp_inetaddr_event,
.notifier_call = sctp_netdev_event,
}; };
/* Socket operations. */ /* Socket operations. */
...@@ -551,25 +600,28 @@ static struct inet_protocol sctp_protocol = { ...@@ -551,25 +600,28 @@ static struct inet_protocol sctp_protocol = {
}; };
/* IPv4 address related functions. */ /* IPv4 address related functions. */
struct sctp_func sctp_ipv4_specific = { struct sctp_af sctp_ipv4_specific = {
.queue_xmit = ip_queue_xmit, .queue_xmit = ip_queue_xmit,
.setsockopt = ip_setsockopt, .setsockopt = ip_setsockopt,
.getsockopt = ip_getsockopt, .getsockopt = ip_getsockopt,
.get_dst = sctp_v4_get_dst, .get_dst = sctp_v4_get_dst,
.copy_addrlist = sctp_v4_copy_addrlist, .copy_addrlist = sctp_v4_copy_addrlist,
.from_skb = sctp_v4_from_skb, .from_skb = sctp_v4_from_skb,
.from_sk = sctp_v4_from_sk,
.to_sk = sctp_v4_to_sk,
.dst_saddr = sctp_v4_dst_saddr, .dst_saddr = sctp_v4_dst_saddr,
.cmp_addr = sctp_v4_cmp_addr, .cmp_addr = sctp_v4_cmp_addr,
.addr_valid = sctp_v4_addr_valid, .addr_valid = sctp_v4_addr_valid,
.inaddr_any = sctp_v4_inaddr_any, .inaddr_any = sctp_v4_inaddr_any,
.is_any = sctp_v4_is_any, .is_any = sctp_v4_is_any,
.available = sctp_v4_available,
.scope = sctp_v4_scope, .scope = sctp_v4_scope,
.net_header_len = sizeof(struct iphdr), .net_header_len = sizeof(struct iphdr),
.sockaddr_len = sizeof(struct sockaddr_in), .sockaddr_len = sizeof(struct sockaddr_in),
.sa_family = AF_INET, .sa_family = AF_INET,
}; };
sctp_pf_t *sctp_get_pf_specific(int family) { struct sctp_pf *sctp_get_pf_specific(sa_family_t family) {
switch (family) { switch (family) {
case PF_INET: case PF_INET:
...@@ -581,20 +633,24 @@ sctp_pf_t *sctp_get_pf_specific(int family) { ...@@ -581,20 +633,24 @@ sctp_pf_t *sctp_get_pf_specific(int family) {
} }
} }
/* Set the PF specific function table. */ /* Register the PF specific function table. */
void sctp_set_pf_specific(int family, sctp_pf_t *pf) int sctp_register_pf(struct sctp_pf *pf, sa_family_t family)
{ {
switch (family) { switch (family) {
case PF_INET: case PF_INET:
if (sctp_pf_inet_specific)
return 0;
sctp_pf_inet_specific = pf; sctp_pf_inet_specific = pf;
break; break;
case PF_INET6: case PF_INET6:
if (sctp_pf_inet6_specific)
return 0;
sctp_pf_inet6_specific = pf; sctp_pf_inet6_specific = pf;
break; break;
default: default:
BUG(); return 0;
break;
} }
return 1;
} }
/* Initialize the universe into something sensible. */ /* Initialize the universe into something sensible. */
...@@ -617,7 +673,7 @@ int sctp_init(void) ...@@ -617,7 +673,7 @@ int sctp_init(void)
sctp_dbg_objcnt_init(); sctp_dbg_objcnt_init();
/* Initialize the SCTP specific PF functions. */ /* Initialize the SCTP specific PF functions. */
sctp_set_pf_specific(PF_INET, &sctp_pf_inet); sctp_register_pf(&sctp_pf_inet, PF_INET);
/* /*
* 14. Suggested SCTP Protocol Parameter Values * 14. Suggested SCTP Protocol Parameter Values
*/ */
...@@ -636,6 +692,9 @@ int sctp_init(void) ...@@ -636,6 +692,9 @@ int sctp_init(void)
/* Valid.Cookie.Life - 60 seconds */ /* Valid.Cookie.Life - 60 seconds */
sctp_proto.valid_cookie_life = 60 * HZ; sctp_proto.valid_cookie_life = 60 * HZ;
/* Whether Cookie Preservative is enabled(1) or not(0) */
sctp_proto.cookie_preserve_enable = 1;
/* Max.Burst - 4 */ /* Max.Burst - 4 */
sctp_proto.max_burst = SCTP_MAX_BURST; sctp_proto.max_burst = SCTP_MAX_BURST;
...@@ -709,8 +768,7 @@ int sctp_init(void) ...@@ -709,8 +768,7 @@ int sctp_init(void)
sctp_sysctl_register(); sctp_sysctl_register();
INIT_LIST_HEAD(&sctp_proto.address_families); INIT_LIST_HEAD(&sctp_proto.address_families);
INIT_LIST_HEAD(&sctp_ipv4_specific.list); sctp_register_af(&sctp_ipv4_specific);
list_add_tail(&sctp_ipv4_specific.list, &sctp_proto.address_families);
status = sctp_v6_init(); status = sctp_v6_init();
if (status) if (status)
...@@ -727,7 +785,9 @@ int sctp_init(void) ...@@ -727,7 +785,9 @@ int sctp_init(void)
INIT_LIST_HEAD(&sctp_proto.local_addr_list); INIT_LIST_HEAD(&sctp_proto.local_addr_list);
sctp_proto.local_addr_lock = SPIN_LOCK_UNLOCKED; sctp_proto.local_addr_lock = SPIN_LOCK_UNLOCKED;
register_inetaddr_notifier(&sctp_netdev_notifier); /* Register notifier for inet address additions/deletions. */
register_inetaddr_notifier(&sctp_inetaddr_notifier);
sctp_get_local_addr_list(&sctp_proto); sctp_get_local_addr_list(&sctp_proto);
return 0; return 0;
...@@ -757,8 +817,10 @@ void sctp_exit(void) ...@@ -757,8 +817,10 @@ void sctp_exit(void)
* up all the remaining associations and all that memory. * up all the remaining associations and all that memory.
*/ */
/* Unregister notifier for inet address additions/deletions. */
unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
/* Free the local address list. */ /* Free the local address list. */
unregister_inetaddr_notifier(&sctp_netdev_notifier);
sctp_free_local_addr_list(&sctp_proto); sctp_free_local_addr_list(&sctp_proto);
/* Free the control endpoint. */ /* Free the control endpoint. */
......
...@@ -77,13 +77,18 @@ static const sctp_supported_addrs_param_t sat_param = { ...@@ -77,13 +77,18 @@ static const sctp_supported_addrs_param_t sat_param = {
{ {
SCTP_PARAM_SUPPORTED_ADDRESS_TYPES, SCTP_PARAM_SUPPORTED_ADDRESS_TYPES,
__constant_htons(SCTP_SAT_LEN), __constant_htons(SCTP_SAT_LEN),
},
{ /* types[] */
SCTP_PARAM_IPV4_ADDRESS,
SCTP_V6(SCTP_PARAM_IPV6_ADDRESS,)
} }
}; };
/* gcc 3.2 doesn't allow initialization of zero-length arrays. So the above
* structure is split and the address types array is initialized using a
* fixed length array.
*/
static const __u16 sat_addr_types[2] = {
SCTP_PARAM_IPV4_ADDRESS,
SCTP_V6(SCTP_PARAM_IPV6_ADDRESS,)
};
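
The comment above explains why the supported-address-types parameter is now emitted in two pieces: once the zero-length types[] tail can no longer be statically initialized, the header and the fixed-size types array live in separate objects and are appended back to back when the chunk is built. The following standalone sketch shows that pattern; the parameter type codes and lengths are made up for illustration.

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons */

/* A parameter header normally followed by a variable number of 16-bit
 * type codes; the tail is kept in its own fixed-size array so it can
 * still be initialized statically.
 */
struct param_hdr {
	uint16_t type;
	uint16_t length;
};

static const uint16_t addr_types[2] = { 5, 6 };   /* illustrative codes */

int main(void)
{
	unsigned char buf[64];
	struct param_hdr hdr;
	size_t len = sizeof(hdr) + sizeof(addr_types);

	hdr.type   = htons(12);              /* illustrative parameter type */
	hdr.length = htons((uint16_t)len);

	/* Append the header and the types array separately, as the INIT
	 * builder above now does with sat_param and sat_addr_types. */
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), addr_types, sizeof(addr_types));

	printf("serialized %zu bytes\n", len);
	return 0;
}
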
/* RFC 2960 3.3.2 Initiation (INIT) (1) /* RFC 2960 3.3.2 Initiation (INIT) (1)
* *
* Note 2: The ECN capable field is reserved for future use of * Note 2: The ECN capable field is reserved for future use of
...@@ -163,7 +168,7 @@ void sctp_init_cause(sctp_chunk_t *chunk, __u16 cause_code, ...@@ -163,7 +168,7 @@ void sctp_init_cause(sctp_chunk_t *chunk, __u16 cause_code,
*/ */
sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc, sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc,
const sctp_bind_addr_t *bp, const sctp_bind_addr_t *bp,
int priority) int priority, int vparam_len)
{ {
sctp_inithdr_t init; sctp_inithdr_t init;
union sctp_params addrs; union sctp_params addrs;
...@@ -192,6 +197,7 @@ sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc, ...@@ -192,6 +197,7 @@ sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc,
chunksize = sizeof(init) + addrs_len + SCTP_SAT_LEN; chunksize = sizeof(init) + addrs_len + SCTP_SAT_LEN;
chunksize += sizeof(ecap_param); chunksize += sizeof(ecap_param);
chunksize += vparam_len;
/* RFC 2960 3.3.2 Initiation (INIT) (1) /* RFC 2960 3.3.2 Initiation (INIT) (1)
* *
...@@ -213,7 +219,10 @@ sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc, ...@@ -213,7 +219,10 @@ sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc,
sctp_addto_chunk(retval, sizeof(init), &init); sctp_addto_chunk(retval, sizeof(init), &init);
retval->param_hdr.v = retval->param_hdr.v =
sctp_addto_chunk(retval, addrs_len, addrs.v); sctp_addto_chunk(retval, addrs_len, addrs.v);
sctp_addto_chunk(retval, SCTP_SAT_LEN, &sat_param);
sctp_addto_chunk(retval, sizeof(sctp_paramhdr_t), &sat_param);
sctp_addto_chunk(retval, sizeof(sat_addr_types), sat_addr_types);
sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param); sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param);
nodata: nodata:
...@@ -1337,7 +1346,7 @@ sctp_cookie_param_t *sctp_pack_cookie(const sctp_endpoint_t *ep, ...@@ -1337,7 +1346,7 @@ sctp_cookie_param_t *sctp_pack_cookie(const sctp_endpoint_t *ep,
sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *ep, sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *ep,
const sctp_association_t *asoc, const sctp_association_t *asoc,
sctp_chunk_t *chunk, int priority, sctp_chunk_t *chunk, int priority,
int *error) int *error, sctp_chunk_t **err_chk_p)
{ {
sctp_association_t *retval = NULL; sctp_association_t *retval = NULL;
sctp_signed_cookie_t *cookie; sctp_signed_cookie_t *cookie;
...@@ -1394,7 +1403,29 @@ sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *ep, ...@@ -1394,7 +1403,29 @@ sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *ep,
* for init collision case of lost COOKIE ACK. * for init collision case of lost COOKIE ACK.
*/ */
if (!asoc && tv_lt(bear_cookie->expiration, chunk->skb->stamp)) { if (!asoc && tv_lt(bear_cookie->expiration, chunk->skb->stamp)) {
*error = -SCTP_IERROR_STALE_COOKIE; /*
* Section 3.3.10.3 Stale Cookie Error (3)
*
* Cause of error
* ---------------
* Stale Cookie Error: Indicates the receipt of a valid State
* Cookie that has expired.
*/
*err_chk_p = sctp_make_op_error_space(asoc, chunk,
ntohs(chunk->chunk_hdr->length));
if (*err_chk_p) {
suseconds_t usecs = (chunk->skb->stamp.tv_sec -
bear_cookie->expiration.tv_sec) * 1000000L +
chunk->skb->stamp.tv_usec -
bear_cookie->expiration.tv_usec;
usecs = htonl(usecs);
sctp_init_cause(*err_chk_p, SCTP_ERROR_STALE_COOKIE,
&usecs, sizeof(usecs));
*error = -SCTP_IERROR_STALE_COOKIE;
} else
*error = -SCTP_IERROR_NOMEM;
goto fail; goto fail;
} }
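
The stale-cookie branch above reports the Measure of Staleness as the difference between the packet timestamp and the cookie's expiration, in microseconds and in network byte order, inside the error cause. A minimal userspace sketch of that arithmetic (names and values are illustrative):

#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>
#include <arpa/inet.h>   /* htonl */

/* Microseconds by which 'now' is past 'expiration'. */
static long staleness_usecs(struct timeval now, struct timeval expiration)
{
	return (now.tv_sec - expiration.tv_sec) * 1000000L +
	       (now.tv_usec - expiration.tv_usec);
}

int main(void)
{
	struct timeval expired = { .tv_sec = 100, .tv_usec = 250000 };
	struct timeval stamp   = { .tv_sec = 101, .tv_usec = 750000 };

	long usecs = staleness_usecs(stamp, expired);
	uint32_t wire = htonl((uint32_t)usecs);   /* value carried in the cause */

	printf("stale by %ld usec (wire 0x%08x)\n", usecs, (unsigned)wire);
	return 0;
}
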
...@@ -1751,6 +1782,7 @@ int sctp_process_param(sctp_association_t *asoc, union sctp_params param, ...@@ -1751,6 +1782,7 @@ int sctp_process_param(sctp_association_t *asoc, union sctp_params param,
__u16 sat; __u16 sat;
int retval = 1; int retval = 1;
sctp_scope_t scope; sctp_scope_t scope;
time_t stale;
/* We maintain all INIT parameters in network byte order all the /* We maintain all INIT parameters in network byte order all the
* time. This allows us to not worry about whether the parameters * time. This allows us to not worry about whether the parameters
...@@ -1770,8 +1802,16 @@ int sctp_process_param(sctp_association_t *asoc, union sctp_params param, ...@@ -1770,8 +1802,16 @@ int sctp_process_param(sctp_association_t *asoc, union sctp_params param,
break; break;
case SCTP_PARAM_COOKIE_PRESERVATIVE: case SCTP_PARAM_COOKIE_PRESERVATIVE:
asoc->cookie_preserve = if (!sctp_proto.cookie_preserve_enable)
ntohl(param.life->lifespan_increment); break;
stale = ntohl(param.life->lifespan_increment);
/* Suggested Cookie Life span increment's unit is msec,
* (1/1000sec).
*/
asoc->cookie_life.tv_sec += stale / 1000;
asoc->cookie_life.tv_usec += (stale % 1000) * 1000;
break; break;
case SCTP_PARAM_HOST_NAME_ADDRESS: case SCTP_PARAM_HOST_NAME_ADDRESS:
......
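
The Cookie Preservative handling in the hunk above adds the peer's suggested lifespan increment, given in milliseconds, to the association's cookie lifetime kept as a struct timeval. A small standalone sketch of that conversion; the carry of microsecond overflow into seconds is done here only to keep the example self-consistent.

#include <stdio.h>
#include <sys/time.h>

/* Add 'msecs' to a timeval-based lifetime: whole seconds into tv_sec,
 * the millisecond remainder into tv_usec (1 msec = 1000 usec).
 */
static void extend_cookie_life(struct timeval *life, unsigned long msecs)
{
	life->tv_sec  += msecs / 1000;
	life->tv_usec += (msecs % 1000) * 1000;

	/* Normalize microsecond overflow for this standalone example. */
	if (life->tv_usec >= 1000000) {
		life->tv_sec  += life->tv_usec / 1000000;
		life->tv_usec %= 1000000;
	}
}

int main(void)
{
	struct timeval life = { .tv_sec = 60, .tv_usec = 0 };

	extend_cookie_life(&life, 2500);   /* peer suggested +2.5 seconds */
	printf("cookie life: %ld.%06ld s\n",
	       (long)life.tv_sec, (long)life.tv_usec);
	return 0;
}
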
...@@ -68,7 +68,8 @@ static void sctp_do_8_2_transport_strike(sctp_association_t *asoc, ...@@ -68,7 +68,8 @@ static void sctp_do_8_2_transport_strike(sctp_association_t *asoc,
sctp_transport_t *transport); sctp_transport_t *transport);
static void sctp_cmd_init_failed(sctp_cmd_seq_t *, sctp_association_t *asoc); static void sctp_cmd_init_failed(sctp_cmd_seq_t *, sctp_association_t *asoc);
static void sctp_cmd_assoc_failed(sctp_cmd_seq_t *, sctp_association_t *asoc, static void sctp_cmd_assoc_failed(sctp_cmd_seq_t *, sctp_association_t *asoc,
sctp_event_t event_type, sctp_chunk_t *chunk); sctp_event_t event_type, sctp_subtype_t stype,
sctp_chunk_t *chunk);
static int sctp_cmd_process_init(sctp_cmd_seq_t *, sctp_association_t *asoc, static int sctp_cmd_process_init(sctp_cmd_seq_t *, sctp_association_t *asoc,
sctp_chunk_t *chunk, sctp_chunk_t *chunk,
sctp_init_chunk_t *peer_init, sctp_init_chunk_t *peer_init,
...@@ -517,7 +518,7 @@ int sctp_cmd_interpreter(sctp_event_t event_type, sctp_subtype_t subtype, ...@@ -517,7 +518,7 @@ int sctp_cmd_interpreter(sctp_event_t event_type, sctp_subtype_t subtype,
case SCTP_CMD_ASSOC_FAILED: case SCTP_CMD_ASSOC_FAILED:
sctp_cmd_assoc_failed(commands, asoc, event_type, sctp_cmd_assoc_failed(commands, asoc, event_type,
chunk); subtype, chunk);
break; break;
case SCTP_CMD_COUNTER_INC: case SCTP_CMD_COUNTER_INC:
...@@ -736,6 +737,9 @@ int sctp_gen_sack(sctp_association_t *asoc, int force, sctp_cmd_seq_t *commands) ...@@ -736,6 +737,9 @@ int sctp_gen_sack(sctp_association_t *asoc, int force, sctp_cmd_seq_t *commands)
if (!sack) if (!sack)
goto nomem; goto nomem;
/* Update the last advertised rwnd value. */
asoc->a_rwnd = asoc->rwnd;
asoc->peer.sack_needed = 0; asoc->peer.sack_needed = 0;
asoc->peer.next_dup_tsn = 0; asoc->peer.next_dup_tsn = 0;
...@@ -1046,18 +1050,27 @@ static void sctp_cmd_init_failed(sctp_cmd_seq_t *commands, ...@@ -1046,18 +1050,27 @@ static void sctp_cmd_init_failed(sctp_cmd_seq_t *commands,
static void sctp_cmd_assoc_failed(sctp_cmd_seq_t *commands, static void sctp_cmd_assoc_failed(sctp_cmd_seq_t *commands,
sctp_association_t *asoc, sctp_association_t *asoc,
sctp_event_t event_type, sctp_event_t event_type,
sctp_subtype_t subtype,
sctp_chunk_t *chunk) sctp_chunk_t *chunk)
{ {
sctp_ulpevent_t *event; sctp_ulpevent_t *event;
__u16 error = 0; __u16 error = 0;
if (event_type == SCTP_EVENT_T_PRIMITIVE) switch(event_type) {
error = SCTP_ERROR_USER_ABORT; case SCTP_EVENT_T_PRIMITIVE:
if (SCTP_PRIMITIVE_ABORT == subtype.primitive)
if (chunk && (SCTP_CID_ABORT == chunk->chunk_hdr->type) && error = SCTP_ERROR_USER_ABORT;
(ntohs(chunk->chunk_hdr->length) >= (sizeof(struct sctp_chunkhdr) + break;
sizeof(struct sctp_errhdr)))) { case SCTP_EVENT_T_CHUNK:
error = ((sctp_errhdr_t *)chunk->skb->data)->cause; if (chunk && (SCTP_CID_ABORT == chunk->chunk_hdr->type) &&
(ntohs(chunk->chunk_hdr->length) >=
(sizeof(struct sctp_chunkhdr) +
sizeof(struct sctp_errhdr)))) {
error = ((sctp_errhdr_t *)chunk->skb->data)->cause;
}
break;
default:
break;
} }
event = sctp_ulpevent_make_assoc_change(asoc, event = sctp_ulpevent_make_assoc_change(asoc,
......
...@@ -2,6 +2,7 @@ ...@@ -2,6 +2,7 @@
* Copyright (c) 1999-2000 Cisco, Inc. * Copyright (c) 1999-2000 Cisco, Inc.
* Copyright (c) 1999-2001 Motorola, Inc. * Copyright (c) 1999-2001 Motorola, Inc.
* Copyright (c) 2001-2002 International Business Machines, Corp. * Copyright (c) 2001-2002 International Business Machines, Corp.
* Copyright (c) 2001-2002 Intel Corp.
* Copyright (c) 2002 Nokia Corp. * Copyright (c) 2002 Nokia Corp.
* *
* This file is part of the SCTP kernel reference Implementation * This file is part of the SCTP kernel reference Implementation
...@@ -502,6 +503,7 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep, ...@@ -502,6 +503,7 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep,
sctp_chunk_t *repl; sctp_chunk_t *repl;
sctp_ulpevent_t *ev; sctp_ulpevent_t *ev;
int error = 0; int error = 0;
sctp_chunk_t *err_chk_p;
/* If the packet is an OOTB packet which is temporarily on the /* If the packet is an OOTB packet which is temporarily on the
* control endpoint, responding with an ABORT. * control endpoint, responding with an ABORT.
...@@ -521,7 +523,8 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep, ...@@ -521,7 +523,8 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep,
* "Z" will reply with a COOKIE ACK chunk after building a TCB * "Z" will reply with a COOKIE ACK chunk after building a TCB
* and moving to the ESTABLISHED state. * and moving to the ESTABLISHED state.
*/ */
new_asoc = sctp_unpack_cookie(ep, asoc, chunk, GFP_ATOMIC, &error); new_asoc = sctp_unpack_cookie(ep, asoc, chunk, GFP_ATOMIC, &error,
&err_chk_p);
/* FIXME: /* FIXME:
* If the re-build failed, what is the proper error path * If the re-build failed, what is the proper error path
...@@ -537,6 +540,11 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep, ...@@ -537,6 +540,11 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep,
case -SCTP_IERROR_NOMEM: case -SCTP_IERROR_NOMEM:
goto nomem; goto nomem;
case -SCTP_IERROR_STALE_COOKIE:
sctp_send_stale_cookie_err(ep, asoc, chunk, commands,
err_chk_p);
return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
case -SCTP_IERROR_BAD_SIG: case -SCTP_IERROR_BAD_SIG:
default: default:
return sctp_sf_pdiscard(ep, asoc, type, arg, commands); return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
...@@ -862,8 +870,8 @@ sctp_disposition_t sctp_sf_backbeat_8_3(const sctp_endpoint_t *ep, ...@@ -862,8 +870,8 @@ sctp_disposition_t sctp_sf_backbeat_8_3(const sctp_endpoint_t *ep,
/* Check if the timestamp looks valid. */ /* Check if the timestamp looks valid. */
if (time_after(hbinfo->sent_at, jiffies) || if (time_after(hbinfo->sent_at, jiffies) ||
time_after(jiffies, hbinfo->sent_at + max_interval)) { time_after(jiffies, hbinfo->sent_at + max_interval)) {
SCTP_DEBUG_PRINTK("%s: HEARTBEAT ACK with invalid timestamp SCTP_DEBUG_PRINTK("%s: HEARTBEAT ACK with invalid timestamp"
received for transport: %p\n", "received for transport: %p\n",
__FUNCTION__, link); __FUNCTION__, link);
return SCTP_DISPOSITION_DISCARD; return SCTP_DISPOSITION_DISCARD;
} }
...@@ -1562,6 +1570,7 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep, ...@@ -1562,6 +1570,7 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep,
sctp_association_t *new_asoc; sctp_association_t *new_asoc;
int error = 0; int error = 0;
char action; char action;
sctp_chunk_t *err_chk_p;
/* "Decode" the chunk. We have no optional parameters so we /* "Decode" the chunk. We have no optional parameters so we
* are in good shape. * are in good shape.
...@@ -1575,7 +1584,8 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep, ...@@ -1575,7 +1584,8 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep,
* current association, consider the State Cookie valid even if * current association, consider the State Cookie valid even if
* the lifespan is exceeded. * the lifespan is exceeded.
*/ */
new_asoc = sctp_unpack_cookie(ep, asoc, chunk, GFP_ATOMIC, &error); new_asoc = sctp_unpack_cookie(ep, asoc, chunk, GFP_ATOMIC, &error,
&err_chk_p);
/* FIXME: /* FIXME:
* If the re-build failed, what is the proper error path * If the re-build failed, what is the proper error path
...@@ -1591,6 +1601,12 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep, ...@@ -1591,6 +1601,12 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep,
case -SCTP_IERROR_NOMEM: case -SCTP_IERROR_NOMEM:
goto nomem; goto nomem;
case -SCTP_IERROR_STALE_COOKIE:
sctp_send_stale_cookie_err(ep, asoc, chunk, commands,
err_chk_p);
return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
break;
case -SCTP_IERROR_BAD_SIG: case -SCTP_IERROR_BAD_SIG:
default: default:
return sctp_sf_pdiscard(ep, asoc, type, arg, commands); return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
...@@ -1706,7 +1722,47 @@ sctp_disposition_t sctp_sf_shutdown_ack_sent_abort(const sctp_endpoint_t *ep, ...@@ -1706,7 +1722,47 @@ sctp_disposition_t sctp_sf_shutdown_ack_sent_abort(const sctp_endpoint_t *ep,
return sctp_sf_shutdown_sent_abort(ep, asoc, type, arg, commands); return sctp_sf_shutdown_sent_abort(ep, asoc, type, arg, commands);
} }
#if 0 /*
* Handle an Error received in COOKIE_ECHOED state.
*
* Only handle the error type of stale COOKIE Error, the other errors will
* be ignored.
*
* Inputs
* (endpoint, asoc, chunk)
*
* Outputs
* (asoc, reply_msg, msg_up, timers, counters)
*
* The return value is the disposition of the chunk.
*/
sctp_disposition_t sctp_sf_cookie_echoed_err(const sctp_endpoint_t *ep,
const sctp_association_t *asoc,
const sctp_subtype_t type,
void *arg,
sctp_cmd_seq_t *commands)
{
sctp_chunk_t *chunk = arg;
sctp_errhdr_t *err;
/* If we have gotten too many failures, give up. */
if (1 + asoc->counters[SCTP_COUNTER_INIT_ERROR] >
asoc->max_init_attempts) {
/* INIT_FAILED will issue an ulpevent. */
sctp_add_cmd_sf(commands, SCTP_CMD_INIT_FAILED, SCTP_NULL());
return SCTP_DISPOSITION_DELETE_TCB;
}
err = (sctp_errhdr_t *)(chunk->skb->data);
/* Process the error here */
switch (err->cause) {
case SCTP_ERROR_STALE_COOKIE:
return sctp_sf_do_5_2_6_stale(ep, asoc, type, arg, commands);
default:
return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
}
}
/* /*
* Handle a Stale COOKIE Error * Handle a Stale COOKIE Error
* *
...@@ -1732,47 +1788,30 @@ sctp_disposition_t sctp_sf_shutdown_ack_sent_abort(const sctp_endpoint_t *ep, ...@@ -1732,47 +1788,30 @@ sctp_disposition_t sctp_sf_shutdown_ack_sent_abort(const sctp_endpoint_t *ep,
* *
* The return value is the disposition of the chunk. * The return value is the disposition of the chunk.
*/ */
sctp_disposition_t do_5_2_6_stale(const sctp_endpoint_t *ep, sctp_disposition_t sctp_sf_do_5_2_6_stale(const sctp_endpoint_t *ep,
const sctp_association_t *asoc, const sctp_association_t *asoc,
const sctp_subtype_t type, const sctp_subtype_t type,
void *arg, void *arg,
sctp_cmd_seq_t *commands) sctp_cmd_seq_t *commands)
{ {
sctp_chunk_t *chunk = arg; sctp_chunk_t *chunk = arg;
time_t stale;
sctp_cookie_preserve_param_t bht;
sctp_errhdr_t *err;
struct list_head *pos;
sctp_transport_t *t;
sctp_chunk_t *reply;
sctp_bind_addr_t *bp;
int attempts;
/* This is not a real chunk type. It is a subtype of the attempts = asoc->counters[SCTP_COUNTER_INIT_ERROR] + 1;
* ERROR chunk type. The ERROR chunk processing will bring us
* here.
*/
sctp_chunk_t *in_packet;
stp_chunk_t *reply;
sctp_inithdr_t initack;
__u8 *addrs;
int addrs_len;
time_t rtt;
struct sctpCookiePreserve bht;
/* If we have gotten too many failures, give up. */ if (attempts >= asoc->max_init_attempts) {
if (1 + asoc->counters[SctpCounterInits] > asoc->max_init_attempts) { sctp_add_cmd_sf(commands, SCTP_CMD_INIT_FAILED, SCTP_NULL());
/* FIXME: Move to new ulpevent. */
retval->event_up = sctp_make_ulp_init_timeout(asoc);
if (!retval->event_up)
goto nomem;
sctp_add_cmd_sf(retval->commands, SCTP_CMD_DELETE_TCB,
SCTP_NULL());
return SCTP_DISPOSITION_DELETE_TCB; return SCTP_DISPOSITION_DELETE_TCB;
} }
retval->counters[0] = SCTP_COUNTER_INCR; err = (sctp_errhdr_t *)(chunk->skb->data);
retval->counters[0] = SctpCounterInits;
retval->counters[1] = 0;
retval->counters[1] = 0;
/* Calculate the RTT in ms. */
/* BUG--we should get the send time of the HEARTBEAT REQUEST. */
in_packet = chunk;
rtt = 1000 * timeval_sub(in_packet->skb->stamp,
asoc->c.state_timestamp);
/* When calculating the time extension, an implementation /* When calculating the time extension, an implementation
* SHOULD use the RTT information measured based on the * SHOULD use the RTT information measured based on the
...@@ -1780,28 +1819,48 @@ sctp_disposition_t do_5_2_6_stale(const sctp_endpoint_t *ep, ...@@ -1780,28 +1819,48 @@ sctp_disposition_t do_5_2_6_stale(const sctp_endpoint_t *ep,
* more than 1 second beyond the measured RTT, due to long * more than 1 second beyond the measured RTT, due to long
* State Cookie lifetimes making the endpoint more subject to * State Cookie lifetimes making the endpoint more subject to
* a replay attack. * a replay attack.
* Measure of Staleness's unit is usec. (1/1000000 sec)
* Suggested Cookie Life-span Increment's unit is msec.
* (1/1000 sec)
* In general, if you use the suggested cookie life, the value
* found in the field of measure of staleness should be doubled
* to give ample time to retransmit the new cookie and thus
* yield a higher probability of success on the reattempt.
*/ */
bht.p = {SCTP_COOKIE_PRESERVE, 8}; stale = ntohl(*(suseconds_t *)((u8 *)err + sizeof(sctp_errhdr_t)));
bht.extraTime = htonl(rtt + 1000); stale = stale << 1 / 1000;
initack.init_tag = htonl(asoc->c.my_vtag); bht.param_hdr.type = SCTP_PARAM_COOKIE_PRESERVATIVE;
initack.a_rwnd = htonl(atomic_read(&asoc->rnwd)); bht.param_hdr.length = htons(sizeof(bht));
initack.num_outbound_streams = htons(asoc->streamoutcnt); bht.lifespan_increment = htonl(stale);
initack.num_inbound_streams = htons(asoc->streamincnt);
initack.initial_tsn = htonl(asoc->c.initSeqNumber);
sctp_get_my_addrs(asoc, &addrs, &addrs_len);
/* Build that new INIT chunk. */ /* Build that new INIT chunk. */
reply = sctp_make_chunk(SCTP_INITIATION, 0, bp = (sctp_bind_addr_t *) &asoc->base.bind_addr;
sizeof(initack) reply = sctp_make_init(asoc, bp, GFP_ATOMIC, sizeof(bht));
+ sizeof(bht)
+ addrs_len);
if (!reply) if (!reply)
goto nomem; goto nomem;
sctp_addto_chunk(reply, sizeof(initack), &initack);
sctp_addto_chunk(reply, sizeof(bht), &bht); sctp_addto_chunk(reply, sizeof(bht), &bht);
sctp_addto_chunk(reply, addrs_len, addrs);
/* Cast away the const modifier, as we want to just
* rerun it through as a sideffect.
*/
sctp_add_cmd_sf(commands, SCTP_CMD_COUNTER_INC,
SCTP_COUNTER(SCTP_COUNTER_INIT_ERROR));
/* If we've sent any data bundled with COOKIE-ECHO we need to resend. */
list_for_each(pos, &asoc->peer.transport_addr_list) {
t = list_entry(pos, sctp_transport_t, transports);
sctp_add_cmd_sf(commands, SCTP_CMD_RETRAN, SCTP_TRANSPORT(t));
}
sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
SCTP_TO(SCTP_EVENT_TIMEOUT_T1_COOKIE));
sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
SCTP_STATE(SCTP_STATE_COOKIE_WAIT));
sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_START,
SCTP_TO(SCTP_EVENT_TIMEOUT_T1_INIT));
sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(reply)); sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(reply));
return SCTP_DISPOSITION_CONSUME; return SCTP_DISPOSITION_CONSUME;
...@@ -1809,7 +1868,6 @@ sctp_disposition_t do_5_2_6_stale(const sctp_endpoint_t *ep, ...@@ -1809,7 +1868,6 @@ sctp_disposition_t do_5_2_6_stale(const sctp_endpoint_t *ep,
nomem: nomem:
return SCTP_DISPOSITION_NOMEM; return SCTP_DISPOSITION_NOMEM;
} }
#endif /* 0 */
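
The comment block in the stale-cookie handler above describes how the retried INIT's Cookie Preservative is derived: take the Measure of Staleness reported by the peer (microseconds), double it to leave room for the retransmission, and express it in milliseconds in the parameter. The following is a standalone sketch of that arithmetic as the comment describes it; the parameter layout and type code are illustrative.

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* ntohl, htonl, htons */

/* Illustrative layout of a Cookie Preservative parameter. */
struct cookie_preserve_param {
	uint16_t type;
	uint16_t length;
	uint32_t lifespan_increment;   /* msec, network byte order */
};

int main(void)
{
	uint32_t wire_stale = htonl(2500000);   /* peer reported 2.5 s staleness */
	uint32_t stale_usec = ntohl(wire_stale);

	/* Double the staleness and convert usec -> msec, per the comment. */
	uint32_t increment_msec = (stale_usec * 2) / 1000;

	struct cookie_preserve_param bht = {
		.type               = htons(9),            /* illustrative code */
		.length             = htons(sizeof(bht)),
		.lifespan_increment = htonl(increment_msec),
	};

	printf("requesting %u extra msec of cookie life\n",
	       (unsigned)ntohl(bht.lifespan_increment));
	return 0;
}
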
/* /*
* Process an ABORT. * Process an ABORT.
...@@ -3220,7 +3278,7 @@ sctp_disposition_t sctp_sf_do_prm_asoc(const sctp_endpoint_t *ep, ...@@ -3220,7 +3278,7 @@ sctp_disposition_t sctp_sf_do_prm_asoc(const sctp_endpoint_t *ep,
* 1 to 4294967295 (see 5.3.1 for Tag value selection). ... * 1 to 4294967295 (see 5.3.1 for Tag value selection). ...
*/ */
repl = sctp_make_init(asoc, bp, GFP_ATOMIC); repl = sctp_make_init(asoc, bp, GFP_ATOMIC, 0);
if (!repl) if (!repl)
goto nomem; goto nomem;
...@@ -3992,7 +4050,7 @@ sctp_disposition_t sctp_sf_t1_timer_expire(const sctp_endpoint_t *ep, ...@@ -3992,7 +4050,7 @@ sctp_disposition_t sctp_sf_t1_timer_expire(const sctp_endpoint_t *ep,
switch (timer) { switch (timer) {
case SCTP_EVENT_TIMEOUT_T1_INIT: case SCTP_EVENT_TIMEOUT_T1_INIT:
bp = (sctp_bind_addr_t *) &asoc->base.bind_addr; bp = (sctp_bind_addr_t *) &asoc->base.bind_addr;
repl = sctp_make_init(asoc, bp, GFP_ATOMIC); repl = sctp_make_init(asoc, bp, GFP_ATOMIC, 0);
break; break;
case SCTP_EVENT_TIMEOUT_T1_COOKIE: case SCTP_EVENT_TIMEOUT_T1_COOKIE:
...@@ -4334,3 +4392,25 @@ void sctp_ootb_pkt_free(sctp_packet_t *packet) ...@@ -4334,3 +4392,25 @@ void sctp_ootb_pkt_free(sctp_packet_t *packet)
sctp_transport_free(packet->transport); sctp_transport_free(packet->transport);
sctp_packet_free(packet); sctp_packet_free(packet);
} }
/* Send a stale cookie error when a invalid COOKIE ECHO chunk is found */
void sctp_send_stale_cookie_err(const sctp_endpoint_t *ep,
const sctp_association_t *asoc,
const sctp_chunk_t *chunk,
sctp_cmd_seq_t *commands,
sctp_chunk_t *err_chunk)
{
sctp_packet_t *packet;
if (err_chunk) {
packet = sctp_ootb_pkt_new(asoc, chunk);
if (packet) {
/* Set the skb to the belonging sock for accounting. */
err_chunk->skb->sk = ep->base.sk;
sctp_packet_append_chunk(packet, err_chunk);
sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT,
SCTP_PACKET(packet));
} else
sctp_free_chunk (err_chunk);
}
}
...@@ -295,7 +295,7 @@ sctp_sm_table_entry_t *sctp_sm_lookup_event(sctp_event_t event_type, ...@@ -295,7 +295,7 @@ sctp_sm_table_entry_t *sctp_sm_lookup_event(sctp_event_t event_type,
/* SCTP_STATE_COOKIE_WAIT */ \ /* SCTP_STATE_COOKIE_WAIT */ \
{.fn = sctp_sf_not_impl, .name = "sctp_sf_not_impl"}, \ {.fn = sctp_sf_not_impl, .name = "sctp_sf_not_impl"}, \
/* SCTP_STATE_COOKIE_ECHOED */ \ /* SCTP_STATE_COOKIE_ECHOED */ \
{.fn = sctp_sf_not_impl, .name = "sctp_sf_not_impl"}, \ {.fn = sctp_sf_cookie_echoed_err, .name = "sctp_sf_cookie_echoed_err"}, \
/* SCTP_STATE_ESTABLISHED */ \ /* SCTP_STATE_ESTABLISHED */ \
{.fn = sctp_sf_operr_notify, .name = "sctp_sf_operr_notify"}, \ {.fn = sctp_sf_operr_notify, .name = "sctp_sf_operr_notify"}, \
/* SCTP_STATE_SHUTDOWN_PENDING */ \ /* SCTP_STATE_SHUTDOWN_PENDING */ \
......
...@@ -87,12 +87,7 @@ static int sctp_wait_for_sndbuf(sctp_association_t *asoc, long *timeo_p, ...@@ -87,12 +87,7 @@ static int sctp_wait_for_sndbuf(sctp_association_t *asoc, long *timeo_p,
int msg_len); int msg_len);
static int sctp_wait_for_packet(struct sock * sk, int *err, long *timeo_p); static int sctp_wait_for_packet(struct sock * sk, int *err, long *timeo_p);
static int sctp_wait_for_connect(sctp_association_t *asoc, long *timeo_p); static int sctp_wait_for_connect(sctp_association_t *asoc, long *timeo_p);
static inline void sctp_sk_addr_set(struct sock *, static inline int sctp_verify_addr(struct sock *, union sctp_addr *, int);
const union sctp_addr *newaddr,
union sctp_addr *saveaddr);
static inline void sctp_sk_addr_restore(struct sock *,
const union sctp_addr *);
static inline int sctp_verify_addr(struct sock *, struct sockaddr *, int);
static int sctp_bindx_add(struct sock *, struct sockaddr_storage *, int); static int sctp_bindx_add(struct sock *, struct sockaddr_storage *, int);
static int sctp_bindx_rem(struct sock *, struct sockaddr_storage *, int); static int sctp_bindx_rem(struct sock *, struct sockaddr_storage *, int);
static int sctp_do_bind(struct sock *, union sctp_addr *, int); static int sctp_do_bind(struct sock *, union sctp_addr *, int);
...@@ -133,101 +128,75 @@ int sctp_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) ...@@ -133,101 +128,75 @@ int sctp_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return retval; return retval;
} }
static long sctp_get_port_local(struct sock *, unsigned short); static long sctp_get_port_local(struct sock *, union sctp_addr *);
/* Bind a local address either to an endpoint or to an association. */ /* Verify this is a valid sockaddr. */
SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *newaddr, static struct sctp_af *sctp_sockaddr_af(struct sctp_opt *opt,
int addr_len) union sctp_addr *addr, int len)
{ {
sctp_opt_t *sp = sctp_sk(sk); struct sctp_af *af;
sctp_endpoint_t *ep = sp->ep;
sctp_bind_addr_t *bp = &ep->base.bind_addr;
unsigned short sa_family = newaddr->sa.sa_family;
union sctp_addr tmpaddr, saveaddr;
unsigned short *snum;
int ret = 0;
SCTP_DEBUG_PRINTK("sctp_do_bind(sk: %p, newaddr: %p, addr_len: %d)\n", /* Check minimum size. */
sk, newaddr, addr_len); if (len < sizeof (struct sockaddr))
return NULL;
/* FIXME: This function needs to handle v4-mapped-on-v6
* addresses!
*/
if (PF_INET == sk->family) {
if (sa_family != AF_INET)
return -EINVAL;
}
/* Make a local copy of the new address. */ /* Does this PF support this AF? */
tmpaddr = *newaddr; if (!opt->pf->af_supported(addr->sa.sa_family))
return NULL;
switch (sa_family) { /* If we get this far, af is valid. */
case AF_INET: af = sctp_get_af_specific(addr->sa.sa_family);
if (addr_len < sizeof(struct sockaddr_in))
return -EINVAL;
ret = inet_addr_type(newaddr->v4.sin_addr.s_addr); if (len < af->sockaddr_len)
return NULL;
/* FIXME: return af;
* Should we allow apps to bind to non-local addresses by }
* checking the IP sysctl parameter "ip_nonlocal_bind"?
*/
if (newaddr->v4.sin_addr.s_addr != INADDR_ANY &&
ret != RTN_LOCAL)
return -EADDRNOTAVAIL;
tmpaddr.v4.sin_port = htons(tmpaddr.v4.sin_port);
snum = &tmpaddr.v4.sin_port;
break;
case AF_INET6: /* Bind a local address either to an endpoint or to an association. */
SCTP_V6( SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
/* FIXME: Hui, please verify this. Looking at {
* the ipv6 code I see a SIN6_LEN_RFC2133 check. sctp_opt_t *sp = sctp_sk(sk);
* I'm guessing that scope_id is a newer addition. sctp_endpoint_t *ep = sp->ep;
*/ sctp_bind_addr_t *bp = &ep->base.bind_addr;
if (addr_len < sizeof(struct sockaddr_in6)) struct sctp_af *af;
return -EINVAL; unsigned short snum;
int ret = 0;
/* FIXME - The support for IPv6 multiple types SCTP_DEBUG_PRINTK("sctp_do_bind(sk: %p, newaddr: %p, len: %d)\n",
* of addresses need to be added later. sk, addr, len);
*/
ret = sctp_ipv6_addr_type(&newaddr->v6.sin6_addr);
tmpaddr.v6.sin6_port = htons(tmpaddr.v6.sin6_port);
snum = &tmpaddr.v6.sin6_port;
break;
)
default: /* Common sockaddr verification. */
af = sctp_sockaddr_af(sp, addr, len);
if (!af)
return -EINVAL; return -EINVAL;
};
/* PF specific bind() address verification. */
if (!sp->pf->bind_verify(sp, addr))
return -EADDRNOTAVAIL;
snum= ntohs(addr->v4.sin_port);
SCTP_DEBUG_PRINTK("sctp_do_bind: port: %d, new port: %d\n", SCTP_DEBUG_PRINTK("sctp_do_bind: port: %d, new port: %d\n",
bp->port, *snum); bp->port, snum);
/* We must either be unbound, or bind to the same port. */ /* We must either be unbound, or bind to the same port. */
if (bp->port && (*snum != bp->port)) { if (bp->port && (snum != bp->port)) {
SCTP_DEBUG_PRINTK("sctp_do_bind:" SCTP_DEBUG_PRINTK("sctp_do_bind:"
" New port %d does not match existing port " " New port %d does not match existing port "
"%d.\n", *snum, bp->port); "%d.\n", snum, bp->port);
return -EINVAL; return -EINVAL;
} }
if (*snum && *snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE)) if (snum && snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
return -EACCES; return -EACCES;
/* FIXME - Make socket understand that there might be multiple bind
* addresses and there will be multiple source addresses involved in
* routing and failover decisions.
*/
sctp_sk_addr_set(sk, &tmpaddr, &saveaddr);
/* Make sure we are allowed to bind here. /* Make sure we are allowed to bind here.
* The function sctp_get_port_local() does duplicate address * The function sctp_get_port_local() does duplicate address
* detection. * detection.
*/ */
if ((ret = sctp_get_port_local(sk, *snum))) { if ((ret = sctp_get_port_local(sk, addr))) {
sctp_sk_addr_restore(sk, &saveaddr);
if (ret == (long) sk) { if (ret == (long) sk) {
/* This endpoint has a conflicting address. */ /* This endpoint has a conflicting address. */
return -EINVAL; return -EINVAL;
...@@ -237,25 +206,32 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *newaddr, ...@@ -237,25 +206,32 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *newaddr,
} }
/* Refresh ephemeral port. */ /* Refresh ephemeral port. */
if (!*snum) if (!snum)
*snum = inet_sk(sk)->num; snum = inet_sk(sk)->num;
/* The getsockname() API depends on 'sport' being set. */
inet_sk(sk)->sport = htons(inet_sk(sk)->num);
/* Add the address to the bind address list. */ /* Add the address to the bind address list. */
sctp_local_bh_disable(); sctp_local_bh_disable();
sctp_write_lock(&ep->base.addr_lock); sctp_write_lock(&ep->base.addr_lock);
/* Use GFP_ATOMIC since BHs are disabled. */ /* Use GFP_ATOMIC since BHs are disabled. */
if ((ret = sctp_add_bind_addr(bp, &tmpaddr, GFP_ATOMIC))) { addr->v4.sin_port = ntohs(addr->v4.sin_port);
sctp_sk_addr_restore(sk, &saveaddr); ret = sctp_add_bind_addr(bp, addr, GFP_ATOMIC);
} else if (!bp->port) { addr->v4.sin_port = htons(addr->v4.sin_port);
bp->port = *snum; if (!ret && !bp->port)
} bp->port = snum;
sctp_write_unlock(&ep->base.addr_lock); sctp_write_unlock(&ep->base.addr_lock);
sctp_local_bh_enable(); sctp_local_bh_enable();
/* Copy back into socket for getsockname() use. */
if (!ret) {
inet_sk(sk)->sport = htons(inet_sk(sk)->num);
af->to_sk(addr, sk);
}
return ret; return ret;
} }
...@@ -735,7 +711,7 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout) ...@@ -735,7 +711,7 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
SCTP_STATIC int sctp_msghdr_parse(const struct msghdr *, sctp_cmsgs_t *); SCTP_STATIC int sctp_msghdr_parse(const struct msghdr *, sctp_cmsgs_t *);
SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
struct msghdr *msg, int size) struct msghdr *msg, int msg_len)
{ {
sctp_opt_t *sp; sctp_opt_t *sp;
sctp_endpoint_t *ep; sctp_endpoint_t *ep;
...@@ -750,13 +726,12 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, ...@@ -750,13 +726,12 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
sctp_assoc_t associd = NULL; sctp_assoc_t associd = NULL;
sctp_cmsgs_t cmsgs = { 0 }; sctp_cmsgs_t cmsgs = { 0 };
int err; int err;
size_t msg_len;
sctp_scope_t scope; sctp_scope_t scope;
long timeo; long timeo;
__u16 sinfo_flags = 0; __u16 sinfo_flags = 0;
SCTP_DEBUG_PRINTK("sctp_sendmsg(sk: %p, msg: %p, " SCTP_DEBUG_PRINTK("sctp_sendmsg(sk: %p, msg: %p, msg_len: %d)\n",
"size: %d)\n", sk, msg, size); sk, msg, msg_len);
err = 0; err = 0;
sp = sctp_sk(sk); sp = sctp_sk(sk);
...@@ -778,12 +753,16 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, ...@@ -778,12 +753,16 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
* For a peeled-off socket, msg_name is ignored. * For a peeled-off socket, msg_name is ignored.
*/ */
if ((SCTP_SOCKET_UDP_HIGH_BANDWIDTH != sp->type) && msg->msg_name) { if ((SCTP_SOCKET_UDP_HIGH_BANDWIDTH != sp->type) && msg->msg_name) {
err = sctp_verify_addr(sk, (struct sockaddr *)msg->msg_name, int msg_namelen = msg->msg_namelen;
msg->msg_namelen);
err = sctp_verify_addr(sk, (union sctp_addr *)msg->msg_name,
msg_namelen);
if (err) if (err)
return err; return err;
memcpy(&to, msg->msg_name, msg->msg_namelen); if (msg_namelen > sizeof(to))
msg_namelen = sizeof(to);
memcpy(&to, msg->msg_name, msg_namelen);
SCTP_DEBUG_PRINTK("Just memcpy'd. msg_name is " SCTP_DEBUG_PRINTK("Just memcpy'd. msg_name is "
"0x%x:%u.\n", "0x%x:%u.\n",
to.v4.sin_addr.s_addr, to.v4.sin_port); to.v4.sin_addr.s_addr, to.v4.sin_port);
...@@ -792,8 +771,6 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, ...@@ -792,8 +771,6 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
msg_name = msg->msg_name; msg_name = msg->msg_name;
} }
msg_len = get_user_iov_size(msg->msg_iov, msg->msg_iovlen);
sinfo = cmsgs.info; sinfo = cmsgs.info;
sinit = cmsgs.init; sinit = cmsgs.init;
...@@ -1216,9 +1193,11 @@ SCTP_STATIC int sctp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr ...@@ -1216,9 +1193,11 @@ SCTP_STATIC int sctp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr
* Otherwise, set MSG_EOR indicating the end of a message. * Otherwise, set MSG_EOR indicating the end of a message.
*/ */
if (skb_len > copied) { if (skb_len > copied) {
msg->msg_flags &= ~MSG_EOR;
if (flags & MSG_PEEK)
goto out_free;
sctp_skb_pull(skb, copied); sctp_skb_pull(skb, copied);
skb_queue_head(&sk->receive_queue, skb); skb_queue_head(&sk->receive_queue, skb);
msg->msg_flags &= ~MSG_EOR;
goto out; goto out;
} else { } else {
msg->msg_flags |= MSG_EOR; msg->msg_flags |= MSG_EOR;
...@@ -1335,6 +1314,16 @@ static inline int sctp_setsockopt_set_peer_addr_params(struct sock *sk, ...@@ -1335,6 +1314,16 @@ static inline int sctp_setsockopt_set_peer_addr_params(struct sock *sk,
return 0; return 0;
} }
static inline int sctp_setsockopt_initmsg(struct sock *sk, char *optval,
int optlen)
{
if (optlen != sizeof(struct sctp_initmsg))
return -EINVAL;
if (copy_from_user(&sctp_sk(sk)->initmsg, optval, optlen))
return -EFAULT;
return 0;
}
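
The new SCTP_INITMSG handlers above copy a struct sctp_initmsg between userspace and the socket. Assuming the SCTP userspace headers (lksctp-tools) are installed, setting the option from an application might look like this sketch, with error handling trimmed:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>   /* struct sctp_initmsg, SCTP_INITMSG (lksctp-tools) */

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	struct sctp_initmsg initmsg;
	memset(&initmsg, 0, sizeof(initmsg));
	initmsg.sinit_num_ostreams  = 5;   /* outbound streams to request */
	initmsg.sinit_max_instreams = 5;   /* inbound streams to accept */
	initmsg.sinit_max_attempts  = 4;   /* INIT retransmission limit */

	if (setsockopt(fd, IPPROTO_SCTP, SCTP_INITMSG,
		       &initmsg, sizeof(initmsg)) < 0)
		perror("setsockopt(SCTP_INITMSG)");

	return 0;
}
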
/* API 6.2 setsockopt(), getsockopt() /* API 6.2 setsockopt(), getsockopt()
* *
* Applications use setsockopt() and getsockopt() to set or retrieve * Applications use setsockopt() and getsockopt() to set or retrieve
...@@ -1355,13 +1344,10 @@ static inline int sctp_setsockopt_set_peer_addr_params(struct sock *sk, ...@@ -1355,13 +1344,10 @@ static inline int sctp_setsockopt_set_peer_addr_params(struct sock *sk,
* optlen - the size of the buffer. * optlen - the size of the buffer.
*/ */
SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname, SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
char *optval, int optlen) char *optval, int optlen)
{ {
int retval = 0; int retval = 0;
char *tmp; char *tmp;
sctp_protocol_t *proto = sctp_get_protocol();
struct list_head *pos;
sctp_func_t *af;
SCTP_DEBUG_PRINTK("sctp_setsockopt(sk: %p... optname: %d)\n", SCTP_DEBUG_PRINTK("sctp_setsockopt(sk: %p... optname: %d)\n",
sk, optname); sk, optname);
...@@ -1373,14 +1359,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname, ...@@ -1373,14 +1359,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
* are at all well-founded. * are at all well-founded.
*/ */
if (level != SOL_SCTP) { if (level != SOL_SCTP) {
list_for_each(pos, &proto->address_families) { struct sctp_af *af = sctp_sk(sk)->pf->af;
af = list_entry(pos, sctp_func_t, list); retval = af->setsockopt(sk, level, optname, optval, optlen);
goto out_nounlock;
retval = af->setsockopt(sk, level, optname, optval,
optlen);
if (retval < 0)
goto out_nounlock;
}
} }
sctp_lock_sock(sk); sctp_lock_sock(sk);
...@@ -1430,6 +1411,10 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname, ...@@ -1430,6 +1411,10 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
optlen); optlen);
break; break;
case SCTP_INITMSG:
retval = sctp_setsockopt_initmsg(sk, optval, optlen);
break;
default: default:
retval = -ENOPROTOOPT; retval = -ENOPROTOOPT;
break; break;
...@@ -1484,7 +1469,7 @@ SCTP_STATIC int sctp_connect(struct sock *sk, struct sockaddr *uaddr, ...@@ -1484,7 +1469,7 @@ SCTP_STATIC int sctp_connect(struct sock *sk, struct sockaddr *uaddr,
goto out_unlock; goto out_unlock;
} }
err = sctp_verify_addr(sk, uaddr, addr_len); err = sctp_verify_addr(sk, (union sctp_addr *)uaddr, addr_len);
if (err) if (err)
goto out_unlock; goto out_unlock;
...@@ -1938,13 +1923,19 @@ static inline int sctp_getsockopt_get_peer_addr_params(struct sock *sk, ...@@ -1938,13 +1923,19 @@ static inline int sctp_getsockopt_get_peer_addr_params(struct sock *sk,
return 0; return 0;
} }
static inline int sctp_getsockopt_initmsg(struct sock *sk, int len, char *optval, int *optlen)
{
if (len != sizeof(struct sctp_initmsg))
return -EINVAL;
if (copy_to_user(optval, &sctp_sk(sk)->initmsg, len))
return -EFAULT;
return 0;
}
SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname, SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
char *optval, int *optlen) char *optval, int *optlen)
{ {
int retval = 0; int retval = 0;
sctp_protocol_t *proto = sctp_get_protocol();
sctp_func_t *af;
struct list_head *pos;
int len; int len;
SCTP_DEBUG_PRINTK("sctp_getsockopt(sk: %p, ...)\n", sk); SCTP_DEBUG_PRINTK("sctp_getsockopt(sk: %p, ...)\n", sk);
...@@ -1956,13 +1947,10 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname, ...@@ -1956,13 +1947,10 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
* are at all well-founded. * are at all well-founded.
*/ */
if (level != SOL_SCTP) { if (level != SOL_SCTP) {
list_for_each(pos, &proto->address_families) { struct sctp_af *af = sctp_sk(sk)->pf->af;
af = list_entry(pos, sctp_func_t, list);
retval = af->getsockopt(sk, level, optname, retval = af->getsockopt(sk, level, optname, optval, optlen);
optval, optlen); return retval;
if (retval < 0)
return retval;
}
} }
if (get_user(len, optlen)) if (get_user(len, optlen))
...@@ -1997,6 +1985,10 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname, ...@@ -1997,6 +1985,10 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
optlen); optlen);
break; break;
case SCTP_INITMSG:
retval = sctp_getsockopt_initmsg(sk, len, optval, optlen);
break;
default: default:
retval = -ENOPROTOOPT; retval = -ENOPROTOOPT;
break; break;
...@@ -2030,12 +2022,17 @@ static void sctp_unhash(struct sock *sk) ...@@ -2030,12 +2022,17 @@ static void sctp_unhash(struct sock *sk)
*/ */
static sctp_bind_bucket_t *sctp_bucket_create(sctp_bind_hashbucket_t *head, static sctp_bind_bucket_t *sctp_bucket_create(sctp_bind_hashbucket_t *head,
unsigned short snum); unsigned short snum);
static long sctp_get_port_local(struct sock *sk, unsigned short snum) static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
{ {
sctp_bind_hashbucket_t *head; /* hash list */ sctp_bind_hashbucket_t *head; /* hash list */
sctp_bind_bucket_t *pp; /* hash list port iterator */ sctp_bind_bucket_t *pp; /* hash list port iterator */
sctp_protocol_t *sctp = sctp_get_protocol(); sctp_protocol_t *sctp = sctp_get_protocol();
unsigned short snum;
int ret; int ret;
/* NOTE: Remember to put this back to net order. */
addr->v4.sin_port = ntohs(addr->v4.sin_port);
snum = addr->v4.sin_port;
SCTP_DEBUG_PRINTK("sctp_get_port() begins, snum=%d\n", snum); SCTP_DEBUG_PRINTK("sctp_get_port() begins, snum=%d\n", snum);
...@@ -2101,6 +2098,7 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum) ...@@ -2101,6 +2098,7 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
} }
} }
if (pp != NULL && pp->sk != NULL) { if (pp != NULL && pp->sk != NULL) {
/* We had a port hash table hit - there is an /* We had a port hash table hit - there is an
* available port (pp != NULL) and it is being * available port (pp != NULL) and it is being
...@@ -2108,7 +2106,6 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum) ...@@ -2108,7 +2106,6 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
* socket is going to be sk2. * socket is going to be sk2.
*/ */
int sk_reuse = sk->reuse; int sk_reuse = sk->reuse;
union sctp_addr tmpaddr;
struct sock *sk2 = pp->sk; struct sock *sk2 = pp->sk;
SCTP_DEBUG_PRINTK("sctp_get_port() found a " SCTP_DEBUG_PRINTK("sctp_get_port() found a "
...@@ -2116,27 +2113,6 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum) ...@@ -2116,27 +2113,6 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
if (pp->fastreuse != 0 && sk->reuse != 0) if (pp->fastreuse != 0 && sk->reuse != 0)
goto success; goto success;
/* FIXME - multiple addresses need to be supported
* later.
*/
switch (sk->family) {
case PF_INET:
tmpaddr.v4.sin_family = AF_INET;
tmpaddr.v4.sin_port = snum;
tmpaddr.v4.sin_addr.s_addr = inet_sk(sk)->rcv_saddr;
break;
case PF_INET6:
SCTP_V6(tmpaddr.v6.sin6_family = AF_INET6;
tmpaddr.v6.sin6_port = snum;
tmpaddr.v6.sin6_addr = inet6_sk(sk)->rcv_saddr;
)
break;
default:
break;
};
/* Run through the list of sockets bound to the port /* Run through the list of sockets bound to the port
* (pp->port) [via the pointers bind_next and * (pp->port) [via the pointers bind_next and
* bind_pprev in the struct sock *sk2 (pp->sk)]. On each one, * bind_pprev in the struct sock *sk2 (pp->sk)]. On each one,
...@@ -2154,8 +2130,7 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum) ...@@ -2154,8 +2130,7 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
if (sk_reuse && sk2->reuse) if (sk_reuse && sk2->reuse)
continue; continue;
if (sctp_bind_addr_match(&ep2->base.bind_addr, if (sctp_bind_addr_match(&ep2->base.bind_addr, addr,
&tmpaddr,
sctp_sk(sk))) sctp_sk(sk)))
goto found; goto found;
} }
...@@ -2207,12 +2182,25 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum) ...@@ -2207,12 +2182,25 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
sctp_local_bh_enable(); sctp_local_bh_enable();
SCTP_DEBUG_PRINTK("sctp_get_port() ends, ret=%d\n", ret); SCTP_DEBUG_PRINTK("sctp_get_port() ends, ret=%d\n", ret);
addr->v4.sin_port = htons(addr->v4.sin_port);
return ret; return ret;
} }
/* Assign a 'snum' port to the socket. If snum == 0, an ephemeral
* port is requested.
*/
static int sctp_get_port(struct sock *sk, unsigned short snum) static int sctp_get_port(struct sock *sk, unsigned short snum)
{ {
long ret = sctp_get_port_local(sk, snum); long ret;
union sctp_addr addr;
struct sctp_af *af = sctp_sk(sk)->pf->af;
/* Set up a dummy address struct from the sk. */
af->from_sk(&addr, sk);
addr.v4.sin_port = htons(snum);
/* Note: sk->num gets filled in if ephemeral port request. */
ret = sctp_get_port_local(sk, &addr);
return (ret ? 1 : 0); return (ret ? 1 : 0);
} }
...@@ -2413,7 +2401,7 @@ void sctp_put_port(struct sock *sk) ...@@ -2413,7 +2401,7 @@ void sctp_put_port(struct sock *sk)
static int sctp_autobind(struct sock *sk) static int sctp_autobind(struct sock *sk)
{ {
union sctp_addr autoaddr; union sctp_addr autoaddr;
struct sctp_func *af; struct sctp_af *af;
unsigned short port; unsigned short port;
/* Initialize a local sockaddr structure to INADDR_ANY. */ /* Initialize a local sockaddr structure to INADDR_ANY. */
...@@ -2537,58 +2525,6 @@ SCTP_STATIC int sctp_msghdr_parse(const struct msghdr *msg, ...@@ -2537,58 +2525,6 @@ SCTP_STATIC int sctp_msghdr_parse(const struct msghdr *msg,
return 0; return 0;
} }
/* Setup sk->rcv_saddr before calling get_port(). */
static inline void sctp_sk_addr_set(struct sock *sk,
const union sctp_addr *newaddr,
union sctp_addr *saveaddr)
{
struct inet_opt *inet = inet_sk(sk);
saveaddr->sa.sa_family = newaddr->sa.sa_family;
switch (newaddr->sa.sa_family) {
case AF_INET:
saveaddr->v4.sin_addr.s_addr = inet->rcv_saddr;
inet->rcv_saddr = inet->saddr = newaddr->v4.sin_addr.s_addr;
break;
case AF_INET6:
SCTP_V6({
struct ipv6_pinfo *np = inet6_sk(sk);
saveaddr->v6.sin6_addr = np->rcv_saddr;
np->rcv_saddr = np->saddr = newaddr->v6.sin6_addr;
break;
})
default:
break;
};
}
/* Restore sk->rcv_saddr after failing get_port(). */
static inline void sctp_sk_addr_restore(struct sock *sk, const union sctp_addr *addr)
{
struct inet_opt *inet = inet_sk(sk);
switch (addr->sa.sa_family) {
case AF_INET:
inet->rcv_saddr = inet->saddr = addr->v4.sin_addr.s_addr;
break;
case AF_INET6:
SCTP_V6({
struct ipv6_pinfo *np = inet6_sk(sk);
np->rcv_saddr = np->saddr = addr->v6.sin6_addr;
break;
})
default:
break;
};
}
/* /*
* Wait for a packet.. * Wait for a packet..
* Note: This function is the same function as in core/datagram.c * Note: This function is the same function as in core/datagram.c
...@@ -2711,27 +2647,15 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags, int no ...@@ -2711,27 +2647,15 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags, int no
} }
/* Verify that this is a valid address. */ /* Verify that this is a valid address. */
static int sctp_verify_addr(struct sock *sk, struct sockaddr *addr, int len) static int sctp_verify_addr(struct sock *sk, union sctp_addr *addr, int len)
{ {
struct sctp_func *af; struct sctp_af *af;
/* Check minimum size. */
if (len < sizeof (struct sockaddr))
return -EINVAL;
/* Do we support this address family in general? */ /* Verify basic sockaddr. */
af = sctp_get_af_specific(addr->sa_family); af = sctp_sockaddr_af(sctp_sk(sk), addr, len);
if (!af) if (!af)
return -EINVAL; return -EINVAL;
/* Does this PF support this AF? */
if (!sctp_sk(sk)->pf->af_supported(addr->sa_family))
return -EINVAL;
/* Verify the minimum for this AF sockaddr. */
if (len < af->sockaddr_len)
return -EINVAL;
/* Is this a valid SCTP address? */ /* Is this a valid SCTP address? */
if (!af->addr_valid((union sctp_addr *)addr)) if (!af->addr_valid((union sctp_addr *)addr))
return -EINVAL; return -EINVAL;
......
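
The socket changes above funnel sockaddr checking through one helper: first the generic minimum sockaddr size, then whether the socket's protocol family supports the given address family, then the family-specific length. A standalone sketch of that layered check follows; the per-family table here is illustrative rather than the kernel's.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Illustrative per-family expected sockaddr length. */
static socklen_t family_sockaddr_len(sa_family_t family)
{
	switch (family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Layered verification: generic size, family supported, family-specific size. */
static int sockaddr_ok(const struct sockaddr *sa, socklen_t len)
{
	socklen_t want;

	if (len < sizeof(struct sockaddr))
		return 0;
	want = family_sockaddr_len(sa->sa_family);
	if (!want)                 /* address family not supported */
		return 0;
	return len >= want;
}

int main(void)
{
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	printf("valid: %d\n", sockaddr_ok((struct sockaddr *)&sin, sizeof(sin)));
	return 0;
}
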
/* SCTP kernel reference Implementation /* SCTP kernel reference Implementation
* Copyright (c) 2002 International Business Machines Corp. * Copyright (c) 2002 International Business Machines Corp.
* Copyright (c) 2002 Intel Corp.
* *
* This file is part of the SCTP kernel reference Implementation * This file is part of the SCTP kernel reference Implementation
* *
...@@ -32,6 +33,7 @@ ...@@ -32,6 +33,7 @@
* Written or modified by: * Written or modified by:
* Mingqin Liu <liuming@us.ibm.com> * Mingqin Liu <liuming@us.ibm.com>
* Jon Grimm <jgrimm@us.ibm.com> * Jon Grimm <jgrimm@us.ibm.com>
* Ardelle Fan <ardelle.fan@intel.com>
* *
* Any bugs reported given to us we will try to fix... any fixes shared will * Any bugs reported given to us we will try to fix... any fixes shared will
* be incorporated into the next SCTP release. * be incorporated into the next SCTP release.
...@@ -70,6 +72,9 @@ static ctl_table sctp_table[] = { ...@@ -70,6 +72,9 @@ static ctl_table sctp_table[] = {
{ NET_SCTP_HB_INTERVAL, "hb_interval", { NET_SCTP_HB_INTERVAL, "hb_interval",
&sctp_proto.hb_interval, sizeof(int), 0644, NULL, &sctp_proto.hb_interval, sizeof(int), 0644, NULL,
&proc_dointvec_jiffies, &sysctl_jiffies }, &proc_dointvec_jiffies, &sysctl_jiffies },
{ NET_SCTP_PRESERVE_ENABLE, "cookie_preserve_enable",
&sctp_proto.cookie_preserve_enable, sizeof(int), 0644, NULL,
&proc_dointvec_jiffies, &sysctl_jiffies },
{ NET_SCTP_RTO_ALPHA, "rto_alpha_exp_divisor", { NET_SCTP_RTO_ALPHA, "rto_alpha_exp_divisor",
&sctp_proto.rto_alpha, sizeof(int), 0644, NULL, &sctp_proto.rto_alpha, sizeof(int), 0644, NULL,
&proc_dointvec }, &proc_dointvec },
......
...@@ -207,7 +207,7 @@ void sctp_transport_route(sctp_transport_t *transport, union sctp_addr *saddr, ...@@ -207,7 +207,7 @@ void sctp_transport_route(sctp_transport_t *transport, union sctp_addr *saddr,
struct sctp_opt *opt) struct sctp_opt *opt)
{ {
sctp_association_t *asoc = transport->asoc; sctp_association_t *asoc = transport->asoc;
struct sctp_func *af = transport->af_specific; struct sctp_af *af = transport->af_specific;
union sctp_addr *daddr = &transport->ipaddr; union sctp_addr *daddr = &transport->ipaddr;
sctp_bind_addr_t *bp; sctp_bind_addr_t *bp;
rwlock_t *addr_lock; rwlock_t *addr_lock;
......
...@@ -762,6 +762,8 @@ static void sctp_rcvmsg_rfree(struct sk_buff *skb) ...@@ -762,6 +762,8 @@ static void sctp_rcvmsg_rfree(struct sk_buff *skb)
{ {
sctp_association_t *asoc; sctp_association_t *asoc;
sctp_ulpevent_t *event; sctp_ulpevent_t *event;
sctp_chunk_t *sack;
struct timer_list *timer;
/* Current stack structures assume that the rcv buffer is /* Current stack structures assume that the rcv buffer is
* per socket. For UDP style sockets this is not true as * per socket. For UDP style sockets this is not true as
...@@ -782,9 +784,39 @@ static void sctp_rcvmsg_rfree(struct sk_buff *skb) ...@@ -782,9 +784,39 @@ static void sctp_rcvmsg_rfree(struct sk_buff *skb)
asoc->rwnd += skb->len; asoc->rwnd += skb->len;
} }
SCTP_DEBUG_PRINTK("rwnd increased by %d to (%u, %u)\n", SCTP_DEBUG_PRINTK("rwnd increased by %d to (%u, %u) - %u\n",
skb->len, asoc->rwnd, asoc->rwnd_over); skb->len, asoc->rwnd, asoc->rwnd_over, asoc->a_rwnd);
/* Send a window update SACK if the rwnd has increased by at least the
* minimum of the association's PMTU and half of the receive buffer.
* The algorithm used is similar to the one described in Section 4.2.3.3
* of RFC 1122.
*/
if ((asoc->state == SCTP_STATE_ESTABLISHED) &&
(asoc->rwnd > asoc->a_rwnd) &&
((asoc->rwnd - asoc->a_rwnd) >=
min_t(__u32, (asoc->base.sk->rcvbuf >> 1), asoc->pmtu))) {
SCTP_DEBUG_PRINTK("Sending window update SACK- rwnd: %u "
"a_rwnd: %u\n", asoc->rwnd, asoc->a_rwnd);
sack = sctp_make_sack(asoc);
if (!sack)
goto out;
/* Update the last advertised rwnd value. */
asoc->a_rwnd = asoc->rwnd;
asoc->peer.sack_needed = 0;
asoc->peer.next_dup_tsn = 0;
sctp_push_outqueue(&asoc->outqueue, sack);
/* Stop the SACK timer. */
timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
if (timer_pending(timer) && del_timer(timer))
sctp_association_put(asoc);
}
out:
sctp_association_put(asoc); sctp_association_put(asoc);
} }
......
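
The receive-buffer accounting hunk above sends a window-update SACK once the receive window has grown, relative to the last advertised value, by at least the minimum of the path MTU and half the socket receive buffer, in the spirit of RFC 1122 section 4.2.3.3. A small standalone sketch of just that threshold test (names are illustrative):

#include <stdio.h>
#include <stdint.h>

/* Should a window-update SACK be sent? rwnd is the current receive window,
 * a_rwnd the last advertised one, rcvbuf the socket buffer, pmtu the path MTU.
 */
static int window_update_due(uint32_t rwnd, uint32_t a_rwnd,
			     uint32_t rcvbuf, uint32_t pmtu)
{
	uint32_t threshold = (rcvbuf / 2 < pmtu) ? rcvbuf / 2 : pmtu;

	return rwnd > a_rwnd && (rwnd - a_rwnd) >= threshold;
}

int main(void)
{
	/* Window grew from 16 KB to 32 KB with a 64 KB buffer, 1500-byte PMTU. */
	printf("send update: %d\n",
	       window_update_due(32768, 16384, 65536, 1500));
	return 0;
}
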