Commits · 0d40f75bdab241868c0eb6f97aef9f8b3a66f7b3 · Kirill Smelkov / linux

05 Sep, 2013 27 commits

openvswitch: Fix alignment of struct sw_flow_key. · 0d40f75b

Jesse Gross authored Sep 05, 2013

sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
However, this breaks on the m68k architecture where long is 32 bit in
size but 16 bit aligned by default. This aligns to the size of a long to
ensure that we can always do comparsions in full long-sized chunks. It
also adds an additional build check to catch any reduction in alignment.

CC: Andy Zhou <azhou@nicira.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d40f75b

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 06c54055

David S. Miller authored Sep 05, 2013

Conflicts:
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	net/bridge/br_multicast.c
	net/ipv6/sit.c

The conflicts were minor:

1) sit.c changes overlap with change to ip_tunnel_xmit() signature.

2) br_multicast.c had an overlap between computing max_delay using
   msecs_to_jiffies and turning MLDV2_MRC() into an inline function
   with a name using lowercase instead of uppercase letters.

3) stmmac had two overlapping changes, one which conditionally allocated
   and hooked up a dma_cfg based upon the presence of the pbl OF property,
   and another one handling store-and-forward DMA made.  The latter of
   which should not go into the new of_find_property() basic block.
Signed-off-by: David S. Miller <davem@davemloft.net>

06c54055

netfilter: Fix build errors with xt_socket.c · 1a5bbfc3

David S. Miller authored Sep 05, 2013

As reported by Randy Dunlap:

====================
when CONFIG_IPV6=m
and CONFIG_NETFILTER_XT_MATCH_SOCKET=y:

net/built-in.o: In function `socket_mt6_v1_v2':
xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup'
net/built-in.o: In function `socket_mt_init':
xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable'
====================

Like several other modules under net/netfilter/ we have to
have a dependency "IPV6 disabled or set compatibly with this
module" clause.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a5bbfc3

tcp: Add missing braces to do_tcp_setsockopt · e2e5c4c0

Dave Jones authored Sep 05, 2013

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2e5c4c0

caif: Add missing braces to multiline if in cfctrl_linkup_request · 0c1db731

Dave Jones authored Sep 05, 2013

The indentation here implies this was meant to be a multi-line if.

Introduced several years back in commit c85c2951
("caif: Handle dev_queue_xmit errors.")
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c1db731

bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize · c0a77ec7

Dave Jones authored Sep 04, 2013

The indentation here implies that the intent was for this to be a multiline if.
Introduced a few years ago in commit ec146a6f ("bnx2x: Modify XGXS functions")
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0a77ec7

vxlan: Fix kernel panic on device delete. · f011baf9

Pravin B Shelar authored Sep 04, 2013

On vxlan device create if socket create fails vxlan device is not
added to hash table. Therefore we need to check if device
is in hashtable before we delete it from hlist.
Following patch avoid the crash. net-next already has this fix.

---8<---
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffa05f9ca7>] vxlan_dellink+0x77/0xf0 [vxlan]
PGD 42b2d9067 PUD 42e04c067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: vxlan(-)
Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 1.4.8 10/25/2012
task: ffff88042ecf8760 ti: ffff88042f106000 task.ti: ffff88042f106000
RIP: 0010:[<ffffffffa05f9ca7>]  [<ffffffffa05f9ca7>]
vxlan_dellink+0x77/0xf0 [vxlan]
RSP: 0018:ffff88042f107e28  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88082af08000 RCX: ffff88083fd80000
RDX: 0000000000000000 RSI: ffff88042f107e58 RDI: ffff88042e12f810
RBP: ffff88042f107e48 R08: ffffffff8166eca0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88082af087c0
R13: ffff88042e12f000 R14: ffff88042f107e58 R15: ffff88042f107e58
FS:  00007f4ed2de7700(0000) GS:ffff88043fc80000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000042e076000 CR4: 00000000000407e0
Stack:
 ffff88082af08000 ffffffff81654848 ffffffffa05fb4e0 ffffffff81654780
 ffff88042f107e98 ffffffff813b9c7a ffff88042f107e58 ffff88042f107e58
 ffff88042f107e88 ffffffffa05fb4e0 ffffffffa05fb780 ffff88042f107f18
Call Trace:
 [<ffffffff813b9c7a>] __rtnl_link_unregister+0xca/0xd0
 [<ffffffff813bb0e9>] rtnl_link_unregister+0x19/0x30
 [<ffffffffa05faa4c>] vxlan_cleanup_module+0x10/0x2f [vxlan]
 [<ffffffff81099fef>] SyS_delete_module+0x1cf/0x2c0
 [<ffffffff8146c069>] ? do_page_fault+0x9/0x10
 [<ffffffff8146f012>] system_call_fastpath+0x16/0x1b
Code: 4d 85 ed 0f 84 95 00 00 00 4c 8d a7 c0 07 00 00 49 8d bd 10 08 00
00 e8 28 e8 e6 e0 48 8b 83 c0 07 00 00 49 8b 54 24 08 48 85 c0 <48> 89
02 74 04 48 89 50 08 49 b8 00 02 20 00 00 00 ad de 4d 89
RIP  [<ffffffffa05f9ca7>] vxlan_dellink+0x77/0xf0 [vxlan]
 RSP <ffff88042f107e28>
CR2: 0000000000000000
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f011baf9

net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls · 15f59456

Thomas Petazzoni authored Sep 04, 2013

This commit implements the ->ndo_do_ioctl() operation so that the
PHY-related ioctl() calls can work from userspace, which allows
applications like mii-tool or mii-diag to do their job.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15f59456

net: mvneta: properly disable HW PHY polling and ensure adjust_link() works · 71408602

Thomas Petazzoni authored Sep 04, 2013

This commit fixes a long-standing bug that has been reported by many
users: on some Armada 370 platforms, only the network interface that
has been used in U-Boot to tftp the kernel works properly in
Linux. The other network interfaces can see a 'link up', but are
unable to transmit data. The reports were generally made on the Armada
370-based Mirabox, but have also been given on the Armada 370-RD
board.

The network MAC in the Armada 370/XP (supported by the mvneta driver
in Linux) has a functionality that allows it to continuously poll the
PHY and directly update the MAC configuration accordingly (speed,
duplex, etc.). The very first versions of the driver submitted for
review were using this hardware mechanism, but due to this, the driver
was not integrated with the kernel phylib. Following reviews, the
driver was changed to use the phylib, and therefore a software based
polling. In software based polling, Linux regularly talks to the PHY
over the MDIO bus, and sees if the link status has changed. If it's
the case then the adjust_link() callback of the driver is called to
update the MAC configuration accordingly.

However, it turns out that the adjust_link() callback was not
configuring the hardware in a completely correct way: while it was
setting the speed and duplex bits correctly, it wasn't telling the
hardware to actually take into account those bits rather than what the
hardware-based PHY polling mechanism has concluded. So, in fact the
adjust_link() callback was basically a no-op.

However, the network happened to be working because on the network
interfaces used by U-Boot for tftp on Armada 370 platforms because the
hardware PHY polling was enabled by the bootloader, and left enabled
by Linux. However, the second network interface not used for tftp (or
both network interfaces if the kernel is loaded from USB, NAND or SD
card) didn't had the hardware PHY polling enabled.

This patch fixes this situation by:

 (1) Making sure that the hardware PHY polling is disabled by clearing
     the MVNETA_PHY_POLLING_ENABLE bit in the MVNETA_UNIT_CONTROL
     register in the driver ->probe() function.

 (2) Making sure that the duplex and speed selections made by the
     adjust_link() callback are taken into account by clearing the
     MVNETA_GMAC_AN_SPEED_EN and MVNETA_GMAC_AN_DUPLEX_EN bits in the
     MVNETA_GMAC_AUTONEG_CONFIG register.

This patch has been tested on Armada 370 Mirabox, and now both network
interfaces are usable after boot.

[ Problem introduced by commit c5aff182 ("net: mvneta: driver for
  Marvell Armada 370/XP network unit") ]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Jochen De Smet <jochen.armkernel@leahnim.org>
Cc: Peter Sanford <psanford@nearbuy.io>
Cc: Ethan Tuttle <ethan@ethantuttle.com>
Cc: Chény Yves-Gael <yves@cheny.fr>
Cc: Ryan Press <ryan@presslab.us>
Cc: Simon Guinot <simon.guinot@sequanux.org>
Cc: vdonnefort@lacie.com
Cc: stable@vger.kernel.org
Acked-by: Jason Cooper <jason@lakedaemon.net>
Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
Tested-by: Yves-Gael Cheny <yves@cheny.fr>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71408602

icplus: Use netif_running to determine device state · dfafb73f

Jon Mason authored Sep 04, 2013

Remove the __LINK_STATE_START check to verify the device is running, in
favor of netif_running().  netif_running() performs the same check of
__LINK_STATE_START, so the code should behave the same.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Sorbica Shieh <sorbica@icplus.com.tw>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfafb73f

ethernet/arc/arc_emac: Fix huge delays in large file copies · 27082ee1

Vineet Gupta authored Sep 04, 2013

copying large files to a NFS mounted host was taking absurdly large
time.

Turns out that TX BD reclaim had a sublte bug.

Loop starts off from @txbd_dirty cursor and stops when it hits a BD
still in use by controller. However when it stops it needs to keep the
cursor at that very BD to resume scanning in next iteration. However it
was erroneously incrementing the cursor, causing the next scan(s) to
fail too, unless the BD chain was completely drained out.

[ARCLinux]$ ls -l -sh /disk/log.txt
 17976 -rw-r--r--    1 root     root       17.5M Sep  /disk/log.txt

========== Before =====================
[ARCLinux]$ time cp /disk/log.txt /mnt/.
real    31m 7.95s
user    0m 0.00s
sys     0m 0.10s

========== After =====================
[ARCLinux]$ time cp /disk/log.txt /mnt/.
real    0m 24.33s
user    0m 0.00s
sys     0m 0.19s
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Alexey Brodkin <abrodkin@synopsys.com> (commit_signer:3/4=75%)
Cc: "David S. Miller" <davem@davemloft.net> (commit_signer:3/4=75%)
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: arc-linux-dev@synopsys.com
Signed-off-by: David S. Miller <davem@davemloft.net>

27082ee1

tuntap: orphan frags before trying to set tx timestamp · 7bf66305

Jason Wang authored Sep 05, 2013

sock_tx_timestamp() will clear all zerocopy flags of skb which may lead the
frags never to be orphaned. This will break guest to guest traffic when zerocopy
is enabled. Fix this by orphaning the frags before trying to set tx time stamp.

The issue were introduced by commit eda29772
(tun: Support software transmit time stamping).

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7bf66305

tuntap: purge socket error queue on detach · 4bfb0513

Jason Wang authored Sep 05, 2013

Commit eda29772
(tun: Support software transmit time stamping) will queue skbs into error queue
when tx stamping is enabled. But it forgets to purge the error queue during
detach. This patch fixes this.

Cc: Richard Cochran <richardcochran@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4bfb0513

qlcnic: use standard NAPI weights · df95fc44

Michal Schmidt authored Sep 04, 2013

Since commit 82dc3c63 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

qlcnic requests the weight as large as 256 for some of its rings, and
smaller values for other rings. For instance in qlcnic_82xx_napi_add()
I think the intention was to give the tx+rx ring a bigger weight than
to rx-only rings, but it's actually doing the opposite. So I'm assuming
the weights do not really matter much.

Just use the standard NAPI weights for all rings.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

df95fc44

ipv6:introduce function to find route for redirect · b55b76b2

Duan Jiong authored Sep 04, 2013

RFC 4861 says that the IP source address of the Redirect is the
same as the current first-hop router for the specified ICMP
Destination Address, so the gateway should be taken into
consideration when we find the route for redirect.

There was once a check in commit
a6279458 ("NDISC: Search over
all possible rules on receipt of redirect.") and the check
went away in commit b94f1c09
("ipv6: Use icmpv6_notify() to propagate redirect, instead of
rt6_redirect()").

The bug is only "exploitable" on layer-2 because the source
address of the redirect is checked to be a valid link-local
address but it makes spoofing a lot easier in the same L2
domain nonetheless.

Thanks very much for Hannes's help.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b55b76b2

bnx2x: VF RSS support - VF side · 60cad4e6

Ariel Elior authored Sep 04, 2013

In this patch capabilities are added to the Vf driver to request
multiple queues over the VF PF channel, and the logic for requesting
rss configuration for said queues.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilong Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60cad4e6

bnx2x: VF RSS support - PF side · b9871bcf

Ariel Elior authored Sep 04, 2013

This patch adds support for Receive Side Scaling for queues of
Virtual Functions on the PF side. This includes support for the
requests for multiple queues from VF drivers, configuration of the
HW for multiple queues per VF, and support for rss configuration
of said queues.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9871bcf

vxlan: Notify drivers for listening UDP port changes · 53cf5275

Joseph Gasparakis authored Sep 04, 2013

This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
ndo_del_rx_vxlan_port().

Drivers can get notifications through the above functions about changes
of the UDP listening port of VXLAN. Also, when physical ports come up,
now they can call vxlan_get_rx_port() in order to obtain the port number(s)
of the existing VXLAN interface in case they already up before them.

This information about the listening UDP port would be used for VXLAN
related offloads.

A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
input and his suggestions on this patch set.

CC: John Fastabend <john.r.fastabend@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

53cf5275

net: usbnet: update addr_assign_type if appropriate · eef23b53

Bjørn Mork authored Sep 04, 2013

This module generates a common default address on init,
using eth_random_addr. Set addr_assign_type to let
userspace know the address is random unless it was
overridden by the minidriver.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

eef23b53

Merge branch 'enic' · c3de991f

David S. Miller authored Sep 05, 2013

Govindarajulu Varadarajan says:

====================
The following patch adds multi tx support for enic.
Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <govindarajulu90@gmail.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c3de991f

driver/net: enic: update enic maintainers and driver · 001e1c1d

govindarajulu.v authored Sep 04, 2013

Signed-off-by: Govindarajulu Varadarajan <govindarajulu90@gmail.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

001e1c1d

driver/net: enic: Exposing symbols for Cisco's low latency driver · 4a50ddfd

govindarajulu.v authored Sep 04, 2013

This patch exposes symbols for usnic low latency driver that can be used to
register and unregister vNics as well to traverse the resources on vNics.
Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a50ddfd

driver/net: enic: Try DMA 64 first, then failover to DMA · 624dbf55

govindarajulu.v authored Sep 04, 2013

In servers with more than 1.1 TB of RAM, the existing 40/32 bit DMA
could cause failure as the DMA-able address could go outside the range
addressable using 40/32 bits.

The following patch first tried 64 bit DMA if possible, failover to 32
bit.
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <govindarajulu90@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

624dbf55

driver/net: enic: record q_number and rss_hash for skb · bf751ba8

govindarajulu.v authored Sep 04, 2013

The following patch sets the skb->rxhash and skb->q_number.
This is used by RPS and RFS. Kernel can make use of hw provided hash
instead of calculating the hash.
Signed-off-by: Govindarajulu Varadarajan <govindarajulu90@gmail.com>
Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf751ba8

driver/net: enic: Add multi tx support for enic · 822473b6

govindarajulu.v authored Sep 04, 2013

The following patch adds multi tx support for enic.
Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <govindarajulu90@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

822473b6

bridge: apply multicast snooping to IPv6 link-local, too · 3c3769e6

Linus Lüssing authored Sep 04, 2013

The multicast snooping code should have matured enough to be safely
applicable to IPv6 link-local multicast addresses (excluding the
link-local all nodes address, ff02::1), too.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c3769e6

bridge: prevent flooding IPv6 packets that do not have a listener · 8fad9c39

Linus Lüssing authored Sep 04, 2013

Currently if there is no listener for a certain group then IPv6 packets
for that group are flooded on all ports, even though there might be no
host and router interested in it on a port.

With this commit they are only forwarded to ports with a multicast
router.

Just like commit bd4265fe ("bridge: Only flood unregistered groups
to routers") did for IPv4, let's do the same for IPv6 with the same
reasoning.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fad9c39

04 Sep, 2013 13 commits

sh_eth: fix napi_{en|dis}able() calls racing against interrupts · d2779e99

Sergei Shtylyov authored Sep 04, 2013

While implementing NAPI for the driver, I overlooked the race conditions where
interrupt handler might have called napi_schedule_prep() before napi_enable()
was called or after napi_disable() was called. If RX interrupt happens, this
would cause the endless interrupts and messages like:

sh-eth eth0: ignoring interrupt, status 0x00040000, mask 0x01ff009f.

The interrupt wouldn't even be masked by the kernel eventually since the handler
would return IRQ_HANDLED all the time.

As a fix, move napi_enable() call before request_irq() call and napi_disable()
call after free_irq() call.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2779e99

net: ipv6: mld: document force_mld_version in ip-sysctl.txt · f2127810

Daniel Borkmann authored Sep 04, 2013

Document force_mld_version parameter in ip-sysctl.txt.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f2127810

net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions · b4af8def

Daniel Borkmann authored Sep 04, 2013

We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
more readable.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4af8def

net: ipv6: mld: refactor query processing into v1/v2 functions · 2b7c121f

Daniel Borkmann authored Sep 04, 2013

Make igmp6_event_query() a bit easier to read by refactoring code
parts into mld_process_v1() and mld_process_v2().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b7c121f

net: ipv6: mld: similarly to MLDv2 have min max_delay of 1 · cc7f7ab7

Daniel Borkmann authored Sep 04, 2013

Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc7f7ab7

net: ipv6: mld: implement RFC3810 MLDv2 mode only · 58c0ecfd

Daniel Borkmann authored Sep 04, 2013

RFC3810, 10. Security Considerations says under subsection 10.1.
Query Message:

  A forged Version 1 Query message will put MLDv2 listeners on that
  link in MLDv1 Host Compatibility Mode. This scenario can be avoided
  by providing MLDv2 hosts with a configuration option to ignore
  Version 1 messages completely.

Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic:

  echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version  or
  echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version

Note that <all> device has a higher precedence as it was previously
also the case in the macro MLD_V1_SEEN() that would "short-circuit"
if condition on <all> case.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

58c0ecfd

net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation · e3f5b170

Daniel Borkmann authored Sep 04, 2013

Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3f5b170

net: ipv6: mld: clean up MLD_V1_SEEN macro · 6c567b78

Daniel Borkmann authored Sep 04, 2013

Replace the macro with a function to make it more readable. GCC will
eventually decide whether to inline this or not (also, that's not
fast-path anyway).
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c567b78

net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. · 89225d1c

Daniel Borkmann authored Sep 04, 2013

i) RFC3810, 9.2. Query Interval [QI] says:

   The Query Interval variable denotes the interval between General
   Queries sent by the Querier. Default value: 125 seconds. [...]

ii) RFC3810, 9.3. Query Response Interval [QRI] says:

  The Maximum Response Delay used to calculate the Maximum Response
  Code inserted into the periodic General Queries. Default value:
  10000 (10 seconds) [...] The number of seconds represented by the
  [Query Response Interval] must be less than the [Query Interval].

iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:

  The Older Version Querier Present Timeout is the time-out for
  transitioning a host back to MLDv2 Host Compatibility Mode. When an
  MLDv1 query is received, MLDv2 hosts set their Older Version Querier
  Present Timer to [Older Version Querier Present Timeout].

  This value MUST be ([Robustness Variable] times (the [Query Interval]
  in the last Query received)) plus ([Query Response Interval]).

Hence, on *default* the timeout results in:

  [RV] = 2, [QI] = 125sec, [QRI] = 10sec
  [OVQPT] = [RV] * [QI] + [QRI] = 260sec

Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...

  switchback = (idev->mc_qrv + 1) * max_delay

RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:

  This section is meant to provide advice to network administrators on
  how to tune these settings to their network. Ambitious router
  implementations might tune these settings dynamically based upon
  changing characteristics of the network. [...]

iv) RFC38010, 9.14.2. Query Interval:

  The overall level of periodic MLD traffic is inversely proportional
  to the Query Interval. A longer Query Interval results in a lower
  overall level of MLD traffic. The value of the Query Interval MUST
  be equal to or greater than the Maximum Response Delay used to
  calculate the Maximum Response Code inserted in General Query
  messages.

I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.

Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.

Hence, introduce necessary helper functions and fix this up properly as it
should be.

Introduced in 06da9228 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

89225d1c

net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv · 3a1c7565

Daniel Borkmann authored Sep 03, 2013

In tcp_v6_do_rcv() code, when processing pkt options, we soley work
on our skb clone opt_skb that we've created earlier before entering
tcp_rcv_established() on our way. However, only in condition ...

  if (np->rxopt.bits.rxtclass)
    np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb));

... we work on skb itself. As we extract every other information out
of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can
already be released by tcp_rcv_established() earlier on. When we try
to access it in ipv6_hdr(), we will dereference freed skb.

[ Bug added by commit 4c507d28 ("net: implement IP_RECVTOS for
  IP_PKTOPTIONS") ]
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a1c7565

tcp: better comments for RTO initiallization · 52f20e65

Yuchung Cheng authored Sep 03, 2013

Commit 1b7fdd2a("tcp: do not use cached RTT for RTT estimation")
removes important comments on how RTO is initialized and updated.
Hopefully this patch puts those information back.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52f20e65

vxlan: Optimize vxlan rcv · 430eda6d

Pravin B Shelar authored Sep 03, 2013

vxlan-udp-recv function lookup vxlan_sock struct on every packet
recv by using udp-port number. we can use sk->sk_user_data to
store vxlan_sock and avoid lookup.
I have open coded rcu-api to store and read vxlan_sock from
sk_user_data to avoid sparse warning as sk_user_data is not
__rcu pointer.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

430eda6d

atm: he: print MAC via %pM · f8de3104

Andy Shevchenko authored Sep 03, 2013

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8de3104