Commits · 087afe8aaf562dc7a53f2577049830d6a3245742 · nexedi / linux

21 May, 2016 3 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 087afe8a

Linus Torvalds authored May 20, 2016

Pull networking fixes and more updates from David Miller:

 1) Tunneling fixes from Tom Herbert and Alexander Duyck.

 2) AF_UNIX updates some struct sock bit fields with the socket lock,
    whereas setsockopt() sets overlapping ones with locking.  Seperate
    out the synchronized vs.  the AF_UNIX unsynchronized ones to avoid
    corruption.  From Andrey Ryabinin.

 3) Mount BPF filesystem with mount_nodev rather than mount_ns, from
    Eric Biederman.

 4) A couple kmemdup conversions, from Muhammad Falak R Wani.

 5) BPF verifier fixes from Alexei Starovoitov.

 6) Don't let tunneled UDP packets get stuck in socket queues, if
    something goes wrong during the encapsulation just drop the packet
    rather than signalling an error up the call stack.  From Hannes
    Frederic Sowa.

 7) SKB ref after free in batman-adv, from Florian Westphal.

 8) TCP iSCSI, ocfs2, rds, and tipc have to disable BH in it's TCP
    callbacks since the TCP stack runs pre-emptibly now.  From Eric
    Dumazet.

 9) Fix crash in fixed_phy_add, from Rabin Vincent.

10) Fix length checks in xen-netback, from Paul Durrant.

11) Fix mixup in KEY vs KEYID macsec attributes, from Sabrina Dubroca.

12) RDS connection spamming bug fixes from Sowmini Varadhan

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
  net: suppress warnings on dev_alloc_skb
  uapi glibc compat: fix compilation when !__USE_MISC in glibc
  udp: prevent skbs lingering in tunnel socket queues
  bpf: teach verifier to recognize imm += ptr pattern
  bpf: support decreasing order in direct packet access
  net: usb: ch9200: use kmemdup
  ps3_gelic: use kmemdup
  net:liquidio: use kmemdup
  bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  net: cdc_ncm: update datagram size after changing mtu
  tuntap: correctly wake up process during uninit
  intel: Add support for IPv6 IP-in-IP offload
  ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with FOU/GUE
  RDS: TCP: Avoid rds connection churn from rogue SYNs
  RDS: TCP: rds_tcp_accept_worker() must exit gracefully when terminating rds-tcp
  net: sock: move ->sk_shutdown out of bitfields.
  ipv6: Don't reset inner headers in ip6_tnl_xmit
  ip4ip6: Support for GSO/GRO
  ip6ip6: Support for GSO/GRO
  ipv6: Set features for IPv6 tunnels
  ...

087afe8a

locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait() · 54cf809b

Peter Zijlstra authored May 20, 2016

Similar to commits:

  51d7d520 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
  d86b8da0 ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")

qspinlock suffers from the fact that the _Q_LOCKED_VAL store is
unordered inside the ACQUIRE of the lock.

And while this is not a problem for the regular mutual exclusive
critical section usage of spinlocks, it breaks creative locking like:

	spin_lock(A)			spin_lock(B)
	spin_unlock_wait(B)		if (!spin_is_locked(A))
	do_something()			  do_something()

In that both CPUs can end up running do_something at the same time,
because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait()
spin_is_locked() loads (even on x86!!).

To avoid making the normal case slower, add smp_mb()s to the less used
spin_unlock_wait() / spin_is_locked() side of things to avoid this
problem.
Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
Reported-by: Giovanni Gherdovich <ggherdovich@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org   # v4.2 and later
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

54cf809b

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · b99a9e87

Linus Torvalds authored May 20, 2016

Pull cifs fixes from Steve French:
 "Two small cifs fixes, including one spnego upcall cifs security fix
  for stable"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Remove some obsolete comments
  cifs: Create dedicated keyring for spnego operations

b99a9e87

20 May, 2016 37 commits

net: suppress warnings on dev_alloc_skb · 95829b3a

Neil Horman authored May 19, 2016

Noticed an allocation failure in a network driver the other day on a 32 bit
system:

DMA-API: debugging out of memory - disabling
bnx2fc: adapter_lookup: hba NULL
lldpad: page allocation failure. order:0, mode:0x4120
Pid: 4556, comm: lldpad Not tainted 2.6.32-639.el6.i686.debug #1
Call Trace:
 [<c08a4086>] ? printk+0x19/0x23
 [<c05166a4>] ? __alloc_pages_nodemask+0x664/0x830
 [<c0649d02>] ? free_object+0x82/0xa0
 [<fb4e2c9b>] ? ixgbe_alloc_rx_buffers+0x10b/0x1d0 [ixgbe]
 [<fb4e2fff>] ? ixgbe_configure_rx_ring+0x29f/0x420 [ixgbe]
 [<fb4e228c>] ? ixgbe_configure_tx_ring+0x15c/0x220 [ixgbe]
 [<fb4e3709>] ? ixgbe_configure+0x589/0xc00 [ixgbe]
 [<fb4e7be7>] ? ixgbe_open+0xa7/0x5c0 [ixgbe]
 [<fb503ce6>] ? ixgbe_init_interrupt_scheme+0x5b6/0x970 [ixgbe]
 [<fb4e8e54>] ? ixgbe_setup_tc+0x1a4/0x260 [ixgbe]
 [<fb505a9f>] ? ixgbe_dcbnl_set_state+0x7f/0x90 [ixgbe]
 [<c088d80d>] ? dcb_doit+0x10ed/0x16d0
...

Thought that perhaps the big splat in the logs wasn't really necessecary, as
all call sites for dev_alloc_skb:

a) check the return code for the function

and

b) either print their own error message or have a recovery path that makes the
warning moot.

Fix it by modifying dev_alloc_pages to pass __GFP_NOWARN as a gfp flag to
suppress the warning

applies to the net tree
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexander Duyck <alexander.duyck@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95829b3a

uapi glibc compat: fix compilation when !__USE_MISC in glibc · f0a3fdca

Nicolas Dichtel authored May 19, 2016

These structures are defined only if __USE_MISC is set in glibc net/if.h
headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined.

CC: Jan Engelhardt <jengelh@inai.de>
CC: Josh Boyer <jwboyer@fedoraproject.org>
CC: Stephen Hemminger <shemming@brocade.com>
CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de>
CC: Gabriel Laskar <gabriel@lse.epita.fr>
CC: Mikko Rapeli <mikko.rapeli@iki.fi>
Fixes: 4a91cb61 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0a3fdca

udp: prevent skbs lingering in tunnel socket queues · e5aed006

Hannes Frederic Sowa authored May 19, 2016

In case we find a socket with encapsulation enabled we should call
the encap_recv function even if just a udp header without payload is
available. The callbacks are responsible for correctly verifying and
dropping the packets.

Also, in case the header validation fails for geneve and vxlan we
shouldn't put the skb back into the socket queue, no one will pick
them up there.  Instead we can simply discard them in the respective
encap_recv functions.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5aed006

Merge branch 'bpf-verifier-fixes' · cb543e80

David S. Miller authored May 20, 2016

Alexei Starovoitov says:

====================
bpf: verifier fixes

Further testing of 'direct packet access' uncovered
several usability issues. Fix them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cb543e80

bpf: teach verifier to recognize imm += ptr pattern · 1b9b69ec

Alexei Starovoitov authored May 19, 2016

Humans don't write C code like:
  u8 *ptr = skb->data;
  int imm = 4;
  imm += ptr;
but from llvm backend point of view 'imm' and 'ptr' are registers and
imm += ptr may be preferred vs ptr += imm depending which register value
will be used further in the code, while verifier can only recognize ptr += imm.
That caused small unrelated changes in the C code of the bpf program to
trigger rejection by the verifier. Therefore teach the verifier to recognize
both ptr += imm and imm += ptr.
For example:
when R6=pkt(id=0,off=0,r=62) R7=imm22
after r7 += r6 instruction
will be R6=pkt(id=0,off=0,r=62) R7=pkt(id=0,off=22,r=62)

Fixes: 969bf05e ("bpf: direct packet access")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b9b69ec

bpf: support decreasing order in direct packet access · d91b28ed

Alexei Starovoitov authored May 19, 2016

when packet headers are accessed in 'decreasing' order (like TCP port
may be fetched before the program reads IP src) the llvm may generate
the following code:
[...]                // R7=pkt(id=0,off=22,r=70)
r2 = *(u32 *)(r7 +0) // good access
[...]
r7 += 40             // R7=pkt(id=0,off=62,r=70)
r8 = *(u32 *)(r7 +0) // good access
[...]
r1 = *(u32 *)(r7 -20) // this one will fail though it's within a safe range
                      // it's doing *(u32*)(skb->data + 42)
Fix verifier to recognize such code pattern

Alos turned out that 'off > range' condition is not a verifier bug.
It's a buggy program that may do something like:
if (ptr + 50 > data_end)
  return 0;
ptr += 60;
*(u32*)ptr;
in such case emit
"invalid access to packet, off=0 size=4, R1(id=0,off=60,r=50)" error message,
so all information is available for the program author to fix the program.

Fixes: 969bf05e ("bpf: direct packet access")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

d91b28ed

net: usb: ch9200: use kmemdup · 238a9584

Muhammad Falak R Wani authored May 19, 2016

Use kmemdup when some other buffer is immediately copied into allocated
region. It replaces call to allocation followed by memcpy, by a single
call to kmemdup.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

238a9584

ps3_gelic: use kmemdup · 5877debe

Muhammad Falak R Wani authored May 19, 2016

Use kmemdup when some other buffer is immediately copied into allocated
region. It replaces call to allocation followed by memcpy, by a single
call to kmemdup.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5877debe

net:liquidio: use kmemdup · 7c542772

Muhammad Falak R Wani authored May 19, 2016

Use kmemdup when some other buffer is immediately copied into allocated
region. It replaces call to allocation followed by memcpy, by a single
call to kmemdup.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c542772

bpf: Use mount_nodev not mount_ns to mount the bpf filesystem · e27f4a94

Eric W. Biederman authored May 20, 2016

While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
bpf filesystem. Looking at the code I saw a broken usage of mount_ns
with current->nsproxy->mnt_ns. As the code does not acquire a
reference to the mount namespace it can not possibly be correct to
store the mount namespace on the superblock as it does.

Replace mount_ns with mount_nodev so that each mount of the bpf
filesystem returns a distinct instance, and the code is not buggy.

In discussion with Hannes Frederic Sowa it was reported that the use
of mount_ns was an attempt to have one bpf instance per mount
namespace, in an attempt to keep resources that pin resources from
hiding. That intent simply does not work, the vfs is not built to
allow that kind of behavior. Which means that the bpf filesystem
really is buggy both semantically and in it's implemenation as it does
not nor can it implement the original intent.

This change is userspace visible, but my experience with similar
filesystems leads me to believe nothing will break with a model of each
mount of the bpf filesystem is distinct from all others.

Fixes: b2197755 ("bpf: add support for persistent maps/progs")
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e27f4a94

Merge tag 'wireless-drivers-next-for-davem-2016-05-13' of... · 56025caa

David S. Miller authored May 20, 2016

Merge tag 'wireless-drivers-next-for-davem-2016-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers patches for 4.7

Major changes:

iwlwifi

* remove IWLWIFI_DEBUG_EXPERIMENTAL_UCODE kconfig option
* work for RX multiqueue continues
* dynamic queue allocation work continues
* add Luca as maintainer
* a bunch of fixes and improvements all over

brcmfmac

* add 4356 sdio support

ath6kl

* add ability to set debug uart baud rate with a module parameter

wil6210

* add debugfs file to configure firmware led functionality
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

56025caa

net: cdc_ncm: update datagram size after changing mtu · 05a56487

Rafal Redzimski authored May 19, 2016

Current implementation updates the mtu size and notify cdc_ncm
device using USB_CDC_SET_MAX_DATAGRAM_SIZE request about datagram
size change instead of changing rx_urb_size.

Whenever mtu is being changed, datagram size should also be
updated. Also updating maxmtu formula so it takes max_datagram_size with
use of cdc_ncm_max_dgram_size() and not ctx.
Signed-off-by: Robert Dobrowolski <robert.dobrowolski@linux.intel.com>
Signed-off-by: Rafal Redzimski <rafal.f.redzimski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05a56487

tuntap: correctly wake up process during uninit · addf8fc4

Jason Wang authored May 19, 2016

We used to check dev->reg_state against NETREG_REGISTERED after each
time we are woke up. But after commit 9e641bdc ("net-tun:
restructure tun_do_read for better sleep/wakeup efficiency"), it uses
skb_recv_datagram() which does not check dev->reg_state. This will
result if we delete a tun/tap device after a process is blocked in the
reading. The device will wait for the reference count which was held
by that process for ever.

Fixes this by using RCV_SHUTDOWN which will be checked during
sk_recv_datagram() before trying to wake up the process during uninit.

Fixes: 9e641bdc ("net-tun: restructure tun_do_read for better
sleep/wakeup efficiency")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Xi Wang <xii@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

addf8fc4

Merge branch 'GREoIPV6-followups' · 7fd3c56d

David S. Miller authored May 20, 2016

Alexander Duyck says:

====================
Follow-ups for GUEoIPv6 patches

This patch series is meant to be applied after:
[PATCH v7 net-next 00/16] ipv6: Enable GUEoIPv6 and more fixes for v6 tunneling

The first patch addresses an issue we already resolved in the GREv4 and is
now present in GREv6 with the introduction of FOU/GUE for IPv6 based GRE
tunnels.

The second patch goes through and enables IPv6 tunnel offloads for the Intel
NICs that already support the IPv4 based IP-in-IP tunnel offloads.  I have
only done a bit of touch testing but have seen ~20 Gb/s over an i40e
interface using a v4-in-v6 tunnel, and I have verified IPv6 GRE is still
passing traffic at around the same rate.  I plan to do further testing but
with these patches present it should enable a wider audience to be able to
test the new features introduced in Tom's patchset with hardware offloads.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7fd3c56d

intel: Add support for IPv6 IP-in-IP offload · bf2d1df3

Alexander Duyck authored May 18, 2016

This patch adds support for offloading IPXIP6 type packets that represent
either IPv4 or IPv6 encapsulated inside of an IPv6 outer IP header.  In
addition with this change we should also be able to support FOU
encapsulated traffic with outer IPv6 headers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf2d1df3

ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with FOU/GUE · 6a553681

Alexander Duyck authored May 18, 2016

This patch addresses the same issue we had for IPv4 where enabling GRE with
an inner checksum cannot be supported with FOU/GUE due to the fact that
they will jump past the GRE header at it is treated like a tunnel header.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6a553681

Merge branch 'rds-conn-spamming' · 9a0351df

David S. Miller authored May 20, 2016

Sowmini Varadhan says:

====================
RDS: TCP: connection spamming fixes

We have been testing the RDS-TCP code with a connection spammer
that sends incoming SYNs to the RDS listen port well after
an rds-tcp connection has been established, and found a few
race-windows that are fixed by this patch series.

Patch 1 avoids a null pointer deref when an incoming SYN
shows up when a netns is being dismantled, or when the
rds-tcp module is being unloaded.

Patch 2 addresses the case when a SYN is received after the
connection arbitration algorithm has converged: the incoming
SYN should not needlessly quiesce the transmit path, and it
should not result in needless TCP connection resets due to
re-execution of the connection arbitration logic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9a0351df

RDS: TCP: Avoid rds connection churn from rogue SYNs · c948bb5c

Sowmini Varadhan authored May 18, 2016

When a rogue SYN is received after the connection arbitration
algorithm has converged, the incoming SYN should not needlessly
quiesce the transmit path, and it should not result in needless
TCP connection resets due to re-execution of the connection
arbitration logic.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c948bb5c

RDS: TCP: rds_tcp_accept_worker() must exit gracefully when terminating rds-tcp · 37e14f4f

Sowmini Varadhan authored May 18, 2016

There are two instances where we want to terminate RDS-TCP: when
exiting the netns or during module unload. In either case, the
termination sequence is to stop the listen socket, mark the
rtn->rds_tcp_listen_sock as null, and flush any accept workqs.
Thus any workqs that get flushed at this point will encounter a
null rds_tcp_listen_sock, and must exit gracefully to allow
the RDS-TCP termination to complete successfully.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37e14f4f

Merge tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · be1332c0

Linus Torvalds authored May 20, 2016

Pull GFS2 updates from Bob Peterson:
 "We've got nine patches this time:

   - Abhi Das has two patches that fix a GFS2 splice issue (and an
     adjustment).

   - Ben Marzinski has a patch which allows the proper unmount of a GFS2
     file system after hitting a withdraw error.

   - I have a patch to fix a problem where GFS2 would dereference an
     error value, plus three cosmetic / refactoring patches.

   - Daniel DeFreez has a patch to fix two glock reference count
     problems, where GFS2 was not properly "uninitializing" its glock
     holder on error paths.

   - Denys Vlasenko has a patch to change a function to not be inlined,
     thus reducing the memory footprint of the GFS2 module"

* tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Refactor gfs2_remove_from_journal
  GFS2: Remove allocation parms from gfs2_rbm_find
  gfs2: use inode_lock/unlock instead of accessing i_mutex directly
  GFS2: Add calls to gfs2_holder_uninit in two error handlers
  GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
  GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
  gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
  GFS2: Get rid of dead code in inode_go_demote_ok
  GFS2: ignore unlock failures after withdraw

be1332c0

net: sock: move ->sk_shutdown out of bitfields. · fc64869c

Andrey Ryabinin authored May 18, 2016

->sk_shutdown bits share one bitfield with some other bits in sock struct,
such as ->sk_no_check_[r,t]x, ->sk_userlocks ...
sock_setsockopt() may write to these bits, while holding the socket lock.

In case of AF_UNIX sockets, we change ->sk_shutdown bits while holding only
unix_state_lock(). So concurrent setsockopt() and shutdown() may lead
to corrupting these bits.

Fix this by moving ->sk_shutdown bits out of bitfield into a separate byte.
This will not change the 'struct sock' size since ->sk_shutdown moved into
previously unused 16-bit hole.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc64869c

Merge branch 'GREoIPV6' · 911f6b1f

David S. Miller authored May 20, 2016

Tom Herbert says:

====================
ipv6: Enable GUEoIPv6 and more fixes for v6 tunneling

This patch set:
  - Fixes GRE6 to process translate flags correctly from configuration
  - Adds support for GSO and GRO for ip6ip6 and ip4ip6
  - Add support for FOU and GUE in IPv6
  - Support GRE, ip6ip6 and ip4ip6 over FOU/GUE
  - Fixes ip6_input to deal with UDP encapsulations
  - Some other minor fixes

v2:
  - Removed a check of GSO types in MPLS
  - Define GSO type SKB_GSO_IPXIP6 and SKB_GSO_IPXIP4 (based on input
    from Alexander)
  - Don't define GSO types specifically for IP6IP6 and IP4IP6, above
    fix makes that unnecessary
  - Don't bother clearing encapsulation flag in UDP tunnel segment
    (another item suggested by Alexander).

v3:
  - Address some minor comments from Alexander

v4:
  - Rebase on changes to fix IP TX tunnels
  - Fix MTU issues in ip4ip6, ip6ip6
  - Add test data for above

v5:
  - Address feedback from Shmulik Ladkani regarding extension header
    code that does not return next header but in instead relies
    on returning value via nhoff. Solution here is to fix EH
    processing to return nexthdr value.
  - Refactored IPv4 encaps so that we won't need to create
    a ip6_tunnel_core.c when adding encap support IPv6.

v6:
  - Fix build issues with regard to new GSO constants
  - FIx MTU calculation issues ip6_tunnel.c pointed out byt ALex
  - Add encap_hlen into headroom for GREv6 to work with FOU/GUE

v7:
  - Added skb_set_inner_ipproto to ip4ip6 and ip6ip6
  - Clarified max_headroom in ip6_tnl_xmit
  - Set features for IPv6 tunnels
  - Other cleanup suggested by Alexander
  - Above fixes throughput performance issues in ip4ip6 and ip6ip6,
    updated test results to reflect that

Tested: Various cases of IP tunnels with netperf TCP_STREAM and TCP_RR.

    - IPv4/GRE/GUE/IPv6 with RCO
      1 TCP_STREAM
      	6616 Mbps
      200 TCP_RR
	1244043 tps
        141/243/446 90/95/99% latencies
	86.61% CPU utilization

    - IPv6/GRE/GUE/IPv6 with RCO
      1 TCP_STREAM
	6940 Mbps
      200 TCP_RR
	1270903 tps
	138/236/440 90/95/99% latencies
	87.51% CPU utilization

     - IP6IP6
      1 TCP_STREAM
	5307 Mbps
      200 TCP_RR
	498981 tps
	388/498/631 90/95/99% latencies
	19.75% CPU utilization (1 CPU saturated)

     - IP6IP6/GUE with RCO
      1 TCP_STREAM
	5575 Mbps
      200 TCP_RR
	1233818 tps
	143/244/451 90/95/99% latencies
	87.57 CPU utilization

     - IP4IP6
      1 TCP_STREAM
	5235 Mbps
      200 TCP_RR
	763774 tps
	250/318/466 90/95/99% latencies
	35.25% CPU utilization (1 CPU saturated)

     - IP4IP6/GUE with RCO
      1 TCP_STREAM
	5337 Mbps
      200 TCP_RR
	1196385 tps
	148/251/460 90/95/99% latencies
	87.56 CPU utilization

     - GRE with keyid
      200 TCP_RR
	744173 tps
	258/332/461 90/95/99% latencies
	34.59% CPU utilization (1 CPU saturated)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

911f6b1f

ipv6: Don't reset inner headers in ip6_tnl_xmit · 3ee93eaf

Tom Herbert authored May 18, 2016

Since iptunnel_handle_offloads() is called in all paths we can
probably drop the block in ip6_tnl_xmit that was checking for
skb->encapsulation and resetting the inner headers.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ee93eaf

ip4ip6: Support for GSO/GRO · b8921ca8

Tom Herbert authored May 18, 2016

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8921ca8

ip6ip6: Support for GSO/GRO · 815d22e5

Tom Herbert authored May 18, 2016

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

815d22e5

ipv6: Set features for IPv6 tunnels · 51c052d4

Tom Herbert authored May 18, 2016

Need to set dev features, use same values that are used in GREv6.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51c052d4

ip6_tunnel: Add support for fou/gue encapsulation · b3a27b51

Tom Herbert authored May 18, 2016

Add netlink and setup for encapsulation
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3a27b51

ip6_gre: Add support for fou/gue encapsulation · 1faf3d9f

Tom Herbert authored May 18, 2016

Add netlink and setup for encapsulation
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1faf3d9f

fou: Add encap ops for IPv6 tunnels · aa3463d6

Tom Herbert authored May 18, 2016

This patch add a new fou6 module that provides encapsulation
operations for IPv6.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa3463d6

ip6_tun: Add infrastructure for doing encapsulation · 058214a4

Tom Herbert authored May 18, 2016

Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
for getting encap hlen, setting up encap on a tunnel, performing
encapsulation operation.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

058214a4

fou: Support IPv6 in fou · 5f914b68

Tom Herbert authored May 18, 2016

This patch adds receive path support for IPv6 with fou.

- Add address family to fou structure for open sockets. This supports
  AF_INET and AF_INET6. Lookups for fou ports are performed on both the
  port number and family.
- In fou and gue receive adjust tot_len in IPv4 header or payload_len
  based on address family.
- Allow AF_INET6 in FOU_ATTR_AF netlink attribute.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f914b68

fou: Split out {fou,gue}_build_header · dc969b81

Tom Herbert authored May 18, 2016

Create __fou_build_header and __gue_build_header. These implement the
protocol generic parts of building the fou and gue header.
fou_build_header and gue_build_header implement the IPv4 specific
functions and call the __*_build_header functions.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc969b81

fou: Call setup_udp_tunnel_sock · 440924bb

Tom Herbert authored May 18, 2016

Use helper function to set up UDP tunnel related information for a fou
socket.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

440924bb

net: Cleanup encap items in ip_tunnels.h · 55c2bc14

Tom Herbert authored May 18, 2016

Consolidate all the ip_tunnel_encap definitions in one spot in the
header file. Also, move ip_encap_hlen and ip_tunnel_encap from
ip_tunnel.c to ip_tunnels.h so they call be called without a dependency
on ip_tunnel module. Similarly, move iptun_encaps to ip_tunnel_core.c.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55c2bc14

ipv6: Change "final" protocol processing for encapsulation · 1da44f9c

Tom Herbert authored May 18, 2016

When performing foo-over-UDP, UDP packets are processed by the
encapsulation handler which returns another protocol to process.
This may result in processing two (or more) protocols in the
loop that are marked as INET6_PROTO_FINAL. The actions taken
for hitting a final protocol, in particular the skb_postpull_rcsum
can only be performed once.

This patch set adds a check of a final protocol has been seen. The
rules are:
  - If the final protocol has not been seen any protocol is processed
    (final and non-final). In the case of a final protocol, the final
    actions are taken (like the skb_postpull_rcsum)
  - If a final protocol has been seen (e.g. an encapsulating UDP
    header) then no further non-final protocols are allowed
    (e.g. extension headers). For more final protocols the
    final actions are not taken (e.g. skb_postpull_rcsum).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1da44f9c

ipv6: Fix nexthdr for reinjection · 4c64242a

Tom Herbert authored May 18, 2016

In ip6_input_finish the nexthdr protocol is retrieved from the
next header offset that is returned in the cb of the skb.
This method does not work for UDP encapsulation that may not
even have a concept of a nexthdr field (e.g. FOU).

This patch checks for a final protocol (INET6_PROTO_FINAL) when a
protocol handler returns > 0. If the protocol is not final then
resubmission is performed on nhoff value. If the protocol is final
then the nexthdr is taken to be the return value.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c64242a

net: define gso types for IPx over IPv4 and IPv6 · 7e13318d

Tom Herbert authored May 18, 2016

This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
NETIF_F_GSO_IPXIP6. These are used to described IP in IP
tunnel and what the outer protocol is. The inner protocol
can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
are removed (these are both instances of SKB_GSO_IPXIP4).
SKB_GSO_IPXIP6 will be used when support for GSO with IP
encapsulation over IPv6 is added.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e13318d