Commits · 803fafbe0cd522fa6b9e41ca3b96cfb2e2a2222d · nexedi / linux

07 Mar, 2018 5 commits

fsl/fman: avoid sleeping in atomic context while adding an address · 803fafbe

Denis Kirjanov authored Mar 04, 2018

__dev_mc_add grabs an adress spinlock so use
atomic context in kmalloc.

/ # ifconfig eth0 inet 192.168.0.111
[   89.331622] BUG: sleeping function called from invalid context at mm/slab.h:420
[   89.339002] in_atomic(): 1, irqs_disabled(): 0, pid: 1035, name: ifconfig
[   89.345799] 2 locks held by ifconfig/1035:
[   89.349908]  #0:  (rtnl_mutex){+.+.}, at: [<(ptrval)>] devinet_ioctl+0xc0/0x8a0
[   89.357258]  #1:  (_xmit_ETHER){+...}, at: [<(ptrval)>] __dev_mc_add+0x28/0x80
[   89.364520] CPU: 1 PID: 1035 Comm: ifconfig Not tainted 4.16.0-rc3-dirty #8
[   89.371464] Call Trace:
[   89.373908] [e959db60] [c066f948] dump_stack+0xa4/0xfc (unreliable)
[   89.380177] [e959db80] [c00671d8] ___might_sleep+0x248/0x280
[   89.385833] [e959dba0] [c01aec34] kmem_cache_alloc_trace+0x174/0x320
[   89.392179] [e959dbd0] [c04ab920] dtsec_add_hash_mac_address+0x130/0x240
[   89.398874] [e959dc00] [c04a9d74] set_multi+0x174/0x1b0
[   89.404093] [e959dc30] [c04afb68] dpaa_set_rx_mode+0x68/0xe0
[   89.409745] [e959dc40] [c057baf8] __dev_mc_add+0x58/0x80
[   89.415052] [e959dc60] [c060fd64] igmp_group_added+0x164/0x190
[   89.420878] [e959dca0] [c060ffa8] ip_mc_inc_group+0x218/0x460
[   89.426617] [e959dce0] [c06120fc] ip_mc_up+0x3c/0x190
[   89.431662] [e959dd10] [c0607270] inetdev_event+0x250/0x620
[   89.437227] [e959dd50] [c005f190] notifier_call_chain+0x80/0xf0
[   89.443138] [e959dd80] [c0573a74] __dev_notify_flags+0x54/0xf0
[   89.448964] [e959dda0] [c05743f8] dev_change_flags+0x48/0x60
[   89.454615] [e959ddc0] [c0606744] devinet_ioctl+0x544/0x8a0
[   89.460180] [e959de10] [c060987c] inet_ioctl+0x9c/0x1f0
[   89.465400] [e959de80] [c05479a8] sock_ioctl+0x168/0x460
[   89.470708] [e959ded0] [c01cf3ec] do_vfs_ioctl+0xac/0x8c0
[   89.476099] [e959df20] [c01cfc40] SyS_ioctl+0x40/0xc0
[   89.481147] [e959df40] [c0011318] ret_from_syscall+0x0/0x3c
[   89.486715] --- interrupt: c01 at 0x1006943c
[   89.486715]     LR = 0x100c45ec
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

803fafbe

Merge branch 'rhltable-dups' · 6f22c07f

David S. Miller authored Mar 07, 2018

Paul Blakey says:

====================
rhashtable: Fix rhltable duplicates insertion

On our mlx5 driver fs_core.c, we use the rhltable interface to store
flow groups. We noticed that sometimes we get a warning that flow group isn't
found at removal. This rare case was caused when a specific scenario happened,
insertion of a flow group with a similar match criteria (a duplicate),
but only where the flow group rhash_head was second (or not first)
on the relevant rhashtable bucket list.

The first patch fixes it, and the second one adds a test that show
it is now working.

Paul.

v3 --> v2 changes:
    * added missing fix in rhashtable_lookup_one code path as well.

v1 --> v2 changes:
    * Changed commit messages to better reflect the change
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6f22c07f

test_rhashtable: add test case for rhltable with duplicate objects · 499ac3b6

Paul Blakey authored Mar 04, 2018

Tries to insert duplicates in the middle of bucket's chain:
bucket 1:  [[val 21 (tid=1)]] -> [[ val 1 (tid=2),  val 1 (tid=0) ]]

Reuses tid to distinguish the elements insertion order.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

499ac3b6

rhashtable: Fix rhlist duplicates insertion · d3dcf8eb

Paul Blakey authored Mar 04, 2018

When inserting duplicate objects (those with the same key),
current rhlist implementation messes up the chain pointers by
updating the bucket pointer instead of prev next pointer to the
newly inserted node. This causes missing elements on removal and
travesal.

Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.

Issue: 1241076
Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7
Fixes: ca26893f ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3dcf8eb

dt-bindings: net: renesas-ravb: Make stream buffer optional · 25b5cdfc

Geert Uytterhoeven authored Mar 02, 2018

The Stream Buffer for EtherAVB-IF (STBE) is an optional component, and
is not present on all SoCs.

Document this in the DT bindings, including a list of SoCs that do have
it.

Fixes: 785ec874 ("ravb: document R8A77970 bindings")
Fixes: f231c417 ("dt-bindings: net: renesas-ravb: Add support for R8A77995 RAVB")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

25b5cdfc

06 Mar, 2018 8 commits

net: Only honor ifindex in IP_PKTINFO if non-0 · 2cbb4ea7

David Ahern authored Feb 16, 2018

Only allow ifindex from IP_PKTINFO to override SO_BINDTODEVICE settings
if the index is actually set in the message.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2cbb4ea7

ethernet: natsemi: correct spelling · 22b8d811

Randy Dunlap authored Mar 02, 2018

Correct spelling of National Semi-conductor (no hyphen)
in drivers/net/ethernet/.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

22b8d811

Merge branch 'net-Use-strlcpy-for-ethtool-get_strings' · 30b7483f

David S. Miller authored Mar 06, 2018

Florian Fainelli says:

====================
net: Use strlcpy() for ethtool::get_strings

After turning on KASAN on one of my systems, I started getting lots of out of
bounds errors while fetching a given port's statistics, and indeed using
memcpy() is unsafe for copying strings which have not been declared as an array
of ETH_GSTRING_LEN bytes, so let's use strlcpy() instead. This allows the best
of both worlds: we still keep the efficient memory usage of variably sized
strings, but we don't copy more than we need to.

Changes in v2:
- dropped the 3 other patches that were not necessary
- use strlcpy() instead of strncpy()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

30b7483f

net: phy: broadcom: Use strlcpy() for ethtool::get_strings · 8a17eefa

Florian Fainelli authored Mar 02, 2018

Our statistics strings are allocated at initialization without being
bound to a specific size, yet, we would copy ETH_GSTRING_LEN bytes using
memcpy() which would create out of bounds accesses, this was flagged by
KASAN. Replace this with strlcpy() to make sure we are bound the source
buffer size and we also always NUL-terminate strings.

Fixes: 820ee17b ("net: phy: broadcom: Add support code for reading PHY counters")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a17eefa

net: phy: micrel: Use strlcpy() for ethtool::get_strings · 55f53567

Florian Fainelli authored Mar 02, 2018

Fixes: 2b2427d0 ("phy: micrel: Add ethtool statistics counters")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55f53567

net: phy: marvell: Use strlcpy() for ethtool::get_strings · 98409b2b

Florian Fainelli authored Mar 02, 2018

Fixes: d2fa47d9 ("phy: marvell: Add ethtool statistics counters")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98409b2b

net: dsa: b53: Use strlcpy() for ethtool::get_strings · cd526676

Florian Fainelli authored Mar 02, 2018

Fixes: 967dd82f ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd526676

Merge tag 'please-pull-ia64_misc' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · ce380619

Linus Torvalds authored Mar 05, 2018

Pull ia64 cleanups from Tony Luck:

 - More atomic cleanup from willy

 - Fix a python script to work with version 3

 - Some other small cleanups

* tag 'please-pull-ia64_misc' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  ia64/err-inject: fix spelling mistake: "capapbilities" -> "capabilities"
  ia64/err-inject: Use get_user_pages_fast()
  ia64: doc: tweak whitespace for 'console=' parameter
  ia64: Convert remaining atomic operations
  ia64: convert unwcheck.py to python3

ce380619

05 Mar, 2018 18 commits

ia64/err-inject: fix spelling mistake: "capapbilities" -> "capabilities" · 48e362dd

Colin Ian King authored Mar 02, 2018

Trivial fix to spelling mistake in debug message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

48e362dd

ia64/err-inject: Use get_user_pages_fast() · 69c90702

Davidlohr Bueso authored Jan 22, 2018

At the point of sysfs callback, the call to gup is
done without mmap_sem (or any lock for that matter).
This is racy. As such, use the get_user_pages_fast()
alternative and safely avoid taking the lock, if possible.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>

69c90702

ia64: doc: tweak whitespace for 'console=' parameter · 339d541a

Sergei Trofimovich authored Feb 24, 2018

CC: Tony Luck <tony.luck@intel.com>
CC: Fenghua Yu <fenghua.yu@intel.com>
CC: linux-ia64@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>

339d541a

ia64: Convert remaining atomic operations · 2879b65f

Matthew Wilcox authored Feb 19, 2018

While we've only seen inlining problems with atomic_sub_return(),
the other atomic operations could have the same problem.  Convert all
remaining operations to use the same solution as atomic_sub_return().
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

2879b65f

ia64: convert unwcheck.py to python3 · bd5edbe6

Corentin Labbe authored Feb 14, 2018

Since my system use python3 as default, arch/ia64/scripts/unwcheck.py no
longer run.

This patch convert it to the python3 syntax.
I have ran it with python2/python3 while printing values of
start/end/rlen_sum which could be impacted by this change and I see no difference.

Fixes: 94a47083 ("scripts: change scripts to use system python instead of env")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

bd5edbe6

Merge tag 'linux-kselftest-4.16-rc5' of... · 094b58e1

Linus Torvalds authored Mar 05, 2018

Merge tag 'linux-kselftest-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "A fix for regression in memory-hotplug install script that prevents
  the test from running on the target"

* tag 'linux-kselftest-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: memory-hotplug: fix emit_tests regression

094b58e1

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 54704614

Linus Torvalds authored Mar 05, 2018

Pull networking fixes from David Miller:

 1) Use an appropriate TSQ pacing shift in mac80211, from Toke
    Høiland-Jørgensen.

 2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk
    in ip6_route_me_harder, from Eric Dumazet.

 3) Fix several shutdown races and similar other problems in l2tp, from
    James Chapman.

 4) Handle missing XDP flush properly in tuntap, for real this time.
    From Jason Wang.

 5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel
    Borkmann.

 6) Fix phy_resume() locking, from Andrew Lunn.

 7) IFLA_MTU values are ignored on newlink for some tunnel types, fix
    from Xin Long.

 8) Revert F-RTO middle box workarounds, they only handle one dimension
    of the problem. From Yuchung Cheng.

 9) Fix socket refcounting in RDS, from Ka-Cheong Poon.

10) Don't allow ppp unit registration to an unregistered channel, from
    Guillaume Nault.

11) Various hv_netvsc fixes from Stephen Hemminger.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits)
  hv_netvsc: propagate rx filters to VF
  hv_netvsc: filter multicast/broadcast
  hv_netvsc: defer queue selection to VF
  hv_netvsc: use napi_schedule_irqoff
  hv_netvsc: fix race in napi poll when rescheduling
  hv_netvsc: cancel subchannel setup before halting device
  hv_netvsc: fix error unwind handling if vmbus_open fails
  hv_netvsc: only wake transmit queue if link is up
  hv_netvsc: avoid retry on send during shutdown
  virtio-net: re enable XDP_REDIRECT for mergeable buffer
  ppp: prevent unregistered channels from connecting to PPP units
  tc-testing: skbmod: fix match value of ethertype
  mlxsw: spectrum_switchdev: Check success of FDB add operation
  net: make skb_gso_*_seglen functions private
  net: xfrm: use skb_gso_validate_network_len() to check gso sizes
  net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
  net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
  rds: Incorrect reference counting in TCP socket creation
  net: ethtool: don't ignore return from driver get_fecparam method
  vrf: check forwarding on the original netdevice when generating ICMP dest unreachable
  ...

54704614

Merge branch 'hv_netvsc-minor-fixes' · a7f0fb1b

David S. Miller authored Mar 04, 2018

Stephen Hemminger says:

====================
hv_netvsc: minor fixes

These are improvements to netvsc driver. They aren't functionality
changes so not targeting net-next; and they are not show stopper
bugs that need to go to stable either.

v2
   - drop the irq flags patch, defer it to net-next
   - split the multicast filter flag patch out
   - change propogate rx mode patch to handle startup of vf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a7f0fb1b

hv_netvsc: propagate rx filters to VF · bee9d41b

Stephen Hemminger authored Mar 02, 2018

The netvsc device should propagate filters to the SR-IOV VF
device (if present). The flags also need to be propagated to the
VF device as well. This only really matters on local Hyper-V
since Azure does not support multiple addresses.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bee9d41b

hv_netvsc: filter multicast/broadcast · 009f766c

Stephen Hemminger authored Mar 02, 2018

The netvsc driver was always enabling all multicast and broadcast
even if netdevice flag had not enabled it.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

009f766c

hv_netvsc: defer queue selection to VF · b3bf5666

Stephen Hemminger authored Mar 02, 2018

When VF is used for accelerated networking it will likely have
more queues (and different policy) than the synthetic NIC.
This patch defers the queue policy to the VF so that all the
queues can be used. This impacts workloads like local generate UDP.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3bf5666

hv_netvsc: use napi_schedule_irqoff · 68633eda

Stephen Hemminger authored Mar 02, 2018

Since the netvsc_channel_cb is already called in interrupt
context from vmbus, there is no need to do irqsave/restore.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

68633eda

hv_netvsc: fix race in napi poll when rescheduling · d64e38ae

Stephen Hemminger authored Mar 02, 2018

There is a race between napi_reschedule and re-enabling interrupts
which could lead to missed host interrrupts.  This occurs when
interrupts are re-enabled (hv_end_read) and vmbus irq callback
(netvsc_channel_cb) has already scheduled NAPI.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d64e38ae

hv_netvsc: cancel subchannel setup before halting device · a7483ec0

Stephen Hemminger authored Mar 02, 2018

Block setup of multiple channels earlier in the teardown
process. This avoids possible races between halt and subchannel
initialization.
Suggested-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7483ec0

hv_netvsc: fix error unwind handling if vmbus_open fails · fcfb4a00

Stephen Hemminger authored Mar 02, 2018

Need to delete NAPI association if vmbus_open fails.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcfb4a00

hv_netvsc: only wake transmit queue if link is up · f4950e45

Stephen Hemminger authored Mar 02, 2018

Don't wake transmit queues if link is not up yet.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4950e45

hv_netvsc: avoid retry on send during shutdown · 12f69661

Stephen Hemminger authored Mar 02, 2018

Change the initialization order so that the device is ready to transmit
(ie connect vsp is completed) before setting the internal reference
to the device with RCU.

This avoids any races on initialization and prevents retry issues
on shutdown.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12f69661

virtio-net: re enable XDP_REDIRECT for mergeable buffer · 3cc81a9a

Jason Wang authored Mar 02, 2018

XDP_REDIRECT support for mergeable buffer was removed since commit
7324f539 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. So this patch fixes this
by reserving enough tailroom and using fixed size of rx buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3cc81a9a

04 Mar, 2018 9 commits

ppp: prevent unregistered channels from connecting to PPP units · 77f840e3

Guillaume Nault authored Mar 02, 2018

PPP units don't hold any reference on the channels connected to it.
It is the channel's responsibility to ensure that it disconnects from
its unit before being destroyed.
In practice, this is ensured by ppp_unregister_channel() disconnecting
the channel from the unit before dropping a reference on the channel.

However, it is possible for an unregistered channel to connect to a PPP
unit: register a channel with ppp_register_net_channel(), attach a
/dev/ppp file to it with ioctl(PPPIOCATTCHAN), unregister the channel
with ppp_unregister_channel() and finally connect the /dev/ppp file to
a PPP unit with ioctl(PPPIOCCONNECT).

Once in this situation, the channel is only held by the /dev/ppp file,
which can be released at anytime and free the channel without letting
the parent PPP unit know. Then the ppp structure ends up with dangling
pointers in its ->channels list.

Prevent this scenario by forbidding unregistered channels from
connecting to PPP units. This maintains the code logic by keeping
ppp_unregister_channel() responsible from disconnecting the channel if
necessary and avoids modification on the reference counting mechanism.

This issue seems to predate git history (successfully reproduced on
Linux 2.6.26 and earlier PPP commits are unrelated).
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

77f840e3

tc-testing: skbmod: fix match value of ethertype · 79f3a8e6

Davide Caratti authored Mar 02, 2018

iproute2 print_skbmod() prints the configured ethertype using format 0x%X:
therefore, test 9aa8 systematically fails, because it configures action #4
using ethertype 0x0031, and expects 0x0031 when it reads it back. Changing
the expected value to 0x31 lets the test result 'not ok' become 'ok'.

tested with:
 # ./tdc.py -e 9aa8
 Test 9aa8: Get a single skbmod action from a list
 All test results:

 1..1
 ok 1 9aa8 Get a single skbmod action from a list

Fixes: cf797ac4 ("tc-testing: Add test cases for police and skbmod")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79f3a8e6

mlxsw: spectrum_switchdev: Check success of FDB add operation · 0a8a1bf1

Shalom Toledo authored Mar 01, 2018

Until now, we assumed that in case of error when adding FDB entries, the
write operation will fail, but this is not the case. Instead, we need to
check that the number of entries reported in the response is equal to
the number of entries specified in the request.

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a8a1bf1

Linux 4.16-rc4 · 661e50bc
Linus Torvalds authored Mar 04, 2018

661e50bc

Merge branch 'GSO_BY_FRAGS-correctness-improvements' · 19f6484f

David S. Miller authored Mar 04, 2018

Daniel Axtens says:

====================
GSO_BY_FRAGS correctness improvements

As requested [1], I went through and had a look at users of gso_size to
see if there were things that need to be fixed to consider
GSO_BY_FRAGS, and I have tried to improve our helper functions to deal
with this case.

I found a few. This fixes bugs relating to the use of
skb_gso_*_seglen() where GSO_BY_FRAGS is not considered.

Patch 1 renames skb_gso_validate_mtu to skb_gso_validate_network_len.
This is follow-up to my earlier patch 2b16f048 ("net: create
skb_gso_validate_mac_len()"), and just makes everything a bit clearer.

Patches 2 and 3 replace the final users of skb_gso_network_seglen() -
which doesn't consider GSO_BY_FRAGS - with
skb_gso_validate_network_len(), which does. This allows me to make the
skb_gso_*_seglen functions private in patch 4 - now future users won't
accidentally do the wrong comparison.

Two things remain. One is qdisc_pkt_len_init, which is discussed at
[2] - it's caught up in the GSO_DODGY mess. I don't have any expertise
in GSO_DODGY, and it looks like a good clean fix will involve
unpicking the whole validation mess, so I have left it for now.

Secondly, there are 3 eBPF opcodes that change the gso_size of an SKB
and don't consider GSO_BY_FRAGS. This is going through the bpf tree.

Regards,
Daniel

[1] https://patchwork.ozlabs.org/comment/1852414/
[2] https://www.spinics.net/lists/netdev/msg482397.html

PS: This is all in the core networking stack. For a driver to be
affected by this it would need to support NETIF_F_GSO_SCTP /
NETIF_F_GSO_SOFTWARE and then either use gso_size or not be a purely
virtual device. (Many drivers look at gso_size, but do not support
SCTP segmentation, so the core network will segment an SCTP gso before
it hits them.) Based on that, the only driver that may be affected is
sunvnet, but I have no way of testing it, so I haven't looked at it.

v2: split out bpf stuff
    fix review comments from Dave Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

19f6484f

net: make skb_gso_*_seglen functions private · a4a77718

Daniel Axtens authored Mar 01, 2018

They're very hard to use properly as they do not consider the
GSO_BY_FRAGS case. Code should use skb_gso_validate_network_len
and skb_gso_validate_mac_len as they do consider this case.

Make the seglen functions static, which stops people using them
outside of skbuff.c
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a4a77718

net: xfrm: use skb_gso_validate_network_len() to check gso sizes · 80f5974d

Daniel Axtens authored Mar 01, 2018

Replace skb_gso_network_seglen() with
skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS
case.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80f5974d

net: sched: tbf: handle GSO_BY_FRAGS case in enqueue · ee78bbef

Daniel Axtens authored Mar 01, 2018

tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.

Use skb_gso_validate_mac_len() instead, as it does consider that
case.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee78bbef

net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len · 779b7931

Daniel Axtens authored Mar 01, 2018

If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?

skb_gso_validate_mtu gives you the answer to that question. However,
we recently added to add a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

779b7931