Commits · 967c05aee439e6e5d7d805e195b3a20ef5c433d6 · nexedi / linux

16 Jun, 2019 6 commits

tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() · 967c05ae

Eric Dumazet authored Jun 08, 2019

If mtu probing is enabled tcp_mtu_probing() could very well end up
with a too small MSS.

Use the new sysctl tcp_min_snd_mss to make sure MSS search
is performed in an acceptable range.

CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

967c05ae

tcp: add tcp_min_snd_mss sysctl · 5f3e2bf0

Eric Dumazet authored Jun 06, 2019

Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.

This forces the stack to send packets with a very high network/cpu
overhead.

Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.

In some cases, it can be useful to increase the minimal value
to a saner value.

We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.

Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.

We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.

CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f3e2bf0

tcp: tcp_fragment() should apply sane memory limits · f070ef2a

Eric Dumazet authored May 18, 2019

Jonathan Looney reported that a malicious peer can force a sender
to fragment its retransmit queue into tiny skbs, inflating memory
usage and/or overflow 32bit counters.

TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non malicious splitting
of retransmit queue.

A new SNMP counter is added to monitor how many times TCP
did not allow to split an skb if the allowance was exceeded.

Note that this counter might increase in the case applications
use SO_SNDBUF socket option to lower sk_sndbuf.

CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
	socket is already using more than half the allowed space
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f070ef2a

tcp: limit payload size of sacked skbs · 3b4929f6

Eric Dumazet authored May 17, 2019

Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :

	BUG_ON(tcp_skb_pcount(skb) < pcount);

This can happen if the remote peer has advertized the smallest
MSS that linux TCP accepts : 48

An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.

This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.

Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.

CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs

Fixes: 832d11c5 ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b4929f6

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1eb4169c

David S. Miller authored Jun 15, 2019

Alexei Starovoitov says:

====================
pull-request: bpf 2019-06-15

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix stack layout of JITed x64 bpf code, from Alexei.

2) fix out of bounds memory access in bpf_sk_storage, from Arthur.

3) fix lpm trie walk, from Jonathan.

4) fix nested bpf_perf_event_output, from Matt.

5) and several other fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1eb4169c

Revert "net: phylink: set the autoneg state in phylink_phy_change" · 5db2e7c7

David S. Miller authored Jun 15, 2019

This reverts commit ef7bfa84.

Russell King espressed some strong opposition to this
change, explaining that this is trying to make phylink
behave outside of how it has been designed.
Signed-off-by: David S. Miller <davem@davemloft.net>

5db2e7c7

15 Jun, 2019 21 commits

bpf: fix nested bpf tracepoints with per-cpu data · 9594dc3c

Matt Mullins authored Jun 11, 2019

BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as
they do not increment bpf_prog_active while executing.

This enables three levels of nesting, to support
  - a kprobe or raw tp or perf event,
  - another one of the above that irq context happens to call, and
  - another one in nmi context
(at most one of which may be a kprobe or perf event).

Fixes: 20b9d7ac ("bpf: avoid excessive stack usage for perf_sample_data")
Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9594dc3c

bpf: Fix out of bounds memory access in bpf_sk_storage · 85749218

Arthur Fabre authored Jun 15, 2019

bpf_sk_storage maps use multiple spin locks to reduce contention.
The number of locks to use is determined by the number of possible CPUs.
With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used.

When updating elements, the correct lock is determined with hash_ptr().
Calling hash_ptr() with 0 bits is undefined behavior, as it does:

x >> (64 - bits)

Using the value results in an out of bounds memory access.
In my case, this manifested itself as a page fault when raw_spin_lock_bh()
is called later, when running the self tests:

./tools/testing/selftests/bpf/test_verifier 773 775
[   16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8

Force the minimum number of locks to two.
Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
Fixes: 6ac99e8f ("bpf: Introduce bpf sk local storage")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

85749218

vsock/virtio: set SOCK_DONE on peer shutdown · 42f5cda5

Stephen Barber authored Jun 14, 2019

Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has
shut down and there is nothing left to read.

This fixes the following bug:
1) Peer sends SHUTDOWN(RDWR).
2) Socket enters TCP_CLOSING but SOCK_DONE is not set.
3) read() returns -ENOTCONN until close() is called, then returns 0.
Signed-off-by: Stephen Barber <smbarber@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

42f5cda5

net: dsa: rtl8366: Fix up VLAN filtering · 760c80b7

Linus Walleij authored Jun 14, 2019

We get this regression when using RTL8366RB as part of a bridge
with OpenWrt:

WARNING: CPU: 0 PID: 1347 at net/switchdev/switchdev.c:291
	 switchdev_port_attr_set_now+0x80/0xa4
lan0: Commit of attribute (id=7) failed.
(...)
realtek-smi switch lan0: failed to initialize vlan filtering on this port

This is because it is trying to disable VLAN filtering
on VLAN0, as we have forgot to add 1 to the port number
to get the right VLAN in rtl8366_vlan_filtering(): when
we initialize the VLAN we associate VLAN1 with port 0,
VLAN2 with port 1 etc, so we need to add 1 to the port
offset.

Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

760c80b7

net: phylink: set the autoneg state in phylink_phy_change · ef7bfa84

Ioana Ciornei authored Jun 13, 2019

The phy_state field of phylink should carry only valid information
especially when this can be passed to the .mac_config callback.
Update the an_enabled field with the autoneg state in the
phylink_phy_change function.

Fixes: 9525ae83 ("phylink: add phylink infrastructure")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef7bfa84

Merge branch 'tcp-add-three-static-keys' · 35fc07ae

David S. Miller authored Jun 14, 2019

Eric Dumazet says:

====================
tcp: add three static keys

Recent addition of per TCP socket rx/tx cache brought
regressions for some workloads, as reported by Feng Tang.

It seems better to make them opt-in, before we adopt better
heuristics.

The last patch adds high_order_alloc_disable sysctl
to ask TCP sendmsg() to exclusively use order-0 allocations,
as mm layer has specific optimizations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

35fc07ae

net: add high_order_alloc_disable sysctl/static key · ce27ec60

Eric Dumazet authored Jun 14, 2019

>From linux-3.7, (commit 5640f768 "net: use a per task frag
allocator") TCP sendmsg() has preferred using order-3 allocations.

While it gives good results for most cases, we had reports
that heavy uses of TCP over loopback were hitting a spinlock
contention in page allocations/freeing.

This commits adds a sysctl so that admins can opt-in
for order-0 allocations. Hopefully mm layer might optimize
order-3 allocations in the future since it could give us
a nice boost  (see 8 lines of following benchmark)

The following benchmark shows a win when more than 8 TCP_STREAM
threads are running (56 x86 cores server in my tests)

for thr in {1..30}
do
 sysctl -wq net.core.high_order_alloc_disable=0
 T0=`./super_netperf $thr -H 127.0.0.1 -l 15`
 sysctl -wq net.core.high_order_alloc_disable=1
 T1=`./super_netperf $thr -H 127.0.0.1 -l 15`
 echo $thr:$T0:$T1
done

1: 49979: 37267
2: 98745: 76286
3: 141088: 110051
4: 177414: 144772
5: 197587: 173563
6: 215377: 208448
7: 241061: 234087
8: 267155: 263373
9: 295069: 297402
10: 312393: 335213
11: 340462: 368778
12: 371366: 403954
13: 412344: 443713
14: 426617: 473580
15: 474418: 507861
16: 503261: 538539
17: 522331: 563096
18: 532409: 567084
19: 550824: 605240
20: 525493: 641988
21: 564574: 665843
22: 567349: 690868
23: 583846: 710917
24: 588715: 736306
25: 603212: 763494
26: 604083: 792654
27: 602241: 796450
28: 604291: 797993
29: 611610: 833249
30: 577356: 841062
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce27ec60

tcp: add tcp_tx_skb_cache sysctl · 0b7d7f6b

Eric Dumazet authored Jun 14, 2019

Feng Tang reported a performance regression after introduction
of per TCP socket tx/rx caches, for TCP over loopback (netperf)

There is high chance the regression is caused by a change on
how well the 32 KB per-thread page (current->task_frag) can
be recycled, and lack of pcp caches for order-3 pages.

I could not reproduce the regression myself, cpus all being
spinning on the mm spinlocks for page allocs/freeing, regardless
of enabling or disabling the per tcp socket caches.

It seems best to disable the feature by default, and let
admins enabling it.

MM layer either needs to provide scalable order-3 pages
allocations, or could attempt a trylock on zone->lock if
the caller only attempts to get a high-order page and is
able to fallback to order-0 ones in case of pressure.

Tests run on a 56 cores host (112 hyper threads)

-	35.49%	netperf 		 [kernel.vmlinux]	  [k] queued_spin_lock_slowpath
   - 35.49% queued_spin_lock_slowpath
	  - 18.18% get_page_from_freelist
		 - __alloc_pages_nodemask
			- 18.18% alloc_pages_current
				 skb_page_frag_refill
				 sk_page_frag_refill
				 tcp_sendmsg_locked
				 tcp_sendmsg
				 inet_sendmsg
				 sock_sendmsg
				 __sys_sendto
				 __x64_sys_sendto
				 do_syscall_64
				 entry_SYSCALL_64_after_hwframe
				 __libc_send
	  + 17.31% __free_pages_ok
+	31.43%	swapper 		 [kernel.vmlinux]	  [k] intel_idle
+	 9.12%	netperf 		 [kernel.vmlinux]	  [k] copy_user_enhanced_fast_string
+	 6.53%	netserver		 [kernel.vmlinux]	  [k] copy_user_enhanced_fast_string
+	 0.69%	netserver		 [kernel.vmlinux]	  [k] queued_spin_lock_slowpath
+	 0.68%	netperf 		 [kernel.vmlinux]	  [k] skb_release_data
+	 0.52%	netperf 		 [kernel.vmlinux]	  [k] tcp_sendmsg_locked
	 0.46%	netperf 		 [kernel.vmlinux]	  [k] _raw_spin_lock_irqsave

Fixes: 472c2e07 ("tcp: add one skb cache for tx")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b7d7f6b

tcp: add tcp_rx_skb_cache sysctl · ede61ca4

Eric Dumazet authored Jun 14, 2019

Instead of relying on rps_needed, it is safer to use a separate
static key, since we do not want to enable TCP rx_skb_cache
by default. This feature can cause huge increase of memory
usage on hosts with millions of sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ede61ca4

sysctl: define proc_do_static_key() · a8e11e5c

Eric Dumazet authored Jun 14, 2019

Convert proc_dointvec_minmax_bpf_stats() into a more generic
helper, since we are going to use jump labels more often.

Note that sysctl_bpf_stats_enabled is removed, since
it is no longer needed/used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8e11e5c

hv_netvsc: Set probe mode to sync · 9a33629b

Haiyang Zhang authored Jun 13, 2019

For better consistency of synthetic NIC names, we set the probe mode to
PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus
channel offer sequence.

Fixes: af0a5646 ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a33629b

net: sched: flower: don't call synchronize_rcu() on mask creation · 99815f50

Vlad Buslov authored Jun 13, 2019

Current flower mask creating code assumes that temporary mask that is used
when inserting new filter is stack allocated. To prevent race condition
with data patch synchronize_rcu() is called every time fl_create_new_mask()
replaces temporary stack allocated mask. As reported by Jiri, this
increases runtime of creating 20000 flower classifiers from 4 seconds to
163 seconds. However, this design is no longer necessary since temporary
mask was converted to be dynamically allocated by commit 2cddd201
("net/sched: cls_flower: allocate mask dynamically in fl_change()").

Remove synchronize_rcu() calls from mask creation code. Instead, refactor
fl_change() to always deallocate temporary mask with rcu grace period.

Fixes: 195c234d ("net: sched: flower: handle concurrent mask insertion")
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Tested-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

99815f50

net: dsa: fix warning same module names · f0c03ee0

Anders Roxell authored Jun 13, 2019

When building with CONFIG_NET_DSA_REALTEK_SMI and CONFIG_REALTEK_PHY
enabled as loadable modules, we see the following warning:

warning: same module names found:
  drivers/net/phy/realtek.ko
  drivers/net/dsa/realtek.ko

Rework so the driver name is realtek-smi instead of realtek.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0c03ee0

sctp: Free cookie before we memdup a new one · ce950f10

Neil Horman authored Jun 13, 2019

Based on comments from Xin, even after fixes for our recent syzbot
report of cookie memory leaks, its possible to get a resend of an INIT
chunk which would lead to us leaking cookie memory.

To ensure that we don't leak cookie memory, free any previously
allocated cookie first.

Change notes
v1->v2
update subsystem tag in subject (davem)
repeat kfree check for peer_random and peer_hmacs (xin)

v2->v3
net->sctp
also free peer_chunks

v3->v4
fix subject tags

v4->v5
remove cut line
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
CC: Xin Long <lucien.xin@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce950f10

net: dsa: microchip: Don't try to read stats for unused ports · 6bb9e376

Robert Hancock authored Jun 12, 2019

If some of the switch ports were not listed in the device tree, due to
being unused, the ksz_mib_read_work function ended up accessing a NULL
dp->slave pointer and causing an oops. Skip checking statistics for any
unused ports.

Fixes: 7c6ff470 ("net: dsa: microchip: add MIB counter reading support")
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6bb9e376

Merge branch 'qmi_wwan-fix-QMAP-handling' · 2309f517

David S. Miller authored Jun 14, 2019

Reinhard Speyerer says:

====================
qmi_wwan: fix QMAP handling

This series addresses the following issues observed when using the
QMAP support of the qmi_wwan driver:

1. The QMAP code in the qmi_wwan driver is based on the CodeAurora
   GobiNet driver ([1], [2]) which does not process QMAP padding
   in the RX path correctly. This causes qmimux_rx_fixup() to pass
   incorrect data to the IP stack when padding is used.

2. qmimux devices currently lack proper network device usage statistics.

3. RCU stalls on device disconnect with QMAP activated like this

   # echo Y > /sys/class/net/wwan0/qmi/raw_ip
   # echo 1 > /sys/class/net/wwan0/qmi/add_mux
   # echo 2 > /sys/class/net/wwan0/qmi/add_mux
   # echo 3 > /sys/class/net/wwan0/qmi/add_mux

   have been observed in certain setups:

   [ 2273.676593] option1 ttyUSB16: GSM modem (1-port) converter now disconnected from ttyUSB16
   [ 2273.676617] option 6-1.2:1.0: device disconnected
   [ 2273.676774] WARNING: CPU: 1 PID: 141 at kernel/rcu/tree_plugin.h:342 rcu_note_context_switch+0x2a/0x3d0
   [ 2273.676776] Modules linked in: option qmi_wwan cdc_mbim cdc_ncm qcserial cdc_wdm usb_wwan sierra sierra_net usbnet mii edd coretemp iptable_mangle ip6_tables iptable_filter ip_tables cdc_acm dm_mod dax iTCO_wdt evdev iTCO_vendor_support sg ftdi_sio usbserial e1000e ptp pps_core i2c_i801 ehci_pci button lpc_ich i2c_core mfd_core uhci_hcd ehci_hcd rtc_cmos usbcore usb_common sd_mod fan ata_piix thermal
   [ 2273.676817] CPU: 1 PID: 141 Comm: kworker/1:1 Not tainted 4.19.38-rsp-1 #1
   [ 2273.676819] Hardware name: Not Applicable   Not Applicable  /CX-GS/GM45-GL40             , BIOS V1.11      03/23/2011
   [ 2273.676828] Workqueue: usb_hub_wq hub_event [usbcore]
   [ 2273.676832] EIP: rcu_note_context_switch+0x2a/0x3d0
   [ 2273.676834] Code: 55 89 e5 57 56 89 c6 53 83 ec 14 89 45 f0 e8 5d ff ff ff 89 f0 64 8b 3d 24 a6 86 c0 84 c0 8b 87 04 02 00 00 75 7a 85 c0 7e 7a <0f> 0b 80 bf 08 02 00 00 00 0f 84 87 00 00 00 e8 b2 e2 ff ff bb dc
   [ 2273.676836] EAX: 00000001 EBX: f614bc00 ECX: 00000001 EDX: c0715b81
   [ 2273.676838] ESI: 00000000 EDI: f18beb40 EBP: f1a3dc20 ESP: f1a3dc00
   [ 2273.676840] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010002
   [ 2273.676842] CR0: 80050033 CR2: b7e97230 CR3: 2f9c4000 CR4: 000406b0
   [ 2273.676843] Call Trace:
   [ 2273.676847]  ? preempt_count_add+0xa5/0xc0
   [ 2273.676852]  __schedule+0x4e/0x4f0
   [ 2273.676855]  ? __queue_work+0xf1/0x2a0
   [ 2273.676858]  ? _raw_spin_lock_irqsave+0x14/0x40
   [ 2273.676860]  ? preempt_count_add+0x52/0xc0
   [ 2273.676862]  schedule+0x33/0x80
   [ 2273.676865]  _synchronize_rcu_expedited+0x24e/0x280
   [ 2273.676867]  ? rcu_accelerate_cbs_unlocked+0x70/0x70
   [ 2273.676871]  ? wait_woken+0x70/0x70
   [ 2273.676873]  ? rcu_accelerate_cbs_unlocked+0x70/0x70
   [ 2273.676875]  ? _synchronize_rcu_expedited+0x280/0x280
   [ 2273.676877]  synchronize_rcu_expedited+0x22/0x30
   [ 2273.676881]  synchronize_net+0x25/0x30
   [ 2273.676885]  dev_deactivate_many+0x133/0x230
   [ 2273.676887]  ? preempt_count_add+0xa5/0xc0
   [ 2273.676890]  __dev_close_many+0x4d/0xc0
   [ 2273.676892]  ? skb_dequeue+0x40/0x50
   [ 2273.676895]  dev_close_many+0x5d/0xd0
   [ 2273.676898]  rollback_registered_many+0xbf/0x4c0
   [ 2273.676901]  ? raw_notifier_call_chain+0x1a/0x20
   [ 2273.676904]  ? call_netdevice_notifiers_info+0x23/0x60
   [ 2273.676906]  ? netdev_master_upper_dev_get+0xe/0x70
   [ 2273.676908]  rollback_registered+0x1f/0x30
   [ 2273.676911]  unregister_netdevice_queue+0x47/0xb0
   [ 2273.676915]  qmimux_unregister_device+0x1f/0x30 [qmi_wwan]
   [ 2273.676917]  qmi_wwan_disconnect+0x5d/0x90 [qmi_wwan]
   ...
   [ 2273.677001] ---[ end trace 0fcc5f88496b485a ]---
   [ 2294.679136] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
   [ 2294.679140] rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P141
   [ 2294.679144] rcu: 	(detected by 0, t=21002 jiffies, g=265857, q=8446)
   [ 2294.679148] kworker/1:1     D    0   141      2 0x80000000

In addition the permitted QMAP mux_id value range is extended for
compatibility with ip(8) and the rmnet driver.

Reinhard

[1]: https://portland.source.codeaurora.org/patches/quic/gobi
[2]: https://portland.source.codeaurora.org/quic/qsdk/oss/lklm/gobinet/
====================
Tested-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

2309f517

qmi_wwan: extend permitted QMAP mux_id value range · 36815b41

Reinhard Speyerer authored Jun 12, 2019

Permit mux_id values up to 254 to be used in qmimux_register_device()
for compatibility with ip(8) and the rmnet driver.

Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

36815b41

qmi_wwan: avoid RCU stalls on device disconnect when in QMAP mode · a8fdde1c

Reinhard Speyerer authored Jun 12, 2019

Switch qmimux_unregister_device() and qmi_wwan_disconnect() to
use unregister_netdevice_queue() and unregister_netdevice_many()
instead of unregister_netdevice(). This avoids RCU stalls which
have been observed on device disconnect in certain setups otherwise.

Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8fdde1c

qmi_wwan: add network device usage statistics for qmimux devices · 44f82312

Reinhard Speyerer authored Jun 12, 2019

Add proper network device usage statistics for qmimux devices
instead of reporting all-zero values for them.

Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

44f82312

qmi_wwan: add support for QMAP padding in the RX path · 61356088

Reinhard Speyerer authored Jun 12, 2019

The QMAP code in the qmi_wwan driver is based on the CodeAurora GobiNet
driver which does not process QMAP padding in the RX path correctly.
Add support for QMAP padding to qmimux_rx_fixup() according to the
description of the rmnet driver.

Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

61356088

bpf, x64: fix stack layout of JITed bpf code · fe8d9571

Alexei Starovoitov authored Jun 14, 2019

Since commit 177366bf the %rbp stopped pointing to %rbp of the
previous stack frame. That broke frame pointer based stack unwinding.
This commit is a partial revert of it.
Note that the location of tail_call_cnt is fixed, since the verifier
enforces MAX_BPF_STACK stack size for programs with tail calls.

Fixes: 177366bf ("bpf: change x86 JITed program stack layout")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fe8d9571

14 Jun, 2019 13 commits

bpf, devmap: Add missing RCU read lock on flush · 86723c86

Toshiaki Makita authored Jun 14, 2019

.ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net
uses RCU to detect it has setup the resources for tx. The assumption
accidentally broke when introducing bulk queue in devmap.

Fixes: 5d053f9d ("bpf: devmap prepare xdp frames for bulking")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

86723c86

bpf, devmap: Add missing bulk queue free · edabf4d9

Toshiaki Makita authored Jun 14, 2019

dev_map_free() forgot to free bulk queue when freeing its entries.

Fixes: 5d053f9d ("bpf: devmap prepare xdp frames for bulking")
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

edabf4d9

bpf, devmap: Fix premature entry free on destroying map · d4dd153d

Toshiaki Makita authored Jun 14, 2019

dev_map_free() waits for flush_needed bitmap to be empty in order to
ensure all flush operations have completed before freeing its entries.
However the corresponding clear_bit() was called before using the
entries, so the entries could be used after free.

All access to the entries needs to be done before clearing the bit.
It seems commit a5e2da6e ("bpf: netdev is never null in
__dev_map_flush") accidentally changed the clear_bit() and memory access
order.

Note that the problem happens only in __dev_map_flush(), not in
dev_map_flush_old(). dev_map_flush_old() is called only after nulling
out the corresponding netdev_map entry, so dev_map_free() never frees
the entry thus no such race happens there.

Fixes: a5e2da6e ("bpf: netdev is never null in __dev_map_flush")
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d4dd153d

Merge tag 'mac80211-for-davem-2019-06-14' of... · 2a2af5e6

David S. Miller authored Jun 14, 2019

Merge tag 'mac80211-for-davem-2019-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Various fixes, all over:
 * a few memory leaks
 * fixes for management frame protection security
   and A2/A3 confusion (affecting TDLS as well)
 * build fix for certificates
 * etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2a2af5e6

net: phylink: further mac_config documentation improvements · 4add7009

Russell King - ARM Linux admin authored Jun 14, 2019

While reviewing the DPAA2 work, it has become apparent that we need
better documentation about which members of the phylink link state
structure are valid in the mac_config call.  Improve this
documentation.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

4add7009

nfc: Ensure presence of required attributes in the deactivate_target handler · 385097a3

Young Xiao authored Jun 14, 2019

Check that the NFC_ATTR_TARGET_INDEX attributes (in addition to
NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to
accessing them. This prevents potential unhandled NULL pointer dereference
exceptions which can be triggered by malicious user-mode programs,
if they omit one or both of these attributes.
Signed-off-by: Young Xiao <92siuyang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

385097a3

cfg80211: report measurement start TSF correctly · b6584202

Avraham Stern authored May 29, 2019

Instead of reporting the AP's TSF, host time was reported. Fix it.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

b6584202

cfg80211: fix memory leak of wiphy device name · 4f488fbc

Eric Biggers authored Jun 10, 2019

In wiphy_new_nm(), if an error occurs after dev_set_name() and
device_initialize() have already been called, it's necessary to call
put_device() (via wiphy_free()) to avoid a memory leak.

Reported-by: syzbot+7fddca22578bc67c3fe4@syzkaller.appspotmail.com
Fixes: 1f87f7d3 ("cfg80211: add rfkill support")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

4f488fbc

cfg80211: util: fix bit count off by one · 1a473d60

Mordechay Goodstein authored May 29, 2019

The bits of Rx MCS Map in VHT capability were enumerated
with index transform - index i -> (i + 1) bit => nss i. BUG!
while it should be -   index i -> (i + 1) bit => (i + 1) nss.

The bug was exposed in commit a53b2a0b ("iwlwifi: mvm: implement VHT
extended NSS support in rs.c"), where iwlwifi started using the
function.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Fixes: b0aa75f0 ("ieee80211: add new VHT capability fields/parsing")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

1a473d60

mac80211: do not start any work during reconfigure flow · f8891461

Naftali Goldstein authored May 29, 2019

It is not a good idea to try to perform any work (e.g. send an auth
frame) during reconfigure flow.

Prevent this from happening, and at the end of the reconfigure flow
requeue all the works.
Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

f8891461

cfg80211: use BIT_ULL in cfg80211_parse_mbssid_data() · ebb3ca3b

Luca Coelho authored May 29, 2019

The seen_indices variable is u64 and in other parts of the code we
assume mbssid_index_ie[2] can be up to 45, so we should use the 64-bit
versions of BIT, namely, BIT_ULL().
Reported-by: Dan Carpented <dan.carpenter@oracle.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

ebb3ca3b

mac80211: only warn once on chanctx_conf being NULL · 56357234

Yibo Zhao authored Jun 14, 2019

In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything it
needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.

Even worse, if the AP is under attack via tools such as MDK3 and massive
authentication requests are received in a very short time, console will
be hung due to kernel warning messages.

WARN_ON_ONCE() could be a better way for indicating warning messages
without duplicate messages to flood the console.

Johannes: We still need to address the underlying problem, but we
          don't really have a good handle on it yet. Suppress the
          worst side-effects for now.
Signed-off-by: Zhi Chen <zhichen@codeaurora.org>
Signed-off-by: Yibo Zhao <yiboz@codeaurora.org>
[johannes: add note, change subject]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

56357234

mac80211: drop robust management frames from unknown TA · 588f7d39

Johannes Berg authored Feb 13, 2019

When receiving a robust management frame, drop it if we don't have
rx->sta since then we don't have a security association and thus
couldn't possibly validate the frame.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

588f7d39