Commits · c77804be53369dd4c15bfc376cf9b45948194cab · Kirill Smelkov / linux

04 Jan, 2019 17 commits

net: hns: Fix WARNING when hns modules installed · c77804be

Yonglong Liu authored Jan 04, 2019

Commit 308c6caf ("net: hns: All ports can not work when insmod hns ko
after rmmod.") add phy_stop in hns_nic_init_phy(), In the branch of "net",
this method is effective, but in the branch of "net-next", it will cause
a WARNING when hns modules loaded, reference to commit 2b3e88ea ("net:
phy: improve phy state checking"):

[10.092168] ------------[ cut here ]------------
[10.092171] called from state READY
[10.092189] WARNING: CPU: 4 PID: 1 at ../drivers/net/phy/phy.c:854
                phy_stop+0x90/0xb0
[10.092192] Modules linked in:
[10.092197] CPU: 4 PID:1 Comm:swapper/0 Not tainted 4.20.0-rc7-next-20181220 #1
[10.092200] Hardware name: Huawei TaiShan 2280 /D05, BIOS Hisilicon D05 UEFI
                16.12 Release 05/15/2017
[10.092202] pstate: 60000005 (nZCv daif -PAN -UAO)
[10.092205] pc : phy_stop+0x90/0xb0
[10.092208] lr : phy_stop+0x90/0xb0
[10.092209] sp : ffff00001159ba90
[10.092212] x29: ffff00001159ba90 x28: 0000000000000007
[10.092215] x27: ffff000011180068 x26: ffff0000110a5620
[10.092218] x25: ffff0000113b6000 x24: ffff842f96dac000
[10.092221] x23: 0000000000000000 x22: 0000000000000000
[10.092223] x21: ffff841fb8425e18 x20: ffff801fb3a56438
[10.092226] x19: ffff801fb3a56000 x18: ffffffffffffffff
[10.092228] x17: 0000000000000000 x16: 0000000000000000
[10.092231] x15: ffff00001122d6c8 x14: ffff00009159b7b7
[10.092234] x13: ffff00001159b7c5 x12: ffff000011245000
[10.092236] x11: 0000000005f5e0ff x10: ffff00001159b750
[10.092239] x9 : 00000000ffffffd0 x8 : 0000000000000465
[10.092242] x7 : ffff0000112457f8 x6 : ffff0000113bd7ce
[10.092245] x5 : 0000000000000000 x4 : 0000000000000000
[10.092247] x3 : 00000000ffffffff x2 : ffff000011245828
[10.092250] x1 : 4b5860bd05871300 x0 : 0000000000000000
[10.092253] Call trace:
[10.092255]  phy_stop+0x90/0xb0
[10.092260]  hns_nic_init_phy+0xf8/0x110
[10.092262]  hns_nic_try_get_ae+0x4c/0x3b0
[10.092264]  hns_nic_dev_probe+0x1fc/0x480
[10.092268]  platform_drv_probe+0x50/0xa0
[10.092271]  really_probe+0x1f4/0x298
[10.092273]  driver_probe_device+0x58/0x108
[10.092275]  __driver_attach+0xdc/0xe0
[10.092278]  bus_for_each_dev+0x74/0xc8
[10.092280]  driver_attach+0x20/0x28
[10.092283]  bus_add_driver+0x1b8/0x228
[10.092285]  driver_register+0x60/0x110
[10.092288]  __platform_driver_register+0x40/0x48
[10.092292]  hns_nic_dev_driver_init+0x18/0x20
[10.092296]  do_one_initcall+0x5c/0x180
[10.092299]  kernel_init_freeable+0x198/0x240
[10.092303]  kernel_init+0x10/0x108
[10.092306]  ret_from_fork+0x10/0x18
[10.092308] ---[ end trace 1396dd0278e397eb ]---

This WARNING occurred because of calling phy_stop before phy_start.

The root cause of the problem in commit '308c6caf' is:

Reference to hns_nic_init_phy, the flag phydev->supported is changed after
phy_connect_direct. The flag phydev->supported is 0x6ff when hns modules is
loaded, so will not change Fiber Port power(Reference to marvell.c), which
is power on at default.
Then the flag phydev->supported is changed to 0x6f, so Fiber Port power is
off when removing hns modules.
When hns modules installed again, the flag phydev->supported is default
value 0x6ff, so will not change Fiber Port power(now is off), causing mac
link not up problem.

So the solution is change phy flags before phy_connect_direct.

Fixes: 308c6caf ("net: hns: All ports can not work when insmod hns ko after rmmod.")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c77804be

net: dsa: mt7530: Drop unused GPIO include · cff1e01f

Linus Walleij authored Jan 03, 2019

This driver uses GPIO descriptors only, <linux/of_gpio.h>
is not used so drop the include.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cff1e01f

Merge branch 'GUE-error-recursion' · 0c06a091

David S. Miller authored Jan 04, 2019

Stefano Brivio says:

====================
Fix two further potential unbounded recursions in GUE error handlers

Patch 1/2 takes care of preventing the issue fixed by commit 11789039
("fou: Prevent unbounded recursion in GUE error handler") also with
UDP-Lite payloads -- I just realised this might happen from a syzbot
report.

Patch 2/2 fixes the issue for both UDP and UDP-Lite on IPv6, which I also
forgot to deal with in that same commit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0c06a091

fou6: Prevent unbounded recursion in GUE error handler · 44039e00

Stefano Brivio authored Jan 03, 2019

I forgot to deal with IPv6 in commit 11789039 ("fou: Prevent unbounded
recursion in GUE error handler").

Now syzbot reported what might be the same type of issue, caused by
gue6_err(), that is, handling exceptions for direct UDP encapsulation in
GUE (UDP-in-UDP) leads to unbounded recursion in the GUE exception
handler.

As it probably doesn't make sense to set up GUE this way, and it's
currently not even possible to configure this, skip exception handling for
UDP (or UDP-Lite) packets encapsulated in UDP (or UDP-Lite) packets with
GUE on IPv6.

Reported-by: syzbot+4ad25edc7a33e4ab91e0@syzkaller.appspotmail.com
Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: b8a51b38 ("fou, fou6: ICMP error handlers for FoU and GUE")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44039e00

fou: Prevent unbounded recursion in GUE error handler also with UDP-Lite · bc6e019b

Stefano Brivio authored Jan 03, 2019

In commit 11789039 ("fou: Prevent unbounded recursion in GUE error
handler"), I didn't take care of the case where UDP-Lite is encapsulated
into UDP or UDP-Lite with GUE. From a syzbot report about a possibly
similar issue with GUE on IPv6, I just realised the same thing might
happen with a UDP-Lite inner payload.

Also skip exception handling for inner UDP-Lite protocol.

Fixes: 11789039 ("fou: Prevent unbounded recursion in GUE error handler")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc6e019b

openvswitch: Fix IPv6 later frags parsing · 41e4e2cd

Yi-Hung Wei authored Jan 03, 2019

The previous commit fa642f08
("openvswitch: Derive IP protocol number for IPv6 later frags")
introduces IP protocol number parsing for IPv6 later frags that can mess
up the network header length calculation logic, i.e. nh_len < 0.
However, the network header length calculation is mainly for deriving
the transport layer header in the key extraction process which the later
fragment does not apply.

Therefore, this commit skips the network header length calculation to
fix the issue.
Reported-by: Chris Mi <chrism@mellanox.com>
Reported-by: Greg Rose <gvrose8192@gmail.com>
Fixes: fa642f08 ("openvswitch: Derive IP protocol number for IPv6 later frags")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41e4e2cd

net: macb: remove unnecessary code · ba3e1847

Claudiu Beznea authored Jan 03, 2019

Commit 653e92a9 ("net: macb: add support for padding and fcs
computation") introduced a bug fixed by commit 899ecaed ("net:
ethernet: cadence: fix socket buffer corruption problem"). Code removed
in this patch is not reachable at all so remove it.

Fixes: 653e92a9 ("net: macb: add support for padding and fcs computation")
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba3e1847

net: dsa: microchip: Drop unused GPIO includes · a09b42ba

Linus Walleij authored Jan 03, 2019

This driver does not use the old GPIO includes so drop
them.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a09b42ba

Merge branch 'qed-fixes' · ebdefe46

David S. Miller authored Jan 04, 2019

Denis Bolotin says:

====================
qed: Misc fixes in qed

This patch series fixes 2 potential bugs in qed.
Please consider applying to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ebdefe46

qed: Fix qed_ll2_post_rx_buffer_notify_fw() by adding a write memory barrier · 46721c3d

Denis Bolotin authored Jan 03, 2019

Make sure chain element is updated before ringing the doorbell.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46721c3d

qed: Fix qed_chain_set_prod() for PBL chains with non power of 2 page count · 2d533a92

Denis Bolotin authored Jan 03, 2019

In PBL chains with non power of 2 page count, the producer is not at the
beginning of the chain when index is 0 after a wrap. Therefore, after the
producer index wrap around, page index should be calculated more carefully.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d533a92

net, skbuff: do not prefer skb allocation fails early · f8c468e8

David Rientjes authored Jan 02, 2019

Commit dcda9b04 ("mm, tree wide: replace __GFP_REPEAT by
__GFP_RETRY_MAYFAIL with more useful semantic") replaced __GFP_REPEAT in
alloc_skb_with_frags() with __GFP_RETRY_MAYFAIL when the allocation may
directly reclaim.

The previous behavior would require reclaim up to 1 << order pages for
skb aligned header_len of order > PAGE_ALLOC_COSTLY_ORDER before failing,
otherwise the allocations in alloc_skb() would loop in the page allocator
looking for memory. __GFP_RETRY_MAYFAIL makes both allocations failable
under memory pressure, including for the HEAD allocation.

This can cause, among many other things, write() to fail with ENOTCONN
during RPC when under memory pressure.

These allocations should succeed as they did previous to dcda9b04
even if it requires calling the oom killer and additional looping in the
page allocator to find memory. There is no way to specify the previous
behavior of __GFP_REPEAT, but it's unlikely to be necessary since the
previous behavior only guaranteed that 1 << order pages would be reclaimed
before failing for order > PAGE_ALLOC_COSTLY_ORDER. That reclaim is not
guaranteed to be contiguous memory, so repeating for such large orders is
usually not beneficial.

Removing the setting of __GFP_RETRY_MAYFAIL to restore the previous
behavior, specifically not allowing alloc_skb() to fail for small orders
and oom kill if necessary rather than allowing RPCs to fail.

Fixes: dcda9b04 ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic")
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8c468e8

soc/fsl/qe: fix err handling of ucc_of_parse_tdm · 8d68100a

Wen Yang authored Jan 03, 2019

Currently there are some issues with the ucc_of_parse_tdm function:
1, a possible null pointer dereference in ucc_of_parse_tdm,
detected by the semantic patch deref_null.cocci,
with the following warning:
drivers/soc/fsl/qe/qe_tdm.c:177:21-24: ERROR: pdev is NULL but dereferenced.
2, dev gets modified, so in any case that devm_iounmap() will fail
even when the new pdev is valid, because the iomap was done with a
 different pdev.
3, there is no driver bind with the "fsl,t1040-qe-si" or
"fsl,t1040-qe-siram" device. So allocating resources using devm_*()
with these devices won't provide a cleanup path for these resources
when the caller fails.

This patch fixes them.
Suggested-by: Li Yang <leoyang.li@nxp.com>
Suggested-by: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Reviewed-by: Peng Hao <peng.hao2@zte.com.cn>
CC: Julia Lawall <julia.lawall@lip6.fr>
CC: Zhao Qiang <qiang.zhao@nxp.com>
CC: David S. Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

8d68100a

r8169: Add support for new Realtek Ethernet · 36352991

Kai-Heng Feng authored Jan 02, 2019

There are two new Realtek Ethernet devices which are re-branded r8168h.
Add the IDs to to support them.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

36352991

netlink: fixup regression in RTM_GETADDR · 7c1e8a38

Arthur Gautier authored Dec 31, 2018

This commit fixes a regression in AF_INET/RTM_GETADDR and
AF_INET6/RTM_GETADDR.

Before this commit, the kernel would stop dumping addresses once the first
skb was full and end the stream with NLMSG_DONE(-EMSGSIZE). The error
shouldn't be sent back to netlink_dump so the callback is kept alive. The
userspace is expected to call back with a new empty skb.

Changes from V1:
 - The error is not handled in netlink_dump anymore but rather in
   inet_dump_ifaddr and inet6_dump_addr directly as suggested by
   David Ahern.

Fixes: d7e38611 ("net/ipv4: Put target net when address dump fails due to bad attributes")
Fixes: 242afaa6 ("net/ipv6: Put target net when address dump fails due to bad attributes")

Cc: David Ahern <dsahern@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Arthur Gautier <baloo@gandi.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c1e8a38

octeontx2-af: Fix a resource leak in an error handling path in 'cgx_probe()' · 1492623e

Christophe JAILLET authored Dec 29, 2018

If an error occurs after the call to 'pci_alloc_irq_vectors()', we must
call 'pci_free_irq_vectors()' in order to avoid a	resource leak.

The same sequence is already in place in the corresponding 'cgx_remove()'
function.

Fixes: 1463f382 ("octeontx2-af: Add support for CGX link management")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

1492623e

Remove 'type' argument from access_ok() function · 96d4f267

Linus Torvalds authored Jan 03, 2019

Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

96d4f267

03 Jan, 2019 23 commits

Merge tag 'locks-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux · 135143b2

Linus Torvalds authored Jan 03, 2019

Pull file locking bugfix from Jeff Layton:
 "This is a one-line fix for a bug that syzbot turned up in the new
  patches to mitigate the thundering herd when a lock is released"

* tag 'locks-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: fix error in locks_move_blocks()

135143b2

Merge tag 'sound-fix-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 810574ca

Linus Torvalds authored Jan 03, 2019

Pull sound fixes from Takashi Iwai:
 "Among a few HD-audio fixes, the only significant one is the regression
  fix on some machines like Dell XPS due to the default binding changes.
  We ended up reverting the whole since the fix for ASoC HD-audio driver
  won't be available immediately"

* tag 'sound-fix-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Revert DSP detection on legacy HD-audio driver
  ALSA: hda/tegra: clear pending irq handlers
  ALSA: hda/realtek: Enable the headset mic auto detection for ASUS laptops

810574ca

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 43d86ee8

Linus Torvalds authored Jan 03, 2019

Pull networking fixes from David Miller:
 "Several fixes here. Basically split down the line between newly
  introduced regressions and long existing problems:

   1) Double free in tipc_enable_bearer(), from Cong Wang.

   2) Many fixes to nf_conncount, from Florian Westphal.

   3) op->get_regs_len() can throw an error, check it, from Yunsheng
      Lin.

   4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
      driver, from Scott Wood.

   5) Inifnite loop in fib_empty_table(), from Yue Haibing.

   6) Use after free in ax25_fillin_cb(), from Cong Wang.

   7) Fix socket locking in nr_find_socket(), also from Cong Wang.

   8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.

   9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.

  10) Fix ptr_ring wrap during queue swap, from Cong Wang.

  11) Missing shutdown callback in hinic driver, from Xue Chaojing.

  12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
      Brivio.

  13) BPF out of bounds speculation fixes from Daniel Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  ipv6: Consider sk_bound_dev_if when binding a socket to an address
  ipv6: Fix dump of specific table with strict checking
  bpf: add various test cases to selftests
  bpf: prevent out of bounds speculation on pointer arithmetic
  bpf: fix check_map_access smin_value test when pointer contains offset
  bpf: restrict unknown scalars of mixed signed bounds for unprivileged
  bpf: restrict stack pointer arithmetic for unprivileged
  bpf: restrict map value pointer arithmetic for unprivileged
  bpf: enable access to ax register also from verifier rewrite
  bpf: move tmp variable into ax register in interpreter
  bpf: move {prev_,}insn_idx into verifier env
  isdn: fix kernel-infoleak in capi_unlocked_ioctl
  ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
  net/hamradio/6pack: use mod_timer() to rearm timers
  net-next/hinic:add shutdown callback
  net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
  ip: validate header length on virtual device xmit
  tap: call skb_probe_transport_header after setting skb->dev
  ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
  net: rds: remove unnecessary NULL check
  ...

43d86ee8

ipv6: Consider sk_bound_dev_if when binding a socket to an address · c5ee0663

David Ahern authored Jan 02, 2019

IPv6 does not consider if the socket is bound to a device when binding
to an address. The result is that a socket can be bound to eth0 and then
bound to the address of eth1. If the device is a VRF, the result is that
a socket can only be bound to an address in the default VRF.

Resolve by considering the device if sk_bound_dev_if is set.

This problem exists from the beginning of git history.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5ee0663

ipv6: Fix dump of specific table with strict checking · 73155879

David Ahern authored Jan 02, 2019

Dump of a specific table with strict checking enabled is looping. The
problem is that the end of the table dump is not marked in the cb. When
dumping a specific table, cb args 0 and 1 are not used (they are the hash
index and entry with an hash table index when dumping all tables). Re-use
args[0] to hold a 'done' flag for the specific table dump.

Fixes: 13e38901 ("net/ipv6: Plumb support for filtering route dumps")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

73155879

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 645ff1e8

Linus Torvalds authored Jan 02, 2019

Pull input updates from Dmitry Torokhov:
 "A tiny pull request this merge window unfortunately, should get more
  material in for the next release:

   - new driver for Raspberry Pi's touchscreen (firmware interface)

   - miscellaneous input driver fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G
  Input: atmel_mxt_ts - don't try to free unallocated kernel memory
  Input: drv2667 - fix indentation issues
  Input: touchscreen - fix coding style issue
  Input: add official Raspberry Pi's touchscreen driver
  Input: nomadik-ske-keypad - fix a loop timeout test
  Input: rotary-encoder - don't log EPROBE_DEFER to kernel log
  Input: olpc_apsp - remove set but not used variable 'np'
  Input: olpc_apsp - enable the SP clock
  Input: olpc_apsp - check FIFO status on open(), not probe()
  Input: olpc_apsp - drop CONFIG_OLPC dependency
  clk: mmp2: add SP clock
  dt-bindings: marvell,mmp2: Add clock id for the SP clock
  Input: ad7879 - drop platform data support

645ff1e8

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · d548e659

Linus Torvalds authored Jan 02, 2019

Pull virtio/vhost updates from Michael Tsirkin:
"Features, fixes, cleanups:

   - discard in virtio blk

   - misc fixes and cleanups"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: correct the related warning message
  vhost: split structs into a separate header file
  virtio: remove deprecated VIRTIO_PCI_CONFIG()
  vhost/vsock: switch to a mutex for vhost_vsock_hash
  virtio_blk: add discard and write zeroes support

d548e659

Merge tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-block · 77d0b194

Linus Torvalds authored Jan 02, 2019

Pull more block updates from Jens Axboe:

 - Dead code removal for loop/sunvdc (Chengguang)

 - Mark BIDI support for bsg as deprecated, logging a single dmesg
   warning if anyone is actually using it (Christoph)

 - blkcg cleanup, killing a dead function and making the tryget_closest
   variant easier to read (Dennis)

 - Floppy fixes, one fixing a regression in swim3 (Finn)

 - lightnvm use-after-free fix (Gustavo)

 - gdrom leak fix (Wenwen)

 - a set of drbd updates (Lars, Luc, Nathan, Roland)

* tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-block: (28 commits)
  block/swim3: Fix regression on PowerBook G3
  block/swim3: Fix -EBUSY error when re-opening device after unmount
  block/swim3: Remove dead return statement
  block/amiflop: Don't log error message on invalid ioctl
  gdrom: fix a memory leak bug
  lightnvm: pblk: fix use-after-free bug
  block: sunvdc: remove redundant code
  block: loop: remove redundant code
  bsg: deprecate BIDI support in bsg
  blkcg: remove unused __blkg_release_rcu()
  blkcg: clean up blkg_tryget_closest()
  drbd: Change drbd_request_detach_interruptible's return type to int
  drbd: Avoid Clang warning about pointless switch statment
  drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")
  drbd: skip spurious timeout (ping-timeo) when failing promote
  drbd: don't retry connection if peers do not agree on "authentication" settings
  drbd: fix print_st_err()'s prototype to match the definition
  drbd: avoid spurious self-outdating with concurrent disconnect / down
  drbd: do not block when adjusting "disk-options" while IO is frozen
  drbd: fix comment typos
  ...

77d0b194

Merge tag 'for-4.21/libata-20190102' of git://git.kernel.dk/linux-block · b79f9f93

Linus Torvalds authored Jan 02, 2019

Pull libata fix from Jens Axboe:
 "This libata change missed the original libata pull request.

  Just a single fix in here, fixing a missed reference drop"

* tag 'for-4.21/libata-20190102' of git://git.kernel.dk/linux-block:
  ata: pata_macio: add of_node_put()

b79f9f93

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 0f2107da

Linus Torvalds authored Jan 02, 2019

Pull more clk updates from Stephen Boyd:
 "One more patch to generalize a set of DT binding defines now before
  -rc1 comes out.

  This way the SoC DTS files can use the proper defines from a stable
  tag"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: imx8qxp: make the name of clock ID generic

0f2107da

Merge tag 'devprop-4.21-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 01766d27

Linus Torvalds authored Jan 02, 2019

Pull device properties framework fixes from Rafael Wysocki:
 "Fix two potential NULL pointer dereferences found by Coverity in the
  software nodes code introduced recently (Colin Ian King)"

* tag 'devprop-4.21-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  drivers: base: swnode: check if swnode is NULL before dereferencing it
  drivers: base: swnode: check if pointer p is NULL before dereferencing it

01766d27

Merge tag 'mailbox-v4.21' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 35ddb06a

Linus Torvalds authored Jan 02, 2019

Pull mailbox updates from Jassi Brar:

 - Introduce device-managed registration
   devm_mbox_controller_un/register and convert drivers to use it

 - Introduce flush api to support clients that must busy-wait in atomic
   context

 - Support multiple controllers per device

 - Hi3660: a bugfix and constify ops structure

 - TI-MsgMgr: off by one bugfix.

 - BCM: switch to spdx license

 - Tegra-HSP: support for shared mailboxes and suspend/resume.

* tag 'mailbox-v4.21' of git://git.linaro.org/landing-teams/working/fujitsu/integration: (30 commits)
  mailbox: tegra-hsp: Use device-managed registration API
  mailbox: tegra-hsp: use devm_kstrdup_const()
  mailbox: tegra-hsp: Add suspend/resume support
  mailbox: tegra-hsp: Add support for shared mailboxes
  dt-bindings: tegra186-hsp: Add shared mailboxes
  mailbox: Allow multiple controllers per device
  mailbox: Support blocking transfers in atomic context
  mailbox: ti-msgmgr: Use device-managed registration API
  mailbox: stm32-ipcc: Use device-managed registration API
  mailbox: rockchip: Use device-managed registration API
  mailbox: qcom-apcs: Use device-managed registration API
  mailbox: platform-mhu: Use device-managed registration API
  mailbox: omap: Use device-managed registration API
  mailbox: mtk-cmdq: Remove needless devm_kfree() calls
  mailbox: mtk-cmdq: Use device-managed registration API
  mailbox: xgene-slimpro: Use device-managed registration API
  mailbox: sti: Use device-managed registration API
  mailbox: altera: Use device-managed registration API
  mailbox: imx: Use device-managed registration API
  mailbox: hi6220: Use device-managed registration API
  ...

35ddb06a

Merge branch 'for-linus-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 6aa293d8

Linus Torvalds authored Jan 02, 2019

Pull UML updates from Richard Weinberger:

 - DISCARD support for our block device driver

 - Many TLB flush optimizations

 - Various smaller fixes

 - And most important, Anton agreed to help me maintaining UML

* 'for-linus-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Remove obsolete reenable_XX calls
  um: writev needs <sys/uio.h>
  Add Anton Ivanov to UML maintainers
  um: remove redundant generic-y
  um: Optimize Flush TLB for force/fork case
  um: Avoid marking pages with "changed protection"
  um: Skip TLB flushing where not needed
  um: Optimize TLB operations v2
  um: Remove unnecessary faulted check in uaccess.c
  um: Add support for DISCARD in the UBD Driver
  um: Remove unsafe printks from the io thread
  um: Clean-up command processing in UML UBD driver
  um: Switch to block-mq constants in the UML UBD driver
  um: Make GCOV depend on !KCOV
  um: Include sys/uio.h to have writev()
  um: Add HAVE_DEBUG_BUGVERBOSE
  um: Update maintainers file entry

6aa293d8

Merge tag 's390-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 04a17ede

Linus Torvalds authored Jan 02, 2019

Pull s390 updates from Martin Schwidefsky:

 - A larger update for the zcrypt / AP bus code:
    + Update two inline assemblies in the zcrypt driver to make gcc happy
    + Add a missing reply code for invalid special commands for zcrypt
    + Allow AP device reset to be triggered from user space
    + Split the AP scan function into smaller, more readable functions

 - Updates for vfio-ccw and vfio-ap
    + Add maintainers and reviewer for vfio-ccw
    + Include facility.h in vfio_ap_drv.c to avoid fragile include chain
    + Simplicy vfio-ccw state machine

 - Use the common code version of bust_spinlocks

 - Make use of the DEFINE_SHOW_ATTRIBUTE

 - Fix three incorrect file permissions in the DASD driver

 - Remove bit spin-lock from the PCI interrupt handler

 - Fix GFP_ATOMIC vs GFP_KERNEL in the PCI code

* tag 's390-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: rework ap scan bus code
  s390/zcrypt: make sysfs reset attribute trigger queue reset
  s390/pci: fix sleeping in atomic during hotplug
  s390/pci: remove bit_lock usage in interrupt handler
  s390/drivers: fix proc/debugfs file permissions
  s390: convert to DEFINE_SHOW_ATTRIBUTE
  MAINTAINERS/vfio-ccw: add Farhan and Eric, make Halil Reviewer
  vfio: ccw: Merge BUSY and BOXED states
  s390: use common bust_spinlocks()
  s390/zcrypt: improve special ap message cmd handling
  s390/ap: rework assembler functions to use unions for in/out register variables
  s390: vfio-ap: include <asm/facility> for test_facility()

04a17ede

locks: fix error in locks_move_blocks() · bf77ae4c

NeilBrown authored Jan 03, 2019

After moving all requests from
   fl->fl_blocked_requests
to
   new->fl_blocked_requests

it is nonsensical to do anything to all the remaining elements, there
aren't any.  This should do something to all the requests that have been
moved. For simplicity, it does it to all requests in the target list.

Setting "f->fl_blocker = new" to all members of new->fl_blocked_requests
is "obviously correct" as it preserves the invariant of the linkage
among requests.

Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com
Fixes: 5946c431 ("fs/locks: allow a lock request to block other requests.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>

bf77ae4c

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · be630043

David S. Miller authored Jan 02, 2019

Alexei Starovoitov says:

====================
pull-request: bpf 2019-01-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) prevent out of bounds speculation on pointer arithmetic, from Daniel.

2) typo fix, from Xiaozhou.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

be630043

Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · e6b92572

Linus Torvalds authored Jan 02, 2019

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - xprtrdma: Yet another double DMA-unmap # v4.20

  Features:
   - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG
   - Per-xprt rdma receive workqueues
   - Drop support for FMR memory registration
   - Make port= mount option optional for RDMA mounts

  Other bugfixes and cleanups:
   - Remove unused nfs4_xdev_fs_type declaration
   - Fix comments for behavior that has changed
   - Remove generic RPC credentials by switching to 'struct cred'
   - Fix crossing mountpoints with different auth flavors
   - Various xprtrdma fixes from testing and auditing the close code
   - Fixes for disconnect issues when using xprtrdma with krb5
   - Clean up and improve xprtrdma trace points
   - Fix NFS v4.2 async copy reboot recovery"

* tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits)
  sunrpc: convert to DEFINE_SHOW_ATTRIBUTE
  sunrpc: Add xprt after nfs4_test_session_trunk()
  sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS
  sunrpc: handle ENOMEM in rpcb_getport_async
  NFS: remove unnecessary test for IS_ERR(cred)
  xprtrdma: Prevent leak of rpcrdma_rep objects
  NFSv4.2 fix async copy reboot recovery
  xprtrdma: Don't leak freed MRs
  xprtrdma: Add documenting comment for rpcrdma_buffer_destroy
  xprtrdma: Replace outdated comment for rpcrdma_ep_post
  xprtrdma: Update comments in frwr_op_send
  SUNRPC: Fix some kernel doc complaints
  SUNRPC: Simplify defining common RPC trace events
  NFS: Fix NFSv4 symbolic trace point output
  xprtrdma: Trace mapping, alloc, and dereg failures
  xprtrdma: Add trace points for calls to transport switch methods
  xprtrdma: Relocate the xprtrdma_mr_map trace points
  xprtrdma: Clean up of xprtrdma chunk trace points
  xprtrdma: Remove unused fields from rpcrdma_ia
  xprtrdma: Cull dprintk() call sites
  ...

e6b92572

Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux · e45428a4

Linus Torvalds authored Jan 02, 2019

Pull nfsd updates from Bruce Fields:
 "Thanks to Vasily Averin for fixing a use-after-free in the
  containerized NFSv4.2 client, and cleaning up some convoluted
  backchannel server code in the process.

  Otherwise, miscellaneous smaller bugfixes and cleanup"

* tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits)
  nfs: fixed broken compilation in nfs_callback_up_net()
  nfs: minor typo in nfs4_callback_up_net()
  sunrpc: fix debug message in svc_create_xprt()
  sunrpc: make visible processing error in bc_svc_process()
  sunrpc: remove unused xpo_prep_reply_hdr callback
  sunrpc: remove svc_rdma_bc_class
  sunrpc: remove svc_tcp_bc_class
  sunrpc: remove unused bc_up operation from rpc_xprt_ops
  sunrpc: replace svc_serv->sv_bc_xprt by boolean flag
  sunrpc: use-after-free in svc_process_common()
  sunrpc: use SVC_NET() in svcauth_gss_* functions
  nfsd: drop useless LIST_HEAD
  lockd: Show pid of lockd for remote locks
  NFSD remove OP_CACHEME from 4.2 op_flags
  nfsd: Return EPERM, not EACCES, in some SETATTR cases
  sunrpc: fix cache_head leak due to queued request
  nfsd: clean up indentation, increase indentation in switch statement
  svcrdma: Optimize the logic that selects the R_key to invalidate
  nfsd: fix a warning in __cld_pipe_upcall()
  nfsd4: fix crash on writing v4_end_grace before nfsd startup
  ...

e45428a4

Merge branch 'prevent-oob-under-speculation' · a67825f5

Alexei Starovoitov authored Jan 02, 2019

Daniel Borkmann says:

====================
This set fixes an out of bounds case under speculative execution
by implementing masking of pointer alu into the verifier. For
details please see the individual patches.

Thanks!

v2 -> v3:
  - 8/9: change states_equal condition into old->speculative &&
    !cur->speculative, thanks Jakub!
  - 8/9: remove incorrect speculative state test in
    propagate_liveness(), thanks Jakub!
v1 -> v2:
  - Typo fixes in commit msg and a comment, thanks David!
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a67825f5

bpf: add various test cases to selftests · 80c9b2fa

Daniel Borkmann authored Jan 03, 2019

Add various map value pointer related test cases to test_verifier
kselftest to reflect recent changes and improve test coverage. The
tests include basic masking functionality, unprivileged behavior
on pointer arithmetic which goes oob, mixed bounds tests, negative
unknown scalar but resulting positive offset for access and helper
range, handling of arithmetic from multiple maps, various masking
scenarios with subsequent map value access and others including two
test cases from Jann Horn for prior fixes.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

80c9b2fa

bpf: prevent out of bounds speculation on pointer arithmetic · 979d63d5

Daniel Borkmann authored Jan 03, 2019

Jann reported that the original commit back in b2157399
("bpf: prevent out-of-bounds speculation") was not sufficient
to stop CPU from speculating out of bounds memory access:
While b2157399 only focussed on masking array map access
for unprivileged users for tail calls and data access such
that the user provided index gets sanitized from BPF program
and syscall side, there is still a more generic form affected
from BPF programs that applies to most maps that hold user
data in relation to dynamic map access when dealing with
unknown scalars or "slow" known scalars as access offset, for
example:

  - Load a map value pointer into R6
  - Load an index into R7
  - Do a slow computation (e.g. with a memory dependency) that
    loads a limit into R8 (e.g. load the limit from a map for
    high latency, then mask it to make the verifier happy)
  - Exit if R7 >= R8 (mispredicted branch)
  - Load R0 = R6[R7]
  - Load R0 = R6[R0]

For unknown scalars there are two options in the BPF verifier
where we could derive knowledge from in order to guarantee
safe access to the memory: i) While </>/<=/>= variants won't
allow to derive any lower or upper bounds from the unknown
scalar where it would be safe to add it to the map value
pointer, it is possible through ==/!= test however. ii) another
option is to transform the unknown scalar into a known scalar,
for example, through ALU ops combination such as R &= <imm>
followed by R |= <imm> or any similar combination where the
original information from the unknown scalar would be destroyed
entirely leaving R with a constant. The initial slow load still
precedes the latter ALU ops on that register, so the CPU
executes speculatively from that point. Once we have the known
scalar, any compare operation would work then. A third option
only involving registers with known scalars could be crafted
as described in [0] where a CPU port (e.g. Slow Int unit)
would be filled with many dependent computations such that
the subsequent condition depending on its outcome has to wait
for evaluation on its execution port and thereby executing
speculatively if the speculated code can be scheduled on a
different execution port, or any other form of mistraining
as described in [1], for example. Given this is not limited
to only unknown scalars, not only map but also stack access
is affected since both is accessible for unprivileged users
and could potentially be used for out of bounds access under
speculation.

In order to prevent any of these cases, the verifier is now
sanitizing pointer arithmetic on the offset such that any
out of bounds speculation would be masked in a way where the
pointer arithmetic result in the destination register will
stay unchanged, meaning offset masked into zero similar as
in array_index_nospec() case. With regards to implementation,
there are three options that were considered: i) new insn
for sanitation, ii) push/pop insn and sanitation as inlined
BPF, iii) reuse of ax register and sanitation as inlined BPF.

Option i) has the downside that we end up using from reserved
bits in the opcode space, but also that we would require
each JIT to emit masking as native arch opcodes meaning
mitigation would have slow adoption till everyone implements
it eventually which is counter-productive. Option ii) and iii)
have both in common that a temporary register is needed in
order to implement the sanitation as inlined BPF since we
are not allowed to modify the source register. While a push /
pop insn in ii) would be useful to have in any case, it
requires once again that every JIT needs to implement it
first. While possible, amount of changes needed would also
be unsuitable for a -stable patch. Therefore, the path which
has fewer changes, less BPF instructions for the mitigation
and does not require anything to be changed in the JITs is
option iii) which this work is pursuing. The ax register is
already mapped to a register in all JITs (modulo arm32 where
it's mapped to stack as various other BPF registers there)
and used in constant blinding for JITs-only so far. It can
be reused for verifier rewrites under certain constraints.
The interpreter's tmp "register" has therefore been remapped
into extending the register set with hidden ax register and
reusing that for a number of instructions that needed the
prior temporary variable internally (e.g. div, mod). This
allows for zero increase in stack space usage in the interpreter,
and enables (restricted) generic use in rewrites otherwise as
long as such a patchlet does not make use of these instructions.
The sanitation mask is dynamic and relative to the offset the
map value or stack pointer currently holds.

There are various cases that need to be taken under consideration
for the masking, e.g. such operation could look as follows:
ptr += val or val += ptr or ptr -= val. Thus, the value to be
sanitized could reside either in source or in destination
register, and the limit is different depending on whether
the ALU op is addition or subtraction and depending on the
current known and bounded offset. The limit is derived as
follows: limit := max_value_size - (smin_value + off). For
subtraction: limit := umax_value + off. This holds because
we do not allow any pointer arithmetic that would
temporarily go out of bounds or would have an unknown
value with mixed signed bounds where it is unclear at
verification time whether the actual runtime value would
be either negative or positive. For example, we have a
derived map pointer value with constant offset and bounded
one, so limit based on smin_value works because the verifier
requires that statically analyzed arithmetic on the pointer
must be in bounds, and thus it checks if resulting
smin_value + off and umax_value + off is still within map
value bounds at time of arithmetic in addition to time of
access. Similarly, for the case of stack access we derive
the limit as follows: MAX_BPF_STACK + off for subtraction
and -off for the case of addition where off := ptr_reg->off +
ptr_reg->var_off.value. Subtraction is a special case for
the masking which can be in form of ptr += -val, ptr -= -val,
or ptr -= val. In the first two cases where we know that
the value is negative, we need to temporarily negate the
value in order to do the sanitation on a positive value
where we later swap the ALU op, and restore original source
register if the value was in source.

The sanitation of pointer arithmetic alone is still not fully
sufficient as is, since a scenario like the following could
happen ...

  PTR += 0x1000 (e.g. K-based imm)
  PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
  PTR += 0x1000
  PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
  [...]

... which under speculation could end up as ...

  PTR += 0x1000
  PTR -= 0 [ truncated by mitigation ]
  PTR += 0x1000
  PTR -= 0 [ truncated by mitigation ]
  [...]

... and therefore still access out of bounds. To prevent such
case, the verifier is also analyzing safety for potential out
of bounds access under speculative execution. Meaning, it is
also simulating pointer access under truncation. We therefore
"branch off" and push the current verification state after the
ALU operation with known 0 to the verification stack for later
analysis. Given the current path analysis succeeded it is
likely that the one under speculation can be pruned. In any
case, it is also subject to existing complexity limits and
therefore anything beyond this point will be rejected. In
terms of pruning, it needs to be ensured that the verification
state from speculative execution simulation must never prune
a non-speculative execution path, therefore, we mark verifier
state accordingly at the time of push_stack(). If verifier
detects out of bounds access under speculative execution from
one of the possible paths that includes a truncation, it will
reject such program.

Given we mask every reg-based pointer arithmetic for
unprivileged programs, we've been looking into how it could
affect real-world programs in terms of size increase. As the
majority of programs are targeted for privileged-only use
case, we've unconditionally enabled masking (with its alu
restrictions on top of it) for privileged programs for the
sake of testing in order to check i) whether they get rejected
in its current form, and ii) by how much the number of
instructions and size will increase. We've tested this by
using Katran, Cilium and test_l4lb from the kernel selftests.
For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
we've used test_l4lb.o as well as test_l4lb_noinline.o. We
found that none of the programs got rejected by the verifier
with this change, and that impact is rather minimal to none.
balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
7,797 bytes JITed before and after the change. Most complex
program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
and 18,538 bytes JITed before and after and none of the other
tail call programs in bpf_lxc.o had any changes either. For
the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
after the change. Other programs from that object file had
similar small increase. Both test_l4lb.o had no change and
remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
JITed and for test_l4lb_noinline.o constant at 5,080 bytes
(634 insns) xlated and 3,313 bytes JITed. This can be explained
in that LLVM typically optimizes stack based pointer arithmetic
by using K-based operations and that use of dynamic map access
is not overly frequent. However, in future we may decide to
optimize the algorithm further under known guarantees from
branch and value speculation. Latter seems also unclear in
terms of prediction heuristics that today's CPUs apply as well
as whether there could be collisions in e.g. the predictor's
Value History/Pattern Table for triggering out of bounds access,
thus masking is performed unconditionally at this point but could
be subject to relaxation later on. We were generally also
brainstorming various other approaches for mitigation, but the
blocker was always lack of available registers at runtime and/or
overhead for runtime tracking of limits belonging to a specific
pointer. Thus, we found this to be minimally intrusive under
given constraints.

With that in place, a simple example with sanitized access on
unprivileged load at post-verification time looks as follows:

  # bpftool prog dump xlated id 282
  [...]
  28: (79) r1 = *(u64 *)(r7 +0)
  29: (79) r2 = *(u64 *)(r7 +8)
  30: (57) r1 &= 15
  31: (79) r3 = *(u64 *)(r0 +4608)
  32: (57) r3 &= 1
  33: (47) r3 |= 1
  34: (2d) if r2 > r3 goto pc+19
  35: (b4) (u32) r11 = (u32) 20479  |
  36: (1f) r11 -= r2                | Dynamic sanitation for pointer
  37: (4f) r11 |= r2                | arithmetic with registers
  38: (87) r11 = -r11               | containing bounded or known
  39: (c7) r11 s>>= 63              | scalars in order to prevent
  40: (5f) r11 &= r2                | out of bounds speculation.
  41: (0f) r4 += r11                |
  42: (71) r4 = *(u8 *)(r4 +0)
  43: (6f) r4 <<= r1
  [...]

For the case where the scalar sits in the destination register
as opposed to the source register, the following code is emitted
for the above example:

  [...]
  16: (b4) (u32) r11 = (u32) 20479
  17: (1f) r11 -= r2
  18: (4f) r11 |= r2
  19: (87) r11 = -r11
  20: (c7) r11 s>>= 63
  21: (5f) r2 &= r11
  22: (0f) r2 += r0
  23: (61) r0 = *(u32 *)(r2 +0)
  [...]

JIT blinding example with non-conflicting use of r10:

  [...]
   d5:	je     0x0000000000000106    _
   d7:	mov    0x0(%rax),%edi       |
   da:	mov    $0xf153246,%r10d     | Index load from map value and
   e0:	xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
   e7:	and    %r10,%rdi            |_
   ea:	mov    $0x2f,%r10d          |
   f0:	sub    %rdi,%r10            | Sanitized addition. Both use r10
   f3:	or     %rdi,%r10            | but do not interfere with each
   f6:	neg    %r10                 | other. (Neither do these instructions
   f9:	sar    $0x3f,%r10           | interfere with the use of ax as temp
   fd:	and    %r10,%rdi            | in interpreter.)
  100:	add    %rax,%rdi            |_
  103:	mov    0x0(%rdi),%eax
 [...]

Tested that it fixes Jann's reproducer, and also checked that test_verifier
and test_progs suite with interpreter, JIT and JIT with hardening enabled
on x86-64 and arm64 runs successfully.

  [0] Speculose: Analyzing the Security Implications of Speculative
      Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
      https://arxiv.org/pdf/1801.04084.pdf

  [1] A Systematic Evaluation of Transient Execution Attacks and
      Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
      Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
      Dmitry Evtyushkin, Daniel Gruss,
      https://arxiv.org/pdf/1811.05441.pdf

Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

979d63d5

bpf: fix check_map_access smin_value test when pointer contains offset · b7137c4e

Daniel Borkmann authored Jan 03, 2019

In check_map_access() we probe actual bounds through __check_map_access()
with offset of reg->smin_value + off for lower bound and offset of
reg->umax_value + off for the upper bound. However, even though the
reg->smin_value could have a negative value, the final result of the
sum with off could be positive when pointer arithmetic with known and
unknown scalars is combined. In this case we reject the program with
an error such as "R<x> min value is negative, either use unsigned index
or do a if (index >=0) check." even though the access itself would be
fine. Therefore extend the check to probe whether the actual resulting
reg->smin_value + off is less than zero.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b7137c4e

bpf: restrict unknown scalars of mixed signed bounds for unprivileged · 9d7eceed

Daniel Borkmann authored Jan 03, 2019

For unknown scalars of mixed signed bounds, meaning their smin_value is
negative and their smax_value is positive, we need to reject arithmetic
with pointer to map value. For unprivileged the goal is to mask every
map pointer arithmetic and this cannot reliably be done when it is
unknown at verification time whether the scalar value is negative or
positive. Given this is a corner case, the likelihood of breaking should
be very small.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9d7eceed