Commits · b3dff0eb4b98c16e7dc395326425569e226f7843 · nexedi / linux

13 Nov, 2019 12 commits

Merge tag 'linux-can-fixes-for-5.4-20191113' of... · b3dff0eb

David S. Miller authored Nov 13, 2019

Merge tag 'linux-can-fixes-for-5.4-20191113' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2019-11-13

this is a pull request of 9 patches for net/master, hopefully for the v5.4
release cycle.

All nine patches are by Oleksij Rempel and fix locking and use-after-free bugs
in the j1939 stack found by the syzkaller syzbot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b3dff0eb

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · c3afb7ea

David S. Miller authored Nov 13, 2019

Steffen Klassert says:

====================
pull request (net): ipsec 2019-11-13

1) Fix a page memleak on xfrm state destroy.

2) Fix a refcount imbalance if a xfrm_state
   gets invaild during async resumption.
   From Xiaodong Xu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c3afb7ea

can: j1939: warn if resources are still linked on destroy · 4a15d574

Oleksij Rempel authored Nov 08, 2019

j1939_session_destroy() and __j1939_priv_release() should be called only
if session, ecu or socket are not linked or used by any one else. If at
least one of these resources is linked, then the reference counting is
broken somewhere.

This warning will be triggered before KASAN will do, and will make it
easier to debug initial issue. This works on platforms without KASAN
support.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

4a15d574

can: j1939: j1939_can_recv(): add priv refcounting · ddeeb7d4

Oleksij Rempel authored Nov 09, 2019

j1939_can_recv() can be called in parallel with socket release. In this
case sk_release and sk_destruct can be done earlier than
j1939_can_recv() is processed.

Reported-by: syzbot+ca172a0ac477ac90f045@syzkaller.appspotmail.com
Reported-by: syzbot+07ca5bce8530070a5650@syzkaller.appspotmail.com
Reported-by: syzbot+a47537d3964ef6c874e1@syzkaller.appspotmail.com
Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

ddeeb7d4

can: j1939: transport: j1939_cancel_active_session(): use... · 8d7a5f00

Oleksij Rempel authored Nov 08, 2019

can: j1939: transport: j1939_cancel_active_session(): use hrtimer_try_to_cancel() instead of hrtimer_cancel()

This part of the code protected by lock used in the hrtimer as well.
Using hrtimer_cancel() will trigger dead lock.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

8d7a5f00

can: j1939: make sure socket is held as long as session exists · 62ebce1d

Oleksij Rempel authored Nov 07, 2019

We link the socket to the session to be able provide socket specific
notifications. For example messages over error queue.

We need to keep the socket held, while we have a reference to it.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

62ebce1d

can: j1939: transport: make sure the aborted session will be deactivated only once · d966635b

Oleksij Rempel authored Nov 07, 2019

j1939_session_cancel() was modifying session->state without protecting
it by locks and without checking actual state of the session.

This patch moves j1939_tp_set_rxtimeout() into j1939_session_cancel()
and adds the missing locking.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

d966635b

can: j1939: socket: rework socket locking for j1939_sk_release() and j1939_sk_sendmsg() · fd81ebfe

Oleksij Rempel authored Nov 05, 2019

j1939_sk_sendmsg() should be protected by lock_sock() to avoid race with
j1939_sk_bind() and j1939_sk_release().

Reported-by: syzbot+afd421337a736d6c1ee6@syzkaller.appspotmail.com
Reported-by: syzbot+6d04f6a1b31a0ae12ca9@syzkaller.appspotmail.com
Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

fd81ebfe

can: j1939: main: j1939_ndev_to_priv(): avoid crash if can_ml_priv is NULL · c48c8c1e

Oleksij Rempel authored Nov 05, 2019

This patch avoids a NULL pointer deref crash if ndev->ml_priv is NULL.

Reported-by: syzbot+95c8e0d9dffde15b6c5c@syzkaller.appspotmail.com
Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

c48c8c1e

can: j1939: move j1939_priv_put() into sk_destruct callback · 25fe97cb

Oleksij Rempel authored Nov 07, 2019

This patch delays the j1939_priv_put() until the socket is destroyed via
the sk_destruct callback, to avoid use-after-free problems.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

25fe97cb

can: af_can: export can_sock_destruct() · 975987e7

Oleksij Rempel authored Nov 07, 2019

In j1939 we need our own struct sock::sk_destruct callback. Export the
generic af_can can_sock_destruct() that allows us to chain-call it.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

975987e7

dpaa2-eth: free already allocated channels on probe defer · 5aa4277d

Ioana Ciornei authored Nov 12, 2019

The setup_dpio() function tries to allocate a number of channels equal
to the number of CPUs online. When there are not enough DPCON objects
already probed, the function will return EPROBE_DEFER. When this
happens, the already allocated channels are not freed. This results in
the incapacity of properly probing the next time around.
Fix this by freeing the channels on the error path.

Fixes: d7f5a9d8 ("dpaa2-eth: defer probe on object allocate")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5aa4277d

12 Nov, 2019 6 commits

net/smc: fix refcount non-blocking connect() -part 2 · 6d6dd528

Ursula Braun authored Nov 12, 2019

If an SMC socket is immediately terminated after a non-blocking connect()
has been called, a memory leak is possible.
Due to the sock_hold move in
commit 301428ea ("net/smc: fix refcounting for non-blocking connect()")
an extra sock_put() is needed in smc_connect_work(), if the internal
TCP socket is aborted and cancels the sk_stream_wait_connect() of the
connect worker.

Reported-by: syzbot+4b73ad6fc767e576e275@syzkaller.appspotmail.com
Fixes: 301428ea ("net/smc: fix refcounting for non-blocking connect()")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d6dd528

xfrm: release device reference for invalid state · 4944a4b1

Xiaodong Xu authored Nov 11, 2019

An ESP packet could be decrypted in async mode if the input handler for
this packet returns -EINPROGRESS in xfrm_input(). At this moment the device
reference in skb is held. Later xfrm_input() will be invoked again to
resume the processing.
If the transform state is still valid it would continue to release the
device reference and there won't be a problem; however if the transform
state is not valid when async resumption happens, the packet will be
dropped while the device reference is still being held.
When the device is deleted for some reason and the reference to this
device is not properly released, the kernel will keep logging like:

unregister_netdevice: waiting for ppp2 to become free. Usage count = 1

The issue is observed when running IPsec traffic over a PPPoE device based
on a bridge interface. By terminating the PPPoE connection on the server
end for multiple times, the PPPoE device on the client side will eventually
get stuck on the above warning message.

This patch will check the async mode first and continue to release device
reference in async resumption, before it is dropped due to invalid state.

v2: Do not assign address family from outer_mode in the transform if the
state is invalid

v3: Release device reference in the error path instead of jumping to resume

Fixes: 4ce3dbe3 ("xfrm: Fix xfrm_input() to verify state is valid when (encap_type < 0)")
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
Reported-by: Bo Chen <chenborfc@163.com>
Tested-by: Bo Chen <chenborfc@163.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

4944a4b1

mdio_bus: Fix PTR_ERR applied after initialization to constant · 1d463956

YueHaibing authored Nov 11, 2019

Fix coccinelle warning:

./drivers/net/phy/mdio_bus.c:67:5-12: ERROR: PTR_ERR applied after initialization to constant on line 62
./drivers/net/phy/mdio_bus.c:68:5-12: ERROR: PTR_ERR applied after initialization to constant on line 62

Fix this by using IS_ERR before PTR_ERR
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 71dd6c0d ("net: phy: add support for reset-controller")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d463956

NFC: nxp-nci: Fix NULL pointer dereference after I2C communication error · a71a29f5

Stephan Gerhold authored Nov 10, 2019

I2C communication errors (-EREMOTEIO) during the IRQ handler of nxp-nci
result in a NULL pointer dereference at the moment:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 1 PID: 355 Comm: irq/137-nxp-nci Not tainted 5.4.0-rc6 #1
    RIP: 0010:skb_queue_tail+0x25/0x50
    Call Trace:
     nci_recv_frame+0x36/0x90 [nci]
     nxp_nci_i2c_irq_thread_fn+0xd1/0x285 [nxp_nci_i2c]
     ? preempt_count_add+0x68/0xa0
     ? irq_forced_thread_fn+0x80/0x80
     irq_thread_fn+0x20/0x60
     irq_thread+0xee/0x180
     ? wake_threads_waitq+0x30/0x30
     kthread+0xfb/0x130
     ? irq_thread_check_affinity+0xd0/0xd0
     ? kthread_park+0x90/0x90
     ret_from_fork+0x1f/0x40

Afterward the kernel must be rebooted to work properly again.

This happens because it attempts to call nci_recv_frame() with skb == NULL.
However, unlike nxp_nci_fw_recv_frame(), nci_recv_frame() does not have any
NULL checks for skb, causing the NULL pointer dereference.

Change the code to call only nxp_nci_fw_recv_frame() in case of an error.
Make sure to log it so it is obvious that a communication error occurred.
The error above then becomes:

    nxp-nci_i2c i2c-NXP1001:00: NFC: Read failed with error -121
    nci: __nci_request: wait_for_completion_interruptible_timeout failed 0
    nxp-nci_i2c i2c-NXP1001:00: NFC: Read failed with error -121

Fixes: 6be88670 ("NFC: nxp-nci_i2c: Add I2C support to NXP NCI driver")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a71a29f5

mlxsw: core: Enable devlink reload only on probe · 73a533ec

Jiri Pirko authored Nov 10, 2019

Call devlink enable only during probe time and avoid deadlock
during reload.
Reported-by: Shalom Toledo <shalomt@mellanox.com>
Fixes: 5a508a25 ("devlink: disallow reload operation during device cleanup")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

73a533ec

devlink: Add method for time-stamp on reporter's dump · d279505b

Aya Levin authored Nov 10, 2019

When setting the dump's time-stamp, use ktime_get_real in addition to
jiffies. This simplifies the user space implementation and bypasses
some inconsistent behavior with translating jiffies to current time.
The time taken is transformed into nsec, to comply with y2038 issue.

Fixes: c8e1da0b ("devlink: Add health report functionality")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

d279505b

11 Nov, 2019 1 commit

net: ethernet: dwmac-sun8i: Use the correct function in exit path · 40a1dcee

Corentin Labbe authored Nov 10, 2019

When PHY is not powered, the probe function fail and some resource are
still unallocated.
Furthermore some BUG happens:
dwmac-sun8i 5020000.ethernet: EMAC reset timeout
------------[ cut here ]------------
kernel BUG at /linux-next/net/core/dev.c:9844!

So let's use the right function (stmmac_pltfr_remove) in the error path.

Fixes: 9f93ac8d ("net-next: stmmac: Add dwmac-sun8i")
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40a1dcee

10 Nov, 2019 2 commits

tcp: remove redundant new line from tcp_event_sk_skb · dd3d792d

Tony Lu authored Nov 09, 2019

This removes '\n' from trace event class tcp_event_sk_skb to avoid
redundant new blank line and make output compact.

Fixes: af4325ec ("tcp: expose sk_state in tcp_retransmit_skb tracepoint")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd3d792d

devlink: disallow reload operation during device cleanup · 5a508a25

Jiri Pirko authored Nov 09, 2019

There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:

while true; do
        echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/bind
        devlink dev reload pci/0000:00:10.0 &
        echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/unbind
done

Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.
Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bb ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a508a25

09 Nov, 2019 9 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 0058b0a5

Linus Torvalds authored Nov 08, 2019

Pull networking fixes from David Miller:

 1) BPF sample build fixes from Björn Töpel

 2) Fix powerpc bpf tail call implementation, from Eric Dumazet.

 3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet.

 4) Fix crash in ebtables when using dnat target, from Florian Westphal.

 5) Fix port disable handling whne removing bcm_sf2 driver, from Florian
    Fainelli.

 6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski.

 7) Various KCSAN fixes all over the networking, from Eric Dumazet.

 8) Memory leaks in mlx5 driver, from Alex Vesker.

 9) SMC interface refcounting fix, from Ursula Braun.

10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu.

11) Add a TX lock to synchonize the kTLS TX path properly with crypto
    operations. From Jakub Kicinski.

12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano
    Garzarella.

13) Infinite loop in Intel ice driver, from Colin Ian King.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
  ixgbe: need_wakeup flag might not be set for Tx
  i40e: need_wakeup flag might not be set for Tx
  igb/igc: use ktime accessors for skb->tstamp
  i40e: Fix for ethtool -m issue on X722 NIC
  iavf: initialize ITRN registers with correct values
  ice: fix potential infinite loop because loop counter being too small
  qede: fix NULL pointer deref in __qede_remove()
  net: fix data-race in neigh_event_send()
  vsock/virtio: fix sock refcnt holding during the shutdown
  net: ethernet: octeon_mgmt: Account for second possible VLAN header
  mac80211: fix station inactive_time shortly after boot
  net/fq_impl: Switch to kvmalloc() for memory allocation
  mac80211: fix ieee80211_txq_setup_flows() failure path
  ipv4: Fix table id reference in fib_sync_down_addr
  ipv6: fixes rt6_probe() and fib6_nh->last_probe init
  net: hns: Fix the stray netpoll locks causing deadlock in NAPI path
  net: usb: qmi_wwan: add support for DW5821e with eSIM support
  CDC-NCM: handle incomplete transfer of MTU
  nfc: netlink: fix double device reference drop
  NFC: st21nfca: fix double free
  ...

0058b0a5

Merge tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block · 5cb8418c

Linus Torvalds authored Nov 08, 2019

Pull block fixes from Jens Axboe:

 - Two NVMe device removal crash fixes, and a compat fixup for for an
   ioctl that was introduced in this release (Anton, Charles, Max - via
   Keith)

 - Missing error path mutex unlock for drbd (Dan)

 - cgroup writeback fixup on dead memcg (Tejun)

 - blkcg online stats print fix (Tejun)

* tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block:
  cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  block: drbd: remove a stray unlock in __drbd_send_protocol()
  blkcg: make blkcg_print_stat() print stats only for online blkgs
  nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
  nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
  nvme-rdma: fix a segmentation fault during module unload

5cb8418c

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · a2582cdc

David S. Miller authored Nov 08, 2019

Jeff Kirsher says:

====================
Intel Wired LAN Driver Fixes 2019-11-08

This series contains fixes to igb, igc, ixgbe, i40e, iavf and ice
drivers.

Colin Ian King fixes a potentially wrap-around counter in a for-loop.

Nick fixes the default ITR values for the iavf driver to 50 usecs
interval.

Arkadiusz fixes 'ethtool -m' for X722 devices where the correct value
cannot be obtained from the firmware, so add X722 to the check to ensure
the wrong value is not returned.

Jake fixes igb and igc drivers in their implementation of launch time
support by declaring skb->tstamp value as ktime_t instead of s64.

Magnus fixes ixgbe and i40e where the need_wakeup flag for transmit may
not be set for AF_XDP sockets that are only used to send packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a2582cdc

ixgbe: need_wakeup flag might not be set for Tx · 0843aa8f

Magnus Karlsson authored Nov 08, 2019

The need_wakeup flag for Tx might not be set for AF_XDP sockets that
are only used to send packets. This happens if there is at least one
outstanding packet that has not been completed by the hardware and we
get that corresponding completion (which will not generate an
interrupt since interrupts are disabled in the napi poll loop) between
the time we stopped processing the Tx completions and interrupts are
enabled again. In this case, the need_wakeup flag will have been
cleared at the end of the Tx completion processing as we believe we
will get an interrupt from the outstanding completion at a later point
in time. But if this completion interrupt occurs before interrupts
are enable, we lose it and should at that point really have set the
need_wakeup flag since there are no more outstanding completions that
can generate an interrupt to continue the processing. When this
happens, user space will see a Tx queue need_wakeup of 0 and skip
issuing a syscall, which means will never get into the Tx processing
again and we have a deadlock.

This patch introduces a quick fix for this issue by just setting the
need_wakeup flag for Tx to 1 all the time. I am working on a proper
fix for this that will toggle the flag appropriately, but it is more
challenging than I anticipated and I am afraid that this patch will
not be completed before the merge window closes, therefore this easier
fix for now. This fix has a negative performance impact in the range
of 0% to 4%. Towards the higher end of the scale if you have driver
and application on the same core and issue a lot of packets, and
towards no negative impact if you use two cores, lower transmission
speeds and/or a workload that also receives packets.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

0843aa8f

i40e: need_wakeup flag might not be set for Tx · 70563957

Magnus Karlsson authored Nov 08, 2019

70563957

igb/igc: use ktime accessors for skb->tstamp · 6acab13b

Jacob Keller authored Nov 06, 2019

When implementing launch time support in the igb and igc drivers, the
skb->tstamp value is assumed to be a s64, but it's declared as a ktime_t
value.

Although ktime_t is typedef'd to s64 it wasn't always, and the kernel
provides accessors for ktime_t values.

Use the ktime_to_timespec64 and ktime_set accessors instead of directly
assuming that the variable is always an s64.

This improves portability if the code is ever moved to another kernel
version, or if the definition of ktime_t ever changes again in the
future.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6acab13b

i40e: Fix for ethtool -m issue on X722 NIC · 4c9da6f2

Arkadiusz Kubalewski authored Nov 06, 2019

This patch contains fix for a problem with command:
'ethtool -m <dev>'
which breaks functionality of:
'ethtool <dev>'
when called on X722 NIC

Disallowed update of link phy_types on X722 NIC
Currently correct value cannot be obtained from FW
Previously wrong value returned by FW was used and was
a root cause for incorrect output of 'ethtool <dev>' command
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

4c9da6f2

iavf: initialize ITRN registers with correct values · 4eda4e00

Nicholas Nunley authored Nov 05, 2019

Since commit 92418fb1 ("i40e/i40evf: Use usec value instead of reg
value for ITR defines") the driver tracks the interrupt throttling
intervals in single usec units, although the actual ITRN registers are
programmed in 2 usec units. Most register programming flows in the driver
correctly handle the conversion, although it is currently not applied when
the registers are initialized to their default values. Most of the time
this doesn't present a problem since the default values are usually
immediately overwritten through the standard adaptive throttling mechanism,
or updated manually by the user, but if adaptive throttling is disabled and
the interval values are left alone then the incorrect value will persist.

Since the intended default interval of 50 usecs (vs. 100 usecs as
programmed) performs better for most traffic workloads, this can lead to
performance regressions.

This patch adds the correct conversion when writing the initial values to
the ITRN registers.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

4eda4e00

ice: fix potential infinite loop because loop counter being too small · 615457a2

Colin Ian King authored Nov 01, 2019

Currently the for-loop counter i is a u8 however it is being checked
against a maximum value hw->num_tx_sched_layers which is a u16. Hence
there is a potential wrap-around of counter i back to zero if
hw->num_tx_sched_layers is greater than 255.  Fix this by making i
a u16.

Addresses-Coverity: ("Infinite loop")
Fixes: b36c598c ("ice: Updates to Tx scheduler code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

615457a2

08 Nov, 2019 10 commits

qede: fix NULL pointer deref in __qede_remove() · deabc871

Manish Chopra authored Nov 08, 2019

While rebooting the system with SR-IOV vfs enabled leads
to below crash due to recurrence of __qede_remove() on the VF
devices (first from .shutdown() flow of the VF itself and
another from PF's .shutdown() flow executing pci_disable_sriov())

This patch adds a safeguard in __qede_remove() flow to fix this,
so that driver doesn't attempt to remove "already removed" devices.

[  194.360134] BUG: unable to handle kernel NULL pointer dereference at 00000000000008dc
[  194.360227] IP: [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
[  194.360304] PGD 0
[  194.360325] Oops: 0000 [#1] SMP
[  194.360360] Modules linked in: tcp_lp fuse tun bridge stp llc devlink bonding ip_set nfnetlink ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi dell_smbios iTCO_wdt iTCO_vendor_support dell_wmi_descriptor dcdbas vfat fat pcc_cpufreq skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd qedr ib_core pcspkr ses enclosure joydev ipmi_ssif sg i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_pad acpi_power_meter xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200
[  194.361044]  qede i2c_algo_bit drm_kms_helper qed syscopyarea sysfillrect nvme sysimgblt fb_sys_fops ttm nvme_core mpt3sas crc8 ptp drm pps_core ahci raid_class scsi_transport_sas libahci libata drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
[  194.361297] CPU: 51 PID: 7996 Comm: reboot Kdump: loaded Not tainted 3.10.0-1062.el7.x86_64 #1
[  194.361359] Hardware name: Dell Inc. PowerEdge MX840c/0740HW, BIOS 2.4.6 10/15/2019
[  194.361412] task: ffff9cea9b360000 ti: ffff9ceabebdc000 task.ti: ffff9ceabebdc000
[  194.361463] RIP: 0010:[<ffffffffc03553c4>]  [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
[  194.361534] RSP: 0018:ffff9ceabebdfac0  EFLAGS: 00010282
[  194.361570] RAX: 0000000000000000 RBX: ffff9cd013846098 RCX: 0000000000000000
[  194.361621] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9cd013846098
[  194.361668] RBP: ffff9ceabebdfae8 R08: 0000000000000000 R09: 0000000000000000
[  194.361715] R10: 00000000bfe14201 R11: ffff9ceabfe141e0 R12: 0000000000000000
[  194.361762] R13: ffff9cd013846098 R14: 0000000000000000 R15: ffff9ceab5e48000
[  194.361810] FS:  00007f799c02d880(0000) GS:ffff9ceacb0c0000(0000) knlGS:0000000000000000
[  194.361865] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  194.361903] CR2: 00000000000008dc CR3: 0000001bdac76000 CR4: 00000000007607e0
[  194.361953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  194.362002] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  194.362051] PKRU: 55555554
[  194.362073] Call Trace:
[  194.362109]  [<ffffffffc0355500>] qede_remove+0x10/0x20 [qede]
[  194.362180]  [<ffffffffb97d0f3e>] pci_device_remove+0x3e/0xc0
[  194.362240]  [<ffffffffb98b3c52>] __device_release_driver+0x82/0xf0
[  194.362285]  [<ffffffffb98b3ce3>] device_release_driver+0x23/0x30
[  194.362343]  [<ffffffffb97c86d4>] pci_stop_bus_device+0x84/0xa0
[  194.362388]  [<ffffffffb97c87e2>] pci_stop_and_remove_bus_device+0x12/0x20
[  194.362450]  [<ffffffffb97f153f>] pci_iov_remove_virtfn+0xaf/0x160
[  194.362496]  [<ffffffffb97f1aec>] sriov_disable+0x3c/0xf0
[  194.362534]  [<ffffffffb97f1bc3>] pci_disable_sriov+0x23/0x30
[  194.362599]  [<ffffffffc02f83c3>] qed_sriov_disable+0x5e3/0x650 [qed]
[  194.362658]  [<ffffffffb9622df6>] ? kfree+0x106/0x140
[  194.362709]  [<ffffffffc02cc0c0>] ? qed_free_stream_mem+0x70/0x90 [qed]
[  194.362754]  [<ffffffffb9622df6>] ? kfree+0x106/0x140
[  194.362803]  [<ffffffffc02cd659>] qed_slowpath_stop+0x1a9/0x1d0 [qed]
[  194.362854]  [<ffffffffc035544e>] __qede_remove+0xae/0x130 [qede]
[  194.362904]  [<ffffffffc03554e0>] qede_shutdown+0x10/0x20 [qede]
[  194.362956]  [<ffffffffb97cf90a>] pci_device_shutdown+0x3a/0x60
[  194.363010]  [<ffffffffb98b180b>] device_shutdown+0xfb/0x1f0
[  194.363066]  [<ffffffffb94b66c6>] kernel_restart_prepare+0x36/0x40
[  194.363107]  [<ffffffffb94b66e2>] kernel_restart+0x12/0x60
[  194.363146]  [<ffffffffb94b6959>] SYSC_reboot+0x229/0x260
[  194.363196]  [<ffffffffb95f200d>] ? handle_mm_fault+0x39d/0x9b0
[  194.363253]  [<ffffffffb942b621>] ? __switch_to+0x151/0x580
[  194.363304]  [<ffffffffb9b7ec28>] ? __schedule+0x448/0x9c0
[  194.363343]  [<ffffffffb94b69fe>] SyS_reboot+0xe/0x10
[  194.363387]  [<ffffffffb9b8bede>] system_call_fastpath+0x25/0x2a
[  194.363430] Code: f9 e9 37 ff ff ff 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 4c 8d af 98 00 00 00 41 54 4c 89 ef 41 89 f4 53 e8 4c e4 55 f9 <80> b8 dc 08 00 00 01 48 89 c3 4c 8d b8 c0 08 00 00 4c 8b b0 c0
[  194.363712] RIP  [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
[  194.363764]  RSP <ffff9ceabebdfac0>
[  194.363791] CR2: 00000000000008dc
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

deabc871

net: fix data-race in neigh_event_send() · 1b53d644

Eric Dumazet authored Nov 07, 2019

KCSAN reported the following data-race [1]

The fix will also prevent the compiler from optimizing out
the condition.

[1]

BUG: KCSAN: data-race in neigh_resolve_output / neigh_resolve_output

write to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 1:
 neigh_event_send include/net/neighbour.h:443 [inline]
 neigh_resolve_output+0x78/0x480 net/core/neighbour.c:1474
 neigh_output include/net/neighbour.h:511 [inline]
 ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
 __ip_finish_output net/ipv4/ip_output.c:308 [inline]
 __ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
 ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
 NF_HOOK_COND include/linux/netfilter.h:294 [inline]
 ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
 dst_output include/net/dst.h:436 [inline]
 ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
 __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
 ip_queue_xmit+0x45/0x60 include/net/ip.h:237
 __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
 tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
 __tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
 tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
 tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
 tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:618

read to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 0:
 neigh_event_send include/net/neighbour.h:442 [inline]
 neigh_resolve_output+0x57/0x480 net/core/neighbour.c:1474
 neigh_output include/net/neighbour.h:511 [inline]
 ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
 __ip_finish_output net/ipv4/ip_output.c:308 [inline]
 __ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
 ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
 NF_HOOK_COND include/linux/netfilter.h:294 [inline]
 ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
 dst_output include/net/dst.h:436 [inline]
 ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
 __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
 ip_queue_xmit+0x45/0x60 include/net/ip.h:237
 __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
 tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
 __tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
 tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
 tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b53d644

Merge tag 'pwm/for-5.4-rc7' of... · abf6c397

Linus Torvalds authored Nov 08, 2019

Merge tag 'pwm/for-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm fix from Thierry Reding:
 "One more fix to keep a reference to the driver's module as long as
  there are users of the PWM exposed by the driver"

* tag 'pwm/for-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: bcm-iproc: Prevent unloading the driver module while in use

abf6c397

cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead · 65de03e2

Tejun Heo authored Nov 08, 2019

cgroup writeback tries to refresh the associated wb immediately if the
current wb is dead.  This is to avoid keeping issuing IOs on the stale
wb after memcg - blkcg association has changed (ie. when blkcg got
disabled / enabled higher up in the hierarchy).

Unfortunately, the logic gets triggered spuriously on inodes which are
associated with dead cgroups.  When the logic is triggered on dead
cgroups, the attempt fails only after doing quite a bit of work
allocating and initializing a new wb.

While c3aab9a0 ("mm/filemap.c: don't initiate writeback if mapping
has no dirty pages") alleviated the issue significantly as it now only
triggers when the inode has dirty pages.  However, the condition can
still be triggered before the inode is switched to a different cgroup
and the logic simply doesn't make sense.

Skip the immediate switching if the associated memcg is dying.

This is a simplified version of the following two patches:

 * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/
 * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz

Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: e8a7abf5 ("writeback: disassociate inodes from dying bdi_writebacks")
Acked-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

65de03e2

Merge tag 'ceph-for-5.4-rc7' of git://github.com/ceph/ceph-client · 0689acfa

Linus Torvalds authored Nov 08, 2019

Pull ceph fixes from Ilya Dryomov:
 "Some late-breaking dentry handling fixes from Al and Jeff, a patch to
  further restrict copy_file_range() to avoid potential data corruption
  from Luis and a fix for !CONFIG_CEPH_FSCACHE kernels.

  Everything but the fscache fix is marked for stable"

* tag 'ceph-for-5.4-rc7' of git://github.com/ceph/ceph-client:
  ceph: return -EINVAL if given fsc mount option on kernel w/o support
  ceph: don't allow copy_file_range when stripe_count != 1
  ceph: don't try to handle hashed dentries in non-O_CREAT atomic_open
  ceph: add missing check in d_revalidate snapdir handling
  ceph: fix RCU case handling in ceph_d_revalidate()
  ceph: fix use-after-free in __ceph_remove_cap()

0689acfa

vsock/virtio: fix sock refcnt holding during the shutdown · ad8a7220

Stefano Garzarella authored Nov 08, 2019

The "42f5cda5" commit rightly set SOCK_DONE on peer shutdown,
but there is an issue if we receive the SHUTDOWN(RDWR) while the
virtio_transport_close_timeout() is scheduled.
In this case, when the timeout fires, the SOCK_DONE is already
set and the virtio_transport_close_timeout() will not call
virtio_transport_reset() and virtio_transport_do_close().
This causes that both sockets remain open and will never be released,
preventing the unloading of [virtio|vhost]_transport modules.

This patch fixes this issue, calling virtio_transport_reset() and
virtio_transport_do_close() when we receive the SHUTDOWN(RDWR)
and there is nothing left to read.

Fixes: 42f5cda5 ("vsock/virtio: set SOCK_DONE on peer shutdown")
Cc: Stephen Barber <smbarber@chromium.org>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad8a7220

Merge tag 'mac80211-for-net-2019-11-08' of... · b05f5b4a

David S. Miller authored Nov 08, 2019

Merge tag 'mac80211-for-net-2019-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Three small fixes:
 * we hit a failure path bug related to
   ieee80211_txq_setup_flows()
 * also use kvmalloc() to make that less likely
 * fix a timing value shortly after boot (during
   INITIAL_JIFFIES)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b05f5b4a

net: ethernet: octeon_mgmt: Account for second possible VLAN header · e4dd5608

Alexander Sverdlin authored Nov 08, 2019

Octeon's input ring-buffer entry has 14 bits-wide size field, so to account
for second possible VLAN header max_mtu must be further reduced.

Fixes: 109cc165 ("ethernet/cavium: use core min/max MTU checking")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4dd5608

Merge tag 'modules-for-v5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · 6737e763

Linus Torvalds authored Nov 08, 2019

Pull modules fix from Jessica Yu:
 "Fix `make nsdeps` for modules composed of multiple source files.

  Since $mod_source_files was not in quotes in the call to
  generate_deps_for_ns(), not all the source files for a module were
  being passed to spatch"

* tag 'modules-for-v5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  scripts/nsdeps: make sure to pass all module source files to spatch

6737e763

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 9e8ed26e

Linus Torvalds authored Nov 08, 2019

Pull arm64 fix from Will Deacon:
 "Fix pte_same() to avoid getting stuck on write fault.

  This single arm64 fix is a revert of 747a70e6 ("arm64: Fix
  copy-on-write referencing in HugeTLB"), not because that patch was
  wrong, but because it was broken by aa57157b ("arm64: Ensure
  VM_WRITE|VM_SHARED ptes are clean by default") which we merged in
  -rc6.

  We spotted the issue in Android (AOSP), where one of the JIT threads
  gets stuck on a write fault during boot because the faulting pte is
  marked as PTE_DIRTY | PTE_WRITE | PTE_RDONLY and the fault handler
  decides that there's nothing to do thanks to pte_same() masking out
  PTE_RDONLY.

  Thanks to John Stultz for reporting this and testing this so quickly,
  and to Steve Capper for confirming that the HugeTLB tests continue to
  pass"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Do not mask out PTE_RDONLY in pte_same()

9e8ed26e