Commits · b1552a4c839ee27d362d35a90300326dd6f7abdf · Kirill Smelkov / linux

25 Jan, 2022 19 commits

ionic: remove the dbid_inuse bitmap · b1552a4c

Shannon Nelson authored Jan 24, 2022

The dbid_inuse bitmap is not useful in this driver so remove it.

Fixes: 6461b446 ("ionic: Add interrupts and doorbells")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1552a4c

ionic: disable napi when ionic_lif_init() fails · 43cfed71

Brett Creeley authored Jan 24, 2022

When the driver is going through reset, it will eventually call
ionic_lif_init(), which does a lot of re-initialization. One
of the re-initialization steps is to set up the adminq and
enable napi for it.  If something breaks after this point
we can end up with a kernel NULL pointer dereference through
ionic_adminq_napi.

Fix this by making sure to call napi_disable() in the cleanup
path of ionic_lif_init().  This forces any pending napi contexts
to finish and prevents them from being recalled before deleting
the napi context.

Fixes: 77ceb68e ("ionic: Add notifyq support")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

43cfed71

ionic: Cleanups in the Tx hotpath code · 238a0f7c

Brett Creeley authored Jan 24, 2022

Buffer DMA mapping happens in ionic_tx_map_skb() and this function is
called from ionic_tx() and ionic_tx_tso(). If ionic_tx_map_skb()
succeeds, but a failure is encountered later in ionic_tx() or
ionic_tx_tso() we aren't unmapping the buffers. This can be fixed in
ionic_tx() by changing functions it calls to return void because they
always return 0. For ionic_tx_tso(), there's an actual possibility that
we leave the buffers mapped, so fix this by introducing the helper
function ionic_tx_desc_unmap_bufs(). This function is also re-used
in ionic_tx_clean().

Fixes: 0f3154e6 ("ionic: Add Tx and Rx handling")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

238a0f7c
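
The unmap-on-failure pattern from the Tx hotpath commit above can be modelled in a few lines. This is a minimal Python sketch, not driver code: `map_bufs`, `unmap_bufs`, `tx_tso`, and `tx_clean` are hypothetical stand-ins for ionic_tx_map_skb(), ionic_tx_desc_unmap_bufs(), ionic_tx_tso(), and ionic_tx_clean(), and the set `mapped` stands in for live DMA mappings.

```python
# Toy model: "mapped" stands in for the set of live DMA mappings.
mapped = set()

def map_bufs(bufs):
    """Stand-in for ionic_tx_map_skb(): map every buffer of a packet."""
    mapped.update(bufs)

def unmap_bufs(bufs):
    """Stand-in for ionic_tx_desc_unmap_bufs(): release the mappings."""
    mapped.difference_update(bufs)

def tx_tso(bufs, fail_late=False):
    """Hypothetical TSO path: mapping succeeds, a later step may fail."""
    map_bufs(bufs)
    if fail_late:
        # The fix: undo the mapping before reporting the error.
        unmap_bufs(bufs)
        return -1
    return 0

def tx_clean(bufs):
    """The completion path reuses the same unmap helper."""
    unmap_bufs(bufs)
```

Sharing one helper between the error path and the completion path keeps the two from drifting apart, which appears to be the motivation for re-using it in ionic_tx_clean().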

ionic: Prevent filter add/del err msgs when the device is not available · 584fb767

Brett Creeley authored Jan 24, 2022

Currently, when a request to add or delete a filter is made while
ionic_heartbeat_check() is returning failure, the driver is overly
verbose about the failures, even though they are usually temporary
and the request will be retried later. An example of this is
a filter add when the FW is in the middle of resetting:

IONIC_CMD_RX_FILTER_ADD (31) failed: IONIC_RC_ERROR (-6)
rx_filter add failed: ADDR 01:80:c2:00:00:0e

Fix this by checking for -ENXIO and other error values on filter
request fails before printing the error message.  Add similar
checking to the delete filter code.

Fixes: f91958cc ("ionic: tame the filter no space message")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

584fb767

ionic: Query FW when getting VF info via ndo_get_vf_config · f16f5be3

Brett Creeley authored Jan 24, 2022

Currently when an administrator configures a VF via ndo_set_vf*,
the driver will send the set command to FW and then update the
cached value.  The cached value is then used when reporting
VF info via ndo_get_vf_config.

A problem is that the VF info may have been updated between
the last ndo_set_vf* and ndo_get_vf_info commands via some
other method, i.e. a VF changes its MAC address (assuming it's
allowed to do so) and since this is all managed by the FW,
this new value won't be reflected in the PF's cache of values.

To fix this, update the driver to always get the latest VF
information by making use of the IONIC_CMD_VF_GETATTR dev
command. The FW may not support getting all the attributes for
IONIC_CMD_VF_GETATTR, so the driver will only update the cached
VF config members if their associated IONIC_CMD_VF_GETATTR
was successful. Otherwise the cached VF config members will
remain the same as what was set in ndo_set_vf*.

Fixes: fbb39807 ("ionic: support sr-iov operations")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

f16f5be3
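
The "update the cache only when the query succeeds" rule from the commit above can be sketched as follows. This is an illustrative Python model under stated assumptions: `refresh_vf_config`, `fake_fw`, and the attribute names are hypothetical, and `query_fw` merely stands in for issuing IONIC_CMD_VF_GETATTR once per attribute.

```python
def refresh_vf_config(cached, query_fw, attrs=("mac", "vlan", "rate")):
    """Update cached VF attributes only where the firmware query succeeds."""
    for attr in attrs:
        ok, value = query_fw(attr)
        if ok:
            cached[attr] = value
        # On failure the value set earlier via ndo_set_vf* is kept.
    return cached

def fake_fw(attr):
    # Hypothetical firmware that only supports reading the MAC address.
    if attr == "mac":
        return True, "02:00:00:00:00:02"
    return False, None
```

With this shape, firmware that lacks support for some attributes degrades to the previous behaviour for those attributes only.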

ionic: Allow flexibility for error reporting on dev commands · b640b552

Brett Creeley authored Jan 24, 2022

When dev commands fail, an error message will always be printed,
which may be overly alarming to system administrators,
especially if the driver shouldn't be printing the error due
to some unsupported capability.

Similar to recent adminq request changes, we can update the
dev command interface with the ability to selectively print
error messages to allow the driver to prevent printing errors
that are expected.

Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

b640b552

ionic: Correctly print AQ errors if completions aren't received · bc43ed4f

Brett Creeley authored Jan 24, 2022

Recent changes went into the driver to allow flexibility when
printing error messages. Unfortunately this had the unexpected
consequence of printing confusing messages like the following:

IONIC_CMD_RX_FILTER_ADD (31) failed: IONIC_RC_SUCCESS (-6)

In cases like this the admin queue command never completes, so the
completion status is still 0 and hence IONIC_RC_SUCCESS
is printed even though the command clearly failed. For example,
this could happen when the driver tries to add a filter and at
the same time the FW goes through a reset, so the AQ command
never completes.

Fix this by forcing the FW completion status to IONIC_RC_ERROR
in cases where we never get the completion.

Fixes: 8c9d956a ("ionic: allow adminq requests to override default error message")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc43ed4f
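
A small model of the fix above, assuming the message format shown in the commit text. The function name and its parameters are hypothetical; the value 29 for IONIC_RC_ERROR is taken from the DEV_CMD log lines quoted elsewhere on this page.

```python
RC_NAMES = {0: "IONIC_RC_SUCCESS", 29: "IONIC_RC_ERROR"}
IONIC_RC_ERROR = 29
ENXIO = 6

def adminq_error_message(opcode_name, opcode, comp_status, completed, err):
    """Format an adminq failure message (hypothetical helper)."""
    # Without a completion the status field still holds its initial 0,
    # so force it to a generic error before formatting the message.
    if not completed:
        comp_status = IONIC_RC_ERROR
    return f"{opcode_name} ({opcode}) failed: {RC_NAMES[comp_status]} ({err})"
```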

ionic: fix up printing of timeout error · 4cc787bd

Shannon Nelson authored Jan 24, 2022

Make sure we print the TIMEOUT string if we had a timeout
error, rather than printing the wrong status.

Fixes: 8c9d956a ("ionic: allow adminq requests to override default error message")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

4cc787bd

ionic: better handling of RESET event · abd75d14

Shannon Nelson authored Jan 24, 2022

When IONIC_EVENT_RESET is received, we only need to start the
fw_down process if we aren't already down, and we need to be
sure to set the FW_STOPPING state on the way.

If this is how we noticed that FW was stopped, it is most
likely from a FW update, and we'll see a new FW generation.
The update happens quickly enough that we might not see
fw_status==0, so we need to be sure things get restarted when
we see the fw_generation change.

Fixes: d2662072 ("ionic: monitor fw status generation")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

abd75d14

ionic: add FW_STOPPING state · 398d1e37

Shannon Nelson authored Jan 24, 2022

Between fw running and fw actually stopped into reset, we need
a fw_stopping concept to catch and block some actions while
we're transitioning to FW_RESET state.  This will help to be
sure the fw_up task is not scheduled until after the fw_down
task has completed.

With rare timing, it is possible for the fw_up task
to try to run before the fw_down task, then not get run after
the fw_down task has run, leaving the device in a down state.
This is possible if the watchdog goes off in between finding the
down transition and starting the fw_down task, where the later
watchdog sees the FW is back up and schedules a fw_up task.

Fixes: c672412f ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

398d1e37
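
The ordering guarantee described in the commit above can be illustrated with a three-state model. This is only a sketch: the class, method names, and task queue are hypothetical and do not mirror the driver's actual state bits or work items. The intermediate state blocks scheduling of fw_up until fw_down has finished.

```python
from enum import Enum

class FwState(Enum):
    RUNNING = "running"
    STOPPING = "stopping"   # fw_down scheduled but not finished
    RESET = "reset"         # fw_down finished, waiting for FW to return

class Lif:
    def __init__(self):
        self.state = FwState.RUNNING
        self.queue = []      # scheduled deferred tasks, in order

    def watchdog(self, fw_alive):
        if not fw_alive and self.state is FwState.RUNNING:
            self.state = FwState.STOPPING
            self.queue.append("fw_down")
        elif fw_alive and self.state is FwState.RESET:
            if "fw_up" not in self.queue:
                self.queue.append("fw_up")
        # In STOPPING, a "FW is back" observation is ignored for now.

    def run_next(self):
        task = self.queue.pop(0)
        if task == "fw_down":
            self.state = FwState.RESET
        elif task == "fw_up":
            self.state = FwState.RUNNING
        return task
```

In this model a watchdog tick that sees the FW alive while the state is still STOPPING schedules nothing, so fw_up can never be queued ahead of fw_down.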

ionic: Don't send reset commands if FW isn't running · b8fd0271

Brett Creeley authored Jan 24, 2022

It's possible the FW is already shutting down while the driver is being
removed and/or when the driver is going through reset. This can cause
unexpected/unnecessary errors to be printed:

eth0: DEV_CMD IONIC_CMD_PORT_RESET (12) error, IONIC_RC_ERROR (29) failed
eth1: DEV_CMD IONIC_CMD_RESET (3) error, IONIC_RC_ERROR (29) failed

Fix this by checking the FW status register before issuing the reset
commands.

Also, since err may not be assigned in ionic_port_reset(), assign it a
default value of 0, and remove an unnecessary log message.

Fixes: fbfb8031 ("ionic: Add hardware init and device commands")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8fd0271

ionic: separate function for watchdog init · e6958cef

Shannon Nelson authored Jan 24, 2022

Pull the watchdog init code out to a separate bite-sized
function.  Code cleaning for now, will be a useful change in
the near future.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6958cef

ionic: start watchdog after all is setup · 9ad2939a

Shannon Nelson authored Jan 24, 2022

The watchdog expects the lif to fully exist when it goes off,
so let's not start the watchdog until all is ready in case there
is some quirky time dilation that makes probe take multiple
seconds.

Fixes: 089406bc ("ionic: add a watchdog timer to monitor heartbeat")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ad2939a

ionic: fix type complaint in ionic_dev_cmd_clean() · bc0bf9de

Shannon Nelson authored Jan 24, 2022

Sparse seems to have gotten a little more picky lately and
we need to revisit this bit of code to make sparse happy.

warning: incorrect type in initializer (different address spaces)
   expected union ionic_dev_cmd_regs *regs
   got union ionic_dev_cmd_regs [noderef] __iomem *dev_cmd_regs
warning: incorrect type in argument 2 (different address spaces)
   expected void [noderef] __iomem *
   got unsigned int *
warning: incorrect type in argument 1 (different address spaces)
   expected void volatile [noderef] __iomem *
   got union ionic_dev_cmd *

Fixes: d701ec32 ("ionic: clean up sparse complaints")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc0bf9de

ipv4: get rid of fib_info_hash_{alloc|free} · ca73b68a

Eric Dumazet authored Jan 24, 2022

Use kvzalloc()/kvfree() instead of hand coded functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca73b68a

ip6_tunnel: allow routing IPv4 traffic in NBMA mode · c1f55c5e

Qing Deng authored Jan 23, 2022

Since IPv4 routes support IPv6 gateways now, we can route IPv4 traffic in
NBMA tunnels.

Signed-off-by: Qing Deng <i@moy.cat>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1f55c5e

net: use bool values to pass bool param of phy_init_eee() · 53243d41

Jisheng Zhang authored Jan 23, 2022

The 2nd param of phy_init_eee(): clk_stop_enable is a bool param, use
true or false instead of 1/0.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220123152241.1480-1-jszhang@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

53243d41

net: fec_ptp: remove redundant initialization of variable val · 6e667749

Colin Ian King authored Jan 23, 2022

Variable val is being initialized with a value that is never read,
it is being re-assigned later. The assignment is redundant and
can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20220123184936.113486-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6e667749

net: usb: asix: remove redundant assignment to variable reg · 9f16e0fa

Colin Ian King authored Jan 23, 2022

Variable reg is being masked however the variable is never read
after this. The assignment is redundant and can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220123184035.112785-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9f16e0fa

24 Jan, 2022 21 commits

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · caaba961

Jakub Kicinski authored Jan 24, 2022

Daniel Borkmann says:

====================
pull-request: bpf-next 2022-01-24

We've added 80 non-merge commits during the last 14 day(s) which contain
a total of 128 files changed, 4990 insertions(+), 895 deletions(-).

The main changes are:

1) Add XDP multi-buffer support and implement it for the mvneta driver,
   from Lorenzo Bianconi, Eelco Chaudron and Toke Høiland-Jørgensen.

2) Add unstable conntrack lookup helpers for BPF by using the BPF kfunc
   infra, from Kumar Kartikeya Dwivedi.

3) Extend BPF cgroup programs to export custom ret value to userspace via
   two helpers bpf_get_retval() and bpf_set_retval(), from YiFei Zhu.

4) Add support for AF_UNIX iterator batching, from Kuniyuki Iwashima.

5) Complete missing UAPI BPF helper description and change bpf_doc.py script
   to enforce consistent & complete helper documentation, from Usama Arif.

6) Deprecate libbpf's legacy BPF map definitions and streamline XDP APIs to
   follow tc-based APIs, from Andrii Nakryiko.

7) Support BPF_PROG_QUERY for BPF programs attached to sockmap, from Di Zhu.

8) Deprecate libbpf's bpf_map__def() API and replace users with proper getters
   and setters, from Christy Lee.

9) Extend libbpf's btf__add_btf() with an additional hashmap for strings to
   reduce overhead, from Kui-Feng Lee.

10) Fix bpftool and libbpf error handling related to libbpf's hashmap__new()
    utility function, from Mauricio Vásquez.

11) Add support to BTF program names in bpftool's program dump, from Raman Shukhau.

12) Fix resolve_btfids build to pick up host flags, from Connor O'Brien.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (80 commits)
  selftests, bpf: Do not yet switch to new libbpf XDP APIs
  selftests, xsk: Fix rx_full stats test
  bpf: Fix flexible_array.cocci warnings
  xdp: disable XDP_REDIRECT for xdp frags
  bpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags
  bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
  net: xdp: introduce bpf_xdp_pointer utility routine
  bpf: generalise tail call map compatibility check
  libbpf: Add SEC name for xdp frags programs
  bpf: selftests: update xdp_adjust_tail selftest to include xdp frags
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
  bpf: introduce frags support to bpf_prog_test_run_xdp()
  bpf: move user_size out of bpf_test_init
  bpf: add frags support to xdp copy helpers
  bpf: add frags support to the bpf_xdp_adjust_tail() API
  bpf: introduce bpf_xdp_get_buff_len helper
  net: mvneta: enable jumbo frames if the loaded XDP program support frags
  bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program
  net: mvneta: add frags support to XDP_TX
  xdp: add frags support to xdp_return_{buff/frame}
  ...
====================

Link: https://lore.kernel.org/r/20220124221235.18993-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

caaba961

selftests, bpf: Do not yet switch to new libbpf XDP APIs · 0bfb95f5

Daniel Borkmann authored Jan 24, 2022

Revert commit 54435652 ("selftests/bpf: switch to new libbpf XDP APIs")
for now given this will heavily conflict with 4b27480d ("bpf/selftests:
convert xdp_link test to ASSERT_* macros") upon merge. Andrii agreed to redo
the conversion cleanly after trees merged.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>

0bfb95f5

Merge tag 'linux-can-fixes-for-5.17-20220124' of... · e52984be

Jakub Kicinski authored Jan 24, 2022

Merge tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-01-24

The first patch updates the email address of Brian Silverman from his
former employer to his private address.

The next patch fixes DT bindings information for the tcan4x5x SPI CAN
driver.

The following patch targets the m_can driver and fixes the
introduction of FIFO bulk read support.

Another patch for the tcan4x5x driver, which fixes the max register
value for the regmap config.

The last patch for the flexcan driver marks the RX mailbox support for
the MCF5441X as supported.

* tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: flexcan: mark RX via mailboxes as supported on MCF5441X
  can: tcan4x5x: regmap: fix max register value
  can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0
  dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
  mailmap: update email address of Brian Silverman
====================

Link: https://lore.kernel.org/r/20220124175955.3464134-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e52984be

can: flexcan: mark RX via mailboxes as supported on MCF5441X · f04aefd4

Marc Kleine-Budde authored Jan 21, 2022

Most flexcan IP cores support 2 RX modes:
- FIFO
- mailbox

The flexcan IP core on the MCF5441X cannot receive CAN RTR messages
via mailboxes. However the mailbox mode is more performant. The commit

| 1c45f577 ("can: flexcan: add ethtool support to change rx-rtr setting during runtime")

added support to switch from FIFO to mailbox mode on these cores.

After testing of the mailbox mode on the MCF5441X by Angelo Dureghello,
this patch marks it (without RTR capability) as supported. Furthermore, the
IP core overview table is updated to note that RTR reception via mailboxes
is not supported.

Link: https://lore.kernel.org/all/20220121084425.3141218-1-mkl@pengutronix.de
Tested-by: Angelo Dureghello <angelo@kernel-space.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

f04aefd4

can: tcan4x5x: regmap: fix max register value · e59986de

Marc Kleine-Budde authored Jan 14, 2022

The MRAM of the tcan4x5x has a size of 2K and starts at 0x8000. There
are no further registers in the tcan4x5x making 0x87fc the biggest
addressable register.

This patch fixes the max register value of the regmap config from
0x8ffc to 0x87fc.

Fixes: 6e1caaf8 ("can: tcan4x5x: fix max register value")
Link: https://lore.kernel.org/all/20220119064011.2943292-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

e59986de
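
The corrected limit in the commit above follows directly from the numbers it gives. A quick check in Python, assuming 32-bit (4-byte) registers, which is not stated in the message:

```python
MRAM_START = 0x8000      # from the commit message
MRAM_SIZE = 2 * 1024     # 2K of MRAM, from the commit message
REG_STRIDE = 4           # assumption: registers are 32 bits wide

# Address of the last register that still lies inside the MRAM.
max_register = MRAM_START + MRAM_SIZE - REG_STRIDE
print(hex(max_register))
```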

can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0 · db72589c

Marc Kleine-Budde authored Jan 14, 2022

In order to optimize FIFO access, especially on m_can cores attached
to slow busses like SPI, in patch

| e3938177 ("can: m_can: Disable IRQs on FIFO bus errors")

bulk read/write support has been added to the m_can_fifo_{read,write}
functions.

That change leads the tcan driver to call
regmap_bulk_{read,write}() with a length of 0 (for CAN frames with 0
data length). regmap treats this as an error:

| tcan4x5x spi1.0 tcan4x5x0: FIFO write returned -22

This patch fixes the problem by not calling the
cdev->ops->{read,write}_fifo() in case of a 0 length read/write.

Fixes: e3938177 ("can: m_can: Disable IRQs on FIFO bus errors")
Link: https://lore.kernel.org/all/20220114155751.2651888-1-mkl@pengutronix.de
Cc: stable@vger.kernel.org
Cc: Matt Kline <matt@bitbashing.io>
Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Reported-by: Michael Anochin <anochin@photo-meter.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

db72589c
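
The guard described in the commit above amounts to an early return. A minimal sketch in Python: `toy_bulk_read` is a hypothetical stand-in for a bus accessor that rejects zero-length transfers with -EINVAL (-22), as in the log line quoted in the message, and `fifo_read` models the fixed wrapper.

```python
EINVAL = 22

def toy_bulk_read(offset, count):
    """Toy bus accessor that rejects a zero-length transfer."""
    if count == 0:
        return -EINVAL
    return 0

def fifo_read(offset, count):
    """Fixed wrapper: skip the bus access when there is nothing to read."""
    if count == 0:
        return 0
    return toy_bulk_read(offset, count)
```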

dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config · 17a30422

Marc Kleine-Budde authored Jan 14, 2022

The tcan4x5x only comes with 2K of MRAM, so an RX FIFO with a depth of 32
doesn't fit into the MRAM. Use a depth of 16 instead.

Fixes: 4edd396a ("dt-bindings: can: tcan4x5x: Add DT bindings for TCAN4x5X driver")
Link: https://lore.kernel.org/all/20220119062951.2939851-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

17a30422
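
The sizing argument in the commit above can be checked with simple arithmetic. The element layout used here is an assumption not stated in the message: each RX FIFO element is taken to be two 32-bit header words plus a 64-byte CAN FD payload.

```python
MRAM_SIZE = 2 * 1024   # bytes, from the commit message
HEADER_BYTES = 8       # assumption: two 32-bit header words per element
DATA_BYTES = 64        # assumption: elements sized for 64-byte CAN FD payloads

def rx_fifo_bytes(depth):
    """Bytes of message RAM needed by an RX FIFO of the given depth."""
    return depth * (HEADER_BYTES + DATA_BYTES)
```

Under these assumptions a depth of 32 needs 2304 bytes, more than the 2048 available, while a depth of 16 needs 1152 bytes and leaves room for other MRAM sections.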

mailmap: update email address of Brian Silverman · 984d1eff

Marc Kleine-Budde authored Jan 10, 2022

Brian Silverman's address at bluerivertech.com is not valid anymore,
use Brian's private email address instead.

Link: https://lore.kernel.org/all/20220110082359.2019735-1-mkl@pengutronix.de
Cc: Brian Silverman <bsilver16384@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

984d1eff

selftests, xsk: Fix rx_full stats test · b4ec6a19

Magnus Karlsson authored Jan 21, 2022

Fix the rx_full stats test so that it correctly reports pass even when
the fill ring is not full of buffers.

Fixes: 872a1184 ("selftests: xsk: Put the same buffer only once in the fill ring")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20220121123508.12759-1-magnus.karlsson@gmail.com

b4ec6a19

bpf: Fix flexible_array.cocci warnings · ed8bb032

kernel test robot authored Jan 22, 2022

Zero-length and one-element arrays are deprecated, see:
Documentation/process/deprecated.rst

Flexible-array members should be used instead.

Generated by: scripts/coccinelle/misc/flexible_array.cocci

Fixes: c1ff181f ("selftests/bpf: Extend kfunc selftests")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/alpine.DEB.2.22.394.2201221206320.12220@hadrien

ed8bb032

net: stmmac: remove unused members in struct stmmac_priv · de8a820d

Jisheng Zhang authored Jan 23, 2022

The tx_coalesce and mii_irq are not used at all now, so remove them.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

de8a820d

net: atlantic: Use the bitmap API instead of hand-writing it · ebe0582b

Christophe JAILLET authored Jan 23, 2022

Simplify code by using bitmap_weight() and bitmap_zero() instead of
hand-writing these functions.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebe0582b

ping: fix the sk_bound_dev_if match in ping_lookup · 2afc3b5a

Xin Long authored Jan 22, 2022

When 'ping' is switched to use a PING socket instead of a RAW socket by:

   # sysctl -w net.ipv4.ping_group_range="0 100"

the selftest 'router_broadcast.sh' will fail, as a command such as

  # ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b

cannot receive the response skb on the PING socket. This is caused by a mismatch
of sk_bound_dev_if and dif in ping_rcv() when looking up the PING socket,
as dif is vrf-h1 if dif's master was set to vrf-h1.

This patch fixes this regression by also checking the sk_bound_dev_if
against sdif so that the packets can still be received even if the socket
is not bound to the vrf device but to the real iif.

Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2afc3b5a
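
The matching rule described in the ping_lookup commit above can be written as a small predicate. This is a sketch of the described logic only: the function name is hypothetical, and the assumption is that an unbound socket (index 0) matches any device, while a bound socket matches when its device is either dif or sdif.

```python
def bound_dev_matches(sk_bound_dev_if, dif, sdif):
    """True if a socket bound to sk_bound_dev_if may receive the packet."""
    if sk_bound_dev_if == 0:
        # Unbound sockets are not restricted to a device.
        return True
    # Accept either the VRF-level index (dif) or the real ingress index (sdif).
    return sk_bound_dev_if in (dif, sdif)
```

For the failing selftest, dif is the index of vrf-h1 and sdif the index of veth0, so a socket bound to veth0 with `ping -I veth0` matches only once sdif is also compared.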

net/smc: Transitional solution for clcsock race issue · c0bf3d8a

Wen Gu authored Jan 22, 2022

We encountered a crash in smc_setsockopt() and it is caused by
accessing smc->clcsock after clcsock was released.

 BUG: kernel NULL pointer dereference, address: 0000000000000020
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E     5.16.0-rc4+ #53
 RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
 Call Trace:
  <TASK>
  __sys_setsockopt+0xfc/0x190
  __x64_sys_setsockopt+0x20/0x30
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f16ba83918e
  </TASK>

This patch tries to fix it by holding clcsock_release_lock and
checking whether clcsock has already been released before access.

Because a crash for the same reason can also happen in smc_getsockopt()
or smc_switch_to_fallback(), this patch checks smc->clcsock
in them too. The caller of smc_switch_to_fallback() will identify
whether fallback succeeds according to the return value.

Fixes: fd57770d ("net/smc: wait for pending work before clcsock release_sock")
Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@linux.ibm.com/T/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0bf3d8a

ibmvnic: remove unused ->wait_capability · 3a5d9db7

Sukadev Bhattiprolu authored Jan 21, 2022

With the previous bug fix, the ->wait_capability flag is no longer needed and can
be removed.

Fixes: 249168ad ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a5d9db7

ibmvnic: don't spin in tasklet · 48079e7f

Sukadev Bhattiprolu authored Jan 21, 2022

ibmvnic_tasklet() continuously spins waiting for responses to all
capability requests. It does this to avoid encountering an error
during initialization of the vnic. However if there is a bug in the
VIOS and we do not receive a response to one or more queries the
tasklet ends up spinning continuously leading to hard lock ups.

If we fail to receive a message from the VIOS it is reasonable to
timeout the login attempt rather than spin indefinitely in the tasklet.

Fixes: 249168ad ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

48079e7f

ibmvnic: init ->running_cap_crqs early · 151b6a5c

Sukadev Bhattiprolu authored Jan 21, 2022

We use ->running_cap_crqs to determine when the ibmvnic_tasklet() should
send out the next protocol message type, i.e. when we get back responses
to all our QUERY_CAPABILITY CRQs we send out REQUEST_CAPABILITY crqs.
Similarly, when we get responses to all the REQUEST_CAPABILITY crqs, we
send out the QUERY_IP_OFFLOAD CRQ.

We currently increment ->running_cap_crqs as we send out each CRQ and
have the ibmvnic_tasklet() send out the next message type, when this
running_cap_crqs count drops to 0.

This assumes that all the CRQs of the current type were sent out before
the count drops to 0. However it is possible that we send out say 6 CRQs,
get preempted and receive all the 6 responses before we send out the
remaining CRQs. This can result in ->running_cap_crqs count dropping to
zero before all messages of the current type were sent and we end up
sending the next protocol message too early.

Instead initialize the ->running_cap_crqs upfront so the tasklet will
only send the next protocol message after all responses are received.

Use the cap_reqs local variable to also detect any discrepancy (either
now or in future) in the number of capability requests we actually send.

Currently only send_query_cap() is affected by this behavior (of sending
next message early) since it is called from the worker thread (during
reset) and from application thread (during ->ndo_open()) and they can be
preempted. send_request_cap() is only called from the tasklet which
processes CRQ responses sequentially, so it is not affected. But to
maintain the existing symmetry with send_query_capability() we update
send_request_capability() also.

151b6a5c
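
The race described in the commit above can be reproduced with a deterministic toy simulation. Everything here is hypothetical modelling, not driver code: `running` stands in for ->running_cap_crqs, and the sender is "preempted" after a fixed number of sends, at which point all outstanding responses are processed.

```python
def simulate(total, preempt_after, init_upfront):
    """Return how many CRQs had been sent when the counter first hit 0.

    total          -- CRQs of the current message type
    preempt_after  -- sender is preempted after this many sends, and all
                      outstanding responses arrive during the preemption
    init_upfront   -- True models the fix (counter set before sending)
    """
    running = total if init_upfront else 0
    sent = 0
    outstanding = 0
    while sent < total:
        sent += 1
        outstanding += 1
        if not init_upfront:
            running += 1          # old scheme: count each CRQ as it goes out
        if sent == preempt_after:
            # Responses for everything sent so far are processed now.
            running -= outstanding
            outstanding = 0
            if running == 0:
                return sent       # next message type would be sent here
    running -= outstanding
    return sent if running == 0 else None
```

With 10 CRQs and a preemption after 6 sends, the old scheme reaches zero after only 6 CRQs, while the upfront initialization reaches zero only after all 10 have been sent and answered.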

ibmvnic: Allow extra failures before disabling · db9f0e8b

Sukadev Bhattiprolu authored Jan 21, 2022

If auto-priority-failover (APF) is enabled and there are at least two
backing devices of different priorities, some resets like fail-over,
change-param etc can cause at least two back to back failovers. (Failover
from high priority backing device to lower priority one and then back
to the higher priority one if that is still functional).

Depending on the timing of the two failovers it is possible to trigger
a "hard" reset and for the hard reset to fail due to failovers. When this
occurs, the driver assumes that the network is unstable and disables the
VNIC for a 60-second "settling time". This in turn can cause the ethtool
command to fail with "No such device" while the vnic automatically recovers
a little while later.

Given that it's possible to have two back to back failures, allow for extra
failures before disabling the vnic for the settling time.

Fixes: f15fde9d ("ibmvnic: delay next reset if hard reset fails")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

db9f0e8b

ipv4: fix ip option filtering for locally generated fragments · 27a8caa5

Jakub Kicinski authored Jan 21, 2022

During IP fragmentation we sanitize IP options. This means overwriting
options which should not be copied with NOPs. Only the first fragment
has the original, full options.

ip_fraglist_prepare() copies the IP header and options from previous
fragment to the next one. Commit 19c3401a ("net: ipv4: place control
buffer handling away from fragmentation iterators") moved sanitizing
options before ip_fraglist_prepare() which means options are sanitized
and then overwritten again with the old values.

Fixing this is not enough, however, nor did the sanitization work
prior to aforementioned commit.

ip_options_fragment() (which does the sanitization) uses ipcb->opt.optlen
for the length of the options. ipcb->opt of fragments is not populated
(it's 0), only the head skb has the state properly built. So even when
called at the right time ip_options_fragment() does nothing. This seems
to date back all the way to v2.5.44 when the fast path for pre-fragmented
skbs had been introduced. Prior to that ip_options_build() would have been
called for every fragment (in fact ever since v2.5.44 the fragmentation
handling in ip_options_build() has been dead code, I'll clean it up in
-next).

In the original patch (see Link) caixf mentions fixing the handling
for fragments other than the second one, but I'm not sure how _any_
fragment could have had their options sanitized with the code
as it stood.

Tested with python (MTU on lo lowered to 1000 to force fragmentation):

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_OPTIONS,
             bytearray([7,4,5,192, 20|0x80,4,1,0]))
s.sendto(b'1'*2000, ('127.0.0.1', 1234))

Before:

IP (tos 0x0, ttl 64, id 1053, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
localhost.36500 > localhost.search-agent: UDP, length 2000
IP (tos 0x0, ttl 64, id 1053, offset 968, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
localhost > localhost: udp
IP (tos 0x0, ttl 64, id 1053, offset 1936, flags [none], proto UDP (17), length 100, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
localhost > localhost: udp

After:

IP (tos 0x0, ttl 96, id 42549, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
localhost.51607 > localhost.search-agent: UDP, bad length 2000 > 960
IP (tos 0x0, ttl 96, id 42549, offset 968, flags [+], proto UDP (17), length 996, options (NOP,NOP,NOP,NOP,RA value 256))
localhost > localhost: udp
IP (tos 0x0, ttl 96, id 42549, offset 1936, flags [none], proto UDP (17), length 100, options (NOP,NOP,NOP,NOP,RA value 256))
localhost > localhost: udp

RA (20 | 0x80) is now copied as expected, RR (7) is "NOPed out".

Link: https://lore.kernel.org/netdev/20220107080559.122713-1-ooppublic@163.com/
Fixes: 19c3401a ("net: ipv4: place control buffer handling away from fragmentation iterators")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: caixf <ooppublic@163.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

27a8caa5
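
The sanitization rule that the commit above relies on can be sketched in user space: options whose type byte lacks the "copied" flag (0x80) are overwritten with NOPs in every fragment after the first. This Python model is an illustration of that rule, not the kernel's ip_options_fragment(); the function name is hypothetical.

```python
IPOPT_END, IPOPT_NOOP, IPOPT_COPY = 0, 1, 0x80

def sanitize_options(opts):
    """Replace options without the copy flag by NOPs (non-first fragments)."""
    out = bytearray(opts)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == IPOPT_END:
            break
        if kind == IPOPT_NOOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break                  # truncated option: stop parsing
        optlen = out[i + 1]
        if optlen < 2 or optlen > len(out) - i:
            break                  # malformed length: stop parsing
        if not (kind & IPOPT_COPY):
            out[i:i + optlen] = bytes([IPOPT_NOOP]) * optlen
        i += optlen
    return bytes(out)
```

Applied to the option bytes from the test script above, `[7,4,5,192, 20|0x80,4,1,0]`, the RR option (7) becomes four NOPs and the RA option (20 | 0x80) is left untouched, which matches the "After" capture.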

net-procfs: show net devices bound packet types · 1d10f8a1

Jianguo Wu authored Jan 21, 2022

After commit 7866a621 ("dev: add per net_device packet type chains"),
we cannot get packet types that are bound to a specified net device via
/proc/net/ptype. This patch fixes the regression.

Run "tcpdump -i ens192 udp -nns0" before and after applying this patch:

Before:
  [root@localhost ~]# cat /proc/net/ptype
  Type Device      Function
  0800          ip_rcv
  0806          arp_rcv
  86dd          ipv6_rcv

After:
  [root@localhost ~]# cat /proc/net/ptype
  Type Device      Function
  ALL  ens192   tpacket_rcv
  0800          ip_rcv
  0806          arp_rcv
  86dd          ipv6_rcv

v1 -> v2:
  - fix the regression rather than adding new /proc API as
    suggested by Stephen Hemminger.

Fixes: 7866a621 ("dev: add per net_device packet type chains")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d10f8a1

bonding: use rcu_dereference_rtnl when get bonding active slave · aa603467

Hangbin Liu authored Jan 21, 2022

bond_option_active_slave_get_rcu() should not be used under rtnl_mutex as it
uses rcu_dereference(). Replace it with rcu_dereference_rtnl() so we can also
use this function in rtnl protected context.

With this update, we can remove the rcu_read_lock/unlock in
bonding .ndo_eth_ioctl and .get_ts_info.

Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Fixes: 94dd016a ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa603467