Commits · 2e7256f12cdb16eaa2515b6231d665044a07c51a · Kirill Smelkov / linux

24 Jun, 2021 22 commits

Sasha Neftin authored Jun 24, 2021

Complete to commit def4ec6d ("e1000e: PCIm function state support")
Check the PCIm state only on CSME systems. There is no point to do this
check on non CSME systems.
This patch fixes a generation a false-positive warning:
"Error in exiting dmoff"

Fixes: def4ec6d ("e1000e: PCIm function state support")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e7256f1

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · fd7ce282

David S. Miller authored Jun 24, 2021

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-06-24

This series contains updates to i40e driver only.

Dinghao Liu corrects error handling for failed i40e_vsi_request_irq()
call.

Mateusz allows for disabling of autonegotiation for all BaseT media.

Jesse corrects the multiplier being used on 5Gb speeds for PTP.

Jan adds locking in paths calling i40e_setup_pf_switch() that were
missing it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fd7ce282

ipv6: fix out-of-bound access in ip6_parse_tlv() · 624085a3

Eric Dumazet authored Jun 24, 2021

First problem is that optlen is fetched without checking
there is more than one byte to parse.

Fix this by taking care of IPV6_TLV_PAD1 before
fetching optlen (under appropriate sanity checks against len)

Second problem is that IPV6_TLV_PADN checks of zero
padding are performed before the check of remaining length.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Fixes: c1412fce ("net/ipv6/exthdrs.c: Strict PadN option checking")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

624085a3

Merge branch 'macsec-key-length' · d9b6d26f

David S. Miller authored Jun 24, 2021

Antoine Tenart says:

====================
net: macsec: fix key length when offloading

The key length used to copy the key to offloading drivers and to store
it is wrong and was working by chance as it matched the default key
length. But using a different key length fails. Fix it by using instead
the max length accepted in uAPI to store the key and the actual key
length when copying it.

This was tested on the MSCC PHY driver but not on the Atlantic MAC
(looking at the code it looks ok, but testing would be appreciated).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d9b6d26f

net: atlantic: fix the macsec key length · d67fb477

Antoine Tenart authored Jun 24, 2021

The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.

Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).

Fixes: 27736563 ("net: atlantic: MACSec egress offload implementation")
Fixes: 9ff40a75 ("net: atlantic: MACSec ingress offload implementation")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d67fb477

net: phy: mscc: fix macsec key length · c309217f

Antoine Tenart authored Jun 24, 2021

The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.

Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).

Fixes: 28c5107a ("net: phy: mscc: macsec support")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c309217f

net: macsec: fix the length used to copy the key for offloading · 1f7fe512

Antoine Tenart authored Jun 24, 2021

The key length used when offloading macsec to Ethernet or PHY drivers
was set to MACSEC_KEYID_LEN (16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.

Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length
accepted in uAPI) and secy->key_len to copy it.

Fixes: 3cf3227a ("net: macsec: hardware offloading infrastructure")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f7fe512

Merge tag 'linux-can-fixes-for-5.13-20210624' of git://git.kernel.org/ · abe90454

David S. Miller authored Jun 24, 2021

pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-06-24

this is a pull request of 2 patches for net/master.

The first patch is by Norbert Slusarek and prevent allocation of
filter for optlen == 0 in the j1939 CAN protocol.

The last patch is by Stephane Grosjean and fixes a potential
starvation in the TX path of the peak_pciefd driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

abe90454

Merge branch 'ibmvnic-fixes' · ede285b1

David S. Miller authored Jun 24, 2021

Sukadev Bhattiprolu says:

====================
ibmvnic: Assorted bug fixes

Assorted bug fixes that we tested over the last several weeks.

Thanks to Brian King, Cris Forno, Dany Madden and Rick Lindsley for
reviews and help with testing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ede285b1

ibmvnic: parenthesize a check · 154b3b2a

Sukadev Bhattiprolu authored Jun 23, 2021

Parenthesize a check to be more explicit and to fix a sparse warning
seen on some distros.

Fixes: 91dc5d25 ("ibmvnic: fix miscellaneous checks")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

154b3b2a

ibmvnic: free tx_pool if tso_pool alloc fails · f6ebca8e

Sukadev Bhattiprolu authored Jun 23, 2021

Free tx_pool and clear it, if allocation of tso_pool fails.

release_tx_pools() assumes we have both tx and tso_pools if ->tx_pool is
non-NULL. If allocation of tso_pool fails in init_tx_pools(), the assumption
will not be true and we would end up dereferencing ->tx_buff, ->free_map
fields from a NULL pointer.

Fixes: 3205306c ("ibmvnic: Update TX pool initialization routine")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6ebca8e

ibmvnic: set ltb->buff to NULL after freeing · 552a3372

Sukadev Bhattiprolu authored Jun 23, 2021

free_long_term_buff() checks ltb->buff to decide whether we have a long
term buffer to free. So set ltb->buff to NULL afer freeing. While here,
also clear ->map_id, fix up some coding style and log an error.

Fixes: 9c4eaabd ("Check CRQ command return codes")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

552a3372

ibmvnic: account for bufs already saved in indir_buf · 72368f8b

Sukadev Bhattiprolu authored Jun 23, 2021

This fixes a crash in replenish_rx_pool() when called from ibmvnic_poll()
after a previous call to replenish_rx_pool() encountered an error when
allocating a socket buffer.

Thanks to Rick Lindsley and Dany Madden for helping debug the crash.

Fixes: 4f0b6812 ("ibmvnic: Introduce batched RX buffer descriptor transmission")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72368f8b

ibmvnic: clean pending indirect buffs during reset · 65d6470d

Sukadev Bhattiprolu authored Jun 23, 2021

We batch subordinate command response queue (scrq) descriptors that we
need to send to the VIOS using an "indirect" buffer. If after we queue
one or more scrqs in the indirect buffer encounter an error (say fail
to allocate an skb), we leave the queued scrq descriptors in the
indirect buffer until the next call to ibmvnic_xmit().

On the next call to ibmvnic_xmit(), it is possible that the adapter is
going through a reset and it is possible that the long term  buffers
have been unmapped on the VIOS side. If we proceed to flush (send) the
packets that are in the indirect buffer, we will end up using the old
map ids and this can cause the VIOS to trigger an unnecessary FATAL
error reset.

Instead of flushing packets remaining on the indirect_buff, discard
(clean) them instead.

Fixes: 0d973388 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65d6470d

Revert "ibmvnic: remove duplicate napi_schedule call in open function" · 2ca220f9

Dany Madden authored Jun 23, 2021

This reverts commit 7c451f3e.

When a vnic interface is taken down and then up, connectivity is not
restored. We bisected it to this commit. Reverting this commit until
we can fully investigate the issue/benefit of the change.

Fixes: 7c451f3e ("ibmvnic: remove duplicate napi_schedule call in open function")
Reported-by: Cristobal Forno <cforno12@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ca220f9

Revert "ibmvnic: simplify reset_long_term_buff function" · 0ec13aff

Sukadev Bhattiprolu authored Jun 23, 2021

This reverts commit 1c7d45e7.

We tried to optimize the number of hcalls we send and skipped sending
the REQUEST_MAP calls for some maps. However during resets, we need to
resend all the maps to the VIOS since the VIOS does not remember the
old values. In fact we may have failed over to a new VIOS which will
not have any of the mappings.

When we send packets with map ids the VIOS does not know about, it
triggers a FATAL reset. While the client does recover from the FATAL
error reset, we are seeing a large number of such resets. Handling
FATAL resets is lot more unnecessary work than issuing a few more
hcalls so revert the commit and resend the maps to the VIOS.

Fixes: 1c7d45e7 ("ibmvnic: simplify reset_long_term_buff function")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ec13aff

i40e: Fix missing rtnl locking when setting up pf switch · 956e759d

Jan Sokolowski authored Jun 11, 2021

A recent change that made i40e use new udp_tunnel infrastructure
uses a method that expects to be called under rtnl lock.

However, not all codepaths do the lock prior to calling
i40e_setup_pf_switch.

Fix that by adding additional rtnl locking and unlocking.

Fixes: 40a98cb6 ("i40e: convert to new udp_tunnel infrastructure")
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

956e759d

i40e: fix PTP on 5Gb links · 26b0ce8d

Jesse Brandeburg authored May 07, 2021

As reported by Alex Sergeev, the i40e driver is incrementing the PTP
clock at 40Gb speeds when linked at 5Gb. Fix this bug by making
sure that the right multiplier is selected when linked at 5Gb.

Fixes: 3dbdd6c2 ("i40e: Add support for 5Gbps cards")
Cc: stable@vger.kernel.org
Reported-by: Alex Sergeev <asergeev@carbonrobotics.com>
Suggested-by: Alex Sergeev <asergeev@carbonrobotics.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

26b0ce8d

i40e: Fix autoneg disabling for non-10GBaseT links · 9262793e

Mateusz Palczewski authored Mar 10, 2021

Disabling autonegotiation was allowed only for 10GBaseT PHY.
The condition was changed to check if link media type is BaseT.

Fixes: 3ce12ee9 ("i40e: Fix order of checks when enabling/disabling autoneg in ethtool")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Karen Sornek <karen.sornek@intel.com>
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

9262793e

i40e: Fix error handling in i40e_vsi_open · 9c04cfcd

Dinghao Liu authored Feb 28, 2021

When vsi->type == I40E_VSI_FDIR, we have caught the return value of
i40e_vsi_request_irq() but without further handling. Check and execute
memory clean on failure just like the other i40e_vsi_request_irq().

Fixes: 8a9eb7d3 ("i40e: rework fdir setup and teardown")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

9c04cfcd

can: peak_pciefd: pucan_handle_status(): fix a potential starvation issue in TX path · b17233d3

Stephane Grosjean authored Jun 23, 2021

Rather than just indicating that transmission can start, this patch
requires the explicit flushing of the network TX queue when the driver
is informed by the device that it can transmit, next to its
configuration.

In this way, if frames have already been written by the application,
they will actually be transmitted.

Fixes: ffd137f7 ("can: peak/pcie_fd: remove useless code when interface starts")
Link: https://lore.kernel.org/r/20210623142600.149904-1-s.grosjean@peak-system.com
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

b17233d3

can: j1939: j1939_sk_setsockopt(): prevent allocation of j1939 filter for optlen == 0 · aaf473d0

Norbert Slusarek authored Jun 20, 2021

If optval != NULL and optlen == 0 are specified for SO_J1939_FILTER in
j1939_sk_setsockopt(), memdup_sockptr() will return ZERO_PTR for 0
size allocation. The new filter will be mistakenly assigned ZERO_PTR.
This patch checks for optlen != 0 and filter will be assigned NULL in
case of optlen == 0.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/r/20210620123842.117975-1-nslusarek@gmx.netSigned-off-by: Norbert Slusarek <nslusarek@gmx.net>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

aaf473d0

23 Jun, 2021 5 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · c2f5c57d

David S. Miller authored Jun 23, 2021

Daniel Borkmann says:

====================
pull-request: bpf 2021-06-23

The following pull-request contains BPF updates for your *net* tree.

We've added 14 non-merge commits during the last 6 day(s) which contain
a total of 13 files changed, 137 insertions(+), 64 deletions(-).

Note that when you merge net into net-next, there is a small merge conflict
between 9f2470fb ("skmsg: Improve udp_bpf_recvmsg() accuracy") from bpf
with c49661aa ("skmsg: Remove unused parameters of sk_msg_wait_data()")
from net-next. Resolution is to: i) net/ipv4/udp_bpf.c: take udp_msg_wait_data()
and remove err parameter from the function, ii) net/ipv4/tcp_bpf.c: take
tcp_msg_wait_data() and remove err parameter from the function, iii) for
net/core/skmsg.c and include/linux/skmsg.h: remove the sk_msg_wait_data()
implementation and its prototype in header.

The main changes are:

1) Fix BPF poke descriptor adjustments after insn rewrite, from John Fastabend.

2) Fix regression when using BPF_OBJ_GET with non-O_RDWR flags, from Maciej Żenczykowski.

3) Various bug and error handling fixes for UDP-related sock_map, from Cong Wang.

4) Fix patching of vmlinux BTF IDs with correct endianness, from Tony Ambardar.

5) Two fixes for TX descriptor validation in AF_XDP, from Magnus Karlsson.

6) Fix overflow in size calculation for bpf_map_area_alloc(), from Bui Quang Minh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c2f5c57d

ipv6: exthdrs: do not blindly use init_net · bcc3f2a8

Eric Dumazet authored Jun 23, 2021

I see no reason why max_dst_opts_cnt and max_hbh_opts_cnt
are fetched from the initial net namespace.

The other sysctls (max_dst_opts_len & max_hbh_opts_len)
are in fact already using the current ns.

Note: it is not clear why ipv6_destopt_rcv() use two ways to
get to the netns :

 1) dev_net(dst->dev)
    Originally used to increment IPSTATS_MIB_INHDRERRORS

 2) dev_net(skb->dev)
     Tom used this variant in his patch.

Maybe this calls to use ipv6_skb_net() instead ?

Fixes: 47d3d7ac ("ipv6: Implement limits on Hop-by-Hop and Destination options")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@quantonium.net>
Cc: Coco Li <lixiaoyan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcc3f2a8

net: bcmgenet: Fix attaching to PYH failed on RPi 4B · b2ac9800

Jian-Hong Pan authored Jun 23, 2021

The Broadcom UniMAC MDIO bus from mdio-bcm-unimac module comes too late.
So, GENET cannot find the ethernet PHY on UniMAC MDIO bus. This leads
GENET fail to attach the PHY as following log:

bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
...
could not attach to PHY
bcmgenet fd580000.ethernet eth0: failed to connect to PHY
uart-pl011 fe201000.serial: no DMA platform data
libphy: bcmgenet MII bus: probed
...
unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus

This patch adds the soft dependency to load mdio-bcm-unimac module
before genet module to avoid the issue.

Fixes: 9a4e7969 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=213485Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b2ac9800

bonding: allow nesting of bonding device · 4d293fe1

Di Zhu authored Jun 23, 2021

The commit 3c9ef511 ("bonding: avoid adding slave device with
IFF_MASTER flag") fix a crash when add slave device with IFF_MASTER,
but it rejects the scenario of nested bonding device.

As Eric Dumazet described: since there indeed is a usage scenario about
nesting bonding, we should not break it.

So we add a new judgment condition to allow nesting of bonding device.

Fixes: 3c9ef511 ("bonding: avoid adding slave device with IFF_MASTER flag")
Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Di Zhu <zhudi21@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d293fe1

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 7c2becf7

David S. Miller authored Jun 23, 2021

Steffen Klassert says:

====================
pull request (net): ipsec 2021-06-23

1) Don't return a mtu smaller than 1280 on IPv6 pmtu discovery.
   From Sabrina Dubroca

2) Fix seqcount rcu-read side in xfrm_policy_lookup_bytype
   for the PREEMPT_RT case. From Varad Gautam.

3) Remove a repeated declaration of xfrm_parse_spi.
   From Shaokun Zhang.

4) IPv4 beet mode can't handle fragments, but IPv6 does.
   commit 68dc022d ("xfrm: BEET mode doesn't support
   fragments for inner packets") handled IPv4 and IPv6
   the same way. Relax the check for IPv6 because fragments
   are possible here. From Xin Long.

5) Memory allocation failures are not reported for
   XFRMA_ENCAP and XFRMA_COADDR in xfrm_state_construct.
   Fix this by moving both cases in front of the function.

6) Fix a missing initialization in the xfrm offload fallback
   fail case for bonding devices. From Ayush Sawal.

Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7c2becf7

22 Jun, 2021 13 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f4b29d2e

David S. Miller authored Jun 22, 2021

Pablo Neira Ayuso says

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Nicolas Dichtel updates MAINTAINERS file to add Netfilter IRC channel.

2) Skip non-IPv6 packets in nft_exthdr.

3) Skip non-TCP packets in nft_osf.

4) Skip non-TCP/UDP packets in nft_tproxy.

5) Memleak in hardware offload infrastructure when counters are used
   for first time in a rule.

6) The VLAN transfer routine must use FLOW_DISSECTOR_KEY_BASIC instead
   of FLOW_DISSECTOR_KEY_CONTROL. Moreover, make a more robust check
   for 802.1q and 802.1ad to restore simple matching on transport
   protocols.

7) Fix bogus EPERM when listing a ruleset when table ownership flag
   is set on.

8) Honor table ownership flag when table is referenced by handle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f4b29d2e

bpf: Fix null ptr deref with mixed tail calls and subprogs · 7506d211

John Fastabend authored Jun 16, 2021

The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and
then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the
individual JITs. The poke_tab[] to use is stored in the insn->imm by
the code adding it to that array slot. The JIT then uses imm to find the
right entry for an individual instruction. In the x86 bpf_jit_comp.c
this is done by calling emit_bpf_tail_call_direct with the poke_tab[]
of the imm value.

However, we observed the below null-ptr-deref when mixing tail call
programs with subprog programs. For this to happen we just need to
mix bpf-2-bpf calls and tailcalls with some extra calls or instructions
that would be patched later by one of the fixup routines. So whats
happening?

Before the fixup_call_args() -- where the jit op is done -- various
code patching is done by do_misc_fixups(). This may increase the
insn count, for example when we patch map_lookup_up using map_gen_lookup
hook. This does two things. First, it means the instruction index,
insn_idx field, of a tail call instruction will move by a 'delta'.

In verifier code,

 struct bpf_jit_poke_descriptor desc = {
  .reason = BPF_POKE_REASON_TAIL_CALL,
  .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
  .tail_call.key = bpf_map_key_immediate(aux),
  .insn_idx = i + delta,
 };

Then subprog start values subprog_info[i].start will be updated
with the delta and any poke descriptor index will also be updated
with the delta in adjust_poke_desc(). If we look at the adjust
subprog starts though we see its only adjusted when the delta
occurs before the new instructions,

        /* NOTE: fake 'exit' subprog should be updated as well. */
        for (i = 0; i <= env->subprog_cnt; i++) {
                if (env->subprog_info[i].start <= off)
                        continue;

Earlier subprograms are not changed because their start values
are not moved. But, adjust_poke_desc() does the offset + delta
indiscriminately. The result is poke descriptors are potentially
corrupted.

Then in jit_subprogs() we only populate the poke_tab[]
when the above insn_idx is less than the next subprogram start. From
above we corrupted our insn_idx so we might incorrectly assume a
poke descriptor is not used in a subprogram omitting it from the
subprogram. And finally when the jit runs it does the deref of poke_tab
when emitting the instruction and crashes with below. Because earlier
step omitted the poke descriptor.

The fix is straight forward with above context. Simply move same logic
from adjust_subprog_starts() into adjust_poke_descs() and only adjust
insn_idx when needed.

[   82.396354] bpf_testmod: version magic '5.12.0-rc2alu+ SMP preempt mod_unload ' should be '5.12.0+ SMP preempt mod_unload '
[   82.623001] loop10: detected capacity change from 0 to 8
[   88.487424] ==================================================================
[   88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290
[   88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295
[   88.487471] CPU: 7 PID: 5295 Comm: test_progs Tainted: G          I       5.12.0+ #386
[   88.487483] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[   88.487490] Call Trace:
[   88.487498]  dump_stack+0x93/0xc2
[   88.487515]  kasan_report.cold+0x5f/0xd8
[   88.487530]  ? do_jit+0x184a/0x3290
[   88.487542]  do_jit+0x184a/0x3290
 ...
[   88.487709]  bpf_int_jit_compile+0x248/0x810
 ...
[   88.487765]  bpf_check+0x3718/0x5140
 ...
[   88.487920]  bpf_prog_load+0xa22/0xf10

Fixes: a748c697 ("bpf: propagate poke descriptors to subprograms")
Reported-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>

7506d211

net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues · ce8eb4c7

Vignesh Raghavendra authored Jun 22, 2021

When changing number of TX queues using ethtool:

	# ethtool -L eth0 tx 1
	[  135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000
	[...]
	[  135.525128] Call trace:
	[  135.525142]  dma_release_from_dev_coherent+0x2c/0xb0
	[  135.525148]  dma_free_attrs+0x54/0xe0
	[  135.525156]  k3_cppi_desc_pool_destroy+0x50/0xa0
	[  135.525164]  am65_cpsw_nuss_remove_tx_chns+0x88/0xdc
	[  135.525171]  am65_cpsw_set_channels+0x3c/0x70
	[...]

This is because k3_cppi_desc_pool_destroy() which is called after
k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns()
references struct device that is unregistered at the end of
k3_udma_glue_release_tx_chn()

Therefore the right order is to call k3_cppi_desc_pool_destroy() and
destroy desc pool before calling k3_udma_glue_release_tx_chn().
Fix this throughout the driver.

Fixes: 93a76530 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce8eb4c7

net: broadcom: bcm4908_enet: reset DMA rings sw indexes properly · ddeacc4f

Rafał Miłecki authored Jun 22, 2021

Resetting software indexes in bcm4908_dma_alloc_buf_descs() is not
enough as it's called during device probe only. Driver resets DMA on
every .ndo_open callback and it's required to reset indexes then.

This fixes inconsistent rings state and stalled traffic after interface
down & up sequence.

Fixes: 4feffead ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

ddeacc4f

net/ipv4: swap flow ports when validating source · c69f114d

Miao Wang authored Jun 22, 2021

When doing source address validation, the flowi4 struct used for
fib_lookup should be in the reverse direction to the given skb.
fl4_dport and fl4_sport returned by fib4_rules_early_flow_dissect
should thus be swapped.

Fixes: 5a847a6e ("net/ipv4: Initialize proto and ports in flow struct")
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c69f114d

bonding: avoid adding slave device with IFF_MASTER flag · 3c9ef511

Di Zhu authored Jun 22, 2021

The following steps will definitely cause the kernel to crash:
	ip link add vrf1 type vrf table 1
	modprobe bonding.ko max_bonds=1
	echo "+vrf1" >/sys/class/net/bond0/bonding/slaves
	rmmod bonding

The root cause is that: When the VRF is added to the slave device,
it will fail, and some cleaning work will be done. because VRF device
has IFF_MASTER flag, cleanup process  will not clear the IFF_BONDING flag.
Then, when we unload the bonding module, unregister_netdevice_notifier()
will treat the VRF device as a bond master device and treat netdev_priv()
as struct bonding{} which actually is struct net_vrf{}.

By analyzing the processing logic of bond_enslave(), it seems that
it is not allowed to add the slave device with the IFF_MASTER flag, so
we need to add a code check for this situation.
Signed-off-by: Di Zhu <zhudi21@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c9ef511

ip6_tunnel: fix GRE6 segmentation · a6e3f298

Jakub Kicinski authored Jun 21, 2021

Commit 6c11fbf9 ("ip6_tunnel: add MPLS transmit support")
moved assiging inner_ipproto down from ipxip6_tnl_xmit() to
its callee ip6_tnl_xmit(). The latter is also used by GRE.

Since commit 38720352 ("gre: Use inner_proto to obtain inner
header protocol") GRE had been depending on skb->inner_protocol
during segmentation. It sets it in gre_build_header() and reads
it in gre_gso_segment(). Changes to ip6_tnl_xmit() overwrite
the protocol, resulting in GSO skbs getting dropped.

Note that inner_protocol is a union with inner_ipproto,
GRE uses the former while the change switched it to the latter
(always setting it to just IPPROTO_GRE).

Restore the original location of skb_set_inner_ipproto(),
it is unclear why it was moved in the first place.

Fixes: 6c11fbf9 ("ip6_tunnel: add MPLS transmit support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6e3f298

Merge branch 'mptcp-fixes' · e596212e

David S. Miller authored Jun 22, 2021

Mat Martineau says:

====================
mptcp: Fixes for v5.13

Here are two MPTCP fixes from Paolo.

Patch 1 fixes some possible connect-time race conditions with
MPTCP-level connection state changes.

Patch 2 deletes a duplicate function declaration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e596212e

mptcp: drop duplicate mptcp_setsockopt() declaration · 597dbae7

Paolo Abeni authored Jun 21, 2021

commit 78962489 ("mptcp: add skeleton to sync msk socket
options to subflows") introduced a duplicate declaration of
mptcp_setsockopt(), just drop it.
Reported-by: Florian Westphal <fw@strlen.de>
Fixes: 78962489 ("mptcp: add skeleton to sync msk socket options to subflows")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

597dbae7

mptcp: avoid race on msk state changes · 490274b4

Paolo Abeni authored Jun 21, 2021

The msk socket state is currently updated in a few spots without
owning the msk socket lock itself.

Some of such operations are safe, as they happens before exposing
the msk socket to user-space and can't race with other changes.

A couple of them, at connect time, can actually race with close()
or shutdown(), leaving breaking the socket state machine.

This change addresses the issue moving such update under the msk
socket lock with the usual:

<acquire spinlock>
<check sk lock onwers>
<ev defer to release_cb>

scheme.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/56
Fixes: 8fd73804 ("mptcp: fallback in case of simultaneous connect")
Fixes: c3c123d1 ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

490274b4

bpf: Fix integer overflow in argument calculation for bpf_map_area_alloc · 7dd5d437

Bui Quang Minh authored Jun 13, 2021

In 32-bit architecture, the result of sizeof() is a 32-bit integer so
the expression becomes the multiplication between 2 32-bit integer which
can potentially leads to integer overflow. As a result,
bpf_map_area_alloc() allocates less memory than needed.

Fix this by casting 1 operand to u64.

Fixes: 0d2c4f96 ("bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps")
Fixes: 99c51064 ("devmap: Use bpf_map_area_alloc() for allocating hash buckets")
Fixes: 546ac1ff ("bpf: add devmap, a map for storing net device references")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210613143440.71975-1-minhquangbui99@gmail.com

7dd5d437

sfc: avoid duplicated code in ef10_sriov · 3ddd6e2f

Íñigo Huguet authored Jun 21, 2021

The fail path of efx_ef10_sriov_alloc_vf_vswitching is identical to the
full content of efx_ef10_sriov_free_vf_vswitching, so replace it for a
single call to efx_ef10_sriov_free_vf_vswitching.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ddd6e2f

sfc: explain that "attached" VFs only refer to Xen · 9a022e76

Íñigo Huguet authored Jun 21, 2021

During SRIOV disabling it is checked wether any VF is currently attached
to a guest, using pci_vfs_assigned function. However, this check only
works with VFs attached with Xen, not with vfio/KVM. Added comments
clarifying this point.

Also, replaced manual check of PCI_DEV_FLAGS_ASSIGNED flag and used the
helper function pci_is_dev_assigned instead.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a022e76