1. 24 Jun, 2021 24 commits
  2. 23 Jun, 2021 5 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · c2f5c57d
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-06-23
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 14 non-merge commits during the last 6 day(s) which contain
      a total of 13 files changed, 137 insertions(+), 64 deletions(-).
      
      Note that when you merge net into net-next, there is a small merge conflict
      between 9f2470fb ("skmsg: Improve udp_bpf_recvmsg() accuracy") from bpf
      and c49661aa ("skmsg: Remove unused parameters of sk_msg_wait_data()")
      from net-next. The resolution is to: i) net/ipv4/udp_bpf.c: take udp_msg_wait_data()
      and remove the err parameter from the function, ii) net/ipv4/tcp_bpf.c: take
      tcp_msg_wait_data() and remove the err parameter from the function, iii) for
      net/core/skmsg.c and include/linux/skmsg.h: remove the sk_msg_wait_data()
      implementation and its prototype in the header.
      
      The main changes are:
      
      1) Fix BPF poke descriptor adjustments after insn rewrite, from John Fastabend.
      
      2) Fix regression when using BPF_OBJ_GET with non-O_RDWR flags, from Maciej Żenczykowski.
      
      3) Various bug and error handling fixes for UDP-related sock_map, from Cong Wang.
      
      4) Fix patching of vmlinux BTF IDs with correct endianness, from Tony Ambardar.
      
      5) Two fixes for TX descriptor validation in AF_XDP, from Magnus Karlsson.
      
      6) Fix overflow in size calculation for bpf_map_area_alloc(), from Bui Quang Minh.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
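      Item 6 names the fix but does not show the patch. The sketch below is only a hedged illustration of the general bug class, and it assumes a 32-bit int. Multiplying two 32-bit counts before widening the result can wrap around. Widening one operand first keeps the full 64-bit size. Both helper names are hypothetical and are not kernel functions.

      ```c
      #include <stdint.h>

      /* Hypothetical helpers; neither is a kernel function. */

      /* Buggy pattern: with a 32-bit int, the product is computed in 32 bits
       * and can wrap before it is widened to the 64-bit return type. */
      static uint64_t area_size_wrapping(uint32_t max_entries, uint32_t elem_size)
      {
      	return max_entries * elem_size;
      }

      /* Fixed pattern: widen one operand first so the whole product is
       * computed in 64 bits. */
      static uint64_t area_size_widened(uint32_t max_entries, uint32_t elem_size)
      {
      	return (uint64_t)max_entries * elem_size;
      }
      ```

      For example, 2^28 entries of 64 bytes each need 2^34 bytes. The wrapping variant returns 0, and the widened variant returns the real size.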
    • ipv6: exthdrs: do not blindly use init_net · bcc3f2a8
      Eric Dumazet authored
      I see no reason why max_dst_opts_cnt and max_hbh_opts_cnt
      are fetched from the initial net namespace.
      
      The other sysctls (max_dst_opts_len & max_hbh_opts_len)
      are in fact already using the current ns.
      
      Note: it is not clear why ipv6_destopt_rcv() uses two ways to
      get to the netns:
      
       1) dev_net(dst->dev)
          Originally used to increment IPSTATS_MIB_INHDRERRORS
      
       2) dev_net(skb->dev)
          Tom used this variant in his patch.
      
      Maybe this calls for using ipv6_skb_net() instead?
      
      Fixes: 47d3d7ac ("ipv6: Implement limits on Hop-by-Hop and Destination options")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Cc: Coco Li <lixiaoyan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: Fix attaching to PYH failed on RPi 4B · b2ac9800
      Jian-Hong Pan authored
      The Broadcom UniMAC MDIO bus from the mdio-bcm-unimac module comes up too
      late, so GENET cannot find the Ethernet PHY on the UniMAC MDIO bus. This
      causes GENET to fail to attach the PHY, as shown in the following log:
      
      bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
      ...
      could not attach to PHY
      bcmgenet fd580000.ethernet eth0: failed to connect to PHY
      uart-pl011 fe201000.serial: no DMA platform data
      libphy: bcmgenet MII bus: probed
      ...
      unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus
      
      This patch adds a soft dependency so that the mdio-bcm-unimac module is
      loaded before the genet module, which avoids the issue.
      
      Fixes: 9a4e7969 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=213485
      Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: allow nesting of bonding device · 4d293fe1
      Di Zhu authored
      Commit 3c9ef511 ("bonding: avoid adding slave device with
      IFF_MASTER flag") fixed a crash when adding a slave device with
      IFF_MASTER, but it also rejects the nested bonding device scenario.
      
      As Eric Dumazet pointed out, nested bonding is a real use case,
      so we should not break it.
      
      So add a new condition that allows nesting of bonding devices.
      
      Fixes: 3c9ef511 ("bonding: avoid adding slave device with IFF_MASTER flag")
      Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Di Zhu <zhudi21@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
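      Below is a minimal userspace sketch of the relaxed condition. It assumes that a slave with IFF_MASTER set is rejected unless that slave is itself a bond master, meaning it has IFF_MASTER plus the IFF_BONDING private flag. The struct, flag values, helper names and the -EPERM error code are hypothetical stand-ins, not the kernel's definitions.

      ```c
      #include <errno.h>
      #include <stdbool.h>

      /* Hypothetical stand-ins for IFF_MASTER (dev->flags) and
       * IFF_BONDING (dev->priv_flags); the values are arbitrary. */
      #define SKETCH_IFF_MASTER  0x1u
      #define SKETCH_IFF_BONDING 0x2u

      struct dev_sketch {
      	unsigned int flags;
      	unsigned int priv_flags;
      };

      /* A bond master carries both the master flag and the bonding flag. */
      static bool is_bond_master(const struct dev_sketch *d)
      {
      	return (d->flags & SKETCH_IFF_MASTER) &&
      	       (d->priv_flags & SKETCH_IFF_BONDING);
      }

      /* Other masters (e.g. a VRF) are still refused; bonds may be nested. */
      static int enslave_check_nested(const struct dev_sketch *slave)
      {
      	if ((slave->flags & SKETCH_IFF_MASTER) && !is_bond_master(slave))
      		return -EPERM;
      	return 0;
      }
      ```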
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 7c2becf7
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2021-06-23
      
      1) Don't return an MTU smaller than 1280 on IPv6 PMTU discovery.
         From Sabrina Dubroca.
      
      2) Fix seqcount rcu-read side in xfrm_policy_lookup_bytype
         for the PREEMPT_RT case. From Varad Gautam.
      
      3) Remove a repeated declaration of xfrm_parse_spi.
         From Shaokun Zhang.
      
      4) IPv4 BEET mode can't handle fragments, but IPv6 can.
         Commit 68dc022d ("xfrm: BEET mode doesn't support
         fragments for inner packets") handled IPv4 and IPv6
         the same way. Relax the check for IPv6 because fragments
         are possible here. From Xin Long.
      
      5) Memory allocation failures are not reported for
         XFRMA_ENCAP and XFRMA_COADDR in xfrm_state_construct.
         Fix this by moving both cases to the front of the function.
      
      6) Fix a missing initialization in the xfrm offload fallback
         fail case for bonding devices. From Ayush Sawal.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
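      Here is a minimal sketch of item 1. It assumes the fix amounts to clamping the computed IPv6 path MTU to the 1280-byte minimum link MTU that IPv6 (RFC 8200) requires. The function name is hypothetical. Only the constant's name and value mirror the kernel's IPV6_MIN_MTU.

      ```c
      #include <stdint.h>

      /* IPv6 requires every link to carry packets of at least 1280 bytes
       * (RFC 8200); the kernel calls this constant IPV6_MIN_MTU. */
      #define IPV6_MIN_MTU 1280u

      /* Hypothetical helper: never report a path MTU below the IPv6 minimum. */
      static uint32_t ipv6_pmtu_clamp(uint32_t mtu)
      {
      	return mtu < IPV6_MIN_MTU ? IPV6_MIN_MTU : mtu;
      }
      ```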
  3. 22 Jun, 2021 11 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f4b29d2e
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Nicolas Dichtel updates MAINTAINERS file to add Netfilter IRC channel.
      
      2) Skip non-IPv6 packets in nft_exthdr.
      
      3) Skip non-TCP packets in nft_osf.
      
      4) Skip non-TCP/UDP packets in nft_tproxy.
      
      5) Memory leak in the hardware offload infrastructure when counters are
         used for the first time in a rule.
      
      6) The VLAN transfer routine must use FLOW_DISSECTOR_KEY_BASIC instead
         of FLOW_DISSECTOR_KEY_CONTROL. Moreover, make a more robust check
         for 802.1q and 802.1ad to restore simple matching on transport
         protocols.
      
      7) Fix bogus EPERM when listing a ruleset when the table ownership
         flag is set.
      
      8) Honor the table ownership flag when the table is referenced by handle.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Fix null ptr deref with mixed tail calls and subprogs · 7506d211
      John Fastabend authored
      The sub-program's prog->aux->poke_tab[] is populated in jit_subprogs() and
      then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the
      individual JITs. The index of the poke_tab[] entry to use is stored in
      insn->imm by the code that adds it to that array slot. The JIT then uses
      imm to find the right entry for an individual instruction. In the x86
      bpf_jit_comp.c, this is done by calling emit_bpf_tail_call_direct() with
      the poke_tab[] entry selected by the imm value.
      
      However, we observed the null-ptr-deref below when mixing tail call
      programs with subprog programs. For this to happen, we just need to
      mix bpf-2-bpf calls and tail calls with some extra calls or instructions
      that will be patched later by one of the fixup routines. So what's
      happening?
      
      Before fixup_call_args(), where the JIT op is done, various code
      patching is done by do_misc_fixups(). This may increase the insn
      count, for example when we patch map_lookup_up using the map_gen_lookup
      hook. This does two things. First, it means the instruction index
      (the insn_idx field) of a tail call instruction will move by a 'delta'.
      
      In verifier code,
      
       struct bpf_jit_poke_descriptor desc = {
        .reason = BPF_POKE_REASON_TAIL_CALL,
        .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
        .tail_call.key = bpf_map_key_immediate(aux),
        .insn_idx = i + delta,
       };
      
      Second, the subprog start values subprog_info[i].start will be updated
      with the delta, and every poke descriptor index will also be updated
      with the delta in adjust_poke_descs(). If we look at
      adjust_subprog_starts(), though, we see that a start value is only
      adjusted when the patched instruction lies before it:
      
              /* NOTE: fake 'exit' subprog should be updated as well. */
              for (i = 0; i <= env->subprog_cnt; i++) {
                      if (env->subprog_info[i].start <= off)
                              continue;
      
      Earlier subprograms are not changed because their start values
      are not moved. But adjust_poke_descs() applies the offset + delta
      indiscriminately. As a result, poke descriptors can be corrupted.
      
      Then, in jit_subprogs(), we only populate the poke_tab[]
      when the above insn_idx is less than the next subprogram start. Since
      insn_idx was corrupted above, we might incorrectly conclude that a
      poke descriptor is not used in a subprogram and omit it from that
      subprogram. Finally, when the JIT runs, it dereferences poke_tab while
      emitting the instruction and crashes as shown below, because the
      earlier step omitted the poke descriptor.
      
      The fix is straightforward given the above context: simply move the
      same logic from adjust_subprog_starts() into adjust_poke_descs() and
      only adjust insn_idx when needed.
      
      [   82.396354] bpf_testmod: version magic '5.12.0-rc2alu+ SMP preempt mod_unload ' should be '5.12.0+ SMP preempt mod_unload '
      [   82.623001] loop10: detected capacity change from 0 to 8
      [   88.487424] ==================================================================
      [   88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290
      [   88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295
      [   88.487471] CPU: 7 PID: 5295 Comm: test_progs Tainted: G          I       5.12.0+ #386
      [   88.487483] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
      [   88.487490] Call Trace:
      [   88.487498]  dump_stack+0x93/0xc2
      [   88.487515]  kasan_report.cold+0x5f/0xd8
      [   88.487530]  ? do_jit+0x184a/0x3290
      [   88.487542]  do_jit+0x184a/0x3290
       ...
      [   88.487709]  bpf_int_jit_compile+0x248/0x810
       ...
      [   88.487765]  bpf_check+0x3718/0x5140
       ...
      [   88.487920]  bpf_prog_load+0xa22/0xf10
      
      Fixes: a748c697 ("bpf: propagate poke descriptors to subprograms")
      Reported-by: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
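      The adjustment rule described above can be sketched in userspace. The struct keeps only insn_idx from struct bpf_jit_poke_descriptor, and both function names are hypothetical. The fixed variant applies the same "start <= off" test that adjust_subprog_starts() uses. The old variant shifts every descriptor.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical, trimmed-down poke descriptor: only the index matters here. */
      struct poke_desc_sketch {
      	uint32_t insn_idx;
      };

      /* Old behaviour: every descriptor is shifted, even those located before
       * the patched instruction, which corrupts their indexes. */
      static void adjust_poke_descs_old(struct poke_desc_sketch *tab, size_t n,
      				  uint32_t off, uint32_t len)
      {
      	(void)off; /* the old logic ignores where the patch happened */
      	for (size_t i = 0; i < n; i++)
      		tab[i].insn_idx += len - 1;
      }

      /* Fixed behaviour: replacing the instruction at 'off' with 'len'
       * instructions only moves instructions located after 'off'. */
      static void adjust_poke_descs_fixed(struct poke_desc_sketch *tab, size_t n,
      				    uint32_t off, uint32_t len)
      {
      	for (size_t i = 0; i < n; i++) {
      		if (tab[i].insn_idx <= off)
      			continue;
      		tab[i].insn_idx += len - 1;
      	}
      }
      ```

      Take descriptors at indexes 2, 5 and 10 and a 3-instruction patch at offset 5. The fixed variant yields 2, 5 and 12. The old variant also moves index 2 to 4, which is the kind of corruption described above.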
    • ieee802154: hwsim: avoid possible crash in hwsim_del_edge_nl() · 0303b303
      Eric Dumazet authored
      Both MAC802154_HWSIM_ATTR_RADIO_ID and MAC802154_HWSIM_ATTR_RADIO_EDGE
      must be present to avoid a crash.
      
      Fixes: f25da51f ("ieee802154: hwsim: add replacement for fakelb")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@datenfreihafen.org>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Alexander Aring <aahringo@redhat.com>
      Link: https://lore.kernel.org/r/20210621180244.882076-1-eric.dumazet@gmail.com
      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
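      The fix amounts to validating both netlink attributes before using them. The hedged userspace sketch below assumes an attribute table indexed by attribute type, where a missing attribute is a NULL pointer, as with info->attrs[] in generic netlink. The enum values, the function name and the -EINVAL return code are assumptions.

      ```c
      #include <errno.h>
      #include <stddef.h>

      /* Hypothetical attribute ids standing in for MAC802154_HWSIM_ATTR_*. */
      enum {
      	SKETCH_ATTR_RADIO_ID = 1,
      	SKETCH_ATTR_RADIO_EDGE = 2,
      	SKETCH_ATTR_MAX = SKETCH_ATTR_RADIO_EDGE,
      };

      /* Refuse the request unless both attributes are present; a missing
       * attribute is represented by a NULL slot in the table. */
      static int del_edge_check(void *const attrs[SKETCH_ATTR_MAX + 1])
      {
      	if (!attrs[SKETCH_ATTR_RADIO_ID] || !attrs[SKETCH_ATTR_RADIO_EDGE])
      		return -EINVAL;
      	return 0; /* both attributes may be dereferenced from here on */
      }
      ```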
    • ieee802154: hwsim: Fix memory leak in hwsim_add_one · 28a5501c
      Dongliang Mu authored
      Whether called from hwsim_remove() or hwsim_del_radio_nl(), hwsim_del()
      fails to remove the entries in the edges list. In the example below,
      phy0, phy1 and e0 will be deleted, resulting in e1 not being freed and
      being accessed in the future.
      
                    hwsim_phys
                        |
          ------------------------------
          |                            |
      phy0 (edges)                 phy1 (edges)
         ----> e1 (idx = 1)             ----> e0 (idx = 0)
      
      Fix this by deleting and freeing all the entries in the edges list
      between hwsim_edge_unsubscribe_me and list_del(&phy->list).
      
      Reported-by: syzbot+b80c9959009a9325cdff@syzkaller.appspotmail.com
      Fixes: 1c9f4a3f ("ieee802154: hwsim: fix rcu handling")
      Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Acked-by: Alexander Aring <aahringo@redhat.com>
      Link: https://lore.kernel.org/r/20210616020901.2759466-1-mudongliangabcd@gmail.com
      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
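      The fix deletes and frees every entry on a phy's edges list. The hedged userspace sketch below shows that pattern using a hypothetical singly linked list instead of the kernel's list_head and RCU helpers. The important detail is saving the next pointer before freeing the current node. In the kernel, list_for_each_entry_safe() provides this.

      ```c
      #include <stddef.h>
      #include <stdlib.h>

      /* Hypothetical edge entry; the driver's real struct differs. */
      struct edge_sketch {
      	struct edge_sketch *next;
      };

      /* Unlink and free every edge on the list; returns how many were freed. */
      static size_t free_all_edges(struct edge_sketch **head)
      {
      	size_t freed = 0;
      	struct edge_sketch *e = *head;

      	while (e) {
      		struct edge_sketch *next = e->next; /* read before free() */
      		free(e);
      		e = next;
      		freed++;
      	}
      	*head = NULL; /* leave no dangling pointer to freed entries */
      	return freed;
      }
      ```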
    • net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues · ce8eb4c7
      Vignesh Raghavendra authored
      When changing the number of TX queues using ethtool:
      
      	# ethtool -L eth0 tx 1
      	[  135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000
      	[...]
      	[  135.525128] Call trace:
      	[  135.525142]  dma_release_from_dev_coherent+0x2c/0xb0
      	[  135.525148]  dma_free_attrs+0x54/0xe0
      	[  135.525156]  k3_cppi_desc_pool_destroy+0x50/0xa0
      	[  135.525164]  am65_cpsw_nuss_remove_tx_chns+0x88/0xdc
      	[  135.525171]  am65_cpsw_set_channels+0x3c/0x70
      	[...]
      
      This is because k3_cppi_desc_pool_destroy(), which is called after
      k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns(),
      references a struct device that is unregistered at the end of
      k3_udma_glue_release_tx_chn().
      
      Therefore, the right order is to call k3_cppi_desc_pool_destroy() to
      destroy the descriptor pool before calling k3_udma_glue_release_tx_chn().
      Fix this throughout the driver.
      
      Fixes: 93a76530 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
      Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
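      The ordering rule can be modelled in a few lines. This is a hedged sketch, not driver code. The struct and function names are hypothetical stand-ins for k3_cppi_desc_pool_destroy() and k3_udma_glue_release_tx_chn(). A boolean models whether the struct device that the pool depends on is still registered.

      ```c
      #include <stdbool.h>

      /* Hypothetical model of one TX channel and its descriptor pool. */
      struct tx_chn_sketch {
      	bool dev_registered; /* cleared when the channel is released */
      	bool pool_alive;
      };

      /* Stand-in for the pool destroy step: it needs the device to still exist. */
      static bool pool_destroy(struct tx_chn_sketch *c)
      {
      	if (!c->dev_registered)
      		return false; /* would touch an unregistered struct device */
      	c->pool_alive = false;
      	return true;
      }

      /* Stand-in for the channel release step: unregisters the device at the end. */
      static void chn_release(struct tx_chn_sketch *c)
      {
      	c->dev_registered = false;
      }

      /* Buggy order: release the channel first, then destroy the pool. */
      static bool remove_tx_chn_buggy(struct tx_chn_sketch *c)
      {
      	chn_release(c);
      	return pool_destroy(c);
      }

      /* Fixed order: destroy the pool while the device is still registered. */
      static bool remove_tx_chn_fixed(struct tx_chn_sketch *c)
      {
      	bool ok = pool_destroy(c);

      	chn_release(c);
      	return ok;
      }
      ```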
    • net: broadcom: bcm4908_enet: reset DMA rings sw indexes properly · ddeacc4f
      Rafał Miłecki authored
      Resetting software indexes in bcm4908_dma_alloc_buf_descs() is not
      enough, as it's only called during device probe. The driver resets DMA
      on every .ndo_open callback, so the indexes must be reset then as well.
      
      This fixes an inconsistent ring state and stalled traffic after an
      interface down & up sequence.
      
      Fixes: 4feffead ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
      Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
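      The sketch below is a hedged illustration of the idea, with a hypothetical ring struct and hypothetical function names. The hardware DMA state is reset on every open. Therefore the software read and write indexes must be reset on the same path, not only in the probe-time allocation.

      ```c
      /* Hypothetical software view of one DMA ring. */
      struct ring_sketch {
      	unsigned int read_idx;
      	unsigned int write_idx;
      };

      static void ring_reset_sw_indexes(struct ring_sketch *ring)
      {
      	ring->read_idx = 0;
      	ring->write_idx = 0;
      }

      /* Stand-in for the .ndo_open path: after DMA is reset in hardware,
       * bring the software indexes back in sync with it. */
      static void open_sketch(struct ring_sketch *rx, struct ring_sketch *tx)
      {
      	/* (the hardware DMA reset would happen here) */
      	ring_reset_sw_indexes(rx);
      	ring_reset_sw_indexes(tx);
      }
      ```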
    • net/ipv4: swap flow ports when validating source · c69f114d
      Miao Wang authored
      When doing source address validation, the flowi4 struct used for
      fib_lookup should be in the reverse direction to the given skb.
      fl4_dport and fl4_sport returned by fib4_rules_early_flow_dissect
      should thus be swapped.
      
      Fixes: 5a847a6e ("net/ipv4: Initialize proto and ports in flow struct")
      Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
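      Source validation looks up the route for the reverse direction, so the ports taken from the packet must trade places before the lookup. Below is a minimal sketch in which a hypothetical struct stands in for the fl4_sport and fl4_dport fields of struct flowi4.

      ```c
      #include <stdint.h>

      /* Hypothetical stand-in for the port fields of struct flowi4. */
      struct flow_ports_sketch {
      	uint16_t sport;
      	uint16_t dport;
      };

      /* Turn the packet's (sport, dport) into the reply direction's ports. */
      static void reverse_flow_ports(struct flow_ports_sketch *fl)
      {
      	uint16_t tmp = fl->sport;

      	fl->sport = fl->dport;
      	fl->dport = tmp;
      }
      ```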
    • bonding: avoid adding slave device with IFF_MASTER flag · 3c9ef511
      Di Zhu authored
      The following steps will definitely cause the kernel to crash:
      	ip link add vrf1 type vrf table 1
      	modprobe bonding.ko max_bonds=1
      	echo "+vrf1" >/sys/class/net/bond0/bonding/slaves
      	rmmod bonding
      
      The root cause is as follows: when the VRF device is added as a slave,
      enslaving fails and some cleanup work is done. Because the VRF device
      has the IFF_MASTER flag, the cleanup process does not clear the
      IFF_BONDING flag. Then, when we unload the bonding module,
      unregister_netdevice_notifier() treats the VRF device as a bond master
      device and treats netdev_priv() as struct bonding{}, which is actually
      struct net_vrf{}.
      
      Analyzing the logic of bond_enslave(), it seems that adding a slave
      device with the IFF_MASTER flag is not supported, so add a check for
      this situation.
      
      Signed-off-by: Di Zhu <zhudi21@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
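      Below is a minimal sketch of the check this commit describes. It assumes the check simply refuses any prospective slave whose flags include IFF_MASTER. The flag value, the function name and the -EPERM return code are hypothetical stand-ins rather than the kernel's own definitions.

      ```c
      #include <errno.h>

      /* Hypothetical stand-in for IFF_MASTER from <linux/if.h>. */
      #define SKETCH_IFF_MASTER 0x1u

      /* Refuse to enslave any device that is itself a master (e.g. a VRF). */
      static int enslave_check_strict(unsigned int slave_flags)
      {
      	if (slave_flags & SKETCH_IFF_MASTER)
      		return -EPERM;
      	return 0;
      }
      ```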
    • ip6_tunnel: fix GRE6 segmentation · a6e3f298
      Jakub Kicinski authored
      Commit 6c11fbf9 ("ip6_tunnel: add MPLS transmit support")
      moved assigning inner_ipproto down from ipxip6_tnl_xmit() to
      its callee ip6_tnl_xmit(). The latter is also used by GRE.
      
      Since commit 38720352 ("gre: Use inner_proto to obtain inner
      header protocol"), GRE has been depending on skb->inner_protocol
      during segmentation. It sets it in gre_build_header() and reads
      it in gre_gso_segment(). Changes to ip6_tnl_xmit() overwrite
      the protocol, resulting in GSO skbs getting dropped.
      
      Note that inner_protocol is in a union with inner_ipproto;
      GRE uses the former, while the change switched it to the latter
      (always setting it to just IPPROTO_GRE).
      
      Restore the original location of skb_set_inner_ipproto();
      it is unclear why it was moved in the first place.
      
      Fixes: 6c11fbf9 ("ip6_tunnel: add MPLS transmit support")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Tested-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
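      The clobbering described above follows from the two fields sharing storage. Below is a hedged userspace model that uses a hypothetical two-member union; the real struct sk_buff has many more fields. Writing a one-byte IP protocol number over a two-byte protocol value leaves a value different from the one GRE stored. The helper name is hypothetical, and 47 is IPPROTO_GRE.

      ```c
      #include <stdint.h>

      /* Hypothetical model of the two overlapping sk_buff fields. */
      union inner_sketch {
      	uint16_t inner_protocol; /* set by GRE, read back during GSO */
      	uint8_t inner_ipproto;   /* written by the moved ip6_tnl_xmit() code */
      };

      /* Store GRE's protocol, then an IP protocol number, and return what
       * GRE would read back from inner_protocol afterwards. */
      static uint16_t protocol_after_ipproto_write(uint16_t gre_protocol,
      					     uint8_t ipproto)
      {
      	union inner_sketch u;

      	u.inner_protocol = gre_protocol;
      	u.inner_ipproto = ipproto; /* overwrites the first byte of the same storage */
      	return u.inner_protocol;
      }
      ```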
    • Merge branch 'mptcp-fixes' · e596212e
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for v5.13
      
      Here are two MPTCP fixes from Paolo.
      
      Patch 1 fixes some possible connect-time race conditions with
      MPTCP-level connection state changes.
      
      Patch 2 deletes a duplicate function declaration.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: drop duplicate mptcp_setsockopt() declaration · 597dbae7
      Paolo Abeni authored
      Commit 78962489 ("mptcp: add skeleton to sync msk socket
      options to subflows") introduced a duplicate declaration of
      mptcp_setsockopt(); just drop it.
      
      Reported-by: Florian Westphal <fw@strlen.de>
      Fixes: 78962489 ("mptcp: add skeleton to sync msk socket options to subflows")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>