1. 10 Jul, 2021 1 commit
    • mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join · 0c71929b
      Jianguo Wu authored
      I ran a stress test with wrk[1] and webfsd[2], with the assistance of
      mptcp-tools[3]:
      
        Server side:
            ./use_mptcp.sh webfsd -4 -R /tmp/ -p 8099
        Client side:
            ./use_mptcp.sh wrk -c 200 -d 30 -t 4 http://192.168.174.129:8099/
      
      and got the following warning message:
      
      [   55.552626] TCP: request_sock_subflow: Possible SYN flooding on port 8099. Sending cookies.  Check SNMP counters.
      [   55.553024] ------------[ cut here ]------------
      [   55.553027] WARNING: CPU: 0 PID: 10 at net/core/flow_dissector.c:984 __skb_flow_dissect+0x280/0x1650
      ...
      [   55.553117] CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.12.0+ #18
      [   55.553121] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
      [   55.553124] RIP: 0010:__skb_flow_dissect+0x280/0x1650
      ...
      [   55.553133] RSP: 0018:ffffb79580087770 EFLAGS: 00010246
      [   55.553137] RAX: 0000000000000000 RBX: ffffffff8ddb58e0 RCX: ffffb79580087888
      [   55.553139] RDX: ffffffff8ddb58e0 RSI: ffff8f7e4652b600 RDI: 0000000000000000
      [   55.553141] RBP: ffffb79580087858 R08: 0000000000000000 R09: 0000000000000008
      [   55.553143] R10: 000000008c622965 R11: 00000000d3313a5b R12: ffff8f7e4652b600
      [   55.553146] R13: ffff8f7e465c9062 R14: 0000000000000000 R15: ffffb79580087888
      [   55.553149] FS:  0000000000000000(0000) GS:ffff8f7f75e00000(0000) knlGS:0000000000000000
      [   55.553152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.553154] CR2: 00007f73d1d19000 CR3: 0000000135e10004 CR4: 00000000003706f0
      [   55.553160] Call Trace:
      [   55.553166]  ? __sha256_final+0x67/0xd0
      [   55.553173]  ? sha256+0x7e/0xa0
      [   55.553177]  __skb_get_hash+0x57/0x210
      [   55.553182]  subflow_init_req_cookie_join_save+0xac/0xc0
      [   55.553189]  subflow_check_req+0x474/0x550
      [   55.553195]  ? ip_route_output_key_hash+0x67/0x90
      [   55.553200]  ? xfrm_lookup_route+0x1d/0xa0
      [   55.553207]  subflow_v4_route_req+0x8e/0xd0
      [   55.553212]  tcp_conn_request+0x31e/0xab0
      [   55.553218]  ? selinux_socket_sock_rcv_skb+0x116/0x210
      [   55.553224]  ? tcp_rcv_state_process+0x179/0x6d0
      [   55.553229]  tcp_rcv_state_process+0x179/0x6d0
      [   55.553235]  tcp_v4_do_rcv+0xaf/0x220
      [   55.553239]  tcp_v4_rcv+0xce4/0xd80
      [   55.553243]  ? ip_route_input_rcu+0x246/0x260
      [   55.553248]  ip_protocol_deliver_rcu+0x35/0x1b0
      [   55.553253]  ip_local_deliver_finish+0x44/0x50
      [   55.553258]  ip_local_deliver+0x6c/0x110
      [   55.553262]  ? ip_rcv_finish_core.isra.19+0x5a/0x400
      [   55.553267]  ip_rcv+0xd1/0xe0
      ...
      
      After debugging, I found that in __skb_flow_dissect(), skb->dev and skb->sk
      are both NULL, so net is NULL and WARN_ON_ONCE(!net) triggers. In fact,
      net is always NULL in this code path, because skb->dev is set to NULL in
      tcp_v4_rcv() and skb->sk is never set.
      
      Code snippet in __skb_flow_dissect() that triggers the warning:
        975         if (skb) {
        976                 if (!net) {
        977                         if (skb->dev)
        978                                 net = dev_net(skb->dev);
        979                         else if (skb->sk)
        980                                 net = sock_net(skb->sk);
        981                 }
        982         }
        983
        984         WARN_ON_ONCE(!net);
      
      So, use a hash derived from seq and the transport header instead.
      
      [1] https://github.com/wg/wrk
      [2] https://github.com/ourway/webfsd
      [3] https://github.com/pabeni/mptcp-tools
      
      Fixes: 9466a1cc ("mptcp: enable JOIN requests even if cookies are in use")
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Jul, 2021 12 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 5d52c906
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-07-09
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 9 non-merge commits during the last 9 day(s) which contain
      a total of 13 files changed, 118 insertions(+), 62 deletions(-).
      
      The main changes are:
      
      1) Fix runqslower task->state access from BPF, from SanjayKumar Jeyakumar.
      
      2) Fix subprog poke descriptor tracking use-after-free, from John Fastabend.
      
      3) Fix sparse complaint from prior devmap RCU conversion, from Toke Høiland-Jørgensen.
      
      4) Fix missing va_end in bpftool JIT json dump's error path, from Gu Shengxian.
      
      5) Fix tools/bpf install target from missing runqslower install, from Wei Li.
      
      6) Fix xdpsock BPF sample to unload program on shared umem option, from Wang Hai.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: validate lwtstate->data before returning from skb_tunnel_info() · 67a9c943
      Taehee Yoo authored
      skb_tunnel_info() returns a pointer to lwtstate->data as ip_tunnel_info
      type without validation. lwtstate->data can hold various types, such as
      mpls_iptunnel_encap, and these are not compatible with each other.
      So skb_tunnel_info() should validate the type before returning that pointer.
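
The validation described above can be sketched in userspace C. This is only an illustrative model, not the kernel patch: the enum values, struct names and tunnel_info_checked() are hypothetical stand-ins for the lwtunnel state and its encapsulation type.

```c
#include <stddef.h>

/* Hypothetical encapsulation tags, loosely modelled on lwtunnel encap types. */
enum encap_type_model { ENCAP_MPLS, ENCAP_IP, ENCAP_IP6 };

/* Hypothetical stand-in for the tunnel info payload. */
struct ip_tunnel_info_model {
	unsigned short tp_dst;
};

/* Hypothetical stand-in for the lwtunnel state: data layout depends on type. */
struct lwt_state_model {
	enum encap_type_model type;
	void *data;
};

/* Hand out data as tunnel info only when the tag says it really is one. */
static struct ip_tunnel_info_model *
tunnel_info_checked(struct lwt_state_model *st)
{
	if (!st)
		return NULL;
	if (st->type != ENCAP_IP && st->type != ENCAP_IP6)
		return NULL;	/* e.g. MPLS encap data: incompatible layout */
	return st->data;
}
```

With such a check, a caller that receives MPLS state gets NULL back instead of reading past a smaller object, which is the kind of out-of-bounds read the splat shows.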
      
      Splat looks like:
      BUG: KASAN: slab-out-of-bounds in vxlan_get_route+0x418/0x4b0 [vxlan]
      Read of size 2 at addr ffff888106ec2698 by task ping/811
      
      CPU: 1 PID: 811 Comm: ping Not tainted 5.13.0+ #1195
      Call Trace:
       dump_stack_lvl+0x56/0x7b
       print_address_description.constprop.8.cold.13+0x13/0x2ee
       ? vxlan_get_route+0x418/0x4b0 [vxlan]
       ? vxlan_get_route+0x418/0x4b0 [vxlan]
       kasan_report.cold.14+0x83/0xdf
       ? vxlan_get_route+0x418/0x4b0 [vxlan]
       vxlan_get_route+0x418/0x4b0 [vxlan]
       [ ... ]
       vxlan_xmit_one+0x148b/0x32b0 [vxlan]
       [ ... ]
       vxlan_xmit+0x25c5/0x4780 [vxlan]
       [ ... ]
       dev_hard_start_xmit+0x1ae/0x6e0
       __dev_queue_xmit+0x1f39/0x31a0
       [ ... ]
       neigh_xmit+0x2f9/0x940
       mpls_xmit+0x911/0x1600 [mpls_iptunnel]
       lwtunnel_xmit+0x18f/0x450
       ip_finish_output2+0x867/0x2040
       [ ... ]
      
      Fixes: 61adedf3 ("route: move lwtunnel state to dst_entry")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip_tunnel: fix mtu calculation for ETHER tunnel devices · 9992a078
      Hangbin Liu authored
      Commit 28e104d0 ("net: ip_tunnel: fix mtu calculation") removed the
      dev->hard_header_len subtraction when calculating the MTU for tunnel
      devices, as there is an overhead for devices that have header_ops.
      
      But there are ETHER tunnel devices, like gre_tap or erspan, which don't
      have header_ops but set dev->hard_header_len during setup. As a result,
      packets larger than (MTU - ETH_HLEN) cannot be transmitted. Fix it by
      subtracting the ETHER tunnel devices' dev->hard_header_len in the MTU
      calculation.
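
The arithmetic can be illustrated with a small userspace model. This is a sketch under assumptions, not the code from the patch: struct tunnel_dev_model, its fields and tunnel_mtu() are hypothetical names, and the byte counts in the note are illustrative.

```c
#include <stdbool.h>

/* Hypothetical model of the fields that matter for the MTU calculation. */
struct tunnel_dev_model {
	bool is_ether;		/* ETHER tunnel device, e.g. gre_tap or erspan */
	int hard_header_len;	/* set during setup on ETHER tunnel devices */
	int tunnel_hlen;	/* encapsulation overhead, outer IP header included */
};

/* MTU available to the tunnel given the MTU of the underlying device. */
static int tunnel_mtu(const struct tunnel_dev_model *dev, int lower_mtu)
{
	int overhead = dev->tunnel_hlen;

	/* ETHER tunnel devices also carry an inner Ethernet header. */
	if (dev->is_ether)
		overhead += dev->hard_header_len;

	return lower_mtu - overhead;
}
```

For example, with 24 bytes of encapsulation overhead over a 1500-byte link, a non-ETHER tunnel gets 1476 bytes, while an ETHER tunnel with a 14-byte hard_header_len gets 1462.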
      
      Fixes: 28e104d0 ("net: ip_tunnel: fix mtu calculation")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: do not reuse skbuff allocated from skbuff_fclone_cache in the skb cache · 28b34f01
      Antoine Tenart authored
      Some socket buffers allocated in the fclone cache (in __alloc_skb) can
      end up in the following path[1]:
      
      napi_skb_finish
        __kfree_skb_defer
          napi_skb_cache_put
      
      The issue is napi_skb_cache_put is not fclone friendly and will put
      those skbuff in the skb cache to be reused later, although this cache
      only expects skbuff allocated from skbuff_head_cache. When this happens
      the skbuff is eventually freed using the wrong origin cache, and we can
      see traces similar to:
      
      [ 1223.947534] cache_from_obj: Wrong slab cache. skbuff_head_cache but object is from skbuff_fclone_cache
      [ 1223.948895] WARNING: CPU: 3 PID: 0 at mm/slab.h:442 kmem_cache_free+0x251/0x3e0
      [ 1223.950211] Modules linked in:
      [ 1223.950680] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.13.0+ #474
      [ 1223.951587] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-3.fc34 04/01/2014
      [ 1223.953060] RIP: 0010:kmem_cache_free+0x251/0x3e0
      
      This sometimes leads to other memory-related issues.
      
      Fix this by using __kfree_skb for fclone skbuffs, similar to what is done
      in the other place where __kfree_skb_defer is called.
      
      [1] At least in setups using veth pairs and tunnels. Building a kernel
          with KASAN we can for example see packets allocated in
          sk_stream_alloc_skb hit the above path and later the issue arises
          when the skbuff is reused.
      
      Fixes: 9243adfc ("skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing")
      Cc: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path · 358ed624
      Talal Ahmad authored
      sk_wmem_schedule makes sure that sk_forward_alloc has enough
      bytes for charging that is going to be done by sk_mem_charge.
      
      In the transmit zerocopy path, there is sk_mem_charge but there was
      no call to sk_wmem_schedule. This change adds that call.
      
      Without this call to sk_wmem_schedule, sk_forward_alloc can go
      negative, which is a bug: sk_forward_alloc is per-socket space that
      has already been forward charged, so it can't be negative.
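
The schedule-then-charge ordering can be modelled in userspace as follows. This is a simplified sketch, not kernel code: struct sock_model, MEM_QUANTUM, the limit field and all three helpers are hypothetical, and real socket memory accounting is more involved.

```c
#include <stdbool.h>

#define MEM_QUANTUM 4096	/* hypothetical reservation granularity */

/* Hypothetical per-socket accounting state. */
struct sock_model {
	int forward_alloc;	/* bytes reserved but not yet charged */
	int reserved_total;	/* bytes reserved so far */
	int limit;		/* maximum bytes that may be reserved */
};

/* Reserve whole quanta until forward_alloc covers size; may fail. */
static bool wmem_schedule(struct sock_model *sk, int size)
{
	int need, amt;

	if (sk->forward_alloc >= size)
		return true;
	need = size - sk->forward_alloc;
	amt = (need + MEM_QUANTUM - 1) / MEM_QUANTUM * MEM_QUANTUM;
	if (sk->reserved_total + amt > sk->limit)
		return false;
	sk->reserved_total += amt;
	sk->forward_alloc += amt;
	return true;
}

/* Consume size bytes from the forward allocation; assumes they were reserved. */
static void mem_charge(struct sock_model *sk, int size)
{
	sk->forward_alloc -= size;
}

/* Charge only after the reservation succeeded, so forward_alloc stays >= 0. */
static bool charge_zerocopy(struct sock_model *sk, int size)
{
	if (!wmem_schedule(sk, size))
		return false;
	mem_charge(sk, size);
	return true;
}
```

Calling mem_charge() alone on a fresh socket drives forward_alloc below zero, which is the situation the commit message describes; going through charge_zerocopy() either reserves first or refuses the charge.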
      
      Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
      Signed-off-by: Talal Ahmad <talalahmad@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Wei Wang <weiwan@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: send SYNACK packet with accepted fwmark · 43b90bfa
      Alexander Ovechkin authored
      Commit e05a90ec ("net: reflect mark on tcp syn ack packets")
      fixed IPv4 only.
      
      This part is for the IPv6 side.
      
      Fixes: e05a90ec ("net: reflect mark on tcp syn ack packets")
      Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
      Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ti: fix UAF in tlan_remove_one · 0336f8ff
      Pavel Skripkin authored
      priv is netdev private data and it cannot be
      used after the free_netdev() call. Using priv after free_netdev()
      can cause a UAF bug. Fix it by moving free_netdev() to the end of the
      function.
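
The ordering rule can be shown with a userspace analogy. This is a sketch only: struct netdev_model, struct priv_model and the helper functions are hypothetical, and they merely mimic the fact that the private data lives inside the allocation released by free_netdev().

```c
#include <stdlib.h>

/* Hypothetical private driver state. */
struct priv_model {
	int irq_released;
};

/* Hypothetical device: the private data is part of the same allocation. */
struct netdev_model {
	struct priv_model priv;
};

static struct netdev_model *alloc_netdev_model(void)
{
	return calloc(1, sizeof(struct netdev_model));
}

static struct priv_model *netdev_priv_model(struct netdev_model *dev)
{
	return &dev->priv;
}

/* Frees the device and, with it, the embedded private data. */
static void free_netdev_model(struct netdev_model *dev)
{
	free(dev);
}

/* Remove path: finish every use of priv first, free the device last. */
static int remove_one(struct netdev_model *dev)
{
	struct priv_model *priv = netdev_priv_model(dev);
	int released;

	priv->irq_released = 1;		/* teardown work that touches priv */
	released = priv->irq_released;	/* last read of priv */

	free_netdev_model(dev);		/* priv is invalid from here on */
	return released;
}
```

Moving the free_netdev_model() call above the lines that touch priv would reproduce the use-after-free pattern that the patch removes.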
      
      Fixes: 1e0a8b13 ("tlan: cancel work at remove path")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: fix UAF in emac_remove · ad297cd2
      Pavel Skripkin authored
      adpt is netdev private data and it cannot be
      used after the free_netdev() call. Using adpt after free_netdev()
      can cause a UAF bug. Fix it by moving free_netdev() to the end of the
      function.
      
      Fixes: 54e19bc7 ("net: qcom/emac: do not use devm on internal phy pdev")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: moxa: fix UAF in moxart_mac_probe · c78eaeeb
      Pavel Skripkin authored
      In case of netdev registration failure, the code path will
      jump to the init_fail label:
      
      init_fail:
      	netdev_err(ndev, "init failed\n");
      	moxart_mac_free_memory(ndev);
      irq_map_fail:
      	free_netdev(ndev);
      	return ret;
      
      So, there is no need to call free_netdev() before jumping
      to the error handling path, since doing so can cause a UAF or
      double-free bug.
      
      Fixes: 6c821bd9 ("net: Add MOXA ART SoCs ethernet driver")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Selftest to verify mixing bpf2bpf calls and tailcalls with insn patch · 1fb5ba29
      John Fastabend authored
      This adds some extra noise to the tailcall_bpf2bpf4 tests that will cause
      the verifier to patch insns. This then moves around the subprog start/end
      insn index and the poke descriptor insn index, to ensure that the verifier
      and JIT continue to track these correctly.
      
      If done correctly, the verifier should pass this program the same as before
      and the JIT should emit tail call logic.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210707223848.14580-3-john.fastabend@gmail.com
    • bpf: Track subprog poke descriptors correctly and fix use-after-free · f263a814
      John Fastabend authored
      Subprograms are calling map_poke_track(), but on program release there is no
      hook to call map_poke_untrack(). However, on program release, the aux memory
      (and poke descriptor table) is freed even though we still have a reference to
      it in the element list of the map aux data. When we run map_poke_run(), we then
      end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run():
      
        [...]
        [  402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e
        [  402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337
        [  402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G          I       5.12.0+ #399
        [  402.824715] Call Trace:
        [  402.824719]  dump_stack+0x93/0xc2
        [  402.824727]  print_address_description.constprop.0+0x1a/0x140
        [  402.824736]  ? prog_array_map_poke_run+0xc2/0x34e
        [  402.824740]  ? prog_array_map_poke_run+0xc2/0x34e
        [  402.824744]  kasan_report.cold+0x7c/0xd8
        [  402.824752]  ? prog_array_map_poke_run+0xc2/0x34e
        [  402.824757]  prog_array_map_poke_run+0xc2/0x34e
        [  402.824765]  bpf_fd_array_map_update_elem+0x124/0x1a0
        [...]
      
      The elements concerned are walked as follows:
      
          for (i = 0; i < elem->aux->size_poke_tab; i++) {
                 poke = &elem->aux->poke_tab[i];
          [...]
      
      The access to size_poke_tab is a 4 byte read, verified by checking offsets
      in the KASAN dump:
      
        [  402.825004] The buggy address belongs to the object at ffff8881905a7800
                       which belongs to the cache kmalloc-1k of size 1024
        [  402.825008] The buggy address is located 320 bytes inside of
                       1024-byte region [ffff8881905a7800, ffff8881905a7c00)
      
      The pahole output of bpf_prog_aux:
      
        struct bpf_prog_aux {
          [...]
          /* --- cacheline 5 boundary (320 bytes) --- */
          u32                        size_poke_tab;        /*   320     4 */
          [...]
      
      In general, subprograms do not necessarily manage their own data structures.
      For example, BTF func_info and linfo are just pointers to the main program
      structure. This allows reference counting and cleanup to be done on the latter
      which simplifies their management a bit. The aux->poke_tab struct, however,
      did not follow this logic. The initial proposed fix for this use-after-free
      bug further embedded poke data tracking into the subprogram with proper
      reference counting. However, Daniel and Alexei questioned why we were treating
      these objects specially; I agree, it's unnecessary. The fix here removes the per
      subprogram poke table allocation and map tracking and instead simply points
      the aux->poke_tab pointer at the main program's poke table. This way, map
      tracking is simplified to the main program and we do not need to manage them
      per subprogram.
      
      This also means, bpf_prog_free_deferred(), which unwinds the program reference
      counting and kfrees objects, needs to ensure that we don't try to double free
      the poke_tab when free'ing the subprog structures. This is easily solved by
      NULL'ing the poke_tab pointer. The second detail is to ensure that per
      subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do
      this, we add a pointer in the poke structure to point at the subprogram value
      so JITs can easily check while walking the poke_tab structure if the current
      entry belongs to the current program. The aux pointer is stable and therefore
      suitable for such comparison. On the jit_subprogs() error path, we omit
      cleaning up the poke->aux field because these are only ever referenced from
      the JIT side, but on error we will never make it to the JIT, so it's fine to
      leave them dangling. Removing these pointers would complicate the error path
      for no reason. However, we do need to untrack all poke descriptors from the
      main program as otherwise they could race with the freeing of JIT memory from
      the subprograms. Lastly, a748c697 ("bpf: propagate poke descriptors to
      subprograms") had an off-by-one on the subprogram instruction index range
      check as it was testing 'insn_idx >= subprog_start && insn_idx <= subprog_end'.
      However, subprog_end is the next subprogram's start instruction.
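
The off-by-one comes down to treating the range as half-open. A minimal sketch, where insn_in_subprog() is a hypothetical helper rather than a function from the patch:

```c
#include <stdbool.h>

/*
 * A subprogram owns instructions in [subprog_start, subprog_end).
 * subprog_end is the first instruction of the next subprogram, so it
 * must be excluded: '<', not '<='.
 */
static bool insn_in_subprog(int insn_idx, int subprog_start, int subprog_end)
{
	return insn_idx >= subprog_start && insn_idx < subprog_end;
}
```

With subprograms covering [10, 20) and [20, 30), instruction 20 belongs only to the second one; the inclusive test would have matched it in both.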
      
      Fixes: a748c697 ("bpf: propagate poke descriptors to subprograms")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210707223848.14580-2-john.fastabend@gmail.com
    • net: bcmgenet: Ensure all TX/RX queues DMAs are disabled · 2b452550
      Florian Fainelli authored
      Make sure that we disable each of the TX and RX queues in the TDMA and
      RDMA control registers. This is a correctness change to be symmetrical
      with the code that enables the TX and RX queues.
      Tested-by: Maxime Ripard <maxime@cerno.tech>
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 08 Jul, 2021 13 commits
  4. 07 Jul, 2021 9 commits
  5. 06 Jul, 2021 5 commits
    • ipv6: fix 'disable_policy' for fwd packets · ccd27f05
      Nicolas Dichtel authored
      The goal of commit df789fe7 ("ipv6: Provide ipv6 version of
      "disable_policy" sysctl") was to have the disable_policy from ipv4
      available on ipv6.
      However, it's not exactly the same mechanism. On IPv4, all packets coming
      from an interface which has disable_policy set bypass the policy check.
      For ipv6, this is done only for local packets, i.e. for packets destined to
      an address configured on the incoming interface.
      
      Let's align ipv6 with ipv4 so that the 'disable_policy' sysctl has the same
      effect for both protocols.
      
      My first approach was to create a new kind of route cache entry, to be
      able to set DST_NOPOLICY without modifying routes. This would have added a
      lot of code. Because the local delivery path is already handled, I chose
      to focus on the forwarding path to minimize code churn.
      
      Fixes: df789fe7 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-pf: Fix assigned error return value that is never used · ad1f3797
      Colin Ian King authored
      Currently, when the call to otx2_mbox_alloc_msg_cgx_mac_addr_update fails,
      the error return variable rc is assigned -ENOMEM but the function does not
      return early. rc is then re-assigned and the error case is not handled
      correctly. Fix this by returning -ENOMEM rather than assigning rc.
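
The shape of the fix can be sketched as follows. This is an illustrative model only: struct msg_model, send_msg_model() and update_mac() are hypothetical names, with a NULL request standing in for the failed allocation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mailbox request. */
struct msg_model {
	int unused;
};

/* Hypothetical send step; always succeeds in this sketch. */
static int send_msg_model(struct msg_model *req)
{
	(void)req;
	return 0;
}

/* req == NULL stands in for a failed message allocation. */
static int update_mac(struct msg_model *req)
{
	int rc;

	if (!req)
		return -ENOMEM;	/* return now: 'rc = -ENOMEM' would be overwritten below */

	rc = send_msg_model(req);
	return rc;
}
```

Had the failure branch only assigned rc, the later assignment from the send step would have replaced the error, and the request pointer would have been used despite the failed allocation.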
      
      Addresses-Coverity: ("Unused value")
      Fixes: 79d2be38 ("octeontx2-pf: offload DMAC filters to CGX/RPM block")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bonding-ipsec' · 5ddef2ad
      David S. Miller authored
      Taehee Yoo says:
      
      ====================
      net: fix bonding ipsec offload problems
      
      This series fixes some problems related to bonding ipsec offload.
      
      The 1st, 5th, and 8th patches add a missing rcu_read_lock().
      The 2nd patch adds a null check to bond_ipsec_add_sa().
      When a bonding interface doesn't have an active real interface, the
      bond->curr_active_slave pointer is null.
      But bond_ipsec_add_sa() uses that pointer without a null check,
      which results in a null-ptr-deref.
      The 3rd and 4th patches replace xs->xso.dev with xs->xso.real_dev.
      The 6th patch disallows setting ipsec offload if the real interface
      type is bonding.
      The 7th patch adds struct bond_ipsec to manage SAs.
      If the bond mode or the active real interface changes, each SA should
      be removed from the old active real interface and then added
      to the new active real interface.
      But that was not possible, because bonding did not keep track of SAs.
      The 9th patch fixes the incorrect return value of bond_ipsec_offload_ok().
      
      v1 -> v2:
       - Add 9th patch.
       - Do not print warning when there is no SA in bond_ipsec_add_sa_all().
       - Add comment for ipsec_lock.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix incorrect return value of bond_ipsec_offload_ok() · 168e696a
      Taehee Yoo authored
      bond_ipsec_offload_ok() is called to check whether the interface supports
      ipsec offload or not.
      A bonding interface supports ipsec offload only in active-backup mode.
      So, if a bond interface is not in active-backup mode, it should return
      false, but it returns true.
      
      Fixes: a3b658cf ("bonding: allow xfrm offload setup post-module-load")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix suspicious RCU usage in bond_ipsec_offload_ok() · 955b785e
      Taehee Yoo authored
      bond->curr_active_slave is dereferenced with rcu_dereference().
      But neither this function nor its caller holds the RCU read lock, so a
      warning occurs. So add rcu_read_lock().
      
      Splat looks like:
      WARNING: suspicious RCU usage
      5.13.0-rc6+ #1179 Not tainted
      drivers/net/bonding/bond_main.c:571 suspicious
      rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ping/974:
       #0: ffff888109e7db70 (sk_lock-AF_INET){+.+.}-{0:0},
      at: raw_sendmsg+0x1303/0x2cb0
      
      stack backtrace:
      CPU: 2 PID: 974 Comm: ping Not tainted 5.13.0-rc6+ #1179
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_offload_ok+0x1f4/0x260 [bonding]
       xfrm_output+0x179/0x890
       xfrm4_output+0xfa/0x410
       ? __xfrm4_output+0x4b0/0x4b0
       ? __ip_make_skb+0xecc/0x2030
       ? xfrm4_udp_encap_rcv+0x800/0x800
       ? ip_local_out+0x21/0x3a0
       ip_send_skb+0x37/0xa0
       raw_sendmsg+0x1bfd/0x2cb0
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>