1. 19 Dec, 2019 9 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 0fd26005
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-12-19
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 10 non-merge commits during the last 8 day(s) which contain
      a total of 21 files changed, 269 insertions(+), 108 deletions(-).
      
      The main changes are:
      
      1) Fix lack of synchronization between xsk wakeup and destroying resources
         used by xsk wakeup, from Maxim Mikityanskiy.
      
      2) Fix pruning with tail call patching, untrack programs in case of verifier
         error and fix a cgroup local storage tracking bug, from Daniel Borkmann.
      
      3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to
         egress which otherwise cause issues e.g. on fq qdisc, from Lorenz Bauer.
      
      4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when
         only cBPF is present, from Alexander Lobakin.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0fd26005
    • Daniel Borkmann's avatar
      bpf: Add further test_verifier cases for record_func_key · 3123d801
      Daniel Borkmann authored
      Expand dummy prog generation such that we can easily check on return
      codes and add few more test cases to make sure we keep on tracking
      pruning behavior.
      
        # ./test_verifier
        [...]
        #1066/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
        #1067/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
        Summary: 1580 PASSED, 0 SKIPPED, 0 FAILED
      
      Also verified that JIT dump of added test cases looks good.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/df7200b6021444fd369376d227de917357285b65.1576789878.git.daniel@iogearbox.net
      3123d801
    • Daniel Borkmann's avatar
      bpf: Fix record_func_key to perform backtracking on r3 · cc52d914
      Daniel Borkmann authored
      While testing Cilium with /unreleased/ Linus' tree under BPF-based NodePort
      implementation, I noticed a strange BPF SNAT engine behavior from time to
      time. In some cases it would do the correct SNAT/DNAT service translation,
      but at a random point in time it would just stop and perform an unexpected
      translation after SYN, SYN/ACK and stack would send a RST back. While initially
      assuming that there is some sort of a race condition in BPF code, adding
      trace_printk()s for debugging purposes at some point seemed to have resolved
      the issue auto-magically.
      
      Digging deeper on this Heisenbug and reducing the trace_printk() calls to
      an absolute minimum, it turns out that a single call would suffice to
      trigger / not trigger the seen RST issue, even though the logic of the
      program itself remains unchanged. Turns out the single call changed verifier
      pruning behavior to get everything to work. Reconstructing a minimal test
      case, the incorrect JIT dump looked as follows:
      
        # bpftool p d j i 11346
        0xffffffffc0cba96c:
        [...]
          21:   movzbq 0x30(%rdi),%rax
          26:   cmp    $0xd,%rax
          2a:   je     0x000000000000003a
          2c:   xor    %edx,%edx
          2e:   movabs $0xffff89cc74e85800,%rsi
          38:   jmp    0x0000000000000049
          3a:   mov    $0x2,%edx
          3f:   movabs $0xffff89cc74e85800,%rsi
          49:   mov    -0x224(%rbp),%eax
          4f:   cmp    $0x20,%eax
          52:   ja     0x0000000000000062
          54:   add    $0x1,%eax
          57:   mov    %eax,-0x224(%rbp)
          5d:   jmpq   0xffffffffffff6911
          62:   mov    $0x1,%eax
        [...]
      
      Hence, unexpectedly, JIT emitted a direct jump even though retpoline based
      one would have been needed since in line 2c and 3a we have different slot
      keys in BPF reg r3. Verifier log of the test case reveals what happened:
      
        0: (b7) r0 = 14
        1: (73) *(u8 *)(r1 +48) = r0
        2: (71) r0 = *(u8 *)(r1 +48)
        3: (15) if r0 == 0xd goto pc+4
         R0_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0
        4: (b7) r3 = 0
        5: (18) r2 = 0xffff89cc74d54a00
        7: (05) goto pc+3
        11: (85) call bpf_tail_call#12
        12: (b7) r0 = 1
        13: (95) exit
        from 3 to 8: R0_w=inv13 R1=ctx(id=0,off=0,imm=0) R10=fp0
        8: (b7) r3 = 2
        9: (18) r2 = 0xffff89cc74d54a00
        11: safe
        processed 13 insns (limit 1000000) [...]
      
      Second branch is pruned by verifier since considered safe, but issue is that
      record_func_key() couldn't have seen the index in line 3a and therefore
      decided that emitting a direct jump at this location was okay.
      
      Fix this by reusing our backtracking logic for precise scalar verification
      in order to prevent pruning on the slot key. This means verifier will track
      content of r3 all the way backwards and only prune if both scalars were
      unknown in state equivalence check and therefore poisoned in the first place
      in record_func_key(). The range is [x,x] in record_func_key() case since
      the slot always would have to be constant immediate. Correct verification
      after fix:
      
        0: (b7) r0 = 14
        1: (73) *(u8 *)(r1 +48) = r0
        2: (71) r0 = *(u8 *)(r1 +48)
        3: (15) if r0 == 0xd goto pc+4
         R0_w=invP(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0
        4: (b7) r3 = 0
        5: (18) r2 = 0x0
        7: (05) goto pc+3
        11: (85) call bpf_tail_call#12
        12: (b7) r0 = 1
        13: (95) exit
        from 3 to 8: R0_w=invP13 R1=ctx(id=0,off=0,imm=0) R10=fp0
        8: (b7) r3 = 2
        9: (18) r2 = 0x0
        11: (85) call bpf_tail_call#12
        12: (b7) r0 = 1
        13: (95) exit
        processed 15 insns (limit 1000000) [...]
      
      And correct corresponding JIT dump:
      
        # bpftool p d j i 11
        0xffffffffc0dc34c4:
        [...]
          21:	  movzbq 0x30(%rdi),%rax
          26:	  cmp    $0xd,%rax
          2a:	  je     0x000000000000003a
          2c:	  xor    %edx,%edx
          2e:	  movabs $0xffff9928b4c02200,%rsi
          38:	  jmp    0x0000000000000049
          3a:	  mov    $0x2,%edx
          3f:	  movabs $0xffff9928b4c02200,%rsi
          49:	  cmp    $0x4,%rdx
          4d:	  jae    0x0000000000000093
          4f:	  and    $0x3,%edx
          52:	  mov    %edx,%edx
          54:	  cmp    %edx,0x24(%rsi)
          57:	  jbe    0x0000000000000093
          59:	  mov    -0x224(%rbp),%eax
          5f:	  cmp    $0x20,%eax
          62:	  ja     0x0000000000000093
          64:	  add    $0x1,%eax
          67:	  mov    %eax,-0x224(%rbp)
          6d:	  mov    0x110(%rsi,%rdx,8),%rax
          75:	  test   %rax,%rax
          78:	  je     0x0000000000000093
          7a:	  mov    0x30(%rax),%rax
          7e:	  add    $0x19,%rax
          82:   callq  0x000000000000008e
          87:   pause
          89:   lfence
          8c:   jmp    0x0000000000000087
          8e:   mov    %rax,(%rsp)
          92:   retq
          93:   mov    $0x1,%eax
        [...]
      
      Also explicitly adding explicit env->allow_ptr_leaks to fixup_bpf_calls() since
      backtracking is enabled under former (direct jumps as well, but use different
      test). In case of only tracking different map pointers as in c93552c4 ("bpf:
      properly enforce index mask to prevent out-of-bounds speculation"), pruning
      cannot make such short-cuts, neither if there are paths with scalar and non-scalar
      types as r3. mark_chain_precision() is only needed after we know that
      register_is_const(). If it was not the case, we already poison the key on first
      path and non-const key in later paths are not matching the scalar range in regsafe()
      either. Cilium NodePort testing passes fine as well now. Note, released kernels
      not affected.
      
      Fixes: d2e4c1e6 ("bpf: Constant map key tracking for prog array pokes")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/ac43ffdeb7386c5bd688761ed266f3722bb39823.1576789878.git.daniel@iogearbox.net
      cc52d914
    • Alexander Lobakin's avatar
      net, sysctl: Fix compiler warning when only cBPF is present · 1148f9ad
      Alexander Lobakin authored
      proc_dointvec_minmax_bpf_restricted() has been firstly introduced
      in commit 2e4a3098 ("bpf: restrict access to core bpf sysctls")
      under CONFIG_HAVE_EBPF_JIT. Then, this ifdef has been removed in
      ede95a63 ("bpf: add bpf_jit_limit knob to restrict unpriv
      allocations"), because a new sysctl, bpf_jit_limit, made use of it.
      Finally, this parameter has become long instead of integer with
      fdadd049 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
      and thus, a new proc_dolongvec_minmax_bpf_restricted() has been
      added.
      
      With this last change, we got back to that
      proc_dointvec_minmax_bpf_restricted() is used only under
      CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef has not been
      brought back.
      
      So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n
      since v4.20 we have:
      
        CC      net/core/sysctl_net_core.o
      net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function]
        292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
            | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again.
      
      Fixes: fdadd049 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
      Signed-off-by: default avatarAlexander Lobakin <alobakin@dlink.ru>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191218091821.7080-1-alobakin@dlink.ru
      1148f9ad
    • Daniel Borkmann's avatar
      Merge branch 'bpf-fix-xsk-wakeup' · ca8d0fa7
      Daniel Borkmann authored
      Maxim Mikityanskiy says:
      
      ====================
      This series addresses the issue described in the commit message of the
      first patch: lack of synchronization between XSK wakeup and destroying
      the resources used by XSK wakeup. The idea is similar to napi_synchronize.
      The series contains fixes for the drivers that implement XSK.
      
      v2 incorporates changes suggested by Björn:
      
      1. Call synchronize_rcu in Intel drivers only if the XDP program is
         being unloaded.
      2. Don't forget rcu_read_lock when wakeup is called from xsk_poll.
      3. Use xs->zc as the condition to call ndo_xsk_wakeup.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      ca8d0fa7
    • Maxim Mikityanskiy's avatar
      net/ixgbe: Fix concurrency issues between config flow and XSK · c0fdccfd
      Maxim Mikityanskiy authored
      Use synchronize_rcu to wait until the XSK wakeup function finishes
      before destroying the resources it uses:
      
      1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN.
      
      2. After switching the XDP program, call synchronize_rcu to let
      ixgbe_xsk_wakeup exit before the XDP program is freed.
      
      3. Changing the number of channels brings the interface down.
      
      4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware
      resources and resetting xsk_umem. Check that bit in ixgbe_xsk_wakeup to
      avoid using the XDP ring when it's already destroyed. synchronize_rcu is
      called from ixgbe_txrx_ring_disable.
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191217162023.16011-5-maximmi@mellanox.com
      c0fdccfd
    • Maxim Mikityanskiy's avatar
      net/i40e: Fix concurrency issues between config flow and XSK · b3873a5b
      Maxim Mikityanskiy authored
      Use synchronize_rcu to wait until the XSK wakeup function finishes
      before destroying the resources it uses:
      
      1. i40e_down already calls synchronize_rcu. On i40e_down either
      __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
      i40e_xsk_wakeup (the former is already checked there).
      
      2. After switching the XDP program, call synchronize_rcu to let
      i40e_xsk_wakeup exit before the XDP program is freed.
      
      3. Changing the number of channels brings the interface down (see
      i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).
      
      4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191217162023.16011-4-maximmi@mellanox.com
      b3873a5b
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Fix concurrency issues between config flow and XSK · 9cf88808
      Maxim Mikityanskiy authored
      After disabling resources necessary for XSK (the XDP program, channels,
      XSK queues), use synchronize_rcu to wait until the XSK wakeup function
      finishes, before freeing the resources.
      
      Suspend XSK wakeups during switching channels. If the XDP program is
      being removed, synchronize_rcu before closing the old channels to allow
      XSK wakeup to complete.
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191217162023.16011-3-maximmi@mellanox.com
      9cf88808
    • Maxim Mikityanskiy's avatar
      xsk: Add rcu_read_lock around the XSK wakeup · 06870682
      Maxim Mikityanskiy authored
      The XSK wakeup callback in drivers makes some sanity checks before
      triggering NAPI. However, some configuration changes may occur during
      this function that affect the result of those checks. For example, the
      interface can go down, and all the resources will be destroyed after the
      checks in the wakeup function, but before it attempts to use these
      resources. Wrap this callback in rcu_read_lock to allow driver to
      synchronize_rcu before actually destroying the resources.
      
      xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup
      wrapped into the RCU lock. After this commit, xsk_poll starts using
      xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide
      ndo_xsk_wakeup should be called. It also fixes a bug introduced with the
      need_wakeup feature: a non-zero-copy socket may be used with a driver
      supporting zero-copy, and in this case ndo_xsk_wakeup should not be
      called, so the xs->zc check is the correct one.
      
      Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com
      06870682
  2. 18 Dec, 2019 19 commits
  3. 17 Dec, 2019 10 commits
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-2019-12-17' of... · 040cda8a
      David S. Miller authored
      Merge tag 'wireless-drivers-2019-12-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.5
      
      First set of fixes for v5.5. Fixing security issues, some regressions
      and few major bugs.
      
      mwifiex
      
      * security fix for handling country Information Elements (CVE-2019-14895)
      
      * security fix for handling TDLS Information Elements
      
      ath9k
      
      * fix endian issue with ath9k_pci_owl_loader
      
      mt76
      
      * fix default mac address handling
      
      iwlwifi
      
      * fix merge damage which lead to firmware crashing during boot on some devices
      
      * fix device initialisation regression on some devices
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      040cda8a
    • Ioana Ciornei's avatar
      dpaa2-ptp: fix double free of the ptp_qoriq IRQ · daa6eb5a
      Ioana Ciornei authored
      Upon reusing the ptp_qoriq driver, the ptp_qoriq_free() function was
      used on the remove path to free any allocated resources.
      The ptp_qoriq IRQ is among these resources that are freed in
      ptp_qoriq_free() even though it is also a managed one (allocated using
      devm_request_threaded_irq).
      
      Drop the resource managed version of requesting the IRQ in order to not
      trigger a double free of the interrupt as below:
      
      [  226.731005] Trying to free already-free IRQ 126
      [  226.735533] WARNING: CPU: 6 PID: 749 at kernel/irq/manage.c:1707
      __free_irq+0x9c/0x2b8
      [  226.743435] Modules linked in:
      [  226.746480] CPU: 6 PID: 749 Comm: bash Tainted: G        W
      5.4.0-03629-gfd7102c32b2c-dirty #912
      [  226.755857] Hardware name: NXP Layerscape LX2160ARDB (DT)
      [  226.761244] pstate: 40000085 (nZcv daIf -PAN -UAO)
      [  226.766022] pc : __free_irq+0x9c/0x2b8
      [  226.769758] lr : __free_irq+0x9c/0x2b8
      [  226.773493] sp : ffff8000125039f0
      (...)
      [  226.856275] Call trace:
      [  226.858710]  __free_irq+0x9c/0x2b8
      [  226.862098]  free_irq+0x30/0x70
      [  226.865229]  devm_irq_release+0x14/0x20
      [  226.869054]  release_nodes+0x1b0/0x220
      [  226.872790]  devres_release_all+0x34/0x50
      [  226.876790]  device_release_driver_internal+0x100/0x1c0
      
      Fixes: d346c9e8 ("dpaa2-ptp: reuse ptp_qoriq driver")
      Cc: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: default avatarIoana Ciornei <ioana.ciornei@nxp.com>
      Reviewed-by: default avatarYangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      daa6eb5a
    • Daniel Borkmann's avatar
      bpf: Fix cgroup local storage prog tracking · e4730423
      Daniel Borkmann authored
      Recently noticed that we're tracking programs related to local storage maps
      through their prog pointer. This is a wrong assumption since the prog pointer
      can still change throughout the verification process, for example, whenever
      bpf_patch_insn_single() is called.
      
      Therefore, the prog pointer that was assigned via bpf_cgroup_storage_assign()
      is not guaranteed to be the same as we pass in bpf_cgroup_storage_release()
      and the map would therefore remain in busy state forever. Fix this by using
      the prog's aux pointer which is stable throughout verification and beyond.
      
      Fixes: de9cbbaa ("bpf: introduce cgroup storage maps")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/1471c69eca3022218666f909bc927a92388fd09e.1576580332.git.daniel@iogearbox.net
      e4730423
    • David S. Miller's avatar
      Merge tag 'mac80211-for-net-2019-10-16' of... · ad125c6c
      David S. Miller authored
      Merge tag 'mac80211-for-net-2019-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      A handful of fixes:
       * disable AQL on most drivers, addressing the iwlwifi issues
       * fix double-free on network namespace changes
       * fix TID field in frames injected through monitor interfaces
       * fix ieee80211_calc_rx_airtime()
       * fix NULL pointer dereference in rfkill (and remove BUG_ON)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad125c6c
    • Arnd Bergmann's avatar
      net: dsa: ocelot: add NET_VENDOR_MICROSEMI dependency · 95bed1a9
      Arnd Bergmann authored
      Selecting MSCC_OCELOT_SWITCH is not possible when NET_VENDOR_MICROSEMI
      is disabled:
      
      WARNING: unmet direct dependencies detected for MSCC_OCELOT_SWITCH
        Depends on [n]: NETDEVICES [=y] && ETHERNET [=n] && NET_VENDOR_MICROSEMI [=n] && NET_SWITCHDEV [=y] && HAS_IOMEM [=y]
        Selected by [m]:
        - NET_DSA_MSCC_FELIX [=m] && NETDEVICES [=y] && HAVE_NET_DSA [=y] && NET_DSA [=y] && PCI [=y]
      
      Add a Kconfig dependency on NET_VENDOR_MICROSEMI, which also implies
      CONFIG_NETDEVICES.
      
      Depending on a vendor config violates menuconfig locality for the DSA
      driver, but is the smallest compromise since all other solutions are
      much more complicated (see [0]).
      
      https://www.spinics.net/lists/netdev/msg618808.html
      
      Fixes: 56051948 ("net: dsa: ocelot: add driver for Felix switch family")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarMao Wenan <maowenan@huawei.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95bed1a9
    • Navid Emamdoost's avatar
      net: gemini: Fix memory leak in gmac_setup_txqs · f37f7103
      Navid Emamdoost authored
      In the implementation of gmac_setup_txqs() the allocated desc_ring is
      leaked if TX queue base is not aligned. Release it via
      dma_free_coherent.
      
      Fixes: 4d5ae32f ("net: ethernet: Add a driver for Gemini gigabit ethernet")
      Signed-off-by: default avatarNavid Emamdoost <navid.emamdoost@gmail.com>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f37f7103
    • Florian Fainelli's avatar
      net: dsa: b53: Fix egress flooding settings · 63cc54a6
      Florian Fainelli authored
      There were several issues with 53568438 ("net: dsa: b53: Add support for port_egress_floods callback") that resulted in breaking connectivity for standalone ports:
      
      - both user and CPU ports must allow unicast and multicast forwarding by
        default otherwise this just flat out breaks connectivity for
        standalone DSA ports
      - IP multicast is treated similarly as multicast, but has separate
        control registers
      - the UC, MC and IPMC lookup failure register offsets were wrong, and
        instead used bit values that are meaningful for the
        B53_IP_MULTICAST_CTRL register
      
      Fixes: 53568438 ("net: dsa: b53: Add support for port_egress_floods callback")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      63cc54a6
    • David S. Miller's avatar
      Merge branch 'vsock-fixes' · 1865a7b3
      David S. Miller authored
      Stefano Garzarella says:
      
      ====================
      vsock/virtio: fix null-pointer dereference and related precautions
      
      This series mainly solves a possible null-pointer dereference in
      virtio_transport_recv_listen() introduced with the multi-transport
      support [PATCH 1].
      
      PATCH 2 adds a WARN_ON check for the same potential issue
      and a returned error in the virtio_transport_send_pkt_info() function
      to avoid crashing the kernel.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1865a7b3
    • Stefano Garzarella's avatar
      vsock/virtio: add WARN_ON check on virtio_transport_get_ops() · 4aaf5961
      Stefano Garzarella authored
      virtio_transport_get_ops() and virtio_transport_send_pkt_info()
      can only be used on connecting/connected sockets, since a socket
      assigned to a transport is required.
      
      This patch adds a WARN_ON() on virtio_transport_get_ops() to check
      this requirement, a comment and a returned error on
      virtio_transport_send_pkt_info(),
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4aaf5961
    • Stefano Garzarella's avatar
      vsock/virtio: fix null-pointer dereference in virtio_transport_recv_listen() · df18fa14
      Stefano Garzarella authored
      With multi-transport support, listener sockets are not bound to any
      transport. So, calling virtio_transport_reset(), when an error
      occurs, on a listener socket produces the following null-pointer
      dereference:
      
        BUG: kernel NULL pointer dereference, address: 00000000000000e8
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.5.0-rc1-ste-00003-gb4be21f316ac-dirty #56
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
        Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
        RIP: 0010:virtio_transport_send_pkt_info+0x20/0x130 [vmw_vsock_virtio_transport_common]
        Code: 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 10 44 8b 76 20 e8 c0 ba fe ff <48> 8b 80 e8 00 00 00 e8 64 e3 7d c1 45 8b 45 00 41 8b 8c 24 d4 02
        RSP: 0018:ffffc900000b7d08 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffff88807bf12728 RCX: 0000000000000000
        RDX: ffff88807bf12700 RSI: ffffc900000b7d50 RDI: ffff888035c84000
        RBP: ffffc900000b7d40 R08: ffff888035c84000 R09: ffffc900000b7d08
        R10: ffff8880781de800 R11: 0000000000000018 R12: ffff888035c84000
        R13: ffffc900000b7d50 R14: 0000000000000000 R15: ffff88807bf12724
        FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e8 CR3: 00000000790f4004 CR4: 0000000000160ef0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         virtio_transport_reset+0x59/0x70 [vmw_vsock_virtio_transport_common]
         virtio_transport_recv_pkt+0x5bb/0xe50 [vmw_vsock_virtio_transport_common]
         ? detach_buf_split+0xf1/0x130
         virtio_transport_rx_work+0xba/0x130 [vmw_vsock_virtio_transport]
         process_one_work+0x1c0/0x300
         worker_thread+0x45/0x3c0
         kthread+0xfc/0x130
         ? current_work+0x40/0x40
         ? kthread_park+0x90/0x90
         ret_from_fork+0x35/0x40
        Modules linked in: sunrpc kvm_intel kvm vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common irqbypass vsock virtio_rng rng_core
        CR2: 00000000000000e8
        ---[ end trace e75400e2ea2fa824 ]---
      
      This happens because virtio_transport_reset() calls
      virtio_transport_send_pkt_info() that can be used only on
      connecting/connected sockets.
      
      This patch fixes the issue, using virtio_transport_reset_no_sock()
      instead of virtio_transport_reset() when we are handling a listener
      socket.
      
      Fixes: c0cfa2d8 ("vsock: add multi-transports support")
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df18fa14
  4. 16 Dec, 2019 2 commits