1. 27 Sep, 2019 40 commits
    • net: tap: clean up an indentation issue · faeacb6d
      Colin Ian King authored
      There is a statement that is indented too deeply, remove
      the extraneous tab.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: abm: fix memory leak in nfp_abm_u32_knode_replace · 78beef62
      Navid Emamdoost authored
      In nfp_abm_u32_knode_replace, if the allocation for match fails, it should
      go to the error handling instead of returning. Other gotos are updated to
      return the correct errno, too.
      Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
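
The fix pattern described in this commit (jump to a shared error label instead of returning directly, and set the errno before each jump) can be sketched in userspace C. This is only an illustration under stated assumptions, not the driver code: `struct knode`, `knode_replace_sketch`, the `fail_match` switch, and the `demo_alloc`/`demo_free` helpers are hypothetical names, with malloc/free standing in for the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's object. */
struct knode {
	int *match;
};

static int live_allocs; /* outstanding allocations, for the demo only */

/* Allocate, or simulate an allocation failure when fail is non-zero. */
static void *demo_alloc(size_t size, int fail)
{
	void *p = fail ? NULL : malloc(size);

	if (p)
		live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Returns 0 on success or a negative errno; frees everything on failure. */
static int knode_replace_sketch(struct knode **out, int fail_match)
{
	struct knode *k;
	int err;

	k = demo_alloc(sizeof(*k), 0);
	if (!k)
		return -ENOMEM; /* nothing to unwind yet */

	k->match = demo_alloc(sizeof(*k->match), fail_match);
	if (!k->match) {
		err = -ENOMEM;  /* set the errno, then unwind instead of returning */
		goto err_free_knode;
	}

	*out = k;
	return 0;

err_free_knode:
	demo_free(k);
	return err;
}
```

Returning directly from the second failure branch would leave `k` allocated, which is the kind of leak the commit message describes.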
    • tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state · a41e8a88
      Eric Dumazet authored
      Yuchung Cheng and Marek Majkowski independently reported a weird
      behavior of TCP_USER_TIMEOUT option when used at connect() time.
      
      When the TCP_USER_TIMEOUT is reached, tcp_write_timeout()
      believes the flow should live, and the following condition
      in tcp_clamp_rto_to_user_timeout() programs one-jiffy timers:
      
          remaining = icsk->icsk_user_timeout - elapsed;
          if (remaining <= 0)
              return 1; /* user timeout has passed; fire ASAP */
      
      This silly situation ends when the max syn rtx count is reached.
      
      This patch makes sure we honor both TCP_SYNCNT and TCP_USER_TIMEOUT,
      avoiding these spurious SYN packets.
      
      Fixes: b701a99e ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Yuchung Cheng <ycheng@google.com>
      Reported-by: Marek Majkowski <marek@cloudflare.com>
      Cc: Jon Maxwell <jmaxwell37@gmail.com>
      Link: https://marc.info/?l=linux-netdev&m=156940118307949&w=2
      Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
      Tested-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Marek Majkowski <marek@cloudflare.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
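
From userspace, the scenario in this commit arises when both socket options are set before connect(). A minimal sketch, assuming a Linux system whose libc headers expose TCP_USER_TIMEOUT and TCP_SYNCNT; the helper name `set_connect_limits` is hypothetical and not part of the patch:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Apply both limits to a TCP socket that has not been connected yet.
 * timeout_ms:  value for TCP_USER_TIMEOUT, in milliseconds.
 * syn_retries: value for TCP_SYNCNT, the maximum number of SYN retransmits.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
static int set_connect_limits(int fd, unsigned int timeout_ms, int syn_retries)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
		       &timeout_ms, sizeof(timeout_ms)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT,
		       &syn_retries, sizeof(syn_retries)) < 0)
		return -1;
	return 0;
}
```

A caller would invoke this between socket() and connect(). According to the commit message, the patched kernel honors both limits rather than continuing to send SYNs after the user timeout has passed.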
    • sk_buff: drop all skb extensions on free and skb scrubbing · 174e2381
      Florian Westphal authored
      Now that we have a 3rd extension, add a new helper that drops the
      extension space and use it when we need to scrub an sk_buff.
      
      At this time, scrubbing clears secpath and bridge netfilter data but
      retains the tc skb extension. After this patch, all three get cleared.
      
      NAPI reuse/free assumes we can only have a secpath attached to skb, but
      it seems better to clear all extensions there as well.
      
      v2: add unlikely hint (Eric Dumazet)
      
      Fixes: 95a7233c ("net: openvswitch: Set OvS recirc_id from tc chain index")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth · 6b3656a6
      Kevin(Yudong) Yang authored
      There was a bug in the previous logic that attempted to ensure gain cycling
      gets inflight above BDP even for small BDPs. This code correctly raised and
      lowered target inflight values during the gain cycle. And this code
      correctly ensured that cwnd was raised when probing bandwidth. However, it
      did not correspondingly ensure that cwnd was *not* raised in this way when
      *not* probing for bandwidth. The result was that small-BDP flows that were
      always cwnd-bound could go for many cycles with a fixed cwnd, and not probe
      or yield bandwidth at all. This meant that multiple small-BDP flows could
      fail to converge in their bandwidth allocations.
      
      Fixes: 3c346b233c68 ("tcp_bbr: fix bw probing to raise in-flight data for very small BDPs")
      Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-Various-fixes' · 94e7e5da
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Various fixes
      
      This patchset includes two small fixes for the mlxsw driver and one
      patch which clarifies recently introduced devlink-trap documentation.
      
      Patch #1 clears the port's VLAN filters during port initialization. This
      ensures that the drop reason reported to the user is consistent. The
      problem is explained in detail in the commit message.
      
      Patch #2 clarifies the description of one of the traps exposed via
      devlink-trap.
      
      Patch #3 from Danielle forbids the installation of a tc filter with
      multiple mirror actions since this is not supported by the device. The
      failure is communicated to the user via extack.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions · 52feb8b5
      Danielle Ratson authored
      The ASIC can only mirror a packet to one port, but when a user tries
      to set more than one mirror action, the request does not fail.
      
      Check whether more than one mirror action was specified per rule and,
      if so, fail the request as unsupported.
      
      Fixes: d0d13c18 ("mlxsw: spectrum_acl: Add support for mirror action")
      Signed-off-by: Danielle Ratson <danieller@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
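
The check described in this commit amounts to counting mirror actions while walking a rule's action list and rejecting the rule on the second one. A minimal sketch, assuming a simplified action list: `enum action_kind` and `check_mirror_actions` are hypothetical names, and the choice of -EOPNOTSUPP as the error code is an assumption rather than something stated in the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical action kinds; the real driver parses tc flower actions. */
enum action_kind { ACT_DROP, ACT_MIRROR, ACT_VLAN };

/*
 * Returns 0 if the rule is acceptable, or -EOPNOTSUPP if it carries more
 * than one mirror action.
 */
static int check_mirror_actions(const enum action_kind *acts, size_t n)
{
	size_t i, mirrors = 0;

	for (i = 0; i < n; i++) {
		if (acts[i] != ACT_MIRROR)
			continue;
		if (++mirrors > 1)
			return -EOPNOTSUPP; /* second mirror: reject the rule */
	}
	return 0;
}
```

Failing at parse time keeps the rule out of the hardware entirely, instead of silently installing a filter that mirrors to only one of the requested ports.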
    • Documentation: Clarify trap's description · 44bde514
      Ido Schimmel authored
      Alex noted that the below description might not be obvious to all users.
      Clarify it by adding an example.
      
      Fixes: f3047ca0 ("Documentation: Add devlink-trap documentation")
      Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
      Reviewed-by: Alex Kushnarov <alexanderk@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Clear VLAN filters during port initialization · 979b9b25
      Ido Schimmel authored
      When a port is created, its VLAN filters are not cleared by the
      firmware. This causes tagged packets to be later dropped by the ingress
      STP filters, which default to DISCARD state.
      
      The above did not matter much until commit b5ce611f ("mlxsw:
      spectrum: Add devlink-trap support") where we exposed the drop reason to
      users.
      
      Without this patch, the drop reason users will see is not consistent. If
      a port is enslaved to a VLAN-aware bridge and a packet with an invalid
      VLAN tries to ingress the bridge, it will be dropped due to ingress STP
      filter. If the VLAN is later enabled and then disabled, the packet will
      be dropped by the ingress VLAN filter despite the above being a
      seemingly NOP operation.
      
      Fix this by clearing all the VLAN filters during port initialization.
      Adjust the test accordingly.
      
      Fixes: b5ce611f ("mlxsw: spectrum: Add devlink-trap support")
      Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
      Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: clean up indentation issue · 4208966f
      Colin Ian King authored
      The memset is indented incorrectly; remove the extraneous tabs.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • NFC: st95hf: clean up indentation issue · 6ba5bbba
      Colin Ian King authored
      The return statement is indented incorrectly; add the missing
      tab and remove an extraneous space after the return.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: micrel: add Asym Pause workaround for KSZ9021 · 407d8098
      Hans Andersson authored
      The Micrel KSZ9031 PHY may fail to establish a link when the Asymmetric
      Pause capability is set. This issue is described in a Silicon Errata
      (DS80000691D or DS80000692D), which advises to always disable the
      capability.
      
      Micrel KSZ9021 has no errata, but has the same issue with Asymmetric Pause.
      This patch applies the same workaround as the one for KSZ9031.
      
      Fixes: 3aed3e2a ("net: phy: micrel: add Asym Pause workaround")
      Signed-off-by: Hans Andersson <hans.andersson@cellavision.se>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socionext: ave: Avoid using netdev_err() before calling register_netdev() · fd4a8093
      Kunihiko Hayashi authored
      Until calling register_netdev(), ndev->dev_name isn't specified, and
      netdev_err() displays "(unnamed net_device)".
      
          ave 65000000.ethernet (unnamed net_device) (uninitialized): invalid phy-mode setting
          ave: probe of 65000000.ethernet failed with error -22
      
      This replaces netdev_err() with dev_err() before calling register_netdev().
      Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: correctly disable flags on old ioctls · 2df4de16
      Jacob Keller authored
      Commit 41560658 ("PTP: introduce new versions of IOCTLs",
      2019-09-13) introduced new versions of the PTP ioctls which actually
      validate that the flags are acceptable values.
      
      As part of this, it cleared the flags value using a bitwise
      and+negation, in an attempt to prevent the old ioctl from accidentally
      enabling new features.
      
      This is incorrect for a couple of reasons. First, it results in
      accidentally preventing previously working flags on the request ioctl.
      By clearing the "valid" flags, we now no longer allow setting the
      enable, rising edge, or falling edge flags.
      
      Second, if we add new additional flags in the future, they must not be
      set by the old ioctl. (Since the flag wasn't checked before, we could
      potentially break userspace programs which sent garbage flag data.)
      
      The correct way to resolve this is to check for and clear all but the
      originally valid flags.
      
      Create defines indicating which flags are correctly checked and
      interpreted by the original ioctls. Use these to clear any bits which
      will not be correctly interpreted by the original ioctls.
      
      In the future, new flags must be added to the VALID_FLAGS macros, but
      *not* to the V1_VALID_FLAGS macros. In this way, new features may be
      exposed over the v2 ioctls, but without breaking previous userspace
      which happened to not clear the flags value properly. The old ioctl will
      continue to behave the same way, while the new ioctl gains the benefit
      of using the flags fields.
      
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Christopher Hall <christopher.s.hall@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
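
The difference between the two masking approaches described in this commit can be shown with plain bit arithmetic. A minimal sketch: the `DEMO_*` names and bit positions below are hypothetical and chosen only for illustration; they are not the uapi macros, and `DEMO_NEW_V2_FLAG` stands for any flag added after the original ioctl.

```c
#include <stdint.h>

/* Illustrative bit values; names are hypothetical, not the uapi macros. */
#define DEMO_ENABLE_FEATURE  (1u << 0)
#define DEMO_RISING_EDGE     (1u << 1)
#define DEMO_FALLING_EDGE    (1u << 2)
#define DEMO_NEW_V2_FLAG     (1u << 3) /* added after the original ioctl */

/* Everything the new (v2) ioctl understands; grows over time. */
#define DEMO_VALID_FLAGS     (DEMO_ENABLE_FEATURE | DEMO_RISING_EDGE | \
			      DEMO_FALLING_EDGE | DEMO_NEW_V2_FLAG)
/* Only what the original ioctl ever interpreted; never grows. */
#define DEMO_V1_VALID_FLAGS  (DEMO_ENABLE_FEATURE | DEMO_RISING_EDGE | \
			      DEMO_FALLING_EDGE)

/* Problematic approach: clears the valid bits and keeps the garbage. */
static uint32_t v1_flags_buggy(uint32_t flags)
{
	return flags & ~DEMO_VALID_FLAGS;
}

/* Corrected approach: keep only the bits the old ioctl always understood. */
static uint32_t v1_flags_fixed(uint32_t flags)
{
	return flags & DEMO_V1_VALID_FLAGS;
}
```

With a request of `DEMO_ENABLE_FEATURE | DEMO_RISING_EDGE` plus a stray high bit, the first function drops the two legitimate flags and keeps the stray bit, while the second keeps the legitimate flags and discards everything else, including any newer v2-only flag.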
    • lib: dimlib: fix help text typos · 991ad2b2
      Randy Dunlap authored
      Fix help text typos for DIMLIB.
      
      Fixes: 4f75da36 ("linux/dim: Move implementation to .c files")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Uwe Kleine-König <uwe@kleine-koenig.org>
      Cc: Tal Gilboa <talgi@mellanox.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: Always set regmap stride to 1 · a3aa6e65
      Marek Vasut authored
      The regmap stride is set to 1 for regmap describing 8bit registers already.
      However, for 16/32/64bit registers, the stride is 2/4/8 respectively. This
      is not correct, as the switch protocol supports unaligned register reads
      and writes and the KSZ87xx even uses such unaligned register accesses to
      read e.g. MIB counter.
      
      This patch fixes MIB counter access on KSZ87xx.
      Signed-off-by: Marek Vasut <marex@denx.de>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: George McCollister <george.mccollister@gmail.com>
      Cc: Tristram Ha <Tristram.Ha@microchip.com>
      Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Cc: Woojung Huh <woojung.huh@microchip.com>
      Fixes: 46558d60 ("net: dsa: microchip: Initial SPI regmap support")
      Fixes: 255b59ad ("net: dsa: microchip: Factor out regmap config generation into common header")
      Reviewed-by: George McCollister <george.mccollister@gmail.com>
      Tested-by: George McCollister <george.mccollister@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
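
Why the stride matters can be illustrated with a small model. This sketch assumes that the register-map layer rejects any address that is not a multiple of the configured stride; `demo_check_reg` and the example addresses are hypothetical and are not taken from the driver.

```c
#include <errno.h>

/*
 * Model of a stride check: an access is accepted only when the register
 * address is a multiple of the configured stride.
 * Returns 0 if the access is allowed, -EINVAL otherwise.
 */
static int demo_check_reg(unsigned int reg, unsigned int stride)
{
	if (stride == 0)
		return -EINVAL;
	return (reg % stride) ? -EINVAL : 0;
}
```

Under this model, a 32-bit access at an odd address such as 0x71 is refused with a stride of 4 but allowed with a stride of 1, which matches the commit's point that the switch protocol supports unaligned accesses.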
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · c5f095ba
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Add NFT_CHAIN_POLICY_UNSET to replace hardcoded -1 to
         specify that the chain policy is unset. The chain policy
         field is actually defined as an 8-bit unsigned integer.
      
      2) Remove always true condition reported by smatch in
         chain policy check.
      
      3) Fix element lookup on dynamic sets, from Florian Westphal.
      
      4) Use __u8 in ebtables uapi header, from Masahiro Yamada.
      
      5) Bogus EBUSY when removing flowtable after chain flush,
         from Laura Garcia Liebana.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
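
Item 1 of this pull request turns on a general C pitfall: an 8-bit unsigned field can never compare equal to -1, because it promotes to an int in the range 0..255. A minimal sketch of that behavior, using the hypothetical names `DEMO_POLICY_UNSET`, `is_unset_hardcoded`, and `is_unset_sentinel`; it illustrates the language rule only and does not reproduce the nf_tables code.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "no policy set" for an 8-bit field. */
#define DEMO_POLICY_UNSET UINT8_MAX

/*
 * Comparing the 8-bit field against -1 is never true: the field is
 * promoted to int with a value between 0 and 255.
 */
static int is_unset_hardcoded(uint8_t policy)
{
	return policy == -1;
}

/* Comparing against a sentinel of the field's own type works as intended. */
static int is_unset_sentinel(uint8_t policy)
{
	return policy == DEMO_POLICY_UNSET;
}
```

Some compilers warn that the first comparison is always false, which is the same class of diagnostic as the always-true condition mentioned in item 2.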
    • nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs · 8ce39eb5
      Navid Emamdoost authored
      In nfp_flower_spawn_vnic_reprs, if initialization or the allocations
      in the loop fail, memory is leaked. Appropriate releases are added.
      
      Fixes: b9452452 ("nfp: flower: add per repr private data for LAG offload")
      Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs · 8572cea1
      Navid Emamdoost authored
      In nfp_flower_spawn_phy_reprs, in the for loop over eth_tbl, if any of
      the intermediate allocations or initializations fail, memory is leaked.
      The required releases are added.
      
      Fixes: b9452452 ("nfp: flower: add per repr private data for LAG offload")
      Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N · dfe5999d
      Paul Blakey authored
      This is a new feature, so it is preferred that it defaults to N.
      We will probe the feature support from userspace before actually using it.
      
      Fixes: 95a7233c ('net: openvswitch: Set OvS recirc_id from tc chain index')
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled · dac91170
      David Ahern authored
      A user reported that vrf create fails when IPv6 is disabled at boot using
      'ipv6.disable=1':
         https://bugzilla.kernel.org/show_bug.cgi?id=204903
      
      The failure is adding fib rules at create time. Add RTNL_FAMILY_IP6MR to
      the check in vrf_fib_rule that is applied when ipv6_mod_enabled() reports
      that IPv6 is disabled.
      
      Fixes: e4a38c0c ("ipv6: add vrf table handling code for ipv6 mcast")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Cc: Patrick Ruddy <pruddy@vyatta.att-mail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 3c30819d
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-09-27
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix libbpf's BTF dumper to not skip anonymous enum definitions, from Andrii.
      
      2) Fix BTF verifier issues when handling the BTF of vmlinux, from Alexei.
      
      3) Fix nested calls into bpf_event_output() from TCP sockops BPF
         programs, from Allan.
      
      4) Fix NULL pointer dereference in AF_XDP's xsk map creation when
         allocation fails, from Jonathan.
      
      5) Remove unneeded 64 byte alignment requirement of the AF_XDP UMEM
         headroom, from Bjorn.
      
      6) Remove unused XDP_OPTIONS getsockopt() call which results in an error
         on older kernels, from Toke.
      
      7) Fix a client/server race in tcp_rtt BPF kselftest case, from Stanislav.
      
      8) Fix indentation issue in BTF's btf_enum_check_kflag_member(), from Colin.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qdisc-destroy' · 5c7ff181
      David S. Miller authored
      Vlad Buslov says:
      
      ====================
      Fix Qdisc destroy issues caused by adding fine-grained locking to filter API
      
      TC filter API unlocking introduced several new fine-grained locks. The
      change caused sleeping-while-atomic BUGs in several Qdiscs that call cls
      APIs which need to obtain new mutex while holding sch tree spinlock. This
      series fixes affected Qdiscs by ensuring that cls API that became sleeping
      is only called outside of sch tree lock critical section.
      ====================
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: sch_sfb: don't call qdisc_put() while holding tree lock · e3ae1f96
      Vlad Buslov authored
      Recent changes that removed rtnl dependency from rules update path of tc
      also made tcf_block_put() function sleeping. This function is called from
      ops->destroy() of several Qdisc implementations, which in turn is called by
      qdisc_put(). Some Qdiscs call qdisc_put() while holding sch tree spinlock,
      which results in a sleeping-while-atomic BUG.
      
      Steps to reproduce for sfb:
      
      tc qdisc add dev ens1f0 handle 1: root sfb
      tc qdisc add dev ens1f0 parent 1:10 handle 50: sfq perturb 10
      tc qdisc change dev ens1f0 root handle 1: sfb
      
      Resulting dmesg:
      
      [ 7265.938717] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
      [ 7265.940152] in_atomic(): 1, irqs_disabled(): 0, pid: 28579, name: tc
      [ 7265.941455] INFO: lockdep is turned off.
      [ 7265.942744] CPU: 11 PID: 28579 Comm: tc Tainted: G        W         5.3.0-rc8+ #721
      [ 7265.944065] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 7265.945396] Call Trace:
      [ 7265.946709]  dump_stack+0x85/0xc0
      [ 7265.947994]  ___might_sleep.cold+0xac/0xbc
      [ 7265.949282]  __mutex_lock+0x5b/0x960
      [ 7265.950543]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 7265.951803]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 7265.953022]  tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 7265.954248]  tcf_block_put_ext.part.0+0x21/0x50
      [ 7265.955478]  tcf_block_put+0x50/0x70
      [ 7265.956694]  sfq_destroy+0x15/0x50 [sch_sfq]
      [ 7265.957898]  qdisc_destroy+0x5f/0x160
      [ 7265.959099]  sfb_change+0x175/0x330 [sch_sfb]
      [ 7265.960304]  tc_modify_qdisc+0x324/0x840
      [ 7265.961503]  rtnetlink_rcv_msg+0x170/0x4b0
      [ 7265.962692]  ? netlink_deliver_tap+0x95/0x400
      [ 7265.963876]  ? rtnl_dellink+0x2d0/0x2d0
      [ 7265.965064]  netlink_rcv_skb+0x49/0x110
      [ 7265.966251]  netlink_unicast+0x171/0x200
      [ 7265.967427]  netlink_sendmsg+0x224/0x3f0
      [ 7265.968595]  sock_sendmsg+0x5e/0x60
      [ 7265.969753]  ___sys_sendmsg+0x2ae/0x330
      [ 7265.970916]  ? ___sys_recvmsg+0x159/0x1f0
      [ 7265.972074]  ? do_wp_page+0x9c/0x790
      [ 7265.973233]  ? __handle_mm_fault+0xcd3/0x19e0
      [ 7265.974407]  __sys_sendmsg+0x59/0xa0
      [ 7265.975591]  do_syscall_64+0x5c/0xb0
      [ 7265.976753]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 7265.977938] RIP: 0033:0x7f229069f7b8
      [ 7265.979117] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
      [ 7265.981681] RSP: 002b:00007ffd7ed2d158 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 7265.983001] RAX: ffffffffffffffda RBX: 000000005d813ca1 RCX: 00007f229069f7b8
      [ 7265.984336] RDX: 0000000000000000 RSI: 00007ffd7ed2d1c0 RDI: 0000000000000003
      [ 7265.985682] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000165c9a0
      [ 7265.987021] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
      [ 7265.988309] R13: 000000000047f640 R14: 0000000000000000 R15: 0000000000000000
      
      In sfb_change() function use qdisc_purge_queue() instead of
      qdisc_tree_flush_backlog() to properly reset old child Qdisc and save
      pointer to it into local temporary variable. Put reference to Qdisc after
      sch tree lock is released in order not to call potentially sleeping cls API
      in atomic section. This is safe to do because Qdisc has already been reset
      by qdisc_purge_queue() inside sch tree lock critical section.
      
      Reported-by: syzbot+ac54455281db908c581e@syzkaller.appspotmail.com
      Fixes: c266f64d ("net: sched: protect block state with mutex")
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
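
The locking change described for sfb_change() follows a common pattern: detach and reset the old object inside the critical section, keep a pointer to it in a local variable, and release it only after the lock is dropped. A minimal userspace model, under stated assumptions: `struct demo_qdisc`, `struct demo_parent`, `demo_put`, and `demo_change_child` are hypothetical names, and a plain `locked` flag stands in for the sch tree spinlock.

```c
#include <stdlib.h>

/* Hypothetical miniature of a child Qdisc and its parent. */
struct demo_qdisc {
	int backlog;
};

struct demo_parent {
	int locked;                 /* models "sch tree lock is held" */
	struct demo_qdisc *child;
};

static int puts_under_lock;     /* releases that ran while locked */

/* Stands in for a release path that may sleep in the kernel. */
static void demo_put(struct demo_parent *p, struct demo_qdisc *q)
{
	if (p->locked)
		puts_under_lock++;
	free(q);
}

static void demo_change_child(struct demo_parent *p,
			      struct demo_qdisc *new_child)
{
	struct demo_qdisc *old;

	p->locked = 1;              /* enter the critical section */
	old = p->child;             /* save the old child in a local */
	if (old)
		old->backlog = 0;   /* reset it while still protected */
	p->child = new_child;
	p->locked = 0;              /* leave the critical section */

	demo_put(p, old);           /* release only after unlocking */
}
```

The counter `puts_under_lock` stays at zero in this model; calling `demo_put()` before clearing `locked` would increment it, which corresponds to the sleeping-while-atomic report in the trace.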
    • net: sched: multiq: don't call qdisc_put() while holding tree lock · c2999f7f
      Vlad Buslov authored
      Recent changes that removed rtnl dependency from rules update path of tc
      also made tcf_block_put() function sleeping. This function is called from
      ops->destroy() of several Qdisc implementations, which in turn is called by
      qdisc_put(). Some Qdiscs call qdisc_put() while holding sch tree spinlock,
      which results in a sleeping-while-atomic BUG.
      
      Steps to reproduce for multiq:
      
      tc qdisc add dev ens1f0 root handle 1: multiq
      tc qdisc add dev ens1f0 parent 1:10 handle 50: sfq perturb 10
      ethtool -L ens1f0 combined 2
      tc qdisc change dev ens1f0 root handle 1: multiq
      
      Resulting dmesg:
      
      [ 5539.419344] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
      [ 5539.420945] in_atomic(): 1, irqs_disabled(): 0, pid: 27658, name: tc
      [ 5539.422435] INFO: lockdep is turned off.
      [ 5539.423904] CPU: 21 PID: 27658 Comm: tc Tainted: G        W         5.3.0-rc8+ #721
      [ 5539.425400] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 5539.426911] Call Trace:
      [ 5539.428380]  dump_stack+0x85/0xc0
      [ 5539.429823]  ___might_sleep.cold+0xac/0xbc
      [ 5539.431262]  __mutex_lock+0x5b/0x960
      [ 5539.432682]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 5539.434103]  ? __nla_validate_parse+0x51/0x840
      [ 5539.435493]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 5539.436903]  tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 5539.438327]  tcf_block_put_ext.part.0+0x21/0x50
      [ 5539.439752]  tcf_block_put+0x50/0x70
      [ 5539.441165]  sfq_destroy+0x15/0x50 [sch_sfq]
      [ 5539.442570]  qdisc_destroy+0x5f/0x160
      [ 5539.444000]  multiq_tune+0x14a/0x420 [sch_multiq]
      [ 5539.445421]  tc_modify_qdisc+0x324/0x840
      [ 5539.446841]  rtnetlink_rcv_msg+0x170/0x4b0
      [ 5539.448269]  ? netlink_deliver_tap+0x95/0x400
      [ 5539.449691]  ? rtnl_dellink+0x2d0/0x2d0
      [ 5539.451116]  netlink_rcv_skb+0x49/0x110
      [ 5539.452522]  netlink_unicast+0x171/0x200
      [ 5539.453914]  netlink_sendmsg+0x224/0x3f0
      [ 5539.455304]  sock_sendmsg+0x5e/0x60
      [ 5539.456686]  ___sys_sendmsg+0x2ae/0x330
      [ 5539.458071]  ? ___sys_recvmsg+0x159/0x1f0
      [ 5539.459461]  ? do_wp_page+0x9c/0x790
      [ 5539.460846]  ? __handle_mm_fault+0xcd3/0x19e0
      [ 5539.462263]  __sys_sendmsg+0x59/0xa0
      [ 5539.463661]  do_syscall_64+0x5c/0xb0
      [ 5539.465044]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 5539.466454] RIP: 0033:0x7f1fe08177b8
      [ 5539.467863] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
      [ 5539.470906] RSP: 002b:00007ffe812de5d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 5539.472483] RAX: ffffffffffffffda RBX: 000000005d8135e3 RCX: 00007f1fe08177b8
      [ 5539.474069] RDX: 0000000000000000 RSI: 00007ffe812de640 RDI: 0000000000000003
      [ 5539.475655] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000182e9b0
      [ 5539.477203] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
      [ 5539.478699] R13: 000000000047f640 R14: 0000000000000000 R15: 0000000000000000
      
      Rearrange locking in multiq_tune() in following ways:
      
      - In loop that removes Qdiscs from disabled queues, call
        qdisc_purge_queue() instead of qdisc_tree_flush_backlog() on Qdisc that
        is being destroyed. Save the Qdisc in temporary allocated array and call
        qdisc_put() on each element of the array after sch tree lock is released.
        This is safe to do because Qdiscs have already been reset by
        qdisc_purge_queue() inside sch tree lock critical section.
      
      - Do the same change for second loop that initializes Qdiscs for newly
        enabled queues in multiq_tune() function. Since sch tree lock is obtained
        and released on each iteration of this loop, just call qdisc_put()
        directly outside of critical section. Don't verify that old Qdisc is not
        noop_qdisc before releasing reference to it because such check is already
        performed by qdisc_put*() functions.
      
      Fixes: c266f64d ("net: sched: protect block state with mutex")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
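
The first bullet of the multiq_tune() rearrangement (collect the Qdiscs being removed into a temporary array under the lock, then release each one after unlocking) can be modeled in userspace as follows. This is a sketch under stated assumptions: `struct demo_q`, `struct demo_multiq`, `demo_release`, and `demo_tune` are hypothetical names, the fixed array of 8 queues is arbitrary, and a `locked` flag stands in for the sch tree lock.

```c
#include <stdlib.h>

struct demo_q {
	int backlog;
};

struct demo_multiq {
	int locked;                  /* models "sch tree lock is held" */
	int bands;                   /* number of active queues */
	struct demo_q *queues[8];
};

static int released_while_locked;
static int released_total;

/* Stands in for a release path that may sleep in the kernel. */
static void demo_release(struct demo_multiq *m, struct demo_q *q)
{
	if (m->locked)
		released_while_locked++;
	released_total++;
	free(q);
}

/* Shrink to new_bands queues. Returns 0 on success, -1 on error. */
static int demo_tune(struct demo_multiq *m, int new_bands)
{
	struct demo_q **removed;
	int i, n_removed = 0;

	if (new_bands < 0 || new_bands > m->bands)
		return -1;

	/* Allocate the temporary array before taking the lock. */
	removed = calloc((size_t)(m->bands - new_bands) + 1, sizeof(*removed));
	if (!removed)
		return -1;

	m->locked = 1;               /* enter the critical section */
	for (i = new_bands; i < m->bands; i++) {
		if (!m->queues[i])
			continue;
		m->queues[i]->backlog = 0;           /* reset under the lock */
		removed[n_removed++] = m->queues[i]; /* remember for later */
		m->queues[i] = NULL;
	}
	m->bands = new_bands;
	m->locked = 0;               /* leave the critical section */

	for (i = 0; i < n_removed; i++)
		demo_release(m, removed[i]);         /* release after unlocking */
	free(removed);
	return 0;
}
```

Allocating the temporary array before entering the critical section keeps every potentially blocking operation outside the locked region in this model.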
    • net: sched: sch_htb: don't call qdisc_put() while holding tree lock · 4ce70b4a
      Vlad Buslov authored
      Recent changes that removed rtnl dependency from rules update path of tc
      also made tcf_block_put() function sleeping. This function is called from
      ops->destroy() of several Qdisc implementations, which in turn is called by
      qdisc_put(). Some Qdiscs call qdisc_put() while holding sch tree spinlock,
      which results in a sleeping-while-atomic BUG.
      
      Steps to reproduce for htb:
      
      tc qdisc add dev ens1f0 root handle 1: htb default 12
      tc class add dev ens1f0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
      tc qdisc add dev ens1f0 parent 1:1 handle 40: sfq perturb 10
      tc class add dev ens1f0 parent 1:1 classid 1:2 htb rate 100kbps ceil 100kbps
      
      Resulting dmesg:
      
      [ 4791.148551] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
      [ 4791.151354] in_atomic(): 1, irqs_disabled(): 0, pid: 27273, name: tc
      [ 4791.152805] INFO: lockdep is turned off.
      [ 4791.153605] CPU: 19 PID: 27273 Comm: tc Tainted: G        W         5.3.0-rc8+ #721
      [ 4791.154336] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 4791.155075] Call Trace:
      [ 4791.155803]  dump_stack+0x85/0xc0
      [ 4791.156529]  ___might_sleep.cold+0xac/0xbc
      [ 4791.157251]  __mutex_lock+0x5b/0x960
      [ 4791.157966]  ? console_unlock+0x363/0x5d0
      [ 4791.158676]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 4791.159395]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 4791.160103]  tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
      [ 4791.160815]  tcf_block_put_ext.part.0+0x21/0x50
      [ 4791.161530]  tcf_block_put+0x50/0x70
      [ 4791.162233]  sfq_destroy+0x15/0x50 [sch_sfq]
      [ 4791.162936]  qdisc_destroy+0x5f/0x160
      [ 4791.163642]  htb_change_class.cold+0x5df/0x69d [sch_htb]
      [ 4791.164505]  tc_ctl_tclass+0x19d/0x480
      [ 4791.165360]  rtnetlink_rcv_msg+0x170/0x4b0
      [ 4791.166191]  ? netlink_deliver_tap+0x95/0x400
      [ 4791.166907]  ? rtnl_dellink+0x2d0/0x2d0
      [ 4791.167625]  netlink_rcv_skb+0x49/0x110
      [ 4791.168345]  netlink_unicast+0x171/0x200
      [ 4791.169058]  netlink_sendmsg+0x224/0x3f0
      [ 4791.169771]  sock_sendmsg+0x5e/0x60
      [ 4791.170475]  ___sys_sendmsg+0x2ae/0x330
      [ 4791.171183]  ? ___sys_recvmsg+0x159/0x1f0
      [ 4791.171894]  ? do_wp_page+0x9c/0x790
      [ 4791.172595]  ? __handle_mm_fault+0xcd3/0x19e0
      [ 4791.173309]  __sys_sendmsg+0x59/0xa0
      [ 4791.174024]  do_syscall_64+0x5c/0xb0
      [ 4791.174725]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 4791.175435] RIP: 0033:0x7f0aa41497b8
      [ 4791.176129] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
      [ 4791.177532] RSP: 002b:00007fff4e37d588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 4791.178243] RAX: ffffffffffffffda RBX: 000000005d8132f7 RCX: 00007f0aa41497b8
      [ 4791.178947] RDX: 0000000000000000 RSI: 00007fff4e37d5f0 RDI: 0000000000000003
      [ 4791.179662] RBP: 0000000000000000 R08: 0000000000000001 R09: 00000000020149a0
      [ 4791.180382] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
      [ 4791.181100] R13: 000000000047f640 R14: 0000000000000000 R15: 0000000000000000
      
      In htb_change_class(), save parent->leaf.q to a local temporary variable
      and put the reference to it after the sch tree lock is released, so that
      the potentially sleeping cls API is not called in an atomic section. This
      is safe because the Qdisc has already been reset by qdisc_purge_queue()
      inside the sch tree lock critical section.
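
The save-then-release-after-unlock pattern described above can be sketched in userspace C. This is only an analog under stated assumptions, not the htb code: `struct node`, `tree_lock()`, `tree_unlock()`, `node_put()` and `replace_leaf()` are hypothetical names, and a plain flag stands in for the sch tree spinlock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted Qdisc. */
struct node {
    int refcnt;
    int destroyed;   /* set when the last reference is dropped */
};

/* A flag stands in for the spinlock so the sketch stays single-threaded. */
static int tree_locked;

static void tree_lock(void)   { assert(!tree_locked); tree_locked = 1; }
static void tree_unlock(void) { assert(tree_locked);  tree_locked = 0; }

/* Stand-in for a release function that may sleep: it must never run
 * while the tree lock is held, which the assert enforces here. */
static void node_put(struct node *n)
{
    assert(!tree_locked);
    if (--n->refcnt == 0)
        n->destroyed = 1;
}

/* Swap *slot for new_leaf under the lock, but drop the old reference
 * only after the lock has been released. */
static void replace_leaf(struct node **slot, struct node *new_leaf)
{
    struct node *old;

    tree_lock();
    old = *slot;        /* save the pointer inside the critical section */
    *slot = new_leaf;
    tree_unlock();

    if (old)
        node_put(old);  /* potentially sleeping call, now outside the lock */
}
```

Calling `node_put()` between `tree_lock()` and `tree_unlock()` would trip the assert, which mirrors the sleeping-while-atomic report in the trace above.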
      
      Fixes: c266f64d ("net: sched: protect block state with mutex")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ce70b4a
    • Ka-Cheong Poon's avatar
      net/rds: Check laddr_check before calling it · 05733434
      Ka-Cheong Poon authored
      In rds_bind(), laddr_check is called without first checking whether it
      is NULL. Also, rs_transport should be reset if rds_add_bound() fails.
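
Both points can be illustrated with a small userspace sketch. Everything here is hypothetical (`struct transport`, `struct sock_state`, `sketch_bind()`, the `add_bound_fails` switch and the chosen error codes); it only mirrors the shape of the fix, not the actual rds_bind() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport with an optional address-check callback. */
struct transport {
    int (*laddr_check)(unsigned int addr);   /* may be NULL */
};

/* Hypothetical per-socket state. */
struct sock_state {
    const struct transport *transport;
    int bound;
};

/* Example callback that accepts every address. */
static int accept_any(unsigned int addr)
{
    (void)addr;
    return 0;
}

/* add_bound_fails simulates a failure of the later "add to bound table"
 * step; the error codes are assumptions made for this sketch. */
static int sketch_bind(struct sock_state *s, const struct transport *t,
                       unsigned int addr, int add_bound_fails)
{
    /* Check the pointer before calling through it. */
    if (!t->laddr_check || t->laddr_check(addr) != 0)
        return -EADDRNOTAVAIL;

    s->transport = t;
    if (add_bound_fails) {
        s->transport = NULL;   /* undo the assignment on failure */
        return -EADDRINUSE;
    }
    s->bound = 1;
    return 0;
}
```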
      
      Fixes: c5c1a030 ("net/rds: An rds_sock is added too early to the hash table")
      Reported-by: syzbot+fae39afd2101a17ec624@syzkaller.appspotmail.com
      Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05733434
    • David S. Miller's avatar
      Merge branch 'SO_PRIORITY' · 4e1e83be
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: provide correct skb->priority
      
      The SO_PRIORITY socket option requests that TCP egress packets
      carry a user-provided value.
      
      TCP manages to send most packets with the requested value,
      notably in the TCP_ESTABLISHED state, but fails to do so for
      a few packets.
      
      These packets are control packets sent on behalf
      of SYN_RECV or TIME_WAIT states.
      
      Note that testing this with packetdrill is a bit of a
      hassle, since packetdrill cannot verify the priority of
      egress packets other than through indirect observation,
      for example by using sch_prio on its tunnel device.
      
      The bad skb priorities cause problems for GCP,
      as this field is one of the keys used in routing.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e1e83be
    • Eric Dumazet's avatar
      tcp: honor SO_PRIORITY in TIME_WAIT state · f6c0f5d2
      Eric Dumazet authored
      ctl packets sent on behalf of TIME_WAIT sockets currently
      have a zero skb->priority, which can cause various problems.
      
      In this patch we:
      
      - add a tw_priority field in struct inet_timewait_sock.
      
      - populate it from sk->sk_priority when a TIME_WAIT socket is created.
      
      - For IPv4, change ip_send_unicast_reply() and its two
        callers to propagate tw_priority correctly.
        ip_send_unicast_reply() no longer changes sk->sk_priority.
      
      - For IPv6, make sure TIME_WAIT sockets pass their tw_priority
        field to tcp_v6_send_response() and tcp_v6_send_ack().
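
The data flow in the list above can be sketched as a toy userspace model. Only the field names `sk_priority` and `tw_priority` come from the commit message; `struct full_sock`, `struct tw_sock`, `struct packet`, `enter_time_wait()` and `send_reply()` are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for a full socket, its TIME_WAIT stub and an skb. */
struct full_sock { unsigned int sk_priority; };
struct tw_sock   { unsigned int tw_priority; };
struct packet    { unsigned int priority; };

/* Copy the priority when the full socket is replaced by its TIME_WAIT stub. */
static struct tw_sock enter_time_wait(const struct full_sock *sk)
{
    struct tw_sock tw = { .tw_priority = sk->sk_priority };
    return tw;
}

/* The reply path takes the priority as an explicit argument instead of
 * reading (or overwriting) it on a shared control socket. */
static struct packet send_reply(unsigned int priority)
{
    struct packet p = { .priority = priority };
    return p;
}
```

With this shape, a reply built from a TIME_WAIT stub carries the priority the original socket had, rather than zero.
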
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6c0f5d2
    • Eric Dumazet's avatar
      ipv6: tcp: provide sk->sk_priority to ctl packets · e9a5dcee
      Eric Dumazet authored
      We can populate skb->priority for some ctl packets
      instead of always using zero.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9a5dcee
    • Eric Dumazet's avatar
      ipv6: add priority parameter to ip6_xmit() · 4f6570d7
      Eric Dumazet authored
      Currently, ip6_xmit() sets skb->priority based on sk->sk_priority.
      
      This is not desirable for TCP since TCP shares the same ctl socket
      for a given netns. We want to be able to send RST or ACK packets
      with a non-zero skb->priority.
      
      This patch has no functional change.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f6570d7
    • Allan Zhang's avatar
      bpf: Fix bpf_event_output re-entry issue · 768fb61f
      Allan Zhang authored
      A BPF_PROG_TYPE_SOCK_OPS program can re-enter bpf_event_output() because
      it can be called from both atomic and non-atomic contexts, and there is no
      bpf_prog_active check to prevent this from happening.
      
      This patch enables 3 levels of nesting to support normal, irq and nmi
      context.
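
A bounded nesting scheme of this kind can be sketched in userspace C. This is an illustration under assumptions, not the patch itself: `struct scratch`, `scratch_get()` and `scratch_put()` are hypothetical names, the buffer size is arbitrary, and a plain global replaces what would be a per-CPU counter in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_NEST 3   /* one slot each for normal, irq and nmi context */

/* Hypothetical scratch area used while emitting one event. */
struct scratch { char data[64]; };

static struct scratch bufs[MAX_NEST];
static int nest_level;   /* per-CPU in a kernel; a plain global here */

/* Claim the scratch buffer for the current nesting level.
 * On overflow, returns NULL with *err = -EBUSY and leaves the counter
 * unchanged, so the caller must not call scratch_put() in that case. */
static struct scratch *scratch_get(int *err)
{
    int level = ++nest_level;

    if (level > MAX_NEST) {
        --nest_level;
        *err = -EBUSY;
        return NULL;
    }
    *err = 0;
    return &bufs[level - 1];
}

/* Release the buffer claimed by the matching scratch_get(). */
static void scratch_put(void)
{
    --nest_level;
}
```

Each nesting level gets its own buffer, so an interrupting context never overwrites data that the interrupted context is still using.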
      
      We can easily reproduce the issue by running netperf in crr mode with 100
      flows and 10 threads from the netperf client side.
      
      Here is the whole stack dump:
      
      [  515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220
      [  515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G        W        4.15.0-smp-fixpanic #44
      [  515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018
      [  515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220
      [  515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246
      [  515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000
      [  515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80
      [  515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012
      [  515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002
      [  515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0
      [  515.228910] FS:  00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000
      [  515.228910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0
      [  515.228911] Call Trace:
      [  515.228913]  <IRQ>
      [  515.228919]  [<ffffffff82c6c6cb>] bpf_sockopt_event_output+0x3b/0x50
      [  515.228923]  [<ffffffff8265daee>] ? bpf_ktime_get_ns+0xe/0x10
      [  515.228927]  [<ffffffff8266fda5>] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
      [  515.228930]  [<ffffffff82cf90a5>] ? tcp_init_transfer+0x125/0x150
      [  515.228933]  [<ffffffff82cf9159>] ? tcp_finish_connect+0x89/0x110
      [  515.228936]  [<ffffffff82cf98e4>] ? tcp_rcv_state_process+0x704/0x1010
      [  515.228939]  [<ffffffff82c6e263>] ? sk_filter_trim_cap+0x53/0x2a0
      [  515.228942]  [<ffffffff82d90d1f>] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0
      [  515.228945]  [<ffffffff82d92160>] ? tcp_v6_do_rcv+0x1c0/0x460
      [  515.228947]  [<ffffffff82d93558>] ? tcp_v6_rcv+0x9f8/0xb30
      [  515.228951]  [<ffffffff82d737c0>] ? ip6_route_input+0x190/0x220
      [  515.228955]  [<ffffffff82d5f7ad>] ? ip6_protocol_deliver_rcu+0x6d/0x450
      [  515.228958]  [<ffffffff82d60246>] ? ip6_rcv_finish+0xb6/0x170
      [  515.228961]  [<ffffffff82d5fb90>] ? ip6_protocol_deliver_rcu+0x450/0x450
      [  515.228963]  [<ffffffff82d60361>] ? ipv6_rcv+0x61/0xe0
      [  515.228966]  [<ffffffff82d60190>] ? ipv6_list_rcv+0x330/0x330
      [  515.228969]  [<ffffffff82c4976b>] ? __netif_receive_skb_one_core+0x5b/0xa0
      [  515.228972]  [<ffffffff82c497d1>] ? __netif_receive_skb+0x21/0x70
      [  515.228975]  [<ffffffff82c4a8d2>] ? process_backlog+0xb2/0x150
      [  515.228978]  [<ffffffff82c4aadf>] ? net_rx_action+0x16f/0x410
      [  515.228982]  [<ffffffff830000dd>] ? __do_softirq+0xdd/0x305
      [  515.228986]  [<ffffffff8252cfdc>] ? irq_exit+0x9c/0xb0
      [  515.228989]  [<ffffffff82e02de5>] ? smp_call_function_single_interrupt+0x65/0x120
      [  515.228991]  [<ffffffff82e020e1>] ? call_function_single_interrupt+0x81/0x90
      [  515.228992]  </IRQ>
      [  515.228996]  [<ffffffff82a11ff0>] ? io_serial_in+0x20/0x20
      [  515.229000]  [<ffffffff8259c040>] ? console_unlock+0x230/0x490
      [  515.229003]  [<ffffffff8259cbaa>] ? vprintk_emit+0x26a/0x2a0
      [  515.229006]  [<ffffffff8259cbff>] ? vprintk_default+0x1f/0x30
      [  515.229008]  [<ffffffff8259d9f5>] ? vprintk_func+0x35/0x70
      [  515.229011]  [<ffffffff8259d4bb>] ? printk+0x50/0x66
      [  515.229013]  [<ffffffff82637637>] ? bpf_event_output+0xb7/0x220
      [  515.229016]  [<ffffffff82c6c6cb>] ? bpf_sockopt_event_output+0x3b/0x50
      [  515.229019]  [<ffffffff8265daee>] ? bpf_ktime_get_ns+0xe/0x10
      [  515.229023]  [<ffffffff82c29e87>] ? release_sock+0x97/0xb0
      [  515.229026]  [<ffffffff82ce9d6a>] ? tcp_recvmsg+0x31a/0xda0
      [  515.229029]  [<ffffffff8266fda5>] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
      [  515.229032]  [<ffffffff82ce77c1>] ? tcp_set_state+0x191/0x1b0
      [  515.229035]  [<ffffffff82ced10e>] ? tcp_disconnect+0x2e/0x600
      [  515.229038]  [<ffffffff82cecbbb>] ? tcp_close+0x3eb/0x460
      [  515.229040]  [<ffffffff82d21082>] ? inet_release+0x42/0x70
      [  515.229043]  [<ffffffff82d58809>] ? inet6_release+0x39/0x50
      [  515.229046]  [<ffffffff82c1f32d>] ? __sock_release+0x4d/0xd0
      [  515.229049]  [<ffffffff82c1f3e5>] ? sock_close+0x15/0x20
      [  515.229052]  [<ffffffff8273b517>] ? __fput+0xe7/0x1f0
      [  515.229055]  [<ffffffff8273b66e>] ? ____fput+0xe/0x10
      [  515.229058]  [<ffffffff82547bf2>] ? task_work_run+0x82/0xb0
      [  515.229061]  [<ffffffff824086df>] ? exit_to_usermode_loop+0x7e/0x11f
      [  515.229064]  [<ffffffff82408171>] ? do_syscall_64+0x111/0x130
      [  515.229067]  [<ffffffff82e0007c>] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: a5a3a828 ("bpf: add perf event notificaton support for sock_ops")
      Signed-off-by: Allan Zhang <allanzhang@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Stanislav Fomichev <sdf@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20190925234312.94063-2-allanzhang@google.com
      768fb61f
    • Andrew Lunn's avatar
      net: dsa: qca8k: Fix port enable for CPU port · 2b6fd3ea
      Andrew Lunn authored
      The CPU port does not have a PHY connected to it, so calling
      phy_support_asym_pause() results in an Oops. As with other DSA
      drivers, add a guard that the port is a user port.
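
The guard can be pictured with a toy userspace model. All names here (`enum port_type`, `struct phy`, `port_enable()`) are hypothetical and only mimic the shape of a port-enable callback; they are not the qca8k or DSA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port classification and PHY state. */
enum port_type { PORT_UNUSED, PORT_CPU, PORT_USER };
struct phy { int asym_pause; };

/* Enable a port. Only user ports are assumed to have a PHY attached,
 * so every other port type returns before the PHY is touched. */
static int port_enable(enum port_type type, struct phy *phy)
{
    if (type != PORT_USER)
        return 0;          /* guard: no PHY to configure */

    assert(phy != NULL);
    phy->asym_pause = 1;   /* stands in for the PHY configuration call */
    return 0;
}
```

Without the early return, enabling the CPU port would dereference a NULL `phy`, which is the userspace counterpart of the Oops described above.
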
      Reported-by: Michal Vokáč <michal.vokac@ysoft.com>
      Fixes: 0394a63a ("net: dsa: enable and disable all ports")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Tested-by: Michal Vokáč <michal.vokac@ysoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b6fd3ea
    • Eric Dumazet's avatar
      sch_netem: fix rcu splat in netem_enqueue() · 159d2c7d
      Eric Dumazet authored
      qdisc_root() use from netem_enqueue() triggers a lockdep warning.
      
      __dev_queue_xmit() uses rcu_read_lock_bh(), which is
      not equivalent to rcu_read_lock() + local_bh_disable() as far
      as lockdep is concerned.
      
      WARNING: suspicious RCU usage
      5.3.0-rc7+ #0 Not tainted
      -----------------------------
      include/net/sch_generic.h:492 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      3 locks held by syz-executor427/8855:
       #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: lwtunnel_xmit_redirect include/net/lwtunnel.h:92 [inline]
       #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x2dc/0x2570 net/ipv4/ip_output.c:214
       #1: 00000000b5525c01 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x20a/0x3650 net/core/dev.c:3804
       #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
       #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_xmit_skb net/core/dev.c:3502 [inline]
       #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_queue_xmit+0x14b8/0x3650 net/core/dev.c:3838
      
      stack backtrace:
      CPU: 0 PID: 8855 Comm: syz-executor427 Not tainted 5.3.0-rc7+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5357
       qdisc_root include/net/sch_generic.h:492 [inline]
       netem_enqueue+0x1cfb/0x2d80 net/sched/sch_netem.c:479
       __dev_xmit_skb net/core/dev.c:3527 [inline]
       __dev_queue_xmit+0x15d2/0x3650 net/core/dev.c:3838
       dev_queue_xmit+0x18/0x20 net/core/dev.c:3902
       neigh_hh_output include/net/neighbour.h:500 [inline]
       neigh_output include/net/neighbour.h:509 [inline]
       ip_finish_output2+0x1726/0x2570 net/ipv4/ip_output.c:228
       __ip_finish_output net/ipv4/ip_output.c:308 [inline]
       __ip_finish_output+0x5fc/0xb90 net/ipv4/ip_output.c:290
       ip_finish_output+0x38/0x1f0 net/ipv4/ip_output.c:318
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip_mc_output+0x292/0xf40 net/ipv4/ip_output.c:417
       dst_output include/net/dst.h:436 [inline]
       ip_local_out+0xbb/0x190 net/ipv4/ip_output.c:125
       ip_send_skb+0x42/0xf0 net/ipv4/ip_output.c:1555
       udp_send_skb.isra.0+0x6b2/0x1160 net/ipv4/udp.c:887
       udp_sendmsg+0x1e96/0x2820 net/ipv4/udp.c:1174
       inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
       __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      159d2c7d
    • Eric Dumazet's avatar
      kcm: disable preemption in kcm_parse_func_strparser() · 0355d6c1
      Eric Dumazet authored
      After commit a2c11b03 ("kcm: use BPF_PROG_RUN")
      syzbot easily triggers the warning in cant_sleep().
      
      As explained in commit 6cab5e90 ("bpf: run bpf programs
      with preemption disabled") we need to disable preemption before
      running bpf programs.
      
      BUG: assuming atomic context at net/kcm/kcmsock.c:382
      in_atomic(): 0, irqs_disabled(): 0, pid: 7, name: kworker/u4:0
      3 locks held by kworker/u4:0/7:
       #0: ffff888216726128 ((wq_completion)kstrp){+.+.}, at: __write_once_size include/linux/compiler.h:226 [inline]
       #0: ffff888216726128 ((wq_completion)kstrp){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
       #0: ffff888216726128 ((wq_completion)kstrp){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline]
       #0: ffff888216726128 ((wq_completion)kstrp){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:40 [inline]
       #0: ffff888216726128 ((wq_completion)kstrp){+.+.}, at: set_work_data kernel/workqueue.c:620 [inline]
       #0: ffff888216726128 ((wq_completion)kstrp){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
       #0: ffff888216726128 ((wq_completion)kstrp){+.+.}, at: process_one_work+0x88b/0x1740 kernel/workqueue.c:2240
       #1: ffff8880a989fdc0 ((work_completion)(&strp->work)){+.+.}, at: process_one_work+0x8c1/0x1740 kernel/workqueue.c:2244
       #2: ffff888098998d10 (sk_lock-AF_INET){+.+.}, at: lock_sock include/net/sock.h:1522 [inline]
       #2: ffff888098998d10 (sk_lock-AF_INET){+.+.}, at: strp_sock_lock+0x2e/0x40 net/strparser/strparser.c:440
      CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: kstrp strp_work
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       __cant_sleep kernel/sched/core.c:6826 [inline]
       __cant_sleep.cold+0xa4/0xbc kernel/sched/core.c:6803
       kcm_parse_func_strparser+0x54/0x200 net/kcm/kcmsock.c:382
       __strp_recv+0x5dc/0x1b20 net/strparser/strparser.c:221
       strp_recv+0xcf/0x10b net/strparser/strparser.c:343
       tcp_read_sock+0x285/0xa00 net/ipv4/tcp.c:1639
       strp_read_sock+0x14d/0x200 net/strparser/strparser.c:366
       do_strp_work net/strparser/strparser.c:414 [inline]
       strp_work+0xe3/0x130 net/strparser/strparser.c:423
       process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
      
      Fixes: a2c11b03 ("kcm: use BPF_PROG_RUN")
      Fixes: 6cab5e90 ("bpf: run bpf programs with preemption disabled")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0355d6c1
    • Dan Carpenter's avatar
      net: ethernet: stmmac: Fix signedness bug in ipq806x_gmac_of_parse() · 23104218
      Dan Carpenter authored
      The "gmac->phy_mode" variable is an enum and in this context GCC will
      treat it as an unsigned int so the error handling will never be
      triggered.
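
The class of bug described here can be shown in a few lines of standalone C. The names below (`enum phy_mode`, `lookup_mode()`, `lookup_failed()`) are hypothetical, and the `(int)` cast is one common way to fix such a check, not necessarily the exact change made in this driver.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical enum with only non-negative enumerators, like a PHY mode. */
enum phy_mode { MODE_NA, MODE_MII, MODE_RGMII };

/* Hypothetical lookup that returns a mode or a negative errno. */
static int lookup_mode(int found)
{
    return found ? MODE_RGMII : -ENODEV;
}

/* Returns 1 if the lookup failed, 0 otherwise. */
static int lookup_failed(int found)
{
    enum phy_mode mode = lookup_mode(found);

    /*
     * Writing "mode < 0" here would be the bug: when the compiler gives
     * the enum an unsigned underlying type, that test is always false
     * and the error path is never taken. Comparing as int restores the
     * sign of the stored error code.
     */
    return (int)mode < 0;
}
```

An alternative with the same effect is to keep the return value in an `int`, test it for errors, and only then assign it to the enum field.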
      
      Fixes: b1c17215 ("stmmac: add ipq806x glue layer")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23104218
    • Dan Carpenter's avatar
      net: nixge: Fix a signedness bug in nixge_probe() · 1a4b62a0
      Dan Carpenter authored
      The "priv->phy_mode" is an enum and in this context GCC will treat it
      as an unsigned int so it can never be less than zero.
      
      Fixes: 492caffa ("net: ethernet: nixge: Add support for National Instruments XGE netdev")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a4b62a0
    • Dan Carpenter's avatar
      of: mdio: Fix a signedness bug in of_phy_get_and_connect() · d7eb6512
      Dan Carpenter authored
      The "iface" variable is an enum and in this context GCC treats it as
      an unsigned int so the error handling is never triggered.
      
      Fixes: b7862412 ("of_mdio: Abstract a general interface for phy connect")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7eb6512
    • Dan Carpenter's avatar
      net: axienet: fix a signedness bug in probe · 73e211e1
      Dan Carpenter authored
      The "lp->phy_mode" is an enum but in this context GCC treats it as an
      unsigned int so the error handling is never triggered.
      
      Fixes: ee06b172 ("net: axienet: add support for standard phy-mode binding")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73e211e1
    • Dan Carpenter's avatar
      net: stmmac: dwmac-meson8b: Fix signedness bug in probe · f1021051
      Dan Carpenter authored
      The "dwmac->phy_mode" is an enum and in this context GCC treats it as
      an unsigned int so the error handling is never triggered.
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f1021051