1. 22 Sep, 2022 5 commits
  2. 21 Sep, 2022 8 commits
  3. 20 Sep, 2022 27 commits
    • netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed · d2508893
      Florian Westphal authored
      We can't use ct->lock, this is already used by the seqadj internals.
      When using ftp helper + nat, seqadj will attempt to acquire ct->lock
      again.
      
      Revert back to a global lock for now.
      
      Fixes: c783a29c ("netfilter: nf_ct_ftp: prefer skb_linearize")
      Reported-by: Bruno de Paula Larini <bruno.larini@riosoft.com.br>
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: ebtables: fix memory leak when blob is malformed · 62ce44c4
      Florian Westphal authored
      The bug fix was incomplete, it "replaced" crash with a memory leak.
      The old code had an assignment to "ret" embedded into the conditional,
      restore this.
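
      The pattern being restored can be illustrated with a small user-space
      sketch. The names verify_blob() and load_blob() are hypothetical
      stand-ins, not the ebtables functions; the sketch only assumes that the
      caller releases its own allocation when a non-zero value comes back, so
      the error path has to carry the failure in "ret".

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical validator: returns 0 if the blob is well formed. */
static int verify_blob(int well_formed)
{
	return well_formed ? 0 : -EINVAL;
}

/*
 * Hypothetical loader. The caller is assumed to free its own buffer only
 * when a non-zero value is returned, so the failure must be stored in ret
 * before jumping to the cleanup label.
 */
int load_blob(int well_formed)
{
	int ret = 0;
	char *tmp = malloc(16);

	if (!tmp)
		return -ENOMEM;

	/* Assignment embedded in the conditional: ret carries the error. */
	if ((ret = verify_blob(well_formed)) != 0)
		goto out_free;

	/* ... use the verified blob here ... */

out_free:
	free(tmp);
	return ret;
}
```

      Without the assignment, the cleanup label would still run but the
      function would report success, and the caller would never free what it
      had allocated.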
      
      Fixes: 7997eff8 ("netfilter: ebtables: reject blobs that don't provide all entry points")
      Reported-and-tested-by: syzbot+a24c5252f3e3ab733464@syzkaller.appspotmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain() · 9a4d6dd5
      Tetsuo Handa authored
      It seems to me that percpu memory for chain stats started leaking since
      commit 3bc158f8 ("netfilter: nf_tables: map basechain priority to
      hardware priority") when nft_chain_offload_priority() returned an error.
      
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Fixes: 3bc158f8 ("netfilter: nf_tables: map basechain priority to hardware priority")
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nf_tables: fix nft_counters_enabled underflow at nf_tables_addchain() · 921ebde3
      Tetsuo Handa authored
      syzbot is reporting underflow of nft_counters_enabled counter at
      nf_tables_addchain() [1], for commit 43eb8949 ("netfilter:
      nf_tables: do not leave chain stats enabled on error") missed that
      nf_tables_chain_destroy() after nft_basechain_init() in the error path of
      nf_tables_addchain() decrements the counter because nft_basechain_init()
      makes nft_is_base_chain() return true by setting NFT_CHAIN_BASE flag.
      
      Increment the counter immediately after returning from
      nft_basechain_init().
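
      A rough user-space model of the accounting, with hypothetical names
      (basechain_init(), chain_destroy(), add_chain()) standing in for the
      nf_tables functions, may help show why the increment has to sit
      directly after initialization. It assumes, as described above, that
      teardown decrements the counter for any chain already flagged as a base
      chain.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the global nft_counters_enabled count. */
static int counters_enabled;

struct chain {
	bool is_base;    /* set by basechain_init(), like NFT_CHAIN_BASE */
	bool has_stats;  /* true when per-chain stats were requested */
};

static void basechain_init(struct chain *c, bool with_stats)
{
	c->is_base = true;
	c->has_stats = with_stats;
}

/* Teardown decrements whenever it sees a base chain with stats. */
static void chain_destroy(struct chain *c)
{
	if (c->is_base && c->has_stats)
		counters_enabled--;
}

/* Returns 0 on success, -1 if a later setup step fails. */
int add_chain(bool with_stats, bool later_step_fails)
{
	struct chain c = { false, false };

	basechain_init(&c, with_stats);
	/* Increment right away so every later chain_destroy() is balanced. */
	if (with_stats)
		counters_enabled++;

	if (later_step_fails) {
		chain_destroy(&c);
		return -1;
	}
	return 0;
}

int enabled_count(void)
{
	return counters_enabled;
}
```

      If the increment were deferred until after the step that can fail, the
      error path would decrement a counter that was never incremented.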
      
      Link: https://syzkaller.appspot.com/bug?extid=b5d82a651b71cd8a75ab [1]
      Reported-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Tested-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
      Fixes: 43eb8949 ("netfilter: nf_tables: do not leave chain stats enabled on error")
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: conntrack: remove nf_conntrack_helper documentation · 76b907ee
      Pablo Neira Ayuso authored
      This toggle has already been removed by b1185090 ("netfilter: remove
      nf_conntrack_helper sysctl and modparam toggles").
      
      Remove the documentation entry for this toggle too.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • MAINTAINERS: Add myself as a reviewer for Qualcomm ETHQOS Ethernet driver · 603ccb3a
      Bhupesh Sharma authored
      As suggested by Vinod, adding myself as the reviewer
      for the Qualcomm ETHQOS Ethernet driver.
      
      Recently I have enabled this driver on a few Qualcomm
      SoCs / boards and hence trying to keep a close eye on
      it.
      
      Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
      Acked-by: Vinod Koul <vkoul@kernel.org>
      Link: https://lore.kernel.org/r/20220915112804.3950680-1-bhupesh.sharma@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: Fix interface being down after reset with link-down-on-close flag on · 8ac71327
      Mateusz Palczewski authored
      When performing a reset on the ice driver with the link-down-on-close
      flag on, the interface would always stay down. Fix this by moving the
      check of this flag to ice_stop(), which is called only when the user
      wants to bring the interface down.
      
      Fixes: ab4ab73f ("ice: Add ethtool private flag to make forcing link down optional")
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Petr Oros <poros@redhat.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: config netdev tc before setting queues number · 122045ca
      Michal Swiatkowski authored
      After lowering the number of tx queues, the following warning appears:
      "Number of in use tx queues changed invalidating tc mappings. Priority
      traffic classification disabled!"
      Example command to reproduce:
      ethtool -L enp24s0f0 tx 36 rx 36
      
      Fix this by setting the correct tc mapping before setting the real
      number of queues on the netdev.
      
      Fixes: 0754d65b ("ice: Add infrastructure for mqprio support via ndo_setup_tc")
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Merge branch 'fixes-for-tc-taprio-software-mode' · da847246
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Fixes for tc-taprio software mode
      
      While working on some new features for tc-taprio, I found some strange
      behavior which looked like bugs. I was able to eventually trigger a NULL
      pointer dereference. This patch set fixes 2 issues I saw. Detailed
      explanation in patches.
      ====================
      
      Link: https://lore.kernel.org/r/20220915100802.2308279-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs · 1461d212
      Vladimir Oltean authored
      taprio can only operate as root qdisc, and to that end, there exists the
      following check in taprio_init(), just as in mqprio:
      
      	if (sch->parent != TC_H_ROOT)
      		return -EOPNOTSUPP;
      
      And indeed, when we try to attach taprio to an mqprio child, it fails as
      expected:
      
      $ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
      	map 0 1 2 3 4 5 6 7 \
      	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
      $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
      	map 0 1 2 3 4 5 6 7 \
      	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
      	flags 0x0 clockid CLOCK_TAI
      Error: sch_taprio: Can only be attached as root qdisc.
      
      (extack message added by me)
      
      But when we try to attach a taprio child to a taprio root qdisc,
      surprisingly it doesn't fail:
      
      $ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
      	map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
      	flags 0x0 clockid CLOCK_TAI
      $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
      	map 0 1 2 3 4 5 6 7 \
      	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
      	flags 0x0 clockid CLOCK_TAI
      
      This is because tc_modify_qdisc() behaves differently when mqprio is
      root, vs when taprio is root.
      
      In the mqprio case, it finds the parent qdisc through
      p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
      q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
      ignored according to the comment right below ("It may be default qdisc,
      ignore it"). As a result, tc_modify_qdisc() goes through the
      qdisc_create() code path, and this gives taprio_init() a chance to check
      for sch->parent != TC_H_ROOT and error out.
      
      Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
      different. It is not the default qdisc created for each netdev queue
      (both taprio and mqprio call qdisc_create_dflt() and keep them in
      a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
      makes qdisc_leaf() return the _root_ qdisc, aka itself.
      
      When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
      code path, because the qdisc layer never finds out about the child qdisc
      of the root. And through the ->change() ops, taprio has no reason to
      check whether its parent is root or not, just through ->init(), which is
      not called.
      
      The problem is the taprio_leaf() implementation. Even though code wise,
      it does the exact same thing as mqprio_leaf() which it is copied from,
      it works with different input data. This is because mqprio does not
      attach itself (the root) to each device TX queue, but one of the default
      qdiscs from its private array.
      
      In fact, since commit 13511704 ("net: taprio offload: enforce qdisc
      to netdev queue mapping"), taprio does this too, but just for the full
      offload case. So if we tried to attach a taprio child to a fully
      offloaded taprio root qdisc, it would properly fail too; just not to a
      software root taprio.
      
      To fix the problem, stop looking at the Qdisc that's attached to the TX
      queue, and instead, always return the default qdiscs that we've
      allocated (and to which we privately enqueue and dequeue, in software
      scheduling mode).
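
      The shape of such a leaf lookup can be sketched in plain C as follows.
      The types and the name sched_leaf() are simplified, hypothetical
      stand-ins rather than the kernel's struct Qdisc or taprio_leaf(); the
      sketch assumes class ids are 1-based and map directly onto TX queue
      indices.

```c
#include <stddef.h>

struct qdisc {
	int handle;
};

struct sched {
	struct qdisc **qdiscs;       /* one default child per TX queue */
	unsigned int num_tx_queues;
};

/*
 * Return the privately held child for class id cl (1-based), or NULL if
 * cl does not map to a TX queue. The root itself is never returned.
 */
struct qdisc *sched_leaf(const struct sched *q, unsigned long cl)
{
	unsigned long ntx = cl - 1;  /* cl == 0 wraps and is rejected below */

	if (ntx >= q->num_tx_queues)
		return NULL;
	return q->qdiscs[ntx];
}
```

      Because the lookup only consults the private array, the result does not
      depend on which Qdisc happens to be attached to the TX queue.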
      
      Since Qdisc_class_ops :: leaf  is only called from tc_modify_qdisc(),
      the risk of unforeseen side effects introduced by this change is
      minimal.
      
      Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: taprio: avoid disabling offload when it was never enabled · db46e3a8
      Vladimir Oltean authored
      In an incredibly strange API design decision, qdisc->destroy() gets
      called even if qdisc->init() never succeeded, not exclusively since
      commit 87b60cfa ("net_sched: fix error recovery at qdisc creation"),
      but apparently also earlier (in the case of qdisc_create_dflt()).
      
      The taprio qdisc does not fully acknowledge this when it attempts full
      offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
      taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
      parsed from netlink (in taprio_change(), tail called from taprio_init()).
      
      But in taprio_destroy(), we call taprio_disable_offload(), and this
      determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).
      
      But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
      (a bitwise check of bit 1 in q->flags), it is invalid to call this macro
      on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
      to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
      an invalid set of flags.
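
      The effect is easy to reproduce in isolation. The sketch below uses
      made-up macro names and only the two facts stated above (the check
      tests bit 1, and the invalid marker is the all-ones 32-bit value); it
      is not the kernel's definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, following the description: bit 1 and U32_MAX. */
#define FLAG_FULL_OFFLOAD  (UINT32_C(1) << 1)
#define FLAGS_INVALID      UINT32_MAX

/* Plain bitwise test, with no special case for the invalid marker. */
bool full_offload_is_enabled(uint32_t flags)
{
	return (flags & FLAG_FULL_OFFLOAD) != 0;
}
```

      Since every bit of FLAGS_INVALID is set, the test reports "enabled" for
      it just as it does for a genuine 0x2.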
      
      As a result, it is possible to crash the kernel if user space forces an
      error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
      of taprio_enable_offload(). This is because drivers do not expect the
      offload to be disabled when it was never enabled.
      
      The error that we force here is to attach taprio as a non-root qdisc,
      but instead as child of an mqprio root qdisc:
      
      $ tc qdisc add dev swp0 root handle 1: \
      	mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
      	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
      $ tc qdisc replace dev swp0 parent 1:1 \
      	taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
      	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
      	sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
      	flags 0x0 clockid CLOCK_TAI
      Unable to handle kernel paging request at virtual address fffffffffffffff8
      [fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      Call trace:
       taprio_dump+0x27c/0x310
       vsc9959_port_setup_tc+0x1f4/0x460
       felix_port_setup_tc+0x24/0x3c
       dsa_slave_setup_tc+0x54/0x27c
       taprio_disable_offload.isra.0+0x58/0xe0
       taprio_destroy+0x80/0x104
       qdisc_create+0x240/0x470
       tc_modify_qdisc+0x1fc/0x6b0
       rtnetlink_rcv_msg+0x12c/0x390
       netlink_rcv_skb+0x5c/0x130
       rtnetlink_rcv+0x1c/0x2c
      
      Fix this by keeping track of the operations we made, and undo the
      offload only if we actually did it.
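
      In outline, the bookkeeping might look like the following user-space
      sketch. The structure and function names are hypothetical, and the
      "driver" is reduced to a call counter; only the idea of a flag that is
      set on successful enable and checked on disable comes from the text.

```c
#include <stdbool.h>

struct sched {
	bool offloaded;       /* true only after enable succeeded */
	int driver_disables;  /* counts disable calls reaching the "driver" */
};

/* Returns 0 on success, -1 if the (simulated) driver refuses. */
int enable_offload(struct sched *q, bool driver_ok)
{
	if (!driver_ok)
		return -1;
	q->offloaded = true;
	return 0;
}

/* Safe to call from teardown even if enable never ran or failed. */
void disable_offload(struct sched *q)
{
	if (!q->offloaded)
		return;
	q->driver_disables++;
	q->offloaded = false;
}
```

      With this guard, a teardown that runs after a failed or skipped
      initialization never reaches the driver.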
      
      I've added "bool offloaded" inside a 4 byte hole between "int clockid"
      and "atomic64_t picos_per_byte". Now the first cache line looks like
      below:
      
      $ pahole -C taprio_sched net/sched/sch_taprio.o
      struct taprio_sched {
              struct Qdisc * *           qdiscs;               /*     0     8 */
              struct Qdisc *             root;                 /*     8     8 */
              u32                        flags;                /*    16     4 */
              enum tk_offsets            tk_offset;            /*    20     4 */
              int                        clockid;              /*    24     4 */
              bool                       offloaded;            /*    28     1 */
      
              /* XXX 3 bytes hole, try to pack */
      
              atomic64_t                 picos_per_byte;       /*    32     0 */
      
              /* XXX 8 bytes hole, try to pack */
      
              spinlock_t                 current_entry_lock;   /*    40     0 */
      
              /* XXX 8 bytes hole, try to pack */
      
              struct sched_entry *       current_entry;        /*    48     8 */
              struct sched_gate_list *   oper_sched;           /*    56     8 */
              /* --- cacheline 1 boundary (64 bytes) --- */
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: Fix crash when IPv6 is administratively disabled · 76dd0728
      Ido Schimmel authored
      The global 'raw_v6_hashinfo' variable can be accessed even when IPv6 is
      administratively disabled via the 'ipv6.disable=1' kernel command line
      option, leading to a crash [1].
      
      Fix by restoring the original behavior and always initializing the
      variable, regardless of IPv6 support being administratively disabled or
      not.
      
      [1]
       BUG: unable to handle page fault for address: ffffffffffffffc8
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 173e18067 P4D 173e18067 PUD 173e1a067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP KASAN
       CPU: 3 PID: 271 Comm: ss Not tainted 6.0.0-rc4-custom-00136-g0727a9a5 #1396
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
       RIP: 0010:raw_diag_dump+0x310/0x7f0
       [...]
       Call Trace:
        <TASK>
        __inet_diag_dump+0x10f/0x2e0
        netlink_dump+0x575/0xfd0
        __netlink_dump_start+0x67b/0x940
        inet_diag_handler_cmd+0x273/0x2d0
        sock_diag_rcv_msg+0x317/0x440
        netlink_rcv_skb+0x15e/0x430
        sock_diag_rcv+0x2b/0x40
        netlink_unicast+0x53b/0x800
        netlink_sendmsg+0x945/0xe60
        ____sys_sendmsg+0x747/0x960
        ___sys_sendmsg+0x13a/0x1e0
        __sys_sendmsg+0x118/0x1e0
        do_syscall_64+0x34/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Fixes: 0daf07e5 ("raw: convert raw sockets to RCU")
      Reported-by: Roberto Ricci <rroberto2r@gmail.com>
      Tested-by: Roberto Ricci <rroberto2r@gmail.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220916084821.229287-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: deny offload of tc-based TSN features on VF interfaces · 5641c751
      Vladimir Oltean authored
      TSN features on the ENETC (taprio, cbs, gate, police) are configured
      through a mix of command BD ring messages and port registers:
      enetc_port_rd(), enetc_port_wr().
      
      Port registers are a region of the ENETC memory map which are only
      accessible from the PCIe Physical Function. They are not accessible from
      the Virtual Functions.
      
      Moreover, attempting to access these registers crashes the kernel:
      
      $ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs
      pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001
      fsl_enetc_vf 0000:00:01.0: Adding to iommu group 15
      fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002)
      fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0
      $ tc qdisc replace dev eno0vf0 root taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
      	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
      	sched-entry S 0x7f 900000 sched-entry S 0x80 100000 flags 0x2
      Unable to handle kernel paging request at virtual address ffff800009551a08
      Internal error: Oops: 96000007 [#1] PREEMPT SMP
      pc : enetc_setup_tc_taprio+0x170/0x47c
      lr : enetc_setup_tc_taprio+0x16c/0x47c
      Call trace:
       enetc_setup_tc_taprio+0x170/0x47c
       enetc_setup_tc+0x38/0x2dc
       taprio_change+0x43c/0x970
       taprio_init+0x188/0x1e0
       qdisc_create+0x114/0x470
       tc_modify_qdisc+0x1fc/0x6c0
       rtnetlink_rcv_msg+0x12c/0x390
      
      Split enetc_setup_tc() into separate functions for the PF and for the
      VF drivers. Also remove enetc_qos.o from being included into
      enetc-vf.ko, since it serves absolutely no purpose there.
      
      Fixes: 34c6adf1 ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220916133209.3351399-2-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: move enetc_set_psfp() out of the common enetc_set_features() · fed38e64
      Vladimir Oltean authored
      The VF netdev driver shouldn't respond to changes in the NETIF_F_HW_TC
      flag; only PFs should. Moreover, TSN-specific code should go to
      enetc_qos.c, which should not be included in the VF driver.
      
      Fixes: 79e49982 ("net: enetc: add hw tc hw offload features for PSPF capability")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220916133209.3351399-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'wireguard-patches-for-6-0-rc6' · 0507246d
      Jakub Kicinski authored
      Jason A. Donenfeld says:
      
      ====================
      wireguard patches for 6.0-rc6
      
      1) The ratelimiter timing test doesn't help outside of development, yet
         it is currently preventing the module from being inserted on some
         kernels when it flakes at insertion time. So we disable it.
      
      2) A fix for a build error on UML, caused by a recent change in a
         different tree.
      
      3) A WARN_ON() is triggered by Kees' new fortified memcpy() patch, due
         to memcpy()ing over a sockaddr pointer with the size of a
         sockaddr_in[6]. The type safe fix is pretty simple. Given how classic
         of a thing sockaddr punning is, I suspect this may be the first in a
         few patches like this throughout the net tree, once Kees' fortify
         series is more widely deployed (currently it's just in next).
      ====================
      
      Link: https://lore.kernel.org/r/20220916143740.831881-1-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: netlink: avoid variable-sized memcpy on sockaddr · 26c01310
      Jason A. Donenfeld authored
      Doing a variable-sized memcpy is slower, and the compiler isn't smart
      enough to turn this into a constant-size assignment.
      
      Further, Kees' latest fortified memcpy will actually bark, because the
      destination pointer is type sockaddr, not explicitly sockaddr_in or
      sockaddr_in6, so it thinks there's an overflow:
      
          memcpy: detected field-spanning write (size 28) of single field
          "&endpoint.addr" at drivers/net/wireguard/netlink.c:446 (size 16)
      
      Fix this by assigning directly, using explicit casts for each checked
      case.
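
      A user-space approximation of the idea, using the POSIX socket types,
      is sketched below. The struct endpoint layout and the helper name
      endpoint_set() are illustrative assumptions, not the driver's code.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical endpoint that can hold either address family. */
struct endpoint {
	union {
		struct sockaddr addr;
		struct sockaddr_in addr4;
		struct sockaddr_in6 addr6;
	};
};

/*
 * Copy src into ep by plain struct assignment, chosen per family.
 * Each assignment has a size known at compile time, and the destination
 * member has the same type as the value being written.
 */
bool endpoint_set(struct endpoint *ep, const struct sockaddr *src, size_t len)
{
	if (len == sizeof(struct sockaddr_in) && src->sa_family == AF_INET) {
		ep->addr4 = *(const struct sockaddr_in *)src;
		return true;
	}
	if (len == sizeof(struct sockaddr_in6) && src->sa_family == AF_INET6) {
		ep->addr6 = *(const struct sockaddr_in6 *)src;
		return true;
	}
	return false;
}
```

      Compared with a single memcpy() of "len" bytes into the generic
      sockaddr member, no write here spans beyond the field it names.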
      
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reported-by: syzbot+a448cda4dba2dac50de5@syzkaller.appspotmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: selftests: do not install headers on UML · 8e25c02b
      Jason A. Donenfeld authored
      Since 1b620d53 ("kbuild: disable header exports for UML in a
      straightforward way"), installing headers fails on UML, so just disable
      installing them, since they're not needed anyway on the architecture.
      
      Fixes: b438b3b8 ("wireguard: selftests: support UML")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: ratelimiter: disable timings test by default · 684dec3c
      Jason A. Donenfeld authored
      A previous commit tried to make the ratelimiter timings test more
      reliable but in the process made it less reliable on other
      configurations. This is an impossible problem to solve without
      increasingly ridiculous heuristics. And it's not even a problem that
      actually needs to be solved in any comprehensive way, since this is only
      ever used during development. So just cordon this off with a DEBUG_
      ifdef, just like we do for the trie's randomized tests, so it can be
      enabled while hacking on the code, and otherwise disabled in CI. In the
      process we also revert 151c8e49.
      
      Fixes: 151c8e49 ("wireguard: ratelimiter: use hrtimer in selftest")
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sfc/siena: fix null pointer dereference in efx_hard_start_xmit · 589c6ede
      Íñigo Huguet authored
      Like in previous patch for sfc, prevent potential (but unlikely) NULL
      pointer dereference.
      
      Fixes: 12804793 ("sfc: decouple TXQ type from label")
      Reported-by: Tianhao Zhao <tizhao@redhat.com>
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Link: https://lore.kernel.org/r/20220915141958.16458-1-ihuguet@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sfc/siena: fix TX channel offset when using legacy interrupts · 974bb793
      Íñigo Huguet authored
      As in the previous commit for sfc, fix the TX channel offset when
      efx_siena_separate_tx_channels is false (the default).
      
      Fixes: 25bde571 ("sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels")
      Reported-by: Tianhao Zhao <tizhao@redhat.com>
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Link: https://lore.kernel.org/r/20220915141653.15504-1-ihuguet@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: clear msg_get_inq in __get_compat_msghdr() · d547c1b7
      Tetsuo Handa authored
      syzbot is still complaining about uninit-value in tcp_recvmsg(), for
      commit 1228b34c ("net: clear msg_get_inq in __sys_recvfrom() and
      __copy_msghdr_from_user()") missed that __get_compat_msghdr() is called
      instead of copy_msghdr_from_user() when MSG_CMSG_COMPAT is specified.
      
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Fixes: 1228b34c ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()")
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Link: https://lore.kernel.org/r/d06d0f7f-696c-83b4-b2d5-70b5f2730a37@I-love.SAKURA.ne.jp
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'ipmr-always-call-ip-6-_mr_forward-from-rcu-read-side-critical-section' · 68fe503c
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section
      
      Patch #1 fixes a bug in ipmr code.
      
      Patch #2 adds corresponding test cases.
      ====================
      
      Link: https://lore.kernel.org/r/20220914075339.4074096-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: forwarding: Add test cases for unresolved multicast routes · 2b5a8c8f
      Ido Schimmel authored
      Add IPv4 and IPv6 test cases for unresolved multicast routes, testing
      that queued packets are forwarded after installing a matching (S, G)
      route.
      
      The test cases can be used to reproduce the bugs fixed in "ipmr: Always
      call ip{,6}_mr_forward() from RCU read-side critical section".
      
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section · b07a9b26
      Ido Schimmel authored
      These functions expect to be called from RCU read-side critical section,
      but this only happens when invoked from the data path via
      ip{,6}_mr_input(). They can also be invoked from process context in
      response to user space adding a multicast route which resolves a cache
      entry with queued packets [1][2].
      
      Fix by adding missing rcu_read_lock() / rcu_read_unlock() in these call
      paths.
      
      [1]
      WARNING: suspicious RCU usage
      6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted
      -----------------------------
      net/ipv4/ipmr.c:84 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by smcrouted/246:
       #0: ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip_mroute_setsockopt+0x11c/0x1420
      
      stack backtrace:
      CPU: 0 PID: 246 Comm: smcrouted Not tainted 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x91/0xb9
       vif_dev_read+0xbf/0xd0
       ipmr_queue_xmit+0x135/0x1ab0
       ip_mr_forward+0xe7b/0x13d0
       ipmr_mfc_add+0x1a06/0x2ad0
       ip_mroute_setsockopt+0x5c1/0x1420
       do_ip_setsockopt+0x23d/0x37f0
       ip_setsockopt+0x56/0x80
       raw_setsockopt+0x219/0x290
       __sys_setsockopt+0x236/0x4d0
       __x64_sys_setsockopt+0xbe/0x160
       do_syscall_64+0x34/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      [2]
      WARNING: suspicious RCU usage
      6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted
      -----------------------------
      net/ipv6/ip6mr.c:69 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by smcrouted/246:
       #0: ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip6_mroute_setsockopt+0x6b9/0x2630
      
      stack backtrace:
      CPU: 1 PID: 246 Comm: smcrouted Not tainted 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x91/0xb9
       vif_dev_read+0xbf/0xd0
       ip6mr_forward2.isra.0+0xc9/0x1160
       ip6_mr_forward+0xef0/0x13f0
       ip6mr_mfc_add+0x1ff2/0x31f0
       ip6_mroute_setsockopt+0x1825/0x2630
       do_ipv6_setsockopt+0x462/0x4440
       ipv6_setsockopt+0x105/0x140
       rawv6_setsockopt+0xd8/0x690
       __sys_setsockopt+0x236/0x4d0
       __x64_sys_setsockopt+0xbe/0x160
       do_syscall_64+0x34/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: ebc31979 ("ipmr: add rcu protection over (struct vif_device)->dev")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: properly limit modem routing table use · cf412ec3
      Alex Elder authored
      IPA can route packets between IPA-connected entities.  The AP and
      modem are currently the only such entities supported, and no routing
      is required to transfer packets between them.
      
      The number of entries in each routing table is fixed, and defined at
      initialization time.  Some of these entries are designated for use
      by the modem, and the rest are available for the AP to use.  The AP
      sends a QMI message to the modem which describes (among other
      things) information about routing table memory available for the
      modem to use.
      
      Currently the QMI initialization packet gives wrong information in
      its description of routing tables.  What *should* be supplied is the
      maximum index that the modem can use for the routing table memory
      located at a given location.  The current code instead supplies the
      total *number* of routing table entries.  Furthermore, the modem is
      granted the entire table, not just the subset it's supposed to use.
      
      This patch fixes this.  First, the ipa_mem_bounds structure is
      generalized so its "end" field can be interpreted either as a final
      byte offset, or a final array index.  Second, the IPv4 and IPv6
      (non-hashed and hashed) table information fields in the QMI
      ipa_init_modem_driver_req structure are changed to be ipa_mem_bounds
      rather than ipa_mem_array structures.  Third, we set the "end" value
      for each routing table to be the last index, rather than setting the
      "count" to be the number of indices.  Finally, instead of allowing
      the modem to use all of a routing table's memory, it is limited to
      just the portion meant to be used by the modem.  In all versions of
      IPA currently supported, that is IPA_ROUTE_MODEM_COUNT (8) entries.
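
      The difference between a count and a final index can be made concrete
      with a short sketch. The structure and helper below are simplified,
      hypothetical stand-ins for ipa_mem_bounds and the QMI setup code; only
      the value 8 and the "end is the last index" rule come from the text.

```c
#include <stdint.h>

#define ROUTE_MODEM_COUNT 8u   /* value quoted in the text above */

struct mem_bounds {
	uint32_t start;  /* offset of the table memory */
	uint32_t end;    /* last index the modem may use, not a count */
};

/* Describe the modem's share of a routing table located at mem_offset. */
struct mem_bounds modem_route_bounds(uint32_t mem_offset)
{
	struct mem_bounds b = { mem_offset, ROUTE_MODEM_COUNT - 1 };

	return b;
}
```

      For a modem share of 8 entries the reported end is 7; reporting the
      entry count of the whole table instead would both be off by one and
      cover entries reserved for the AP.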
      
      Update a few comments for clarity.
      
      Fixes: 530f9216 ("soc: qcom: ipa: AP/modem communications")
      Signed-off-by: Alex Elder <elder@linaro.org>
      Link: https://lore.kernel.org/r/20220913204602.1803004-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • of: mdio: Add of_node_put() when breaking out of for_each_xx · 1c48709e
      Liang He authored
      In of_mdiobus_register(), we should call of_node_put() for 'child' when
      breaking out of for_each_available_child_of_node().
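
      The reference-counting rule can be modelled in user space as below.
      Everything here (struct node, node_get(), node_put(), next_child(),
      find_wanted()) is a hypothetical miniature, written on the assumption
      that the iterator holds a reference on the current child and releases
      it only when advancing.

```c
#include <stddef.h>

struct node {
	int refcount;
	int wanted;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Iterator step: drop the reference on prev, take one on the next child. */
static struct node *next_child(struct node *children, size_t count,
			       struct node *prev)
{
	size_t idx = prev ? (size_t)(prev - children) + 1 : 0;

	node_put(prev);
	return idx < count ? node_get(&children[idx]) : NULL;
}

/* Returns the index of the first wanted child, or -1 if there is none. */
int find_wanted(struct node *children, size_t count)
{
	struct node *child;

	for (child = next_child(children, count, NULL); child;
	     child = next_child(children, count, child)) {
		if (child->wanted) {
			int idx = (int)(child - children);

			/* Leaving early: the iterator will not drop this one. */
			node_put(child);
			return idx;
		}
	}
	return -1;
}
```

      When the loop runs to completion the iterator balances every reference
      itself; only the early exit needs the explicit put.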
      
      Fixes: 66bdede4 ("of_mdio: Fix broken PHY IRQ in case of probe deferral")
      Co-developed-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: Liang He <windhl@126.com>
      Link: https://lore.kernel.org/r/20220913125659.3331969-1-windhl@126.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: read multiple skbs in tcp_read_skb() · db4192a7
      Cong Wang authored
      Before we switched to ->read_skb(), ->read_sock() was passed with
      desc.count=1, which technically indicates we only read one skb per
      ->sk_data_ready() call. However, for TCP, this is not true.
      
      TCP at least has sk_rcvlowat which intentionally holds skb's in
      receive queue until this watermark is reached. This means when
      ->sk_data_ready() is invoked there could be multiple skb's in the
      queue, therefore we have to read multiple skbs in tcp_read_skb()
      instead of one.
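
      As a loose illustration, the change in behaviour amounts to draining
      the queue in a loop rather than handling a single element per
      notification. The queue, actor type, and function names below are
      invented for the sketch and do not mirror the kernel's skb or
      read_skb() interfaces.

```c
#include <stddef.h>

struct queue {
	const int *items;
	size_t len;
	size_t head;   /* index of the next unread item */
};

typedef int (*recv_actor_t)(int item);

/* Sample actor: consumes non-negative items, reports an error otherwise. */
int reject_negative(int item)
{
	return item >= 0 ? 1 : -1;
}

/*
 * Drain every queued item in one call. Returns the number of items
 * consumed, or the actor's error if nothing was consumed at all.
 */
int read_all(struct queue *q, recv_actor_t actor)
{
	int copied = 0;

	while (q->head < q->len) {
		int used = actor(q->items[q->head]);

		if (used < 0)
			return copied ? copied : used;
		q->head++;
		copied++;
	}
	return copied;
}
```

      A reader that stopped after the first item would leave the rest queued
      until some later notification, which may never come if no further data
      arrives.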
      
      Fixes: 965b57b4 ("net: Introduce a new proto_ops ->read_skb()")
      Reported-by: Peilin Ye <peilin.ye@bytedance.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Link: https://lore.kernel.org/r/20220912173553.235838-1-xiyou.wangcong@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>