1. 20 Dec, 2022 7 commits
    • Jakub Kicinski's avatar
      Merge tag 'linux-can-fixes-for-6.2-20221219' of... · 4be84df3
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2022-12-19
      
      The first patch is by Vincent Mailhol and adds the etas_es58x
      devlink documentation to the index.
      
      Haibo Chen's patch for the flexcan driver fixes a unbalanced
      pm_runtime_enable warning.
      
      The last patch is by me, targets the kvaser_usb driver and fixes
      an error occurring with gcc-13.
      
      * tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
        can: flexcan: avoid unbalanced pm_runtime_enable warning
        Documentation: devlink: add missing toc entry for etas_es58x devlink doc
      ====================
      
      Link: https://lore.kernel.org/r/20221219155210.1143439-1-mkl@pengutronix.deSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4be84df3
    • Jakub Kicinski's avatar
      Merge branch 'stop-corrupting-socket-s-task_frag' · 918fb1aa
      Jakub Kicinski authored
      Benjamin Coddington says:
      
      ====================
      Stop corrupting socket's task_frag
      
      The networking code uses flags in sk_allocation to determine if it can use
      current->task_frag, however in-kernel users of sockets may stop setting
      sk_allocation when they convert to the preferred memalloc_nofs_save/restore,
      as SUNRPC has done in commit a1231fda ("SUNRPC: Set memalloc_nofs_save()
      on all rpciod/xprtiod jobs").
      
      This will cause corruption in current->task_frag when recursing into the
      network layer for those subsystems during page fault or reclaim.  The
      corruption is difficult to diagnose because stack traces may not contain the
      offending subsystem at all.  The corruption is unlikely to show up in
      testing because it requires memory pressure, and so subsystems that
      convert to memalloc_nofs_save/restore are likely to continue to run into
      this issue.
      
      Previous reports and proposed fixes:
      https://lore.kernel.org/netdev/96a18bd00cbc6cb554603cc0d6ef1c551965b078.1663762494.git.gnault@redhat.com/
      https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
      https://lore.kernel.org/linux-nfs/de6d99321d1dcaa2ad456b92b3680aa77c07a747.1665401788.git.gnault@redhat.com/
      
      Guilluame Nault has done all of the hard work tracking this problem down and
      finding the best fix for this issue.  I'm just taking a turn posting another
      fix.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1671194454.git.bcodding@redhat.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      918fb1aa
    • Benjamin Coddington's avatar
      net: simplify sk_page_frag · 08f65892
      Benjamin Coddington authored
      Now that in-kernel socket users that may recurse during reclaim have benn
      converted to sk_use_task_frag = false, we can have sk_page_frag() simply
      check that value.
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      08f65892
    • Benjamin Coddington's avatar
      Treewide: Stop corrupting socket's task_frag · 98123866
      Benjamin Coddington authored
      Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
      GFP_NOIO flag on sk_allocation which the networking system uses to decide
      when it is safe to use current->task_frag.  The results of this are
      unexpected corruption in task_frag when SUNRPC is involved in memory
      reclaim.
      
      The corruption can be seen in crashes, but the root cause is often
      difficult to ascertain as a crashing machine's stack trace will have no
      evidence of being near NFS or SUNRPC code.  I believe this problem to
      be much more pervasive than reports to the community may indicate.
      
      Fix this by having kernel users of sockets that may corrupt task_frag due
      to reclaim set sk_use_task_frag = false.  Preemptively correcting this
      situation for users that still set sk_allocation allows them to convert to
      memalloc_nofs_save/restore without the same unexpected corruptions that are
      sure to follow, unlikely to show up in testing, and difficult to bisect.
      
      CC: Philipp Reisner <philipp.reisner@linbit.com>
      CC: Lars Ellenberg <lars.ellenberg@linbit.com>
      CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Josef Bacik <josef@toxicpanda.com>
      CC: Keith Busch <kbusch@kernel.org>
      CC: Christoph Hellwig <hch@lst.de>
      CC: Sagi Grimberg <sagi@grimberg.me>
      CC: Lee Duncan <lduncan@suse.com>
      CC: Chris Leech <cleech@redhat.com>
      CC: Mike Christie <michael.christie@oracle.com>
      CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
      CC: "Martin K. Petersen" <martin.petersen@oracle.com>
      CC: Valentina Manea <valentina.manea.m@gmail.com>
      CC: Shuah Khan <shuah@kernel.org>
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      CC: David Howells <dhowells@redhat.com>
      CC: Marc Dionne <marc.dionne@auristor.com>
      CC: Steve French <sfrench@samba.org>
      CC: Christine Caulfield <ccaulfie@redhat.com>
      CC: David Teigland <teigland@redhat.com>
      CC: Mark Fasheh <mark@fasheh.com>
      CC: Joel Becker <jlbec@evilplan.org>
      CC: Joseph Qi <joseph.qi@linux.alibaba.com>
      CC: Eric Van Hensbergen <ericvh@gmail.com>
      CC: Latchesar Ionkov <lucho@ionkov.net>
      CC: Dominique Martinet <asmadeus@codewreck.org>
      CC: Ilya Dryomov <idryomov@gmail.com>
      CC: Xiubo Li <xiubli@redhat.com>
      CC: Chuck Lever <chuck.lever@oracle.com>
      CC: Jeff Layton <jlayton@kernel.org>
      CC: Trond Myklebust <trond.myklebust@hammerspace.com>
      CC: Anna Schumaker <anna@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Suggested-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      98123866
    • Guillaume Nault's avatar
      net: Introduce sk_use_task_frag in struct sock. · fb87bd47
      Guillaume Nault authored
      Sockets that can be used while recursing into memory reclaim, like
      those used by network block devices and file systems, mustn't use
      current->task_frag: if the current process is already using it, then
      the inner memory reclaim call would corrupt the task_frag structure.
      
      To avoid this, sk_page_frag() uses ->sk_allocation to detect sockets
      that mustn't use current->task_frag, assuming that those used during
      memory reclaim had their allocation constraints reflected in
      ->sk_allocation.
      
      This unfortunately doesn't cover all cases: in an attempt to remove all
      usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
      ->sk_allocation, and used memalloc_nofs critical sections instead.
      This breaks the sk_page_frag() heuristic since the allocation
      constraints are now stored in current->flags, which sk_page_frag()
      can't read without risking triggering a cache miss and slowing down
      TCP's fast path.
      
      This patch creates a new field in struct sock, named sk_use_task_frag,
      which sockets with memory reclaim constraints can set to false if they
      can't safely use current->task_frag. In such cases, sk_page_frag() now
      always returns the socket's page_frag (->sk_frag). The first user is
      sunrpc, which needs to avoid using current->task_frag but can keep
      ->sk_allocation set to GFP_KERNEL otherwise.
      
      Eventually, it might be possible to simplify sk_page_frag() by only
      testing ->sk_use_task_frag and avoid relying on the ->sk_allocation
      heuristic entirely (assuming other sockets will set ->sk_use_task_frag
      according to their constraints in the future).
      
      The new ->sk_use_task_frag field is placed in a hole in struct sock and
      belongs to a cache line shared with ->sk_shutdown. Therefore it should
      be hot and shouldn't have negative performance impacts on TCP's fast
      path (sk_shutdown is tested just before the while() loop in
      tcp_sendmsg_locked()).
      
      Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Reviewed-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      fb87bd47
    • Matt Johnston's avatar
      mctp: Remove device type check at unregister · b389a902
      Matt Johnston authored
      The unregister check could be incorrectly triggered if a netdev
      changes its type after register. That is possible for a tun device
      using TUNSETLINK ioctl, resulting in mctp unregister failing
      and the netdev unregister waiting forever.
      
      This was encountered by https://github.com/openthread/openthread/issues/8523
      
      Neither check at register or unregister is required. They were added in
      an attempt to track down mctp_ptr being set unexpectedly, which should
      not happen in normal operation.
      
      Fixes: 7b1871af ("mctp: Warn if pointer is set for a wrong dev type")
      Signed-off-by: default avatarMatt Johnston <matt@codeconstruct.com.au>
      Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.auSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b389a902
    • Arun Ramadoss's avatar
      net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq · 62e027fb
      Arun Ramadoss authored
      KSZ swithes used interrupts for detecting the phy link up and down.
      During registering the interrupt handler, it used IRQF_TRIGGER_FALLING
      flag. But this flag has to be retrieved from device tree instead of hard
      coding in the driver, so removing the flag.
      
      Fixes: ff319a64 ("net: dsa: microchip: move interrupt handling logic from lan937x to ksz_common")
      Reported-by: default avatarChristian Eggers <ceggers@arri.de>
      Signed-off-by: default avatarArun Ramadoss <arun.ramadoss@microchip.com>
      Link: https://lore.kernel.org/r/20221213101440.24667-1-arun.ramadoss@microchip.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      62e027fb
  2. 19 Dec, 2022 19 commits
  3. 18 Dec, 2022 1 commit
    • David S. Miller's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 89529367
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-12-15 (igc)
      
      Muhammad Husaini Zulkifli says:
      
      This patch series fixes bugs for the Time-Sensitive Networking(TSN)
      Qbv Scheduling features.
      
      An overview of each patch series is given below:
      
      Patch 1: Using a first flag bit to schedule a packet to the next cycle if
      packet cannot fit in current Qbv cycle.
      Patch 2: Enable strict cycle for Qbv scheduling.
      Patch 3: Prevent user to set basetime less than zero during tc config.
      Patch 4: Allow the basetime enrollment with zero value.
      Patch 5: Calculate the new end time value to exclude the time interval that
      exceed the cycle time as user can specify the cycle time in tc config.
      Patch 6: Resolve the HW bugs where the gate is not fully closed.
      ---
      This contains the net patches from this original pull request:
      https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/
      
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89529367
  4. 17 Dec, 2022 3 commits
  5. 16 Dec, 2022 7 commits
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 13e3c779
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2022-12-16
      
      We've added 7 non-merge commits during the last 2 day(s) which contain
      a total of 9 files changed, 119 insertions(+), 36 deletions(-).
      
      1) Fix for recent syzkaller XDP dispatcher update splat, from Jiri Olsa.
      
      2) Fix BPF program refcount leak in LSM attachment failure path,
         from Milan Landaverde.
      
      3) Fix BPF program type in map compatibility check for fext,
         from Toke Høiland-Jørgensen.
      
      4) Fix a BPF selftest compilation error under !CONFIG_SMP config,
         from Yonghong Song.
      
      5) Fix CI to enable CONFIG_FUNCTION_ERROR_INJECTION after it got changed
         to a prompt, from Song Liu.
      
      6) Various BPF documentation fixes for socket local storage,
         from Donald Hunter.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program
        bpf: Resolve fext program type when checking map compatibility
        bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func
        bpf: prevent leak of lsm program after failed attach
        selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTION
        selftests/bpf: Fix a selftest compilation error with CONFIG_SMP=n
        docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE
      ====================
      
      Link: https://lore.kernel.org/r/20221216174540.16598-1-daniel@iogearbox.netSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      13e3c779
    • Eelco Chaudron's avatar
      openvswitch: Fix flow lookup to use unmasked key · 68bb1010
      Eelco Chaudron authored
      The commit mentioned below causes the ovs_flow_tbl_lookup() function
      to be called with the masked key. However, it's supposed to be called
      with the unmasked key. This due to the fact that the datapath supports
      installing wider flows, and OVS relies on this behavior. For example
      if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
      flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
      128.0.0.0) is allowed to be added.
      
      However, if we try to add a wildcard rule, the installation fails:
      
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
      ovs-vswitchd: updating flow table (File exists)
      
      The reason is that the key used to determine if the flow is already
      present in the system uses the original key ANDed with the mask.
      This results in the IP address not being part of the (miniflow) key,
      i.e., being substituted with an all-zero value. When doing the actual
      lookup, this results in the key wrongfully matching the first flow,
      and therefore the flow does not get installed.
      
      This change reverses the commit below, but rather than having the key
      on the stack, it's allocated.
      
      Fixes: 190aa3e7 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
      Signed-off-by: default avatarEelco Chaudron <echaudro@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68bb1010
    • David S. Miller's avatar
      Merge branch 'devlink-fixes' · 3e31d209
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      devlink: region snapshot locking fix and selftest adjustments
      
      Minor fix for region snapshot locking and adjustments to selftests.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e31d209
    • Jakub Kicinski's avatar
      selftests: devlink: add a warning for interfaces coming up · d1c4a346
      Jakub Kicinski authored
      NetworkManager (and other daemons) may bring the interface up
      and cause failures in quiescence checks. Print a helpful warning,
      and take the interface down again.
      
      I seem to forget about this every time I run these tests on a new VM.
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1c4a346
    • Jakub Kicinski's avatar
      selftests: devlink: fix the fd redirect in dummy_reporter_test · 2fc60e2f
      Jakub Kicinski authored
      $number + > bash means redirect FD $number, e.g. commonly
      used 2> redirects stderr (fd 2). The test uses 8192> to
      write the number 8192 to a file, this results in:
      
        ./devlink.sh: line 499: 8192: Bad file descriptor
      
      Oddly the test also papers over this issue by checking
      for failure (expecting an error rather than success)
      so it passes, anyway.
      
      Fixes: ff18176a ("selftests: Add a test of large binary to devlink health test")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2fc60e2f
    • Jakub Kicinski's avatar
      devlink: hold region lock when flushing snapshots · b4cafb3d
      Jakub Kicinski authored
      Netdevsim triggers a splat on reload, when it destroys regions
      with snapshots pending:
      
        WARNING: CPU: 1 PID: 787 at net/core/devlink.c:6291 devlink_region_snapshot_del+0x12e/0x140
        CPU: 1 PID: 787 Comm: devlink Not tainted 6.1.0-07460-g7ae9888d #580
        RIP: 0010:devlink_region_snapshot_del+0x12e/0x140
        Call Trace:
         <TASK>
         devl_region_destroy+0x70/0x140
         nsim_dev_reload_down+0x2f/0x60 [netdevsim]
         devlink_reload+0x1f7/0x360
         devlink_nl_cmd_reload+0x6ce/0x860
         genl_family_rcv_msg_doit.isra.0+0x145/0x1c0
      
      This is the locking assert in devlink_region_snapshot_del(),
      we're supposed to be holding the region->snapshot_lock here.
      
      Fixes: 2dec18ad ("net: devlink: remove region snapshots list dependency on devlink->lock")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4cafb3d
    • Daniel Golle's avatar
      net: dsa: mt7530: remove redundant assignment · 32f1002e
      Daniel Golle authored
      Russell King correctly pointed out that the MAC_2500FD capability is
      already added for port 5 (if not in RGMII mode) and port 6 (which only
      supports SGMII) by mt7531_mac_port_get_caps. Remove the reduntant
      setting of this capability flag which was added by a previous commit.
      
      Fixes: e19de30d ("net: dsa: mt7530: add support for in-band link status")
      Reported-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Link: https://lore.kernel.org/r/Y5qY7x6la5TxZxzX@makrotopia.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      32f1002e
  6. 15 Dec, 2022 3 commits