1. 30 Jul, 2022 3 commits
    • fs/dcache: Move the wakeup from __d_lookup_done() to the caller. · 45f78b0a
      Sebastian Andrzej Siewior authored
      __d_lookup_done() wakes waiters on dentry->d_wait.  On PREEMPT_RT we are
      not allowed to do that with preemption disabled, since the wakeup
      acquires wait_queue_head::lock, which is a "sleeping" spinlock on RT.
      
      Calling it under dentry->d_lock is not a problem, since that is also a
      "sleeping" spinlock on the same configs.  Unfortunately, two of its
      callers (__d_add() and __d_move()) are holding more than just ->d_lock
      and that needs to be dealt with.
      
      The key observation is that the wakeup can be moved to any point before
      dropping ->d_lock.
      
      As a first step to solve this, move the wake up outside of the
      hlist_bl_lock() held section.
      
      This is safe because:
      
      Waiters get inserted into ->d_wait only after they'd taken ->d_lock
      and observed DCACHE_PAR_LOOKUP in flags.  As long as they are
      woken up (and evicted from the queue) between the moment __d_lookup_done()
      has removed DCACHE_PAR_LOOKUP and dropping ->d_lock, we are safe,
      since the waitqueue ->d_wait points to won't get destroyed without
      having __d_lookup_done(dentry) called (under ->d_lock).
      
      ->d_wait is set only by d_alloc_parallel() and only when it returns a
      freshly allocated in-lookup dentry.  Whenever that happens, we are
      guaranteed that __d_lookup_done() will be called for the resulting
      dentry (under ->d_lock) before the wq in question gets destroyed.
      
      With two exceptions, wq lives in the call frame of the caller of
      d_alloc_parallel() and we have an explicit d_lookup_done() on the
      resulting in-lookup dentry before we leave that frame.
      
      One of those exceptions is nfs_call_unlink(), where wq is embedded into
      (dynamically allocated) struct nfs_unlinkdata.  It is destroyed in
      nfs_async_unlink_release() after an explicit d_lookup_done() on the
      dentry wq went into.
      
      Remaining exception is d_add_ci(). There wq is what we'd found in
      ->d_wait of d_add_ci() argument. Callers of d_add_ci() are two
      instances of ->d_lookup() and they must have been given an in-lookup
      dentry.  Which means that they'd been called by __lookup_slow() or
      lookup_open(), with wq in the call frame of one of those.
      
      Result of d_alloc_parallel() in d_add_ci() is fed to
      d_splice_alias(), which either returns non-NULL (and d_add_ci() does
      d_lookup_done()) or feeds dentry to __d_add() that will do
      __d_lookup_done() under ->d_lock.  That concludes the analysis.
      
      Let __d_lookup_unhash():
      
        1) Lock the lookup hash and clear DCACHE_PAR_LOOKUP
        2) Unhash the dentry
        3) Retrieve and clear dentry::d_wait
        4) Unlock the hash and return the retrieved waitqueue head pointer
        5) Let the caller handle the wake up.
        6) Rename __d_lookup_done() to __d_lookup_unhash_wake() to enforce
           build failures for OOT code that used __d_lookup_done() and is not
           aware of the new return value.
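
      The split described in steps 1-5 can be sketched as a small
      single-threaded model. This is not the kernel code: every model_*
      name is hypothetical, and locks are plain flags so that the ordering
      (no wakeup while the hash lock is held, wakeup before ->d_lock is
      dropped) can be checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model: locks are plain flags so the order
 * of operations can be observed. None of these are kernel types. */
struct model_waitq {
	int wakeups;                 /* number of wakeups delivered */
	bool woken_under_hash_lock;  /* set if a wakeup ran with the hash lock held */
};

struct model_dentry {
	bool d_lock_held;            /* stands in for dentry->d_lock */
	bool par_lookup;             /* stands in for DCACHE_PAR_LOOKUP */
	bool hashed;                 /* membership in the in-lookup hash */
	struct model_waitq *d_wait;  /* stands in for dentry->d_wait */
};

static bool hash_lock_held;          /* stands in for hlist_bl_lock() */

static void model_wake_up_all(struct model_waitq *wq)
{
	wq->wakeups++;
	if (hash_lock_held)
		wq->woken_under_hash_lock = true;
}

/* Steps 1-4: called with d_lock held; performs no wakeup itself and
 * hands the waitqueue pointer back to the caller. */
static struct model_waitq *model_lookup_unhash(struct model_dentry *d)
{
	struct model_waitq *wq;

	hash_lock_held = true;       /* 1) lock the lookup hash ... */
	d->par_lookup = false;       /*    ... and clear the in-lookup flag */
	d->hashed = false;           /* 2) unhash */
	wq = d->d_wait;              /* 3) retrieve and clear d_wait */
	d->d_wait = NULL;
	hash_lock_held = false;      /* 4) unlock the hash */
	return wq;
}

/* Step 5: the caller wakes the waiters after the hash lock is dropped
 * but before d_lock is released. */
static void model_lookup_unhash_wake(struct model_dentry *d)
{
	struct model_waitq *wq;

	d->d_lock_held = true;
	wq = model_lookup_unhash(d);
	if (wq)
		model_wake_up_all(wq);
	d->d_lock_held = false;
}
```

      In the model, a call to model_lookup_unhash_wake() on an in-lookup
      entry delivers exactly one wakeup and never sets
      woken_under_hash_lock.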
      
      This does not yet solve the PREEMPT_RT problem completely because
      preemption is still disabled due to i_dir_seq being held for write. This
      will be addressed in subsequent steps.
      
      An alternative solution would be to switch the waitqueue to a simple
      waitqueue, but aside from Linus not being a fan of them, moving the wake up
      closer to the place where dentry::d_lock is unlocked reduces lock contention
      time for the woken up waiter.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lkml.kernel.org/r/20220613140712.77932-3-bigeasy@linutronix.de
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT · cf634d54
      Sebastian Andrzej Siewior authored
      i_dir_seq is a sequence counter with a lock which is represented by the
      lowest bit. The writer atomically updates the counter which ensures that it
      can be modified by only one writer at a time. This requires preemption to
      be disabled across the write side critical section.
      
      On !PREEMPT_RT kernels this is implicit in the caller acquiring
      dentry::d_lock. On PREEMPT_RT kernels spin_lock() does not disable
      preemption, which means that a preempting writer or reader would
      livelock. It is therefore required to disable preemption explicitly.
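
      The counter protocol can be sketched in userspace with C11 atomics.
      This is a model under stated assumptions, not the kernel
      implementation: the model_* names are hypothetical, and the comments
      only mark where the PREEMPT_RT preemption control would sit.

```c
#include <stdatomic.h>

/* Hypothetical userspace model of a sequence counter whose lowest bit
 * acts as the writer lock. */
struct model_dir {
	atomic_uint i_dir_seq;
};

/* Writer entry: spin until the counter is even, then set the low bit.
 * In the kernel on PREEMPT_RT, preemption would be disabled before this
 * loop so that a waiter cannot preempt the writer and spin forever. */
static unsigned model_start_dir_add(struct model_dir *dir)
{
	for (;;) {
		unsigned n = atomic_load_explicit(&dir->i_dir_seq,
						  memory_order_relaxed);
		if (!(n & 1) &&
		    atomic_compare_exchange_strong(&dir->i_dir_seq, &n, n + 1))
			return n;
	}
}

/* Writer exit: publish the next even value, which releases the "lock".
 * Preemption would be re-enabled after this store. */
static void model_end_dir_add(struct model_dir *dir, unsigned n)
{
	atomic_store_explicit(&dir->i_dir_seq, n + 2, memory_order_release);
}

/* Reader check: a snapshot is usable only if it is even (no writer was
 * active) and the counter has not moved since it was taken. */
static int model_read_stable(struct model_dir *dir, unsigned snapshot)
{
	return !(snapshot & 1) &&
	       atomic_load_explicit(&dir->i_dir_seq,
				    memory_order_acquire) == snapshot;
}
```

      While the low bit is set, both a second writer and a retrying reader
      spin. If either of them preempts the writer on the same CPU, the
      writer never reaches model_end_dir_add(), which is the livelock the
      commit avoids.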
      
      An alternative solution would be to replace i_dir_seq with a seqlock_t for
      PREEMPT_RT, but that comes with its own set of problems due to arbitrary
      lock nesting. A pure sequence count with an associated spinlock is not
      possible because the locks held by the caller are not necessarily related.
      
      As the critical section is small, disabling preemption is a sensible
      solution.
      
      Reported-by: Oleg.Karfich@wago.com
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lkml.kernel.org/r/20220613140712.77932-2-bigeasy@linutronix.de
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • d_add_ci(): make sure we don't miss d_lookup_done() · 40a3cb0d
      Al Viro authored
      All callers of d_alloc_parallel() must make sure that resulting
      in-lookup dentry (if any) will encounter __d_lookup_done() before
      the final dput().  d_add_ci() might end up creating in-lookup
      dentries; they are fed to d_splice_alias(), which will normally
      make sure they meet __d_lookup_done().  However, it is possible
      to end up with d_splice_alias() failing with ERR_PTR(-ELOOP)
      without having done so.  It takes a corrupted ntfs or case-insensitive
      xfs image, but neither should end up with memory corruption...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 12 Jun, 2022 10 commits
  3. 11 Jun, 2022 9 commits
    • Merge tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 7a68065e
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
       "A set of fixes. Most address the new warning we emit at build time
        when irq chips are not immutable with some additional tweaks to
        gpio-crystalcove from Andy and a small tweak to gpio-dwapb.
      
         - make irq_chip structs immutable in several Diolan and Intel drivers
           to get rid of the new warning we emit when fiddling with irq chips
      
         - don't print error messages on probe deferral in gpio-dwapb"
      
      * tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: dwapb: Don't print error on -EPROBE_DEFER
        gpio: dln2: make irq_chip immutable
        gpio: sch: make irq_chip immutable
        gpio: merrifield: make irq_chip immutable
        gpio: wcove: make irq_chip immutable
        gpio: crystalcove: Join function declarations and long lines
        gpio: crystalcove: Use specific type and API for IRQ number
        gpio: crystalcove: make irq_chip immutable
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · cecb3540
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Driver fixes and one core patch.
      
        Nine of the driver patches are minor fixes and reworks to lpfc and the
        rest are trivial and minor fixes elsewhere"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: pmcraid: Fix missing resource cleanup in error case
        scsi: ipr: Fix missing/incorrect resource cleanup in error case
        scsi: mpt3sas: Fix out-of-bounds compiler warning
        scsi: lpfc: Update lpfc version to 14.2.0.4
        scsi: lpfc: Allow reduced polling rate for nvme_admin_async_event cmd completion
        scsi: lpfc: Add more logging of cmd and cqe information for aborted NVMe cmds
        scsi: lpfc: Fix port stuck in bypassed state after LIP in PT2PT topology
        scsi: lpfc: Resolve NULL ptr dereference after an ELS LOGO is aborted
        scsi: lpfc: Address NULL pointer dereference after starget_to_rport()
        scsi: lpfc: Resolve some cleanup issues following SLI path refactoring
        scsi: lpfc: Resolve some cleanup issues following abort path refactoring
        scsi: lpfc: Correct BDE type for XMIT_SEQ64_WQE in lpfc_ct_reject_event()
        scsi: vmw_pvscsi: Expand vcpuHint to 16 bits
        scsi: sd: Fix interpretation of VPD B9h length
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · abe71eb3
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "Fixes all over the place, most notably fixes for latent bugs in
        drivers that got exposed by suppressing interrupts before DRIVER_OK,
        which in turn has been done by 8b4ec69d ("virtio: harden vring
        IRQ")"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        um: virt-pci: set device ready in probe()
        vdpa: make get_vq_group and set_group_asid optional
        virtio: Fix all occurences of the "the the" typo
        vduse: Fix NULL pointer dereference on sysfs access
        vringh: Fix loop descriptors check in the indirect cases
        vdpa/mlx5: clean up indenting in handle_ctrl_vlan()
        vdpa/mlx5: fix error code for deleting vlan
        virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed
        vdpa/mlx5: Fix syntax errors in comments
        virtio-rng: make device ready before making request
    • Merge tag 'loongarch-fixes-5.19-1' of... · 0678afa6
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix build errors and a stale comment"
      
      * tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Remove MIPS comment about cycle counter
        LoongArch: Fix copy_thread() build errors
        LoongArch: Fix the !CONFIG_SMP build
    • iov_iter: fix build issue due to possible type mis-match · 1c27f1fc
      Linus Torvalds authored
      Commit 6c776766 ("iov_iter: Fix iter_xarray_get_pages{,_alloc}()")
      introduced a problem on some 32-bit architectures (at least arm, xtensa,
      csky, sparc and mips), that have a 'size_t' that is 'unsigned int'.
      
      The reason is that we now do
      
          min(nr * PAGE_SIZE - offset, maxsize);
      
      where 'nr' and 'offset' are both 'unsigned int', and PAGE_SIZE is
      'unsigned long'.  As a result, the normal C type rules mean that the
      first argument to 'min()' ends up being 'unsigned long'.
      
      In contrast, 'maxsize' is of type 'size_t'.
      
      Now, 'size_t' and 'unsigned long' are always the same physical type in
      the kernel, so you'd think this doesn't matter, and from an actual
      arithmetic standpoint it doesn't.
      
      But on 32-bit architectures 'size_t' is commonly 'unsigned int', even if
      it could also be 'unsigned long'.  In that situation, both are unsigned
      32-bit types, but they are not the *same* type.
      
      And as a result 'min()' will complain about the distinct types (ignore
      the "pointer types" part of the error message: that's an artifact of the
      way we have made 'min()' check types for being the same):
      
        lib/iov_iter.c: In function 'iter_xarray_get_pages':
        include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
           20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
              |                                   ^~
        lib/iov_iter.c:1464:16: note: in expansion of macro 'min'
         1464 |         return min(nr * PAGE_SIZE - offset, maxsize);
              |                ^~~
      
      This was not visible on 64-bit architectures (where we always define
      'size_t' to be 'unsigned long').
      
      Force these cases to use 'min_t(size_t, x, y)' to make the type explicit
      and avoid the issue.
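
      A minimal sketch of the idea, assuming GNU C (statement expressions
      and __typeof__). The model_* macros and MODEL_PAGE_SIZE are
      hypothetical simplifications, not the kernel's definitions in
      include/linux/minmax.h:

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, typed 'unsigned long' like the kernel's PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096UL

/* Simplified type-checked min(): comparing the addresses of the two
 * temporaries makes the compiler warn about "distinct pointer types"
 * whenever x and y do not have the same type. Shown for contrast only;
 * it is not used below. */
#define model_min(x, y) ({			\
	__typeof__(x) _x = (x);			\
	__typeof__(y) _y = (y);			\
	(void)(&_x == &_y);			\
	_x < _y ? _x : _y; })

/* Simplified min_t(): both operands are converted to 'type' first, so
 * there is a single type to compare and nothing to warn about. */
#define model_min_t(type, x, y) ({		\
	type _x = (x);				\
	type _y = (y);				\
	_x < _y ? _x : _y; })

/* 'nr * MODEL_PAGE_SIZE - offset' is 'unsigned long' by the usual
 * arithmetic conversions, while 'maxsize' is 'size_t'. */
static size_t model_bytes(unsigned int nr, unsigned int offset, size_t maxsize)
{
	return model_min_t(size_t, nr * MODEL_PAGE_SIZE - offset, maxsize);
}
```

      With these definitions, model_bytes(2, 100, 5000) yields 5000 and
      model_bytes(1, 100, 5000) yields 3996. On a target where 'size_t' is
      'unsigned int', replacing model_min_t() with model_min() in
      model_bytes() would be expected to trigger the distinct-pointer-types
      diagnostic quoted above.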
      
      [ Nit-picky note: technically 'size_t' doesn't have to match 'unsigned
        long' arithmetically. We've certainly historically seen environments
        with 16-bit address spaces and 32-bit 'unsigned long'.
      
        Similarly, even in 64-bit modern environments, 'size_t' could be its
        own type distinct from 'unsigned long', even if it were arithmetically
        identical.
      
        So the above type commentary is only really descriptive of the kernel
        environment, not some kind of universal truth for the kinds of wild
        and crazy situations that are allowed by the C standard ]
      Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lore.kernel.org/all/YqRyL2sIqQNDfky2@debian/
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • wireguard: selftests: use maximum cpu features and allow rng seeding · 17b0128a
      Jason A. Donenfeld authored
      By forcing the maximum CPU that QEMU has available, we expose additional
      capabilities, such as the RNDR instruction, which increases test
      coverage. This then allows the CI to skip the fake seeding step in some
      cases. Also enable STRICT_KERNEL_RWX to catch issues related to early
      jump labels when the RNG is initialized at boot.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • scripts/gdb: change kernel config dumping method · 1f7a6cf6
      Kuan-Ying Lee authored
      MAGIC_START("IKCFG_ST") and MAGIC_END("IKCFG_ED") are moved out
      from the kernel_config_data variable.
      
      Thus, we parse kernel_config_data directly instead of considering
      offset of MAGIC_START and MAGIC_END.
      
      Fixes: 13610aa9 ("kernel/configs: use .incbin directive to embed config_data.gz")
      Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • um: virt-pci: set device ready in probe() · eacea844
      Vincent Whitchurch authored
      Call virtio_device_ready() to make this driver work after commit
      8b4ec69d7e09 ("virtio: harden vring IRQ"), since the driver uses the
      virtqueues in the probe function.  (The virtio core sets the device
      ready when probe returns.)
      
      Fixes: 8b4ec69d ("virtio: harden vring IRQ")
      Fixes: 68f5d3f3 ("um: add PCI over virtio emulation driver")
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Message-Id: <20220610151203.3492541-1-vincent.whitchurch@axis.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Johannes Berg <johannes@sipsolutions.net>
    • Merge tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 0885eacd
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
       "Notable changes:
      
         - There is now a backup maintainer for NFSD
      
        Notable fixes:
      
         - Prevent array overruns in svc_rdma_build_writes()
      
         - Prevent buffer overruns when encoding NFSv3 READDIR results
      
         - Fix a potential UAF in nfsd_file_put()"
      
      * tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()
        SUNRPC: Clean up xdr_get_next_encode_buffer()
        SUNRPC: Clean up xdr_commit_encode()
        SUNRPC: Optimize xdr_reserve_space()
        SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
        SUNRPC: Trap RDMA segment overflows
        NFSD: Fix potential use-after-free in nfsd_file_put()
        MAINTAINERS: reciprocal co-maintainership for file locking and nfsd
  4. 10 Jun, 2022 18 commits