1. 25 Feb, 2024 16 commits
    • Merge tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 66a97c2e
      Linus Torvalds authored
      Pull RCU pathwalk fixes from Al Viro:
       "We still have some races in filesystem methods when exposed to RCU
        pathwalk. This series is a result of code audit (the second round of
        it) and it should deal with most of that stuff.
      
        Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
        Up to maintainers (a note for NTFS folks - when documentation says
        that a method may not block, it *does* imply that blocking allocations
        are to be avoided. Really)"
      
      [ More explanations for people who aren't familiar with the vagaries of
        RCU path walking: most of it is hidden from filesystems, but if a
        filesystem actively participates in the low-level path walking it
        needs to make sure the fields involved in that walk are RCU-safe.
      
        That "actively participate in low-level path walking" includes things
        like having its own ->d_hash()/->d_compare() routines, or having its
        own directory permission function that doesn't just use the common
        helpers.  Having a ->d_revalidate() function raises the same issue.
      
        Note that instead of making everything RCU safe you can also choose to
        abort the RCU pathwalk if your operation cannot be done safely under
        RCU, but that obviously comes with a performance penalty. One common
        pattern is to allow the simple cases under RCU, and abort only if you
        need to do something more complicated.
      
        So not everything needs to be RCU-safe, and things like the inode etc
        that the VFS itself maintains obviously already are. But these fixes
        tend to be about properly RCU-delaying things like ->s_fs_info that
        are maintained by the filesystem and that got potentially released too
        early.   - Linus ]
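      The "simple cases under RCU, abort otherwise" pattern can be sketched in userspace C. This is an illustration under stated assumptions, not kernel code: struct fake_dentry, its fields and both helper functions are hypothetical, and the LOOKUP_RCU value is arbitrary. The one convention borrowed from the VFS is that returning -ECHILD asks the caller to leave RCU mode and retry in ref-walk mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's LOOKUP_RCU flag; the value is arbitrary here. */
#define LOOKUP_RCU 0x1u

/* Hypothetical per-dentry state a filesystem might consult. */
struct fake_dentry {
	bool cached_valid;    /* answer is known without any I/O */
	bool needs_server_io; /* answering requires a blocking round trip */
};

/* Pretend blocking revalidation (e.g. asking a server). Returns 1 if valid. */
static int slow_revalidate(const struct fake_dentry *d)
{
	return d->cached_valid ? 1 : 0;
}

/*
 * Sketch of a ->d_revalidate()-style method: the cheap, non-blocking case
 * is answered even in RCU mode; anything that might block makes the
 * caller drop out of RCU mode by returning -ECHILD, after which the walk
 * is retried in ref-walk mode (flags without LOOKUP_RCU).
 */
static int sketch_revalidate(const struct fake_dentry *d, unsigned int flags)
{
	if (!d->needs_server_io)
		return d->cached_valid ? 1 : 0;  /* simple case: safe under RCU */
	if (flags & LOOKUP_RCU)
		return -ECHILD;                  /* too complicated: abort RCU walk */
	return slow_revalidate(d);           /* ref-walk: blocking is allowed */
}
```

      With LOOKUP_RCU set, a dentry that needs I/O yields -ECHILD; the same call without the flag performs the (pretend) blocking check.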
      
      * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        ext4_get_link(): fix breakage in RCU mode
        cifs_get_link(): bail out in unsafe case
        fuse: fix UAF in rcu pathwalks
        procfs: make freeing proc_fs_info rcu-delayed
        procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
        nfs: fix UAF on pathwalk running into umount
        nfs: make nfs_set_verifier() safe for use in RCU pathwalk
        afs: fix __afs_break_callback() / afs_drop_open_mmap() race
        hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
        exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
        affs: free affs_sb_info with kfree_rcu()
        rcu pathwalk: prevent bogus hard errors from may_lookup()
        fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 9b243492
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "A couple of fixes - revert of regression from this cycle and a fix for
        erofs failure exit breakage (had been there since way back)"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        erofs: fix handling kern_mount() failure
        Revert "get rid of DCACHE_GENOCIDE"
    • ext4_get_link(): fix breakage in RCU mode · 9fa8e282
      Al Viro authored
      1) errors from ext4_getblk() should not be propagated to caller
      unless we are really sure that we would've gotten the same error
      in non-RCU pathwalk.
      2) we leak buffer_heads if ext4_getblk() is successful, but bh is
      not uptodate.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
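      The two points above can be illustrated with a userspace sketch of the RCU-mode branch. It is not the ext4 patch: struct fake_bh, fake_brelse() and sketch_get_link_rcu() are hypothetical stand-ins for a buffer_head, brelse() and the relevant part of ext4_get_link(), and the lookup result is passed in rather than computed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: a refcount and an uptodate bit. */
struct fake_bh {
	int refcount;
	bool uptodate;
	const char *data;
};

/* Drop a reference, as brelse() would. */
static void fake_brelse(struct fake_bh *bh)
{
	if (bh) {
		assert(bh->refcount > 0);
		bh->refcount--;
	}
}

/*
 * Sketch of the RCU-mode branch. 'lookup_err' and 'bh' model the result of
 * a non-blocking block lookup: a nonzero error, a buffer returned with a
 * reference already held, or NULL if nothing was cached.
 * In RCU mode every failure becomes -ECHILD, because a ref-walk retry might
 * succeed, and a buffer that is not uptodate is released instead of leaked.
 */
static int sketch_get_link_rcu(int lookup_err, struct fake_bh *bh,
			       const char **target)
{
	if (lookup_err || !bh)
		return -ECHILD;          /* don't report errors ref-walk might not see */
	if (!bh->uptodate) {
		fake_brelse(bh);         /* drop the reference: no leak */
		return -ECHILD;
	}
	*target = bh->data;              /* caller releases bh when done */
	return 0;
}
```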
    • cifs_get_link(): bail out in unsafe case · 0511fdb4
      Al Viro authored
      ->d_revalidate() bails out there, anyway.  It's not enough
      to prevent getting into ->get_link() in RCU mode, but that
      could happen only in a very contrived setup.  Not worth
      trying to do anything fancy here unless ->d_revalidate()
      stops kicking out of RCU mode at least in some cases.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fuse: fix UAF in rcu pathwalks · 053fc4f7
      Al Viro authored
      ->permission(), ->get_link() and ->inode_get_acl() might dereference
      ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
      as well) when called from rcu pathwalk.
      
      Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
      and dropping ->user_ns rcu-delayed too.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • procfs: make freeing proc_fs_info rcu-delayed · e31f0a57
      Al Viro authored
      makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
      is still synchronous, but that's not a problem - it does
      rcu-delay everything that needs to be)
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() · 47458802
      Al Viro authored
      that keeps both around until struct inode is freed, making access
      to them safe from rcu-pathwalk
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nfs: fix UAF on pathwalk running into umount · c1b967d0
      Al Viro authored
      NFS ->d_revalidate(), ->permission() and ->get_link() need to access
      some parts of nfs_server when called in RCU mode:
      	server->flags
      	server->caps
      	*(server->io_stats)
      and, worst of all, call
      	server->nfs_client->rpc_ops->have_delegation
      (the last one - as NFS_PROTO(inode)->have_delegation()).  We really
      don't want to RCU-delay the entire nfs_free_server() (it would have
      to be done with schedule_work() from RCU callback, since it can't
      be made to run from interrupt context), but actual freeing of
      nfs_server and ->io_stats can be done via call_rcu() just fine.
      nfs_client part is handled simply by making nfs_free_client() use
      kfree_rcu().
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
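      The split described above (blocking teardown now, memory freed only after a grace period) can be modelled in userspace. This is a toy, not RCU: fake_call_rcu() only queues callbacks, and the grace period is simulated by an explicit fake_grace_period_end() call instead of tracking readers. struct fake_server and its fields are hypothetical. In the kernel, RCU callbacks must not block (they may run in softirq context), which is why only the non-blocking freeing is deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* container_of() equivalent: recover the enclosing struct from a member. */
#define fake_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy model of call_rcu(): callbacks wait on a list until the simulated
 * grace period ends. Real RCU tracks readers; this does not. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *head);
};

static struct fake_rcu_head *pending;

static void fake_call_rcu(struct fake_rcu_head *head,
			  void (*func)(struct fake_rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for "every RCU reader that might hold a pointer is done". */
static void fake_grace_period_end(void)
{
	while (pending) {
		struct fake_rcu_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

/* Hypothetical server object. */
struct fake_server {
	unsigned long flags;      /* read by lockless RCU-mode walkers */
	bool connections_up;      /* torn down synchronously */
	struct fake_rcu_head rcu;
};

static int servers_freed;

/* Deferred part: only frees memory, so it never needs to block. */
static void fake_server_free_rcu(struct fake_rcu_head *head)
{
	free(fake_container_of(head, struct fake_server, rcu));
	servers_freed++;
}

/* Blocking teardown runs now; the memory outlives any RCU-mode reader. */
static void fake_free_server(struct fake_server *server)
{
	server->connections_up = false;   /* stands in for the heavy cleanup */
	fake_call_rcu(&server->rcu, fake_server_free_rcu);
}
```

      Between fake_free_server() and the end of the grace period, a late lockless reader can still read ->flags safely; only afterwards is the memory released.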
    • nfs: make nfs_set_verifier() safe for use in RCU pathwalk · 10a973fc
      Al Viro authored
      nfs_set_verifier() relies upon dentry being pinned; if that's
      the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
      that ->d_parent points to a positive dentry.  For something
      we'd run into in RCU mode that is *not* true - dentry might've
      been through dentry_kill() just as we grabbed ->d_lock, with
      its parent going through the same just as we get into
      nfs_set_verifier_locked().  It might get to detaching inode
      (and zeroing ->d_inode) before nfs_set_verifier_locked() gets
      to fetching that; we get an oops as the result.
      
      That can happen in nfs{,4} ->d_revalidate(); the call chain in
      question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
      nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
      We have checked that the parent had been positive, but that's
      done before we get to nfs_set_verifier() and it's possible for
      memory pressure to pick our dentry as eviction candidate by that
      time.  If that happens, back-to-back attempts to kill dentry and
      its parent are quite normal.  Sure, in case of eviction we'll
      fail the ->d_seq check in the caller, but we need to survive
      until we return there...
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
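      A minimal sketch of the defensive pattern, assuming hypothetical fake_inode/fake_dentry types and using a C11 atomic pointer where the kernel would use READ_ONCE()-style accessors such as d_inode_rcu(): fetch the parent's inode once and treat NULL as "give up", instead of assuming the parent is still positive. It illustrates the idea and is not the text of the nfs patch; the change-attribute comparison is a placeholder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fake_inode {
	unsigned long change_attr;
};

/* Hypothetical dentry: d_inode may be cleared concurrently (as when the
 * dentry is killed), so it is an atomic pointer in this model. */
struct fake_dentry {
	_Atomic(struct fake_inode *) d_inode;
	struct fake_dentry *d_parent;
	unsigned long d_time;   /* the cached verifier */
};

/*
 * Store 'verf' only if the parent is still positive. The parent's inode is
 * loaded exactly once and checked, never assumed non-NULL.
 * Returns 1 if the verifier was stored, 0 otherwise.
 */
static int sketch_set_verifier_locked(struct fake_dentry *dentry,
				      unsigned long verf)
{
	struct fake_inode *dir =
		atomic_load_explicit(&dentry->d_parent->d_inode,
				     memory_order_relaxed);

	if (!dir)
		return 0;             /* parent went negative under us */
	if (dir->change_attr != verf)
		return 0;             /* placeholder staleness check */
	dentry->d_time = verf;
	return 1;
}
```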
    • afs: fix __afs_break_callback() / afs_drop_open_mmap() race · 275655d3
      Al Viro authored
      In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
      do queue_work(&vnode->cb_work).  In afs_drop_open_mmap() we decrement
      ->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.
      
      The trouble is, there's nothing to prevent __afs_break_callback() from
      seeing ->cb_nr_mmap before the decrement and doing queue_work() after both
      the decrement and flush_work().  If that happens, we might be in trouble -
      vnode might get freed before the queued work runs.
      
      __afs_break_callback() is always done under ->cb_lock, so let's make
      sure that ->cb_nr_mmap can change from non-zero to zero only while holding
      ->cb_lock (the spinlock component of it - it's a seqlock and we don't
      need to mess with the counter).
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
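      The fix boils down to the "decrement, taking the lock only for the final 1 -> 0 step" idiom that the kernel provides as atomic_dec_and_lock(). Below is a userspace analogue built on C11 atomics and a pthread mutex; struct fake_vnode, break_callback() and drop_open_mmap() are hypothetical simplifications of the afs code, and queueing work is reduced to bumping a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace analogue of atomic_dec_and_lock(): decrement *cnt; if that
 * takes it to zero, the final decrement is done with *lock held and the
 * function returns true with the lock still held.
 */
static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: not the last reference, decrement without the lock. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}
	/* Slow path: the 1 -> 0 transition only ever happens under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;                 /* hit zero: caller must unlock */
	pthread_mutex_unlock(lock);
	return false;
}

/* Hypothetical, heavily simplified vnode. */
struct fake_vnode {
	pthread_mutex_t cb_lock;   /* plays the spinlock half of ->cb_lock */
	atomic_int cb_nr_mmap;
	int queued;                /* times work was "queued" */
};

/* Callback-break side: checks the count and queues under the lock. */
static void break_callback(struct fake_vnode *v)
{
	pthread_mutex_lock(&v->cb_lock);
	if (atomic_load(&v->cb_nr_mmap))
		v->queued++;               /* queue_work() in the real code */
	pthread_mutex_unlock(&v->cb_lock);
}

static void drop_open_mmap(struct fake_vnode *v)
{
	if (!dec_and_lock(&v->cb_nr_mmap, &v->cb_lock))
		return;
	pthread_mutex_unlock(&v->cb_lock);
	/* flush_work() would go here: anything break_callback() queued was
	 * queued before the count reached zero, so the flush catches it. */
}
```

      Because the 1 -> 0 transition and the check-then-queue both happen under the same lock, a callback break either queues its work before the count reaches zero (and the later flush waits for it) or sees zero and queues nothing.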
    • hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info · af072cf6
      Al Viro authored
      ->d_hash() and ->d_compare() use those, so we need to delay freeing
      them.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper · a13d1a4d
      Al Viro authored
      That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
      a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
      that is in process of getting shut down.
      
      Besides, having nls and upcase table cleanup moved from ->put_super() towards
      the place where sbi is freed makes for simpler failure exits.
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: free affs_sb_info with kfree_rcu() · 529f89a9
      Al Viro authored
      one of the flags in it is used by ->d_hash()/->d_compare()
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • rcu pathwalk: prevent bogus hard errors from may_lookup() · cdb67fde
      Al Viro authored
      If lazy call of ->permission() returns a hard error, check that
      try_to_unlazy() succeeds before returning it.  That both makes
      life easier for ->permission() instances and closes the race
      in ENOTDIR handling - it is possible that positive d_can_lookup()
      seen in link_path_walk() applies to the state *after* unlink() +
      mkdir(), while nd->inode matches the state prior to that.
      
      Normally seeing e.g. EACCES from permission check in rcu pathwalk
      means that with some timings non-rcu pathwalk would've run into
      the same; however, running into a non-executable regular file
      in the middle of a pathname would not get to permission check -
      it would fail with ENOTDIR instead.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
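      The decision logic above can be sketched in userspace C. struct fake_nd, the function-pointer stand-ins for ->permission() and try_to_unlazy(), the test doubles, and the LOOKUP_RCU value are all hypothetical; only the -ECHILD convention and the order of checks follow the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define LOOKUP_RCU 0x1u  /* stand-in; value arbitrary */

/* Hypothetical walk state: function pointers stand in for the permission
 * check and for try_to_unlazy() (leaving RCU mode, which may fail). */
struct fake_nd {
	unsigned int flags;
	int (*permission)(struct fake_nd *nd);
	bool (*try_to_unlazy)(struct fake_nd *nd);
};

static int sketch_may_lookup(struct fake_nd *nd)
{
	int err = nd->permission(nd);

	if (!err)
		return 0;
	if (!(nd->flags & LOOKUP_RCU))
		return err;                 /* ref-walk: the error is final */
	/* Before trusting a lazy error, make sure our view was consistent. */
	if (!nd->try_to_unlazy(nd))
		return -ECHILD;             /* raced: redo the walk non-lazily */
	if (err != -ECHILD)
		return err;                 /* hard error, now known to be real */
	return nd->permission(nd);     /* -ECHILD: retry in ref-walk mode */
}

/* Test doubles for illustration only. */
static int deny(struct fake_nd *nd) { (void)nd; return -EACCES; }
static bool unlazy_ok(struct fake_nd *nd) { nd->flags &= ~LOOKUP_RCU; return true; }
static bool unlazy_fails(struct fake_nd *nd) { (void)nd; return false; }
```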
    • fs/super.c: don't drop ->s_user_ns until we free struct super_block itself · 583340de
      Al Viro authored
      Avoids fun races in RCU pathwalk...  Same goes for freeing LSM shite
      hanging off super_block's arse.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Merge tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ab0a97cf
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix a crash when hot-adding a PCI device to an LPAR, a regression
         from recent changes
      
       - Fix nested KVM level-2 guest reboot failure due to empty
         'arch_compat'
      
      Thanks to Amit Machhiwal, Aneesh Kumar K.V (IBM), Brian King, Gaurav
      Batra, and Vaibhav Jain.
      
      * tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
        powerpc/pseries/iommu: DLPAR add doesn't completely initialize pci_controller
  2. 24 Feb, 2024 7 commits
    • Merge tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 91403d50
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Intel VT-d fixes for nested domain handling:
      
            - Cache invalidation for changes in a parent domain
      
            - Dirty tracking setting for parent and nested domains
      
            - Fix a constant-out-of-range warning
      
       - ARM SMMU fixes:
      
            - Fix CD allocation from atomic context when using SVA with SMMUv3
      
            - Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
              breaks the boot for Qualcomm MSM8996 devices
      
       - Restore SVA handle sharing in core code as it turned out there are
         still drivers relying on it
      
      * tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/sva: Restore SVA handle sharing
        iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock
        iommu/vt-d: Fix constant-out-of-range warning
        iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking
        iommu/vt-d: Add missing dirty tracking set for parent domain
        iommu/vt-d: Wrap the dirty tracking loop to be a helper
        iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
        iommu/vt-d: Add missing device iotlb flush for parent domain
        iommu/vt-d: Update iotlb in nested domain attach
        iommu/vt-d: Add missing iotlb flush for parent domain
        iommu/vt-d: Add __iommu_flush_iotlb_psi()
        iommu/vt-d: Track nested domains in parent
        Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
    • Merge tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · ac389bc0
      Linus Torvalds authored
      Pull cxl fixes from Dan Williams:
       "A collection of significant fixes for the CXL subsystem.
      
        The largest change in this set, that bordered on "new development", is
        the fix for the fact that the location of the new qos_class attribute
        did not match the Documentation. The fix ends up deleting more code
        than it added, and it has a new unit test to backstop basic errors in
        this interface going forward. So the "red-diff" and unit test saved
        the "rip it out and try again" response.
      
        In contrast, the new notification path for firmware reported CXL
        errors (CXL CPER notifications) has a locking context bug that can not
        be fixed with a red-diff. Given where the release cycle stands, it is
        not comfortable to squeeze in that fix in these waning days. So, that
        receives the "back it out and try again later" treatment.
      
        There is a regression fix in the code that establishes memory NUMA
        nodes for platform CXL regions. That has an ack from x86 folks. There
        are a couple more fixups for Linux to understand (reassemble) CXL
        regions instantiated by platform firmware. The policy around platforms
        that do not match host-physical-address with system-physical-address
        (i.e. systems that have an address translation mechanism between the
        address range reported in the ACPI CEDT.CFMWS and endpoint decoders)
        has been softened to abort driver load rather than tear down the memory
        range (which can cause system hangs). Lastly, there is a robustness /
        regression fix for cases where the driver would previously continue in
        the face of error, and a fixup for PCI error notification handling.
      
        Summary:
      
         - Fix NUMA initialization from ACPI CEDT.CFMWS
      
         - Fix region assembly failures due to async init order
      
         - Fix / simplify export of qos_class information
      
         - Fix cxl_acpi initialization vs single-window-init failures
      
         - Fix handling of repeated 'pci_channel_io_frozen' notifications
      
         - Work around platforms that violate host-physical-address ==
           system-physical-address assumptions
      
         - Defer CXL CPER notification handling to v6.9"
      
      * tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/acpi: Fix load failures due to single window creation failure
        acpi/ghes: Remove CXL CPER notifications
        cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
        cxl/test: Add support for qos_class checking
        cxl: Fix sysfs export of qos_class for memdev
        cxl: Remove unnecessary type cast in cxl_qos_class_verify()
        cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
        cxl/region: Allow out of order assembly of autodiscovered regions
        cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
        x86/numa: Fix the sort compare func used in numa_fill_memblks()
        x86/numa: Fix the address overlap check in numa_fill_memblks()
        cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
    • Merge tag 'for-6.8/dm-fix-3' of... · f2e367d6
      Linus Torvalds authored
      Merge tag 'for-6.8/dm-fix-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fix from Mike Snitzer:
      
       - Fix DM integrity and verity targets to not use excessive stack when
         they recheck in the error path.
      
      * tag 'for-6.8/dm-fix-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm-integrity, dm-verity: reduce stack usage for recheck
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 6d20acbf
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Six fixes: the four driver ones are pretty trivial.
      
        The larger two core changes are to try to fix various USB attached
        devices which have somewhat eccentric ways of handling the VPD and
        other mode pages which necessitate multiple revalidates (that were
        removed in the interests of efficiency) and updating the heuristic for
        supported VPD pages"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: jazz_esp: Only build if SCSI core is builtin
        scsi: smartpqi: Fix disable_managed_interrupts
        scsi: ufs: Uninitialized variable in ufshcd_devfreq_target()
        scsi: target: pscsi: Fix bio_put() for error case
        scsi: core: Consult supported VPD page list prior to fetching page
        scsi: sd: usb_storage: uas: Access media prior to querying device properties
    • Merge tag 'i2c-for-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · fef85269
      Linus Torvalds authored
      Pull i2c fix from Wolfram Sang:
       "A bugfix for host drivers"
      
      * tag 'i2c-for-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: imx: when being a target, mark the last read as processed
    • Merge tag 'loongarch-fixes-6.8-3' of... · c6a597fc
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix two cpu-hotplug issues, fix the init sequence about FDT system,
        fix the coding style of dts, and fix the wrong CPUCFG ID handling of
        KVM"
      
      * tag 'loongarch-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: KVM: Streamline kvm_check_cpucfg() and improve comments
        LoongArch: KVM: Rename _kvm_get_cpucfg() to _kvm_get_cpucfg_mask()
        LoongArch: KVM: Fix input validation of _kvm_get_cpucfg() & kvm_check_cpucfg()
        LoongArch: dts: Minor whitespace cleanup
        LoongArch: Call early_init_fdt_scan_reserved_mem() earlier
        LoongArch: Update cpu_sibling_map when disabling nonboot CPUs
        LoongArch: Disable IRQ before init_fn() for nonboot CPUs
    • dm-integrity, dm-verity: reduce stack usage for recheck · 66ad2fbc
      Arnd Bergmann authored
      The newly added integrity_recheck() function has another large stack
      allocation, just like its caller integrity_metadata(). When it gets
      inlined, the combination of the two exceeds the warning limit for 32-bit
      architectures and possibly risks an overflow when this is called from
      a deep call chain through a file system:
      
      drivers/md/dm-integrity.c:1767:13: error: stack frame size (1048) exceeds limit (1024) in 'integrity_metadata' [-Werror,-Wframe-larger-than]
       1767 | static void integrity_metadata(struct work_struct *w)
      
      Since the caller at this point is done using its checksum buffer,
      just reuse the same buffer in the new function to avoid the double
      allocation.
      
      [Mikulas: add "noinline" to integrity_recheck and verity_recheck.
      These functions are only called on error, so they shouldn't bloat the
      stack frame or code size of the caller.]
      
      Fixes: c88f5e55 ("dm-integrity: recheck the integrity tag after a failure")
      Fixes: 9177f3c0 ("dm-verity: recheck the hash after a failure")
      Cc: stable@vger.kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
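      The buffer-reuse idea can be sketched in userspace C. Everything here is hypothetical (TAG_SIZE, compute_tag(), recheck(), verify()), and the real recheck rereads data from the device, whereas this one simply recomputes. __attribute__((noinline)) is the GCC/Clang spelling of the kernel's noinline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_SIZE 64   /* hypothetical checksum buffer size */

/* Hypothetical checksum: sums the bytes into the start of 'out'. */
static void compute_tag(const unsigned char *data, size_t len, unsigned char *out)
{
	unsigned int sum = 0;

	memset(out, 0, TAG_SIZE);
	for (size_t i = 0; i < len; i++)
		sum += data[i];
	memcpy(out, &sum, sizeof(sum));
}

/*
 * Rare error path: instead of declaring its own TAG_SIZE array (which,
 * once inlined, would add to the caller's frame), it borrows the caller's
 * buffer. noinline keeps its code and locals out of the caller's frame.
 */
static __attribute__((noinline)) int recheck(const unsigned char *data, size_t len,
					     const unsigned char *expected,
					     unsigned char *scratch)
{
	compute_tag(data, len, scratch);
	return memcmp(scratch, expected, TAG_SIZE) == 0 ? 0 : -1;
}

static int verify(const unsigned char *data, size_t len, const unsigned char *expected)
{
	unsigned char checksum[TAG_SIZE];   /* the one large stack buffer */

	compute_tag(data, len, checksum);
	if (memcmp(checksum, expected, TAG_SIZE) == 0)
		return 0;
	/* 'checksum' is no longer needed, so the recheck may reuse it. */
	return recheck(data, len, expected, checksum);
}
```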
  3. 23 Feb, 2024 17 commits