1. 27 May, 2021 2 commits
    • xfs: bunmapi has unnecessary AG lock ordering issues · 0fe0bbe0
      Dave Chinner authored
      Large directory block size operations are assert failing because
      xfs_bunmapi() is not completely removing fragmented directory blocks,
      like so:
      
      XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
      ....
      Call Trace:
       xfs_dir2_shrink_inode+0x1a8/0x210
       xfs_dir2_block_to_sf+0x2ae/0x410
       xfs_dir2_block_removename+0x21a/0x280
       xfs_dir_removename+0x195/0x1d0
       xfs_rename+0xb79/0xc50
       ? avc_has_perm+0x8d/0x1a0
       ? avc_has_perm_noaudit+0x9a/0x120
       xfs_vn_rename+0xdb/0x150
       vfs_rename+0x719/0xb50
       ? __lookup_hash+0x6a/0xa0
       do_renameat2+0x413/0x5e0
       __x64_sys_rename+0x45/0x50
       do_syscall_64+0x3a/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      We are aborting the bunmapi() pass because of this specific chunk of
      code:
      
                      /*
                       * Make sure we don't touch multiple AGF headers out of order
                       * in a single transaction, as that could cause AB-BA deadlocks.
                       */
                      if (!wasdel && !isrt) {
                              agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
                              if (prev_agno != NULLAGNUMBER && prev_agno > agno)
                                      break;
                              prev_agno = agno;
                      }
      
      This is designed to prevent deadlocks in AGF locking when freeing
      multiple extents by ensuring that we only ever lock in increasing
      AG number order. Unfortunately, this also violates the "bunmapi will
      always succeed" semantic that some high level callers depend on,
      such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
      xfs_inactive_symlink_rmt().
      
      This AG lock ordering was introduced back in 2017 to fix deadlocks
      triggered by generic/299 as reported here:
      
      https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/
      
      This code is old enough that it predates our deferral of all AG-based
      extent freeing from within xfs_bunmapi(). That is, we never actually
      lock AGs in xfs_bunmapi() any more: every non-RT extent free is added
      to the defer ops list, as is all BMBT block freeing. And RT extents
      are not AG based, so there are no lock ordering issues associated
      with them.
      
      Hence this AGF lock ordering code is both broken and dead. Let's
      just remove it so that the large directory block code works reliably
      again.
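
      A small user-space model makes the failure mode concrete. This is not
      kernel code: `struct toy_extent`, `toy_unmap()` and `toy_unmap_done()`
      are hypothetical names, and each extent is reduced to an AG number and
      a length. The model assumes the range is walked from its last extent
      backwards and applies the same `prev_agno > agno` test as the removed
      chunk, so a fragmented block whose extents are not in AG order is only
      partly unmapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NULLAGNUMBER ((unsigned int)-1)

/* A mapped extent, in file-offset order, tagged with the AG holding it. */
struct toy_extent {
	unsigned int agno;
	unsigned int len;	/* length in filesystem blocks */
};

/*
 * Walk the extents from the end of the range backwards and "unmap" each
 * one. With order_check set, mimic the removed code: stop as soon as an
 * extent sits in a lower AG than the one processed just before it.
 * Returns the number of blocks unmapped.
 */
static unsigned int toy_unmap(const struct toy_extent *ext, size_t n,
			      bool order_check)
{
	unsigned int prev_agno = TOY_NULLAGNUMBER;
	unsigned int unmapped = 0;

	for (size_t i = n; i-- > 0; ) {
		if (order_check) {
			unsigned int agno = ext[i].agno;

			if (prev_agno != TOY_NULLAGNUMBER && prev_agno > agno)
				break;	/* caller is left with a partial unmap */
			prev_agno = agno;
		}
		unmapped += ext[i].len;
	}
	return unmapped;
}

/* True if the whole range was unmapped, i.e. the caller's "done" holds. */
static bool toy_unmap_done(const struct toy_extent *ext, size_t n,
			   bool order_check)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += ext[i].len;
	assert(total > 0);
	return toy_unmap(ext, n, order_check) == total;
}
```

      With extents in AGs 0, 3 and 1 (in file order), the ordering check
      stops after the two tail extents, mirroring the partial unmap that
      trips the `done` assertion in xfs_dir2_shrink_inode(); without the
      check, the whole range is removed.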
      
      Tested against xfs/538 and generic/299, the latter being the original
      test that exposed the deadlocks that this code fixed.
      
      Fixes: 5b094d6d ("xfs: fix multi-AG deadlock in xfs_bunmapi")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: btree format inode forks can have zero extents · 991c2c59
      Dave Chinner authored
      xfs/538 is assert failing with this trace when testing with
      directory block sizes of 64kB:
      
      XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608
      ....
      Call Trace:
       xfs_bmap_btree_to_extents+0x2a9/0x470
       ? kmem_cache_alloc+0xe7/0x220
       __xfs_bunmapi+0x4ca/0xdf0
       xfs_bunmapi+0x1a/0x30
       xfs_dir2_shrink_inode+0x71/0x210
       xfs_dir2_block_to_sf+0x2ae/0x410
       xfs_dir2_block_removename+0x21a/0x280
       xfs_dir_removename+0x195/0x1d0
       xfs_remove+0x244/0x460
       xfs_vn_unlink+0x53/0xa0
       ? selinux_inode_unlink+0x13/0x20
       vfs_unlink+0x117/0x220
       do_unlinkat+0x1a2/0x2d0
       __x64_sys_unlink+0x42/0x60
       do_syscall_64+0x3a/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      This is a check to ensure that the extents have been read into
      memory before we do an ifork btree manipulation. This assert is
      bogus in the above case.
      
      We have a fragmented directory block that has more extents in it
      than can fit in extent format, so the inode data fork is in btree
      format. xfs_dir2_shrink_inode() asks to remove all remaining 16
      filesystem blocks from the inode so it can convert to short form,
      and __xfs_bunmapi() removes all the extents. We now have a data fork
      in btree format but have zero extents in the fork. This incorrectly
      trips the xfs_need_iread_extents() assert because it assumes that an
      empty extent btree means the extent tree has not been read into
      memory yet. This is clearly not the case with xfs_bunmapi(), as it
      has an explicit call to xfs_iread_extents() in it to pull the
      extents into memory before it starts unmapping.
      
      Also, the assert directly after this bogus one is:
      
      	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);
      
      That assert already covers the context in which it is legal to call
      xfs_bmap_btree_to_extents(). Hence we should just remove the bogus
      assert, as it is clearly wrong and causes a regression.
      
      This returns the test behaviour to the pre-existing assert failure in
      xfs_dir2_shrink_inode(), which indicates that xfs_bunmapi() has failed
      to remove all the extents in the range it was asked to unmap.
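
      The heuristic at fault can be modelled in a few lines. This is a
      simplified sketch, not the real xfs_need_iread_extents():
      `struct toy_ifork`, its `nextents` counter and the `extents_loaded`
      flag are hypothetical, the flag existing only so the model can state
      the ground truth. It shows that "btree format with zero in-memory
      extents" holds both before the extents are read in and after
      xfs_bunmapi() has read them and removed them all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the fork formats. */
enum toy_fork_format { TOY_FMT_EXTENTS, TOY_FMT_BTREE };

struct toy_ifork {
	enum toy_fork_format format;
	unsigned int nextents;	/* extent records currently in memory */
	bool extents_loaded;	/* ground truth, tracked only by this model */
};

/*
 * The assumption described above: a btree-format fork with no in-memory
 * extents is taken to mean "extents not read in yet".
 */
static bool toy_need_iread_extents(const struct toy_ifork *ifp)
{
	return ifp->format == TOY_FMT_BTREE && ifp->nextents == 0;
}

/* Read the extents in, then unmap every one of them. */
static void toy_read_then_unmap_all(struct toy_ifork *ifp, unsigned int on_disk)
{
	assert(ifp->format == TOY_FMT_BTREE);

	/* The explicit read-in step: extents are now in memory. */
	ifp->nextents = on_disk;
	ifp->extents_loaded = true;

	/* Unmap everything; the fork stays in btree format for now. */
	while (ifp->nextents > 0)
		ifp->nextents--;
}
```

      After toy_read_then_unmap_all() the extents were demonstrably loaded,
      yet the heuristic still reports that they need reading, which is the
      false positive that made the removed assert fire.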
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  2. 26 May, 2021 1 commit
  3. 25 May, 2021 3 commits
    • xfs: validate extsz hints against rt extent size when rtinherit is set · 603f000b
      Darrick J. Wong authored
      The RTINHERIT bit can be set on a directory so that newly created
      regular files will have the REALTIME bit set to store their data on the
      realtime volume.  If an extent size hint (and EXTSZINHERIT) are set on
      the directory, the hint will also be copied into the new file.
      
      As pointed out in previous patches, for realtime files we require the
      extent size hint be an integer multiple of the realtime extent, but we
      don't perform the same validation on a directory with both RTINHERIT and
      EXTSZINHERIT set, even though the only use-case of that combination is
      to propagate extent size hints into new realtime files.  This leads to
      inode corruption errors when the bad values are propagated.
      
      Because there may be existing filesystems with such a configuration, we
      cannot simply amend the inode verifier to trip on these directories and
      call it a day because that will cause previously "working" filesystems
      to start throwing errors abruptly.  Note that it's valid to have
      directories with rtinherit set even if there is no realtime volume, in
      which case the problem does not manifest because rtinherit is ignored if
      there's no realtime device; and it's possible that someone set the flag,
      crashed, repaired the filesystem (which clears the hint on the realtime
      file) and continued.
      
      Therefore, mitigate this issue in several ways: First, if we try to
      write out an inode with both rtinherit/extszinherit set and an unaligned
      extent size hint, turn off the hint to correct the error.  Second, if
      someone tries to misconfigure a directory via the fssetxattr ioctl, fail
      the ioctl.  Third, reverify both extent size hint values when we
      propagate heritable inode attributes from parent to child, to prevent
      misconfigurations from spreading.
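
      The three mitigations can be sketched together. All names and flag
      values below (`TOY_RTINHERIT`, `TOY_EXTSZINHERIT`,
      `struct toy_dir_inode` and the helpers) are hypothetical, hints are
      counted in filesystem blocks, and clearing the inherit flag along with
      the hint on write-out is an assumption of this model. The commit
      rechecks both extent size hint values on inheritance; the model tracks
      only one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real XFS flags have other names and values. */
#define TOY_RTINHERIT		(1u << 0)
#define TOY_EXTSZINHERIT	(1u << 1)

struct toy_dir_inode {
	uint32_t flags;
	uint32_t extsize;	/* extent size hint, in fs blocks */
};

/* A hint is usable for realtime files only if it spans whole rt extents. */
static bool toy_hint_rt_aligned(uint32_t extsize, uint32_t rextsize)
{
	assert(rextsize > 0);
	return extsize % rextsize == 0;
}

/* Both flags set: the hint will be copied into new realtime files. */
static bool toy_propagates_to_rt(const struct toy_dir_inode *ip)
{
	const uint32_t both = TOY_RTINHERIT | TOY_EXTSZINHERIT;

	return (ip->flags & both) == both;
}

/* Mitigation 1: on write-out, turn off an unaligned hint. */
static void toy_fix_on_writeout(struct toy_dir_inode *ip, uint32_t rextsize)
{
	if (toy_propagates_to_rt(ip) &&
	    !toy_hint_rt_aligned(ip->extsize, rextsize)) {
		ip->flags &= ~TOY_EXTSZINHERIT;
		ip->extsize = 0;
	}
}

/* Mitigation 2: the setattr path refuses the misconfiguration. */
static int toy_setattr_check(const struct toy_dir_inode *proposed,
			     uint32_t rextsize)
{
	if (toy_propagates_to_rt(proposed) &&
	    !toy_hint_rt_aligned(proposed->extsize, rextsize))
		return -EINVAL;
	return 0;
}

/* Mitigation 3: revalidate before a child inherits the hint. */
static uint32_t toy_inherit_extsize(const struct toy_dir_inode *parent,
				    uint32_t rextsize)
{
	if (!(parent->flags & TOY_EXTSZINHERIT))
		return 0;
	if ((parent->flags & TOY_RTINHERIT) &&
	    !toy_hint_rt_aligned(parent->extsize, rextsize))
		return 0;	/* do not spread a bad hint */
	return parent->extsize;
}
```

      With a 4-block rt extent, a 6-block hint is rejected by the setattr
      check, not inherited, and dropped on write-out, while an 8-block hint
      passes all three paths unchanged.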
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • xfs: standardize extent size hint validation · 6b69e485
      Darrick J. Wong authored
      While chasing a bug involving invalid extent size hints being propagated
      into newly created realtime files, I noticed that the xfs_ioctl_setattr
      checks for the extent size hints weren't the same as the ones now
      encoded in libxfs and used for validation in repair and mkfs.
      
      Because the checks in libxfs are more stringent than the ones in the
      ioctl, it's possible for a live system to set inode flags that
      immediately result in corruption warnings.  Specifically, it's possible
      to set an extent size hint on an rtinherit directory without checking if
      the hint is aligned to the realtime extent size, which makes no sense
      since that combination is used only to seed new realtime files.
      
      Replace the open-coded and inadequate checks with the libxfs verifier
      versions and update the code comments a bit.
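
      The gap between the two sets of checks can be reproduced with a toy
      model. Both functions are hypothetical simplifications, and the hint
      is assumed here to arrive in bytes: the "old" ioctl check only
      requires a multiple of the block size, while the "verifier" check also
      requires a whole number of realtime extents when rtinherit is set.
      The fix amounts to calling the stricter function from the ioctl path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lenient check, standing in for the old open-coded one. */
static bool toy_ioctl_check_old(uint32_t hint_bytes, uint32_t blocksize)
{
	assert(blocksize > 0);
	return hint_bytes % blocksize == 0;
}

/* Hypothetical strict check, standing in for the shared verifier. */
static bool toy_verifier_check(uint32_t hint_bytes, uint32_t blocksize,
			       uint32_t rextsize_blocks, bool rtinherit)
{
	assert(blocksize > 0 && rextsize_blocks > 0);
	if (hint_bytes % blocksize != 0)
		return false;
	/* An rtinherit directory only seeds realtime files: need rt alignment. */
	if (rtinherit && (hint_bytes / blocksize) % rextsize_blocks != 0)
		return false;
	return true;
}

/* After the change, the ioctl defers to the verifier's logic. */
static bool toy_ioctl_check_new(uint32_t hint_bytes, uint32_t blocksize,
				uint32_t rextsize_blocks, bool rtinherit)
{
	return toy_verifier_check(hint_bytes, blocksize, rextsize_blocks,
				  rtinherit);
}
```

      With 4 KiB blocks and a 4-block rt extent, a 24 KiB hint (6 blocks)
      passes the old ioctl check but fails the verifier, which is how a live
      system could set flags that immediately looked corrupt.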
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: check free AG space when making per-AG reservations · 0f934251
      Darrick J. Wong authored
      The new online shrink code exposed a gap in the per-AG reservation
      code, which is that we only return ENOSPC to callers if the entire fs
      doesn't have enough free blocks.  Except for debugging mode, the
      reservation init code doesn't ever check that there's enough free space
      in that AG to cover the reservation.
      
      Not having enough space is not considered an immediate fatal error that
      requires filesystem offlining because (a) it shouldn't be possible to
      wind up in that state through normal file operations and (b) even if
      one did, freeing data blocks would recover the situation.
      
      However, online shrink now needs to know if shrinking would not leave
      enough space so that it can abort the shrink operation.  Hence we need
      to promote this assertion into an actual error return.
      
      Observed by running xfs/168 with a 1k block size, though in theory this
      could happen with any configuration.
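
      Conceptually, the change adds a per-AG test next to the fs-wide one.
      In the sketch below, `struct toy_fs`, `struct toy_ag` and both init
      functions are hypothetical, and the several per-AG metadata needs that
      make up a real reservation are collapsed into a single `ask` value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical free-space counters, in filesystem blocks. */
struct toy_fs {
	uint64_t fs_free;	/* free blocks across the whole filesystem */
};

struct toy_ag {
	uint64_t ag_free;	/* free blocks in this AG */
};

/* Old behaviour: only a shortage across the whole fs returns ENOSPC. */
static int toy_ag_resv_init_old(const struct toy_fs *fs,
				const struct toy_ag *ag, uint64_t ask)
{
	assert(fs && ag);
	if (fs->fs_free < ask)
		return -ENOSPC;
	/* Per-AG space was only checked by a debug-mode assertion. */
	return 0;
}

/* New behaviour: this AG alone must also be able to cover the request. */
static int toy_ag_resv_init_new(const struct toy_fs *fs,
				const struct toy_ag *ag, uint64_t ask)
{
	assert(fs && ag);
	if (fs->fs_free < ask)
		return -ENOSPC;
	if (ag->ag_free < ask)
		return -ENOSPC;
	return 0;
}
```

      An AG with 50 free blocks in a filesystem with 10000 free blocks
      satisfies the old check for a 100-block reservation but fails the new
      one, which is the signal online shrink needs in order to abort.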
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
  4. 20 May, 2021 3 commits
  5. 17 May, 2021 1 commit
    • xfs: adjust rt allocation minlen when extszhint > rtextsize · 9d5e8492
      Darrick J. Wong authored
      xfs_bmap_rtalloc doesn't handle realtime extent files with extent size
      hints larger than the rt volume's extent size properly, because
      xfs_bmap_extsize_align can adjust the offset/length parameters to try to
      fit the extent size hint.
      
      Under these conditions, minlen has to be large enough so that any
      allocation returned by xfs_rtallocate_extent will be large enough to
      cover at least one of the blocks that the caller asked for.  If the
      allocation is too short, bmapi_write will return no mapping for the
      requested range, which causes ENOSPC errors in other parts of the
      filesystem.
      
      Therefore, adjust minlen upwards to fix this.  This can be found by
      running generic/263 (g/127 or g/522) with a realtime extent size hint
      that's larger than the rt volume extent size.
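
      The minlen adjustment is easiest to see with numbers. In this
      hypothetical model (all names invented, lengths in filesystem blocks),
      alignment to the extent size hint moves the start of the allocation
      from `orig_off` back to `aligned_off`. An allocation of length L
      starting at `aligned_off` reaches `orig_off` only if
      L > orig_off - aligned_off, so the sketch raises minlen by that shift
      and rounds it up to whole rt extents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of align. */
static uint64_t toy_roundup(uint64_t x, uint64_t align)
{
	assert(align > 0);
	return (x + align - 1) / align * align;
}

/*
 * Grow minlen so that any allocation of at least minlen blocks starting
 * at aligned_off also covers the first block the caller asked for.
 */
static uint64_t toy_adjust_minlen(uint64_t minlen, uint64_t orig_off,
				  uint64_t aligned_off, uint64_t rextsize)
{
	assert(minlen > 0 && aligned_off <= orig_off);
	minlen += orig_off - aligned_off;
	return toy_roundup(minlen, rextsize);
}

/* Does [start, start + len) contain block off? */
static bool toy_covers(uint64_t start, uint64_t len, uint64_t off)
{
	return off >= start && off < start + len;
}
```

      With a 16-block hint, a 4-block rt extent and a request for block 10,
      alignment moves the start to block 0. An unadjusted 4-block minimum
      could map only blocks 0 to 3 and miss block 10; the adjusted minimum
      of 16 blocks covers it.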
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
  6. 16 May, 2021 7 commits
    • Linux 5.13-rc2 · d07f6ca9
      Linus Torvalds authored
    • Merge tag 'driver-core-5.13-rc2' of... · 28183dbf
      Linus Torvalds authored
      Merge tag 'driver-core-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fixes from Greg KH:
       "Here are two driver fixes for driver core changes that happened in
        5.13-rc1.
      
        The clk driver fix resolves a many-reported issue with booting some
        devices, and the USB typec fix resolves the reported problem of USB
        systems on some embedded boards.
      
        Both of these have been in linux-next this week with no reported
        issues"
      
      * tag 'driver-core-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        clk: Skip clk provider registration when np is NULL
        usb: typec: tcpm: Don't block probing of consumers of "connector" nodes
    • Merge tag 'staging-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 6942d81a
      Linus Torvalds authored
      Pull staging and IIO driver fixes from Greg KH:
       "Here are some small IIO driver fixes and one Staging driver fix for
        5.13-rc2.
      
        Nothing major, just some resolutions for reported problems:
      
         - gcc-11 bogus warning fix for rtl8723bs
      
         - iio driver tiny fixes
      
        All of these have been in linux-next for many days with no reported
        issues"
      
      * tag 'staging-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: tsl2583: Fix division by a zero lux_val
        iio: core: return ENODEV if ioctl is unknown
        iio: core: fix ioctl handlers removal
        iio: gyro: mpu3050: Fix reported temperature value
        iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER
        iio: proximity: pulsedlight: Fix rumtime PM imbalance on error
        iio: light: gp2ap002: Fix rumtime PM imbalance on error
        staging: rtl8723bs: avoid bogus gcc warning
    • Merge tag 'usb-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 4a668429
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 5.13-rc2. They consist of a number
        of resolutions for reported issues:
      
         - typec fixes for found problems
      
         - xhci fixes and quirk additions
      
         - dwc3 driver fixes
      
         - minor fixes found by Coverity
      
         - cdc-wdm fixes for reported problems
      
        All of these have been in linux-next for a few days with no reported
        issues"
      
      * tag 'usb-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
        usb: core: hub: fix race condition about TRSMRCY of resume
        usb: typec: tcpm: Fix SINK_DISCOVERY current limit for Rp-default
        xhci: Add reset resume quirk for AMD xhci controller.
        usb: xhci: Increase timeout for HC halt
        xhci: Do not use GFP_KERNEL in (potentially) atomic context
        xhci: Fix giving back cancelled URBs even if halted endpoint can't reset
        xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
        usb: musb: Fix an error message
        usb: typec: tcpm: Fix wrong handling for Not_Supported in VDM AMS
        usb: typec: tcpm: Send DISCOVER_IDENTITY from dedicated work
        usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
        usb: fotg210-hcd: Fix an error message
        docs: usb: function: Modify path name
        usb: dwc3: omap: improve extcon initialization
        usb: typec: ucsi: Put fwnode in any case during ->probe()
        usb: typec: tcpm: Fix wrong handling in GET_SINK_CAP
        usb: dwc2: Remove obsolete MODULE_ constants from platform.c
        usb: dwc3: imx8mp: fix error return code in dwc3_imx8mp_probe()
        usb: dwc3: imx8mp: detect dwc3 core node via compatible string
        usb: dwc3: gadget: Return success always for kick transfer in ep queue
        ...
    • Merge tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8ce36481
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "Two fixes for timers:
      
         - Use the ALARM feature check in the alarmtimer core code instead of
           the old method of checking for the set_alarm() callback.
      
           Drivers can have that callback set but the feature bit cleared. If
           such an RTC device is selected, then alarms won't work.
      
         - Use a proper define to let the preprocessor check whether Hyper-V
           VDSO clocksource should be active.
      
           The code used a constant in an enum with #ifdef, which always
           evaluates to false and disabled the clocksource for VDSO"
      
      * tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86
        alarmtimer: Check RTC features instead of ops
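
      The Hyper-V bug is a classic preprocessor pitfall: `#ifdef` only sees
      macros, and enumerators do not exist until the compiler proper runs.
      A minimal sketch with hypothetical names (not the actual vDSO code):

```c
#include <assert.h>

/* Hypothetical names; they only mimic the shape of the bug. */
enum toy_clockmode {
	TOY_CLOCKMODE_NONE,
	TOY_CLOCKMODE_HVCLOCK,
};

/* The compiler knows the enumerator... */
static_assert(TOY_CLOCKMODE_HVCLOCK == 1, "enumerator visible to compiler");

/* ...but the preprocessor does not, so this #ifdef is always false. */
static int toy_hvclock_enabled_buggy(void)
{
#ifdef TOY_CLOCKMODE_HVCLOCK
	return 1;
#else
	return 0;
#endif
}

/* The fix: give the preprocessor a real macro to test. */
#define TOY_HAVE_HVCLOCK_VDSO 1

static int toy_hvclock_enabled_fixed(void)
{
#ifdef TOY_HAVE_HVCLOCK_VDSO
	return 1;
#else
	return 0;
#endif
}
```

      The buggy variant compiles cleanly and silently takes the `#else`
      branch, which is why the clocksource was disabled without any build
      error.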
    • Merge tag 'for-linus-5.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · f44e58bb
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - two patches for error path fixes
      
       - a small series for fixing a regression with swiotlb with Xen on Arm
      
      * tag 'for-linus-5.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/swiotlb: check if the swiotlb has already been initialized
        arm64: do not set SWIOTLB_NO_FORCE when swiotlb is required
        xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.h
        xen/unpopulated-alloc: fix error return code in fill_list()
        xen/gntdev: fix gntdev_mmap() error exit path
    • Merge tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ccb013c2
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "The three SEV commits are not really urgent material. But we figured
        since getting them in now will avoid a huge amount of conflicts
        between future SEV changes touching tip, the kvm and probably other
        trees, sending them to you now would be best.
      
        The idea is that the tip, kvm etc branches for 5.14 will all base
        on top of -rc2 and thus everything will be peachy. What is more, those
        changes are purely mechanical and defines movement so they should be
        fine to go now (famous last words).
      
        Summary:
      
         - Enable -Wundef for the compressed kernel build stage
      
         - Reorganize SEV code to streamline and simplify future development"
      
      * tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot/compressed: Enable -Wundef
        x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
        x86/sev: Move GHCB MSR protocol and NAE definitions in a common header
        x86/sev-es: Rename sev-es.{ch} to sev.{ch}
  7. 15 May, 2021 23 commits