1. 19 Oct, 2023 7 commits
    • xfs: don't try redundant allocations in xfs_rtallocate_extent_near() · 85fa2c77
      Omar Sandoval authored
      xfs_rtallocate_extent_near() tries to find a free extent as close to a
      target bitmap block given by bbno as possible, which may be before or
      after bbno. Searching backwards has a complication: the realtime summary
      accounts for free space _starting_ in a bitmap block, but not straddling
      or ending in a bitmap block. So, when the negative search finds a free
      extent in the realtime summary, in order to end up closer to the target,
      it looks for the end of the free extent. For example, if bbno - 2 has a
      free extent, then it will check bbno - 1, then bbno - 2. But then if
      bbno - 3 has a free extent, it will check bbno - 1 again, then bbno - 2
      again, and then bbno - 3. This results in a quadratic loop, which is
      completely pointless since the repeated checks won't find anything new.
      
      Fix it by remembering where we last checked up to and continue from
      there. This also obviates the need for a check of the realtime summary.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
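The cost difference described in this commit message can be illustrated with a toy model. This is a minimal sketch, not the kernel code: `has_start`, `checks_rescanning()` and `checks_remembering()` are hypothetical names, and the model assumes the worst case in which no per-block check ever succeeds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not kernel code).
 * has_start[j - 1] != 0 means the summary reports a free extent starting
 * in the bitmap block j steps before the target (j = 1 is bbno - 1).
 * Both functions return how many per-block checks are performed when
 * none of the checks succeeds.
 */

/* Old behaviour: each summary hit at distance j rechecks distances 1..j. */
static size_t checks_rescanning(const int *has_start, size_t n)
{
	size_t checks = 0;

	assert(has_start != NULL || n == 0);
	for (size_t j = 1; j <= n; j++) {
		if (!has_start[j - 1])
			continue;
		for (size_t k = 1; k <= j; k++)
			checks++;
	}
	return checks;
}

/* New behaviour: remember how far we already checked and resume there. */
static size_t checks_remembering(const int *has_start, size_t n)
{
	size_t checks = 0;
	size_t checked_up_to = 0;	/* farthest distance already checked */

	assert(has_start != NULL || n == 0);
	for (size_t j = 1; j <= n; j++) {
		if (!has_start[j - 1])
			continue;
		for (size_t k = checked_up_to + 1; k <= j; k++)
			checks++;
		checked_up_to = j;
	}
	return checks;
}
```

With hits at bbno - 2 and bbno - 3, as in the message's example, the rescanning variant performs five checks and the remembering variant three. The gap grows quadratically as more blocks report free extents.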
    • xfs: limit maxlen based on available space in xfs_rtallocate_extent_near() · ec5857bf
      Omar Sandoval authored
      xfs_rtallocate_extent_near() calls xfs_rtallocate_extent_block() with
      the minlen and maxlen that were passed to it.
      xfs_rtallocate_extent_block() then scans the bitmap block looking for a
      free range of size maxlen. If there is none, it has to scan the whole
      bitmap block before returning the largest range of at least size minlen.
      For a fragmented realtime device and a large allocation request, it's
      almost certain that this will have to search the whole bitmap block,
      leading to high CPU usage.
      
      However, the realtime summary tells us the maximum size available in the
      bitmap block. We can limit the search in xfs_rtallocate_extent_block()
      to that size and often stop before scanning the whole bitmap block.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
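The early-exit effect of capping maxlen can be sketched with a simplified scan over a byte-per-bit map. This is an illustration under stated assumptions, not the kernel implementation: `scan_block()`, `clamp_maxlen()` and the `max_avail` parameter (standing in for the size derived from the realtime summary) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan free_bits[0..n) (1 = free) for the longest free run, stopping as
 * soon as a run of maxlen is seen. Returns the best run length found, or
 * 0 if it is shorter than minlen. *scanned receives the number of entries
 * examined, which is the cost we want to reduce.
 */
static size_t scan_block(const unsigned char *free_bits, size_t n,
			 size_t minlen, size_t maxlen, size_t *scanned)
{
	size_t best = 0, run = 0, i;

	assert(minlen >= 1 && minlen <= maxlen);
	for (i = 0; i < n; i++) {
		run = free_bits[i] ? run + 1 : 0;
		if (run > best)
			best = run;
		if (best >= maxlen) {
			i++;	/* count the entry we just examined */
			break;
		}
	}
	*scanned = i;
	return best >= minlen ? best : 0;
}

/* Cap the request at what the summary says can exist in this block. */
static size_t clamp_maxlen(size_t maxlen, size_t max_avail)
{
	return maxlen < max_avail ? maxlen : max_avail;
}
```

In this sketch, a request for 4 blocks against a map whose longest free run is 2 scans every entry, while the clamped request stops at the first run of 2. The sketch assumes the cap is not below minlen; a caller would skip the block entirely in that case.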
    • xfs: return maximum free size from xfs_rtany_summary() · 1b5d6396
      Omar Sandoval authored
      Instead of only returning whether there is any free space, return the
      maximum size, which is fast thanks to the previous commit. This will be
      used by two upcoming optimizations.
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: invert the realtime summary cache · e23aaf45
      Omar Sandoval authored
      In commit 355e3532 ("xfs: cache minimum realtime summary level"), I
      added a cache of the minimum level of the realtime summary that has any
      free extents. However, it turns out that the _maximum_ level is more
      useful for upcoming optimizations, and basically equivalent for the
      existing usage. So, let's change the meaning of the cache to be the
      maximum level + 1, or 0 if there are no free extents.
      
      For example, if the cache contains:
      
      {0, 4}
      
      then there are no free extents starting in realtime bitmap block 0, and
      there are no free extents larger than or equal to 2^4 blocks starting in
      realtime bitmap block 1. The cache is a loose upper bound, so there may
      or may not be free extents smaller than 2^4 blocks in realtime bitmap
      block 1.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
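The `{0, 4}` example can be read off mechanically. The following is a minimal sketch of the cache semantics as the message describes them; `may_have_level()` and `max_extent_len()` are hypothetical helpers, and it assumes that summary level l covers free extents of 2^l up to 2^(l+1) - 1 blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * cache[bbno] holds (maximum summary level with free extents) + 1 for
 * bitmap block bbno, or 0 if no free extent starts there. It is a loose
 * upper bound, so "true" below means "possibly", never "certainly".
 */
static bool may_have_level(const uint8_t *cache, size_t bbno,
			   unsigned int level)
{
	return level < cache[bbno];
}

/* Largest extent length (in blocks) that may start in block bbno. */
static uint64_t max_extent_len(const uint8_t *cache, size_t bbno)
{
	uint8_t v = cache[bbno];

	assert(v < 64);		/* keep the shift well defined */
	return v ? (UINT64_C(1) << v) - 1 : 0;
}
```

For the cache `{0, 4}`, block 0 can be skipped outright, and block 1 cannot hold a free extent of 2^4 = 16 blocks or more, so the longest candidate there is 15 blocks.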
    • xfs: simplify rt bitmap/summary block accessor functions · e2cf427c
      Darrick J. Wong authored
      Simplify the calling convention of these functions since the
      xfs_rtalloc_args structure contains the parameters we need.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: simplify xfs_rtbuf_get calling conventions · 5b1d0ae9
      Darrick J. Wong authored
      Now that xfs_rtalloc_args holds references to the last-read bitmap and
      summary blocks, we don't need to pass the buffer pointer out of
      xfs_rtbuf_get.
      
      Callers no longer have to xfs_trans_brelse on their own, though they are
      required to call xfs_rtbuf_cache_relse before the xfs_rtalloc_args goes
      out of scope.
      
      While we're at it, create some trivial helpers so that we don't have to
      remember if "0" means "bitmap" and "1" means "summary".
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: cache last bitmap block in realtime allocator · e94b53ff
      Omar Sandoval authored
      Profiling a workload on a highly fragmented realtime device showed a ton
      of CPU cycles being spent in xfs_trans_read_buf() called by
      xfs_rtbuf_get(). Further tracing showed that much of that was repeated
      calls to xfs_rtbuf_get() for the same block of the realtime bitmap.
      These come from xfs_rtallocate_extent_block(): as it walks through
      ranges of free bits in the bitmap, each call to xfs_rtcheck_range() and
      xfs_rtfind_{forw,back}() gets the same bitmap block. If the bitmap block
      is very fragmented, then this is _a lot_ of buffer lookups.
      
      The realtime allocator already passes around a cache of the last used
      realtime summary block to avoid repeated reads (the parameters rbpp and
      rsb). We can do the same for the realtime bitmap.
      
      This replaces rbpp and rsb with a struct xfs_rtbuf_cache, which caches
      the most recently used block for both the realtime bitmap and summary.
      xfs_rtbuf_get() now handles the caching instead of the callers, which
      requires plumbing xfs_rtbuf_cache to more functions but also makes sure
      we don't miss anything.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
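The idea of caching the most recently used block inside the getter can be sketched in isolation. This is an illustrative model only: `lookup_block()`, `struct one_block_cache` and `cached_get()` are hypothetical stand-ins, not the kernel's xfs_rtbuf_get() or struct xfs_rtbuf_cache, and the "device" is a plain byte array.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_BYTES 4	/* tiny block size, for illustration only */

/* Stand-in for the expensive buffer lookup; counts how often it runs. */
static unsigned long nr_lookups;

static const unsigned char *lookup_block(const unsigned char *dev,
					 size_t blkno)
{
	nr_lookups++;
	return dev + blkno * BLOCK_BYTES;
}

/* One-entry cache: the most recently used block number and its buffer. */
struct one_block_cache {
	int valid;
	size_t blkno;
	const unsigned char *buf;
};

/* The getter owns the caching, so no caller can forget to use it. */
static const unsigned char *cached_get(struct one_block_cache *c,
				       const unsigned char *dev,
				       size_t blkno)
{
	assert(c != NULL && dev != NULL);
	if (!c->valid || c->blkno != blkno) {
		c->buf = lookup_block(dev, blkno);
		c->blkno = blkno;
		c->valid = 1;
	}
	return c->buf;
}
```

Repeated requests for the same block, which is the pattern the profile showed, cost a single lookup in this model. Putting the cache in the getter rather than in each caller mirrors the design choice the message describes.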
  2. 18 Oct, 2023 9 commits
  3. 17 Oct, 2023 20 commits
  4. 15 Oct, 2023 4 commits
    • Linux 6.6-rc6 · 58720809
      Linus Torvalds authored
    • Revert "x86/smp: Put CPUs into INIT on shutdown if possible" · fbe1bf1e
      Linus Torvalds authored
      This reverts commit 45e34c8a, and the two subsequent fixes to it:
      
        3f874c9b ("x86/smp: Don't send INIT to non-present and non-booted CPUs")
        b1472a60 ("x86/smp: Don't send INIT to boot CPU")
      
      because it seems to result in hung machines at shutdown.  Particularly
      some Dell machines, but Thomas says
      
       "The rest seems to be Lenovo and Sony with Alderlake/Raptorlake CPUs -
        at least that's what I could figure out from the various bug reports.
      
        I don't know which CPUs the DELL machines have, so I can't say it's a
        pattern.
      
        I agree with the revert for now"
      
      Ashok Raj chimes in:
      
       "There was a report (probably this same one), and it turns out it was a
        bug in the BIOS SMI handler.
      
        The client BIOS's were waiting for the lowest APICID to be the SMI
        rendezvous master. If this is MeteorLake, the BSP wasn't the one with
        the lowest APIC and it tripped here.
      
        The BIOS change is also being pushed to others for assimilation :)
      
        Server BIOS's had this correctly for a while now"
      
      and it does look likely to be some bad interaction between SMI and the
      non-BSP cores having been put into INIT (and thus unresponsive until reset).
      
      Link: https://bbs.archlinux.org/viewtopic.php?pid=2124429
      Link: https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
      Link: https://forum.artixlinux.org/index.php/topic,5997.0.html
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2241279
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • virtio_net: fix the missing of the dma cpu sync · 5720c43d
      Xuan Zhuo authored
      Commit 295525e2 ("virtio_net: merge dma operations when filling
      mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when
      the dma->ref is zero. We do that with DMA_ATTR_SKIP_CPU_SYNC, because we
      do not want to do the sync for the entire page_frag. But that misses the
      sync for the current area.
      
      This patch does cpu sync regardless of whether the ref is zero or not.
      
      Fixes: 295525e2 ("virtio_net: merge dma operations when filling mergeable buffers")
      Reported-by: Michael Roth <michael.roth@amd.com>
      Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs3ut@amd.com
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'usb-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 11d3f726
      Linus Torvalds authored
      Pull USB / Thunderbolt fixes from Greg KH:
       "Here are some USB and Thunderbolt driver fixes for 6.6-rc6 to resolve
        a number of small reported issues. Included in here are:
      
         - thunderbolt driver fixes
      
         - xhci driver fixes
      
         - cdns3 driver fixes
      
         - musb driver fixes
      
         - a number of typec driver fixes
      
         - a few other small driver fixes
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
        usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope
        usb: typec: ucsi: Fix missing link removal
        usb: typec: altmodes/displayport: Signal hpd low when exiting mode
        xhci: Preserve RsvdP bits in ERSTBA register correctly
        xhci: Clear EHB bit only at end of interrupt handler
        xhci: track port suspend state correctly in unsuccessful resume cases
        usb: xhci: xhci-ring: Use sysdev for mapping bounce buffer
        usb: typec: ucsi: Clear EVENT_PENDING bit if ucsi_send_command fails
        usb: misc: onboard_hub: add support for Microchip USB2412 USB 2.0 hub
        usb: gadget: udc-xilinx: replace memcpy with memcpy_toio
        usb: cdns3: Modify the return value of cdns_set_active () to void when CONFIG_PM_SLEEP is disabled
        usb: dwc3: Soft reset phy on probe for host
        usb: hub: Guard against accesses to uninitialized BOS descriptors
        usb: typec: qcom: Update the logic of regulator enable and disable
        usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call
        usb: musb: Get the musb_qh poniter after musb_giveback
        usb: musb: Modify the "HWVers" register address
        usb: cdnsp: Fixes issue with dequeuing not queued requests
        thunderbolt: Restart XDomain discovery handshake after failure
        thunderbolt: Correct TMU mode initialization from hardware
        ...