1. 23 Sep, 2022 1 commit
    • mm/slub: enable debugging memory wasting of kmalloc · 6edf2576
      Feng Tang authored
      The kmalloc API family is critical for mm. One of its properties is
      that it rounds the request size up to a fixed size (mostly a power of
      2). For example, when a user requests '2^n + 1' bytes, 2^(n+1) bytes
      may actually be allocated, so in the worst case around 50% of the
      memory space is wasted.
      
      The wastage is not a big issue for requests that get allocated/freed
      quickly, but may cause problems with objects that have longer life
      time.
      
      We've met a kernel boot OOM panic (v5.10), and from the dumped slab
      info:
      
          [   26.062145] kmalloc-2k            814056KB     814056KB
      
      Debugging showed a huge number of 'struct iova_magazine' objects,
      whose size is 1032 bytes (1024 + 8), so each allocation wastes 1016
      bytes. Though the issue was solved by providing the right (bigger)
      amount of RAM, it is still worth optimizing the size (either by using
      a kmalloc-friendly size or by creating a dedicated slab for it).
      
      The lkml archive also shows another crash kernel OOM case [1] back in
      2019, which seems to be related to a similar slab waste situation, as
      the log is similar:
      
          [    4.332648] iommu: Adding device 0000:20:02.0 to group 16
          [    4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
          ...
          [    4.857565] kmalloc-2048           59164KB      59164KB
      
      The crash kernel only has 256M of memory, and 59M is a large share of
      it. (Note: the related code has been changed and optimised in recent
      kernels [2]; these logs are only picked to demonstrate the problem. A
      patch changing its size to 1024 bytes has also been merged.)
      
      So add a way to track the memory waste of each kmalloc allocation, and
      leverage the existing SLUB debug framework (specifically
      SLUB_STORE_USER) to show the call stack of the original allocation,
      so that users can evaluate the waste situation, identify hot spots
      and optimize accordingly, for better memory utilization.
      
      The waste info is integrated into the existing interface
      '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces'. One example for
      'kmalloc-4k' after boot is:
      
       126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1
           __kmem_cache_alloc_node+0x11f/0x4e0
           __kmalloc_node+0x4e/0x140
           ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe]
           ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe]
           ixgbe_probe+0x165f/0x1d20 [ixgbe]
           local_pci_probe+0x78/0xc0
           work_for_cpu_fn+0x26/0x40
           ...
      
      This means that in the 'kmalloc-4k' slab there are 126 requests of
      2240 bytes from ixgbe_alloc_q_vector(), each of which got a 4KB slot
      (wasting 1856 bytes each and 233856 bytes in total).
      
      When the system runs a real workload, such as multiple docker
      instances, the waste could be more severe.
      
      [1]. https://lkml.org/lkml/2019/8/12/266
      [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/
      
      [Thanks Hyeonggon for pointing out several bugs about sorting/format]
      [Thanks Vlastimil for suggesting way to reduce memory usage of
       orig_size and keep it only for kmalloc objects]
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
  2. 16 Sep, 2022 5 commits
    • slub: Make PREEMPT_RT support less convoluted · 1f04b07d
      Thomas Gleixner authored
      The slub code already has a few helpers depending on PREEMPT_RT. Add a few
      more and get rid of the CONFIG_PREEMPT_RT conditionals all over the place.
      
      No functional change.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-mm@kvack.org
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    • mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock() · 5875e598
      Vlastimil Babka authored
      The PREEMPT_RT specific disabling of irqs in __cmpxchg_double_slab()
      (through slab_[un]lock()) is unnecessary as bit_spin_lock() disables
      preemption and that's sufficient on PREEMPT_RT where no allocation/free
      operation is performed in hardirq context and so can't interrupt the
      current operation.
      
      That means we no longer need the slab_[un]lock() wrappers, so delete
      them and rename the current __slab_[un]lock() to slab_[un]lock().
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    • mm/slub: convert object_map_lock to non-raw spinlock · 4ef3f5a3
      Vlastimil Babka authored
      The only remaining user of object_map_lock is list_slab_objects().
      Obtaining the lock there used to happen under slab_lock() which implied
      disabling irqs on PREEMPT_RT, thus it's a raw_spinlock. With the
      slab_lock() removed, we can convert it to a normal spinlock.
      
      Also remove the get_map()/put_map() wrappers as list_slab_objects()
      became their only remaining user.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    • mm/slub: remove slab_lock() usage for debug operations · 41bec7c3
      Vlastimil Babka authored
      All alloc and free operations on debug caches are now serialized by
      n->list_lock, so we can remove slab_lock() usage in validate_slab()
      and list_slab_objects() as those also happen under n->list_lock.
      
      Note that the usage in list_slab_objects() could happen even on
      non-debug caches, but only during cache shutdown time, so there should
      not be any parallel freeing activity anymore. The exception is buggy
      slab users, but in that case slab_lock() would not help against the
      common cmpxchg-based fast paths (in non-debug caches) anyway.
      
      Also adjust documentation comments accordingly.
      Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
    • mm/slub: restrict sysfs validation to debug caches and make it safe · c7323a5a
      Vlastimil Babka authored
      Rongwei Wang reports [1] that cache validation triggered by writing to
      /sys/kernel/slab/<cache>/validate is racy against normal cache
      operations (e.g. freeing) in a way that can cause false positive
      inconsistency reports for caches with debugging enabled. The problem is
      that the debugging actions that mark an object free or active and the
      actual freelist operations are not atomic, so the validation can see an
      inconsistent state.
      
      For caches that do or don't have debugging enabled, additional races
      involving n->nr_slabs are possible that result in false reports of wrong
      slab counts.
      
      This patch attempts to solve these issues without adding overhead to
      normal (especially fastpath) operations for caches that do not have
      debugging enabled. Such overhead would not be justified just to make
      userspace-triggered validation safe. Instead, disable the validation for
      caches that don't have debugging enabled and make their sysfs validate
      handler return -EINVAL.
      
      For caches that do have debugging enabled, we can instead extend the
      existing approach of not using percpu freelists to force all alloc/free
      operations to the slow paths where the debugging flags are checked and
      acted upon. There we can adjust the debug-specific paths to increase
      n->list_lock coverage against concurrent validation as necessary.
      
      The processing on free in free_debug_processing() already happens under
      n->list_lock so we can extend it to actually do the freeing as well and
      thus make it atomic against concurrent validation. As observed by
      Hyeonggon Yoo, we do not really need to take slab_lock() anymore here
      because all paths we could race with are protected by n->list_lock under
      the new scheme, so drop its usage here.
      
      The processing on alloc in alloc_debug_processing() currently doesn't
      take any locks, but we have to first allocate the object from a slab on
      the partial list (as debugging caches have no percpu slabs) and thus
      take the n->list_lock anyway. Add a function alloc_single_from_partial()
      that grabs just the allocated object instead of the whole freelist, and
      does the debug processing. The n->list_lock coverage again makes it
      atomic against validation and it is also ultimately more efficient than
      the current grabbing of freelist immediately followed by slab
      deactivation.
      
      To prevent races on n->nr_slabs updates, make sure that for caches with
      debugging enabled, inc_slabs_node() or dec_slabs_node() is called under
      n->list_lock. When allocating a new slab for a debug cache, handle the
      allocation by a new function alloc_single_from_new_slab() instead of the
      current forced deactivation path.
      
      Neither of these changes affects the fast paths at all. The changes in
      slow paths are negligible for non-debug caches.
      
      [1] https://lore.kernel.org/all/20220529081535.69275-1-rongwei.wang@linux.alibaba.com/

      Reported-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
  3. 25 Aug, 2022 1 commit
  4. 22 Aug, 2022 1 commit
  5. 21 Aug, 2022 17 commits
  6. 20 Aug, 2022 15 commits
    • Merge tag 'kbuild-fixes-v6.0' of... · 15b3f48a
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix module versioning broken on some architectures
      
       - Make dummy-tools enable CONFIG_PPC_LONG_DOUBLE_128
      
       - Remove -Wformat-zero-length, which has no warning instance
      
       - Fix the order between drivers and libs in modules.order
      
       - Fix false-positive warnings in clang-analyzer
      
      * tag 'kbuild-fixes-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        scripts/clang-tools: Remove DeprecatedOrUnsafeBufferHandling check
        kbuild: fix the modules order between drivers and libs
        scripts/Makefile.extrawarn: Do not disable clang's -Wformat-zero-length
        kbuild: dummy-tools: pretend we understand __LONG_DOUBLE_128__
        modpost: fix module versioning when a symbol lacks valid CRC
    • Merge tag 'perf-tools-fixes-for-v6.0-2022-08-19' of... · 16b3d851
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v6.0-2022-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix alignment for cpu map masks in event encoding.
      
       - Support reading PERF_FORMAT_LOST, perf tool counterpart for a feature
         that was added in this merge window.
      
       - Sync perf tools copies of kernel headers: socket, msr-index, fscrypt,
         cpufeatures, i915_drm, kvm, vhost, perf_event.
      
      * tag 'perf-tools-fixes-for-v6.0-2022-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf tools: Support reading PERF_FORMAT_LOST
        libperf: Add a test case for read formats
        libperf: Handle read format in perf_evsel__read()
        tools headers UAPI: Sync linux/perf_event.h with the kernel sources
        tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
        tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
        tools include UAPI: Sync linux/vhost.h with the kernel sources
        tools headers kvm s390: Sync headers with the kernel sources
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
        tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
        tools headers cpufeatures: Sync with the kernel sources
        tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
        tools arch x86: Sync the msr-index.h copy with the kernel sources
        perf beauty: Update copy of linux/socket.h with the kernel sources
        perf cpumap: Fix alignment for masks in event encoding
        perf cpumap: Compute mask size in constant time
        perf cpumap: Synthetic events and const/static
        perf cpumap: Const map for max()
    • Merge tag 's390-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · cc1807b9
      Linus Torvalds authored
      Pull s390 updates from Alexander Gordeev:
      
       - Fix a KVM crash on z12 and older machines caused by a wrong
         assumption that Query AP Configuration Information is always
         available.
      
       - Lower severity of excessive Hypervisor filesystem error messages
         when booting under KVM.
      
      * tag 's390-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/ap: fix crash on older machines based on QCI info missing
        s390/hypfs: avoid error message under KVM
    • Merge tag 'powerpc-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 32dd68f1
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix atomic sleep warnings at boot due to get_phb_number() taking a
         mutex with a spinlock held on some machines.
      
       - Add missing PMU selftests to .gitignores.
      
      Thanks to Guenter Roeck and Russell Currey.
      
      * tag 'powerpc-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        selftests/powerpc: Add missing PMU selftests to .gitignores
        powerpc/pci: Fix get_phb_number() locking
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · f31c32ef
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "A few minor fixes:
      
         - Fix buffer management in SRP to correct a regression with the login
           authentication feature from v5.17
      
         - Don't iterate over non-present ports in mlx5
      
         - Fix an error introduced by the fortify work in cxgb4
      
         - Two bug fixes for the recently merged ERDMA driver
      
         - Unbreak RDMA dmabuf support, a regression from v5.19"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA: Handle the return code from dma_resv_wait_timeout() properly
        RDMA/erdma: Correct the max_qp and max_cq capacities of the device
        RDMA/erdma: Using the key in FMR WR instead of MR structure
        RDMA/cxgb4: fix accept failure due to increased cpl_t5_pass_accept_rpl size
        RDMA/mlx5: Use the proper number of ports
        IB/iser: Fix login with authentication
    • scripts/clang-tools: Remove DeprecatedOrUnsafeBufferHandling check · 4be72c1b
      Guru Das Srinagesh authored
      This `clang-analyzer` check flags the use of memset(), suggesting a more
      secure version of the API, such as memset_s(), which does not exist in
      the kernel:
      
        warning: Call to function 'memset' is insecure as it does not provide
        security checks introduced in the C11 standard. Replace with analogous
        functions that support length arguments or provides boundary checks such
        as 'memset_s' in case of C11
        [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
      Signed-off-by: Guru Das Srinagesh <quic_gurus@quicinc.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: fix the modules order between drivers and libs · 11314751
      Masahiro Yamada authored
      Commit b2c88554 ("kbuild: update modules.order only when contained
      modules are updated") accidentally changed the modules order.
      
      Prior to that commit, the modules order was determined based on
      vmlinux-dirs, which lists core-y/m, drivers-y/m, libs-y/m, in this order.
      
      Now, subdir-modorder lists them in a different order: core-y/m, libs-y/m,
      drivers-y/m.
      
      Presumably, there was no practical issue because the modules in drivers
      and libs are orthogonal, but there is no reason to have this distortion.
      
      Get back to the original order.
      
      Fixes: b2c88554 ("kbuild: update modules.order only when contained modules are updated")
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • scripts/Makefile.extrawarn: Do not disable clang's -Wformat-zero-length · 370655bc
      Nathan Chancellor authored
      There are no instances of this warning in the tree across several
      different architectures and configurations. This was added by
      commit 26ea6bb1 ("kbuild, LLVMLinux: Supress warnings unless W=1-3")
      back in 2014, when it might have been necessary, but there are no
      instances of it now, so stop disabling it to increase warning coverage
      for clang.
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: dummy-tools: pretend we understand __LONG_DOUBLE_128__ · 0df499ea
      Jiri Slaby authored
      There is a test in powerpc's Kconfig which checks __LONG_DOUBLE_128__
      and sets CONFIG_PPC_LONG_DOUBLE_128 if it is understood by the compiler.
      
      We currently don't handle it, so PPC_LONG_DOUBLE_128 ends up missing
      from the super-config generated by dummy-tools. So take this into
      account in the gcc script and preprocess __LONG_DOUBLE_128__ as "1".
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • modpost: fix module versioning when a symbol lacks valid CRC · 5b8a9a8f
      Masahiro Yamada authored
      Since commit 7b453719 ("kbuild: link symbol CRCs at final link,
      removing CONFIG_MODULE_REL_CRCS"), module versioning is broken on
      some architectures. Loading a module fails with "disagrees about
      version of symbol module_layout".
      
      On such architectures (e.g. ARCH=sparc build with sparc64_defconfig),
      modpost shows a warning like the following:
      
        WARNING: modpost: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
        Is "_mcount" prototyped in <asm/asm-prototypes.h>?
      
      Previously, it was a harmless warning (the CRC check was just skipped),
      but now wrong CRCs are used for comparison because invalid CRCs are
      simply skipped when the CRC entries are emitted.
      
        $ sparc64-linux-gnu-nm -n vmlinux
          [snip]
        0000000000c2cea0 r __ksymtab__kstrtol
        0000000000c2ceb8 r __ksymtab__kstrtoul
        0000000000c2ced0 r __ksymtab__local_bh_enable
        0000000000c2cee8 r __ksymtab__mcount
        0000000000c2cf00 r __ksymtab__printk
        0000000000c2cf18 r __ksymtab__raw_read_lock
        0000000000c2cf30 r __ksymtab__raw_read_lock_bh
          [snip]
        0000000000c53b34 D __crc__kstrtol
        0000000000c53b38 D __crc__kstrtoul
        0000000000c53b3c D __crc__local_bh_enable
        0000000000c53b40 D __crc__printk
        0000000000c53b44 D __crc__raw_read_lock
        0000000000c53b48 D __crc__raw_read_lock_bh
      
      Note that __crc__mcount is missing here.

      When the module subsystem looks up a CRC that comes after the missing
      one, it reads the wrong address. For example, when __crc__printk is
      needed, the module subsystem reads 0xc53b44 instead of 0xc53b40.
      
      All CRC entries must be output for correct index accessing. Invalid
      CRCs will be unused, but are needed to keep the one-to-one mapping
      between __ksymtab_* and __crc_*.
      
      The best solution is to fix all modpost warnings, but several warnings
      still remain on less popular architectures.
      
      Fixes: 7b453719 ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS")
      Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk>
    • Merge tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-block · b9bce6e5
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes that should go into this release:
      
         - Small series of patches for ublk (ZiyangZhang)
      
         - Remove dead function (Yu)
      
         - Fix for running a block queue in case of resource starvation
           (Yufen)"
      
      * tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-block:
        blk-mq: run queue no matter whether the request is the last request
        blk-mq: remove unused function blk_mq_queue_stopped()
        ublk_drv: do not add a re-issued request aborted previously to ioucmd's task_work
        ublk_drv: update comment for __ublk_fail_req()
        ublk_drv: check ubq_daemon_is_dying() in __ublk_rq_task_work()
        ublk_drv: update iod->addr for UBLK_IO_NEED_GET_DATA
    • Merge tag 'io_uring-6.0-2022-08-19' of git://git.kernel.dk/linux-block · beaf1397
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A few fixes for regressions in this cycle:
      
         - Two instances of using the wrong "has async data" helper (Pavel)
      
         - Fixup zero-copy address import (Pavel)
      
         - Bump zero-copy notification slot limit (Pavel)"
      
      * tag 'io_uring-6.0-2022-08-19' of git://git.kernel.dk/linux-block:
        io_uring/net: use right helpers for async_data
        io_uring/notif: raise limit on notification slots
        io_uring/net: improve zc addr import error handling
        io_uring/net: use right helpers for async recycle
    • Merge tag 'ata-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 044610f8
      Linus Torvalds authored
      Pull ATA fixes from Damien Le Moal:
      
       - Add a missing command name definition for ata_get_cmd_name(), from
         me.
      
       - A fix to address a performance regression due to the default
         max_sectors queue limit for ATA devices connected to AHCI adapters
         being too small, from John.
      
      * tag 'ata-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: libata: Set __ATA_BASE_SHT max_sectors
        ata: libata-eh: Add missing command name
    • Merge tag 'mmc-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 4d099c33
      Linus Torvalds authored
      Pull MMC host fixes from Ulf Hansson:
      
       - meson-gx: Fix error handling in ->probe()
      
       - mtk-sd: Fix a command problem when using cqe off/disable
      
       - pxamci: Fix error handling in ->probe()
      
       - sdhci-of-dwcmshc: Fix broken support for the BlueField-3 variant
      
      * tag 'mmc-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-of-dwcmshc: Re-enable support for the BlueField-3 SoC
        mmc: meson-gx: Fix an error handling path in meson_mmc_probe()
        mmc: mtk-sd: Clear interrupts when cqe off/disable
        mmc: pxamci: Fix another error handling path in pxamci_probe()
        mmc: pxamci: Fix an error handling path in pxamci_probe()
    • ata: libata: Set __ATA_BASE_SHT max_sectors · a357f7b4
      John Garry authored
      Commit 0568e612 ("ata: libata-scsi: cap ata_device->max_sectors
      according to shost->max_sectors") inadvertently capped the max_sectors
      value for some SATA disks to a value which is lower than we would want.
      
      For a device which supports LBA48, we would previously have request queue
      max_sectors_kb and max_hw_sectors_kb values of 1280 and 32767 respectively.
      
      For AHCI controllers, the value chosen for shost max sectors comes from
      the minimum of the SCSI host default max sectors in
      SCSI_DEFAULT_MAX_SECTORS (1024) and the shost DMA device mapping limit.
      
      This means that we would now set the max_sectors_kb and max_hw_sectors_kb
      values for a disk which supports LBA48 at 512, ignoring the DMA mapping
      limit.
      
      As reported by Oliver in [0], this caused a performance regression.
      
      Fix by picking a large enough max sectors value for ATA host controllers
      such that we don't needlessly reduce max_sectors_kb for LBA48 disks.
      
      [0] https://lore.kernel.org/linux-ide/YvsGbidf3na5FpGb@xsang-OptiPlex-9020/T/#m22d9fc5ad15af66066dd9fecf3d50f1b1ef11da3
      
      Fixes: 0568e612 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
      Reported-by: Oliver Sang <oliver.sang@intel.com>
      Signed-off-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>