1. 23 May, 2024 2 commits
  2. 21 May, 2024 1 commit
    • nvme-multipath: find NUMA path only for online numa-node · d3a04373
      Nilay Shroff authored
      In current native multipath design when a shared namespace is created,
      we loop through each possible numa-node, calculate the NUMA distance of
      that node from each nvme controller and then cache the optimal IO path
      for future reference while sending IO. The issue with this design is that
      we may refer to the NUMA distance table for an offline node which may not
      be populated at the time and so we may inadvertently end up finding and
      caching a non-optimal path for IO. Then later, when the corresponding
      numa-node becomes online and hence the NUMA distance table entry for that
      node is created, ideally we should re-calculate the multipath node distance
      for the newly added node; however, that doesn't happen unless we rescan/reset
      the controller. So essentially, we may keep using a non-optimal IO path for a
      node which is made online after the namespace is created.

      This patch helps fix this issue by ensuring that when a shared namespace is
      created, we calculate the multipath node distance for each online numa-node
      instead of each possible numa-node. Then later, when a node becomes online
      and we receive any IO on that newly added node, we would calculate the
      multipath node distance for newly added node but this time NUMA distance
      table would have been already populated for newly added node. Hence we
      would be able to correctly calculate the multipath node distance and choose
      the optimal path for the IO.
      Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
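The behaviour described in this commit message can be modelled in a few lines of userspace C. This is only a sketch, not the kernel code: `struct topo`, `find_path()`, `cache_paths_for_online_nodes()` and `path_for_io()` are hypothetical names, and a distance of 0 is an assumed marker for "table entry not populated yet". It shows paths being cached only for online nodes at namespace creation, with the lookup for a late-onlined node deferred until the first IO arrives from it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4    /* hypothetical number of possible NUMA nodes */
#define MAX_CTRLS 2    /* hypothetical number of controllers (paths) */
#define NO_PATH   (-1) /* marker: no path cached for this node yet */

struct topo {
    bool online[MAX_NODES];
    /* distance[node][ctrl]; 0 is an assumed "not populated yet" marker */
    int distance[MAX_NODES][MAX_CTRLS];
};

/* Pick the controller with the smallest populated distance to `node`. */
static int find_path(const struct topo *t, int node)
{
    int best = NO_PATH, best_dist = INT_MAX;

    for (int c = 0; c < MAX_CTRLS; c++) {
        int d = t->distance[node][c];

        if (d > 0 && d < best_dist) {
            best_dist = d;
            best = c;
        }
    }
    return best;
}

/* Namespace creation: cache a path for online nodes only. */
static void cache_paths_for_online_nodes(const struct topo *t,
                                         int cache[MAX_NODES])
{
    for (int n = 0; n < MAX_NODES; n++)
        cache[n] = t->online[n] ? find_path(t, n) : NO_PATH;
}

/* IO submission: use the cached path, computing it lazily if missing. */
static int path_for_io(const struct topo *t, int cache[MAX_NODES], int node)
{
    assert(node >= 0 && node < MAX_NODES);
    if (cache[node] == NO_PATH)
        cache[node] = find_path(t, node);
    return cache[node];
}
```

In this model a node that comes online after namespace creation gets its path computed on first use, by which time its distance entries are assumed to be filled in.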
  3. 20 May, 2024 1 commit
  4. 17 May, 2024 1 commit
  5. 16 May, 2024 3 commits
  6. 15 May, 2024 1 commit
  7. 14 May, 2024 7 commits
  8. 13 May, 2024 24 commits
    • null_blk: Fix two sparse warnings · 25260555
      Bart Van Assche authored
      Fix the following sparse warnings:
      
      drivers/block/null_blk/main.c:1243:35: warning: incorrect type in return expression (different base types)
      drivers/block/null_blk/main.c:1243:35:    expected int
      drivers/block/null_blk/main.c:1243:35:    got restricted blk_status_t
      drivers/block/null_blk/main.c:1291:30: warning: incorrect type in return expression (different base types)
      drivers/block/null_blk/main.c:1291:30:    expected restricted blk_status_t
      drivers/block/null_blk/main.c:1291:30:    got int
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Damien Le Moal <dlemoal@kernel.org>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20240510201816.24921-1-bvanassche@acm.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
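The two warnings above come from mixing a plain `int` error code with a restricted status type. As an illustration only (this is not the null_blk fix itself), the userspace sketch below wraps the status in a struct so that a C compiler rejects the same mix-up that sparse reports. `io_status_t`, the `STS_*` constants and their numeric values, and the helper names are all hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a restricted status type: wrapping the value
 * in a struct means it cannot be returned or assigned as an int by accident. */
typedef struct { unsigned char v; } io_status_t;

static const io_status_t STS_OK       = { 0 };
static const io_status_t STS_RESOURCE = { 9 };  /* assumed value */
static const io_status_t STS_IOERR    = { 10 }; /* assumed value */

/* Convert a negative errno (or 0) into a status value. */
static io_status_t errno_to_status(int err)
{
    assert(err <= 0); /* this sketch expects 0 or a negative errno */
    switch (err) {
    case 0:
        return STS_OK;
    case -ENOMEM:
        return STS_RESOURCE;
    default:
        return STS_IOERR;
    }
}

/* Convert a status value back into a negative errno (or 0). */
static int status_to_errno(io_status_t s)
{
    switch (s.v) {
    case 0:
        return 0;
    case 9:
        return -ENOMEM;
    default:
        return -EIO;
    }
}

/* Declared to return a status, so the body must produce io_status_t;
 * "return backend_err;" would fail to compile here. */
static io_status_t queue_request(int backend_err)
{
    return errno_to_status(backend_err);
}
```

Keeping every conversion in explicit helpers makes each function's declared return type match what it actually returns, which is the property the sparse check enforces.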
    • ublk_drv: set DMA alignment mask to 3 · 928b607d
      Jens Axboe authored
      By default, this will be 511, as that's the block layer default. But
      drivers these days can support memory alignments that aren't tied to
      the sector sizes, instead just being limited by what the DMA engine
      supports. An example is NVMe, where it's generally set to a 32-bit or
      64-bit boundary. As ublk itself doesn't really care, just set it low
      enough that we don't run into issues with NVMe where the required
      O_DIRECT memory alignment is now more restrictive on ublk than it is
      on the underlying device.
      
      This was triggered by spurious -EINVAL returns on O_DIRECT IO on a
      setup with ublk managing NVMe devices, which previously worked just
      fine on the NVMe device itself. With the alignment relaxed, the test
      works fine.
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
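The effect of the mask values mentioned in this message (511 versus 3) can be seen with a small userspace check. This is a sketch under the assumption that an address counts as aligned when none of the mask bits are set in it. `addr_is_dma_aligned()` is a hypothetical helper, not a block-layer function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `addr` has none of the `mask` bits set.
 * `mask` is expected to be of the form 2^n - 1 (for example 3 or 511). */
static bool addr_is_dma_aligned(uintptr_t addr, unsigned int mask)
{
    assert(((mask + 1u) & mask) == 0); /* mask + 1 must be a power of two */
    return (addr & mask) == 0;
}
```

For example, a buffer at address 0x1004 is 4-byte aligned but not 512-byte aligned, so it fails the check with mask 511 and passes with mask 3.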
    • Merge tag 'tag-chrome-platform-firmware-for-v6.10' of... · a7c840ba
      Linus Torvalds authored
      Merge tag 'tag-chrome-platform-firmware-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
      
      Pull chrome platform firmware updates from Tzung-Bi Shih:
      
       - Set driver owner in the core registration so that coreboot drivers
         don't need to set it individually
      
      * tag 'tag-chrome-platform-firmware-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
        firmware: google: cbmem: drop driver owner initialization
        firmware: coreboot: store owner from modules with coreboot_driver_register()
    • Merge tag 'tag-chrome-platform-for-v6.10' of... · 59729c8a
      Linus Torvalds authored
      Merge tag 'tag-chrome-platform-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
      
      Pull chrome platform updates from Tzung-Bi Shih:
       "New:
         - Support Framework Laptop 13 and 16 (AMD Ryzen)
      
        Improvements:
         - Use sysfs_emit() instead of sprintf() for sysfs' show()
      
        Fixes:
         - Fix flex-array-member-not-at-end compiler warnings by using
           DEFINE_RAW_FLEX()
         - Add HAS_IOPORT dependencies
         - Fix long pending events during suspend after resume
      
        Misc cleanups:
         - Provide ID tables for avoiding fallback match
         - Replace deprecated UNIVERSAL_DEV_PM_OPS()"
      
      * tag 'tag-chrome-platform-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (22 commits)
        platform/chrome: cros_ec: Handle events during suspend after resume completion
        platform/chrome: cros_ec_lpc: add quirks for the Framework Laptop (AMD)
        platform/chrome: cros_ec_lpc: add a "quirks" system
        platform/chrome: cros_ec_lpc: pass driver_data from DMI to the device
        platform/chrome: cros_ec_lpc: introduce a priv struct for the lpc device
        platform/chrome: add HAS_IOPORT dependencies
        platform/chrome: cros_hps_i2c: Replace deprecated UNIVERSAL_DEV_PM_OPS()
        platform/chrome: cros_kbd_led_backlight: provide ID table for avoiding fallback match
        platform/chrome: wilco_ec: core: provide ID table for avoiding fallback match
        platform/chrome: wilco_ec: event: remove redundant MODULE_ALIAS
        platform/chrome: wilco_ec: debugfs: provide ID table for avoiding fallback match
        platform/chrome: wilco_ec: telemetry: provide ID table for avoiding fallback match
        platform/chrome: cros_ec_vbc: provide ID table for avoiding fallback match
        platform/chrome: cros_ec_lightbar: provide ID table for avoiding fallback match
        platform/chrome: cros_ec_sysfs: provide ID table for avoiding fallback match
        platform/chrome: cros_ec_debugfs: provide ID table for avoiding fallback match
        platform/chrome: cros_ec_chardev: provide ID table for avoiding fallback match
        platform/chrome: cros_usbpd_notify: provide ID table for avoiding fallback match
        platform/chrome: cros_usbpd_logger: provide ID table for avoiding fallback match
        platform/chrome: cros_ec_sensorhub: provide ID table for avoiding fallback match
        ...
    • Merge tag 'rust-6.10' of https://github.com/Rust-for-Linux/linux · 8f5b5f78
      Linus Torvalds authored
      Pull Rust updates from Miguel Ojeda:
       "The most notable change is the drop of the 'alloc' in-tree fork. This
        is nicely reflected in the diffstat as a ~10k lines drop. In turn,
        this makes the version upgrades way simpler and smaller in the future,
        e.g. the latest one in commit 56f64b37 ("rust: upgrade to Rust
        1.78.0").
      
        More importantly, this increases the chances that a newer compiler
        version just works, which in turn means supporting several compiler
        versions is easier now. Thus we will look into finally setting a
        minimum version in the near future.
      
        Toolchain and infrastructure:
      
         - Upgrade to Rust 1.78.0
      
           This time around, due to how the kernel and Rust schedules have
           aligned, there are two upgrades in fact. These allow us to remove
           one more unstable feature ('offset_of') from the list, among other
           improvements
      
         - Drop 'alloc' in-tree fork of the standard library crate, which
           means all the unstable features used by 'alloc' (~30 language ones,
           ~60 library ones) are not a concern anymore
      
         - Support DWARFv5 via the '-Zdwarf-version' flag
      
         - Support zlib and zstd debuginfo compression via the
           '-Zdebuginfo-compression' flag
      
        'kernel' crate:
      
         - Support allocation flags ('GFP_*'), particularly in 'Box' (via
           'BoxExt'), 'Vec' (via 'VecExt'), 'Arc' and 'UniqueArc', as well as
           in the 'init' module APIs
      
         - Remove usage of the 'allocator_api' unstable feature
      
         - Remove 'try_' prefix in allocation APIs' names
      
         - Add 'VecExt' (an extension trait) to be able to drop the 'alloc'
           fork
      
         - Add the '{make,to}_{upper,lower}case()' methods to 'CStr'/'CString'
      
         - Add the 'as_ptr' method to 'ThisModule'
      
         - Add the 'from_raw' method to 'ArcBorrow'
      
         - Add the 'into_unique_or_drop' method to 'Arc'
      
         - Display column number in the 'dbg!' macro output by applying the
           equivalent change done to the standard library one
      
         - Migrate 'Work' to '#[pin_data]' thanks to the changes in the
           'macros' crate, which allows removing an unsafe call in its 'new'
           associated function
      
         - Prevent namespacing issues when using the '[try_][pin_]init!'
           macros by changing the generated name of guard variables
      
         - Make the 'get' method in 'Opaque' const
      
         - Implement the 'Default' trait for 'LockClassKey'
      
         - Remove unneeded 'kernel::prelude' imports from doctests
      
         - Remove redundant imports
      
        'macros' crate:
      
         - Add 'decl_generics' to 'parse_generics()' to support default
           values, and use that to allow them in '#[pin_data]'
      
        Helpers:
      
         - Trivial English grammar fix
      
        Documentation:
      
         - Add section on Rust Kselftests to the 'Testing' document
      
         - Expand the 'Abstractions vs. bindings' section of the 'General
           Information' document"
      
      * tag 'rust-6.10' of https://github.com/Rust-for-Linux/linux: (31 commits)
        rust: alloc: fix dangling pointer in VecExt<T>::reserve()
        rust: upgrade to Rust 1.78.0
        rust: kernel: remove redundant imports
        rust: sync: implement `Default` for `LockClassKey`
        docs: rust: extend abstraction and binding documentation
        docs: rust: Add instructions for the Rust kselftest
        rust: remove unneeded `kernel::prelude` imports from doctests
        rust: update `dbg!()` to format column number
        rust: helpers: Fix grammar in comment
        rust: init: change the generated name of guard variables
        rust: sync: add `Arc::into_unique_or_drop`
        rust: sync: add `ArcBorrow::from_raw`
        rust: types: Make Opaque::get const
        rust: kernel: remove usage of `allocator_api` unstable feature
        rust: init: update `init` module to take allocation flags
        rust: sync: update `Arc` and `UniqueArc` to take allocation flags
        rust: alloc: update `VecExt` to take allocation flags
        rust: alloc: introduce the `BoxExt` trait
        rust: alloc: introduce allocation flags
        rust: alloc: remove our fork of the `alloc` crate
        ...
    • Merge tag 'v6.10-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 84c7d76b
      Linus Torvalds authored
      Pull crypto updates from Herbert Xu:
       "API:
         - Remove crypto stats interface
      
        Algorithms:
         - Add faster AES-XTS on modern x86_64 CPUs
         - Forbid curves with order less than 224 bits in ecc (FIPS 186-5)
         - Add ECDSA NIST P521
      
        Drivers:
         - Expose otp zone in atmel
         - Add dh fallback for primes > 4K in qat
         - Add interface for live migration in qat
         - Use dma for aes requests in starfive
         - Add full DMA support for stm32mpx in stm32
         - Add Tegra Security Engine driver
      
        Others:
         - Introduce scope-based x509_certificate allocation"
      
      * tag 'v6.10-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (123 commits)
        crypto: atmel-sha204a - provide the otp content
        crypto: atmel-sha204a - add reading from otp zone
        crypto: atmel-i2c - rename read function
        crypto: atmel-i2c - add missing arg description
        crypto: iaa - Use kmemdup() instead of kzalloc() and memcpy()
        crypto: sahara - use 'time_left' variable with wait_for_completion_timeout()
        crypto: api - use 'time_left' variable with wait_for_completion_killable_timeout()
        crypto: caam - i.MX8ULP donot have CAAM page0 access
        crypto: caam - init-clk based on caam-page0-access
        crypto: starfive - Use fallback for unaligned dma access
        crypto: starfive - Do not free stack buffer
        crypto: starfive - Skip unneeded fallback allocation
        crypto: starfive - Skip dma setup for zeroed message
        crypto: hisilicon/sec2 - fix for register offset
        crypto: hisilicon/debugfs - mask the unnecessary info from the dump
        crypto: qat - specify firmware files for 402xx
        crypto: x86/aes-gcm - simplify GCM hash subkey derivation
        crypto: x86/aes-gcm - delete unused GCM assembly code
        crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath()
        hwrng: stm32 - repair clock handling
        ...
    • Merge tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 87caef42
      Linus Torvalds authored
      Pull hardening updates from Kees Cook:
       "The bulk of the changes here are related to refactoring and expanding
        the KUnit tests for string helper and fortify behavior.
      
        Some trivial strncpy replacements in fs/ were carried in my tree. Also
        some fixes to SCSI string handling were carried in my tree since the
        helper for those was introduced here. Beyond that, just little fixes
        all around: objtool getting confused about LKDTM+KCFI, preparing for
        future refactors (constification of sysctl tables, additional
        __counted_by annotations), a Clang UBSAN+i386 crash fix, and adding
        more options in the hardening.config Kconfig fragment.
      
        Summary:
      
         - selftests: Add str*cmp tests (Ivan Orlov)
      
         - __counted_by: provide UAPI for _le/_be variants (Erick Archer)
      
         - Various strncpy deprecation refactors (Justin Stitt)
      
         - stackleak: Use a copy of soon-to-be-const sysctl table (Thomas
           Weißschuh)
      
         - UBSAN: Work around i386 -regparm=3 bug with Clang prior to
           version 19
      
         - Provide helper to deal with non-NUL-terminated string copying
      
         - SCSI: Fix older string copying bugs (with new helper)
      
         - selftests: Consolidate string helper behavioral tests
      
         - selftests: add memcpy() fortify tests
      
         - string: Add additional __realloc_size() annotations for "dup"
           helpers
      
         - LKDTM: Fix KCFI+rodata+objtool confusion
      
         - hardening.config: Enable KCFI"
      
      * tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (29 commits)
        uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be}
        stackleak: Use a copy of the ctl_table argument
        string: Add additional __realloc_size() annotations for "dup" helpers
        kunit/fortify: Fix replaced failure path to unbreak __alloc_size
        hardening: Enable KCFI and some other options
        lkdtm: Disable CFI checking for perms functions
        kunit/fortify: Add memcpy() tests
        kunit/fortify: Do not spam logs with fortify WARNs
        kunit/fortify: Rename tests to use recommended conventions
        init: replace deprecated strncpy with strscpy_pad
        kunit/fortify: Fix mismatched kvalloc()/vfree() usage
        scsi: qla2xxx: Avoid possible run-time warning with long model_num
        scsi: mpi3mr: Avoid possible run-time warning with long manufacturer strings
        scsi: mptfusion: Avoid possible run-time warning with long manufacturer strings
        fs: ecryptfs: replace deprecated strncpy with strscpy
        hfsplus: refactor copy_name to not use strncpy
        reiserfs: replace deprecated strncpy with scnprintf
        virt: acrn: replace deprecated strncpy with strscpy
        ubsan: Avoid i386 UBSAN handler crashes with Clang
        ubsan: Remove 1-element array usage in debug reporting
        ...
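The summary above mentions a helper for copying strings that are not NUL-terminated. The idea can be sketched in userspace C as follows. `copy_field_to_cstr()` is a hypothetical name and signature chosen for illustration. It is not the kernel helper, whose interface is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a fixed-width field that may lack a trailing
 * NUL into `dst`, always NUL-terminating the result. Returns the number
 * of characters copied, excluding the terminator. */
static size_t copy_field_to_cstr(char *dst, size_t dst_size,
                                 const char *src, size_t src_size)
{
    assert(dst_size > 0); /* need room for at least the terminator */

    /* Stop at the first NUL if there is one, else use the whole field. */
    const char *end = memchr(src, '\0', src_size);
    size_t n = end ? (size_t)(end - src) : src_size;

    if (n > dst_size - 1)
        n = dst_size - 1; /* truncate to fit, leaving room for the NUL */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

Unlike `strncpy()`, this never reads past the source field and never leaves the destination unterminated, which is the class of bug the SCSI fixes in this pull address.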
    • Merge tag 'execve-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 92f74f7f
      Linus Torvalds authored
      Pull execve updates from Kees Cook:
      
       - Provide knob to change (previously fixed) coredump NOTES size
         (Allen Pais)
      
       - Add sched_prepare_exec tracepoint (Marco Elver)
      
       - Make /proc/$pid/auxv work under binfmt_elf_fdpic (Max Filippov)
      
       - Convert ARCH_HAVE_EXTRA_ELF_NOTES to proper Kconfig (Vignesh
         Balasubramanian)
      
       - Leave a gap between .bss and brk
      
      * tag 'execve-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        fs/coredump: Enable dynamic configuration of max file note size
        binfmt_elf_fdpic: fix /proc/<pid>/auxv
        binfmt_elf: Leave a gap between .bss and brk
        Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig
        tracing: Add sched_prepare_exec tracepoint
    • Merge tag 'seccomp-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 1ba58f1a
      Linus Torvalds authored
      Pull seccomp update from Kees Cook:
      
       - Prepare for sysctl table constification
      
      * tag 'seccomp-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        seccomp: Constify sysctl subhelpers
    • Merge tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux · 0c9f4ac8
      Linus Torvalds authored
      Pull block updates from Jens Axboe:
      
       - Add a partscan attribute in sysfs, fixing an issue with systemd
         relying on an internal interface that went away.
      
       - Attempt #2 at making long running discards interruptible. The
         previous attempt went into 6.9, but we ended up mostly reverting it
         as it had issues.
      
       - Remove old ida_simple API in bcache
      
       - Support for zoned write plugging, greatly improving the performance
         on zoned devices.
      
       - Remove the old throttle low interface, which has been experimental
         since 2017 and never made it beyond that and isn't being used.
      
       - Remove page->index debugging checks in brd, as it hasn't caught
         anything and prepares us for removing in struct page.
      
       - MD pull request from Song
      
       - Don't schedule block workers on isolated CPUs
      
      * tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux: (84 commits)
        blk-throttle: delay initialization until configuration
        blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW
        block: fix that util can be greater than 100%
        block: support to account io_ticks precisely
        block: add plug while submitting IO
        bcache: fix variable length array abuse in btree_iter
        bcache: Remove usage of the deprecated ida_simple_xx() API
        md: Revert "md: Fix overflow in is_mddev_idle"
        blk-lib: check for kill signal in ioctl BLKDISCARD
        block: add a bio_await_chain helper
        block: add a blk_alloc_discard_bio helper
        block: add a bio_chain_and_submit helper
        block: move discard checks into the ioctl handler
        block: remove the discard_granularity check in __blkdev_issue_discard
        block/ioctl: prefer different overflow check
        null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION()
        block: fix and simplify blkdevparts= cmdline parsing
        block: refine the EOF check in blkdev_iomap_begin
        block: add a partscan sysfs attribute for disks
        block: add a disk_has_partscan helper
        ...
    • Merge tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux · 9961a785
      Linus Torvalds authored
      Pull io_uring updates from Jens Axboe:
      
       - Greatly improve send zerocopy performance, by enabling coalescing of
         sent buffers.
      
         MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the
         io_uring side did not. In local testing, the crossover point for send
         zerocopy being faster is now around 3000 byte packets, and it
         performs better than the sync syscall variants as well.
      
         This feature relies on a shared branch with net-next, which was
         pulled into both branches.
      
       - Unification of how async preparation is done across opcodes.
      
         Previously, opcodes that required extra memory for async retry would
         allocate that as needed, using on-stack state until that was the
         case. If async retry was needed, the on-stack state was adjusted
         appropriately for a retry and then copied to the allocated memory.
      
         This led to some fragile and ugly code, particularly for read/write
         handling, and made storage retries more difficult than they needed to
         be. Allocate the memory upfront, as it's cheap from our pools, and
         use that state consistently both initially and also from the retry
         side.
      
       - Move away from using remap_pfn_range() for mapping the rings.
      
         This is really not the right interface to use and can cause lifetime
         issues or leaks. Additionally, it means the ring sq/cq arrays need to
         be physically contiguous, which can cause problems in production with
         larger rings when services are restarted, as memory can be very
         fragmented at that point.
      
         Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply
         the same treatment to mapped ring provided buffers. This also helps
         unify the code we have dealing with allocating and mapping memory.
      
         Hard to see in the diffstat as we're adding a few features as well,
         but this removes about 400 lines of code from the codebase.
      
       - Add support for bundles for send/recv.
      
         When used with provided buffers, bundles support sending or receiving
         more than one buffer at a time, improving the efficiency by only
         needing to call into the networking stack once for multiple sends or
         receives.
      
       - Tweaks for our accept operations, supporting both a DONTWAIT flag for
         skipping poll arm and retry if we can, and a POLLFIRST flag that the
         application can use to skip the initial accept attempt and rely
         purely on poll for triggering the operation. Both of these have
         identical flags on the receive side already.
      
       - Make the task_work ctx locking unconditional.
      
         We had various code paths here that would do a mix of lock/trylock
         and set the task_work state to whether or not it was locked. All of
         that goes away, we lock it unconditionally and get rid of the state
         flag indicating whether it's locked or not.
      
         The state struct still exists as an empty type, can go away in the
         future.
      
       - Add support for specifying NOP completion values, allowing it to be
         used for error handling testing.
      
       - Use set/test bit for io-wq worker flags. Not strictly needed, but
         also doesn't hurt and helps silence a KCSAN warning.
      
       - Cleanups for io-wq locking and work assignments, closing a tiny race
         where cancelations would not be able to find the work item reliably.
      
       - Misc fixes, cleanups, and improvements
      
      * tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits)
        io_uring: support to inject result for NOP
        io_uring: fail NOP if non-zero op flags is passed in
        io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
        io_uring/net: add IORING_ACCEPT_DONTWAIT flag
        io_uring/filetable: don't unnecessarily clear/reset bitmap
        io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
        io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
        io_uring: Require zeroed sqe->len on provided-buffers send
        io_uring/notif: disable LAZY_WAKE for linked notifs
        io_uring/net: fix sendzc lazy wake polling
        io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
        io_uring/rw: reinstate thread check for retries
        io_uring/notif: implement notification stacking
        io_uring/notif: simplify io_notif_flush()
        net: add callback for setting a ubuf_info to skb
        net: extend ubuf_info callback to ops structure
        io_uring/net: support bundles for recv
        io_uring/net: support bundles for send
        io_uring/kbuf: add helpers for getting/peeking multiple buffers
        io_uring/net: add provided buffer support for IORING_OP_SEND
        ...
    • Merge tag 'vfs-6.10.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · f4e8d802
      Linus Torvalds authored
      Pull vfs rw iterator updates from Christian Brauner:
       "The core fs signalfd, userfaultfd, and timerfd subsystems did still
        use f_op->read() instead of f_op->read_iter(). Convert them over since
        we should aim to get rid of f_op->read() at some point.
      
        Aside from that, io_uring and others want to mark files as FMODE_NOWAIT
        so they can make use of per-IO nonblocking hints to enable more
        efficient IO. Converting those users to f_op->read_iter() allows them
        to be marked with FMODE_NOWAIT"
      
      * tag 'vfs-6.10.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        signalfd: convert to ->read_iter()
        userfaultfd: convert to ->read_iter()
        timerfd: convert to ->read_iter()
        new helper: copy_to_iter_full()
    • Merge tag 'vfs-6.10.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ef31ea6c
      Linus Torvalds authored
      Pull netfs updates from Christian Brauner:
       "This reworks the netfslib writeback implementation so that pages read
        from the cache are written to the cache through ->writepages(),
        thereby allowing the fscache page flag to be retired.
      
        The reworking also:
      
         - builds on top of the new writeback_iter() infrastructure
      
         - makes it possible to use vectored write RPCs as discontiguous
           streams of pages can be accommodated
      
         - makes it easier to do simultaneous content crypto and stream
           division
      
         - provides support for retrying writes and re-dividing a stream
      
         - replaces the ->launder_folio() op, so that ->writepages() is used
           instead
      
         - uses mempools to allocate the netfs_io_request and
           netfs_io_subrequest structs to avoid allocation failure in the
           writeback path
      
        Some code that uses the fscache page flag is retained for
        compatibility purposes with nfs and ceph. The code is switched to
        using the synonymous private_2 label instead and marked with
        deprecation comments.
      
        The merge commit contains additional details on the new algorithm that
        I've left out of here as it would probably be excessively detailed.
      
        On top of the netfslib infrastructure this contains the work to
        convert cifs over to netfslib"
      
      * tag 'vfs-6.10.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
        cifs: Enable large folio support
        cifs: Remove some code that's no longer used, part 3
        cifs: Remove some code that's no longer used, part 2
        cifs: Remove some code that's no longer used, part 1
        cifs: Cut over to using netfslib
        cifs: Implement netfslib hooks
        cifs: Make add_credits_and_wake_if() clear deducted credits
        cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs
        cifs: Set zero_point in the copy_file_range() and remap_file_range()
        cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c
        cifs: Replace the writedata replay bool with a netfs sreq flag
        cifs: Make wait_mtu_credits take size_t args
        cifs: Use more fields from netfs_io_subrequest
        cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest
        cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest
        cifs: Use alternative invalidation to using launder_folio
        netfs, afs: Use writeback retry to deal with alternate keys
        netfs: Miscellaneous tidy ups
        netfs: Remove the old writeback code
        netfs: Cut over to using new writeback code
        ...
    • Merge tag 'vfs-6.10.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 103fb219
      Linus Torvalds authored
      Pull vfs mount API conversions from Christian Brauner:
       "This converts qnx6, minix, debugfs, tracefs, freevxfs, and openpromfs
        to the new mount api, further reducing the number of filesystems
        relying on the legacy mount api"
      
      * tag 'vfs-6.10.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        minix: convert minix to use the new mount api
        vfs: Convert tracefs to use the new mount API
        vfs: Convert debugfs to use the new mount API
        openpromfs: finish conversion to the new mount API
        freevxfs: Convert freevxfs to the new mount API.
        qnx6: convert qnx6 to use the new mount api
    • Merge tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 1b0aabcc
      Linus Torvalds authored
      Pull misc vfs updates from Christian Brauner:
       "This contains the usual miscellaneous features, cleanups, and fixes
        for vfs and individual fses.
      
        Features:
      
         - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That
           means we now have six free FMODE_* bits in total (but bit #6
           already got used for FMODE_WRITE_RESTRICTED)
      
         - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup)
      
         - Add fd_raw cleanup class so we can make use of automatic cleanup
           provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well
      
         - Optimize seq_puts()
      
         - Simplify __seq_puts()
      
         - Add new anon_inode_getfile_fmode() api to allow specifying f_mode
           instead of open-coding it in multiple places
      
         - Annotate struct file_handle with __counted_by() and use
           struct_size()
      
         - Warn in get_file() when f_count resurrection from zero is
           attempted (epoll/drm discussion)
      
         - Folio-sophize aio
      
         - Export the subvolume id in statx() for both btrfs and bcachefs
      
         - Relax linkat(AT_EMPTY_PATH) requirements
      
         - Add F_DUPFD_QUERY fcntl() to compare two file descriptors for
           dup*() equality, replacing kcmp() for that purpose
      
        Cleanups:
      
         - Compile out swapfile inode checks when swap isn't enabled
      
         - Use (1 << n) notation for FMODE_* bitshifts for clarity
      
         - Remove redundant variable assignment in fs/direct-io
      
         - Cleanup uses of strncpy in orangefs
      
         - Speed up and cleanup writeback
      
         - Move fsparam_string_empty() helper into header since it's currently
           open-coded in multiple places
      
         - Add kernel-doc comments to proc_create_net_data_write()
      
         - Don't needlessly read dentry->d_flags twice
      
        Fixes:
      
         - Fix out-of-range warning in nilfs2
      
         - Fix ecryptfs overflow due to wrong encryption packet size
           calculation
      
         - Fix overly long line in xfs file_operations (follow-up to FMODE_*
           cleanup)
      
         - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up
           to FMODE_* cleanup)
      
         - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_*
           cleanup)
      
         - Fix stable offset api to prevent endless loops
      
         - Fix afs file server rotations
      
         - Prevent xattr node from overflowing the eraseblock in jffs2
      
         - Move fdinfo PTRACE_MODE_READ procfs check into the .permission()
           operation instead of .open() operation since this caused userspace
           regressions"
      
      * tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
        afs: Fix fileserver rotation getting stuck
        selftests: add F_DUPDFD_QUERY selftests
        fcntl: add F_DUPFD_QUERY fcntl()
        file: add fd_raw cleanup class
        fs: WARN when f_count resurrection is attempted
        seq_file: Simplify __seq_puts()
        seq_file: Optimize seq_puts()
        proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
        fs: Create anon_inode_getfile_fmode()
        xfs: don't call xfs_file_open from xfs_dir_open
        xfs: drop fop_flags for directories
        xfs: fix overly long line in the file_operations
        shmem: Fix shmem_rename2()
        libfs: Add simple_offset_rename() API
        libfs: Fix simple_offset_rename_exchange()
        jffs2: prevent xattr node from overflowing the eraseblock
        vfs, swap: compile out IS_SWAPFILE() on swapless configs
        vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
        fs/direct-io: remove redundant assignment to variable retval
        fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading
        ...
      1b0aabcc
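The F_DUPFD_QUERY item in the merge above can be illustrated from user space. This is a minimal sketch, not taken from the series itself: the helper name fds_share_file() is hypothetical, and the fallback value of F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3) is an assumption for libc headers that predate 6.10. On kernels without the command, fcntl() is expected to fail, so callers should treat -1 as "unknown".

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed fallback for headers older than 6.10: F_LINUX_SPECIFIC_BASE + 3. */
#ifndef F_DUPFD_QUERY
#define F_DUPFD_QUERY (1024 + 3)
#endif

/*
 * Hypothetical helper: asks the kernel whether fd_a and fd_b refer to the
 * same open file description (i.e. one was produced by dup*() of the other).
 * Returns 1 if they do, 0 if they do not, and -1 if the query failed, for
 * example because the running kernel does not know F_DUPFD_QUERY.
 */
static int fds_share_file(int fd_a, int fd_b)
{
	assert(fd_a >= 0 && fd_b >= 0);
	return fcntl(fd_a, F_DUPFD_QUERY, fd_b);
}
```

A caller would typically fall back to kcmp() or another strategy when the helper returns -1.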
    • Merge tag 'vfs-6.10.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · c117a437
      Linus Torvalds authored
      Pull vfs iomap updates from Christian Brauner:
       "This contains a few cleanups to the iomap code. Nothing particularly
        stands out"
      
      * tag 'vfs-6.10.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        iomap: do some small logical cleanup in buffered write
        iomap: make iomap_write_end() return a boolean
        iomap: use a new variable to handle the written bytes in iomap_write_iter()
        iomap: don't increase i_size if it's not a write operation
        iomap: drop the write failure handles when unsharing and zeroing
        iomap: convert iomap_writepages to writeack_iter
      c117a437
    • Merge tag 'docs-6.10' of git://git.lwn.net/linux · 8815da98
      Linus Torvalds authored
      Pull documentation updates from Jonathan Corbet:
       "Another not-too-busy cycle for documentation, including:
      
         - Some build-system changes to detect the variable fonts installed by
           some distributions that can break the PDF build.
      
         - Various updates and additions to the Spanish, Chinese, Italian, and
           Japanese translations.
      
         - Update the stable-kernel rules to match modern practice
      
        ... and the usual array of corrections, updates, and typo fixes"
      
      * tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits)
        cgroup: Add documentation for missing zswap memory.stat
        kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning.
        docs:core-api: fixed typos and grammar in printk-index page
        Documentation: tracing: Fix spelling mistakes
        docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4
        docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4
        docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4
        docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4
        docs: stable-kernel-rules: fix typo sent->send
        docs/zh_CN: remove two inconsistent spaces
        docs: scripts/check-variable-fonts.sh: Improve commands for detection
        docs: stable-kernel-rules: create special tag to flag 'no backporting'
        docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.)
        docs: stable-kernel-rules: remove code-labels tags and a indention level
        docs: stable-kernel-rules: call mainline by its name and change example
        docs: stable-kernel-rules: reduce redundancy
        docs, kprobes: Add riscv as supported architecture
        Docs: typos/spelling
        docs: kernel_include.py: Cope with docutils 0.21
        docs: ja_JP/howto: Catch up update in v6.8
        ...
      8815da98
    • Merge tag 'keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd · 25c73642
      Linus Torvalds authored
      Pull keys updates from Jarkko Sakkinen:
      
       - do not overwrite the key expiration once it is set
      
       - move key quota updates earlier into key_put(), instead of updating
         them in key_gc_unused_keys()
      
      * tag 'keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
        keys: Fix overwrite of key expiration on instantiation
        keys: update key quotas in key_put()
      25c73642
    • Merge tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd · b1923914
      Linus Torvalds authored
      Pull TPM updates from Jarkko Sakkinen:
       "These are the changes for the TPM driver with a single major new
        feature: TPM bus encryption and integrity protection. The key pair on
        the TPM side is generated from the so-called null random seed on each
        power-on of the machine [1]. This supports TPM encryption of the hard
        drive by adding a layer of protection against bus interposer attacks.
      
        Other than that, a few minor fixes and documentation for tpm_tis to
        clarify basics of TPM localities for future patch review discussions
        (will be extended and refined over time, just a seed)"
      
      Link: https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/ [1]
      
      * tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (28 commits)
        Documentation: tpm: Add TPM security docs toctree entry
        tpm: disable the TPM if NULL name changes
        Documentation: add tpm-security.rst
        tpm: add the null key name as a sysfs export
        KEYS: trusted: Add session encryption protection to the seal/unseal path
        tpm: add session encryption protection to tpm2_get_random()
        tpm: add hmac checks to tpm2_pcr_extend()
        tpm: Add the rest of the session HMAC API
        tpm: Add HMAC session name/handle append
        tpm: Add HMAC session start and end functions
        tpm: Add TCG mandated Key Derivation Functions (KDFs)
        tpm: Add NULL primary creation
        tpm: export the context save and load commands
        tpm: add buffer function to point to returned parameters
        crypto: lib - implement library version of AES in CFB mode
        KEYS: trusted: tpm2: Use struct tpm_buf for sized buffers
        tpm: Add tpm_buf_read_{u8,u16,u32}
        tpm: TPM2B formatted buffers
        tpm: Store the length of the tpm_buf data separately.
        tpm: Update struct tpm_buf documentation comments
        ...
      b1923914
    • Merge tag 'keys-trusted-next-6.10-rc1' of... · c0248148
      Linus Torvalds authored
      Merge tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
      
      Pull trusted keys updates from Jarkko Sakkinen:
       "This contains a new key type for the Data Co-Processor (DCP), which is
        an IP core built into many NXP SoCs such as i.mx6ull"
      
      * tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
        docs: trusted-encrypted: add DCP as new trust source
        docs: document DCP-backed trusted keys kernel params
        MAINTAINERS: add entry for DCP-based trusted keys
        KEYS: trusted: Introduce NXP DCP-backed trusted keys
        KEYS: trusted: improve scalability of trust source config
        crypto: mxs-dcp: Add support for hardware-bound keys
      c0248148
    • Merge tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · cd97950c
      Linus Torvalds authored
      Pull slab updates from Vlastimil Babka:
       "This time it's mostly random cleanups and fixes, with two performance
        fixes that might have significant impact, but limited to systems
        experiencing particular bad corner case scenarios rather than general
        performance improvements.
      
        The memcg hook changes are going through the mm tree due to
        dependencies.
      
         - Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)
      
           This fixes the long-standing problem that can happen with workloads
           that have alloc/free patterns resulting in many partially used
           slabs (in e.g. dentry cache). Reading /proc/slabinfo will traverse
           the long partial slab list under spinlock with disabled irqs and
           thus can stall other processes or even trigger the lockup
           detection. The traversal is only done to count free objects so that
           <active_objs> column can be reported along with <num_objs>.
      
           To avoid affecting fast paths with another shared counter
           (attempted in the past) or complex partial list traversal schemes
           that allow rescheduling, the chosen solution resorts to
           approximation - when the partial list is over 10000 slabs long, we
           will only traverse the first 5000 slabs from the head and the tail
           each and use the average of those to estimate the whole list. Both
           head and tail are used as the slabs near the head tend to have more
           free objects than the slabs towards the tail.
      
           It is expected the approximation should not break existing
           /proc/slabinfo consumers. The <num_objs> field is still accurate
           and reflects the overall kmem_cache footprint. The <active_objs>
           was already imprecise due to cpu and percpu-partial slabs, so can't
           be relied upon to determine exact cache usage. The difference
           between <active_objs> and <num_objs> is mainly useful to determine
           the slab fragmentation, and that will be possible even with the
           approximation in place.
      
         - Prevent allocating many slabs when a NUMA node is full (Chen Jun)
      
           Currently, on NUMA systems with a node under significantly bigger
           pressure than other nodes, the fallback strategy may result in each
           kmalloc_node() that can't be satisfied from the preferred node
           allocating a new slab on a fallback node instead of reusing the
           slabs already on that node's partial list.
      
           This is now fixed and partial lists of fallback nodes are checked
           even for kmalloc_node() allocations. It's still preferred to
           allocate a new slab on the requested node before a fallback, but
           only with a GFP_NOWAIT attempt, which will fail quickly when the
           node is under a significant memory pressure.
      
         - More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)
      
         - Fix slub_kunit self-test with hardened freelists (Guenter Roeck)
      
         - Mark racy accesses for KCSAN (linke li)
      
         - Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)"
      
      * tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm/slub: remove the check for NULL kmalloc_caches
        mm/slub: create kmalloc 96 and 192 caches regardless cache size order
        mm/slub: mark racy access on slab->freelist
        slub: use count_partial_free_approx() in slab_out_of_memory()
        slub: introduce count_partial_free_approx()
        slub: Set __GFP_COMP in kmem_cache by default
        mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()
        mm/slub: correct comment in do_slab_free()
        mm/slub, kunit: Use inverted data to corrupt kmem cache
        mm/slub: simplify get_partial_node()
        mm/slub: add slub_get_cpu_partial() helper
        mm/slub: remove the check of !kmem_cache_has_cpu_partial()
        mm/slub: Reduce memory consumption in extreme scenarios
        mm/slub: mark racy accesses on slab->slabs
        mm/slub: remove dummy slabinfo functions
      cd97950c
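The /proc/slabinfo approximation described in the slab merge above can be sketched in user space. This is an illustration under stated assumptions, not the kernel's count_partial_free_approx(): the partial list is modelled as a plain array (head first), the function name count_free_approx() and the constant name are chosen for this sketch, and any clamping against the total object count is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Threshold quoted in the pull message; the name is illustrative. */
#define MAX_PARTIAL_TO_SCAN 10000

/*
 * free_objs[i] holds the number of free objects in the i-th slab of a
 * partial list, head first. Short lists are counted exactly. Long lists
 * are estimated: scan 5000 slabs from the head and 5000 from the tail,
 * then scale the sampled sum by (list length / slabs scanned).
 */
static unsigned long count_free_approx(const unsigned int *free_objs,
				       size_t nr_partial)
{
	unsigned long long sum = 0;
	size_t scanned = 0;
	size_t i;

	assert(free_objs != NULL || nr_partial == 0);

	if (nr_partial <= MAX_PARTIAL_TO_SCAN) {
		for (i = 0; i < nr_partial; i++)
			sum += free_objs[i];
		return (unsigned long)sum;
	}

	/* Sample from the head of the list. */
	for (i = 0; i < MAX_PARTIAL_TO_SCAN / 2; i++, scanned++)
		sum += free_objs[i];

	/* Sample from the tail of the list. */
	for (i = 0; i < MAX_PARTIAL_TO_SCAN / 2; i++, scanned++)
		sum += free_objs[nr_partial - 1 - i];

	/* Average per sampled slab, extrapolated to the whole list. */
	return (unsigned long)(sum * nr_partial / scanned);
}
```

Sampling both ends matters because, as the message notes, slabs near the head tend to have more free objects than those near the tail, so a head-only sample would overestimate.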
    • Merge tag 'kcsan.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · c07ea940
      Linus Torvalds authored
      Pull kcsan update from Paul McKenney:
       "Introduce __data_racy type qualifier
      
        This adds a __data_racy type qualifier that enables kernel developers
        to inform KCSAN that a given variable is a shared variable without
        needing to mark each and every access.
      
        This allows pre-KCSAN code to be correctly (if approximately)
        instrumented with very little effort, and also provides people
        reading the code a clear indication that the variable is in fact
        shared.
      
        In addition, it permits incremental transition to per-access KCSAN
        marking, so that (for example) a given subsystem can be transitioned
        one variable at a time, while avoiding large numbers of KCSAN warnings
        during this transition"
      
      * tag 'kcsan.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        kcsan, compiler_types: Introduce __data_racy type qualifier
      c07ea940
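How a __data_racy annotation might look on a struct field can be sketched as follows. The macro definition here is a user-space stand-in and an assumption (volatile when a thread sanitizer build is active, empty otherwise); the struct conn_stats and the helper conn_stats_bump() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for the kernel qualifier, assumed for this sketch: it expands to
 * volatile under a thread sanitizer build and to nothing otherwise.
 */
#ifdef __SANITIZE_THREAD__
#define __data_racy volatile
#else
#define __data_racy
#endif

/* Hypothetical structure with one intentionally racy statistics counter. */
struct conn_stats {
	int __data_racy rx_packets;	/* shared; every access may race */
	int id;				/* ordinary field, checked as usual */
};

/*
 * Plain increment with no per-access marking: the qualifier on the field
 * declaration is what tells the race detector to ignore these accesses.
 */
static void conn_stats_bump(struct conn_stats *s)
{
	assert(s != NULL);
	s->rx_packets++;
}
```

The point of the qualifier is that the declaration carries the annotation once, so call sites such as conn_stats_bump() need no individual marking.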
    • Merge tag 'lkmm.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · ee202601
      Linus Torvalds authored
      Pull LKMM documentation updates from Paul McKenney:
       "This upgrades LKMM documentation, perhaps most notably adding a number
        of litmus tests illustrating cmpxchg() ordering properties.
      
        TL;DR: Failing cmpxchg() operations provide no ordering"
      
      * tag 'lkmm.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        Documentation/litmus-tests: Make cmpxchg() tests safe for klitmus
        Documentation/atomic_t: Emphasize that failed atomic operations give no ordering
        Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
        Documentation/litmus-tests: Add locking tests to README
      ee202601
    • Merge tag 'cmpxchg.2024.05.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 2e57d1d6
      Linus Torvalds authored
      Pull cmpxchg updates from Paul McKenney:
       "Provide one-byte and two-byte cmpxchg() support on sparc32, parisc,
        and csky
      
        This provides native one-byte and two-byte cmpxchg() support for
        sparc32 and parisc, courtesy of Al Viro. This support is provided by
        the same hashed-array-of-locks technique used for the other atomic
        operations provided for these two platforms.
      
        There is also emulated one-byte cmpxchg() support for csky using a new
        cmpxchg_emu_u8() function that uses a four-byte cmpxchg() to emulate
        the one-byte variant.
      
        Similar patches for emulation of one-byte cmpxchg() for arc, sh, and
        xtensa have not yet received maintainer acks, so they are slated for
        the v6.11 merge window"
      
      * tag 'cmpxchg.2024.05.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        csky: Emulate one-byte cmpxchg
        lib: Add one-byte emulation function
        parisc: add u16 support to cmpxchg()
        parisc: add missing export of __cmpxchg_u8()
        parisc: unify implementations of __cmpxchg_u{8,32,64}
        parisc: __cmpxchg_u32(): lift conversion into the callers
        sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes
        sparc32: unify __cmpxchg_u{32,64}
        sparc32: make the first argument of __cmpxchg_u64() volatile u64 *
        sparc32: make __cmpxchg_u32() return u32
      2e57d1d6
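The idea of emulating a one-byte cmpxchg() with a four-byte one, as described in the merge above, can be sketched in user space. This is not the kernel's cmpxchg_emu_u8(): the function name cmpxchg_emu_u8_sketch() is made up for this example, the GCC/Clang __atomic builtins stand in for the kernel's cmpxchg(), and the sketch assumes the target byte lives inside a naturally aligned 32-bit object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* View of one aligned 32-bit word as four individually addressable bytes. */
union u8_32 {
	uint8_t b[4];
	uint32_t w;
};

/*
 * Emulated one-byte compare-and-exchange built on a four-byte one.
 * Returns the byte value observed at *p; the store took place if and only
 * if the return value equals old.
 */
static uint8_t cmpxchg_emu_u8_sketch(uint8_t *p, uint8_t old, uint8_t new_val)
{
	uint32_t *p32;
	unsigned int i;
	union u8_32 old32, new32;

	assert(p != NULL);
	p32 = (uint32_t *)((uintptr_t)p & ~(uintptr_t)0x3);	/* containing word */
	i = (unsigned int)((uintptr_t)p & 0x3);		/* byte index in it */

	old32.w = __atomic_load_n(p32, __ATOMIC_RELAXED);
	for (;;) {
		if (old32.b[i] != old)
			return old32.b[i];	/* mismatch: fail without storing */

		new32.w = old32.w;
		new32.b[i] = new_val;

		/*
		 * On failure the builtin refreshes old32.w with the current
		 * word, e.g. when a neighbouring byte changed, and we retry.
		 */
		if (__atomic_compare_exchange_n(p32, &old32.w, new32.w, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_RELAXED))
			return old;
	}
}
```

The retry loop is needed because the four-byte operation can fail even when the target byte still matches, simply because one of the other three bytes in the word was modified concurrently.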