1. 08 Jan, 2024 10 commits
    • Merge tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 41a80ca4
      Linus Torvalds authored
      Pull misc x86 updates from Borislav Petkov:
      
       - Add an informational message which gets issued when IA32 emulation
         has been disabled on the cmdline
      
       - Clarify in detail how /proc/cpuinfo is used on x86
      
       - Fix a theoretical overflow in num_digits()
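A minimal sketch of an overflow-free digit count, assuming the helper counts a leading '-' as one character for negative values. num_digits_sketch() is a hypothetical name and this is not the code from the patch. It divides the value down instead of multiplying a power of ten up, so no intermediate can exceed the input's range, and it widens before negating so INT_MIN is safe.

```c
/*
 * Hypothetical helper: number of characters needed to print val in
 * decimal, counting a leading '-' for negative values.
 */
int num_digits_sketch(int val)
{
	long long v = val;	/* widen first so negating INT_MIN cannot overflow */
	unsigned long long mag;
	int d = 1;

	if (v < 0) {
		d++;		/* room for the minus sign */
		v = -v;
	}
	mag = (unsigned long long)v;

	/* Shrink the value instead of growing a power of ten: nothing overflows. */
	while (mag >= 10) {
		mag /= 10;
		d++;
	}
	return d;
}
```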
      
      * tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ia32: State that IA32 emulation is disabled
        Documentation/x86: Document what /proc/cpuinfo is for
        x86/lib: Fix overflow when counting digits
    • Merge tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6e0b9391
      Linus Torvalds authored
      Pull x86 microcode updates from Borislav Petkov:
      
       - Correct minor issues after the microcode revision reporting
         sanitization
      
      * tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode/intel: Set new revision only after a successful update
        x86/microcode/intel: Remove redundant microcode late updated message
    • Merge tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 1dee7f50
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
      
       - The EDAC drivers' part of the effort to make the ->remove() platform
         driver callback return void
      
       - Add support for AMD AI accelerators
      
       - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P,
         Meteor Lake-{P,PS}
      
       - Random fixes and cleanups all over the place
      
      * tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (39 commits)
        EDAC/skx_common: Filter out the invalid address
        EDAC, pnd2: Sort headers alphabetically
        EDAC, pnd2: Correct misleading error message in mk_region_mask()
        EDAC, pnd2: Apply bit macros and helpers where it makes sense
        EDAC, pnd2: Replace custom definition by one from sizes.h
        EDAC/igen6: Add Intel Meteor Lake-P SoCs support
        EDAC/igen6: Add Intel Meteor Lake-PS SoCs support
        EDAC/igen6: Add Intel Raptor Lake-P SoCs support
        EDAC/igen6: Add Intel Alder Lake-N SoCs support
        EDAC/igen6: Make get_mchbar() helper function
        EDAC/amd64: Add support for family 0x19, models 0x90-9f devices
        EDAC/mc: Add support for HBM3 memory type
        EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer
        EDAC/armada_xp: Explicitly include correct DT includes
        EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals
        EDAC/thunderx: Fix possible out-of-bounds string access
        EDAC/fsl_ddr: Convert to platform remove callback returning void
        EDAC/zynqmp: Convert to platform remove callback returning void
        EDAC/xgene: Convert to platform remove callback returning void
        EDAC/ti: Convert to platform remove callback returning void
        ...
    • Merge tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 5db8752c
      Linus Torvalds authored
      Pull vfs iov_iter cleanups from Christian Brauner:
       "This contains a minor cleanup. The patches drop an unused argument
        from import_single_range(), which allows replacing import_single_range()
        with import_ubuf() and dropping import_single_range() completely"
      
      * tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        iov_iter: replace import_single_range() with import_ubuf()
        iov_iter: remove unused 'iov' argument from import_single_range()
    • Merge tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 26458409
      Linus Torvalds authored
      Pull vfs cachefiles updates from Christian Brauner:
       "This contains improvements for on-demand cachefiles.
      
        If the daemon crashes and the on-demand cachefiles fd is unexpectedly
        closed, in-flight requests and subsequent read operations associated
        with the fd will fail with EIO. This causes issues in various
        scenarios as this failure is currently unrecoverable.
      
        The work contained in this pull request introduces a failover mode and
        enables the daemon to recover in-flight request-related objects. A
        restarted daemon will be able to process requests as usual.
      
        This requires that in-flight requests are stored during a daemon crash
        or while the daemon is offline. In addition, a handle to
        /dev/cachefiles needs to be stored.
      
        This can be done by e.g., systemd's fdstore (cf. [1]) which enables
        the restarted daemon to recover state.
      
        Three new states are introduced in this patchset:
      
         (1) CLOSE
             Object is closed by the daemon.
      
         (2) OPEN
             Object is open and ready for processing. IOW, the open request
             has been handled successfully.
      
         (3) REOPENING
             Object has been previously closed and is now reopened due to a
             read request.
      
        A restarted daemon can recover the /dev/cachefiles fd from systemd's
        fdstore and write "restore" to the device. This causes the object
        state to be reset from CLOSE to REOPENING and reinitializes the
        object.
      
        The daemon may now handle the open request. Any in-flight operations
        are restored and handled avoiding interruptions for users"
      
      Link: https://systemd.io/FILE_DESCRIPTOR_STORE [1]
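The three object states and the restore path described above can be modeled as a tiny state machine. This is a sketch under stated assumptions: the enum values, struct od_object and the od_* functions are hypothetical names, not the identifiers used in fs/cachefiles, and the counter merely stands in for resending an open request to the daemon.

```c
/* Hypothetical model of the on-demand object states (CLOSE, OPEN, REOPENING). */
enum obj_state { OBJ_CLOSE, OBJ_OPEN, OBJ_REOPENING };

struct od_object {
	enum obj_state state;
	int open_requests_sent;	/* open requests queued for the daemon */
};

/* The daemon closed the object. */
void od_close(struct od_object *o)
{
	o->state = OBJ_CLOSE;
}

/* A read arrives: a closed object must be reopened before it is served. */
void od_read(struct od_object *o)
{
	if (o->state == OBJ_CLOSE) {
		o->state = OBJ_REOPENING;
		o->open_requests_sent++;	/* resend the open request */
	}
}

/* The daemon handled the open request successfully. */
void od_open_done(struct od_object *o)
{
	o->state = OBJ_OPEN;
}

/* A restarted daemon wrote "restore": closed objects become reopenable. */
void od_restore(struct od_object *o)
{
	if (o->state == OBJ_CLOSE)
		o->state = OBJ_REOPENING;
}
```

A second read on an object that is already REOPENING does not queue another open request, which matches the idea that in-flight work is resumed rather than duplicated.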
      
      * tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        cachefiles: add restore command to recover inflight ondemand read requests
        cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
        cachefiles: resend an open request if the read request's object is closed
        cachefiles: extract ondemand info field from cachefiles_object
        cachefiles: introduce object ondemand state
    • Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · bb93c5ed
      Linus Torvalds authored
      Pull vfs rw updates from Christian Brauner:
       "This contains updates from Amir for read-write backing file helpers
        for stacking filesystems such as overlayfs:
      
         - Fanotify is currently in the process of introducing pre-content
           events. Roughly, a new permission event will be added indicating
           that it is safe to write to the file being accessed. These events
           are used by hierarchical storage managers to e.g., fill the content
           of files on first access.
      
           During that work we noticed that our current permission checking is
           inconsistent in rw_verify_area() and remap_verify_area().
           Especially in the splice code, permission checking is done multiple
           times. For example, one time for the whole range and then again for
           partial ranges inside the iterator.
      
           In addition, we mostly do permission checking before we call
           file_start_write() except for a few places where we call it after.
           For pre-content events we need such permission checking to be done
           before file_start_write(). So this is a nice reason to clean this
           all up.
      
           After this series, all permission checking is done before
           file_start_write().
      
           As part of this cleanup we also massaged the splice code a bit. We
           got rid of a few helpers because we are already drowning in special
           read-write helpers. We also cleaned up the return types for splice
           helpers.
      
         - Introduce generic read-write helpers for backing files. This lifts
           some overlayfs code to common code so it can be used by the FUSE
           passthrough work coming in over the next cycles. Make Amir and
           Miklos the maintainers for this new subsystem of the vfs"
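The ordering rule above (check permission once for the whole range, and do it before file_start_write()) can be sketched with stand-in hooks that record the order in which they run. verify_area(), start_write(), write_chunk() and end_write() here are hypothetical stubs, not the kernel's rw_verify_area() or file_start_write(); the point is only the call sequence.

```c
#include <stddef.h>
#include <string.h>

/* Records hook order: P=permission, S=start_write, W=write, E=end_write. */
static char trace[32];

static void log_step(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

/* Hypothetical stubs standing in for the real VFS hooks. */
static int verify_area(size_t pos, size_t len)
{
	(void)pos;
	(void)len;
	log_step("P");
	return 0;	/* 0 = permitted */
}

static void start_write(void) { log_step("S"); }
static void write_chunk(void) { log_step("W"); }
static void end_write(void) { log_step("E"); }

/* One permission check for the whole range, done before start_write(). */
int sketch_write(size_t pos, size_t len, size_t chunk)
{
	int err;

	if (chunk == 0)
		return -1;
	err = verify_area(pos, len);
	if (err)
		return err;
	start_write();
	for (size_t done = 0; done < len; done += chunk)
		write_chunk();	/* partial ranges are not re-checked */
	end_write();
	return 0;
}
```

Writing 10 bytes in 4-byte chunks yields the trace "PSWWWE": a single permission event, always ahead of the write-start marker.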
      
      * tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
        fs: fix __sb_write_started() kerneldoc formatting
        fs: factor out backing_file_mmap() helper
        fs: factor out backing_file_splice_{read,write}() helpers
        fs: factor out backing_file_{read,write}_iter() helpers
        fs: prepare for stackable filesystems backing file helpers
        fsnotify: optionally pass access range in file permission hooks
        fsnotify: assert that file_start_write() is not held in permission hooks
        fsnotify: split fsnotify_perm() into two hooks
        fs: use splice_copy_file_range() inline helper
        splice: return type ssize_t from all helpers
        fs: use do_splice_direct() for nfsd/ksmbd server-side-copy
        fs: move file_start_write() into direct_splice_actor()
        fs: fork splice_file_range() from do_splice_direct()
        fs: create {sb,file}_write_not_started() helpers
        fs: create file_write_started() helper
        fs: create __sb_write_started() helper
        fs: move kiocb_start_write() into vfs_iocb_iter_write()
        fs: move permission hook out of do_iter_read()
        fs: move permission hook out of do_iter_write()
        fs: move file_start_write() into vfs_iter_write()
        ...
    • Merge tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 8c9440fe
      Linus Torvalds authored
      Pull vfs mount updates from Christian Brauner:
       "This contains the work to retrieve detailed information about mounts
        via two new system calls. This is hopefully the beginning of the end
        of the saga that started with fsinfo() years ago.
      
        The LWN articles in [1] and [2] can serve as a summary so we can avoid
        rehashing everything here.
      
        At LSFMM in May 2022 we got into a room and agreed on what we want to
        do about fsinfo(). Basically, split it into pieces. This is the first
        part of that agreement. Specifically, it is concerned with retrieving
        information about mounts. So this only concerns the mount information
        retrieval, not the mount table change notification, or the extended
        filesystem specific mount option work. That is separate work.
      
        Currently mounts have a 32bit id. Mount ids are already in heavy use
        by libmount and other low-level userspace but they can't be relied
        upon because they're recycled very quickly. We agreed that mounts
        should carry a unique 64bit id by which they can be referenced
        directly. This is now implemented as part of this work.
      
        The new 64bit mount id is exposed in statx() through the new
        STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
        returned. If it is raised and the kernel supports the new 64bit mount
        id the flag is raised in the result mask and the new 64bit mount id is
        returned. New and old mount ids do not overlap so they cannot be
        conflated.
      
        Two new system calls are introduced that operate on the 64bit mount
        id: statmount() and listmount(). A summary of the api and usage can be
        found on LWN as well (cf. [3]) but of course, I'll provide a summary
        here as well.
      
        Both system calls rely on struct mnt_id_req, which is the request
        struct used to pass the 64bit mount id identifying the mount to
        operate on. It is extensible to allow for the addition of new
        parameters and for future use in other apis that make use of mount
        ids.
      
        statmount() mimics the semantics of statx() and exposes a set of flags
        that userspace may raise in mnt_id_req to request specific information
        to be retrieved. A statmount() call returns a struct statmount filled
        in with information about the requested mount. Supported requests are
        indicated by raising the request flag passed in struct mnt_id_req in
        the @mask field of struct statmount.
      
        Currently we do support:
      
         - STATMOUNT_SB_BASIC:
           Basic filesystem info
      
         - STATMOUNT_MNT_BASIC:
           Mount information (mount id, parent mount id, mount attributes etc)

         - STATMOUNT_PROPAGATE_FROM:
           Propagation from what mount in current namespace

         - STATMOUNT_MNT_ROOT:
           Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla)

         - STATMOUNT_MNT_POINT:
           Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt)

         - STATMOUNT_FS_TYPE:
           Name of the filesystem type as the magic number isn't enough due to submounts
      
        The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE
        are appended to the end of the struct. Userspace can use the offsets
        in @fs_type, @mnt_root, and @mnt_point to reference those strings
        easily.
      
        The struct statmount reserves quite a bit of space currently for
        future extensibility. This isn't really a problem and if this bothers
        us we can just send a follow-up pull request during this cycle.
      
        listmount() is given a 64bit mount id via mnt_id_req just as
        statmount(). It takes a buffer and a size to return an array of the
        64bit ids of the child mounts of the requested mount. Userspace can
        thus choose to either retrieve child mounts for a mount in batches or
        iterate through the child mounts. For most use-cases it will be
        sufficient to just leave space for a few child mounts. But for big
        mount tables having an iterator is really helpful. Iterating through a
        mount table works by setting @param in mnt_id_req to the mount id of
        the last child mount retrieved in the previous listmount() call"
      
      Link: https://lwn.net/Articles/934469 [1]
      Link: https://lwn.net/Articles/829212 [2]
      Link: https://lwn.net/Articles/950569 [3]
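The statmount() pull text says the STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE strings are appended after the struct and referenced by offsets. A sketch of resolving such an offset defensively, assuming the offsets are relative to the start of the trailing string area. struct statmount_sketch is a hypothetical, trimmed-down stand-in for the real struct statmount in <linux/mount.h>, and the SM_SKETCH_* bit values are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative request bits; use the real STATMOUNT_* from <linux/mount.h>. */
#define SM_SKETCH_MNT_ROOT	0x08u
#define SM_SKETCH_MNT_POINT	0x10u
#define SM_SKETCH_FS_TYPE	0x20u

/* Hypothetical stand-in: just enough fields to show offset-based strings. */
struct statmount_sketch {
	uint32_t size;		/* total bytes returned, including strings */
	uint32_t fs_type;	/* offset of fs type name in str[] */
	uint64_t mask;		/* which requests were answered */
	uint32_t mnt_root;	/* offset of mount root path in str[] */
	uint32_t mnt_point;	/* offset of mount point path in str[] */
	char str[];		/* NUL-terminated strings packed back to back */
};

/* Return the string at off if request was answered and the offset is sane. */
const char *sm_sketch_str(const struct statmount_sketch *sm,
			  uint64_t request, uint32_t off)
{
	size_t hdr = offsetof(struct statmount_sketch, str);

	if (!(sm->mask & request))
		return NULL;		/* field was not filled in */
	if (sm->size <= hdr || off >= sm->size - hdr)
		return NULL;		/* offset past the returned data */
	if (!memchr(sm->str + off, '\0', sm->size - hdr - off))
		return NULL;		/* unterminated string */
	return sm->str + off;
}
```

Checking the answered-requests mask first mirrors the statx()-like contract: an offset is only meaningful if the matching request bit came back set.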
      
      * tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        add selftest for statmount/listmount
        fs: keep struct mnt_id_req extensible
        wire up syscalls for statmount/listmount
        add listmount(2) syscall
        statmount: simplify string option retrieval
        statmount: simplify numeric option retrieval
        add statmount(2) syscall
        namespace: extract show_path() helper
        mounts: keep list of mounts in an rbtree
        add unique mount ID
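The listmount() iteration scheme in the vfs-6.8.mount pull (pass the last id seen back in @param) is a cursor loop. A sketch under stated assumptions: ids come back in ascending order, a return of zero means no children are left, and fake_list() is a hypothetical in-memory backend standing in for the real listmount(2) syscall, whose exact signature is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Backend contract: copy up to nr ids greater than last into out. */
typedef size_t (*list_fn)(uint64_t last, uint64_t *out, size_t nr, void *ctx);

/* Hypothetical in-memory mount table, sorted ascending. */
struct fake_mounts {
	const uint64_t *ids;
	size_t n;
};

size_t fake_list(uint64_t last, uint64_t *out, size_t nr, void *ctx)
{
	const struct fake_mounts *fm = ctx;
	size_t copied = 0;

	for (size_t i = 0; i < fm->n && copied < nr; i++)
		if (fm->ids[i] > last)
			out[copied++] = fm->ids[i];
	return copied;
}

/* Walk all child ids in small batches, resuming after the last id seen. */
size_t walk_mounts(list_fn list, void *ctx, uint64_t *all, size_t cap)
{
	uint64_t batch[4];	/* deliberately small to force several calls */
	uint64_t last = 0;	/* the @param cursor: last id already returned */
	size_t total = 0;

	for (;;) {
		size_t n = list(last, batch, 4, ctx);

		if (n == 0)
			break;		/* no children left */
		for (size_t i = 0; i < n && total < cap; i++)
			all[total++] = batch[i];
		last = batch[n - 1];	/* resume after this id next time */
	}
	return total;
}
```

A small batch keeps memory fixed no matter how large the mount table is, which is the trade-off the pull message describes between "leave space for a few child mounts" and iterating.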
    • Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 3f6984e7
      Linus Torvalds authored
      Pull vfs super updates from Christian Brauner:
       "This contains the super work for this cycle including the long-awaited
        series by Jan to make it possible to prevent writing to mounted block
        devices:
      
         - Writing to mounted devices is dangerous and can lead to filesystem
           corruption as well as crashes. Furthermore, syzbot comes up with more
           and more involved examples of how to corrupt a block device under a
           mounted filesystem, leading to kernel crashes and reports we can do
           nothing about. Add tracking of writers to each block device and a
           kernel cmdline argument which controls whether other writeable
           opens to block devices opened with the BLK_OPEN_RESTRICT_WRITES
           flag are allowed.
      
           Note that this effectively only prevents modification of the
           particular block device's page cache by other writers. The actual
           device content can still be modified by other means - e.g. by
           issuing direct scsi commands, by doing writes through devices lower
           in the storage stack (e.g. when loop devices, DM, or MD are
           involved), etc. But blocking direct modifications of the block
           device page cache is enough to give filesystems a chance to perform
           data validation when loading data from the underlying storage and
           thus prevent kernel crashes.
      
           Syzbot can use this cmdline option to avoid uninteresting
           crashes. Also users whose userspace setup does not need writing to
           mounted block devices can set this option for hardening. We expect
           that this will be interesting to quite a few workloads.
      
           Btrfs is currently opted out of this because they still haven't
           merged patches we require for this to work from three kernel
           releases ago.
      
         - Reimplement block device freezing and thawing as holder operations
           on the block device.
      
           This allows us to extend block device freezing to all devices
           associated with a superblock and not just the main device. It also
           allows us to remove get_active_super() and thus another function
           that scans the global list of superblocks.
      
           Freezing via additional block devices only works if the filesystem
           chooses to use @fs_holder_ops for these additional devices as well.
           That currently only includes ext4 and xfs.
      
           Earlier releases switched get_tree_bdev() and mount_bdev() to use
           @fs_holder_ops. The remaining nilfs2 open-coded version of
           mount_bdev() has been converted to rely on @fs_holder_ops as well.
           So block device freezing for the main block device will continue to
           work as before.
      
           There should be no regressions in functionality. The only special
           case is btrfs where block device freezing for the main block device
           never worked because sb->s_bdev isn't set. Block device freezing
           for btrfs can be fixed once they can switch to @fs_holder_ops but
           that can happen whenever they're ready"
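The writer-tracking policy in the first bullet of the vfs-6.8.super pull can be modeled as an open-time check. A sketch under stated assumptions: the SK_OPEN_* flags, struct bdev_sketch and allow_write_mounted are hypothetical stand-ins for BLK_OPEN_WRITE, BLK_OPEN_RESTRICT_WRITES, the per-device writer count and the cmdline switch, and -EBUSY is assumed as the refusal code. It models only the page-cache-level rule; as the text notes, writes through lower layers are not covered.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical open flags. */
#define SK_OPEN_WRITE			0x1
#define SK_OPEN_RESTRICT_WRITES		0x2

/* Hypothetical per-device state. */
struct bdev_sketch {
	int writers;		/* current writable opens */
	bool write_blocked;	/* a restricting opener (e.g. a mount) holds it */
};

/* Stand-in for the cmdline switch that permits writes to mounted devices. */
static bool allow_write_mounted = false;

int bdev_sketch_open(struct bdev_sketch *b, int flags)
{
	/* With the override enabled, restriction requests are ignored. */
	if (allow_write_mounted)
		flags &= ~SK_OPEN_RESTRICT_WRITES;

	if (flags & SK_OPEN_RESTRICT_WRITES) {
		if (b->writers > 0)
			return -EBUSY;	/* someone is already writing */
		b->write_blocked = true;
	} else if ((flags & SK_OPEN_WRITE) && b->write_blocked &&
		   !allow_write_mounted) {
		return -EBUSY;		/* other writer refused while restricted */
	}
	if (flags & SK_OPEN_WRITE)
		b->writers++;
	return 0;
}
```

Read-only opens are never affected, which is why the option is cheap to enable for hardening on setups that never write to mounted devices.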
      
      * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
        block: Fix a memory leak in bdev_open_by_dev()
        super: don't bother with WARN_ON_ONCE()
        super: massage wait event mechanism
        ext4: Block writes to journal device
        xfs: Block writes to log device
        fs: Block writes to mounted block devices
        btrfs: Do not restrict writes to btrfs devices
        block: Add config option to not allow writing to mounted devices
        block: Remove blkdev_get_by_*() functions
        bcachefs: Convert to bdev_open_by_path()
        fs: handle freezing from multiple devices
        fs: remove dead check
        nilfs2: simplify device handling
        fs: streamline thaw_super_locked
        ext4: simplify device handling
        xfs: simplify device handling
        fs: simplify setup_bdev_super() calls
        blkdev: comment fs_holder_ops
        porting: document block device freeze and thaw changes
        fs: remove unused helper
        ...
    • Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · c604110e
      Linus Torvalds authored
      Pull misc vfs updates from Christian Brauner:
       "This contains the usual miscellaneous features, cleanups, and fixes
        for vfs and individual fses.
      
        Features:
      
         - Add Jan Kara as VFS reviewer
      
         - Show correct device and inode numbers in /proc/<pid>/maps for vma
           files on stacked filesystems. This is now easily doable thanks to
           the backing file work from the last cycles. This comes with
           selftests
      
        Cleanups:
      
         - Remove a redundant might_sleep() from wait_on_inode()
      
         - Initialize pointer with NULL, not 0
      
         - Clarify comment on access_override_creds()
      
         - Rework and simplify eventfd_signal() and eventfd_signal_mask()
           helpers
      
         - Process aio completions in batches to avoid needless wakeups
      
         - Completely decouple struct mnt_idmap from namespaces. We now only
           keep the actual idmapping around and don't stash references to
           namespaces
      
         - Reformat maintainer entries to indicate that a given subsystem
           belongs to fs/
      
         - Simplify fput() for files that were never opened
      
         - Get rid of various pointless file helpers
      
         - Rename various file helpers
      
         - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from
           last cycle
      
         - Make relatime_need_update() return bool
      
         - Use GFP_KERNEL instead of GFP_USER when allocating superblocks
      
         - Replace deprecated ida_simple_*() calls with their current ida_*()
           counterparts
      
        Fixes:
      
         - Fix comments on user namespace id mapping helpers. They aren't
           kernel doc comments so they shouldn't be using /**
      
         - s/Retuns/Returns/g in various places
      
         - Add missing parameter documentation on can_move_mount_beneath()
      
         - Rename i_mapping->private_data to i_mapping->i_private_data
      
         - Fix a false-positive lockdep warning in pipe_write() for watch
           queues
      
         - Improve __fget_files_rcu() code generation to improve performance
      
         - Only notify writers that pipe resizing has finished after setting
           pipe->max_usage; otherwise writers are never notified that the pipe
           has been resized and hang
      
         - Fix some kernel docs in hfsplus
      
         - s/passs/pass/g in various places
      
         - Fix kernel docs in ntfs
      
         - Fix kcalloc() argument order reported by gcc 14
      
         - Fix uninitialized value in reiserfs"
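The kcalloc() fix in the vfs-6.8.misc pull is about argument order: like userspace calloc(nmemb, size), the element count comes first and the element size second (kcalloc() adds GFP flags as a third argument). GCC 14's -Wcalloc-transposed-args warns when a sizeof expression is passed as the count. A minimal userspace illustration; struct watch_sketch and alloc_watches() are hypothetical names.

```c
#include <stdlib.h>

struct watch_sketch {
	int id;
};	/* hypothetical element type */

/* Count first, element size second: calloc(n, sizeof(elem)). */
struct watch_sketch *alloc_watches(size_t n)
{
	return calloc(n, sizeof(struct watch_sketch));
}
```

Swapping the two arguments allocates the same number of bytes, which is why the bug is harmless at runtime and only the compiler warning exposes it.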
      
      * tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
        reiserfs: fix uninit-value in comp_keys
        watch_queue: fix kcalloc() arguments order
        ntfs: dir.c: fix kernel-doc function parameter warnings
        fs: fix doc comment typo fs tree wide
        selftests/overlayfs: verify device and inode numbers in /proc/pid/maps
        fs/proc: show correct device and inode numbers in /proc/pid/maps
        eventfd: Remove usage of the deprecated ida_simple_xx() API
        fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation
        fs/hfsplus: wrapper.c: fix kernel-doc warnings
        fs: add Jan Kara as reviewer
        fs/inode: Make relatime_need_update return bool
        pipe: wakeup wr_wait after setting max_usage
        file: remove __receive_fd()
        file: stop exposing receive_fd_user()
        fs: replace f_rcuhead with f_task_work
        file: remove pointless wrapper
        file: s/close_fd_get_file()/file_close_fd()/g
        Improve __fget_files_rcu() code generation (and thus __fget_light())
        file: massage cleanup of files that failed to open
        fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
        ...
    • asm-generic: make sparse happy with odd-sized put_unaligned_*() · 1ab33c03
      Dmitry Torokhov authored
      __put_unaligned_be24() and friends use implicit casts to convert
      larger-sized data to bytes, which trips sparse truncation warnings when
      the argument is a constant:
      
          CC [M]  drivers/input/touchscreen/hynitron_cstxxx.o
          CHECK   drivers/input/touchscreen/hynitron_cstxxx.c
        drivers/input/touchscreen/hynitron_cstxxx.c: note: in included file (through arch/x86/include/generated/asm/unaligned.h):
        include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (aa01a0 becomes a0)
        include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (aa01 becomes 1)
        include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (ab00d0 becomes d0)
        include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (ab00 becomes 0)
      
      To avoid this, let's mask off the upper bits explicitly. The resulting
      code should be exactly the same, but it will keep sparse happy.
      Reported-by: kernel test robot <lkp@intel.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Closes: https://lore.kernel.org/oe-kbuild-all/202401070147.gqwVulOn-lkp@intel.com/
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
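The explicit masking described in the put_unaligned fix can be shown on a 24-bit big-endian store. put_be24_sketch() is a hypothetical userspace analogue of __put_unaligned_be24(), not the kernel header itself. Each byte is masked with 0xff so the truncation is explicit; per the commit message the generated code should be the same as with implicit conversion, but sparse no longer reports constants losing bits.

```c
#include <stdint.h>

/* Store the low 24 bits of val in big-endian byte order at p. */
static inline void put_be24_sketch(uint32_t val, uint8_t *p)
{
	p[0] = (val >> 16) & 0xff;	/* most significant byte first */
	p[1] = (val >> 8) & 0xff;
	p[2] = val & 0xff;		/* explicit mask instead of implicit cast */
}
```

With val = 0xaa01a0, the constant from the sparse report, the stored bytes are 0xaa, 0x01, 0xa0.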
  2. 07 Jan, 2024 1 commit
  3. 06 Jan, 2024 2 commits
  4. 05 Jan, 2024 25 commits
  5. 04 Jan, 2024 2 commits
    • x86/csum: clean up `csum_partial' further · a476aae3
      Linus Torvalds authored
      Commit 688eb819 ("x86/csum: Improve performance of `csum_partial`")
      ended up improving the code generation for the IP csum calculations, and
      in particular special-casing the 40-byte case that is a hot case for
      IPv6 headers.
      
      It then had _another_ special case for the 64-byte unrolled loop, which
      did two chains of 32-byte blocks, which allows modern CPUs to improve
      performance by doing the chains in parallel thanks to renaming the carry
      flag.
      
      This unifies the special cases and combines them into one single
      helper for the 40-byte csum case, and replaces the 64-byte case with an
      80-byte case that just calls that single helper twice.  It avoids having
      all these different versions of inline assembly, and actually improved
      performance further in my tests.
      
      There was never anything magical about the 64-byte unrolled case, even
      though it happens to be a common size (and typically is the cacheline
      size).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
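The idea of running two independent accumulation chains, mentioned in the csum_partial cleanup, can be illustrated in portable C. This is a sketch, not the kernel's x86 assembly: it sums big-endian 32-bit words (RFC 1071 style) into two 64-bit accumulators that do not depend on each other, merges them, and folds to 16 bits with end-around carry. csum_two_chains() and csum_fold16() are hypothetical names; the result is the folded one's-complement sum before the final inversion. The wide accumulators defer carries instead of using the carry flag and are assumed not to overflow, which holds for any realistic buffer length.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian (network order) 32-bit word. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Fold a wide accumulator to 16 bits with end-around carry. */
uint16_t csum_fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement sum using two independent chains, merged at the end. */
uint16_t csum_two_chains(const unsigned char *buf, size_t len)
{
	uint64_t a = 0, b = 0;
	size_t i = 0;

	for (; i + 8 <= len; i += 8) {
		a += load_be32(buf + i);	/* chain A: even words */
		b += load_be32(buf + i + 4);	/* chain B: odd words */
	}
	for (; i + 4 <= len; i += 4)
		a += load_be32(buf + i);
	if (i < len) {				/* zero-pad a short tail */
		unsigned char tail[4] = {0};

		for (size_t j = 0; i + j < len; j++)
			tail[j] = buf[i + j];
		a += load_be32(tail);
	}
	return csum_fold16(a + b);
}
```

For the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 the folded sum is 0xddf2. Because one's-complement addition is associative and commutative, splitting the work across chains does not change the folded result.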
    • x86/csum: Remove unnecessary odd handling · 5d4acb62
      Noah Goldstein authored
      The special case for odd aligned buffers is unnecessary and mostly
      just adds overhead. Aligned buffers are the expectation, and even for
      unaligned buffers, the only case that was helped is if the buffer was
      1-byte from word aligned which is ~1/7 of the cases. Overall it seems
      highly unlikely to be worth the extra branch.
      
      It was left in the previous perf improvement patch because I was
      erroneously comparing the exact output of `csum_partial(...)`, but
      really we only need `csum_fold(csum_partial(...))` to match so it's
      safe to remove.
      
      All csum kunit tests pass.
      Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Laight <david.laight@aculab.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
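Why comparing only `csum_fold(csum_partial(...))` is enough: the fold adds the two 16-bit halves of the 32-bit partial sum with end-around carry, so different unfolded partial sums can produce the same 16-bit result. A small sketch (fold32() and partials_equivalent() are hypothetical names, and the fold omits the final inversion) showing that a partial sum and its 16-bit rotation fold identically. It illustrates the general point only and does not reproduce the removed odd-alignment code.

```c
#include <stdint.h>

/* Fold a 32-bit partial sum to 16 bits (no final inversion). */
static uint16_t fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);	/* at most 0x1fffe */
	sum = (sum & 0xffff) + (sum >> 16);	/* absorb the last carry */
	return (uint16_t)sum;
}

/* Two partial sums are interchangeable for checksumming if their folds match. */
int partials_equivalent(uint32_t a, uint32_t b)
{
	return fold32(a) == fold32(b);
}
```

For example, 0x12345678 and 0x56781234 differ as 32-bit values but both fold to 0x68ac, so a test that compares unfolded partials would flag a difference that no checksum consumer can observe.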