Commits · 64e3d02b43b17390f0fa9af6708a4eafdf20ba01 · Kirill Smelkov / linux

24 May, 2024 1 commit

Kanchan Joshi authored May 24, 2024

sgs/sws are unused, so remove these from nvme_ns_head structure.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

64e3d02b

23 May, 2024 3 commits

nvmet: fix ns enable/disable possible hang · f97914e3

Sagi Grimberg authored May 21, 2024

When disabling an nvmet namespace, there is a period where the
subsys->lock is released, as the ns disable waits for backend IO to
complete, and the ns percpu ref to be properly killed. The original
intent was to avoid taking the subsystem lock for a prolong period as
other processes may need to acquire it (for example new incoming
connections).

However, it opens up a window where another process may come in and
enable the ns, (re)intiailizing the ns percpu_ref, causing the disable
sequence to hang.

Solve this by taking the global nvmet_config_sem over the entire configfs
enable/disable sequence.

Fixes: a07b4970 ("nvmet: add a generic NVMe target")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

f97914e3

nvme-multipath: fix io accounting on failover · a2e4c5f5

Keith Busch authored May 21, 2024

There are io stats accounting that needs to be handled, so don't call
blk_mq_end_request() directly. Use the existing nvme_end_req() helper
that already handles everything.

Fixes: d4d957b5 ("nvme-multipath: support io stats on the mpath device")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

a2e4c5f5

nvme: fix multipath batched completion accounting · 2fe7b422

Keith Busch authored May 21, 2024

Batched completions were missing the io stats accounting and bio trace
events. Move the common code to a helper and call it from the batched
and non-batched functions.

Fixes: d4d957b5 ("nvme-multipath: support io stats on the mpath device")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

2fe7b422

21 May, 2024 1 commit

nvme-multipath: find NUMA path only for online numa-node · d3a04373

Nilay Shroff authored May 16, 2024

In current native multipath design when a shared namespace is created,
we loop through each possible numa-node, calculate the NUMA distance of
that node from each nvme controller and then cache the optimal IO path
for future reference while sending IO. The issue with this design is that
we may refer to the NUMA distance table for an offline node which may not
be populated at the time and so we may inadvertently end up finding and
caching a non-optimal path for IO. Then latter when the corresponding
numa-node becomes online and hence the NUMA distance table entry for that
node is created, ideally we should re-calculate the multipath node distance
for the newly added node however that doesn't happen unless we rescan/reset
the controller. So essentially, we may keep using non-optimal IO path for a
node which is made online after namespace is created.
This patch helps fix this issue ensuring that when a shared namespace is
created, we calculate the multipath node distance for each online numa-node
instead of each possible numa-node. Then latter when a node becomes online
and we receive any IO on that newly added node, we would calculate the
multipath node distance for newly added node but this time NUMA distance
table would have been already populated for newly added node. Hence we
would be able to correctly calculate the multipath node distance and choose
the optimal path for the IO.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

d3a04373

20 May, 2024 1 commit

block: t10-pi: add MODULE_DESCRIPTION() · f0eab3e8

Jeff Johnson authored May 16, 2024

Fix the allmodconfig 'make W=1' issue:

WARNING: modpost: missing MODULE_DESCRIPTION() in block/t10-pi.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240516-md-t10-pi-v1-1-44a3469374aa@quicinc.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

f0eab3e8

17 May, 2024 1 commit

blk-mq: add helper for checking if one CPU is mapped to specified hctx · 7b815817

Ming Lei authored May 17, 2024

Commit a46c2702 ("blk-mq: don't schedule block kworker on isolated CPUs")
rules out isolated CPUs from hctx->cpumask, and hctx->cpumask should only be
used for scheduling kworker.

Add helper blk_mq_cpu_mapped_to_hctx() and apply it into cpuhp handlers.

This patch avoids to forget clearing INACTIVE of hctx state in case that one
isolated CPU becomes online, and fixes hang issue when allocating request
from this hctx's tags.

Cc: Raju Cheerla <rcheerla@redhat.com>
Fixes: a46c2702 ("blk-mq: don't schedule block kworker on isolated CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240517020514.149771-1-ming.lei@redhat.comTested-by: Raju Cheerla <rcheerla@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7b815817

16 May, 2024 3 commits

blk-cgroup: Properly propagate the iostat update up the hierarchy · 9d230c09

Waiman Long authored May 15, 2024

During a cgroup_rstat_flush() call, the lowest level of nodes are flushed
first before their parents. Since commit 3b8cc629 ("blk-cgroup:
Optimize blkcg_rstat_flush()"), iostat propagation was still done to
the parent. Grandparent, however, may not get the iostat update if the
parent has no blkg_iostat_set queued in its lhead lockless list.

Fix this iostat propagation problem by queuing the parent's global
blkg->iostat into one of its percpu lockless lists to make sure that
the delta will always be propagated up to the grandparent and so on
toward the root blkcg.

Note that successive calls to __blkcg_rstat_flush() are serialized by
the cgroup_rstat_lock. So no special barrier is used in the reading
and writing of blkg->iostat.lqueued.

Fixes: 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Closes: https://lore.kernel.org/lkml/ZkO6l%2FODzadSgdhC@dschatzberg-fedora-PF3DHTBV/Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240515143059.276677-1-longman@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

9d230c09

blk-cgroup: fix list corruption from reorder of WRITE ->lqueued · d0aac236

Ming Lei authored May 15, 2024

__blkcg_rstat_flush() can be run anytime, especially when blk_cgroup_bio_start
is being executed.

If WRITE of `->lqueued` is re-ordered with READ of 'bisc->lnode.next' in
the loop of __blkcg_rstat_flush(), `next_bisc` can be assigned with one
stat instance being added in blk_cgroup_bio_start(), then the local
list in __blkcg_rstat_flush() could be corrupted.

Fix the issue by adding one barrier.

Cc: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Fixes: 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240515013157.443672-3-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d0aac236

blk-cgroup: fix list corruption from resetting io stat · 6da66806

Ming Lei authored May 15, 2024

Since commit 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()"),
each iostat instance is added to blkcg percpu list, so blkcg_reset_stats()
can't reset the stat instance by memset(), otherwise the llist may be
corrupted.

Fix the issue by only resetting the counter part.

Cc: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Jay Shin <jaeshin@redhat.com>
Fixes: 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20240515013157.443672-2-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

6da66806

15 May, 2024 1 commit

cdrom: rearrange last_media_change check to avoid unintentional overflow · efb905ae

Justin Stitt authored May 07, 2024

When running syzkaller with the newly reintroduced signed integer wrap
sanitizer we encounter this splat:

[  366.015950] UBSAN: signed-integer-overflow in ../drivers/cdrom/cdrom.c:2361:33
[  366.021089] -9223372036854775808 - 346321 cannot be represented in type '__s64' (aka 'long long')
[  366.025894] program syz-executor.4 is using a deprecated SCSI ioctl, please convert it to SG_IO
[  366.027502] CPU: 5 PID: 28472 Comm: syz-executor.7 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1
[  366.027512] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  366.027518] Call Trace:
[  366.027523]  <TASK>
[  366.027533]  dump_stack_lvl+0x93/0xd0
[  366.027899]  handle_overflow+0x171/0x1b0
[  366.038787] ata1.00: invalid multi_count 32 ignored
[  366.043924]  cdrom_ioctl+0x2c3f/0x2d10
[  366.063932]  ? __pm_runtime_resume+0xe6/0x130
[  366.071923]  sr_block_ioctl+0x15d/0x1d0
[  366.074624]  ? __pfx_sr_block_ioctl+0x10/0x10
[  366.077642]  blkdev_ioctl+0x419/0x500
[  366.080231]  ? __pfx_blkdev_ioctl+0x10/0x10
...

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c58 ("ubsan: Reintroduce signed overflow
sanitizer").

Let's rearrange the check to not perform any arithmetic, thus not
tripping the sanitizer.

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/354
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/lkml/20240507-b4-sio-ata1-v1-1-810ffac6080a@google.comReviewed-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/lkml/ZjqU0fbzHrlnad8D@equinoxSigned-off-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/r/20240507222520.1445-2-phil@philpotter.co.ukSigned-off-by: Jens Axboe <axboe@kernel.dk>

efb905ae

14 May, 2024 7 commits

Merge tag 'nvme-6.10-2024-05-14' of git://git.infradead.org/nvme into block-6.10 · 803fbb96

Jens Axboe authored May 14, 2024

Pull NVMe updates and fixes from Keith:

"nvme updates for Linux 6.10

 - Fabrics connection retries (Daniel, Hannes)
 - Fabrics logging enhancements (Tokunori)
 - RDMA delete optimization (Sagi)"

* tag 'nvme-6.10-2024-05-14' of git://git.infradead.org/nvme:
  nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
  nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
  nvme: do not retry authentication failures
  nvme-fabrics: short-circuit reconnect retries
  nvme: return kernel error codes for admin queue connect
  nvmet: return DHCHAP status codes from nvmet_setup_auth()
  nvmet: lock config semaphore when accessing DH-HMAC-CHAP key

803fbb96

nbd: Fix signal handling · e56d4b63

Bart Van Assche authored May 10, 2024

Both nbd_send_cmd() and nbd_handle_cmd() return either a negative error
number or a positive blk_status_t value. nbd_queue_rq() converts these
return values into a blk_status_t value. There is a bug in the conversion
code: if nbd_send_cmd() returns BLK_STS_RESOURCE, nbd_queue_rq() should
return BLK_STS_RESOURCE instead of BLK_STS_OK. Fix this, move the
conversion code into nbd_handle_cmd() and fix the remaining sparse warnings.

This patch fixes the following sparse warnings:

drivers/block/nbd.c:673:32: warning: incorrect type in return expression (different base types)
drivers/block/nbd.c:673:32: expected int
drivers/block/nbd.c:673:32: got restricted blk_status_t [usertype]
drivers/block/nbd.c:714:48: warning: incorrect type in return expression (different base types)
drivers/block/nbd.c:714:48: expected int
drivers/block/nbd.c:714:48: got restricted blk_status_t [usertype]
drivers/block/nbd.c:1120:21: warning: incorrect type in assignment (different base types)
drivers/block/nbd.c:1120:21: expected int [assigned] ret
drivers/block/nbd.c:1120:21: got restricted blk_status_t [usertype]
drivers/block/nbd.c:1125:16: warning: incorrect type in return expression (different base types)
drivers/block/nbd.c:1125:16: expected restricted blk_status_t
drivers/block/nbd.c:1125:16: got int [assigned] ret

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Fixes: fc17b653 ("blk-mq: switch ->queue_rq return value to blk_status_t")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240510202313.25209-6-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

e56d4b63

nbd: Remove a local variable from nbd_send_cmd() · f6cb9a2c

Bart Van Assche authored May 10, 2024

blk_rq_bytes() returns an unsigned int while 'size' has type unsigned long.
This is confusing. Improve code readability by removing the local variable
'size'.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240510202313.25209-5-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

f6cb9a2c

nbd: Improve the documentation of the locking assumptions · 2a6751e0

Bart Van Assche authored May 10, 2024

Document locking assumptions with lockdep_assert_held() instead of source
code comments. The advantage of lockdep_assert_held() is that it is
verified at runtime if lockdep is enabled in the kernel config.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240510202313.25209-4-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

2a6751e0

nbd: Remove superfluous casts · 40639e9a

Bart Van Assche authored May 10, 2024

In Linux kernel code it is preferred not to use a cast when converting a
void pointer to another pointer type.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240510202313.25209-3-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

40639e9a

nbd: Use NULL to represent a pointer · 08190cc4

Bart Van Assche authored May 10, 2024

This patch fixes the following sparse warnings:

drivers/block/nbd.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/nbd.h):
./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer
drivers/block/nbd.c: note: in included file (through include/trace/perf.h, include/trace/define_trace.h, include/trace/events/nbd.h):
./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240510202313.25209-2-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

08190cc4

brd: implement discard support · 9ead7efc

Keith Busch authored Apr 29, 2024

The ramdisk memory utilization can only go up when data is written to
new pages. Implement discard to provide the possibility to reduce memory
usage for pages no longer in use. Aligned discards will free the
associated pages, if any, and determinisitically return zeroed data
until written again.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20240429102308.147627-1-kbusch@meta.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

9ead7efc

13 May, 2024 22 commits

null_blk: Fix two sparse warnings · 25260555

Bart Van Assche authored May 10, 2024

Fix the following sparse warnings:

drivers/block/null_blk/main.c:1243:35: warning: incorrect type in return expression (different base types)
drivers/block/null_blk/main.c:1243:35: expected int
drivers/block/null_blk/main.c:1243:35: got restricted blk_status_t
drivers/block/null_blk/main.c:1291:30: warning: incorrect type in return expression (different base types)
drivers/block/null_blk/main.c:1291:30: expected restricted blk_status_t
drivers/block/null_blk/main.c:1291:30: got int

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240510201816.24921-1-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

25260555

ublk_drv: set DMA alignment mask to 3 · 928b607d

Jens Axboe authored May 11, 2024

By default, this will be 511, as that's the block layer default. But
drivers these days can support memory alignments that aren't tied to
the sector sizes, instead just being limited by what the DMA engine
supports. An example is NVMe, where it's generally set to a 32-bit or
64-bit boundary. As ublk itself doesn't really care, just set it low
enough that we don't run into issues with NVMe where the required
O_DIRECT memory alignment is now more restrictive on ublk than it is
on the underlying device.

This was triggered by spurious -EINVAL returns on O_DIRECT IO on a
setup with ublk managing NVMe devices, which previously worked just
fine on the NVMe device itself. With the alignment relaxed, the test
works fine.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

928b607d

Merge tag 'tag-chrome-platform-firmware-for-v6.10' of... · a7c840ba

Linus Torvalds authored May 13, 2024

Merge tag 'tag-chrome-platform-firmware-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform firmware updates from Tzung-Bi Shih:

 - Set driver owner in the core registration so that coreboot drivers
   don't need to set it individually

* tag 'tag-chrome-platform-firmware-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  firmware: google: cbmem: drop driver owner initialization
  firmware: coreboot: store owner from modules with coreboot_driver_register()

a7c840ba

Merge tag 'tag-chrome-platform-for-v6.10' of... · 59729c8a

Linus Torvalds authored May 13, 2024

Merge tag 'tag-chrome-platform-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform updates from Tzung-Bi Shih:
 "New:
   - Support Framework Laptop 13 and 16 (AMD Ryzen)

  Improvements:
   - Use sysfs_emit() instead of sprintf() for sysfs' show()

  Fixes:
   - Fix flex-array-member-not-at-end compiler warnings by using
     DEFINE_RAW_FLEX()
   - Add HAS_IOPORT dependencies
   - Fix long pending events during suspend after resume

  Misc cleanups:
   - Provide ID tables for avoiding fallback match
   - Replace deprecated UNIVERSAL_DEV_PM_OPS()"

* tag 'tag-chrome-platform-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (22 commits)
  platform/chrome: cros_ec: Handle events during suspend after resume completion
  platform/chrome: cros_ec_lpc: add quirks for the Framework Laptop (AMD)
  platform/chrome: cros_ec_lpc: add a "quirks" system
  platform/chrome: cros_ec_lpc: pass driver_data from DMI to the device
  platform/chrome: cros_ec_lpc: introduce a priv struct for the lpc device
  platform/chrome: add HAS_IOPORT dependencies
  platform/chrome: cros_hps_i2c: Replace deprecated UNIVERSAL_DEV_PM_OPS()
  platform/chrome: cros_kbd_led_backlight: provide ID table for avoiding fallback match
  platform/chrome: wilco_ec: core: provide ID table for avoiding fallback match
  platform/chrome: wilco_ec: event: remove redundant MODULE_ALIAS
  platform/chrome: wilco_ec: debugfs: provide ID table for avoiding fallback match
  platform/chrome: wilco_ec: telemetry: provide ID table for avoiding fallback match
  platform/chrome: cros_ec_vbc: provide ID table for avoiding fallback match
  platform/chrome: cros_ec_lightbar: provide ID table for avoiding fallback match
  platform/chrome: cros_ec_sysfs: provide ID table for avoiding fallback match
  platform/chrome: cros_ec_debugfs: provide ID table for avoiding fallback match
  platform/chrome: cros_ec_chardev: provide ID table for avoiding fallback match
  platform/chrome: cros_usbpd_notify: provide ID table for avoiding fallback match
  platform/chrome: cros_usbpd_logger: provide ID table for avoiding fallback match
  platform/chrome: cros_ec_sensorhub: provide ID table for avoiding fallback match
  ...

59729c8a

Merge tag 'rust-6.10' of https://github.com/Rust-for-Linux/linux · 8f5b5f78

Linus Torvalds authored May 13, 2024

Pull Rust updates from Miguel Ojeda:
 "The most notable change is the drop of the 'alloc' in-tree fork. This
  is nicely reflected in the diffstat as a ~10k lines drop. In turn,
  this makes the version upgrades way simpler and smaller in the future,
  e.g. the latest one in commit 56f64b37 ("rust: upgrade to Rust
  1.78.0").

  More importantly, this increases the chances that a newer compiler
  version just works, which in turn means supporting several compiler
  versions is easier now. Thus we will look into finally setting a
  minimum version in the near future.

  Toolchain and infrastructure:

   - Upgrade to Rust 1.78.0

     This time around, due to how the kernel and Rust schedules have
     aligned, there are two upgrades in fact. These allow us to remove
     one more unstable feature ('offset_of') from the list, among other
     improvements

   - Drop 'alloc' in-tree fork of the standard library crate, which
     means all the unstable features used by 'alloc' (~30 language ones,
     ~60 library ones) are not a concern anymore

   - Support DWARFv5 via the '-Zdwarf-version' flag

   - Support zlib and zstd debuginfo compression via the
     '-Zdebuginfo-compression' flag

  'kernel' crate:

   - Support allocation flags ('GFP_*'), particularly in 'Box' (via
     'BoxExt'), 'Vec' (via 'VecExt'), 'Arc' and 'UniqueArc', as well as
     in the 'init' module APIs

   - Remove usage of the 'allocator_api' unstable feature

   - Remove 'try_' prefix in allocation APIs' names

   - Add 'VecExt' (an extension trait) to be able to drop the 'alloc'
     fork

   - Add the '{make,to}_{upper,lower}case()' methods to 'CStr'/'CString'

   - Add the 'as_ptr' method to 'ThisModule'

   - Add the 'from_raw' method to 'ArcBorrow'

   - Add the 'into_unique_or_drop' method to 'Arc'

   - Display column number in the 'dbg!' macro output by applying the
     equivalent change done to the standard library one

   - Migrate 'Work' to '#[pin_data]' thanks to the changes in the
     'macros' crate, which allows to remove an unsafe call in its 'new'
     associated function

   - Prevent namespacing issues when using the '[try_][pin_]init!'
     macros by changing the generated name of guard variables

   - Make the 'get' method in 'Opaque' const

   - Implement the 'Default' trait for 'LockClassKey'

   - Remove unneeded 'kernel::prelude' imports from doctests

   - Remove redundant imports

  'macros' crate:

   - Add 'decl_generics' to 'parse_generics()' to support default
     values, and use that to allow them in '#[pin_data]'

  Helpers:

   - Trivial English grammar fix

  Documentation:

   - Add section on Rust Kselftests to the 'Testing' document

   - Expand the 'Abstractions vs. bindings' section of the 'General
     Information' document"

* tag 'rust-6.10' of https://github.com/Rust-for-Linux/linux: (31 commits)
  rust: alloc: fix dangling pointer in VecExt<T>::reserve()
  rust: upgrade to Rust 1.78.0
  rust: kernel: remove redundant imports
  rust: sync: implement `Default` for `LockClassKey`
  docs: rust: extend abstraction and binding documentation
  docs: rust: Add instructions for the Rust kselftest
  rust: remove unneeded `kernel::prelude` imports from doctests
  rust: update `dbg!()` to format column number
  rust: helpers: Fix grammar in comment
  rust: init: change the generated name of guard variables
  rust: sync: add `Arc::into_unique_or_drop`
  rust: sync: add `ArcBorrow::from_raw`
  rust: types: Make Opaque::get const
  rust: kernel: remove usage of `allocator_api` unstable feature
  rust: init: update `init` module to take allocation flags
  rust: sync: update `Arc` and `UniqueArc` to take allocation flags
  rust: alloc: update `VecExt` to take allocation flags
  rust: alloc: introduce the `BoxExt` trait
  rust: alloc: introduce allocation flags
  rust: alloc: remove our fork of the `alloc` crate
  ...

8f5b5f78

Merge tag 'v6.10-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 84c7d76b

Linus Torvalds authored May 13, 2024

Pull crypto updates from Herbert Xu:
 "API:
   - Remove crypto stats interface

  Algorithms:
   - Add faster AES-XTS on modern x86_64 CPUs
   - Forbid curves with order less than 224 bits in ecc (FIPS 186-5)
   - Add ECDSA NIST P521

  Drivers:
   - Expose otp zone in atmel
   - Add dh fallback for primes > 4K in qat
   - Add interface for live migration in qat
   - Use dma for aes requests in starfive
   - Add full DMA support for stm32mpx in stm32
   - Add Tegra Security Engine driver

  Others:
   - Introduce scope-based x509_certificate allocation"

* tag 'v6.10-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (123 commits)
  crypto: atmel-sha204a - provide the otp content
  crypto: atmel-sha204a - add reading from otp zone
  crypto: atmel-i2c - rename read function
  crypto: atmel-i2c - add missing arg description
  crypto: iaa - Use kmemdup() instead of kzalloc() and memcpy()
  crypto: sahara - use 'time_left' variable with wait_for_completion_timeout()
  crypto: api - use 'time_left' variable with wait_for_completion_killable_timeout()
  crypto: caam - i.MX8ULP donot have CAAM page0 access
  crypto: caam - init-clk based on caam-page0-access
  crypto: starfive - Use fallback for unaligned dma access
  crypto: starfive - Do not free stack buffer
  crypto: starfive - Skip unneeded fallback allocation
  crypto: starfive - Skip dma setup for zeroed message
  crypto: hisilicon/sec2 - fix for register offset
  crypto: hisilicon/debugfs - mask the unnecessary info from the dump
  crypto: qat - specify firmware files for 402xx
  crypto: x86/aes-gcm - simplify GCM hash subkey derivation
  crypto: x86/aes-gcm - delete unused GCM assembly code
  crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath()
  hwrng: stm32 - repair clock handling
  ...

84c7d76b

Merge tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 87caef42

Linus Torvalds authored May 13, 2024

Pull hardening updates from Kees Cook:
 "The bulk of the changes here are related to refactoring and expanding
  the KUnit tests for string helper and fortify behavior.

  Some trivial strncpy replacements in fs/ were carried in my tree. Also
  some fixes to SCSI string handling were carried in my tree since the
  helper for those was introduce here. Beyond that, just little fixes
  all around: objtool getting confused about LKDTM+KCFI, preparing for
  future refactors (constification of sysctl tables, additional
  __counted_by annotations), a Clang UBSAN+i386 crash fix, and adding
  more options in the hardening.config Kconfig fragment.

  Summary:

   - selftests: Add str*cmp tests (Ivan Orlov)

   - __counted_by: provide UAPI for _le/_be variants (Erick Archer)

   - Various strncpy deprecation refactors (Justin Stitt)

   - stackleak: Use a copy of soon-to-be-const sysctl table (Thomas
     Weißschuh)

   - UBSAN: Work around i386 -regparm=3 bug with Clang prior to
     version 19

   - Provide helper to deal with non-NUL-terminated string copying

   - SCSI: Fix older string copying bugs (with new helper)

   - selftests: Consolidate string helper behavioral tests

   - selftests: add memcpy() fortify tests

   - string: Add additional __realloc_size() annotations for "dup"
     helpers

   - LKDTM: Fix KCFI+rodata+objtool confusion

   - hardening.config: Enable KCFI"

* tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (29 commits)
  uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be}
  stackleak: Use a copy of the ctl_table argument
  string: Add additional __realloc_size() annotations for "dup" helpers
  kunit/fortify: Fix replaced failure path to unbreak __alloc_size
  hardening: Enable KCFI and some other options
  lkdtm: Disable CFI checking for perms functions
  kunit/fortify: Add memcpy() tests
  kunit/fortify: Do not spam logs with fortify WARNs
  kunit/fortify: Rename tests to use recommended conventions
  init: replace deprecated strncpy with strscpy_pad
  kunit/fortify: Fix mismatched kvalloc()/vfree() usage
  scsi: qla2xxx: Avoid possible run-time warning with long model_num
  scsi: mpi3mr: Avoid possible run-time warning with long manufacturer strings
  scsi: mptfusion: Avoid possible run-time warning with long manufacturer strings
  fs: ecryptfs: replace deprecated strncpy with strscpy
  hfsplus: refactor copy_name to not use strncpy
  reiserfs: replace deprecated strncpy with scnprintf
  virt: acrn: replace deprecated strncpy with strscpy
  ubsan: Avoid i386 UBSAN handler crashes with Clang
  ubsan: Remove 1-element array usage in debug reporting
  ...

87caef42

Merge tag 'execve-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 92f74f7f

Linus Torvalds authored May 13, 2024

Pull execve updates from Kees Cook:

 - Provide knob to change (previously fixed) coredump NOTES size
   (Allen Pais)

 - Add sched_prepare_exec tracepoint (Marco Elver)

 - Make /proc/$pid/auxv work under binfmt_elf_fdpic (Max Filippov)

 - Convert ARCH_HAVE_EXTRA_ELF_NOTES to proper Kconfig (Vignesh
   Balasubramanian)

 - Leave a gap between .bss and brk

* tag 'execve-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  fs/coredump: Enable dynamic configuration of max file note size
  binfmt_elf_fdpic: fix /proc/<pid>/auxv
  binfmt_elf: Leave a gap between .bss and brk
  Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig
  tracing: Add sched_prepare_exec tracepoint

92f74f7f

Merge tag 'seccomp-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 1ba58f1a

Linus Torvalds authored May 13, 2024

Pull seccomp update from Kees Cook:

 - Prepare for sysctl table constification

* tag 'seccomp-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: Constify sysctl subhelpers

1ba58f1a

Merge tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux · 0c9f4ac8

Linus Torvalds authored May 13, 2024

Pull block updates from Jens Axboe:

 - Add a partscan attribute in sysfs, fixing an issue with systemd
   relying on an internal interface that went away.

 - Attempt #2 at making long running discards interruptible. The
   previous attempt went into 6.9, but we ended up mostly reverting it
   as it had issues.

 - Remove old ida_simple API in bcache

 - Support for zoned write plugging, greatly improving the performance
   on zoned devices.

 - Remove the old throttle low interface, which has been experimental
   since 2017 and never made it beyond that and isn't being used.

 - Remove page->index debugging checks in brd, as it hasn't caught
   anything and prepares us for removing in struct page.

 - MD pull request from Song

 - Don't schedule block workers on isolated CPUs

* tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux: (84 commits)
  blk-throttle: delay initialization until configuration
  blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW
  block: fix that util can be greater than 100%
  block: support to account io_ticks precisely
  block: add plug while submitting IO
  bcache: fix variable length array abuse in btree_iter
  bcache: Remove usage of the deprecated ida_simple_xx() API
  md: Revert "md: Fix overflow in is_mddev_idle"
  blk-lib: check for kill signal in ioctl BLKDISCARD
  block: add a bio_await_chain helper
  block: add a blk_alloc_discard_bio helper
  block: add a bio_chain_and_submit helper
  block: move discard checks into the ioctl handler
  block: remove the discard_granularity check in __blkdev_issue_discard
  block/ioctl: prefer different overflow check
  null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION()
  block: fix and simplify blkdevparts= cmdline parsing
  block: refine the EOF check in blkdev_iomap_begin
  block: add a partscan sysfs attribute for disks
  block: add a disk_has_partscan helper
  ...

0c9f4ac8

Merge tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux · 9961a785

Linus Torvalds authored May 13, 2024

Pull io_uring updates from Jens Axboe:

- Greatly improve send zerocopy performance, by enabling coalescing of
sent buffers.

MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the
io_uring side did not. In local testing, the crossover point for send
zerocopy being faster is now around 3000 byte packets, and it
performs better than the sync syscall variants as well.

This feature relies on a shared branch with net-next, which was
pulled into both branches.

- Unification of how async preparation is done across opcodes.

Previously, opcodes that required extra memory for async retry would
allocate that as needed, using on-stack state until that was the
case. If async retry was needed, the on-stack state was adjusted
appropriately for a retry and then copied to the allocated memory.

This led to some fragile and ugly code, particularly for read/write
handling, and made storage retries more difficult than they needed to
be. Allocate the memory upfront, as it's cheap from our pools, and
use that state consistently both initially and also from the retry
side.

- Move away from using remap_pfn_range() for mapping the rings.

This is really not the right interface to use and can cause lifetime
issues or leaks. Additionally, it means the ring sq/cq arrays need to
be physically contigious, which can cause problems in production with
larger rings when services are restarted, as memory can be very
fragmented at that point.

Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply
the same treatment to mapped ring provided buffers. This also helps
unify the code we have dealing with allocating and mapping memory.

Hard to see in the diffstat as we're adding a few features as well,
but this kills about ~400 lines of code from the codebase as well.

- Add support for bundles for send/recv.

When used with provided buffers, bundles support sending or receiving
more than one buffer at the time, improving the efficiency by only
needing to call into the networking stack once for multiple sends or
receives.

- Tweaks for our accept operations, supporting both a DONTWAIT flag for
skipping poll arm and retry if we can, and a POLLFIRST flag that the
application can use to skip the initial accept attempt and rely
purely on poll for triggering the operation. Both of these have
identical flags on the receive side already.

- Make the task_work ctx locking unconditional.

We had various code paths here that would do a mix of lock/trylock
and set the task_work state to whether or not it was locked. All of
that goes away, we lock it unconditionally and get rid of the state
flag indicating whether it's locked or not.

The state struct still exists as an empty type, can go away in the
future.

- Add support for specifying NOP completion values, allowing it to be
used for error handling testing.

- Use set/test bit for io-wq worker flags. Not strictly needed, but
also doesn't hurt and helps silence a KCSAN warning.

- Cleanups for io-wq locking and work assignments, closing a tiny race
where cancelations would not be able to find the work item reliably.

- Misc fixes, cleanups, and improvements

* tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits)
io_uring: support to inject result for NOP
io_uring: fail NOP if non-zero op flags is passed in
io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
io_uring/net: add IORING_ACCEPT_DONTWAIT flag
io_uring/filetable: don't unnecessarily clear/reset bitmap
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
io_uring: Require zeroed sqe->len on provided-buffers send
io_uring/notif: disable LAZY_WAKE for linked notifs
io_uring/net: fix sendzc lazy wake polling
io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
io_uring/rw: reinstate thread check for retries
io_uring/notif: implement notification stacking
io_uring/notif: simplify io_notif_flush()
net: add callback for setting a ubuf_info to skb
net: extend ubuf_info callback to ops structure
io_uring/net: support bundles for recv
io_uring/net: support bundles for send
io_uring/kbuf: add helpers for getting/peeking multiple buffers
io_uring/net: add provided buffer support for IORING_OP_SEND
...

9961a785

Merge tag 'vfs-6.10.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · f4e8d802

Linus Torvalds authored May 13, 2024

Pull vfs rw iterator updates from Christian Brauner:
 "The core fs signalfd, userfaultfd, and timerfd subsystems did still
  use f_op->read() instead of f_op->read_iter(). Convert them over since
  we should aim to get rid of f_op->read() at some point.

  Aside from that io_uring and others want to mark files as FMODE_NOWAIT
  so it can make use of per-IO nonblocking hints to enable more
  efficient IO. Converting those users to f_op->read_iter() allows them
  to be marked with FMODE_NOWAIT"

* tag 'vfs-6.10.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  signalfd: convert to ->read_iter()
  userfaultfd: convert to ->read_iter()
  timerfd: convert to ->read_iter()
  new helper: copy_to_iter_full()

f4e8d802

Merge tag 'vfs-6.10.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ef31ea6c

Linus Torvalds authored May 13, 2024

Pull netfs updates from Christian Brauner:
 "This reworks the netfslib writeback implementation so that pages read
  from the cache are written to the cache through ->writepages(),
  thereby allowing the fscache page flag to be retired.

  The reworking also:

   - builds on top of the new writeback_iter() infrastructure

   - makes it possible to use vectored write RPCs as discontiguous
     streams of pages can be accommodated

   - makes it easier to do simultaneous content crypto and stream
     division

   - provides support for retrying writes and re-dividing a stream

   - replaces the ->launder_folio() op, so that ->writepages() is used
     instead

   - uses mempools to allocate the netfs_io_request and
     netfs_io_subrequest structs to avoid allocation failure in the
     writeback path

  Some code that uses the fscache page flag is retained for
  compatibility purposes with nfs and ceph. The code is switched to
  using the synonymous private_2 label instead and marked with
  deprecation comments.

  The merge commit contains additional details on the new algorithm that
  I've left out of here as it would probably be excessively detailed.

  On top of the netfslib infrastructure this contains the work to
  convert cifs over to netfslib"

* tag 'vfs-6.10.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  cifs: Enable large folio support
  cifs: Remove some code that's no longer used, part 3
  cifs: Remove some code that's no longer used, part 2
  cifs: Remove some code that's no longer used, part 1
  cifs: Cut over to using netfslib
  cifs: Implement netfslib hooks
  cifs: Make add_credits_and_wake_if() clear deducted credits
  cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs
  cifs: Set zero_point in the copy_file_range() and remap_file_range()
  cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c
  cifs: Replace the writedata replay bool with a netfs sreq flag
  cifs: Make wait_mtu_credits take size_t args
  cifs: Use more fields from netfs_io_subrequest
  cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest
  cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest
  cifs: Use alternative invalidation to using launder_folio
  netfs, afs: Use writeback retry to deal with alternate keys
  netfs: Miscellaneous tidy ups
  netfs: Remove the old writeback code
  netfs: Cut over to using new writeback code
  ...

ef31ea6c

Merge tag 'vfs-6.10.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 103fb219

Linus Torvalds authored May 13, 2024

Pull vfs mount API conversions from Christian Brauner:
 "This converts qnx6, minix, debugfs, tracefs, freevxfs, and openpromfs
  to the new mount api, further reducing the number of filesystems
  relying on the legacy mount api"

* tag 'vfs-6.10.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  minix: convert minix to use the new mount api
  vfs: Convert tracefs to use the new mount API
  vfs: Convert debugfs to use the new mount API
  openpromfs: finish conversion to the new mount API
  freevxfs: Convert freevxfs to the new mount API.
  qnx6: convert qnx6 to use the new mount api

103fb219

Merge tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 1b0aabcc

Linus Torvalds authored May 13, 2024

Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual fses.

  Features:

   - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That
     means we now have six free FMODE_* bits in total (but bit #6
     already got used for FMODE_WRITE_RESTRICTED)

   - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup)

   - Add fd_raw cleanup class so we can make use of automatic cleanup
     provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well

   - Optimize seq_puts()

   - Simplify __seq_puts()

   - Add new anon_inode_getfile_fmode() api to allow specifying f_mode
     instead of open-coding it in multiple places

   - Annotate struct file_handle with __counted_by() and use
     struct_size()

   - Warn in get_file() whether f_count resurrection from zero is
     attempted (epoll/drm discussion)

   - Folio-sophize aio

   - Export the subvolume id in statx() for both btrfs and bcachefs

   - Relax linkat(AT_EMPTY_PATH) requirements

   - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors
     for dup*() equality replacing kcmp()

  Cleanups:

   - Compile out swapfile inode checks when swap isn't enabled

   - Use (1 << n) notation for FMODE_* bitshifts for clarity

   - Remove redundant variable assignment in fs/direct-io

   - Cleanup uses of strncpy in orangefs

   - Speed up and cleanup writeback

   - Move fsparam_string_empty() helper into header since it's currently
     open-coded in multiple places

   - Add kernel-doc comments to proc_create_net_data_write()

   - Don't needlessly read dentry->d_flags twice

  Fixes:

   - Fix out-of-range warning in nilfs2

   - Fix ecryptfs overflow due to wrong encryption packet size
     calculation

   - Fix overly long line in xfs file_operations (follow-up to FMODE_*
     cleanup)

   - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up
     to FMODE_* cleanup)

   - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_*
     cleanup)

   - Fix stable offset api to prevent endless loops

   - Fix afs file server rotations

   - Prevent xattr node from overflowing the eraseblock in jffs2

   - Move fdinfo PTRACE_MODE_READ procfs check into the .permission()
     operation instead of .open() operation since this caused userspace
     regressions"

* tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
  afs: Fix fileserver rotation getting stuck
  selftests: add F_DUPDFD_QUERY selftests
  fcntl: add F_DUPFD_QUERY fcntl()
  file: add fd_raw cleanup class
  fs: WARN when f_count resurrection is attempted
  seq_file: Simplify __seq_puts()
  seq_file: Optimize seq_puts()
  proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
  fs: Create anon_inode_getfile_fmode()
  xfs: don't call xfs_file_open from xfs_dir_open
  xfs: drop fop_flags for directories
  xfs: fix overly long line in the file_operations
  shmem: Fix shmem_rename2()
  libfs: Add simple_offset_rename() API
  libfs: Fix simple_offset_rename_exchange()
  jffs2: prevent xattr node from overflowing the eraseblock
  vfs, swap: compile out IS_SWAPFILE() on swapless configs
  vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  fs/direct-io: remove redundant assignment to variable retval
  fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading
  ...

1b0aabcc

Merge tag 'vfs-6.10.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · c117a437

Linus Torvalds authored May 13, 2024

Pull vfs iomap updates from Christian Brauner:
 "This contains a few cleanups to the iomap code. Nothing particularly
  stands out"

* tag 'vfs-6.10.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: do some small logical cleanup in buffered write
  iomap: make iomap_write_end() return a boolean
  iomap: use a new variable to handle the written bytes in iomap_write_iter()
  iomap: don't increase i_size if it's not a write operation
  iomap: drop the write failure handles when unsharing and zeroing
  iomap: convert iomap_writepages to writeack_iter

c117a437

Merge tag 'docs-6.10' of git://git.lwn.net/linux · 8815da98

Linus Torvalds authored May 13, 2024

Pull documentation updates from Jonathan Corbet:
 "Another not-too-busy cycle for documentation, including:

   - Some build-system changes to detect the variable fonts installed by
     some distributions that can break the PDF build.

   - Various updates and additions to the Spanish, Chinese, Italian, and
     Japanese translations.

   - Update the stable-kernel rules to match modern practice

  ... and the usual array of corrections, updates, and typo fixes"

* tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits)
  cgroup: Add documentation for missing zswap memory.stat
  kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning.
  docs:core-api: fixed typos and grammar in printk-index page
  Documentation: tracing: Fix spelling mistakes
  docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4
  docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4
  docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4
  docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4
  docs: stable-kernel-rules: fix typo sent->send
  docs/zh_CN: remove two inconsistent spaces
  docs: scripts/check-variable-fonts.sh: Improve commands for detection
  docs: stable-kernel-rules: create special tag to flag 'no backporting'
  docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.)
  docs: stable-kernel-rules: remove code-labels tags and a indention level
  docs: stable-kernel-rules: call mainline by its name and change example
  docs: stable-kernel-rules: reduce redundancy
  docs, kprobes: Add riscv as supported architecture
  Docs: typos/spelling
  docs: kernel_include.py: Cope with docutils 0.21
  docs: ja_JP/howto: Catch up update in v6.8
  ...

8815da98

Merge tag 'keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd · 25c73642

Linus Torvalds authored May 13, 2024

Pull keys updates from Jarkko Sakkinen:

 - do not overwrite the key expiration once it is set

 - move key quota updates earlier into key_put(), instead of updating
   them in key_gc_unused_keys()

* tag 'keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  keys: Fix overwrite of key expiration on instantiation
  keys: update key quotas in key_put()

25c73642

Merge tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd · b1923914

Linus Torvalds authored May 13, 2024

Pull TPM updates from Jarkko Sakkinen:
 "These are the changes for the TPM driver with a single major new
  feature: TPM bus encryption and integrity protection. The key pair on
  TPM side is generated from so called null random seed per power on of
  the machine [1]. This supports the TPM encryption of the hard drive by
  adding layer of protection against bus interposer attacks.

  Other than that, a few minor fixes and documentation for tpm_tis to
  clarify basics of TPM localities for future patch review discussions
  (will be extended and refined over times, just a seed)"

Link: https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/ [1]

* tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (28 commits)
  Documentation: tpm: Add TPM security docs toctree entry
  tpm: disable the TPM if NULL name changes
  Documentation: add tpm-security.rst
  tpm: add the null key name as a sysfs export
  KEYS: trusted: Add session encryption protection to the seal/unseal path
  tpm: add session encryption protection to tpm2_get_random()
  tpm: add hmac checks to tpm2_pcr_extend()
  tpm: Add the rest of the session HMAC API
  tpm: Add HMAC session name/handle append
  tpm: Add HMAC session start and end functions
  tpm: Add TCG mandated Key Derivation Functions (KDFs)
  tpm: Add NULL primary creation
  tpm: export the context save and load commands
  tpm: add buffer function to point to returned parameters
  crypto: lib - implement library version of AES in CFB mode
  KEYS: trusted: tpm2: Use struct tpm_buf for sized buffers
  tpm: Add tpm_buf_read_{u8,u16,u32}
  tpm: TPM2B formatted buffers
  tpm: Store the length of the tpm_buf data separately.
  tpm: Update struct tpm_buf documentation comments
  ...

b1923914

Merge tag 'keys-trusted-next-6.10-rc1' of... · c0248148

Linus Torvalds authored May 13, 2024

Merge tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull trusted keys updates from Jarkko Sakkinen:
 "This contains a new key type for the Data Co-Processor (DCP), which is
  an IP core built into many NXP SoCs such as i.mx6ull"

* tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  docs: trusted-encrypted: add DCP as new trust source
  docs: document DCP-backed trusted keys kernel params
  MAINTAINERS: add entry for DCP-based trusted keys
  KEYS: trusted: Introduce NXP DCP-backed trusted keys
  KEYS: trusted: improve scalability of trust source config
  crypto: mxs-dcp: Add support for hardware-bound keys

c0248148

Merge tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · cd97950c

Linus Torvalds authored May 13, 2024

Pull slab updates from Vlastimil Babka:
 "This time it's mostly random cleanups and fixes, with two performance
  fixes that might have significant impact, but limited to systems
  experiencing particular bad corner case scenarios rather than general
  performance improvements.

  The memcg hook changes are going through the mm tree due to
  dependencies.

   - Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)

     This fixes the long-standing problem that can happen with workloads
     that have alloc/free patterns resulting in many partially used
     slabs (in e.g. dentry cache). Reading /proc/slabinfo will traverse
     the long partial slab list under spinlock with disabled irqs and
     thus can stall other processes or even trigger the lockup
     detection. The traversal is only done to count free objects so that
     <active_objs> column can be reported along with <num_objs>.

     To avoid affecting fast paths with another shared counter
     (attempted in the past) or complex partial list traversal schemes
     that allow rescheduling, the chosen solution resorts to
     approximation - when the partial list is over 10000 slabs long, we
     will only traverse first 5000 slabs from head and tail each and use
     the average of those to estimate the whole list. Both head and tail
     are used as the slabs near head to tend to have more free objects
     than the slabs towards the tail.

     It is expected the approximation should not break existing
     /proc/slabinfo consumers. The <num_objs> field is still accurate
     and reflects the overall kmem_cache footprint. The <active_objs>
     was already imprecise due to cpu and percpu-partial slabs, so can't
     be relied upon to determine exact cache usage. The difference
     between <active_objs> and <num_objs> is mainly useful to determine
     the slab fragmentation, and that will be possible even with the
     approximation in place.

   - Prevent allocating many slabs when a NUMA node is full (Chen Jun)

     Currently, on NUMA systems with a node under significantly bigger
     pressure than other nodes, the fallback strategy may result in each
     kmalloc_node() that can't be safisfied from the preferred node, to
     allocate a new slab on a fallback node, and not reuse the slabs
     already on that node's partial list.

     This is now fixed and partial lists of fallback nodes are checked
     even for kmalloc_node() allocations. It's still preferred to
     allocate a new slab on the requested node before a fallback, but
     only with a GFP_NOWAIT attempt, which will fail quickly when the
     node is under a significant memory pressure.

   - More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)

   - Fix slub_kunit self-test with hardened freelists (Guenter Roeck)

   - Mark racy accesses for KCSAN (linke li)

   - Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)"

* tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slub: remove the check for NULL kmalloc_caches
  mm/slub: create kmalloc 96 and 192 caches regardless cache size order
  mm/slub: mark racy access on slab->freelist
  slub: use count_partial_free_approx() in slab_out_of_memory()
  slub: introduce count_partial_free_approx()
  slub: Set __GFP_COMP in kmem_cache by default
  mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()
  mm/slub: correct comment in do_slab_free()
  mm/slub, kunit: Use inverted data to corrupt kmem cache
  mm/slub: simplify get_partial_node()
  mm/slub: add slub_get_cpu_partial() helper
  mm/slub: remove the check of !kmem_cache_has_cpu_partial()
  mm/slub: Reduce memory consumption in extreme scenarios
  mm/slub: mark racy accesses on slab->slabs
  mm/slub: remove dummy slabinfo functions

cd97950c

Merge tag 'kcsan.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · c07ea940

Linus Torvalds authored May 13, 2024

Pull kcsan update from Paul McKenney:
 "Introduce __data_racy type qualifier

  This adds a __data_racy type qualifier that enables kernel developers
  to inform KCSAN that a given variable is a shared variable without
  needing to mark each and every access.

  This allows pre-KCSAN code to be correctly (if approximately)
  instrumented withh very little effort, and also provides people
  reading the code a clear indication that the variable is in fact
  shared.

  In addition, it permits incremental transition to per-access KCSAN
  marking, so that (for example) a given subsystem can be transitioned
  one variable at a time, while avoiding large numbers of KCSAN warnings
  during this transition"

* tag 'kcsan.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  kcsan, compiler_types: Introduce __data_racy type qualifier

c07ea940